[Bug regex/23393] Handle [a-z] and [A-Z] in consistent portable fashion regardless of locale.

glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=23393

--- Comment #28 from Carlos O'Donell <carlos at redhat dot com> ---
Created attachment 11145
  --> https://sourceware.org/bugzilla/attachment.cgi?id=11145&action=edit
reorder.c

The following program takes the LATIN script range as input and reorders it
according to rules that allow [a-z] and [A-Z] to work as expected, i.e. to
match only lowercase and only uppercase letters respectively.
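The attachment itself is not reproduced in this message. As a rough illustration only (this is not the contents of reorder.c), a stable reorder that pulls 0-9, a-z and A-Z into three contiguous runs might look like the sketch below. It assumes a toy ASCII string instead of the real <Uxxxx> symbols, and it assumes every other element is moved after the three runs; the helper names block_of and reorder_range are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical classification: which contiguous run an element belongs to.
   Toy ASCII model; the real locale data uses <Uxxxx> symbols, not chars. */
static int block_of(char c)
{
    if (c >= '0' && c <= '9') return 0; /* digits first */
    if (c >= 'a' && c <= 'z') return 1; /* then lowercase */
    if (c >= 'A' && c <= 'Z') return 2; /* then uppercase */
    return 3;                           /* everything else goes last (assumption) */
}

/* Stable reorder: copies `in` to `out` one run at a time, keeping the
   original relative order inside each run. `out` must hold strlen(in) + 1. */
static void reorder_range(const char *in, char *out)
{
    assert(in != NULL && out != NULL);
    size_t n = strlen(in), k = 0;
    for (int b = 0; b <= 3; b++)
        for (size_t i = 0; i < n; i++)
            if (block_of(in[i]) == b)
                out[k++] = in[i];
    out[k] = '\0';
}
```

For example, reorder_range("aA1bB#", out) yields "1abAB#": the digits, lowercase and uppercase letters each end up contiguous, and the remaining element keeps its place after them.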

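glibc implements the ISO POSIX-2:1993 requirements and therefore uses
collation element order (CEO) as the basis for range expressions. Sorting, on
the other hand, depends on the collation weights assigned to each element, not
on the position at which the element is listed. Therefore we can rearrange the
range elements while still keeping the ISO 14651 sorting.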
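A toy model of a CEO-based range expression: [lo-hi] matches c when the position of c in the collation element order lies between the positions of lo and hi. The two orders below are hypothetical simplifications, not the real iso14651_t1_common data; INTERLEAVED assumes lowercase and uppercase letters are listed next to each other, and the helper names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical order: each lowercase letter directly followed by its uppercase. */
static const char INTERLEAVED[] =
    "aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ";

/* Hypothetical reordered list: three contiguous runs 0-9, a-z, A-Z. */
static const char REORDERED[] =
    "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

/* Position of c in the collation element order, or -1 if it is absent. */
static int ceo_position(const char *order, char c)
{
    const char *p = (c != '\0') ? strchr(order, c) : NULL;
    return p ? (int)(p - order) : -1;
}

/* [lo-hi] matches c when c's position lies between those of lo and hi. */
static bool range_matches(const char *order, char lo, char hi, char c)
{
    int plo = ceo_position(order, lo);
    int phi = ceo_position(order, hi);
    int pc = ceo_position(order, c);
    assert(plo >= 0 && phi >= 0); /* both endpoints must be in the order */
    return pc >= 0 && plo <= pc && pc <= phi;
}
```

With INTERLEAVED, [a-z] also matches A through Y (but not Z, which is listed after z); with REORDERED it matches only the lowercase letters, and [A-Z] and [0-9] likewise cover only their own runs.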

This means we can get both the range behavior we want and the sorting we want,
and I don't see any downside to this. Furthermore, we can create three distinct
ranges in iso14651_t1_common without breaking collation: those ranges would
contain only a-z, A-Z, and 0-9, implementing what Florian and Rich are
suggesting for all locales that use iso14651_t1_common collation.
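To see why the reordering need not break collation, a toy model can keep each element's sort weight separate from the position at which it is listed. Sorting compares weights, so two listings with identical weights sort a string identically, even though a position-based range over them would differ. The tables, weights and helper names below are hypothetical and far simpler than glibc's multi-level weights.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical entry: listing position is the array index, weight is explicit. */
struct elem { char ch; int weight; };

/* Same weights, two listings: interleaved vs. contiguous runs. */
static const struct elem TAB_INTERLEAVED[] = {
    {'a', 1}, {'A', 2}, {'b', 3}, {'B', 4},
};
static const struct elem TAB_BLOCKS[] = {
    {'a', 1}, {'b', 3}, {'A', 2}, {'B', 4},
};

/* qsort has no context argument, so the active table is kept in globals. */
static const struct elem *g_tab;
static size_t g_n;

static int weight_of(char c)
{
    for (size_t i = 0; i < g_n; i++)
        if (g_tab[i].ch == c)
            return g_tab[i].weight;
    return INT_MAX; /* unknown characters sort last */
}

static int cmp_by_weight(const void *a, const void *b)
{
    int wa = weight_of(*(const char *)a);
    int wb = weight_of(*(const char *)b);
    return (wa > wb) - (wa < wb);
}

/* Sort the characters of s in place by their weights in tab. */
static void sort_with(const struct elem *tab, size_t n, char *s)
{
    assert(tab != NULL && s != NULL);
    g_tab = tab;
    g_n = n;
    qsort(s, strlen(s), 1, cmp_by_weight);
}
```

Sorting "BbAa" with either table gives "aAbB", while a position-based range [a-b] would cover A in TAB_INTERLEAVED (position 1) but not in TAB_BLOCKS (position 2).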

The 15 locales that don't use iso14651_t1_common will not be changed since
that's the most conservative solution.
