[Bug regex/23393] Handle [a-z] and [A-Z] in consistent portable fashion regardless of locale.

alahay01 at gcc dot gnu.org
https://sourceware.org/bugzilla/show_bug.cgi?id=23393

--- Comment #41 from Carlos O'Donell <carlos at redhat dot com> ---
(In reply to Aurelien Jarno from comment #40)

> (In reply to Carlos O'Donell from comment #35)
> > As a temporary measure I have committed the deinterleaving of upper and lower
> > cases in iso14651_t1_common for glibc 2.28 to fix the surprises caused to
> > en_US.UTF-8 users who do not want to have [a-z] match A-Y.
> >
> > This fixes the regression for 2.28, but doesn't fix this issue.
>
> There is a user report [1] that shows that the Cyrillic ranges are also
> affected by the iso14651_t1_common update. The deinterleaving changes only
> fix the Latin ranges.
>
> [1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=926906

This user expects the range to follow collation ordering, and that
expectation is not valid. The range in any non-POSIX/C locale is undefined.

Therefore the bug you reference is not a bug. Even so, it is still difficult
for users to use ranges without problems, which makes ranges relatively
useless, and we'd like to fix that. The plan is to fix this with rational
ranges that use UTF-8 code-point ordering for all ranges.

The deinterleaving for LATIN is deliberately limited to fixing the ASCII ranges
and the POSIX/C ranges. All other ranges are undefined. If we deinterleaved
non-LATIN ranges we'd have to duplicate all the data into the individual
locales and list them in collation order (so collation order matches collation
element ordering). Such a change would be quite drastic, and it would still not
solve the problem of collation changes altering range expressions. It also
wouldn't solve the broader problem that everyone still expects [a-z] to work
all the time (code-point ordering).
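
To make the difference concrete, here is a minimal sketch in Python. It is a
toy model, not glibc's implementation: the two collation sequences below are
made-up stand-ins for the interleaved and deinterleaved orderings of the ASCII
letters, and the helper names are hypothetical. It compares what [a-z] would
match when a range is interpreted by position in a collation sequence versus by
code point.

```python
import string

LOWER = string.ascii_lowercase
UPPER = string.ascii_uppercase

# Toy collation sequences (assumptions for illustration only; these are
# not the real contents of iso14651_t1_common).
INTERLEAVED = "".join(lo + up for lo, up in zip(LOWER, UPPER))  # aAbB...zZ
DEINTERLEAVED = LOWER + UPPER                                   # a...zA...Z


def in_collation_range(ch, start, end, sequence):
    """True if ch sits between start and end in the given sequence."""
    if ch not in sequence:
        return False
    return sequence.index(start) <= sequence.index(ch) <= sequence.index(end)


def in_codepoint_range(ch, start, end):
    """True if ch sits between start and end by code point."""
    return ord(start) <= ord(ch) <= ord(end)


def matched_uppercase(predicate):
    """Collect the uppercase ASCII letters accepted by the predicate."""
    return "".join(c for c in UPPER if predicate(c))


# Which uppercase letters does [a-z] accept under each interpretation?
interleaved_hits = matched_uppercase(
    lambda c: in_collation_range(c, "a", "z", INTERLEAVED))
deinterleaved_hits = matched_uppercase(
    lambda c: in_collation_range(c, "a", "z", DEINTERLEAVED))
codepoint_hits = matched_uppercase(
    lambda c: in_codepoint_range(c, "a", "z"))

print("interleaved:  ", interleaved_hits)    # A through Y, but not Z
print("deinterleaved:", deinterleaved_hits)  # empty
print("code point:   ", codepoint_hits)      # empty
```

In this toy model the interleaved sequence reproduces the surprise described
above ([a-z] matching A-Y), while the deinterleaved sequence and the code-point
interpretation both keep uppercase letters out of the range. Only the
code-point interpretation is independent of whatever the collation data says.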

I would close your Debian bug as an unsupported configuration for ranges, but
point out that we are trying to make this better.
