[Bug regex/23393] Handle [a-z] and [A-Z] in consistent portable fashion regardless of locale.

glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=23393

--- Comment #32 from Florian Weimer <fweimer at redhat dot com> ---
(In reply to Eric Blake from comment #30)

> (In reply to Florian Weimer from comment #12)
> > I find it very dubious that the current implementation of ranges is useful
> > for anything at all, except implementation convenience (as it's what we
> > have today).
> >
> > Two possible improvements come to my mind:
> >
> > (a) If both range endpoints are ASCII, match only ASCII characters.
> >
> > (b) Ranges include all characters with the same primary collation weight as
> > the endpoints.
> >
> > It's possible to implement both, with (a) superseding (b).  I'm not sure if
> > today, range expressions can match collating elements consisting of multiple
> > characters, in which case the following variant might be less surprising:
> >
> > (b') Ranges include all collating elements with the same primary weight as
> > the endpoints.
> >
> > Both approaches are conforming to POSIX because ranges in other locales are
> > undefined anyway.  As far as I can see, available user feedback suggests
> > that (a) is the expected behavior.
>
> Well, close to (a), at any rate.  You're looking for Rational Range
> Interpretation, which has been picked up by several GNU tools already (awk,
> coreutils, sed, bash, ...)

Yes, I finally found the old discussion, see comment 26. 8-)

Do you know why bash doesn't enable it by default, and instead requires
“shopt -s globasciiranges”?
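
For illustration only, here is a minimal Python sketch of why a range such as
[a-z] can behave differently under collation order than under code point
order, which is the idea behind option (a) and Rational Range Interpretation.
The COLLATION string is a toy ordering of the form aAbB...zZ, assumed here to
resemble what some non-C locales have used; it is not real locale data, and
both function names are hypothetical.

```python
# Toy collation order of the form aAbBcC...zZ.
# Assumption for illustration only; this is not real glibc locale data.
COLLATION = "".join(
    lower + upper
    for lower, upper in zip("abcdefghijklmnopqrstuvwxyz",
                            "ABCDEFGHIJKLMNOPQRSTUVWXYZ")
)


def in_range_collation(ch, start, end):
    """Range membership by position in the toy collation sequence."""
    if ch not in COLLATION:
        return False
    pos = COLLATION.index(ch)
    return COLLATION.index(start) <= pos <= COLLATION.index(end)


def in_range_codepoint(ch, start, end):
    """Range membership by code point, as in the C locale (option (a))."""
    return ord(start) <= ord(ch) <= ord(end)


if __name__ == "__main__":
    # Compare both interpretations of the range [a-z].
    for ch in "aBzZ":
        print(ch,
              in_range_collation(ch, "a", "z"),
              in_range_codepoint(ch, "a", "z"))
```

Under the toy ordering, [a-z] would include "B" (it sorts between "a" and
"z") but not "Z" (it sorts after "z"), whereas the code point comparison
matches only the lowercase ASCII letters.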
