[Bug libc/17318] New: [RFE] Provide a C.UTF-8 locale by default

glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=17318

            Bug ID: 17318
           Summary: [RFE] Provide a C.UTF-8 locale by default
           Product: glibc
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: libc
          Assignee: unassigned at sourceware dot org
          Reporter: ncoghlan at gmail dot com
                CC: drepper.fsp at gmail dot com

Fedora doesn't currently provide the C.UTF-8 locale. In the RFE requesting it
(https://bugzilla.redhat.com/show_bug.cgi?id=902094), it was suggested that a
more appropriate approach would be for it to be provided as part of upstream
glibc, at which point Fedora would inherit it by default.

Hence, this RFE to request the inclusion of a C.UTF-8 locale by default.

My personal interest relates to Python 3, where "LANG=C" misconfigures a few
aspects to use ASCII, when they really should be using UTF-8. While I'd
actually like to fix that on the Python side in the long run, being able to set
"LANG=C.UTF-8" instead is a solution that already works for existing versions
of Python 3.

Bug #16621 suggests that C.UTF-8 may actually require special casing in glibc
in order to be handled correctly. If that's accurate, then it would strengthen
the case for including the locale in the upstream library.
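
As a rough illustration of the difference described above, the following sketch
reports which character encoding a locale name selects for LC_CTYPE. The helper
name codeset_for is hypothetical, and locale.nl_langinfo is only available on
Unix-like platforms. On a system where C.UTF-8 is not installed, the helper
simply returns None for that name.

```python
import locale


def codeset_for(locale_name):
    """Return the LC_CTYPE codeset for locale_name, or None if unavailable.

    Hypothetical helper for illustration only; it is not part of glibc
    or of Python's standard library.
    """
    # Remember the current LC_CTYPE setting so it can be restored afterwards.
    saved = locale.setlocale(locale.LC_CTYPE)
    try:
        locale.setlocale(locale.LC_CTYPE, locale_name)
    except locale.Error:
        # The locale is not installed on this system.
        return None
    try:
        return locale.nl_langinfo(locale.CODESET)
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)


if __name__ == "__main__":
    for name in ("C", "C.UTF-8"):
        print(name, "->", codeset_for(name))
```

Running this on a given system shows whether "LANG=C.UTF-8" is a usable
workaround there; the exact codeset names reported vary between C libraries.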

--
You are receiving this mail because:
You are on the CC list for the bug.

Peter Robinson <pbrobinson at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pbrobinson at gmail dot com

Bastien Nocera <hadess at hadess dot net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hadess at hadess dot net

Matěj Cepl <mcepl at cepl dot eu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mcepl at cepl dot eu

Carlos O'Donell <carlos at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |carlos at redhat dot com

--- Comment #1 from Carlos O'Donell <carlos at redhat dot com> ---
(In reply to Nick Coghlan from comment #0)

> Fedora doesn't currently provide the C.UTF-8 locale. In the RFE requesting
> it (https://bugzilla.redhat.com/show_bug.cgi?id=902094), it was suggested
> that a more appropriate would be for it to be provided as part of upstream
> glibc, at which point Fedora would inherit it by default.
>
> Hence, this RFE to request the inclusion of a C.UTF-8 locale by default.
>
> My personal interest relates to Python 3, where "LANG=C" misconfigures a few
> aspects to use ASCII, when they really should be using UTF-8. While I'd
> actually like to fix that on the Python side in the long run, being able to
> set "LANG=C.UTF-8" instead is a solution that already works for existing
> versions of Python 3.
>
> Bug #16621 suggests that C.UTF-8 may actually require special casing in
> glibc in order to be handled correctly. If that's accurate, then it would
> strengthen the case for including the locale in the upstream library.

I agree that this is a good idea. Someone needs to do the work and submit it to
libc-alpha. It's not all that easy, and consensus needs to be reached about the
inclusion of ~1.5MB of UTF-8 data into the runtime.

Mike Frysinger <vapier at gentoo dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vapier at gentoo dot org

--- Comment #2 from Nick Coghlan <ncoghlan at gmail dot com> ---
Reference to the libc-alpha mailing list discussion with additional technical
details: https://sourceware.org/ml/libc-alpha/2015-02/msg00247.html

Marko Myllynen <myllynen at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |myllynen at redhat dot com

Joseph Myers <jsm28 at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|libc                        |locale

Mike Frysinger <vapier at gentoo dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |16621


Referenced Bugs:

https://sourceware.org/bugzilla/show_bug.cgi?id=16621
[Bug 16621] C.UTF-8 locales should be regarded like C w.r.t. $LANGUAGE
precedence

Vincent Lefèvre <vincent-srcware at vinc17 dot net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vincent-srcware at vinc17 dot net

Mikaela Suomalainen <mikaela at mikaela dot info> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mikaela at mikaela dot info

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |fweimer at redhat dot com

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://bugzilla.redhat.com
                   |                            |/show_bug.cgi?id=902094

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://bugzilla.redhat.com
                   |                            |/show_bug.cgi?id=1361965

Mike FABIAN <maiku.fabian at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |maiku.fabian at gmail dot com

Mingye Wang <arthur200126 at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |arthur200126 at gmail dot com

Filipe Brandenburger <filbranden at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |filbranden at gmail dot com

--- Comment #3 from Filipe Brandenburger <filbranden at gmail dot com> ---
Just wanted to point out that Fedora includes C.UTF-8 since circa 2015...

Patch used by them is here (and, in fact, seems to come from a Red Hat employee
who contributes often to glibc):

https://src.fedoraproject.org/rpms/glibc/blob/0457f649e3fe6299efe384da13dfc923bbe65707/f/glibc-c-utf8-locale.patch

The discussion in the e-mail threads was somewhat about *optimizing* C.UTF-8 so
that it takes less space... While I think that's great (and very advisable!) I
think it's a separate step from starting to *ship* C.UTF-8 by default.

So... ship first, optimize later?

At this point, most major distros seem to be shipping it anyways, so why not
include it upstream so that at some point in the near future we know we can
count on it on all distros?

--- Comment #4 from Carlos O'Donell <carlos at redhat dot com> ---
(In reply to Filipe Brandenburger from comment #3)
> So... ship first, optimize later?
>
> At this point, most major distros seem to be shipping it anyways, so why not
> include it upstream so that at some point in the near future we know we can
> count on it on all distros?

The major distros ship a non-functioning C.UTF-8 for the purposes required by
upstream. The code-point sorting order requirement fails, and it's not clear
why. This is what I'm trying to fix right now.
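
One way to probe the code-point sorting requirement from user space is sketched
below. It assumes that sorting with locale.strxfrm reflects the locale's
collation rules and relies on the fact that Python compares str values by code
point. The helper name matches_code_point_order and the sample strings are
hypothetical; this is not the test used upstream.

```python
import locale


def matches_code_point_order(locale_name, samples):
    """Check whether locale_name collates samples in code-point order.

    Returns True or False, or None if the locale is not installed.
    Hypothetical helper for illustration only.
    """
    # Remember the current LC_COLLATE setting so it can be restored afterwards.
    saved = locale.setlocale(locale.LC_COLLATE)
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        # strxfrm yields keys that sort according to the locale's rules.
        by_locale = sorted(samples, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)
    # Plain sorted() on str values orders them by Unicode code point.
    return by_locale == sorted(samples)


if __name__ == "__main__":
    words = ["b", "a", "B", "Z", "\u00e9", "\u4e2d", "\U0001f600"]
    for name in ("C", "C.UTF-8"):
        print(name, "->", matches_code_point_order(name, words))
```

A result of False for C.UTF-8 on a given distribution would be consistent with
the failure described here, though a small sample like this cannot prove that
the ordering is correct in general.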

Carlos O'Donell <carlos at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Depends on|                            |21302


Referenced Bugs:

https://sourceware.org/bugzilla/show_bug.cgi?id=21302
[Bug 21302] strcoll does not correctly follow locale-specified order in some
cases