[Bug locale/24808] New: -wc value of character in non-utf-8 locales can be invalid if the base locale definition uses UTF-8

[Bug locale/24808] New: -wc value of character in non-utf-8 locales can be invalid if the base locale definition uses UTF-8

cvs-commit at gcc dot gnu.org
https://sourceware.org/bugzilla/show_bug.cgi?id=24808

            Bug ID: 24808
           Summary: -wc value of character in non-utf-8 locales can be
                    invalid if the base locale definition uses UTF-8
           Product: glibc
           Version: 2.29
            Status: UNCONFIRMED
          Severity: minor
          Priority: P2
         Component: locale
          Assignee: unassigned at sourceware dot org
          Reporter: lea.gris at noiraude dot net
  Target Milestone: ---

The locale command returns entries with an invalid character code when the
reference locale definition uses UTF-8 but the destination locale does not.

Example with the fr_FR.iso88591 or fr_FR.iso88591@euro locale:

The locale command returns an integer code for the numeric-thousands-sep-wc
that is invalid for the iso-8859-1 or the iso-8859-15 charset.

You can verify the behavior in a Bash shell:

{ LC_NUMERIC=fr_FR.iso88591 locale -k numeric-thousands-sep-wc; }

Answer:
> numeric-thousands-sep-wc=8239

which is Unicode U+202F NARROW NO-BREAK SPACE. That character is not allowed
in these charsets and is inconsistent with the actual character returned
for the thousands_sep key.

printf 'expected-value=%d\n' \
"'$( { LC_NUMERIC=fr_FR.iso88591 locale thousands_sep; } )"

Answer:
> expected-value=160
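
To compare the two reported values, it can help to print them as Unicode code points. This is only a small sketch: the helper name to_codepoint is made up for illustration, and it formats the decimal numbers quoted above without querying any locale.

```shell
# Hypothetical helper: format a decimal value as a U+XXXX code point.
to_codepoint() {
  printf 'U+%04X\n' "$1"
}

to_codepoint 8239   # value reported for numeric-thousands-sep-wc
to_codepoint 160    # value obtained from the thousands_sep character
```

The first call prints U+202F and the second prints U+00A0, so the two keys refer to different characters.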

If the returned value numeric-thousands-sep-wc represents a wide character code
that is invalid in the locale's charset, maybe it should be set to -1 to flag
that it is not available or undefined.

--
You are receiving this mail because:
You are on the CC list for the bug.

[Bug locale/24808] -wc value of character in non-utf-8 locales can be invalid if the base locale definition uses UTF-8

cvs-commit at gcc dot gnu.org
https://sourceware.org/bugzilla/show_bug.cgi?id=24808

--- Comment #1 from Andreas Schwab <[hidden email]> ---
wchar_t always uses Unicode.
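
One way to see the distinction between a byte in the locale's charset and a Unicode code point is to convert the ISO-8859-1 byte to UTF-8 with iconv. This is a rough sketch, assuming iconv and od are installed; the helper name latin1_byte_to_utf8_hex is invented for illustration.

```shell
# Hypothetical helper: take a decimal ISO-8859-1 byte value and print
# the hex bytes of its UTF-8 encoding.
latin1_byte_to_utf8_hex() {
  # Emit the raw byte through an octal escape, then re-encode it.
  printf "\\$(printf '%03o' "$1")" \
    | iconv -f ISO-8859-1 -t UTF-8 \
    | od -An -tx1 \
    | tr -d ' \n'
  echo
}

latin1_byte_to_utf8_hex 160
```

The result is c2a0, the UTF-8 encoding of U+00A0 NO-BREAK SPACE. So thousands_sep is a string encoded in the locale's charset, while the -wc key reports a Unicode code point.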

[Bug locale/24808] -wc value of character in non-utf-8 locales can be invalid if the base locale definition uses UTF-8

cvs-commit at gcc dot gnu.org
https://sourceware.org/bugzilla/show_bug.cgi?id=24808

Léa Gris <lea.gris at noiraude dot net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P2                          |P3
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |INVALID

--- Comment #2 from Léa Gris <lea.gris at noiraude dot net> ---
(In reply to Andreas Schwab from comment #1)
> wchar_t always uses Unicode.

Thank you!
Now it looks crystal clear.
