[Bug localedata/23857] New: Esperanto has no country

[Bug localedata/23857] New: Esperanto has no country

cvs-commit at gcc dot gnu.org
https://sourceware.org/bugzilla/show_bug.cgi?id=23857

            Bug ID: 23857
           Summary: Esperanto has no country
           Product: glibc
           Version: unspecified
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: localedata
          Assignee: unassigned at sourceware dot org
          Reporter: carmenbianca at fedoraproject dot org
                CC: libc-locales at sourceware dot org
  Target Milestone: ---

Since glibc 2.24, Esperanto has been available as the `eo.utf8` locale.  It was
added as more-or-less the only locale not to have an associated country.  For
translations, this works sufficiently well.  The problem, however, is that a lot
of projects don't handle the no-country locale very well.

- In GNOME's gnome-control-center, the user is given a choice to pick a language
  and a "format" (locale).  Esperanto is a language choice, but not a locale
  choice.  Instead, it defaults to "United States (English)".

- In Python's `locale` module, after unsetting all LC_* variables and running
  `LANG=eo python3`, you get `locale.getlocale() == ('eo_XX', 'ISO8859-3')`.

- In a lot of packages, you'll see something like `*_*` to match all locales.
  Esperanto has to be mentioned separately, so that the expression becomes
  `eo *_*`.  See <https://bugzilla.redhat.com/show_bug.cgi?id=1643756>.
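
To illustrate the third point, here is a minimal sketch in Python of why a
`*_*` pattern skips Esperanto.  The list of locale names is made up for the
example, and `match_locales` is a hypothetical helper, not something taken from
any of the affected packages.

```python
from fnmatch import fnmatch

# Hypothetical list of installed locale names (assumption for illustration).
locale_names = ["de_DE", "en_US", "eo", "ia_FR", "nl_NL"]


def match_locales(names, patterns):
    """Return the names matched by at least one shell-style pattern."""
    return [n for n in names if any(fnmatch(n, p) for p in patterns)]


# The usual packaging pattern only matches names containing an underscore.
with_country_only = match_locales(locale_names, ["*_*"])

# Esperanto has to be listed explicitly next to the pattern.
with_exception = match_locales(locale_names, ["eo", "*_*"])

print(with_country_only)
print(with_exception)
```

The first list lacks "eo" because the name contains no underscore; the second
list only includes it because of the explicit exception.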

I still need to file bug reports for the first two examples, and there are more
examples that I haven't recorded in long-term memory.  The recurring problem,
however, is that Esperanto is the exception.  It's a special case that a lot of
projects don't account for, because what language could possibly not have a
country?

A simple, satisfactory solution would be to no longer make Esperanto a special
case.  Make it the same as all the other locales, and the problems will sort of
go away.  There are a couple of approaches to this:

1. Create "eo_NL", just as Interlingua (an auxiliary conlang similar to
   Esperanto) has "ia_FR".  Separate locales might need to be created for
   different countries.

2. Create "eo_XX" or "eo_EO" as an exact copy of the current "eo" locale,
   excluding a lot of LC_ADDRESS information.

3. Create "eo_XX" or "eo_EO" with a fake "Esperantujo" country and currency.

4. Add a fake "Esperantujo" country and currency to the current "eo" locale,
   which might solve some problems, maybe?

5. Some combination of the above.

I have a slight preference for the first solution.  Users would be able to use
Esperanto while retaining their local currency, date formatting, and so on.
It is also preferable in the sense that Interlingua already does this, so a
precedent has been set.

Alternative #6 is to keep the status quo and fix all the bugs in third-party
projects that do not account for the special case of Esperanto.  This doesn't
scale very well, though.  If another no-country language comes along, it will
have to be added as an exception to these other projects again.  It's also
cumulatively a lot of work for a special case that not many people use anyway.

I've briefly talked to Rafal about this issue on Fedora's trans list.  I think
we agree that it's not really a glibc bug but rather a lot of tiny bugs in a
lot of projects that use glibc, which is why I felt hesitant about reporting it
here.

--
You are receiving this mail because:
You are on the CC list for the bug.

[Bug localedata/23857] Esperanto has no country

cvs-commit at gcc dot gnu.org
https://sourceware.org/bugzilla/show_bug.cgi?id=23857

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |fweimer at redhat dot com
              Flags|                            |security-

--- Comment #1 from Florian Weimer <fweimer at redhat dot com> ---
I'm not really convinced this is a glibc bug.

Wouldn't it make sense to fix application bugs instead?

There are other artificial languages which may face the same issue once we add
them to glibc.  Yiddish currently has a US locale, but isn't this a bit odd?

[Bug localedata/23857] Esperanto has no country

cvs-commit at gcc dot gnu.org
In reply to this post by cvs-commit at gcc dot gnu.org
https://sourceware.org/bugzilla/show_bug.cgi?id=23857

--- Comment #2 from Carmen Bianca Bakker <carmenbianca at fedoraproject dot org> ---
(In reply to Florian Weimer from comment #1)
> I'm not really convinced this is a glibc bug.
>
> Wouldn't it make sense to fix applications bugs instead?

I agree that it isn't, and I agree that it would make sense to fix application
bugs. The problem is that those application bugs happen because glibc presents
a special case, and one could undo all these application bugs simultaneously by
making sure that the special case isn't special anymore.

Even something so simple as my proposed solution 2 would get rid of a lot of
bugs in programs that expect all locales to look like lang_COUNTRY.

> There are other artificial languages which may face the same issue once we
> add it to glibc.  Yiddish currently has a US locale, but isn't this a bit
> odd?

If there's a sizeable population of Yiddish speakers in the US, then that
probably makes sense. It wouldn't make sense for Yiddish speakers outside of
the US, though.  Problem is: Do you want to create a glibc locale for every
possible country where Yiddish is spoken in some capacity? That would
ultimately be the best solution for users, but might cause an annoying
maintenance burden on glibc.

Ideally I'd like to see language and country completely separated from each
other instead of combined in locales, because that would ultimately make the
most sense, but that would be a super big redesign that I am not comfortable
with proposing.  I'm currently limiting my scope to making Esperanto (more)
usable on Fedora Workstation, and I think some of my above suggestions could
significantly improve the status of Esperanto with relatively little effort
(compared to fixing all application bugs).

[Bug localedata/23857] Esperanto has no country

cvs-commit at gcc dot gnu.org
In reply to this post by cvs-commit at gcc dot gnu.org
https://sourceware.org/bugzilla/show_bug.cgi?id=23857

--- Comment #3 from Dmitry V. Levin <ldv at sourceware dot org> ---
(In reply to Florian Weimer from comment #1)
> There are other artificial languages which may face the same issue once we
> add it to glibc.  Yiddish currently has a US locale, but isn't this a bit
> odd?

The comment is confusing.

[Bug localedata/23857] Esperanto has no country

cvs-commit at gcc dot gnu.org
In reply to this post by cvs-commit at gcc dot gnu.org
https://sourceware.org/bugzilla/show_bug.cgi?id=23857

--- Comment #4 from Florian Weimer <fweimer at redhat dot com> ---
(In reply to Dmitry V. Levin from comment #3)
> (In reply to Florian Weimer from comment #1)
> > There are other artificial languages which may face the same issue once we
> > add it to glibc.  Yiddish currently has a US locale, but isn't this a bit
> > odd?
>
> The comment is confusing.

Sorry, the two sentences are really separate.  I did not want to imply that
Yiddish is an artificial language.  I think the majority of Yiddish speakers are
*not* located in the United States.  I suspect the locale was added under “US”
because there was no precedent for a locale without a country at the time.

[Bug localedata/23857] Esperanto has no country

cvs-commit at gcc dot gnu.org
In reply to this post by cvs-commit at gcc dot gnu.org
https://sourceware.org/bugzilla/show_bug.cgi?id=23857

Rafal Luzynski <digitalfreak at lingonborough dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |digitalfreak@lingonborough.com

--- Comment #5 from Rafal Luzynski <digitalfreak at lingonborough dot com> ---
Hello Carmen, thank you for filing this bug report.

(In reply to Carmen Bianca Bakker from comment #0)
> [...]
> I still need to file bug reports for the first two examples, and there are
> more examples that I haven't recorded in long-term memory.  [...]

I encourage you to file those bug reports.  Are they maybe caused by the
previous bug in glibc packaging in Fedora?

> [...]
> 1. Create "eo_NL" just like Interlingua---an auxiliary conlang similar to
>    Esperanto--- has "ia_FR".  Separate locales might need to be created for
>    different countries.

I was not aware of this case with Interlingua.  I would rather go for renaming
"ia_FR" to "ia" so that "eo" would not be alone anymore :-) but I know too
little about Interlingua to push for it now.

> [...]
> Alternative #6 is to keep the status quo and fix all the bugs in third party
> projects that do not account for the special case of Esperanto.

This is my preferred choice and therefore I agree with Florian (comment 1) that
this is not a bug here.  Also, I think it's good if we approach other projects
and explain to them how to fix the issue correctly.

(In reply to Carmen Bianca Bakker from comment #2)
> (In reply to Florian Weimer from comment #1)
> > There are other artificial languages which may face the same issue once we
> > add it to glibc.  Yiddish currently has a US locale, but isn't this a bit
> > odd?
>
> If there's a sizeable population of Yiddish speakers in the US, then that
> probably makes sense.

As far as I know, yes: there is a large population of Yiddish speakers in the
US, about 160,000 people.  I'm not sure, but they are likely the largest
Yiddish-speaking population in the world.

> It wouldn't make sense for Yiddish speakers outside of
> the US, though.  Problem is: Do you want to create a glibc locale for every
> possible country where Yiddish is spoken in some capacity? [...]

Most of the time this makes sense if two (or more) populations speaking the
same language in two countries develop their languages to the extent that they
differ a little and actually form two variants of a language.  Good examples are
US English vs. British English or Brazilian Portuguese vs. European Portuguese.

A secondary reason is when we want to provide other locale-dependent settings
for multiple countries speaking the same language.

So adding a locale makes sense if there is a population needing that.
The existence of a locale in CLDR and official recognition of a language by the
local authorities are good arguments for adding a locale variant.

> Ideally I'd like to see language and country completely separated from each
> other instead of combined in locales, because that would ultimately make the
> most sense,

Multiple environment variables (LC_MESSAGES, LC_MEASUREMENT, etc.) solve this
problem to some extent.  That means you don't have every combination of
language/country, but you can choose a separate locale for different purposes,
and that should usually be sufficient.
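
As a rough illustration of that mechanism, here is a small Python sketch of the
POSIX lookup order for a locale category: LC_ALL first, then the category's own
variable, then LANG.  The function name `effective_locale` and the example
environment are assumptions made up for this sketch; the fallback to "C" is the
glibc default, and glibc's separate LANGUAGE variable for message catalogs is
not modeled.

```python
def effective_locale(category, env):
    """Resolve one locale category from an environment mapping.

    Order: LC_ALL, then the category variable itself, then LANG.
    Unset and empty variables are both skipped; "C" is the fallback.
    """
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:
            return value
    return "C"


# Hypothetical user: Esperanto messages, Dutch conventions for everything else.
env = {
    "LANG": "nl_NL.UTF-8",
    "LC_MESSAGES": "eo.UTF-8",
}

for category in ("LC_MESSAGES", "LC_TIME", "LC_MONETARY", "LC_MEASUREMENT"):
    print(category, "->", effective_locale(category, env))
```

With this environment only LC_MESSAGES resolves to Esperanto; the other
categories fall through to LANG.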

> but that would be a super big redesign that I am not comfortable
> with proposing.

+1

> I'm currently limiting my scope to making Esperanto (more)
> usable on Fedora Workstation, and I think some of my above suggestions could
> significantly improve the status of Esperanto with relatively little effort
> (i.e., fixing all application bugs).

Thank you for your effort, please continue.

Again, I think this is not a bug but I don't mind if we discuss this here.

[Bug localedata/23857] Esperanto has no country

cvs-commit at gcc dot gnu.org
In reply to this post by cvs-commit at gcc dot gnu.org
https://sourceware.org/bugzilla/show_bug.cgi?id=23857

--- Comment #6 from Carmen Bianca Bakker <carmenbianca at fedoraproject dot org> ---
Hi Rafal,

(In reply to Rafal Luzynski from comment #5)
> I encourage you to file those bug reports.  Are they maybe caused by the
> previous bug in glibc packaging in Fedora?

https://gitlab.gnome.org/GNOME/gnome-control-center/issues/260 - Appears
glibc-related, because the languages and locales/formats map directly to glibc
options.  If I were more competent with C, I'd try to fix it myself.

https://bugs.python.org/issue35163 - Some weird obsolete configuration.

> I was not aware of this case with Interlingua.  I would rather go for
> renaming "ia_FR" to "ia" so that "eo" would not be alone anymore :-) but my
> knowledge about Interlingua is too little to enforce it now.

Is it okay to add the author of the original Interlingua bug report to this bug
report?  Perhaps they can add an original insight, and perhaps their motivation
for choosing "ia_FR" over "ia".

> > It wouldn't make sense for Yiddish speakers outside of
> > the US, though.  Problem is: Do you want to create a glibc locale for every
> > possible country where Yiddish is spoken in some capacity? [...]
>
> Most of the time this makes sense if two (or more) populations speaking the
> same language in two countries develop their languages to the extent that
> they differ little and actually make two variants of a language.  Good
> examples are US English vs. British English or Brazilian Portuguese vs.
> European Portuguese.
>
> A secondary reason is when we want to provide other locale-dependent
> settings for multiple countries speaking the same language.
>
> So adding a locale makes sense if there is a population needing that.
> Existence of a locale in CLDR and an official recognition of a language by
> the local authorities are good argument for adding a locale variant.

CLDR has "Unknown Region" listed under ZZ, which would work sufficiently well
for country-less languages.  i.e., proposed solution 2, or solution 3 with
"Unknown Region" as country (and "XXX" as currency).

https://unicode.org/cldr/charts/34/summary/root.html

It could also work for Yiddish, where "yi_US" is for the Yiddish population
inside the US, and "yi_ZZ" could be used by non-US Yiddish populations who are
spread across many other countries.  Though in the case of Yiddish
specifically, it might probably make sense to add an Israel entry, but that
will likely depend on a qualified volunteer doing the work.

[Bug localedata/23857] Esperanto has no country

cvs-commit at gcc dot gnu.org
In reply to this post by cvs-commit at gcc dot gnu.org
https://sourceware.org/bugzilla/show_bug.cgi?id=23857

--- Comment #7 from Rafal Luzynski <digitalfreak at lingonborough dot com> ---
Hi,

I'm sorry for the delayed reply.

(In reply to Carmen Bianca Bakker from comment #6)
> [...]
> https://gitlab.gnome.org/GNOME/gnome-control-center/issues/260 - Appears
> glibc-related, because the languages and locales/formats map directly to
> glibc options.  I wish I was more competent with C, and I'd try to fix it up
> myself.

Thank you.  I have not looked at the source code yet but my guess is that the
list of territories comes from the list of locales with the language part
stripped.  This makes some sense to me: formats, units, etc. depend on the
territory rather than the language.  For example, an English locale may have
different units, currency, country name etc. for the USA, UK, Australia, India,
Ireland, and so on.  On the other hand, people living in one country probably
use the same formats, units, and currency even if they speak different
languages.  Therefore, if you want to select "Esperanto" as the locale for
formats then... actually what would you expect?  Currency, country name,
address format, car plate - "as used in (where?)"  Why would "Netherlands" not
work better for you, for example?
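
To make that guess concrete, here is a minimal Python sketch of how a list of
"formats" could be derived from locale names by stripping the language part.
This is not the gnome-control-center code; `split_locale` and the list of
installed locales are assumptions for illustration only.

```python
def split_locale(name):
    """Split 'lang[_TERRITORY][.codeset][@modifier]' into its four parts.

    Missing parts are returned as None.
    """
    name, _, modifier = name.partition("@")
    name, _, codeset = name.partition(".")
    language, _, territory = name.partition("_")
    return language, territory or None, codeset or None, modifier or None


# Hypothetical set of installed locales.
installed = ["en_US.UTF-8", "nl_NL.UTF-8", "ia_FR.UTF-8", "eo.UTF-8"]
parsed = [split_locale(name) for name in installed]

# Language choices come from the language part of every locale.
languages = sorted({lang for lang, _, _, _ in parsed})

# Format choices come from the territory part, so a locale without a
# territory contributes nothing here.
territories = sorted({terr for _, terr, _, _ in parsed if terr})

print(languages)
print(territories)
```

Under these assumptions "eo" shows up in the language list but adds no entry to
the territory list, which matches the behaviour described in the GNOME report.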

I understand you may have some good reasons to select Esperanto formats
but I'm trying to reflect the reasons of GNOME designers.

> https://bugs.python.org/issue35163 - Some weird obsolete configuration.

My first suggestion is that Python should not map ambiguous locales to
detailed ones that the current operating system does not support.

Would adding "eo.ISO8859-3" help to fix this issue?  I think the reason is that
historically the locales without the encoding specified used an 8-bit encoding
like ISO 8859-1 or ISO 8859-3.  Therefore the locales often map to 8-bit
encodings unless you specify "utf8" explicitly.  Later, when Unicode became
popular and widely used, newly added locales in glibc used UTF-8 as their only
encoding.  This is the case for Esperanto: "eo" is an alias of "eo.UTF-8".
Somehow Python treats it as an alias of "eo_XX.ISO8859-3".
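
For anyone who wants to check this on their own system, a short sketch: Python's
`locale.normalize()` resolves a name through Python's built-in alias table
without asking the C library, so it shows what Python itself maps "eo" to.  The
result depends on the Python version, so no particular output is claimed here;
the list of names is just an example.

```python
import locale

# Example names to look up; "eo" is the one discussed in this report.
names = ["eo", "eo.UTF-8", "nl_NL"]

# normalize() only consults Python's own alias table, not the installed
# glibc locales, so the result may name a locale the system does not have.
normalized = {name: locale.normalize(name) for name in names}

for name, result in normalized.items():
    print(f"{name!r} -> {result!r}")
```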

On the other hand I am not sure if adding the old encodings makes sense
nowadays.  Old encodings are preserved only in order not to break existing
systems.  Does any existing Linux system use "eo.ISO8859-3" and rely on it?  Is
it likely to be true if this locale has never existed?

> (In reply to Rafal Luzynski from comment #5)
> > I was not aware of this case with Interlingua.  I would rather go for
> > renaming "ia_FR" to "ia" so that "eo" would not be alone anymore :-) but my
> > knowledge about Interlingua is too little to enforce it now.
>
> Is it okay to add the author of the original Interlingua bug report to this
> bug report?  Perhaps they can add an original insight, and perhaps their
> motivation for choosing "ia_FR" over "ia".

The bug report is https://sourceware.org/bugzilla/show_bug.cgi?id=14879 but I
wouldn't like to bother the authors of the Interlingua patch with the issues of
Esperanto.

By the way, it has been recently considered a bug by CLDR to assign Interlingua
to France:

http://unicode.org/cldr/trac/ticket/11164

This raises my motivation to rename "ia_FR" to "ia" but not to the level
sufficient to actually do it.

> [...]
> CLDR has "Unknown Region" listed under ZZ, which would work sufficiently
> well for country-less languages.  i.e., proposed solution 2, or solution 3
> with "Unknown Region" as country (and "XXX" as currency).
>
> https://unicode.org/cldr/charts/34/summary/root.html

It is possible as a workaround but I still believe we are able to handle "eo"
without a country name.  Even more: we (the glibc project) are able to handle
it and as there are projects which do not (yet) handle it correctly I think we
should rather approach them and tell them how to fix it.  So far I don't think
we have found any project where the issue exists and cannot be fixed.

> It could also work for Yiddish, where "yi_US" is for the Yiddish population
> inside the US, and "yi_ZZ" could be used by non-US Yiddish populations who
> are spread across many other countries.  Though in the case of Yiddish
> specifically, it might probably make sense to add an Israel entry, but that
> will likely depend on a qualified volunteer doing the work.

Definitely no, Yiddish is not an artificial language and definitely is
associated with some territories where it is actually spoken.  It seems to me
that Israel could make sense and I don't mind adding it if needed; the USA
probably also makes sense.  I don't think that calling Yiddish "worldwide" or
"non-US" or "unknown" (in terms of territory) makes sense because we can say the
same about any random language.

And please, if possible let's focus on Esperanto here rather than discussing
possible changes in other languages.

[Bug localedata/23857] Esperanto has no country

cvs-commit at gcc dot gnu.org
In reply to this post by cvs-commit at gcc dot gnu.org
https://sourceware.org/bugzilla/show_bug.cgi?id=23857

--- Comment #8 from Carmen Bianca Bakker <carmenbianca at fedoraproject dot org> ---
(In reply to Rafal Luzynski from comment #7)

> Thank you.  I have not looked at the source code yet but my guess is that
> the list of territories comes from the list of locales with language part
> stripped.  This makes some sense to me: formats, units, etc. depend on the
> territory rather than language.  For example, English locale may have
> different units, currency, country name etc. for USA, UK, Australia, India,
> Ireland, and so on.  On the other hand, people living in one country
> probably use the same formats, units, and currency even if they speak
> different languages.  Therefore, if you want to select "Esperanto" as the
> locale for formats then... actually what would you expect?  Currency,
> country name, address format, car plate - "as used in (where?)"  Why
> "Netherlands" would not work better for you, for example?

The chief problem in selecting "Netherlands" is that LC_TIME won't have the
correct language.  I would much rather individually select each LC_* option,
but GNOME does not support that in its graphical interface.

> > https://bugs.python.org/issue35163 - Some weird obsolete configuration.
>
> My first suggestion is that Python should not map ambiguous locales into
> detailed ones but not supported by the current operating system.
>
> Would adding "eo.ISO8859-3" help to fix this issue?  I think the reason is
> that historically the locales without the encoding specified used 8-bit
> encoding like ISO 8859-1 or ISO 8859-3.  Therefore often the locales map to
> 8-bit encodings unless you specify "utf8" explicitly.  Later when Unicode
> became popular and widely used, newly added locales in glibc used UTF-8 as
> their only encoding.  This is the case of Esperanto: "eo" is an alias of
> "eo.UTF-8".  Somehow Python treats it as an alias of "eo_XX.ISO8859-3".
>
> On the other hand I am not sure if adding the old encodings makes sense
> nowadays.  Old encodings are preserved only in order not to break existing
> systems.  Does any existing Linux system use "eo.ISO8859-3" and rely on it?
> Is it likely to be true if this locale has never existed?

I don't think anything needs to be changed from glibc's end for this bug.  This
appears to be a Python-only oddity---I have never encountered eo.ISO8859-3
anywhere else.

> It is possible as a workaround but I still believe we are able to handle
> "eo" without a country name.  Even more: we (the glibc project) are able to
> handle it and as there are projects which do not (yet) handle it correctly I
> think we should rather approach them and tell them how to fix it.  So far I
> don't think we have found any project where the issue exists and cannot be
> fixed.

I don't disagree, but wouldn't changing this in glibc be a much easier solution
compared to the laborious process of opening bug reports everywhere to handle a
special case?

For instance, if we assume for a moment that "ia_FR" will become "ia", then a
lot of packages in a lot of distributions will need to change their for-loops
to `for eo ia *_*`.  This is cumulatively a lot of work for minority languages.
A simple "ia_ZZ/eo_ZZ" would remove the special case and save a lot of work.

> Definitely no, Yiddish is not an artificial language and definitely is
> related with some territories where it is actually spoken.  It seems to me
> that Israel could make sense and I don't mind adding it if needed, probably
> also USA makes sense.  I don't think that calling Yiddish "worldwide" or
> "non-US" or "unknown" (in terms of territory) makes sense because we can
> tell the same about any random language.

I didn't imply that Yiddish is an artificial language.  I implied that having a
catch-all "yi_ZZ" would save a lot of work over creating individual locales for
all the countries in the world where Yiddish is spoken in some capacity, which
is a lot of countries.  In that respect, Yiddish is an excellent comparison to
Esperanto, because both languages have a diaspora across the globe rather than
a defined nation state.

[Bug localedata/23857] Esperanto has no country

cvs-commit at gcc dot gnu.org
In reply to this post by cvs-commit at gcc dot gnu.org
https://sourceware.org/bugzilla/show_bug.cgi?id=23857

--- Comment #9 from Florian Weimer <fweimer at redhat dot com> ---
(In reply to Carmen Bianca Bakker from comment #8)
> The chief problem in selecting "Netherlands" is that LC_TIME won't have the
> correct language.  I would much rather individually select each LC_* option,
> but GNOME does not support that in its graphical interface.

The problem is that GNOME (and KDE) removed the previously existing
functionality for separate category selection, without considering its
implications.  I don't think we can work around the lack of such configuration
options in glibc because it would increase the number of locales to a
ridiculous amount.  This is less of an issue for systems like Debian which
generate locale data files on demand, but it's hard on those systems which use
a pre-computed locale archive, such as Fedora.

[Bug localedata/23857] Esperanto has no country

cvs-commit at gcc dot gnu.org
In reply to this post by cvs-commit at gcc dot gnu.org
https://sourceware.org/bugzilla/show_bug.cgi?id=23857

Pander <pander at users dot sourceforge.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pander at users dot sourceforge.net

--- Comment #10 from Pander <pander at users dot sourceforge.net> ---
As long as very local languages such as fy_DE, fy_NL, li_NL, nds_NL, nds_DE et
cetera are being supported here, even though these don't even have a spell
checker and lack translations for many major applications, I think that support
for Esperanto locales, and for English locales for more countries, is justified.

Mixing of locale categories simply doesn't work as it should.  Configuration
tools of operating systems don't support it, and doing it manually in
configuration files or start-up scripts is way too complex for most users.  But
it is even more subtle:

Looking at existing locales such as en_DK, en_SE, en_DE and en_NL, about 50% to
75% of such a locale can be accomplished by reusing existing definitions via
copy.  However, the remaining part consists of custom definitions for certain
categories that cannot be realized by copy alone.  The mixing of definitions
happens within the specific category.
