use of Yy+0/Nn-1/etc... in LC_MESSAGES yesexpr/noexpr


Mike Frysinger
https://sourceware.org/bugzilla/show_bug.cgi?id=15262
do we have policy/guidance on the use of english chars in the yes/no
regexes ?  of the 202 locales that define yesexpr/noexpr, 195 of them
include [Yy]/[Nn], most of which aren't english.

my take: at the risk of being called anglocentric, we should add
[Yy] & [Nn] to all locales

related, what about locales that are in territories that are frequently
bilingual ?  en_CA for example allows Yes/Oui/No/Non.  CLDR only lists
one option per language.  it doesn't (currently) define things on a
per-locale basis.  this is a semi-moot point depending on the Yy/Nn
question above.

my take: only list the main language (so en_CA would drop Oui).
if we can get CLDR to list more, it would be easy to support.

related, what about langs that have multiple scripts ?  this comes up
with all the locales that have @latin or @devanagari or @cyrillic.
for yesexpr, sr_RS uses [ДдDd] and sr_RS@latin uses [Dd].

my take: i can go either way: we could have every lang support all the
alternative scripts (so sr_RS@latin would add Дд), or we could try and
figure out which script is the "main" one and have it import all its
alternatives (so the sr_RS examples would stay the same).
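
The first option (every script variant accepts its siblings' characters) can be sketched roughly as below. This is only an illustration in Python, not glibc's locale tooling: the `YES_CHARS` table and the helper names are hypothetical, and only the sr_RS / sr_RS@latin values are taken from this message.

```python
import re

# yesexpr character sets as quoted in this thread.
YES_CHARS = {
    "sr_RS": "ДдDd",
    "sr_RS@latin": "Dd",
}

def base_locale(name):
    """Strip the @modifier, e.g. 'sr_RS@latin' -> 'sr_RS'."""
    return name.split("@", 1)[0]

def union_across_scripts(table):
    """Give every script variant the characters of all its siblings."""
    merged = {}
    for name, chars in table.items():
        bucket = merged.setdefault(base_locale(name), [])
        for ch in chars:
            if ch not in bucket:
                bucket.append(ch)
    return {name: "".join(merged[base_locale(name)]) for name in table}

def to_yesexpr(chars):
    """Build an anchored character-class regex from the accepted characters."""
    return "^[" + re.escape(chars) + "]"

merged = union_across_scripts(YES_CHARS)
print(to_yesexpr(merged["sr_RS@latin"]))
```

Under the second option the table would simply stay as it is: sr_RS keeps both scripts and sr_RS@latin keeps only the Latin characters.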

https://sourceware.org/bugzilla/show_bug.cgi?id=15263
what about [+1]/[-0] ?  this is what the i18n definition uses, and what
about 7 others do as well.  should we include those everywhere too ?

my take: we should add [+1]/[-0] to all locales
-mike
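
For context, programs usually consume these values by matching the start of the user's response against the regex (in C, via `nl_langinfo(YESEXPR)` plus `regcomp`, or via glibc's `rpmatch`). A minimal Python sketch of what adding [Yy]/[Nn] and [+1]/[-0] to a locale would mean is below. The native character sets "Jj"/"Nn" are illustrative only and are not copied from any glibc locale file; the helper names are hypothetical.

```python
import re

def extend(chars, extra):
    """Append the characters of `extra` that are not already present."""
    return chars + "".join(ch for ch in extra if ch not in chars)

def build_expr(chars):
    """Compile an anchored character class, escaping '+' and '-'."""
    return re.compile("^[" + re.escape(chars) + "]")

def classify(response, yesexpr, noexpr):
    """Return 1 for yes, 0 for no, -1 for neither (rpmatch's convention)."""
    if yesexpr.match(response):
        return 1
    if noexpr.match(response):
        return 0
    return -1

# Illustrative native answers (assumption, not real locale data).
native_yes, native_no = "Jj", "Nn"

yesexpr = build_expr(extend(extend(native_yes, "Yy"), "+1"))
noexpr = build_expr(extend(extend(native_no, "Nn"), "-0"))

for answer in ("ja", "yes", "1", "+", "0", "-", "maybe"):
    print(answer, classify(answer, yesexpr, noexpr))
```

Only the first character of the response decides the result, which is why a character that means "yes" in one language and "no" in another matters so much.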


Re: use of Yy+0/Nn-1/etc... in LC_MESSAGES yesexpr/noexpr

Rafal Luzynski
18.04.2016 04:40 Mike Frysinger wrote:
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=15262
> do we have policy/guidance on the use of english chars in the yes/no
> regexes ? of the 202 locales that define yesexpr/noexpr, 195 of them
> include [Yy]/[Nn], most of which aren't english.
>
> my take: at the risk of being called anglocentric, we should add
> [Yy] & [Nn] to all locales

Sounds reasonable to me. From my own experience and from my contacts
with other developers from neighboring countries it seems to me that
hardcore computer users (software engineers, admins, scientists, long
time users) are used to English and sometimes even prefer English over
their own native languages and sometimes may involuntarily press Y/N
instead of their native version even if the software is localized.
My point is that adding [Yy] and [Nn] can make some people happy and
will not hurt anybody.

Note: Make sure that [Yy] and [Nn] are not already used for the opposite
meaning (that [Yy] does not mean "no" or [Nn] does not mean "yes")
because in that case this change would be harmful.

> related, what about locales that are in territories that are frequently
> bilingual ? en_CA for example allows Yes/Oui/No/Non. CLDR only lists
> one option per language. it doesn't (currently) define things on a
> per-locale basis. this is a semi-moot point depending on the Yy/Nn
> question above.
>
> my take: only list the main language (so en_CA would drop Oui).
> if we can get CLDR to list more, it would be easy to support.
>
> related, what about langs that have multiple scripts ? this comes up
> with all the locales that have @latin or @devanagari or @cyrillic.
> for yesexpr, sr_RS uses [ДдDd] and sr_RS@latin uses [Dd].
>
> my take: i can go either way: we could have every lang support all the
> alternative scripts (so sr_RS@latin would add Дд), or we could try and
> figure out which script is the "main" one and have it import all its
> alternatives (so the sr_RS examples would stay the same).

I should not speak on behalf of Canadian or Serbian users but my humble
opinion is: use as many ways to express "yes" and "no" as possible
unless it causes conflicts. This would mean:

- no, please don't drop [Oo] (Oui) from en_CA unless preserving [Oo]
  causes some technical issues;
- yes, please use [ДдDdYy], if possible for both sr_RS and sr_RS@latin;
  I guess that some Serbian users may use different keyboard layouts and
  switch between them, it would be easier for them if the software could
  read their intention correctly even if they forget to switch their
  keyboard layout to Latin.

Regards,

Rafal

Re: use of Yy+0/Nn-1/etc... in LC_MESSAGES yesexpr/noexpr

Keld Simonsen-2
In reply to this post by Mike Frysinger
On Sun, Apr 17, 2016 at 10:40:58PM -0400, Mike Frysinger wrote:

> https://sourceware.org/bugzilla/show_bug.cgi?id=15262
> do we have policy/guidance on the use of english chars in the yes/no
> regexes ?  of the 202 locales that define yesexpr/noexpr, 195 of them
> include [Yy]/[Nn], most of which aren't english.
>
> my take: at the risk of being called anglocentric, we should add
> [Yy] & [Nn] to all locales
>
> related, what about locales that are in territories that are frequently
> bilingual ?  en_CA for example allows Yes/Oui/No/Non.  CLDR only lists
> one option per language.  it doesn't (currently) define things on a
> per-locale basis.  this is a semi-moot point depending on the Yy/Nn
> question above.
>
> my take: only list the main language (so en_CA would drop Oui).
> if we can get CLDR to list more, it would be easy to support.
>
> related, what about langs that have multiple scripts ?  this comes up
> with all the locales that have @latin or @devanagari or @cyrillic.
> for yesexpr, sr_RS uses [ДдDd] and sr_RS@latin uses [Dd].
>
> my take: i can go either way: we could have every lang support all the
> alternative scripts (so sr_RS@latin would add Дд), or we could try and
> figure out which script is the "main" one and have it import all its
> alternatives (so the sr_RS examples would stay the same).
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=15263
> what about [+1]/[-0] ?  this is what the i18n definition uses, and what
> about 7 others do as well.  should we include those everywhere too ?
>
> my take: we should add [+1]/[-0] to all locales
> -mike

My take: we should always add Yy/Nn as long as it is unambiguous.
I personally benefit from this, as I sometimes run in a Danish locale
and sometimes in an English locale.

Also, for bilingual countries you should allow both languages: in Canada,
both the English and French values, even for the en_CA locale.
The yes/no answers sit in the fingers, so it is a convenience to
users to allow these values, and it is also a cultural convention.

best regards
Keld

And also 1/0 as this is a banking standard for yes and no.

best regards
Keld

Re: use of Yy+0/Nn-1/etc... in LC_MESSAGES yesexpr/noexpr

Mike Frysinger
On 18 Apr 2016 13:21, Keld Simonsen wrote:

> On Sun, Apr 17, 2016 at 10:40:58PM -0400, Mike Frysinger wrote:
> > related, what about locales that are in territories that are frequently
> > bilingual ?  en_CA for example allows Yes/Oui/No/Non.  CLDR only lists
> > one option per language.  it doesn't (currently) define things on a
> > per-locale basis.  this is a semi-moot point depending on the Yy/Nn
> > question above.
> >
> > my take: only list the main language (so en_CA would drop Oui).
> > if we can get CLDR to list more, it would be easy to support.
>
> Also, for bilingual countries you should allow both languages: in Canada,
> both the English and French values, even for the en_CA locale.
> The yes/no answers sit in the fingers, so it is a convenience to
> users to allow these values, and it is also a cultural convention.
[focusing on this sub-thread since it seems to be most debatable]

the issue is that we don't have a way of determining this automatically.
what this request boils down to is for certain languages to have higher
visibility in some territories than others.  CA currently has 5 langs
defined for its territory in glibc: en fr ik iu shs.  arguably, there
should be even more as en+fr covers only ~75% of the country (mother
tongue wise).  the others are a fairly long tail.

so do we try to do a union of all the langs in a territory ?  this is a
bad idea imo as all will simply saturate to the full set -- imo forcing
a list of "approved" langs on a per-territory basis is kind of backwards
and there's no reason we wouldn't make this easier (e.g. adding pk_CA,
zh_CA, es_CA, de_CA, it_CA, etc...).

so do we maintain a list of "primary" langs in a territory and then add
those to all other langs in that same territory ?  how do we determine
the "primary" langs ?  based on what the gov't has marked as official
langs ?  that'll still cause havoc in IN & NE at least :).  so do we do
it based on speaking population and pick an arbitrary limit ?  if the
lang is spoken by >10%, then it'll get deployed to all langs in that
territory ?
-mike
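
The population-threshold rule floated above can be sketched as follows. All inputs are placeholders: the language tags, speaker shares, and yes characters are made up for illustration and are not census or locale data, and the function names are hypothetical.

```python
def primary_langs(speaker_share, threshold=0.10):
    """Languages whose share of speakers in the territory exceeds the threshold."""
    return sorted(lang for lang, share in speaker_share.items() if share > threshold)

def deploy(yes_chars, primaries):
    """Add the yes characters of every primary language to every language."""
    result = {}
    for lang, chars in yes_chars.items():
        merged = chars
        for primary in primaries:
            merged += "".join(ch for ch in yes_chars[primary] if ch not in merged)
        result[lang] = merged
    return result

# Placeholder data for one hypothetical territory (NOT real statistics).
SHARE = {"lang_a": 0.55, "lang_b": 0.20, "lang_c": 0.04, "lang_d": 0.01}
YES = {"lang_a": "Aa", "lang_b": "Bb", "lang_c": "Cc", "lang_d": "Dd"}

primaries = primary_langs(SHARE)
print(primaries)
print(deploy(YES, primaries))
```

The mechanics are trivial; the open question in this message is where the share data and the cut-off would come from, since the threshold is an arbitrary choice.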


Re: use of Yy+0/Nn-1/etc... in LC_MESSAGES yesexpr/noexpr

Mike Frysinger
In reply to this post by Rafal Luzynski
On 18 Apr 2016 11:10, Rafal Luzynski wrote:

> 18.04.2016 04:40 Mike Frysinger wrote:
> > https://sourceware.org/bugzilla/show_bug.cgi?id=15262
> > do we have policy/guidance on the use of english chars in the yes/no
> > regexes ? of the 202 locales that define yesexpr/noexpr, 195 of them
> > include [Yy]/[Nn], most of which aren't english.
> >
> > my take: at the risk of being called anglocentric, we should add
> > [Yy] & [Nn] to all locales
>
> Sounds reasonable to me. From my own experience and from my contacts
> with other developers from neighboring countries it seems to me that
> hardcore computer users (software engineers, admins, scientists, long
> time users) are used to English and sometimes even prefer English over
> their own native languages and sometimes may involuntarily press Y/N
> instead of their native version even if the software is localized.
> My point is that adding [Yy] and [Nn] can make some people happy and
> will not hurt anybody.
>
> Note: Make sure that [Yy] and [Nn] are not already used for the opposite
> meaning (that [Yy] does not mean "no" or [Nn] does not mean "yes")
> because in that case this change would be harmful.
a very good point.  there are 5 languages where this comes up:
 yo.xml:  <yesstr>Bẹẹni :N</yesstr>
 sw.xml:  <yesstr>Ndiyo:N</yesstr>
 guz.xml: <nostr>Yaya:Y</nostr>
 az.xml:  <nostr>yox:y</nostr>
 uz.xml:  <nostr>yo‘q:y</nostr>

at least yo_NG is broken on our side.
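
A check along these lines could flag such conflicts before [Yy]/[Nn] are added everywhere. This is a rough Python sketch, not the script actually used: it assumes the CLDR values are colon-separated alternatives (as in the five strings above) and that only the first letter of each alternative matters. The `en` row is an assumed control entry added for contrast.

```python
# (language, element, value) triples; the first five are quoted from this thread.
CLDR = [
    ("yo", "yesstr", "Bẹẹni :N"),
    ("sw", "yesstr", "Ndiyo:N"),
    ("guz", "nostr", "Yaya:Y"),
    ("az", "nostr", "yox:y"),
    ("uz", "nostr", "yo‘q:y"),
    ("en", "yesstr", "yes:y"),  # assumed control entry, no conflict expected
]

def first_letters(value):
    """Lower-cased first character of each colon-separated alternative."""
    return {part.strip()[0].lower() for part in value.split(":") if part.strip()}

def conflicts(element, value):
    """True if the native answer starts with the English letter of the opposite meaning."""
    letters = first_letters(value)
    if element == "yesstr":
        return "n" in letters  # a native "yes" that starts with N
    return "y" in letters      # a native "no" that starts with Y

flagged = [lang for lang, element, value in CLDR if conflicts(element, value)]
print(flagged)
```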

> - no, please don't drop [Oo] (Oui) from en_CA unless preserving [Oo]
>   causes some technical issues;

i split this out into the other reply by Keld

> - yes, please use [ДдDdYy], if possible for both sr_RS and sr_RS@latin;
>   I guess that some Serbian users may use different keyboard layouts and
>   switch between them, it would be easier for them if the software could
>   read their intention correctly even if they forget to switch their
>   keyboard layout to Latin.

script variants are much easier to handle as we have that data :).
-mike


Re: use of Yy+0/Nn-1/etc... in LC_MESSAGES yesexpr/noexpr

Rafal Luzynski
In reply to this post by Mike Frysinger
18.04.2016 19:44 Mike Frysinger wrote:

> On 18 Apr 2016 13:21, Keld Simonsen wrote:
> > [...]
> > Also, for bilingual countries you should allow both languages: in Canada,
> > both the English and French values, even for the en_CA locale.
> > The yes/no answers sit in the fingers, so it is a convenience to
> > users to allow these values, and it is also a cultural convention.
>
> [focusing on this sub-thread since it seems to be most debatable]
>
> the issue is that we don't have a way of determining this automatically.
> what this request boils down to is for certain languages to have higher
> visibility in some territories than others. CA currently has 5 langs
> defined for its territory in glibc: en fr ik iu shs. arguably, there
> should be even more as en+fr covers only ~75% of the country (mother
> tongue wise). the others are a fairly long tail.

I guess there are only a few such countries, so why not fix the issue
manually?  I assume this is a one-time task: you run some script, it
introduces some changes and you have a chance to review what has been
changed and reject some changes before making them public.  I can see
you are making many changes in locale data, I guess you will soon reach
the point where glibc will be up-to-date with CLDR and will not need
many updates.

> so do we try to do a union of all the langs in a territory ? this is a
> bad idea imo as all will simply saturate to the full set -- imo forcing
> a list of "approved" langs on a per-territory basis is kind of backwards
> and there's no reason we wouldn't make this easier (e.g. adding pk_CA,
> zh_CA, es_CA, de_CA, it_CA, etc...).

My suggestion is not to remove what has already been added to glibc locales
and add new language/territory combos (pk_CA, zh_CA and so on) on the users'
demand.

Regards,

Rafal

Re: use of Yy+0/Nn-1/etc... in LC_MESSAGES yesexpr/noexpr

Mike Frysinger
On 18 Apr 2016 23:21, Rafal Luzynski wrote:

> 18.04.2016 19:44 Mike Frysinger wrote:
> > On 18 Apr 2016 13:21, Keld Simonsen wrote:
> > > [...]
> > > Also, for bilingual countries you should allow both languages: in Canada,
> > > both the English and French values, even for the en_CA locale.
> > > The yes/no answers sit in the fingers, so it is a convenience to
> > > users to allow these values, and it is also a cultural convention.
> >
> > [focusing on this sub-thread since it seems to be most debatable]
> >
> > the issue is that we don't have a way of determining this automatically.
> > what this request boils down to is for certain languages to have higher
> > visibility in some territories than others. CA currently has 5 langs
> > defined for its territory in glibc: en fr ik iu shs. arguably, there
> > should be even more as en+fr covers only ~75% of the country (mother
> > tongue wise). the others are a fairly long tail.
>
> I guess there are only a few such countries, so why not fix the issue
> manually?  I assume this is a one-time task: you run some script, it
> introduces some changes and you have a chance to review what has been
> changed and reject some changes before making them public.  I can see
> you are making many changes in locale data, I guess you will soon reach
> the point where glibc will be up-to-date with CLDR and will not need
> many updates.
CLDR is not a fixed entity, nor does the data/languages it represents stay
fixed.  customs/baselines change which means the data changes.  we cannot
assume that because we set some value to X at ver V it shall forever stay
that way.  hence i'd like to get this automated in some way.
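
One way such automation might look is to regenerate the expression from the CLDR string on every import and compare it with what is currently shipped. The Python sketch below assumes colon-separated CLDR values and a first-letter rule; the function names and the `always` parameter are hypothetical, and this is not the actual conversion script.

```python
import re

def chars_from_cldr(value):
    """Upper- and lower-case first letter of each colon-separated alternative, in order."""
    out = []
    for part in value.split(":"):
        part = part.strip()
        if not part:
            continue
        for ch in (part[0].upper(), part[0].lower()):
            if ch not in out:
                out.append(ch)
    return "".join(out)

def expr_from_cldr(value, always=""):
    """Build an anchored character class, appending characters wanted in every locale."""
    chars = chars_from_cldr(value)
    chars += "".join(ch for ch in always if ch not in chars)
    return "^[" + re.escape(chars) + "]"

def is_stale(shipped_expr, cldr_value, always=""):
    """True if the shipped expression no longer matches what CLDR would generate."""
    return shipped_expr != expr_from_cldr(cldr_value, always)

print(expr_from_cldr("Ndiyo:N", always="+1"))
```

Running this against each new CLDR release would surface values that changed upstream instead of relying on a one-time manual pass.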

> > so do we try to do a union of all the langs in a territory ? this is a
> > bad idea imo as all will simply saturate to the full set -- imo forcing
> > a list of "approved" langs on a per-territory basis is kind of backwards
> > and there's no reason we wouldn't make this easier (e.g. adding pk_CA,
> > zh_CA, es_CA, de_CA, it_CA, etc...).
>
> My suggestion is not to remove what has already been added to glibc locales
> and add new language/territory combos (pk_CA, zh_CA and so on) on the users'
> demand.

users make requests all the time, and we *always* must evaluate whether
they are correct/appropriate.  it's not that we're worried about malice
from users, but sometimes they are simply mistaken, or they're making a
request that is not what we would consider the majority/normal.  this is
why i'm pushing for any deviation (which this is) to have some method to
the madness.  by just using the CLDR, we have a simple position: if you
think your change request is correct, then follow/convince CLDR.  their
processes/direction seem to line up with what we want as well.
-mike
