[PATCH] Fix decimal_point and thousands_sep in es_MX locale

Aurelien Jarno
The following patch fixes the value of decimal_point and thousands_sep
for the es_MX locale, which have been broken by commit 4b19cd7a.

Rationale:
- for decimal_point it's basically reverting to the previous version
  before commit 4b19cd7a.
- the change is consistent with mon_decimal_point and mon_thousands_sep
- http://en.wikipedia.org/wiki/Decimal_mark
- For those speaking Spanish, the official norm is available:
  http://www.ine.gob.mx/publicaciones/download/008scfi.pdf (page 57)
  amended by:
  http://www.dof.gob.mx/documentos/3837/seeco/seeco.htm
 
  These links show that it was officially a comma instead of a dot
  starting from 2002, but given that nobody used it, it has been
  switched to comma *or* dot. The dot is the one used in practice.


diff --git a/localedata/ChangeLog b/localedata/ChangeLog
index 975e59f..a1b5971 100644
--- a/localedata/ChangeLog
+++ b/localedata/ChangeLog
@@ -1,3 +1,8 @@
+2012-06-02  Aurelien Jarno  <[hidden email]>
+
+ * locales/es_MX (LC_NUMERIC): Correctly set decimal_point,
+ thousands_sep and grouping.
+
 2012-04-20  Chandan Kumar  <[hidden email]>
 
  [BZ#13968]
diff --git a/localedata/locales/es_MX b/localedata/locales/es_MX
index 7a1cccc..21715b1 100644
--- a/localedata/locales/es_MX
+++ b/localedata/locales/es_MX
@@ -78,7 +78,9 @@ n_sign_posn          1
 END LC_MONETARY
 
 LC_NUMERIC
-copy "es_ES"
+decimal_point        "<U002E>"
+thousands_sep        "<U002C>"
+grouping             3;3
 END LC_NUMERIC
 
 LC_TIME
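
To make the effect of the three LC_NUMERIC keywords concrete, here is a minimal Python sketch of how a formatter might apply them. The helper `format_grouped` is hypothetical and not part of glibc; it assumes non-negative input and that the last entry of `grouping` is reused for all remaining digits.

```python
def format_grouped(value, decimal_point=".", thousands_sep=",",
                   grouping=(3, 3), places=2):
    """Format a non-negative number with LC_NUMERIC-style settings.

    Hypothetical helper for illustration only; not a glibc interface.
    """
    integer, _, fraction = f"{value:.{places}f}".partition(".")
    sizes = list(grouping)
    size = sizes[0]
    groups = []
    # Peel digit groups off the right-hand side; once the grouping
    # list is exhausted, the last group size keeps being reused.
    while integer:
        if sizes:
            size = sizes.pop(0)
        groups.insert(0, integer[-size:])
        integer = integer[:-size]
    result = thousands_sep.join(groups)
    if fraction:
        result += decimal_point + fraction
    return result


# Values proposed in the patch: full stop and comma.
print(format_grouped(1234567.891))
# Swapped separators, shown purely for contrast.
print(format_grouped(1234567.891, decimal_point=",", thousands_sep="."))
```

With the patch values the first call yields `1,234,567.89`; swapping the two separators yields `1.234.567,89`.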

--
Aurelien Jarno                          GPG: 1024D/F1BCDB73
[hidden email]                 http://www.aurel32.net

Re: [PATCH] Fix decimal_point and thousands_sep in es_MX locale

Carlos O'Donell-4
On 6/2/2012 3:53 PM, Aurelien Jarno wrote:
> The following patch fixes the value of decimal_point and thousands_sep
> for the es_MX locale, which have been broken by commit 4b19cd7a.

In the future please provide a more descriptive problem definition.

For example stating the current character used for the decimal point
and that of the thousands separator, and then stating what you
change them to (along with a verbal definition e.g. "full stop"
for <U002E>) would help the reviewer.

I know that writing clear and concise problem descriptions is time
consuming, but so is review :-)

> Rationale:
> - for decimal_point it's basically reverting to the previous version
>   before commit 4b19cd7a.

OK.

> - the change is consistent with mon_decimal_point and mon_thousands_sep
> - http://en.wikipedia.org/wiki/Decimal_mark
> - For those speaking Spanish, the official norm is available:
>   http://www.ine.gob.mx/publicaciones/download/008scfi.pdf (page 57)

Spanish is my first language and I read the document.

Call this [1]

>   amended by:
>   http://www.dof.gob.mx/documentos/3837/seeco/seeco.htm

I also read this document.

Call this [2]
 
>   These links show that it was officially a comma instead of a dot
>   starting from 2002, but given that nobody used it, it has been
>   switched to comma *or* dot. The dot is the one used in practice.
 
Correct, [2] amends [1] and allows comma or full-stop to be used.
 

> diff --git a/localedata/ChangeLog b/localedata/ChangeLog
> index 975e59f..a1b5971 100644
> --- a/localedata/ChangeLog
> +++ b/localedata/ChangeLog
> @@ -1,3 +1,8 @@
> +2012-06-02  Aurelien Jarno  <[hidden email]>
> +
> + * locales/es_MX (LC_NUMERIC): Correctly set decimal_point,
> + thousands_sep and grouping.
> +
>  2012-04-20  Chandan Kumar  <[hidden email]>
>  
>   [BZ#13968]
> diff --git a/localedata/locales/es_MX b/localedata/locales/es_MX
> index 7a1cccc..21715b1 100644
> --- a/localedata/locales/es_MX
> +++ b/localedata/locales/es_MX
> @@ -78,7 +78,9 @@ n_sign_posn          1
>  END LC_MONETARY
>  
>  LC_NUMERIC
> -copy "es_ES"
> +decimal_point        "<U002E>"

This is OK.

> +thousands_sep        "<U002C>"

This is not correct (and it was previously not correct either).

According to [2] this must be a "small space" (pequeño espacio), and must never be a comma, point, or other symbol.

There is a `thin space' <U+2009> which probably serves the best purpose here.

What do other OSs do?

> +grouping             3;3

This is correct according to [2], but is the same as es_ES and so doesn't change.

>  END LC_NUMERIC
>  
>  LC_TIME
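
For readers unfamiliar with the `<Uxxxx>` notation, the following sketch decodes the code points discussed in this review and prints their Unicode names. The function names are made up for illustration; only Python's standard `re` and `unicodedata` modules are used.

```python
import re
import unicodedata


def decode_symbolic(text):
    """Replace each <Uxxxx> token with the character it names."""
    return re.sub(r"<U([0-9A-Fa-f]{4,8})>",
                  lambda m: chr(int(m.group(1), 16)), text)


def describe(token):
    """Return (character, Unicode name) for a single <Uxxxx> token."""
    char = decode_symbolic(token)
    return char, unicodedata.name(char)


for token in ("<U002E>", "<U002C>", "<U0020>", "<U2009>"):
    print(token, describe(token)[1])
```

The four names printed are FULL STOP, COMMA, SPACE and THIN SPACE, which are the candidates for decimal_point and thousands_sep in this thread.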

Cheers,
Carlos.
--
Carlos O'Donell
Mentor Graphics / CodeSourcery
[hidden email]
[hidden email]
+1 (613) 963 1026

Re: [PATCH] Fix decimal_point and thousands_sep in es_MX locale

Aurelien Jarno
On Sun, Jun 03, 2012 at 08:37:06PM -0400, Carlos O'Donell wrote:

> On 6/2/2012 3:53 PM, Aurelien Jarno wrote:
> > The following patch fixes the value of decimal_point and thousands_sep
> > for the es_MX locale, which have been broken by commit 4b19cd7a.
>
> In the future please provide a more descriptive problem definition.
>
> For example stating the current character used for the decimal point
> and that of the thousands separator, and then stating what you
> change them to (along with a verbal definition e.g. "full stop"
> for <U002E>) would help the reviewer.
>
> I know that writing clear and concise problem descriptions is time
> consuming, but so is review :-)

Sorry about that, after looking for documents it was clear in my mind,
and I didn't realize my mail wasn't. Thanks for the review.

> > Rationale:
> > - for decimal_point it's basically reverting to the previous version
> >   before commit 4b19cd7a.
>
> OK.
>
> > - the change is consistent with mon_decimal_point and mon_thousands_sep
> > - http://en.wikipedia.org/wiki/Decimal_mark
> > - For those speaking Spanish, the official norm is available:
> >   http://www.ine.gob.mx/publicaciones/download/008scfi.pdf (page 57)
>
> Spanish is my first language and I read the document.
>
> Call this [1]
>
> >   amended by:
> >   http://www.dof.gob.mx/documentos/3837/seeco/seeco.htm
>
> I also read this document.
>
> Call this [2]
>  
> >   These links show that it was officially a comma instead of a dot
> >   starting from 2002, but given that nobody used it, it has been
> >   switched to comma *or* dot. The dot is the one used in practice.
>  
> Correct, [2] amends [1] and allows comma or full-stop to be used.
>  
> > diff --git a/localedata/ChangeLog b/localedata/ChangeLog
> > index 975e59f..a1b5971 100644
> > --- a/localedata/ChangeLog
> > +++ b/localedata/ChangeLog
> > @@ -1,3 +1,8 @@
> > +2012-06-02  Aurelien Jarno  <[hidden email]>
> > +
> > + * locales/es_MX (LC_NUMERIC): Correctly set decimal_point,
> > + thousands_sep and grouping.
> > +
> >  2012-04-20  Chandan Kumar  <[hidden email]>
> >  
> >   [BZ#13968]
> > diff --git a/localedata/locales/es_MX b/localedata/locales/es_MX
> > index 7a1cccc..21715b1 100644
> > --- a/localedata/locales/es_MX
> > +++ b/localedata/locales/es_MX
> > @@ -78,7 +78,9 @@ n_sign_posn          1
> >  END LC_MONETARY
> >  
> >  LC_NUMERIC
> > -copy "es_ES"
> > +decimal_point        "<U002E>"
>
> This is OK.
>
> > +thousands_sep        "<U002C>"
>
> This is not correct (and it was previously not correct either).
>
> According to [2] this must be a "small space" (pequeño espacio), and must never be a comma, point, or other symbol.
>
> There is a `thin space' <U+2009> which probably serves the best purpose here.

Yes, it should be a thin space; it's actually what the SI/ISO 31-0
standards specify. That said, a lot of other languages are following
this standard and are supposed to use a thin space there (for example
fr_FR), but are using a normal space (<U0020>) there.

I realized my intention was to put <U0020> there instead. Is it
something acceptable?

> What do other OSs do?
>

Windows XP is using a comma (<U002C>) there.

Aurelien

--
Aurelien Jarno                        GPG: 1024D/F1BCDB73
[hidden email]                 http://www.aurel32.net

Re: [PATCH] Fix decimal_point and thousands_sep in es_MX locale

Carlos O'Donell-4
On 6/6/2012 8:32 AM, Aurelien Jarno wrote:

> On Sun, Jun 03, 2012 at 08:37:06PM -0400, Carlos O'Donell wrote:
>> On 6/2/2012 3:53 PM, Aurelien Jarno wrote:
>>> The following patch fixes the value of decimal_point and thousands_sep
>>> for the es_MX locale, which have been broken by commit 4b19cd7a.
>>
>> In the future please provide a more descriptive problem definition.
>>
>> For example stating the current character used for the decimal point
>> and that of the thousands separator, and then stating what you
>> change them to (along with a verbal definition e.g. "full stop"
>> for <U002E>) would help the reviewer.
>>
>> I know that writing clear and concise problem descriptions is time
>> consuming, but so is review :-)
>
> Sorry about that, after looking for documents it was clear in my mind,
> and I didn't realize my mail wasn't. Thanks for the review.

No worries. I hope I didn't sound too harsh. I really appreciate your efforts!

>>> Rationale:
>>> - for decimal_point it's basically reverting to the previous version
>>>   before commit 4b19cd7a.
>>
>> OK.
>>
>>> - the change is consistent with mon_decimal_point and mon_thousands_sep
>>> - http://en.wikipedia.org/wiki/Decimal_mark
>>> - For those speaking Spanish, the official norm is available:
>>>   http://www.ine.gob.mx/publicaciones/download/008scfi.pdf (page 57)
>>
>> Spanish is my first language and I read the document.
>>
>> Call this [1]
>>
>>>   amended by:
>>>   http://www.dof.gob.mx/documentos/3837/seeco/seeco.htm
>>
>> I also read this document.
>>
>> Call this [2]
>>  
>>>   These links show that it was officially a comma instead of a dot
>>>   starting from 2002, but given that nobody used it, it has been
>>>   switched to comma *or* dot. The dot is the one used in practice.
>>  
>> Correct, [2] amends [1] and allows comma or full-stop to be used.
>>  
>>> diff --git a/localedata/ChangeLog b/localedata/ChangeLog
>>> index 975e59f..a1b5971 100644
>>> --- a/localedata/ChangeLog
>>> +++ b/localedata/ChangeLog
>>> @@ -1,3 +1,8 @@
>>> +2012-06-02  Aurelien Jarno  <[hidden email]>
>>> +
>>> + * locales/es_MX (LC_NUMERIC): Correctly set decimal_point,
>>> + thousands_sep and grouping.
>>> +
>>>  2012-04-20  Chandan Kumar  <[hidden email]>
>>>  
>>>   [BZ#13968]
>>> diff --git a/localedata/locales/es_MX b/localedata/locales/es_MX
>>> index 7a1cccc..21715b1 100644
>>> --- a/localedata/locales/es_MX
>>> +++ b/localedata/locales/es_MX
>>> @@ -78,7 +78,9 @@ n_sign_posn          1
>>>  END LC_MONETARY
>>>  
>>>  LC_NUMERIC
>>> -copy "es_ES"
>>> +decimal_point        "<U002E>"
>>
>> This is OK.
>>
>>> +thousands_sep        "<U002C>"
>>
>> This is not correct (and it was previously not correct either).
>>
>> According to [2] this must be a "small space" (pequeño espacio), and must never be a comma, point, or other symbol.
>>
>> There is a `thin space' <U+2009> which probably serves the best purpose here.
>
> Yes, it should be a thin space; it's actually what the SI/ISO 31-0
> standards specify. That said, a lot of other languages are following
> this standard and are supposed to use a thin space there (for example
> fr_FR), but are using a normal space (<U0020>) there.
>
> I realized my intention was to put <U0020> there instead. Is it
> something acceptable?

Sorry, I'm a bit confused.

Why can't we use thin space?
 
>> What do other OSs do?
>>
>
> Windows XP is using a comma (<U002C>) there.

Which is wrong according to the standard.

Cheers,
Carlos.
--
Carlos O'Donell
Mentor Graphics / CodeSourcery
[hidden email]
[hidden email]
+1 (613) 963 1026

Re: [PATCH] Fix decimal_point and thousands_sep in es_MX locale

Aurelien Jarno
On Wed, Jun 06, 2012 at 09:49:24AM -0400, Carlos O'Donell wrote:

> On 6/6/2012 8:32 AM, Aurelien Jarno wrote:
> > On Sun, Jun 03, 2012 at 08:37:06PM -0400, Carlos O'Donell wrote:
> >> On 6/2/2012 3:53 PM, Aurelien Jarno wrote:
> >>> The following patch fixes the value of decimal_point and thousands_sep
> >>> for the es_MX locale, which have been broken by commit 4b19cd7a.
> >>
> >> In the future please provide a more descriptive problem definition.
> >>
> >> For example stating the current character used for the decimal point
> >> and that of the thousands separator, and then stating what you
> >> change them to (along with a verbal definition e.g. "full stop"
> >> for <U002E>) would help the reviewer.
> >>
> >> I know that writing clear and concise problem descriptions is time
> >> consuming, but so is review :-)
> >
> > Sorry about that, after looking for documents it was clear in my mind,
> > and I didn't realize my mail wasn't. Thanks for the review.
>
> No worries. I hope I didn't sound too harsh. I really appreciate your efforts!
>
> >>> Rationale:
> >>> - for decimal_point it's basically reverting to the previous version
> >>>   before commit 4b19cd7a.
> >>
> >> OK.
> >>
> >>> - the change is consistent with mon_decimal_point and mon_thousands_sep
> >>> - http://en.wikipedia.org/wiki/Decimal_mark
> >>> - For those speaking Spanish, the official norm is available:
> >>>   http://www.ine.gob.mx/publicaciones/download/008scfi.pdf (page 57)
> >>
> >> Spanish is my first language and I read the document.
> >>
> >> Call this [1]
> >>
> >>>   amended by:
> >>>   http://www.dof.gob.mx/documentos/3837/seeco/seeco.htm
> >>
> >> I also read this document.
> >>
> >> Call this [2]
> >>  
> >>>   These links show that it was officially a comma instead of a dot
> >>>   starting from 2002, but given that nobody used it, it has been
> >>>   switched to comma *or* dot. The dot is the one used in practice.
> >>  
> >> Correct, [2] amends [1] and allows comma or full-stop to be used.
> >>  
> >>> diff --git a/localedata/ChangeLog b/localedata/ChangeLog
> >>> index 975e59f..a1b5971 100644
> >>> --- a/localedata/ChangeLog
> >>> +++ b/localedata/ChangeLog
> >>> @@ -1,3 +1,8 @@
> >>> +2012-06-02  Aurelien Jarno  <[hidden email]>
> >>> +
> >>> + * locales/es_MX (LC_NUMERIC): Correctly set decimal_point,
> >>> + thousands_sep and grouping.
> >>> +
> >>>  2012-04-20  Chandan Kumar  <[hidden email]>
> >>>  
> >>>   [BZ#13968]
> >>> diff --git a/localedata/locales/es_MX b/localedata/locales/es_MX
> >>> index 7a1cccc..21715b1 100644
> >>> --- a/localedata/locales/es_MX
> >>> +++ b/localedata/locales/es_MX
> >>> @@ -78,7 +78,9 @@ n_sign_posn          1
> >>>  END LC_MONETARY
> >>>  
> >>>  LC_NUMERIC
> >>> -copy "es_ES"
> >>> +decimal_point        "<U002E>"
> >>
> >> This is OK.
> >>
> >>> +thousands_sep        "<U002C>"
> >>
> >> This is not correct (and it was previously not correct either).
> >>
> >> According to [2] this must be a "small space" (pequeño espacio), and must never be a comma, point, or other symbol.
> >>
> >> There is a `thin space' <U+2009> which probably serves the best purpose here.
> >
> > Yes, it should be a thin space; it's actually what the SI/ISO 31-0
> > standards specify. That said, a lot of other languages are following
> > this standard and are supposed to use a thin space there (for example
> > fr_FR), but are using a normal space (<U0020>) there.
> >
> > I realized my intention was to put <U0020> there instead. Is it
> > something acceptable?
>
> Sorry, I'm a bit confused.
>
> Why can't we use thin space?

Probably we can. I am just confused that all the locales that are
supposed to do that are not doing it, so there might be a reason behind
that.

> >> What do other OSs do?
> >>
> >
> > Windows XP is using a comma (<U002C>) there.
>
> Which is wrong according to the standard.
>

Agreed. OTOH Windows XP is very old.

Cheers,
Aurelien

--
Aurelien Jarno                        GPG: 1024D/F1BCDB73
[hidden email]                 http://www.aurel32.net

Re: [PATCH] Fix decimal_point and thousands_sep in es_MX locale

Carlos O'Donell-4
On 6/6/2012 12:25 PM, Aurelien Jarno wrote:

>>> Yes, it should be a thin space; it's actually what the SI/ISO 31-0
>>> standards specify. That said, a lot of other languages are following
>>> this standard and are supposed to use a thin space there (for example
>>> fr_FR), but are using a normal space (<U0020>) there.
>>>
>>> I realized my intention was to put <U0020> there instead. Is it
>>> something acceptable?
>>
>> Sorry, I'm a bit confused.
>>
>> Why can't we use thin space?
>
> Probably we can. I am just confused that all the locales that are
> supposed to do that are not doing it, so there might be a reason behind
> that.

Aurelien,

I don't know. Could you give it a try and see what happens? :-)

Petr,

Several standards say that the thousands separator should be a
thin space, but the glibc locales have been using an
ASCII space <U0020> instead of the thin space <U2009>.

Do you know of any reason for this?

Cheers,
Carlos.
--
Carlos O'Donell
Mentor Graphics / CodeSourcery
[hidden email]
[hidden email]
+1 (613) 963 1026

Re: [PATCH] Fix decimal_point and thousands_sep in es_MX locale

Petr Baudis
On Wed, Jun 06, 2012 at 12:48:28PM -0400, Carlos O'Donell wrote:
> Petr,
>
> Several standards say that the thousands separator should be a
> thin space, but the glibc locales have been using an
> ASCII space <U0020> instead of the thin space <U2009>.
>
> Do you know of any reason for this?

My *guess* is the reason is mostly historical. It seems that before
2000, locale files did not use Unicode codepoints but markup like <SP>,
which was then mechanically rewritten to <U0020> and no one bothered to
change it further; new locale authors likely did not think much of it
and/or copied data from existing locales using <U0020>.

I would certainly approve of updating this to thin space where
appropriate. However, I think that

  (i) Some programs might get confused by this; the person doing the
change should check at least behavior of common office programs after
this change (not sure if there is any other common software using this
data?), and possibly discuss this with their developers if there are
problems.

  (ii) We should do the change en masse at least in most locales that
are appropriate to change, so that any possible bugs in handling of
Unicode characters in these fields are quickly noticed and fixed.
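
As a rough aid for point (ii), a script along the following lines could list which locale sources declare an ASCII space as thousands_sep. This is only a sketch under simplifying assumptions: it looks at a locale source passed in as a string, ignores comments and `copy` directives, and the function names are invented for illustration.

```python
import re


def numeric_thousands_sep(locale_source):
    """Return the thousands_sep value declared inside LC_NUMERIC, or None.

    Simplified parsing: a locale that uses `copy "..."` in LC_NUMERIC
    has no explicit value here, so None is returned for it.
    """
    section = re.search(r"^LC_NUMERIC\n(.*?)^END LC_NUMERIC",
                        locale_source, re.S | re.M)
    if section is None:
        return None
    match = re.search(r'^thousands_sep\s+"([^"]*)"', section.group(1), re.M)
    return match.group(1) if match else None


def uses_ascii_space(locale_source):
    """True if LC_NUMERIC explicitly sets thousands_sep to <U0020>."""
    return numeric_thousands_sep(locale_source) == "<U0020>"


# Made-up sample input, not copied from any real locale file.
SAMPLE = '''LC_NUMERIC
decimal_point        "<U002C>"
thousands_sep        "<U0020>"
grouping             3;3
END LC_NUMERIC
'''

print(uses_ascii_space(SAMPLE))
```

Locales that inherit LC_NUMERIC through `copy` would need to be resolved separately, since the value they end up with lives in another file.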

Hope that this does not sound as too much work; I could help with
some of it, but do not have time to do the whole change by myself.

--
                                Petr "Pasky" Baudis
        Smart data structures and dumb code works a lot better
        than the other way around.  -- Eric S. Raymond

Re: [PATCH] Fix decimal_point and thousands_sep in es_MX locale

Carlos O'Donell-4
On 6/6/2012 1:02 PM, Petr Baudis wrote:

> On Wed, Jun 06, 2012 at 12:48:28PM -0400, Carlos O'Donell wrote:
>> Petr,
>>
>> Several standards say that the thousands separator should be a
>> thin space, but the glibc locales have been using an
>> ASCII space <U0020> instead of the thin space <U2009>.
>>
>> Do you know of any reason for this?
>
> My *guess* is the reason is mostly historical. It seems that before
> 2000, locale files did not use Unicode codepoints but markup like <SP>,
> which was then mechanically rewritten to <U0020> and no one bothered to
> change it further; new locale authors likely did not think much of it
> and/or copied data from existing locales using <U0020>.
>
> I would certainly approve of updating this to thin space where
> appropriate. However, I think that
>
>   (i) Some programs might get confused by this; the person doing the
> change should check at least behavior of common office programs after
> this change (not sure if there is any other common software using this
> data?), and possibly discuss this with their developers if there are
> problems.
>
>   (ii) We should do the change en masse at least in most locales that
> are appropriate to change, so that any possible bugs in handling of
> Unicode characters in these fields are quickly noticed and fixed.
>
> Hope that this does not sound as too much work; I could help with
> some of it, but do not have time to do the whole change by myself.
>

This sounds like a great plan. We need leaders to step up and make
bold suggestions! :-)

For avoidance of doubt I would like to see:

(a) A new patch that uses <U0020> to fix es_MX. I'll review this again.

and

(b) A new BZ filed to fix all the locales using <U0020> to use
    thin space <U2009>. Set the target milestone to 2.17 please.

Aurelien, would you mind doing that?

Cheers,
Carlos.
--
Carlos O'Donell
Mentor Graphics / CodeSourcery
[hidden email]
[hidden email]
+1 (613) 963 1026

Re: [PATCH] Fix decimal_point and thousands_sep in es_MX locale

Keld Simonsen-2
On Wed, Jun 06, 2012 at 01:14:25PM -0400, Carlos O'Donell wrote:

> On 6/6/2012 1:02 PM, Petr Baudis wrote:
> > On Wed, Jun 06, 2012 at 12:48:28PM -0400, Carlos O'Donell wrote:
> >> Petr,
> >>
> >> Several standards say that the thousands separator should be a
> >> thin space, but the glibc locales have been using an
> >> ASCII space <U0020> instead of the thin space <U2009>.
> >>
> >> Do you know of any reason for this?
> >
> > My *guess* is the reason is mostly historical. It seems that before
> > 2000, locale files did not use Unicode codepoints but markup like <SP>,
> > which was then mechanically rewritten to <U0020> and no one bothered to
> > change it further; new locale authors likely did not think much of it
> > and/or copied data from existing locales using <U0020>.
> >
> > I would certainly approve of updating this to thin space where
> > appropriate. However, I think that
> >
> >   (i) Some programs might get confused by this; the person doing the
> > change should check at least behavior of common office programs after
> > this change (not sure if there is any other common software using this
> > data?), and possibly discuss this with their developers if there are
> > problems.
> >
> >   (ii) We should do the change en masse at least in most locales that
> > are appropriate to change, so that any possible bugs in handling of
> > Unicode characters in these fields are quickly noticed and fixed.
> >
> > Hope that this does not sound as too much work; I could help with
> > some of it, but do not have time to do the whole change by myself.
> >
>
> This sounds like a great plan. We need leaders to step up and make
> bold suggestions! :-)
>
> For avoidance of doubt I would like to see:
>
> (a) A new patch that uses <U0020> to fix es_MX. I'll review this again.
>
> and
>
> (b) A new BZ filed to fix all the locales using <U0020> to use
>     thin space <U2009>. Set the target milestone to 2.17 please.

Well, I think this is not the best way forward.
Only some countries prescribe a thin space as the thousands separator.
This should be documented in each case.

Also, the use of Unicode-only characters breaks the universality of locales,
as they cannot be used with 8-bit character sets.
It may also break programs that try to parse numbers.

best regards
Keld

Re: [PATCH] Fix decimal_point and thousands_sep in es_MX locale

Petr Baudis
  Hi!

On Wed, Jun 06, 2012 at 11:53:11PM +0200, Keld Simonsen wrote:
> On Wed, Jun 06, 2012 at 01:14:25PM -0400, Carlos O'Donell wrote:
> > (b) A new BZ filed to fix all the locales using <U0020> to use
> >     thin space <U2009>. Set the target milestone to 2.17 please.
..snip..
> Also, the use of Unicode-only characters breaks the universality of locales,
> as they cannot be used with 8-bit character sets.
> It may also break programs that try to parse numbers.

  Can you please elaborate on this point? Surely, if a locale is using
UTF-8 charset, it should be permitted to include UTF-8 characters? Is
there a point in making a difference between e.g. LC_TIME and LC_CTYPE?

  (In case a locale is generated with more restrictive charset, e.g.
ISO-8859-1, the thin space is automatically transliterated to normal
space.)
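
The idea of falling back to a normal space in a restricted charset can be illustrated outside glibc. The sketch below is not how localedef works internally; it is a stand-alone Python approximation with a one-entry fallback table, and the names `FALLBACKS` and `encode_with_fallback` are invented for the example.

```python
# Illustrative fallback table; only the thin-space entry comes from
# the discussion, and a real transliteration table is far larger.
FALLBACKS = {"\u2009": " "}


def encode_with_fallback(text, charset):
    """Encode text, substituting a fallback for characters the
    target charset cannot represent.

    Raises KeyError if an unencodable character has no fallback.
    """
    out = bytearray()
    for char in text:
        try:
            out += char.encode(charset)
        except UnicodeEncodeError:
            out += FALLBACKS[char].encode(charset)
    return bytes(out)


number = "1\u2009234\u2009567"
print(encode_with_fallback(number, "utf-8"))       # thin space kept
print(encode_with_fallback(number, "iso-8859-1"))  # replaced by U+0020
```

In the UTF-8 case the thin space survives as the three-byte sequence E2 80 89; in the ISO-8859-1 case each separator becomes a single 0x20 byte.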

--
                                Petr "Pasky" Baudis
        Smart data structures and dumb code works a lot better
        than the other way around.  -- Eric S. Raymond

Re: [PATCH] Fix decimal_point and thousands_sep in es_MX locale

Aurelien Jarno
In reply to this post by Carlos O'Donell-4
On Wed, Jun 06, 2012 at 01:14:25PM -0400, Carlos O'Donell wrote:

> On 6/6/2012 1:02 PM, Petr Baudis wrote:
> > On Wed, Jun 06, 2012 at 12:48:28PM -0400, Carlos O'Donell wrote:
> >> Petr,
> >>
> >> Several standards say that the thousands separator should be a
> >> thin space, but the glibc locales have been using an
> >> ASCII space <U0020> instead of the thin space <U2009>.
> >>
> >> Do you know of any reason for this?
> >
> > My *guess* is the reason is mostly historical. It seems that before
> > 2000, locale files did not use Unicode codepoints but markup like <SP>,
> > which was then mechanically rewritten to <U0020> and no one bothered to
> > change it further; new locale authors likely did not think much of it
> > and/or copied data from existing locales using <U0020>.
> >
> > I would certainly approve of updating this to thin space where
> > appropriate. However, I think that
> >
> >   (i) Some programs might get confused by this; the person doing the
> > change should check at least behavior of common office programs after
> > this change (not sure if there is any other common software using this
> > data?), and possibly discuss this with their developers if there are
> > problems.
> >
> >   (ii) We should do the change en masse at least in most locales that
> > are appropriate to change, so that any possible bugs in handling of
> > Unicode characters in these fields are quickly noticed and fixed.
> >
> > Hope that this does not sound as too much work; I could help with
> > some of it, but do not have time to do the whole change by myself.
> >
>
> This sounds like a great plan. We need leaders to step up and make
> bold suggestions! :-)
>
> For avoidance of doubt I would like to see:
>
> (a) A new patch that uses <U0020> to fix es_MX. I'll review this again.
>
> and
>
> (b) A new BZ filed to fix all the locales using <U0020> to use
>     thin space <U2009>. Set the target milestone to 2.17 please.
>
> Aurelien, would you mind doing that?
>

Yes, it's something I can do, but I won't be able to do (b) before a few
days due to lack of time, as this part needs some more testing.

--
Aurelien Jarno                        GPG: 1024D/F1BCDB73
[hidden email]                 http://www.aurel32.net

Re: [PATCH] Fix decimal_point and thousands_sep in es_MX locale

Carlos O'Donell-4
On 6/7/2012 12:51 AM, Aurelien Jarno wrote:

> On Wed, Jun 06, 2012 at 01:14:25PM -0400, Carlos O'Donell wrote:
>> On 6/6/2012 1:02 PM, Petr Baudis wrote:
>>> On Wed, Jun 06, 2012 at 12:48:28PM -0400, Carlos O'Donell wrote:
>>>> Petr,
>>>>
>>>> Several standards say that the thousands separator should be a
>>>> thin space, but the glibc locales have been using an
>>>> ASCII space <U0020> instead of the thin space <U2009>.
>>>>
>>>> Do you know of any reason for this?
>>>
>>> My *guess* is the reason is mostly historical. It seems that before
>>> 2000, locale files did not use Unicode codepoints but markup like <SP>,
>>> which was then mechanically rewritten to <U0020> and no one bothered to
>>> change it further; new locale authors likely did not think much of it
>>> and/or copied data from existing locales using <U0020>.
>>>
>>> I would certainly approve of updating this to thin space where
>>> appropriate. However, I think that
>>>
>>>   (i) Some programs might get confused by this; the person doing the
>>> change should check at least behavior of common office programs after
>>> this change (not sure if there is any other common software using this
>>> data?), and possibly discuss this with their developers if there are
>>> problems.
>>>
>>>   (ii) We should do the change en masse at least in most locales that
>>> are appropriate to change, so that any possible bugs in handling of
>>> Unicode characters in these fields are quickly noticed and fixed.
>>>
>>> Hope that this does not sound as too much work; I could help with
>>> some of it, but do not have time to do the whole change by myself.
>>>
>>
>> This sounds like a great plan. We need leaders to step up and make
>> bold suggestions! :-)
>>
>> For avoidance of doubt I would like to see:
>>
>> (a) A new patch that uses <U0020> to fix es_MX. I'll review this again.
>>
>> and
>>
>> (b) A new BZ filed to fix all the locales using <U0020> to use
>>     thin space <U2009>. Set the target milestone to 2.17 please.
>>
>> Aurelien, would you mind doing that?
>>
>
> Yes, it's something I can do, but I won't be able to do (b) before a few
> days due to lack of time, as this part needs some more testing.
>

Just to be clear you don't need to fix the BZ, just file one so we don't
forget.

I understand that doing (a) might take more time for testing.

Thank you for your efforts in cleaning this up!

Cheers,
Carlos.
--
Carlos O'Donell
Mentor Graphics / CodeSourcery
[hidden email]
[hidden email]
+1 (613) 963 1026

Re: [PATCH] Fix decimal_point and thousands_sep in es_MX locale

Keld Simonsen-2
In reply to this post by Petr Baudis
On Thu, Jun 07, 2012 at 03:45:46AM +0200, Petr Baudis wrote:

>   Hi!
>
> On Wed, Jun 06, 2012 at 11:53:11PM +0200, Keld Simonsen wrote:
> > On Wed, Jun 06, 2012 at 01:14:25PM -0400, Carlos O'Donell wrote:
> > > (b) A new BZ filed to fix all the locales using <U0020> to use
> > >     thin space <U2009>. Set the target milestone to 2.17 please.
> ..snip..
> > Also, the use of Unicode-only characters breaks the universality of locales,
> > as they cannot be used with 8-bit character sets.
> > It may also break programs that try to parse numbers.
>
>   Can you please elaborate on this point? Surely, if a locale is using
> UTF-8 charset, it should be permitted to include UTF-8 characters? Is
> there a point in making a difference between e.g. LC_TIME and LC_CTYPE?

Yes, of course it should be possible to use UTF-8 characters.

But we may then need more versions of locales, e.g. one with
UTF-8 characters, and one with a more restricted character set.

>   (In case a locale is generated with more restrictive charset, e.g.
> ISO-8859-1, the thin space is automatically transliterated to normal
> space.)

Is it really transliterated to a normal space? What wonders
our locales can do :-)

A problem with the normal space is that it is difficult to parse.
A normal space is normally used as a separator.
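
The parsing concern can be made concrete with a short Python sketch. `parse_grouped` is a hypothetical helper, not an existing library call; the point is that a parser has to know the separator in order to strip it, and that whitespace-based tokenizing loses number boundaries when the separator is itself a space.

```python
def parse_grouped(text, decimal_point=".", thousands_sep=","):
    """Parse a number written with the given separators.

    Hypothetical helper: strips the thousands separator, then
    normalizes the decimal point before converting.
    """
    return float(text.replace(thousands_sep, "").replace(decimal_point, "."))


# Two numbers on one line, ASCII space as thousands separator:
# splitting on whitespace cannot tell where one number ends.
print("1 234 567 42".split())

# With a thin space as separator, splitting on the ASCII space alone
# keeps each number in one piece ...
print("1\u2009234\u2009567 42".split(" "))

# ... but Python's default split() treats U+2009 as whitespace too,
# so generic whitespace splitting still breaks the number apart.
print("1\u2009234\u2009567 42".split())

print(parse_grouped("1\u2009234\u2009567", thousands_sep="\u2009"))
```

So a thin space helps only those parsers that split on U+0020 specifically; anything that treats all Unicode whitespace alike has the same problem with either character.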

There is also a question of what our locales are really aimed at.
E.g. dates: what we have is partly meant for listing files in long
format (ls -l), and this indicates that the format needs to be
constant width.

The same goes for number formats: in a number of countries there seem to be
several schools on whether to use a space or a period/comma as the thousands separator.
E.g. Norway: the linguists say space, but banks always use a period.
Spreadsheets use a period too, and so does all financial software.
For programs where you need to process output numbers again, the
numbers had better be parsable. There is then a point in having the locales be
more computer-oriented than the linguists want them to be. This could lead
to having more specification possibilities in the locales.

BTW, why thin space? My understanding is that numbers often need to
be output precisely aligned, and thus all digits need to have the same width.
The space should then also have the same width as a digit.
(The same goes for the comma as decimal separator.)

Best regards
keld

Re: [PATCH] Fix decimal_point and thousands_sep in es_MX locale

Petr Baudis
On Fri, Jun 08, 2012 at 12:50:31AM +0200, Keld Simonsen wrote:
> Yes, of course it should be possible to use UTF-8 characters.
>
> But we may then need more versions of locales, e.g. one with
> UTF-8 characters, and one with a more restricted character set.

  Isn't this all solved by localedef? Except where different
transliterations are required in various contexts, but I don't
know about any case like that.

> >   (In case a locale is generated with more restrictive charset, e.g.
> > ISO-8859-1, the thin space is automatically transliterated to normal
> > space.)
>
> Is it really transliterated to a normal space? What wonders
> our locales can do :-)

  Yes it is, that's what all the translit* files do. (This one is
handled in translit_compat.)

> A problem with the normal space is that it is difficult to parse.
> A normal space is normally used as a separator.

  But that's our current situation.

> There is also a question of what our locales are really aimed at.
> E.g. dates: what we have is partly meant for listing files in long
> format (ls -l), and this indicates that the format needs to be
> constant width.

  This is all very hairy. ls -l in particular does some very special
magic that in part boils down to internally gettexting message formats
and auto-guessing width modifier for %b based on chosen locale (so that
field can be variable-width). The final value printed does not map
easily to any LC_TIME key.

> The same goes for number formats: in a number of countries there seem to be
> several schools on whether to use a space or a period/comma as the thousands separator.
> E.g. Norway: the linguists say space, but banks always use a period.
> Spreadsheets use a period too, and so does all financial software.
> For programs where you need to process output numbers again, the
> numbers had better be parsable. There is then a point in having the locales be
> more computer-oriented than the linguists want them to be. This could lead
> to having more specification possibilities in the locales.

  We have separate thousands separator for normal numbers and monetary
values in locales, so is this really an issue?

> BTW, why thin space? My understanding is that numbers often need to
> be output precisely aligned, and thus all digits need to have the same width.
> The space should then also have the same width as a digit.
> (The same goes for the comma as decimal separator.)

  In monospace fonts, thin space is as wide as normal space. In other
fonts, digits are variable-width anyway and you will not get alignment.
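
A small sketch of the alignment argument, assuming every character occupies one cell as in the monospace case: padding is computed in characters, so a thin space takes part in right-justification exactly like an ASCII space. Whether the columns actually line up on screen depends on the font, which code like this cannot check.

```python
THIN_SPACE = "\u2009"

# Sample values grouped with a thin space (made-up data).
rows = [
    "1 234 567".replace(" ", THIN_SPACE),
    "89 012".replace(" ", THIN_SPACE),
    "345",
]

# Pad every row on the left to the length of the longest one.
width = max(len(row) for row in rows)
aligned = [row.rjust(width) for row in rows]

for row in aligned:
    print(row)
```

Every padded row ends up the same number of characters long, so in a monospace terminal the units digits share a column.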

--
                                Petr "Pasky" Baudis
        Smart data structures and dumb code works a lot better
        than the other way around.  -- Eric S. Raymond