[PATCH] localedata: LC_IDENTIFICATION.category: set to ISO 14652 2002 standard

Mike Frysinger
The ISO 14652 standard defines the valid values for the category
keyword as only two options:
        posix:1993
        i18n:2002

The vast majority of locales had changed the "i18n" string to the
name of its own locale (e.g. "ak_GH:2013") as well as tweaking the
date (presumably thinking it should be the date of submission).

Convert all of them to "i18n:2002" for consistency.

Compressed + attached due to size.  Example change:
--- a/localedata/locales/ak_GH
+++ b/localedata/locales/ak_GH
@@ -37,19 +37,19 @@ language     "Akan"
 territory    "Ghana"
 revision     "1.0"
 date         "2013-08-24"
-%
-category  "ak_GH:2013";LC_IDENTIFICATION
-category  "ak_GH:2013";LC_CTYPE
-category  "ak_GH:2013";LC_COLLATE
-category  "ak_GH:2013";LC_TIME
-category  "ak_GH:2013";LC_NUMERIC
-category  "ak_GH:2013";LC_MONETARY
-category  "ak_GH:2013";LC_PAPER
-category  "ak_GH:2013";LC_MEASUREMENT
-category  "ak_GH:2013";LC_MESSAGES
-category  "ak_GH:2013";LC_NAME
-category  "ak_GH:2013";LC_ADDRESS
-category  "ak_GH:2013";LC_TELEPHONE
+
+category "i18n:2002";LC_IDENTIFICATION
+category "i18n:2002";LC_CTYPE
+category "i18n:2002";LC_COLLATE
+category "i18n:2002";LC_TIME
+category "i18n:2002";LC_NUMERIC
+category "i18n:2002";LC_MONETARY
+category "i18n:2002";LC_PAPER
+category "i18n:2002";LC_MEASUREMENT
+category "i18n:2002";LC_MESSAGES
+category "i18n:2002";LC_NAME
+category "i18n:2002";LC_ADDRESS
+category "i18n:2002";LC_TELEPHONE
 END LC_IDENTIFICATION
 
 LC_CTYPE

0001-localedate-LC_IDENTIFICATION.category-set-to-ISO-146.patch.xz (28K)

Re: [PATCH] localedata: LC_IDENTIFICATION.category: set to ISO 14652 2002 standard

Chris Leonard
 +1 from me FWIW

cjl

On Wed, Apr 13, 2016 at 12:39 PM, Mike Frysinger <[hidden email]> wrote:

> The ISO 14652 standard defines the valid values for the category
> keyword as only two options:
>         posix:1993
>         i18n:2002
>
> The vast majority of locales had changed the "i18n" string to the
> name of its own locale (e.g. "ak_GH:2013") as well as tweaking the
> date (presumably thinking it should be the date of submission).
>
> Convert all of them to "i18n:2002" for consistency.

Re: [PATCH] localedata: LC_IDENTIFICATION.category: set to ISO 14652 2002 standard

Carlos O'Donell
In reply to this post by Mike Frysinger
On 04/13/2016 12:39 PM, Mike Frysinger wrote:

> The ISO 14652 standard defines the valid values for the category
> keyword as only two options:
> posix:1993
> i18n:2002
>
> The vast majority of locales had changed the "i18n" string to the
> name of its own locale (e.g. "ak_GH:2013") as well as tweaking the
> date (presumably thinking it should be the date of submission).
>
> Convert all of them to "i18n:2002" for consistency.

Any chance you can tighten the parser to reject anything but the
two valid category keywords?

I think this change is correct, but I'd rather see a patch that
enforces policy *and* changes the locale source to match.

--
Cheers,
Carlos.

Re: [PATCH] localedata: LC_IDENTIFICATION.category: set to ISO 14652 2002 standard

Mike Frysinger
On 13 Apr 2016 14:57, Carlos O'Donell wrote:

> On 04/13/2016 12:39 PM, Mike Frysinger wrote:
> > The ISO 14652 standard defines the valid values for the category
> > keyword as only two options:
> > posix:1993
> > i18n:2002
> >
> > The vast majority of locales had changed the "i18n" string to the
> > name of its own locale (e.g. "ak_GH:2013") as well as tweaking the
> > date (presumably thinking it should be the date of submission).
> >
> > Convert all of them to "i18n:2002" for consistency.
>
> Any chance you can tighten the parser to reject anything but the
> two valid category keywords?
i figured someone would ask for that eventually :).  it's not clear to
me how many valid values there are because the ISO 14652 standard is
difficult to obtain.  i've only been able to find 1999 and 2002 copies,
but i'm pretty sure there's other revisions as well.  maybe we start
off only accepting these two values and worry about the rest later ?

the other aspect is that, while we might validate some sanity on the
category fields in general, the code (afaict) is not structured for
actually handling the differences.  for example, if the locale says
posix:1993 or i18n:1999 (which the older ISO 14652 1999 standard
allows), we don't change the parsing behavior to reject features
that are new to i18n:2002.

i guess one thing at a time: let's update localedef to only accept
these two values and reject all others.  i'll look at that before
merging this patch in case it's easy to do.
-mike

[PATCH] localedef: check LC_IDENTIFICATION.category values

Mike Frysinger
In reply to this post by Carlos O'Donell
Currently localedef accepts any value for the category keyword.  This has
allowed bad values to propagate to the vast majority of locales (~90%).
Add some logic to only accept the 1993 POSIX and 2002 ISO-14652 standards.

2016-04-13  Mike Frysinger  <[hidden email]>

        * locale/programs/ld-identification.c (identification_finish): Check
        that the values in identification->category are only posix:1993 or
        i18n:2002.
---
 locale/programs/ld-identification.c | 42 ++++++++++++++++++++++++++++++-------
 1 file changed, 35 insertions(+), 7 deletions(-)

diff --git a/locale/programs/ld-identification.c b/locale/programs/ld-identification.c
index 1e8fa84..eccb388 100644
--- a/locale/programs/ld-identification.c
+++ b/locale/programs/ld-identification.c
@@ -164,14 +164,42 @@ No definition for %s category found"), "LC_IDENTIFICATION"));
   TEST_ELEM (date);
 
   for (num = 0; num < __LC_LAST; ++num)
-    if (num != LC_ALL && identification->category[num] == NULL)
-      {
-        if (verbose && ! nothing)
-          WITH_CUR_LOCALE (error (0, 0, _("\
+    {
+      /* We don't accept/parse this category, so skip it early.  */
+      if (num == LC_ALL)
+        continue;
+
+      if (identification->category[num] == NULL)
+        {
+          if (verbose && ! nothing)
+            WITH_CUR_LOCALE (error (0, 0, _("\
 %s: no identification for category `%s'"),
-                                  "LC_IDENTIFICATION", category_name[num]));
-        identification->category[num] = "";
-      }
+                                    "LC_IDENTIFICATION", category_name[num]));
+          identification->category[num] = "";
+        }
+      else
+        {
+          /* Only list the standards we care about.  */
+          static const char * const standards[] =
+            {
+              "posix:1993",
+              "i18n:2002",
+            };
+          size_t i;
+          bool matched = false;
+
+          for (i = 0; i < sizeof (standards) / sizeof (standards[0]); ++i)
+            if (strcmp (identification->category[num], standards[i]) == 0)
+              matched = true;
+
+          if (matched != true)
+            WITH_CUR_LOCALE (error (0, 0, _("\
+%s: unknown standard `%s' for category `%s'"),
+                                    "LC_IDENTIFICATION",
+                                    identification->category[num],
+                                    category_name[num]));
+        }
+    }
 }
 
 
--
2.7.4

Re: [PATCH] localedef: check LC_IDENTIFICATION.category values

Keld Simonsen
Please also allow ISO 30112 categories.

best regards
keld

On Wed, Apr 13, 2016 at 06:45:32PM -0400, Mike Frysinger wrote:

> Currently localedef accepts any value for the category keyword.  This has
> allowed bad values to propagate to the vast majority of locales (~90%).
> Add some logic to only accept the 1993 POSIX and 2002 ISO-14652 standards.
>
> 2016-04-13  Mike Frysinger  <[hidden email]>
>
> * locale/programs/ld-identification.c (identification_finish): Check
> that the values in identification->category are only posix:1993 or
> i18n:2002.

Re: [PATCH] localedef: check LC_IDENTIFICATION.category values

Keld Simonsen
Actually the standards 14652/30112 were set up so you could declare
what version of the locale category was used for the data.
POSIX is different from 14652 and again different from 30112.
30112 is the one that most closely corresponds to glibc implementations.


I also think that POSIX allows for more categories than the ones that the
9945 standard defines, and in that way 14652 and 30112 are compatible
with POSIX. I would advise that this still be allowed, but then declared
in the LC_IDENTIFICATION section. Maybe we should use a specific version value like
"non-standard" to indicate that.

I would advise using the values for the locale versions
given in 30112. The values defined in 30112 are:
i18n:2004
i18n:2012
posix:1993

Best regards
Keld


On Thu, Apr 14, 2016 at 10:59:19AM +0200, [hidden email] wrote:

> Please also allow ISO 30112 categories.
>
> best regards
> keld
>
> On Wed, Apr 13, 2016 at 06:45:32PM -0400, Mike Frysinger wrote:
> > Currently localedef accepts any value for the category keyword.  This has
> > allowed bad values to propagate to the vast majority of locales (~90%).
> > Add some logic to only accept the 1993 POSIX and 2002 ISO-14652 standards.
> >
> > 2016-04-13  Mike Frysinger  <[hidden email]>
> >
> > * locale/programs/ld-identification.c (identification_finish): Check
> > that the values in identification->category are only posix:1993 or
> > i18n:2002.

Re: [PATCH] localedef: check LC_IDENTIFICATION.category values

Mike Frysinger
On 14 Apr 2016 11:26, [hidden email] wrote:
> Actually the standards 14652/30112 were set up so you could declare
> what version of the locale category was used for the data.
> POSIX is different from 14652 and again different from 30112.
> 30112 is the one that most closely corresponds to glibc implementations.

in general, for standards that are stuck behind ISO's dumb paywall (they
want to charge CHF198 for the pleasure of downloading what should be in
the public), you'll have to tell me what values to plug in, and/or what
it says.

although i have found this link:
        http://www.open-std.org/JTC1/SC35/WG5/docs/30112d10.pdf
is that the same ?

if it is, i would highlight that the examples provided in the spec do
not seem to line up with the spec itself ;).  the Danish example that
is embedded in the file tries to use "i18n:2000", and it doesn't use
double quotes like it says it should be.

> I also think that POSIX allows for more categories than the ones that the
> 9945 standard defines, and in that way 14652 and 30112 are compatible

looks like ISO 9945 is just the combined POSIX standard (2003 edition).
the public 2004 edition [1] and 2013 edition [2] do not define the cat
LC_IDENTIFICATION, so they wouldn't have anything to say here.  also,
even if those allow for defining of arbitrary categories, that's kind
of orthogonal to glibc's localedef needs isn't it ?  the utility has
been rejecting all unknown categories for basically ever at this point.
[1] http://pubs.opengroup.org/onlinepubs/009695399/
[2] http://pubs.opengroup.org/onlinepubs/9699919799/

if you try to do:
LC_FOO
...
END LC_FOO
localedef will reject it as a syntax error.

if you try to do:
LC_IDENTIFICATION
...
category "en_US:2000";LC_FOO
...
END LC_IDENTIFICATION
localedef will reject it as a syntax error (ignoring the standard part).

are you referring to something else ?

> with POSIX. I would advise that this still be allowed, but then declared
> in the LC_IDENTIFICATION section. Maybe we should use a specific version value like
> "non-standard" to indicate that.

why do we need to support that ?  we're talking about what localedef
will accept, and localedef is entirely a glibc-specific utility.  the
binary format it produces is internal glibc ABI.  seems like accepting
other random values isn't useful to us.

> I would advice to use the values for the locale versions
> given in 30112. The values defined in 30112 are:
> i18n:2004
> i18n:2012
> posix:1993

OK.  shall i update all the locale files then to use i18n:2012 ?
-mike

Re: [PATCH] localedef: check LC_IDENTIFICATION.category values

Keld Simonsen
On Thu, Apr 14, 2016 at 09:50:33AM -0400, Mike Frysinger wrote:

> On 14 Apr 2016 11:26, [hidden email] wrote:
> > Actually the standards 14652/30112 were set up so you could declare
> > what version of the locale category was used for the data.
> > POSIX is different from 14652 and again different from 30112.
> > 30112 is the one that most closely corresponds to glibc implementations.
>
> in general, for standards that are stuck behind ISO's dumb paywall (they
> want to charge CHF198 for the pleasure of downloading what should be in
> the public), you'll have to tell me what values to plug in, and/or what
> it says.

I agree.

> although i have found this link:
> http://www.open-std.org/JTC1/SC35/WG5/docs/30112d10.pdf
> is that the same ?

It is a new Working Draft for the revision of 30112, so it contains all of
the approved TR 30112 from 2014, plus some. But it is not a standard,
it is work in progress. That is why we are allowed to have it publicly available.

> if it is, i would highlight that the examples provided in the spec do
> not seem to line up with the spec itself ;).  the Danish example that
> is embedded in the file tries to use "i18n:2000", and it doesn't use
> double quotes like it says it should be.

There are errors everywhere. This is a draft, and not supposed to be error-free.
Anyway, the same inconsistency was probably in the approved TR.
I will see to it that this is corrected. Probably it should be marked with
the new standard's identifying value.

> > I also think that POSIX allows for more categories than the ones that the
> > 9945 standard defines, and in that way 14652 and 30112 are compatible
>
> looks like ISO 9945 is just the combined POSIX standard (2003 edition).
> the public 2004 edition [1] and 2013 edition [2] do not define the cat
> LC_IDENTIFICATION, so they wouldn't have anything to say here.  also,
> even if those allow for defining of arbitrary categories, that's kind
> of orthogonal to glibc's localedef needs isn't it ?  the utility has
> been rejecting all unknown categories for basically ever at this point.
> [1] http://pubs.opengroup.org/onlinepubs/009695399/
> [2] http://pubs.opengroup.org/onlinepubs/9699919799/

Well, yes, LC_IDENTIFICATION is a novelty of 14652.
But 9945 - POSIX does allow implementation defined categories AFAIK.
There is one new category in 30112, namely LC_KEYBOARD. I am not sure whether
glibc supports LC_XLITERATE either, or whether the functionality is present only in
LC_CTYPE.

>
> if you try to do:
> LC_FOO
> ...
> END LC_FOO
> localedef will reject it as a syntax error.
>
> if you try to do:
> LC_IDENTIFICATION
> ...
> category "en_US:2000";LC_FOO
> ...
> END LC_IDENTIFICATION
> localedef will reject it as a syntax error (ignoring the standard part).
>
> are you referring to something else ?

No. I would like your last example to not error, it could issue a warning,
or at least that LC_KEYBOARD be accepted.
In that way one could use localedef to test new functionality.

> > with POSIX. I would advise that this still be allowed, but then declared
> > in the LC_IDENTIFICATION section. Maybe we should use a specifiv version value like
> > "non-standard" to indicate that.
>
> why do we need to support that ?  we're talking about what localedef
> will accept, and localedef is entirely a glibc-specific utility.  the
> binary format it produces is internal glibc ABI.  seems like accepting
> other random values isn't useful to us.

Localedef is specified in POSIX,
http://pubs.opengroup.org/onlinepubs/009696699/utilities/localedef.html

> > I would advise using the values for the locale versions
> > given in 30112. The values defined in 30112 are:
> > i18n:2004
> > i18n:2012
> > posix:1993
>
> OK.  shall i update all the locale files then to use i18n:2012 ?

Yes, I think that this is the most appropriate.

Best regards
Keld


Re: [PATCH] localedef: check LC_IDENTIFICATION.category values

Mike Frysinger
On 14 Apr 2016 17:04, [hidden email] wrote:

> On Thu, Apr 14, 2016 at 09:50:33AM -0400, Mike Frysinger wrote:
> > On 14 Apr 2016 11:26, [hidden email] wrote:
> > > I also think that POSIX allows for more categories than the ones that the
> > > 9945 standard defines, and in that way 14652 and 30112 are compatible
> >
> > looks like ISO 9945 is just the combined POSIX standard (2003 edition).
> > the public 2004 edition [1] and 2013 edition [2] do not define the cat
> > LC_IDENTIFICATION, so they wouldn't have anything to say here.  also,
> > even if those allow for defining of arbitrary categories, that's kind
> > of orthogonal to glibc's localedef needs isn't it ?  the utility has
> > been rejecting all unknown categories for basically ever at this point.
> > [1] http://pubs.opengroup.org/onlinepubs/009695399/
> > [2] http://pubs.opengroup.org/onlinepubs/9699919799/
>
> Well, yes, LC_IDENTIFICATION is a novelty of 14652.
> But 9945 - POSIX does allow implementation defined categories AFAIK.
sure -- see below

> There is one new category in 30112, namely LC_KEYBOARD. I am not sure whether
> glibc supports LC_XLITERATE either, or whether the functionality is present only in
> LC_CTYPE.

we don't support LC_KEYBOARD or LC_XLITERATE today.  i think any new
categories would need to be proposed including why glibc should carry
them at all.  i haven't read the standard, so i can't speak to either.

> > if you try to do:
> > LC_FOO
> > ...
> > END LC_FOO
> > localedef will reject it as a syntax error.
> >
> > if you try to do:
> > LC_IDENTIFICATION
> > ...
> > category "en_US:2000";LC_FOO
> > ...
> > END LC_IDENTIFICATION
> > localedef will reject it as a syntax error (ignoring the standard part).
> >
> > are you referring to something else ?
>
> No. I would like your last example to not error, it could issue a warning,
> or at least that LC_KEYBOARD be accepted.
> In that way one could use localedef to test new functionality.
we can have it warn.  localedef already has precedent for not warning about
many things (or not treating them as fatal) by default, while adding -v makes
it more strict.  this seems to fall into that bucket.

i'm not keen on -v/--verbose being a hidden alias to also "exit non-zero
in many more cases", but that's a different topic :).

> > > with POSIX. I would advise that this still be allowed, but then declared
> > > in the LC_IDENTIFICATION section. Maybe we should use a specifiv version value like
> > > "non-standard" to indicate that.
> >
> > why do we need to support that ?  we're talking about what localedef
> > will accept, and localedef is entirely a glibc-specific utility.  the
> > binary format it produces is internal glibc ABI.  seems like accepting
> > other random values isn't useful to us.
>
> Localedef is specified in POSIX,
> http://pubs.opengroup.org/onlinepubs/009696699/utilities/localedef.html
on the frontend sure.  i was thinking of its output format which is not
specified by POSIX but is an internal glibc ABI detail.  it even says:
        The localedef utility shall convert source definitions for locale
        categories into a format usable by the functions and utilities ...
i.e. it doesn't specify that output format.

back to the frontend, what POSIX specifically says is:
        In addition, the input may contain source for implementation-defined
        categories.

so glibc's localedef is free to support as many more or few categories as
it sees fit.  that includes outright rejecting unknown ones.

also, if we want to speak strictly about POSIX, it also says:
        -u  code_set_name
        Specify the name of a codeset used as the target mapping of character
        symbols and collating element symbols whose encoding values are defined
        in terms of the ISO/IEC 10646-1:2000 standard position constant values.

pretty sure that says we aren't even permitted to support a newer standard
there.  whether it matters in practice i'm not sure (i haven't compared
the different versions of the standard).
-mike

[PATCH v2] localedata: LC_IDENTIFICATION.category: set to ISO 30112 2014 standard

Mike Frysinger
In reply to this post by Mike Frysinger
The ISO 30112 standard defines the valid values for the category
keyword as only a few options:
        posix:1993
        i18n:2004
        i18n:2012

The vast majority of locales had changed the "i18n" string to the
name of its own locale (e.g. "ak_GH:2013") as well as tweaking the
date (presumably thinking it should be the date of submission).

Convert all of them to "i18n:2012" for consistency.  A follow-up
change will update localedef to actually check/validate the field.

Compressed for size.  Sample change:
--- a/localedata/locales/kk_KZ
+++ b/localedata/locales/kk_KZ
@@ -35,19 +35,19 @@ language   "Kazakh"
 territory  "Kazakhstan"
 revision   "1.0"
 date       "2003-06-06"
-%
-category  "kk_KZ:2000";LC_IDENTIFICATION
-category  "kk_KZ:2000";LC_CTYPE
-category  "kk_KZ:2000";LC_COLLATE
-category  "kk_KZ:2000";LC_TIME
-category  "kk_KZ:2000";LC_NUMERIC
-category  "kk_KZ:2000";LC_MONETARY
-category  "kk_KZ:2000";LC_MESSAGES
-category  "kk_KZ:2000";LC_PAPER
-category  "kk_KZ:2000";LC_NAME
-category  "kk_KZ:2000";LC_ADDRESS
-category  "kk_KZ:2000";LC_TELEPHONE
-category  "kk_KZ:2000";LC_MEASUREMENT
+
+category "i18n:2012";LC_IDENTIFICATION
+category "i18n:2012";LC_CTYPE
+category "i18n:2012";LC_COLLATE
+category "i18n:2012";LC_TIME
+category "i18n:2012";LC_NUMERIC
+category "i18n:2012";LC_MONETARY
+category "i18n:2012";LC_MESSAGES
+category "i18n:2012";LC_PAPER
+category "i18n:2012";LC_NAME
+category "i18n:2012";LC_ADDRESS
+category "i18n:2012";LC_TELEPHONE
+category "i18n:2012";LC_MEASUREMENT
 END LC_IDENTIFICATION
 
 LC_COLLATE

0001-localedata-LC_IDENTIFICATION.category-set-to-ISO-301.patch.xz (28K)

[PATCH v2] localedef: check LC_IDENTIFICATION.category values

Mike Frysinger
In reply to this post by Mike Frysinger
Currently localedef accepts any value for the category keyword.  This has
allowed bad values to propagate to the vast majority of locales (~90%).
Add some logic to only accept a few standards.

2016-04-13  Mike Frysinger  <[hidden email]>

        * locale/programs/ld-identification.c (identification_finish): Check
        that the values in identification->category are only known.
---
v2:
        - tweak list of accepted standards

 locale/programs/ld-identification.c | 43 +++++++++++++++++++++++++++++++------
 1 file changed, 36 insertions(+), 7 deletions(-)

diff --git a/locale/programs/ld-identification.c b/locale/programs/ld-identification.c
index 1e8fa84..9234304 100644
--- a/locale/programs/ld-identification.c
+++ b/locale/programs/ld-identification.c
@@ -164,14 +164,43 @@ No definition for %s category found"), "LC_IDENTIFICATION"));
   TEST_ELEM (date);
 
   for (num = 0; num < __LC_LAST; ++num)
-    if (num != LC_ALL && identification->category[num] == NULL)
-      {
-        if (verbose && ! nothing)
-          WITH_CUR_LOCALE (error (0, 0, _("\
+    {
+      /* We don't accept/parse this category, so skip it early.  */
+      if (num == LC_ALL)
+        continue;
+
+      if (identification->category[num] == NULL)
+        {
+          if (verbose && ! nothing)
+            WITH_CUR_LOCALE (error (0, 0, _("\
 %s: no identification for category `%s'"),
-                                  "LC_IDENTIFICATION", category_name[num]));
-        identification->category[num] = "";
-      }
+                                    "LC_IDENTIFICATION", category_name[num]));
+          identification->category[num] = "";
+        }
+      else
+        {
+          /* Only list the standards we care about.  */
+          static const char * const standards[] =
+            {
+              "posix:1993",
+              "i18n:2004",
+              "i18n:2012",
+            };
+          size_t i;
+          bool matched = false;
+
+          for (i = 0; i < sizeof (standards) / sizeof (standards[0]); ++i)
+            if (strcmp (identification->category[num], standards[i]) == 0)
+              matched = true;
+
+          if (matched != true)
+            WITH_CUR_LOCALE (error (0, 0, _("\
+%s: unknown standard `%s' for category `%s'"),
+                                    "LC_IDENTIFICATION",
+                                    identification->category[num],
+                                    category_name[num]));
+        }
+    }
 }
 
 
--
2.7.4

Re: [PATCH v2] localedef: check LC_IDENTIFICATION.category values

Carlos O'Donell
On 04/14/2016 05:21 PM, Mike Frysinger wrote:

> Currently localedef accepts any value for the category keyword.  This has
> allowed bad values to propagate to the vast majority of locales (~90%).
> Add some logic to only accept a few standards.
>
> 2016-04-13  Mike Frysinger  <[hidden email]>
>
> * locale/programs/ld-identification.c (identification_finish): Check
> that the values in identification->category are only known.
> ---
> v2:
> - tweak list of accepted standards

OK if you expand the comment "Only list the standards we care about." to
list the standards we reviewed when making this list; that way future
developers can review it.

--
Cheers,
Carlos.

Re: [PATCH v2] localedata: LC_IDENTIFICATION.category: set to ISO 30112 2014 standard

Carlos O'Donell
In reply to this post by Mike Frysinger
On 04/14/2016 05:18 PM, Mike Frysinger wrote:

> The ISO 30112 standard defines the valid values for the category
> keyword as only a few options:
> posix:1993
> i18n:2004
> i18n:2012
>
> The vast majority of locales had changed the "i18n" string to the
> name of its own locale (e.g. "ak_GH:2013") as well as tweaking the
> date (presumably thinking it should be the date of submission).
>
> Convert all of them to "i18n:2012" for consistency.  A follow up
> change will update localedef to actually check/validate the field.

This looks good to me.

--
Cheers,
Carlos.

Re: [PATCH v2] localedef: check LC_IDENTIFICATION.category values

Mike Frysinger
In reply to this post by Carlos O'Donell
On 15 Apr 2016 00:44, Carlos O'Donell wrote:

> On 04/14/2016 05:21 PM, Mike Frysinger wrote:
> > Currently localedef accepts any value for the category keyword.  This has
> > allowed bad values to propagate to the vast majority of locales (~90%).
> > Add some logic to only accept a few standards.
> >
> > 2016-04-13  Mike Frysinger  <[hidden email]>
> >
> > * locale/programs/ld-identification.c (identification_finish): Check
> > that the values in identification->category are only known.
> > ---
> > v2:
> > - tweak list of accepted standards
>
> OK if you expand the comment "Only list the standards we care about." to
> list the standards we reviewed when making this list; that way future
> developers can review it.
ok, i've written:
          /* Only list the standards we care about.  This is based on the
             ISO 30112 WD10 [2014] standard which supersedes all previous
             revisions of the ISO 14652 standard.  */
-mike
