[RFC][PATCH] Multiple locales: Use the correct date and time formats (bug 10496, 23724).

Rafal Luzynski
Hi,

I need some advice.  Recently I've been working on a change which
initially seemed to be a simple one-liner but eventually turned
out to require updating 80 locales.  The change fixes the 12-hour time
formats, adding the AM/PM indicator and changing the hour format.
The affected locales are mostly those for the Arabic language, those
from India, and those from the region of Eritrea, Ethiopia and Somalia.

My question is: should I treat the CLDR database literally and
change the time formats like "%I:%M:%S" to "%l:%M:%S" whenever CLDR
provides the time format "h:mm:ss"?  I spotted this difference while
importing the time formats from CLDR in order to find those which
should use the AM/PM indicator.

The difference is that "%I" is a zero-padded hour number while "%l"
is a space-padded hour number;  in CLDR format "h" is an hour number
using as many digits as necessary (no additional padding mentioned).
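
To make the question concrete, here is a minimal sketch in Python of
what a literal CLDR-to-strftime translation could look like.  The table
and the helper name cldr_to_strftime are hypothetical and deliberately
incomplete; they are not part of glibc or CLDR.  The sketch assumes
that "h" is mapped to "%l", as proposed above, although "%l" pads with
a space while CLDR "h" has no padding at all.

```python
import re

# Hypothetical, incomplete mapping from CLDR time pattern fields to
# strftime conversion specifications.  "%l" is an extension, not ISO C.
CLDR_TO_STRFTIME = {
    "h": "%l",   # 12-hour clock; CLDR: no padding, "%l": space-padded
    "hh": "%I",  # 12-hour clock, zero-padded
    "HH": "%H",  # 24-hour clock, zero-padded
    "mm": "%M",  # minutes, zero-padded
    "ss": "%S",  # seconds, zero-padded
    "a": "%p",   # AM/PM indicator
}


def cldr_to_strftime(pattern):
    """Translate a simple CLDR time pattern; quoted literals are not handled."""
    # Each run of letters is one CLDR field; punctuation is copied as-is.
    return re.sub(r"[A-Za-z]+", lambda m: CLDR_TO_STRFTIME[m.group(0)], pattern)


print(cldr_to_strftime("h:mm:ss a"))
print(cldr_to_strftime("hh:mm:ss"))
```

An unknown field raises KeyError here, which seems safer for a review
tool than silently copying it into the locale file.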

I have doubts because the original complaint was about the missing
AM/PM indicator; nobody complained about the clock using a zero-padded
hour number.

I am afraid that the change is minor and irrelevant, and that the answer
of most people, even from the countries involved, would be "I don't
know/I don't care".  So if the glibc community does not raise any
sustained objection I will prepare and eventually commit this change.

As a sample, please find attached a patch for the Albanian language.
I'd like to fix this locale in a separate patch because it also
fixes the date formats, which cannot be separated from the time formats.
If it is accepted I will commit this patch and move on to the other
locales.

I am discussing this change with people from Albania and India but
I don't know anyone who speaks Arabic or lives in Eritrea, Ethiopia
or Somalia.

Regards,

Rafal

Attachment: 0001-sq_AL-Use-the-correct-date-and-time-formats-bug-1049.patch (2K)

Re: [RFC][PATCH] Multiple locales: Use the correct date and time formats (bug 10496, 23724).

TAMUKI Shoichi
Hello Rafal,

From: Rafal Luzynski <[hidden email]>
Subject: [RFC][PATCH] Multiple locales: Use the correct date and time formats (bug 10496, 23724).
Date: Fri, 2 Nov 2018 12:11:09 +0100 (CET)

> It also sets the correct date format because the old "%Y-%b-%d" produced
> rather weird results like "2018-Sht-28".
>
> (snip)
>
> diff --git a/localedata/locales/sq_AL b/localedata/locales/sq_AL
> index 9cec37a..a90251f 100644
> --- a/localedata/locales/sq_AL
> +++ b/localedata/locales/sq_AL
> @@ -303,16 +303,19 @@ mon         "janar";/
>  am_pm       "PD";"MD"
>  %
>  % Appropriate date and time representation
> -d_t_fmt     "%Y-%b-%d %I.%M.%S.%p %Z"
> +d_t_fmt     "%a %-d %b %Y %l:%M:%S.%p"

I am afraid that this change will no longer keep the width constant.

How about using "%a %_d %b %Y %l:%M:%S.%p" instead.

> +%
> +% Appropriate date representation for date(1)
> +date_fmt    "%a %-d %b %Y %l:%M:%S.%p %Z"

Likewise, how about using "%a %_d %b %Y %l:%M:%S.%p %Z" instead.

>  %
>  % Appropriate date representation
> -d_fmt       "%Y-%b-%d"
> +d_fmt       "%-d.%-m.%y"

Likewise, how about using "%_d.%_m.%y" instead.

Regards,
TAMUKI Shoichi

Re: [RFC][PATCH] Multiple locales: Use the correct date and time formats (bug 10496, 23724).

Rafal Luzynski
Hello Tamuki Shoichi and thank you for your feedback.

3.11.2018 05:06 TAMUKI Shoichi <[hidden email]> wrote:

>
>
> Hello Rafal,
>
> From: Rafal Luzynski <[hidden email]>
> Subject: [RFC][PATCH] Multiple locales: Use the correct date and time
> formats (bug 10496, 23724).
> Date: Fri, 2 Nov 2018 12:11:09 +0100 (CET)
>
> > [...]
> > @@ -303,16 +303,19 @@ mon         "janar";/
> >  am_pm       "PD";"MD"
> >  %
> >  % Appropriate date and time representation
> > -d_t_fmt     "%Y-%b-%d %I.%M.%S.%p %Z"
> > +d_t_fmt     "%a %-d %b %Y %l:%M:%S.%p"
>
> I am afraid that this change will no longer keep the width constant.

Is it required to keep the constant width?  In my native locale we use
"%-d" and I think it works fine.

> How about using "%a %_d %b %Y %l:%M:%S.%p" instead.

"_" means "use a space as padding".  If I had to use space padding
I would use "%e" instead which does the same and is less tricky.
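
The padding variants discussed here can be compared side by side.  The
following is only a sketch in Python, assuming that time.strftime
passes the format through to a C library which supports the "_" and
"-" flags (glibc does; other platforms may not).

```python
import datetime
import time

# 5 November 2018: a single-digit day, so the padding is visible.
t = datetime.date(2018, 11, 5).timetuple()

for spec in ("%d", "%e", "%_d", "%-d"):
    # "%d" and "%e" are standard; the "_" and "-" flags are extensions.
    print(f"{spec:>4} -> {time.strftime(spec, t)!r}")
```

On such a system "%d" gives "05", both "%e" and "%_d" give " 5", and
"%-d" gives "5".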

> [...]
> >  %
> >  % Appropriate date representation
> > -d_fmt       "%Y-%b-%d"
> > +d_fmt       "%-d.%-m.%y"
>
> Likewise, how about using "%_d.%_m.%y" instead.

No, additional space before a month number looks definitely bad.
We use dots or other punctuation characters and zero padding to avoid
spaces in a string which normally should have no spaces.

Thank you again and best regards,

Rafal

Re: [RFC][PATCH] Multiple locales: Use the correct date and time formats (bug 10496, 23724).

TAMUKI Shoichi
Hello Rafal,

From: Rafal Luzynski <[hidden email]>
Date: Sat, 3 Nov 2018 20:45:56 +0100 (CET)
Subject: Re: [RFC][PATCH] Multiple locales: Use the correct date and time formats (bug 10496, 23724).

> > > [...]
> > > @@ -303,16 +303,19 @@ mon         "janar";/
> > >  am_pm       "PD";"MD"
> > >  %
> > >  % Appropriate date and time representation
> > > -d_t_fmt     "%Y-%b-%d %I.%M.%S.%p %Z"
> > > +d_t_fmt     "%a %-d %b %Y %l:%M:%S.%p"
> >
> > I am afraid that this change will no longer keep the width constant.
>
> Is it required to keep the constant width?  In my native locale we use
> "%-d" and I think it works fine.

Although I am no expert in the sq_AL locale, I just thought that if
d_t_fmt, which is currently fixed width, suddenly becomes variable
width, there might be people in trouble.

I have roughly checked the current locale data to see which locales
use a variable width in d_t_fmt.  The result is:

ca_ES: "%A, %-d %B de %Y, %T %Z"
cs_CZ: "%a<U00A0>%-d.<U00A0>%B<U00A0>%Y,<U00A0>%H:%M:%S<U00A0>%Z"
hu_HU: "%Y. %b. %-e., %A, %H:%M:%S %Z"
it_CH: "%a %-d %b %Y, %T"
it_IT: "%a %-d %b %Y, %T"
nr_ZA: "%a %-e %b %Y %T %Z"
pl_PL: "%a, %-d %b %Y, %T"
ss_ZA: "%a %-e %b %Y %T %Z"
st_ZA: "%a %-e %b %Y %T %Z"
szl_PL: "%a, %-d %b %Y, %T"
tn_ZA: "%a %-e %b %Y %T %Z"
ts_ZA: "%a %-e %b %Y %T %Z"
xh_ZA: "%a %-e %b %Y %T %Z"

> > How about using "%a %_d %b %Y %l:%M:%S.%p" instead.
>
> "_" means "use a space as padding".  If I had to use space padding
> I would use "%e" instead which does the same and is less tricky.

Yes, indeed.  Since "%e" and "%_d" are equivalent, one byte can be
saved by using "%e".  Thank you for your advice.

> > [...]
> > >  %
> > >  % Appropriate date representation
> > > -d_fmt       "%Y-%b-%d"
> > > +d_fmt       "%-d.%-m.%y"
> >
> > Likewise, how about using "%_d.%_m.%y" instead.
>
> No, additional space before a month number looks definitely bad.
> We use dots or other punctuation characters and zero padding to avoid
> spaces in a string which normally should have no spaces.

For example, "LANG=C date" shows a space as padding to keep it a
constant width.

Anyway, if people involved in that locale are not particularly
troubled, I think that there is no problem without padding.

Regards,
TAMUKI Shoichi

Re: [RFC][PATCH] Multiple locales: Use the correct date and time formats (bug 10496, 23724).

Rafal Luzynski
4.11.2018 04:07 TAMUKI Shoichi <[hidden email]> wrote:
> [...]
> Although I am no expert in sq_AL locale,

Sure, neither am I.  That's why I'm trying to approach the actual native
speakers (off-list).

> if d_t_fmt which is fixed
> width suddenly becomes variable width, I just thought that there might
> be people in trouble.

I'm trying to understand the use case where people might be in trouble
if the date width varies.  Is it maybe that someone is trying to extract
pieces of the date (e.g., the current month or the current weekday) from
the output of date by finding a substring at a fixed index?  If so,
there are more reasons why this may not work, especially in non-English
locales, and there are better ways to achieve the correct results.
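
As a rough illustration of that fragility, here is a sketch in Python.
The two sample lines are written by hand for this example: the first
follows the layout of "%c" in the C locale, the second follows the
pl_PL d_t_fmt "%a, %-d %b %Y, %T".  Neither line is captured output.

```python
import datetime
import time

t = datetime.datetime(2018, 11, 5, 22, 59, 5).timetuple()

# Hand-written sample lines (assumptions made for this sketch).
c_line = "Mon Nov  5 22:59:05 2018"    # layout of "%c" in the C locale
pl_line = "pon, 5 lis 2018, 22:59:05"  # layout of pl_PL "%a, %-d %b %Y, %T"

# Fragile: assumes the month name always occupies columns 4..6.
print(repr(c_line[4:7]))
print(repr(pl_line[4:7]))

# More robust: ask for the field itself instead of slicing a full date.
month = time.strftime("%b", t)
print(month)
```

The slice returns the month only for the first layout; for the second
one it returns the day number surrounded by spaces, regardless of
whether the day is padded.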

> I have roughly checked in the current locale data to see which locale
> uses a variable width at d_t_fmt.  The result is:
>
> ca_ES: "%A, %-d %B de %Y, %T %Z"
> cs_CZ: "%a<U00A0>%-d.<U00A0>%B<U00A0>%Y,<U00A0>%H:%M:%S<U00A0>%Z"
> hu_HU: "%Y. %b. %-e., %A, %H:%M:%S %Z"
> it_CH: "%a %-d %b %Y, %T"
> it_IT: "%a %-d %b %Y, %T"
> nr_ZA: "%a %-e %b %Y %T %Z"
> pl_PL: "%a, %-d %b %Y, %T"
> ss_ZA: "%a %-e %b %Y %T %Z"
> st_ZA: "%a %-e %b %Y %T %Z"
> szl_PL: "%a, %-d %b %Y, %T"
> tn_ZA: "%a %-e %b %Y %T %Z"
> ts_ZA: "%a %-e %b %Y %T %Z"
> xh_ZA: "%a %-e %b %Y %T %Z"

Thank you for this survey!  Yes, this is evidence that variable-width
date formats may be acceptable.

> > > [...]
> > > >  %
> > > >  % Appropriate date representation
> > > > -d_fmt       "%Y-%b-%d"
> > > > +d_fmt       "%-d.%-m.%y"
> > >
> > > Likewise, how about using "%_d.%_m.%y" instead.
> >
> > No, additional space before a month number looks definitely bad.
> > We use dots or other punctuation characters and zero padding to avoid
> > spaces in a string which normally should have no spaces.
>
> For example, "LANG=C date" shows a space as padding to keep it a
> constant width.

This is a different case.  When you execute "date" with no arguments
it uses the date format which consists of numbers and words: day number,
abbreviated month name, abbreviated weekday name, etc.  It is correct
to separate them with spaces, and not so bad to add more spaces, when
needed.  In my case I was referring to a string which consists exclusively
of digits and punctuation characters (like 5.11.2018 or 11/5/18) and which
should not contain spaces.  That's why zero-padding exists (e.g.,
05.11.18 or 11/05/18).  " 5.11.2018" may not be so bad but "11/ 5/18"
is rather weird.
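
The numeric cases can be reproduced with a short Python sketch.  It
assumes a C library that accepts the "-" and "_" flags (such as glibc);
the formats below are examples only, not the values proposed for any
particular locale.

```python
import datetime
import time

t = datetime.date(2018, 11, 5).timetuple()

# Zero padding, no padding, and space padding in purely numeric dates.
for fmt in ("%d.%m.%y", "%-d.%-m.%y", "%m/%d/%y", "%m/%_d/%y"):
    print(f"{fmt:<12} {time.strftime(fmt, t)!r}")
```

Only the last format puts a space in the middle of the digits.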

> Anyway, if people involved in that locale are not particularly
> troubled, I think that there is no problem without padding.

Yes, thank you for helping me formulate my question more precisely
and more clearly for native speakers.  I will ask them if they expect
the formatted dates to have a constant width and if making the width
variable will cause trouble for them.

Regards,

Rafal

Re: [RFC][PATCH] Multiple locales: Use the correct date and time formats (bug 10496, 23724).

TAMUKI Shoichi
Hello Rafal,

From: Rafal Luzynski <[hidden email]>
Subject: Re: [RFC][PATCH] Multiple locales: Use the correct date and time formats (bug 10496, 23724).
Date: Mon, 5 Nov 2018 22:59:05 +0100 (CET)

> > if d_t_fmt which is fixed
> > width suddenly becomes variable width, I just thought that there might
> > be people in trouble.
>
> I'm trying to understand the use case where people might be in trouble
> if the date width varies.  Is it maybe that someone is trying to extract
> pieces of the date (e.g., the current month or the current weekday) from
> the output of date by finding a substring at a fixed index?  If so,
> there are more reasons why this may not work, especially in non-English
> locales, and there are better ways to achieve the correct results.

For example, user-oriented software that writes its logs using %c
(the preferred calendar time representation for the current locale)
for the sake of clarity becomes inconvenient to read if the width of
the date and time varies.
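
The effect on log alignment can be sketched in Python as follows.  The
helper stamp and the message text are made up for this illustration,
and the "-" flag again assumes a C library such as glibc.

```python
import datetime
import time


def stamp(fmt, day):
    """Format 22:59:05 on the given day of November 2018 with fmt."""
    t = datetime.datetime(2018, 11, day, 22, 59, 5).timetuple()
    return time.strftime(fmt, t)


variable = "%a %-d %b %Y %T"  # "-" flag: day without padding (extension)
fixed = "%a %e %b %Y %T"      # "%e": day padded with a space

for fmt in (variable, fixed):
    for day in (5, 15):
        # With the variable-width format the message column shifts by one.
        print(stamp(fmt, day), "service started")
```

With the first format the timestamp for the 15th is one character
longer than the one for the 5th; with the second format both have the
same length.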

> > I have roughly checked in the current locale data to see which locale
> > uses a variable width at d_t_fmt.  The result is:
> >
> > ca_ES: "%A, %-d %B de %Y, %T %Z"
> > cs_CZ: "%a<U00A0>%-d.<U00A0>%B<U00A0>%Y,<U00A0>%H:%M:%S<U00A0>%Z"
> > hu_HU: "%Y. %b. %-e., %A, %H:%M:%S %Z"
> > it_CH: "%a %-d %b %Y, %T"
> > it_IT: "%a %-d %b %Y, %T"
> > nr_ZA: "%a %-e %b %Y %T %Z"
> > pl_PL: "%a, %-d %b %Y, %T"
> > ss_ZA: "%a %-e %b %Y %T %Z"
> > st_ZA: "%a %-e %b %Y %T %Z"
> > szl_PL: "%a, %-d %b %Y, %T"
> > tn_ZA: "%a %-e %b %Y %T %Z"
> > ts_ZA: "%a %-e %b %Y %T %Z"
> > xh_ZA: "%a %-e %b %Y %T %Z"
>
> Thank you for this survey!  Yes, this is evidence that variable-width
> date formats may be acceptable.

These are only 13 locales out of a total of 353.  Please note that
almost all the other locales (96.3%) use fixed-width date and time
formats.

> > For example, "LANG=C date" shows a space as padding to keep it a
> > constant width.
>
> This is a different case.  When you execute "date" with no arguments
> it uses the date format which consists of numbers and words: day number,
> abbreviated month name, abbreviated weekday name, etc.  It is correct
> to separate them with spaces, and not so bad to add more spaces, when
> needed.  In my case I was referring to a string which consists exclusively
> of digits and punctuation characters (like 5.11.2018 or 11/5/18) which
> should not contain spaces.  That's why zero-padding exists.  (e.g.,
> 05.11.18 or 11/05/18). " 5.11.2018" may not be so bad but "11/ 5/18"
> is rather weird.

I agree.  If you do not have any problems using zero-padding (e.g.,
05.11.18 or 11/05/18), I think it is a good choice.

By the way,

        "9:00 a.m. - 5:00 p.m."

in English is translated as

        "9:00 e paradites - 5:00 e pasdites"

in Albanian.

So, should we include such a representation?

Regards,
TAMUKI Shoichi