[PING^10][PATCH v12] Locales: Cyrillic -> ASCII transliteration [BZ #2872]

Marko Myllynen
Hi,

It seems that we have consensus so is there anything still left with this?

@Egor, do you think the commit message of the latest patch is ok or
should it be somehow amended by the recent discussions leading to
consensus or is it ok as-is?

Thanks,

--
Marko Myllynen

Re: [PING^10][PATCH v12] Locales: Cyrillic -> ASCII transliteration [BZ #2872]

Rafal Luzynski
9.07.2019 08:34 Marko Myllynen <[hidden email]> wrote:
>
> Hi,
>
> It seems that we have consensus so is there anything still left with this?

I think it's helpful to clarify which patches we are talking about.
I think that these TWO patches should be accepted and pushed:

1. [PATCH v12] Locales: Cyrillic -> ASCII transliteration [BZ #2872]
   https://sourceware.org/ml/libc-alpha/2019-03/msg00378.html

It contains Cyrillic to plain ASCII transliteration according to
the GOST 7.79-2000 System B standard for the C locales.

2. [PATCH v9] Locales: Cyrillic -> ASCII transliteration table [BZ #2872]
   https://sourceware.org/ml/libc-alpha/2018-11/msg00369.html

It contains Cyrillic to Latin extended transliteration according to
the ISO 9 standard, which is the same as GOST 7.79-2000 System A, for almost
all locales, plus a fallback to plain ASCII which is as similar as possible
to GOST 7.79-2000 System A.  However, IMHO one more change is required:
the line converting <U0423><U0301> and <U0443><U0301> (Cyrillic U with
acute, using composition) should be removed, as it was removed in
v10 of the patch.

Why don't I like v10?  Because it removes the fallback.  The fallback
is not perfect and does not comply with any standard, but it has
already been stated that the transliteration does not have to be perfect.

Why these two patches?  Because v12 contains only Cyrillic to plain
ASCII and only for the C locales, while v9 contains Cyrillic to Latin
extended with an attempt to fall back to plain ASCII for many locales
but excluding C.

Additionally, I think we should mention this new feature in NEWS, also
stating that this implementation is not perfect and never will be, but
that further work on the issue is expected in future versions.

> @Egor, do you think the commit message of the latest patch is ok or
> should it be somehow amended by the recent discussions leading to
> consensus or is it ok as-is?

Probably one or two more lines would be nice.

Regards,

Rafal
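
For readers following the thread, the "Latin extended with a fallback to plain ASCII" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `CANDIDATES` table, the helper names, and the `"?"` placeholder are hypothetical and are not taken from the v9 or v12 patches. Each Cyrillic letter lists its candidates in order of preference, and the first candidate that the target charset can represent is used.

```python
# Hypothetical candidate lists: a Latin-extended form first, plain ASCII second.
# These are illustrative only and are NOT the tables from the v9 or v12 patches.
CANDIDATES = {
    "а": ["a"],
    "ж": ["ž", "zh"],
    "ч": ["č", "ch"],
    "ш": ["š", "sh"],
}


def encodable(text, charset):
    """Return True if `text` can be represented in `charset`."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


def transliterate(text, charset):
    """Replace characters the charset cannot represent with the first
    candidate that it can represent; fall back to '?' if none fits."""
    out = []
    for ch in text:
        if encodable(ch, charset):
            out.append(ch)  # already representable, keep as-is
            continue
        for cand in CANDIDATES.get(ch, []):
            if encodable(cand, charset):
                out.append(cand)
                break
        else:
            out.append("?")  # no usable candidate for this character
    return "".join(out)


print(transliterate("чаша", "iso8859_2"))  # Latin-2 can hold the diacritics
print(transliterate("чаша", "ascii"))      # ASCII forces the fallback
```

With a charset that has the diacritic letters, the first candidate is chosen; with plain ASCII, the second one is, which is the role the fallback plays in the discussion above.
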

Re: [PING^10][PATCH v12] Locales: Cyrillic -> ASCII transliteration [BZ #2872]

Egor Kobylkin
Hi Rafal,


On my side, we are definitely talking about the V12 patch (Cyrillic -> ASCII transliteration). Since no further changes to it were proposed, I consider it complete. In view of the impending 2.30 release, it should probably be committed to the tree sooner rather than later.


@Rafal: would you like to go on and commit this V12 patch already?

To the V9 patch - my understanding is that we have agreed to handle it as a new feature because it is actually not fixing [BZ #2872] per se (and V12 does). I am not going to work on it for 2.30 and if you, Rafal, or someone else wants to take ownership and push it I'm more than happy to help.


Bests,
Egor

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Wednesday, July 10, 2019 1:34 AM, Rafal Luzynski <[hidden email]> wrote:

> 9.07.2019 08:34 Marko Myllynen <[hidden email]> wrote:
> >
> > Hi,
> >
> > It seems that we have consensus so is there anything still left with this?
>
> I think it's helpful to clarify which patches we are talking about.
> I think that these TWO patches should be accepted and pushed:
>
> 1. [PATCH v12] Locales: Cyrillic -> ASCII transliteration [BZ #2872]
>    https://sourceware.org/ml/libc-alpha/2019-03/msg00378.html
>
> It contains Cyrillic to plain ASCII transliteration according to
> the GOST 7.79-2000 System B standard for the C locales.
>
> 2. [PATCH v9] Locales: Cyrillic -> ASCII transliteration table [BZ #2872]
>    https://sourceware.org/ml/libc-alpha/2018-11/msg00369.html
>
> It contains Cyrillic to Latin extended transliteration according to
> the ISO 9 standard, which is the same as GOST 7.79-2000 System A, for almost
> all locales, plus a fallback to plain ASCII which is as similar as possible
> to GOST 7.79-2000 System A.  However, IMHO one more change is required:
> the line converting <U0423><U0301> and <U0443><U0301> (Cyrillic U with
> acute, using composition) should be removed, as it was removed in
> v10 of the patch.
>
> Why don't I like v10?  Because it removes the fallback.  The fallback
> is not perfect and does not comply with any standard, but it has
> already been stated that the transliteration does not have to be perfect.
>
> Why these two patches?  Because v12 contains only Cyrillic to plain
> ASCII and only for the C locales, while v9 contains Cyrillic to Latin
> extended with an attempt to fall back to plain ASCII for many locales
> but excluding C.
>
> Additionally, I think we should mention this new feature in NEWS, also
> stating that this implementation is not perfect and never will be, but
> that further work on the issue is expected in future versions.
>
> > @Egor, do you think the commit message of the latest patch is ok or
> > should it be somehow amended by the recent discussions leading to
> > consensus or is it ok as-is?
>
> Probably one or two more lines would be nice.
>
> Regards,
>
> Rafal



Re: [PING^10][PATCH v12] Locales: Cyrillic -> ASCII transliteration [BZ #2872]

Rafal Luzynski
13.07.2019 12:03 "Diego (Egor) Kobylkin" <[hidden email]> wrote:
> [...]
> @Rafal: would you like to go on and commit this V12 patch already?

Yes, but since my time is limited I'm OK with anybody else doing
the commit.  That is, my "Yes" does not mean "please nobody
touch this".

While we are at it, I think that this change should be mentioned in NEWS.

> To the V9 patch - my understanding is that we have agreed to handle it as
> a new feature because it is actually not fixing [BZ #2872] per se (and V12
> does). I am not going to work on it for 2.30 and if you, Rafal, or someone
> else wants to take ownership and push it I'm more than happy to help.

OK, so IIUC your goal is to provide Cyrillic to plain ASCII transliteration
according to GOST 7.79 System B standard, when the locale is set to C
(or any derivative, like C.UTF-8).  You don't want ISO 9 a.k.a. GOST 7.79
System A (Cyrillic to Latin extended) with a possible fallback to plain
ASCII and in many other locales because you consider this as a separate
task which may be done later, not in this release cycle.  Is that
correct?

Regards,

Rafal
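
As a reader's aid, the distinction drawn above between System A (Latin extended) and System B (plain ASCII) can be illustrated on a handful of letters. This is a minimal sketch: the two small tables reflect the commonly published ISO 9 / GOST 7.79 mappings for these six letters as I understand them, they are deliberately incomplete, and the names `SYSTEM_A`, `SYSTEM_B`, and `apply_table` are made up for this example rather than taken from any patch.

```python
# Partial, illustrative tables for six Cyrillic letters only.
# SYSTEM_A: ISO 9 / GOST 7.79 System A (one letter -> one letter with diacritic).
# SYSTEM_B: GOST 7.79 System B (one letter -> ASCII letter sequence).
SYSTEM_A = {"ж": "ž", "ч": "č", "ш": "š", "щ": "ŝ", "ю": "û", "я": "â"}
SYSTEM_B = {"ж": "zh", "ч": "ch", "ш": "sh", "щ": "shh", "ю": "yu", "я": "ya"}


def apply_table(text, table):
    """Map each character through `table`, leaving unknown characters as-is."""
    return "".join(table.get(ch, ch) for ch in text)


sample = "жчшщюя"
for name, table in (("System A", SYSTEM_A), ("System B", SYSTEM_B)):
    result = apply_table(sample, table)
    # str.isascii() (Python 3.7+) shows whether the output fits plain ASCII.
    print(name, result, "ASCII-only:", result.isascii())
```

Only the System B output is pure ASCII, which is why it is the one relevant to a bug titled "Transliteration Cyrillic -> ASCII fails", while System A remains a separate Latin-extended feature.
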

Re: [PING^10][PATCH v12] Locales: Cyrillic -> ASCII transliteration [BZ #2872]

Egor Kobylkin
On Tuesday, July 16, 2019 11:38 AM, Rafal Luzynski <[hidden email]> wrote:
> 13.07.2019 12:03 "Diego (Egor) Kobylkin" <[hidden email]> wrote:
>
> > To the V9 patch - my understanding is that we have agreed to handle it as
> > a new feature because it is actually not fixing [BZ #2872] per se (and V12
> > does). I am not going to work on it for 2.30 and if you, Rafal, or someone
> > else wants to take ownership and push it I'm more than happy to help.
>
> OK, so IIUC your goal is to provide Cyrillic to plain ASCII transliteration
> according to GOST 7.79 System B standard, when the locale is set to C
> (or any derivative, like C.UTF-8). You don't want ISO 9 a.k.a. GOST 7.79
> System A (Cyrillic to Latin extended) with a possible fallback to plain
> ASCII and in many other locales because you consider this as a separate
> task which may be done later, not in this release cycle. Is that
> correct?

Yes, this is correct.

Here is the original email with the explanation:

https://marc.info/?l=glibc-alpha&m=154430592326862&w=2

On 08.12.18 22:51, Egor Kobylkin wrote:
> Rafal, Dmitry, Marko, Mike
>
> On 08.12.18 00:35, Rafal Luzynski wrote:
>> 19.11.2018 12:10 Egor Kobylkin <[hidden email]> wrote:
>>>
>>> Changelog v10: * Removed ISO 9.1995 GOST 7.79-2000 System A
>>> (transliteration to Latin with diacritics) as conflicting with
>>> System B within glibc mechanics and not solving BZ #2872
>>
>> I'm in favor of implementing System A and dropping System B instead.
>
> The BZ #2872 bug name is explicitly "Transliteration Cyrillic -> ASCII
> fails". The ISO 9 System A does not map to ASCII so it is not a solution
> to BZ #2872 at all.
>
> I was scratching my head as to how can we avoid the explosion of the
> scope for this patch. And then it appeared to me that it was wrong to
> target all the present locales for the ASCII translit. This seems to be
> the root cause for this prolonged A vs. B discussions. The proper target
> for my table is actually the C locale translit file
> (locale/C-translit.h.in). I will submit a proper patch shortly.
>
> If anyone wants to keep working on the implementation of the Latin
> Diacritics transliteration of the Cyrillic letters (System A) you are
> welcome to use the tables I have submitted before (v9). That would be a
> new feature for glibc as per my understanding. Let's just make super
> clear the distinction of the System A (Latin with Diacritics, non-ASCII)
> to the ASCII translit as mentioned in BZ #2872 (System B).
>
> My focus is super sharp on helping with Cyrillic -> ASCII translit
> availability for a default installation with glibc.
>
> Hope this helps,
> Egor

Bests,
Egor
