Re: How to add crh.po, and tt@iqtel locale's .po for gettext


Bruno Haible
Hello,

Reshat Sabiq (Res,at) asked:
> I'm planning on doing Crimean Tatar and IQTElif-based (a version of Latin
> alphabet for Tatar) Qazan Tatar localizations, and would appreciate
> any feedback on the process for adding crh.po and  [hidden email].

Assuming you want this for a glibc system (e.g. Linux), the first step is
to create a glibc locale for it. This is unfortunately also one of the
hardest steps, but it is the basis for the entire system. You can find some
tips about it at
  http://www.student.uit.no/~pere/linux/glibc/howto.html
When your locale works fine, you are encouraged to submit it to the
glibc maintainers for inclusion.

You "compile" the locale using the localedef utility, and start using it
by setting the LANG or LC_ALL environment variable, as described in
gettext's ABOUT-NLS file.
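
As a rough illustration of that step (a sketch only, not taken from
ABOUT-NLS), the following Python snippet assembles a localedef command
line and an environment for trying the result. The source file name
crh_UA, the UTF-8 charmap and the compiled name crh_UA.UTF-8 are
assumptions picked for the example; -i and -f are localedef's options
for the locale source file and the charmap.

```python
import os


def localedef_argv(source, charmap, name):
    """Build the argument list for compiling a locale source file."""
    # -i names the locale source file, -f the charmap; the final
    # argument is the name under which the compiled locale is stored.
    return ["localedef", "-i", source, "-f", charmap, name]


def env_with_locale(name, base=None):
    """Return a copy of an environment with LANG set to the given locale."""
    env = dict(os.environ if base is None else base)
    env["LANG"] = name
    # LC_ALL takes precedence over LANG, so remove it to let LANG apply.
    env.pop("LC_ALL", None)
    return env


# Hypothetical names, for illustration only.
print(" ".join(localedef_argv("crh_UA", "UTF-8", "crh_UA.UTF-8")))
```

The argument list can be handed to subprocess.run(), and the returned
environment passed as env= when starting a program under the new locale.
Writing to the system locale directory normally requires root.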

Then you can already start creating localised PO files, "compile" them
using msgfmt, and install them in the locations (typically
/usr/share/locale/... or /usr/local/share/locale/...) where the programs
will expect them.
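
To make the install location concrete, here is a small sketch in Python.
It only computes the path and the msgfmt command line; the text domain
"hello" and the PO file name are made up for the example. The layout
<localedir>/<locale>/LC_MESSAGES/<domain>.mo is the usual gettext one,
and -o is msgfmt's option for the output file.

```python
import posixpath


def mo_install_path(localedir, locale_name, domain):
    """Path where a program looking up `domain` expects the compiled catalog."""
    return posixpath.join(localedir, locale_name, "LC_MESSAGES", domain + ".mo")


def msgfmt_argv(po_file, mo_file):
    """Argument list for compiling a PO file into a binary MO file."""
    return ["msgfmt", "-o", mo_file, po_file]


# Hypothetical domain and file names, for illustration only.
target = mo_install_path("/usr/share/locale", "crh", "hello")
print(" ".join(msgfmt_argv("crh.po", target)))
```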

You should also contact the translation projects (the Free Translation
Project at http://www.iro.umontreal.ca/translation/, the KDE localization
project, the GNOME localization project) and register with them, so that
someone else with the same ideas as you will be aware of your work and
not make duplicated efforts.

Finally, a small note about "crh": According to
  http://www.alvestrand.no/pipermail/ietf-languages/2003-May/000969.html
  http://www.ethnologue.com/show_language.asp?code=crh
Crimean Turkish is written in Cyrillic script. If you want to provide a
locale that uses the Latin script for it, you should call it crh@latin;
if you want to provide a locale that uses the IQTELif script, you should
call it crh@iqtelif (not crh@iqtel - the glibc maintainers don't want
abbreviations here). This holds even if a "crh" locale with Cyrillic script
does not yet exist.
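
For readers unfamiliar with the naming scheme, the general shape of such
a name is language[_TERRITORY][.codeset][@modifier]. The Python sketch
below splits a name into those parts. The regular expression is a
simplification written for this example, not glibc's own parser, and it
assumes two- or three-letter language codes and two-letter territories.

```python
import re

# Simplified pattern for language[_TERRITORY][.codeset][@modifier].
_LOCALE_RE = re.compile(
    r"^(?P<language>[a-z]{2,3})"
    r"(?:_(?P<territory>[A-Z]{2}))?"
    r"(?:\.(?P<codeset>[A-Za-z0-9_-]+))?"
    r"(?:@(?P<modifier>[A-Za-z0-9]+))?$"
)


def parse_locale_name(name):
    """Split a locale name into its components; absent parts are None."""
    match = _LOCALE_RE.match(name)
    if match is None:
        raise ValueError("not a recognised locale name: %r" % name)
    return match.groupdict()


print(parse_locale_name("crh@iqtelif"))
print(parse_locale_name("crh_UA.UTF-8@latin"))
```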

Bruno

Re: How to add crh.po, and tt@iqtel locale's .po for gettext

"Reşat Sabiq (Reshat)"
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Bruno Haible wrote:

> Hello,
>
> Reshat Sabiq (Res,at) asked:
>> I'm planning on doing Crimean Tatar and IQTElif-based (a version of Latin
>> alphabet for Tatar) Qazan Tatar localizations, and would appreciate
>> any feedback on the process for adding crh.po and  [hidden email].
> Finally, a small note about "crh": According to
>   http://www.alvestrand.no/pipermail/ietf-languages/2003-May/000969.html
>   http://www.ethnologue.com/show_language.asp?code=crh
> Crimean Turkish is written in Cyrillic script. If you want to provide a
> locale that uses the Latin script for it, you should call it crh@latin;


I think these references are out-of-date, or incorrect. They also
appear to focus on Uzbekistan population of Crimean Tatars, whereas
they now have an official status in Crimea, Ukraine, and appear to
have largely migrated from the exile in Uzbekistan back to Crimea. I
found the following reference:
"139. Cyrillic alphabet is absolutely unacceptable for the Crimean
Tatar phonetic and grammar system. Kurultay of Crimean Tatar People
unanimously had decided in 1991 to restore the Latin alphabet as the
most comfortable for the Crimean Tatar language. The Crimean Tatar
deputies group in Crimean parliament in 1997 had achieved the positive
voting on this question..."
http://www.minelres.lv/reports/ukraine/Article_5.htm
> if you want to provide a locale that uses the IQTELif script, you should
> call it crh@iqtelif (not crh@iqtel - the glibc maintainers don't want
> abbreviations here). Even if a "crh" locale with Cyrillic script does
> not yet exist.
Isn't there already sr_CS@ije locale? Since ije isn't fully spelled
out there, wouldn't that mean that @iqtel is also an acceptable
candidate? I don't have much against @iqtelif, except that it takes 2
more characters, and they appear to be safely skippable, as there is
enough clarity in 5 characters, and no risk of a future conflict. In
short, the motivation is to reduce typing a little.

Thanks.

- --
My public GPG key is at:
http://keyserver.veridis.com:11371/export?id=476802195259949354
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFE/JcdBp3xEgSYgSoRAs/BAKCh7KZMXNtgmSfn3JQXzfu9zf013wCdGSiu
wJuGQJwgqvng1Tbu1n9jYns=
=qBlP
-----END PGP SIGNATURE-----


Re: How to add crh.po, and tt@iqtel locale's .po for gettext

Danilo Segan
Hi Reshat,

Yesterday at 23:14, Reshat Sabiq wrote:

> Isn't there already sr_CS@ije locale? Since ije isn't fully spelled
> out there

No.  That one might be present in outside sources such as belocs
(Denis Barbier's collection of locales) simply because GNU libc
maintainer(s?) don't accept locales for dialects.  And it would still
have to be "jekavian" even if that was not the case (as I explained in
private email to you).

Btw, sr_CS@ije name came out of simple ignorance (mine, of course:
there's also the problem that there is no well established spelling
for the name of that Serbian dialect, though "Jekavian" seems to be
most common).

Cheers,
Danilo

Re: How to add crh.po, and tt@iqtel locale's .po for gettext

Bruno Haible
In reply to this post by "Reşat Sabiq (Reshat)"
Hi Reşat,

Reshat Sabiq (Res,at) wrote:
> I found the following reference:
> "139. Cyrillic alphabet is absolutely unacceptable for the Crimean
> Tatar phonetic and grammar system. Kurultay of Crimean Tatar People
> unanimously had decided in 1991 to restore the Latin alphabet as the
> most comfortable for the Crimean Tatar language. The Crimean Tatar
> deputies group in Crimean parliament in 1997 had achieved the positive
> voting on this question..."
> http://www.minelres.lv/reports/ukraine/Article_5.htm

Thanks, that makes it clear which alphabet to use for the crh_UA locale.

> I think these references are out-of-date, or incorrect. They also
> appear to focus on Uzbekistan population of Crimean Tatars, whereas
> they now have an official status in Crimea, Ukraine, and appear to
> have largely migrated from the exile in Uzbekistan back to Crimea.

"largely" or not, I can't tell. A lexicon tells me "they have started
migrating back (from Uzbekistan to Crimea) in 1989 but the majority still
is in Uzbekistan". And here are the figures from ethnologue.com (outdated
or not, I can't judge):

      Country      People speaking Crimean Turkish

      Ukraine      200000
      Uzbekistan   189000 (1993)
      Kyrgyzstan    38000
      Romania       21482
      Bulgaria       6000
      Moldova        1859

So this looks like nearly equal population sizes in the Ukraine and in
Uzbekistan.

I'd therefore recommend using:
  - for the glibc locales: the names crh_UA and crh_UZ.
    Use Latin alphabet for crh_UA and Cyrillic one for crh_UZ.
  - for the PO files (translations): Use crh_UA and crh_UZ as well, and
    DON'T create PO files for 'crh'.
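
One way to see why a bare 'crh' catalog is awkward here: message lookup
typically falls back from language_TERRITORY to the plain language code,
so a 'crh' catalog would be served to crh_UA and crh_UZ users alike,
whatever script it is written in. The sketch below demonstrates this
with Python's gettext module, which is assumed here to behave comparably
to GNU gettext in this respect. The domain name "demo" and the empty
placeholder catalogs are made up for the example.

```python
import gettext
import os
import tempfile


def make_empty_catalog(localedir, locale_name, domain):
    """Create a placeholder .mo file; gettext.find() only checks existence."""
    directory = os.path.join(localedir, locale_name, "LC_MESSAGES")
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, domain + ".mo")
    with open(path, "wb"):
        pass
    return path


def catalog_for(localedir, domain, locale_name):
    """Return the catalog Python's gettext would select, or None."""
    return gettext.find(domain, localedir=localedir, languages=[locale_name])


with tempfile.TemporaryDirectory() as localedir:
    make_empty_catalog(localedir, "crh_UA", "demo")
    # Only crh_UA exists, so a crh_UZ user gets no catalog at all.
    print(catalog_for(localedir, "demo", "crh_UZ"))
    # Once a bare 'crh' catalog exists, crh_UZ falls back to it.
    make_empty_catalog(localedir, "crh", "demo")
    print(catalog_for(localedir, "demo", "crh_UZ"))
```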

> In short, the motivation is to reduce typing a little.

Few users nowadays enter their locale by hand. It's typically chosen
from a menu at system installation time. Which means that the user types
nothing at all.

Bruno

Re: How to add crh.po, and tt@iqtel locale's .po for gettext

"Reşat Sabiq (Reshat)"
In reply to this post by Danilo Segan
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Danilo Šegan wrote:

> Hi Reshat,
>
> Yesterday at 23:14, Reshat Sabiq wrote:
>
>> Isn't there already sr_CS@ije locale? Since ije isn't fully spelled
>> out there
>
> No.  That one might be present in outside sources such as belocs
> (Denis Barbier's collection of locales) simply because GNU libc
> maintainer(s?) don't accept locales for dialects.
Wow. That's a new nuance. Is belocs as widely accepted as glibc? For
instance, do Fedora or SUSE users get sr_CS@ije or sr_CS@Latn from the
distro, or do they have to install it on their own because it's not in
glibc?
In short, is there a big motivation for a locale to be in glibc, as
opposed to just being in belocs? Also, am i right to conclude that if a
locale makes into glibc, it will be included in belocs?

> And it would still
> have to be "jekavian" even if that was not the case (as I explained in
> private email to you).
I can somewhat relate, though not fully, to this requirement, but it's a
little strange that @Latn, which appears to be widely used (registered
in fact, AFAIK), has to be changed to @latin for glibc. Looks like glibc
requires its own "namespace" in such cases. Does this not cause problems
when, for instance, a website or a document uses sr-Latn, but glibc has
an equivalent of sr-latin? I guess even if it doesn't cause any problems
for the user, it requires a different modifier to be used in glibc, in
comparison to other possible uses. I tend to think it would be better if
this wasn't the case.

Finally, again for the website usage, for instance, there is a slight
advantage of not requiring the spelling out of jekavian, and using an
abbreviation. So one has to think in cases like this whether to
accommodate full spelling requirement, or end up w/ 2 modifiers: one for
glibc, and possibly a different one elsewhere. Is there any talk or
chance of full-spelling requirement being dropped?

Thanks all.

- --
My public GPG key (ID 0x262839AF) is at: http://keyserver.veridis.com:11371
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.1 (Cygwin)

iD8DBQFFA59pO75ytyYoOa8RAhP4AJ9JEPAEPm1gjbxjHkStw2OuRSrQuACeJrZO
xhsXKeMv/s61kEOO6lkbpiw=
=9UB1
-----END PGP SIGNATURE-----

Re: How to add crh.po, and tt@iqtel locale's .po for gettext

Denis Barbier
On Sun, Sep 10, 2006 at 12:15:21AM -0500, Reshat Sabiq wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Danilo Šegan wrote:
> > Hi Reshat,
> >
> > Yesterday at 23:14, Reshat Sabiq wrote:
> >
> >> Isn't there already sr_CS@ije locale? Since ije isn't fully spelled
> >> out there
> >
> > No.  That one might be present in outside sources such as belocs
> > (Denis Barbier's collection of locales) simply because GNU libc
> > maintainer(s?) don't accept locales for dialects.
> Wow. That's a new nuance. Is belocs as widely accepted as glibc?

No, I believe that Debian and its derivatives are the only ones to have it,
and it is an extra package; locales come from glibc by default.

> For instance, do Fedora or SUSE users get sr_CS@ije or sr_CS@Latn from
> the distro, or do they have to install it on their own because it's
> not in glibc?

They have to install it by hand.

> In short, is there a big motivation for a locale to be in glibc, as
> opposed to just being in belocs?

Well, you are the one being motivated, so it is up to you ;)

> Also, am i right to conclude that if a locale makes into glibc, it
> will be included in belocs?

Yes.

Denis

Re: How to add crh.po, and tt@iqtel locale's .po for gettext

"Reşat Sabiq (Reshat)"
In reply to this post by Bruno Haible
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Bruno Haible wrote:

> Thanks, that makes it clear which alphabet to use for the crh_UA locale.
>
>> I think these references are out-of-date, or incorrect. They also
>> appear to focus on Uzbekistan population of Crimean Tatars, whereas
>> they now have an official status in Crimea, Ukraine, and appear to
>> have largely migrated from the exile in Uzbekistan back to Crimea.
>
> "largely" or not, I can't tell. A lexicon tells me "they have started
> migrating back (from Uzbekistan to Crimea) in 1989 but the majority still
> is in Uzbekistan". And here are the figures from ethnologue.com (outdated
> or not, I can't judge):
>
>       Country      People speaking Crimean Turkish
>
>       Ukraine      200000
>       Uzbekistan   189000 (1993)
>       Kyrgyzstan    38000
>       Romania       21482
>       Bulgaria       6000
>       Moldova        1859
>
> So this looks like nearly equal population sizes in the Ukraine and in
> Uzbekistan.
There'd need to be over a million Crimean Tatars living in Turkey in
this list. Total population is about 4-to-6 million, from what i read,
but i'm not sure how it splits between Crimean and Idil-Ural Tatars.
>
> I'd therefore recommend to use:
>   - for the glibc locales: the names crh_UA and crh_UZ.
>     Use Latin alphabet for crh_UA and Cyrillic one for crh_UZ.
>   - for the PO files (translations): Use crh_UA and crh_UZ as well, and
>     DON'T create PO files for 'crh'.
I think crh by default should be Latin, because it's official, and in
all 3 countries where Crimean Tatars mostly now live, Turkey, Crimea,
and Uzbekistan, Latin alphabet is pre-dominant. 'Ozbekiston also
switched to Latin, so i'm not finding much ground behind Crimean Tatar
being based on Cyrillic in UZ. Qaraqalpaq language in 'Ozbekiston is
also now officially in Latin. I've never heard of UZ granting an
official status to Crimean Tatar, and i'm sure the community sticks w/
Crimea-based fellows in this regard.
I would think crh should be Latin by default, and if somebody desires in
the future, which i highly doubt, they could contribute crh@cyrillic, or
crh_UZ@cyrillic. That said, I'm not even sure how fast this will proceed
in one alphabet.

Thanks.

- --
My public GPG key (ID 0x262839AF) is at: http://keyserver.veridis.com:11371
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.1 (Cygwin)

iD8DBQFFBOijO75ytyYoOa8RAmfDAKCZDO80sn6l43F/AIiBIRiO1LKKawCgkz0z
EsWoKDqW4bnanWvN/ror+Kw=
=7/4l
-----END PGP SIGNATURE-----

Re: How to add crh.po, and tt@iqtel locale's .po for gettext

Bruno Haible
In reply to this post by "Reşat Sabiq (Reshat)"
Reshat Sabiq wrote:
> it's a
> little strange that @Latn, which appears to be widely used (registered
> in fact, AFAIK), has to be changed to @latin for glibc. Looks like glibc
> requires its own "namespace" in such cases. Does this not cause problems
> when, for instance, a website or a document uses sr-Latn, but glibc has
> an equivalent of sr-latin? I guess even if it doesn't cause any problems
> for the user, it requires a different modifier to be used in glibc, in
> comparison to other possible uses. I tend to think it would be better if
> this wasn't the case.

glibc indeed has its own, defined, naming conventions for locales.
For web sites, the relevant documents are
  http://www.w3.org/TR/2005/WD-i18n-html-tech-lang-20050224/
  http://www.w3.org/TR/2006/WD-ltli-20060612/
Indeed while the basic language codes are the same, thanks to ISO-639,
some conversion is needed between the two naming conventions.

Yes, it would be simpler if that distinction didn't exist; but the
glibc convention was invented before some people started using "Latn",
"Hans", "Hant" etc. We would have to stick with it even if it was not
a good convention. Actually spelling out the script names is a good
convention.
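
The kind of conversion meant above could look roughly like this in
Python. It is a sketch under assumptions: the modifier table covers only
"latin" and "cyrillic" (mapped to the ISO 15924 codes Latn and Cyrl),
the function name is made up, and modifiers without a known script code
are rejected rather than guessed.

```python
import re

# glibc-style script modifiers and their ISO 15924 script codes.
SCRIPT_CODES = {"latin": "Latn", "cyrillic": "Cyrl"}

_NAME_RE = re.compile(r"^([a-z]{2,3})(?:_([A-Z]{2}))?(?:\.[^@]+)?(?:@(\w+))?$")


def glibc_to_language_tag(name):
    """Convert e.g. 'sr@latin' into a web-style tag such as 'sr-Latn'."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError("not a recognised locale name: %r" % name)
    language, territory, modifier = match.groups()
    parts = [language]
    if modifier is not None:
        if modifier not in SCRIPT_CODES:
            raise ValueError("no known script code for modifier %r" % modifier)
        # In language tags the script subtag comes before the region.
        parts.append(SCRIPT_CODES[modifier])
    if territory is not None:
        parts.append(territory)
    return "-".join(parts)


print(glibc_to_language_tag("sr@latin"))
print(glibc_to_language_tag("crh_UA"))
```

The codeset part (such as .UTF-8) is dropped, since language tags have
no place for it.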

Bruno

Re: How to add crh.po, and tt@iqtel locale's .po for gettext

Bruno Haible
In reply to this post by "Reşat Sabiq (Reshat)"
Reshat Sabiq wrote:
> There'd need to be over a million Crimean Tatars living in Turkey in
> this list. Total population is about 4-to-6 million, from what i read,
> but i'm not sure how it splits between Crimean and Idil-Ural Tatars.

Probably negligible compared to Ukraine and Uzbekistan:
http://www.ethnologue.com/show_country.asp?name=TRA
says "there are definitely some Crimean Tatar villages".

> > I'd therefore recommend to use:
> >   - for the glibc locales: the names crh_UA and crh_UZ.
> >     Use Latin alphabet for crh_UA and Cyrillic one for crh_UZ.
> >   - for the PO files (translations): Use crh_UA and crh_UZ as well, and
> >     DON'T create PO files for 'crh'.
>
> I think crh by default should be Latin, because it's official

You have shown an official decision by the Ukraine only. Regarding
Uzbekistan, you haven't any facts.

> and in
> all 3 countries where Crimean Tatars mostly now live, Turkey, Crimea,
> and Uzbekistan,

Forget about Turkey in this context. See above.

> Latin alphabet is pre-dominant. 'Ozbekiston also  
> switched to Latin

Where did you get this from? Usually the script will be the same as the
one of the major language of the country, which in the case of Uzbekistan
is Cyrillic; see http://www.ethnologue.com/show_language.asp?code=uzn

> I've never heard of UZ granting an
> official status to Crimean Tatar, and i'm sure the community sticks w/
> Crimea-based fellows in this regard.

You cannot make decisions that affect a community that you don't belong
to, on your own, based on rumours and "I'm sure" statements. So, please
stick to the community that you know about, that is crh_UA. When
people from the crh_UZ community appear and get interested in localization
for glibc, you can discuss the issue with them. Until then, please
care about crh_UA only.

Bruno

Re: How to add crh.po, and tt@iqtel locale's .po for gettext

"Reşat Sabiq (Reshat)"
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Bruno Haible wrote:
> Reshat Sabiq wrote:
>> There'd need to be over a million Crimean Tatars living in Turkey in
>> this list. Total population is about 4-to-6 million, from what i read,
>> but i'm not sure how it splits between Crimean and Idil-Ural Tatars.
>
> Probably negligible compared to Ukraine and Uzbekistan:
> http://www.ethnologue.com/show_country.asp?name=TRA
> says "there are definitely some Crimean Tatar villages".

Hi Bruno,

I don't want to get into a polemic, but the ethnologue references you
have mentioned are obviously incorrect: definitely incorrect about
Crimean Tatar and Uzbek being officially in Cyrillic, and at least
misleading or unclear in terms of Crimean Tatar population in Turkey.
wikipedia, for instance, mentions a number of 5 million of Crimean Tatar
(also called Crimean Turk) descendants in Turkey:
http://en.wikipedia.org/wiki/Crimean_Tatars
The reason ethnologue apparently is listing only 300K is because Crimean
Tatars living in Turkey are considered Turks. This makes sense, because
the people call themselves both ways, even those that live in Turkey:
Qırımtatar, Qırım Tatarı, Qırım Türkü. But they are still a part of the
rest of Crimean Tatars.
I don't have time to provide a convincing proof, which apparently needs
to be presented to correct such references, but here's one quote in Turkish:
"Anadolu'ya ve o dönemde tamamına yakını Türk hâkimiyeti altında bulunan
Balkanlar'a göçmeye başladılar, ilk göç dalgasında 300.000 kişi göç
etmiş: bu sayı 19. yüzyılda 1 milyonu, arkasından da 1 milyon 200 bini
bulmuştur."
It says first wave was 300K immigrants to Anatolia, and Balkans: this
number reached 1 million, and then 1.2 million in the 19th century.
http://www.turkhaber.org/tatar.html
Obviously, Crimean Turks (Tatars) living in Turkey, would not assimilate
(what are they going to assimilate into?). And they have not been
exterminated or exiled, as Stalin did. So their population should be at
least in the million range now.
As a compromise, i think the number of 2.1 million might make sense:
http://www.joshuaproject.net/peopctry.php

I found this reference on a Turkish site, where there was some
discontent that the Turkish population is being categorized into ethnic
subgroups, so i don't think this apparently American NGO can be accused
of being biased in favor of Crimean Tatars (Crimean Turks).


>>> I'd therefore recommend to use:
>>>   - for the glibc locales: the names crh_UA and crh_UZ.
>>>     Use Latin alphabet for crh_UA and Cyrillic one for crh_UZ.
>>>   - for the PO files (translations): Use crh_UA and crh_UZ as well, and
>>>     DON'T create PO files for 'crh'.
>> I think crh by default should be Latin, because it's official
>
> You have shown an official decision by the Ukraine only. Regarding
> Uzbekistan, you haven't any facts.
>
>> and in
>> all 3 countries where Crimean Tatars mostly now live, Turkey, Crimea,
>> and Uzbekistan,
>
> Forget about Turkey in this context. See above.
>
>> Latin alphabet is pre-dominant. 'Ozbekiston also  
>> switched to Latin
>
> Where do you got this from? Usually the script will be same as the
> one of the major language of the country, which in case of Uzbekistan
> is Cyrillic; see http://www.ethnologue.com/show_language.asp?code=uzn
Well, the whole alphabet issue of Turkic nations and especially Tatars
in the entire post-Soviet geography is close to my heart (if my chest
was opened, the Tatar alphabet issue might be imprinted on it ;) ). I
have followed this for many years, and know that in O'zbekiston (sorry
i misspelled it before) the language officially adopted the Latin alphabet
many years ago. Of course Cyrillic is still in use (more and more
marginally so), because it takes time to fully transition. But there is
no argument about what alphabet is official: it's Latin. Here's
O'zbekiston Hokimiyati (Government) site:
http://www.gov.uz/uz/
and a TV channel:
http://aqkopruk.4t.com/canli/Yoshlar_tv.html
and another one:
http://mtrk.uz/uz/online/tv/

> When
> people from the crh_UZ community appear and get interested in localization
> for glibc, you can discuss the issue with them. Until then, please
> care about crh_UA only.
I think given the above, there is a preponderance of evidence that crh
can be started in Latin by default. I definitely have nothing against
crh_UZ@cyrillic, or even crh_UZ (Cyrillic implied, even though i think
it would not be accurate), but i think it's quite clear that default
alphabet for Crimean Tatar is Latin.
Of course updating ethnologue and such is something that needs to be
looked into in time.

P.S. If someone told an average Tatar 5 centuries ago that there'd be
disputes about Tatar alphabet today, i wonder what would be his
reaction. ;) (:

Sincerely,
Reshat.

- --
My public GPG key (ID 0x262839AF) is at: http://keyserver.veridis.com:11371
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.1 (Cygwin)

iD8DBQFFBjT+O75ytyYoOa8RAqxQAJ9rbTN8O3+cT9MSCXRXFdm7R9LJLwCeLVOX
P9AXWbBIU9HwWhOkns0tjVA=
=oEdp
-----END PGP SIGNATURE-----