Difference between revisions of "Iranian languages"
Firespeaker (talk | contribs) |
Latest revision as of 11:36, 30 July 2018
The Iranian languages include Iranian Persian, Dari, Tajik (three varieties of Modern Persian), Pashto, Balochi, Kurdish, Ossetian, Tat, and several dozen other languages.
The master plan involves generating independent finite-state transducers for each language, and then making individual dictionaries and transfer rules for every pair. The current status of these goals is listed below.
Status
The ultimate goal is to have multi-purpose transducers for a variety of Iranian languages. These can then be paired for X→Y translation by adding a constraint grammar (CG) for language X and transfer rules and a dictionary for the pair X→Y. Development progress for each language's transducers and dictionary pairs is listed below.
Transducers
Once a transducer reaches roughly 80% coverage across a range of medium-to-large corpora, we can say it is "working". Above 90%, it can be considered "production".
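The thresholds above can be sketched in a few lines of Python. This is only an illustration: it assumes the naive definition of coverage (the share of corpus tokens that receive at least one analysis), the function names are hypothetical, the token counts are made up, and the label for anything under 80% is not taken from this page.

```python
def coverage(analysed_tokens: int, total_tokens: int) -> float:
    """Naive coverage: share of corpus tokens that received at least one analysis."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return analysed_tokens / total_tokens


def classify(cov: float) -> str:
    """Map a coverage ratio to the informal labels used on this page."""
    if cov > 0.90:
        return "production"
    if cov >= 0.80:
        return "working"
    return "not yet working"  # hypothetical label, not from the page


if __name__ == "__main__":
    # Made-up token counts, purely for illustration.
    for known, total in [(17, 100), (8_350, 10_000), (9_420, 10_000)]:
        c = coverage(known, total)
        print(f"{c:.1%} -> {classify(c)}")
```

In practice the figure should be averaged over several corpora, as the text above says, rather than taken from a single one.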
name | Language | ISO 639-1 | ISO 639-3 | formalism | state | stems | coverage | location | primary authors |
---|---|---|---|---|---|---|---|---|---|
apertium-pes | Iranian Persian (Farsi) | — | pes | lttoolbox | development | | | apertium-pes (incubator) | |
apertium-tgk | Tajik | tg | tgk | lttoolbox | development | | | | |
apertium-glk | Gilaki | — | glk | lttoolbox | development | | | apertium-glk (incubator) | ronl, Fran |
apertium-oss | Ossetian | os | oss | lttoolbox | development | | | apertium-oss (incubator) | |
name | Language | ISO 639-1 | ISO 639-3 | formalism | state | stems | paradigms | coverage | location | primary authors |
---|---|---|---|---|---|---|---|---|---|---|
apertium-kmr | Kurdish (Kurmanji) | ku | kmr | lttoolbox | development | 17,771 | 157 | [[Apertium-kmr#Current_State\|~Apertium-kmr/stats/average%]] | apertium-kmr (languages) | Fran, Memduh |
apertium-pes | Iranian Persian (Farsi) | fa | pes | lttoolbox | development | 13,167 | 113 | [[Apertium-pes#Current_State\|~Apertium-pes/stats/average%]] | apertium-pes (languages) | Fran, ... |
apertium-tgk | Tajik | tg | tgk | lttoolbox | development | 2,784 | 79 | [[Apertium-tgk#Current_State\|~Apertium-tgk/stats/average%]] | apertium-tgk (languages) | Fran, ... |
apertium-oss | Ossetian | os | oss | lttoolbox | prototype | 111 | | ~17% | apertium-oss (languages) | Fran, ... |
apertium-glk | Gilaki | — | glk | lttoolbox | prototype | 4 | 28 | [[Apertium-oss#Current_State\|~Apertium-glk/stats/average%]] | apertium-glk (languages) | Fran, ronl |
apertium-ckb | Central Kurdish (Sorani) | — | ckb | lttoolbox | prototype | 2 | | [[Apertium-kmr#Current_State\|~Apertium-ckb/stats/average%]] | apertium-ckb (languages) | Fran, Memduh |
Table of Existing Pairs
Text in italics denotes a language pair in the incubator. Regular text denotes a developing pair in nursery, bold text denotes a stable, well-working pair in trunk, and bold italic text denotes a pair in staging. The number after each pair is its bidix stem count as reported by dixcounter; "?" means the count is unknown.
 | pes | tgk | glk | oss | fas |
---|---|---|---|---|---|
pes | - | tgk-pes 504 | pes-glk 8 | | |
tgk | tgk-pes 504 | - | | | |
glk | pes-glk 8 | | - | | |
oss | | | | - | |
fas | | | | | - |
eng | | tg-en ? | | | |
epo | | | | | eo-fa ? |
urd | | | | | ur-fa ? |
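To give a rough idea of what a bidix stem count measures, the sketch below counts `<e>` entries inside the `<section>` elements of a bilingual dictionary in the lttoolbox XML format. It is an approximation, not a reimplementation of dixcounter, whose exact counting rules may differ. The function name and the two sample entries are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Tiny made-up bidix fragment (Tajik -> Persian), for illustration only.
SAMPLE_BIDIX = """<dictionary>
  <alphabet/>
  <sdefs><sdef n="n"/></sdefs>
  <section id="main" type="standard">
    <e><p><l>китоб<s n="n"/></l><r>کتاب<s n="n"/></r></p></e>
    <e><p><l>об<s n="n"/></l><r>آب<s n="n"/></r></p></e>
  </section>
</dictionary>"""


def count_bidix_entries(dix_xml: str) -> int:
    """Count <e> elements that are direct children of any <section>."""
    root = ET.fromstring(dix_xml)
    return sum(len(section.findall("e")) for section in root.iter("section"))


if __name__ == "__main__":
    print(count_bidix_entries(SAMPLE_BIDIX))
```

For a real dictionary file, read its contents and pass the string to the function; entries outside `<section>` elements, such as paradigm definitions, are deliberately ignored here.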
Language Codes
Note that fas (/per) and fa are macrocodes for Persian, which includes Farsi (Iranian Persian, pes), Dari (Afghan Persian, prs), and Tajik (tgk).
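The note above can be expressed as a small lookup. This is a minimal sketch that encodes the grouping exactly as stated on this page. The names are hypothetical, and falling back to `pes` when only a macrocode is given is an assumption made for illustration, not a project convention.

```python
# Codes this page describes as macrocodes for Persian.
PERSIAN_MACROCODES = {"fas", "per", "fa"}

# Specific varieties listed on this page, keyed by ISO 639-3 code.
PERSIAN_VARIETIES = {
    "pes": "Iranian Persian (Farsi)",
    "prs": "Dari (Afghan Persian)",
    "tgk": "Tajik",
}


def resolve_persian_code(code: str, default: str = "pes") -> str:
    """Return a specific variety code for a specific or macro code."""
    if code in PERSIAN_VARIETIES:
        return code
    if code in PERSIAN_MACROCODES:
        # A macrocode does not say which variety is meant; the default is an assumption.
        return default
    raise ValueError(f"not a Persian code: {code!r}")


if __name__ == "__main__":
    for c in ("fa", "per", "tgk"):
        resolved = resolve_persian_code(c)
        print(c, "->", resolved, "=", PERSIAN_VARIETIES[resolved])
```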
Samples
Article 1 of the Universal Declaration of Human Rights:
All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience and should act towards one another in a spirit of brotherhood.
Language | Text |
---|---|
Ossetian | Адӕймӕгтӕ се ’ппӕт дӕр райгуырынц сӕрибарӕй ӕмӕ ӕмхуызонӕй сӕ барты. Уыдон ӕххӕст сты зонд ӕмӕ намысӕй, ӕмӕ кӕрӕдзийӕн хъуамӕ уой ӕфсымӕрты хуызӕн. |
Pashto, Northern | د بشر ټول افراد آزاد نړۍ ته راځی او د حيثيت او حقوقو له پلوه سره برابر دی. ټول د عقل او وجدان خاوندان دی او يو له بل سره د ورورۍ په روحيې سره بايد چلند کړی. |
Kurdish, Northern | Hemû mirov azad û di weqar û mafan de wekhev tên dinyayê. Ew xwedî hiş û şuûr in û divê li hember hev bi zihniyeteke bratiyê bilivin. |
Tajik | Тамоми одамон озод ва аз лиҳози шарафу ҳуқуқ ба ҳам баробар ба дунё меоянд. Онҳо соҳиби ақлу виҷдонанд ва бояд бо якдигар муносибати бародарона дошта бошанд. |
Iranian Persian | تمام افراد بشر آزاد بدنیا میایند و از لحاظ حیثیت و حقوق با هم برابرند. همه دارای عقل و وجدان میباشند و باید نسبت بیکدیگر با روح برادری رفتار کنند. |
Dari | تمام افراد بشر آزاد به دنیا میآیند و از لحاظ حیثیت و حقوق با هم برابرند. همه دارای عقل و وجدان هستند و باید نسبت به یکدیگر با روح برادری رفتار کنند. |
Vulnerability
This table summarizes the vulnerability of various Iranian languages. Data is derived from the ‘Atlas of the World’s Languages in Danger, © UNESCO, http://www.unesco.org/culture/languages-atlas’ and Ethnologue. The Ethnologue status values are EGIDS levels.
Language | ISO 639-3 | Location | Speakers | Ethnologue status | UNESCO status |
---|---|---|---|---|---|
Avestan | ave | Iran | 0 | 10 (Extinct) | - |
Pahlavani | phv | Afghanistan | 0 | 9 (Dormant) | - |
Koroshi | ktl | Iran | 180 | 8b (Nearly extinct) | 4 (Critically endangered) |
Kumzari | zum | Oman | 2,300 | 8a (Moribund) | 3 (Severely endangered) |
Parachi | prc | Afghanistan | 3,500 | 7 (Shifting) | 2 (Definitely endangered) |
Bashkardi | bsg | Iran | 7,030 | 7 (Shifting) | 2 (Definitely endangered) |
Gazi | gzi | Iran | 7,030 | 7 (Shifting) | 2 (Definitely endangered) |
Sivandi | siy | Iran | 7,030 | 7 (Shifting) | - |
Fars, Northwestern | faz | Iran | 7,500 | 7 (Shifting) | - |
Dari, Zoroastrian | gbz | Iran | 8,000 | 7 (Shifting) | 2 (Definitely endangered) |
Shabak | sdb | Iraq | 15,000 | 7 (Shifting) | - |
Karingani | kgn | Iran | 17,600 | 7 (Shifting) | - |
Vafsi | vaf | Iran | 18,000 | 7 (Shifting) | 2 (Definitely endangered) |
Bajelani | bjm | Iraq | 20,000 | 7 (Shifting) | - |
Ashtiani | atn | Iran | 21,100 | 7 (Shifting) | 2 (Definitely endangered) |
Khunsari | kfm | Iran | 21,100 | 7 (Shifting) | 2 (Definitely endangered) |
Tat, Muslim | ttt | Azerbaijan | 28,010 | 7 (Shifting) | 3 (Severely endangered) |
Harzani | hrz | Iran | 28,100 | 7 (Shifting) | - |
Dzhidi | jpr | Israel & Iran | 60,000 | 7 (Shifting) | 2 (Definitely endangered) |
Fars, Southwestern | fay | Iran | 100,000 | 7 (Shifting) | - |
Bukharic | bhh | Israel & Uzbekistan | 110,000 | 7 (Shifting) | 2 (Definitely endangered) |
Gurani | hac | Iraq | 200,000 | 7 (Shifting) | - |
Takestani | tks | Iran | 220,000 | 7 (Shifting) | - |
Mazanderani | mzn | Iran | 3,270,000 | 7 (Shifting) | - |
Alviri-Vidari | avd | Iran | ? | 7 (Shifting) | - |
Eshtehardi | esh | Iran | ? | 7 (Shifting) | - |
Gozarkhani | goz | Iran | ? | 7 (Shifting) | - |
Kabatei | xkp | Iran | ? | 7 (Shifting) | - |
Kajali | xkj | Iran | ? | 7 (Shifting) | - |
Kho’ini | xkc | Iran | ? | 7 (Shifting) | - |
Koresh-e Rostam | okh | Iran | ? | 7 (Shifting) | - |
Maraghei | vmh | Iran | ? | 7 (Shifting) | - |
Razajerdi | rat | Iran | ? | 7 (Shifting) | - |
Rudbari | rdb | Iran | ? | 7 (Shifting) | - |
Shahrudi | shm | Iran | ? | 7 (Shifting) | - |
Taromi, Upper | tov | Iran | ? | 7 (Shifting) | - |
Sarli | sdf | Iraq | Fewer than 20,000 | 7 (Shifting) | - |
Ishkashimi | isk | Afghanistan | 3,000 | 6b (Threatened) | - |
Yidgha | ydg | Pakistan | 6,150 | 6b (Threatened) | 2 (Definitely endangered) |
Yazgulyam | yah | Tajikistan | 9,000 | 6b (Threatened) | 3 (Severely endangered) |
Yagnobi | yai | Tajikistan | 12,000 | 6b (Threatened) | 2 (Definitely endangered) |
Sarikoli | srh | China | 16,000 | 6b (Threatened) | 2 (Definitely endangered) |
Lasgerdi | lsa | Iran | 1,000 | 6a (Vigorous) | - |
Sanglechi | sgy | Afghanistan | 2,200 | 6a (Vigorous) | - |
Munji | mnj | Afghanistan | 5,300 | 6a (Vigorous) | 3 (Severely endangered) |
Ormuri | oru | Pakistan, Afghanistan | 6,050 | 6a (Vigorous) | 2 (Definitely endangered) |
Natanzi | ntz | Iran | 7,030 | 6a (Vigorous) | 3 (Severely endangered) |
Nayini | nyq | Iran | 7,030 | 6a (Vigorous) | 3 (Severely endangered) |
Soi | soj | Iran | 7,030 | 6a (Vigorous) | - |
Sorkhei | sqo | Iran | 10,000 | 6a (Vigorous) | - |
Dehwari | deh | Pakistan | 13,000 | 6a (Vigorous) | - |
Sangisari | sgr | Iran | 36,000 | 6a (Vigorous) | - |
Wakhi | wbl | China, Pakistan, Tajikistan, Afghanistan | 47,100 | 6a (Vigorous) | 2 (Definitely endangered) |
Semnani | smy | Iran | 60,000 | 6a (Vigorous) | - |
Shughni | sgh | Tajikistan | 80,000 | 6a (Vigorous) | 3 (Severely endangered) |
Lari | lrl | Iran | 80,000 | 6a (Vigorous) | - |
Waneci | wne | Pakistan | 95,000 | 6a (Vigorous) | - |
Parsi | prp | India | 326,000 | 6a (Vigorous) | - |
Parsi-Dari | prd | Iran | 350,000 | 6a (Vigorous) | - |
Aimaq | aiq | Afghanistan | 650,000 | 6a (Vigorous) | - |
Luri, Southern | luz | Iran | 875,000 | 6a (Vigorous) | - |
Laki | lki | Iran | 1,000,000 | 6a (Vigorous) | - |
Bakhtiâri | bqi | Iran | 1,000,000 | 6a (Vigorous) | - |
Luri, Northern | lrc | Iran | 1,500,000 | 6a (Vigorous) | - |
Kurdish, Southern | sdh | Iran | 3,000,000 | 6a (Vigorous) | - |
Pashto, Central | pst | Pakistan | 7,920,000 | 6a (Vigorous) | - |
Shahmirzadi | srz | Iran | ? | 6a (Vigorous) | - |
Dezfuli | def | Iran | ? | 6a (Vigorous) | - |
Khalaj | kjf | Azerbaijan | 42,100 | 5 (Developing) | - |
Ossetic | oss | Georgia, Russian Federation | 577,450 | 5 (Developing) | 1 (Vulnerable) |
Hazaragi | haz | Afghanistan | 2,210,000 | 5 (Developing) | - |
Judeo-Tat | jdt | Azerbaijan, Russian Federation | 2,010 | 4 (Educational) | 2 (Definitely endangered) |
Zazaki, Northern | kiu | Turkey | 140,000 | 4 (Educational) | - |
Talysh | tly | Azerbaijan, Iran | 915,400 | 4 (Educational) | 1 (Vulnerable) |
Zazaki, Southern | diq | Turkey | 1,500,000 | 4 (Educational) | - |
Balochi, Western | bgn | Pakistan | 1,799,840 | 4 (Educational) | - |
Balochi, Eastern | bgp | Pakistan | 1,800,800 | 4 (Educational) | - |
Gilaki | glk | Iran | 3,270,000 | 4 (Educational) | - |
Balochi, Southern | bcc | Pakistan | 3,405,000 | 4 (Educational) | - |
Pashto, Northern | pbu | Pakistan | 11,430,000 | 4 (Educational) | - |
Kurdish, Northern | kmr | Turkey | 20,210,872 | 3 (Wider communication) | - |
Kurdish, Central | ckb | Iraq | 6,750,000 | 2 (Provincial) | - |
Tajiki | tgk | Tajikistan | 4,479,650 | 1 (National) | - |
Pashto, Southern | pbt | Afghanistan | 7,590,100 | 1 (National) | - |
Dari | prs | Afghanistan | 9,600,000 | 1 (National) | - |
Persian, Iranian | pes | Iran | 47,045,100 | 1 (National) | - |