Iranian languages
The Iranian languages include Farsi, Dari, Tajik (three varieties of Modern Persian), Pashto, Balochi, Kurdish, Ossetian, Tat, and several dozen other languages.
The master plan involves generating independent finite-state transducers for each language, and then making individual dictionaries and transfer rules for every pair. The current status of these goals is listed below.
Status
The ultimate goal is to have multi-purpose transducers for a variety of Iranian languages. These can then be paired for X→Y translation by adding a constraint grammar (CG) for language X and transfer rules plus a bilingual dictionary for the pair X→Y. Development progress for each language's transducer and for each dictionary pair is listed below.
Transducers
Once a transducer reaches roughly 80% coverage on a range of medium-to-large corpora, we can call it "working". Above 90%, it can be considered "production".
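As a rough illustration of the thresholds above, the following Python sketch estimates naive coverage from analyser output in the Apertium stream format, where each token appears as `^surface/analysis$` and unknown tokens are marked with a leading `*`. The function names, the exact cut-offs (at least 80%, strictly above 90%), and the "development" label for everything below are assumptions made for this example, and the sample input is made up. The sketch also ignores escaped characters in the stream.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# (simplification: escaped characters such as \/ or \$ are not handled)
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def coverage(analysed_text: str) -> float:
    """Return the fraction of lexical units that received a known analysis."""
    units = LEXICAL_UNIT.findall(analysed_text)
    if not units:
        return 0.0
    # Unknown words are emitted as ^word/*word$, so the analysis starts with '*'.
    known = sum(1 for _surface, analyses in units if not analyses.startswith("*"))
    return known / len(units)


def state(cov: float) -> str:
    """Map a coverage fraction to a state label (thresholds are assumptions)."""
    if cov > 0.90:
        return "production"
    if cov >= 0.80:
        return "working"
    return "development"


if __name__ == "__main__":
    # Made-up sample: one analysed token, one unknown token.
    sample = "^ketab/ketab<n><sg>$ ^xyz/*xyz$"
    cov = coverage(sample)
    print(f"{cov:.0%} -> {state(cov)}")
```

In practice the input would come from running the analyser over a corpus; a figure from a single small text says little, which is why the criterion above asks for a range of medium-to-large corpora.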
name | Language | ISO 639-1 | ISO 639-3 | formalism | state | stems | coverage | location | primary authors |
---|---|---|---|---|---|---|---|---|---|
apertium-pes | Iranian Persian (Farsi) | - | pes | lttoolbox | development | | | apertium-pes (incubator) | |
apertium-tgk | Tajik | tg | tgk | lttoolbox | development | | | | |
apertium-glk | Gilaki | - | glk | lttoolbox | development | | | apertium-glk (incubator) | ronl, Fran |
apertium-oss | Ossetian | os | oss | lttoolbox | development | | | apertium-oss (incubator) | |
Table of Existing Pairs
Text in italics denotes language pairs in the incubator. Regular text denotes a developing language pair in nursery, while text in bold denotes a stable well-working language pair in trunk and text in bold and italics denotes a pair in staging. Bidix stems as counted with dixcounter are displayed below.
| pes | tgk | glk | oss | fas |
---|---|---|---|---|---|
pes | - | | pes-glk 8 | | |
tgk | | - | | | tg-fa 502 |
glk | pes-glk 8 | | - | | |
oss | | | | - | |
fas | | tg-fa 502 | | | - |
epo | | | | | eo-fa ? |
urd | | | | | ur-fa ? |
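The stem counts in the table come from dixcounter. As a rough approximation of what such a count involves, the Python sketch below counts `<e>` entries that contain a `<p>` (left/right pair) inside the `<section>` elements of a bilingual dictionary in the lttoolbox `.dix` XML format. This is not dixcounter itself and its counting rule may differ; the function name and the two sample entries are invented for illustration.

```python
import xml.etree.ElementTree as ET


def count_bidix_entries(dix_xml: str) -> int:
    """Count <e> entries holding a <p> pair inside any <section>.

    Approximation only: the real dixcounter tool may count differently.
    """
    root = ET.fromstring(dix_xml)
    count = 0
    for section in root.iter("section"):
        for entry in section.findall("e"):
            if entry.find("p") is not None:
                count += 1
    return count


# Hypothetical two-entry bidix, for illustration only.
SAMPLE = """<dictionary>
  <alphabet/>
  <sdefs><sdef n="n"/></sdefs>
  <section id="main" type="standard">
    <e><p><l>ketab<s n="n"/></l><r>kitob<s n="n"/></r></p></e>
    <e><p><l>ab<s n="n"/></l><r>ob<s n="n"/></r></p></e>
  </section>
</dictionary>"""

if __name__ == "__main__":
    print(count_bidix_entries(SAMPLE))  # 2
```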
Samples
Article 1 of the Universal Declaration of Human Rights:
All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience and should act towards one another in a spirit of brotherhood.
Language | Text |
---|---|
Ossetian | Адӕймӕгтӕ се 'ппӕт дӕр райгуырынц сӕрибарӕй ӕмӕ ӕмхуызонӕй сӕ барты. Уыдон ӕххӕст сты зонд ӕмӕ намысӕй, ӕмӕ кӕрӕдзийӕн хъуамӕ уой ӕфсымӕрты хуызӕн. |
Pashto, Northern | د بشر ټول افراد آزاد نړۍ ته راځی او د حيثيت او حقوقو له پلوه سره برابر دی. ټول د عقل او وجدان خاوندان دی او يو له بل سره د ورورۍ په روحيې سره بايد چلند کړی. |
Kurdish, Northern | Hemû mirov azad û di weqar û mafan de wekhev tên dinyayê. Ew xwedî hiş û şuûr in û divê li hember hev bi zihniyeteke bratiyê bilivin. |
Samoan | O tagata soifua uma ua saoloto lo latou fananau mai, ma e tutusa o latou tulaga aloaia faapea a latou aia tatau. Ua faaeeina atu i a latou le mafaufau lelei ma le loto fuatiaifo ma e tatau ona faatino le agaga faauso i le va o le tasi i le isi, |
Tongan (Tonga) | Ko e kotoa ‘o ha’a tangata ‘oku fanau’i mai ‘oku tau’ataina pea tatau ‘i he ngeia mo e ngaahi totonu. Na’e fakanaunau’i kinautolu ‘aki ‘a e ‘atamai mo e konisenisi pea ‘oku totonu ke nau feohi ‘i he laumalie ‘o e nofo fakatautehina. |
Tajiki | Тамоми одамон озод ва аз лиҳози шарафу ҳуқуқ ба ҳам баробар ба дунё меоянд. Онҳо соҳиби ақлу виҷдонанд ва бояд бо якдигар муносибати бародарона дошта бошанд. |
Vulnerability
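This table summarizes the vulnerability of various Iranian languages. Data is derived from the Atlas of the World's Languages in Danger (© UNESCO, http://www.unesco.org/culture/languages-atlas) and from Ethnologue. The Status columns give each source's own endangerment rating; "-" means the source lists no rating.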
Language | ISO 639-3 | Location | Speakers | Status (Ethnologue) | Status (UNESCO) |
---|---|---|---|---|---|
Avestan | ave | Iran | 0 | 10 (Extinct) | - |
Pahlavani | phv | Afghanistan | 0 | 9 (Dormant) | - |
Koroshi | ktl | Iran | 180 | 8b (Nearly extinct) | 4 (Critically endangered) |
Kumzari | zum | Oman | 2,300 | 8a (Moribund) | 3 (Severely endangered) |
Parachi | prc | Afghanistan | 3,500 | 7 (Shifting) | 2 (Definitely endangered) |
Bashkardi | bsg | Iran | 7,030 | 7 (Shifting) | 2 (Definitely endangered) |
Gazi | gzi | Iran | 7,030 | 7 (Shifting) | 2 (Definitely endangered) |
Sivandi | siy | Iran | 7,030 | 7 (Shifting) | - |
Fars, Northwestern | faz | Iran | 7,500 | 7 (Shifting) | - |
Dari, Zoroastrian | gbz | Iran | 8,000 | 7 (Shifting) | 2 (Definitely endangered) |
Shabak | sdb | Iraq | 15,000 | 7 (Shifting) | - |
Karingani | kgn | Iran | 17,600 | 7 (Shifting) | - |
Vafsi | vaf | Iran | 18,000 | 7 (Shifting) | 2 (Definitely endangered) |
Bajelani | bjm | Iraq | 20,000 | 7 (Shifting) | - |
Ashtiani | atn | Iran | 21,100 | 7 (Shifting) | 2 (Definitely endangered) |
Khunsari | kfm | Iran | 21,100 | 7 (Shifting) | 2 (Definitely endangered) |
Tat, Muslim | ttt | Azerbaijan | 28,010 | 7 (Shifting) | 3 (Severely endangered) |
Harzani | hrz | Iran | 28,100 | 7 (Shifting) | - |
Dzhidi | jpr | Israel & Iran | 60,000 | 7 (Shifting) | 2 (Definitely endangered) |
Fars, Southwestern | fay | Iran | 100,000 | 7 (Shifting) | - |
Bukharic | bhh | Israel & Uzbekistan | 110,000 | 7 (Shifting) | 2 (Definitely endangered) |
Gurani | hac | Iraq | 200,000 | 7 (Shifting) | - |
Takestani | tks | Iran | 220,000 | 7 (Shifting) | - |
Mazanderani | mzn | Iran | 3,270,000 | 7 (Shifting) | - |
Alviri-Vidari | avd | Iran | ? | 7 (Shifting) | - |
Eshtehardi | esh | Iran | ? | 7 (Shifting) | - |
Gozarkhani | goz | Iran | ? | 7 (Shifting) | - |
Kabatei | xkp | Iran | ? | 7 (Shifting) | - |
Kajali | xkj | Iran | ? | 7 (Shifting) | - |
Kho’ini | xkc | Iran | ? | 7 (Shifting) | - |
Koresh-e Rostam | okh | Iran | ? | 7 (Shifting) | - |
Maraghei | vmh | Iran | ? | 7 (Shifting) | - |
Razajerdi | rat | Iran | ? | 7 (Shifting) | - |
Rudbari | rdb | Iran | ? | 7 (Shifting) | - |
Shahrudi | shm | Iran | ? | 7 (Shifting) | - |
Taromi, Upper | tov | Iran | ? | 7 (Shifting) | - |
Sarli | sdf | Iraq | Fewer than 20,000 | 7 (Shifting) | - |
Ishkashimi | isk | Afghanistan | 3,000 | 6b (Threatened) | - |
Yidgha | ydg | Pakistan | 6,150 | 6b (Threatened) | 2 (Definitely endangered) |
Yazgulyam | yah | Tajikistan | 9,000 | 6b (Threatened) | 3 (Severely endangered) |
Yagnobi | yai | Tajikistan | 12,000 | 6b (Threatened) | 2 (Definitely endangered) |
Sarikoli | srh | China | 16,000 | 6b (Threatened) | 2 (Definitely endangered) |
Lasgerdi | lsa | Iran | 1,000 | 6a (Vigorous) | - |
Sanglechi | sgy | Afghanistan | 2,200 | 6a (Vigorous) | - |
Munji | mnj | Afghanistan | 5,300 | 6a (Vigorous) | 3 (Severely endangered) |
Ormuri | oru | Pakistan, Afghanistan | 6,050 | 6a (Vigorous) | 2 (Definitely endangered) |
Natanzi | ntz | Iran | 7,030 | 6a (Vigorous) | 3 (Severely endangered) |
Nayini | nyq | Iran | 7,030 | 6a (Vigorous) | 3 (Severely endangered) |
Soi | soj | Iran | 7,030 | 6a (Vigorous) | - |
Sorkhei | sqo | Iran | 10,000 | 6a (Vigorous) | - |
Dehwari | deh | Pakistan | 13,000 | 6a (Vigorous) | - |
Sangisari | sgr | Iran | 36,000 | 6a (Vigorous) | - |
Wakhi | wbl | China, Pakistan, Tajikistan, Afghanistan | 47,100 | 6a (Vigorous) | 2 (Definitely endangered) |
Semnani | smy | Iran | 60,000 | 6a (Vigorous) | - |
Shughni | sgh | Tajikistan | 80,000 | 6a (Vigorous) | 3 (Severely endangered) |
Lari | lrl | Iran | 80,000 | 6a (Vigorous) | - |
Waneci | wne | Pakistan | 95,000 | 6a (Vigorous) | - |
Parsi | prp | India | 326,000 | 6a (Vigorous) | - |
Parsi-Dari | prd | Iran | 350,000 | 6a (Vigorous) | - |
Aimaq | aiq | Afghanistan | 650,000 | 6a (Vigorous) | - |
Luri, Southern | luz | Iran | 875,000 | 6a (Vigorous) | - |
Laki | lki | Iran | 1,000,000 | 6a (Vigorous) | - |
Bakhtiâri | bqi | Iran | 1,000,000 | 6a (Vigorous) | - |
Luri, Northern | lrc | Iran | 1,500,000 | 6a (Vigorous) | - |
Kurdish, Southern | sdh | Iran | 3,000,000 | 6a (Vigorous) | - |
Pashto, Central | pst | Pakistan | 7,920,000 | 6a (Vigorous) | - |
Shahmirzadi | srz | Iran | ? | 6a (Vigorous) | - |
Dezfuli | def | Iran | ? | 6a (Vigorous) | - |
Khalaj | kjf | Azerbaijan | 42,100 | 5 (Developing) | - |
Ossetic | oss | Georgia, Russian Federation | 577,450 | 5 (Developing) | 1 (Vulnerable) |
Hazaragi | haz | Afghanistan | 2,210,000 | 5 (Developing) | - |
Judeo-Tat | jdt | Azerbaijan, Russian Federation | 2,010 | 4 (Educational) | 2 (Definitely endangered) |
Zazaki, Northern | kiu | Turkey | 140,000 | 4 (Educational) | - |
Talysh | tly | Azerbaijan, Iran | 915,400 | 4 (Educational) | 1 (Vulnerable) |
Zazaki, Southern | diq | Turkey | 1,500,000 | 4 (Educational) | - |
Balochi, Western | bgn | Pakistan | 1,799,840 | 4 (Educational) | - |
Balochi, Eastern | bgp | Pakistan | 1,800,800 | 4 (Educational) | - |
Gilaki | glk | Iran | 3,270,000 | 4 (Educational) | - |
Balochi, Southern | bcc | Pakistan | 3,405,000 | 4 (Educational) | - |
Pashto, Northern | pbu | Pakistan | 11,430,000 | 4 (Educational) | - |
Kurdish, Northern | kmr | Turkey | 20,210,872 | 3 (Wider communication) | - |
Kurdish, Central | ckb | Iraq | 6,750,000 | 2 (Provincial) | - |
Tajiki | tgk | Tajikistan | 4,479,650 | 1 (National) | - |
Pashto, Southern | pbt | Afghanistan | 7,590,100 | 1 (National) | - |
Dari | prs | Afghanistan | 9,600,000 | 1 (National) | - |
Persian, Iranian | pes | Iran | 47,045,100 | 1 (National) | - |
Language Codes
Note that fas (/per) and fa are macrolanguage codes for Persian. In ISO 639-3 the macrolanguage covers Farsi (Iranian Persian, pes) and Dari (Afghan Persian, prs); Tajik, although also a variety of Modern Persian, has its own separate code, tgk.
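This page mixes two-letter and three-letter codes (for example tg-fa next to pes-glk). A small lookup like the Python sketch below can normalise pair names to three-letter codes before comparing them. The mapping covers only the codes that appear on this page, and the function names are invented for this example.

```python
# Two-letter (ISO 639-1) codes used on this page and their three-letter forms.
ISO_639_1_TO_3 = {
    "fa": "fas",  # Persian (macrolanguage)
    "tg": "tgk",  # Tajik
    "os": "oss",  # Ossetian
    "eo": "epo",  # Esperanto
    "ur": "urd",  # Urdu
}

# Alternative three-letter codes: "per" is the ISO 639-2/B code for Persian.
ALIASES = {"per": "fas"}


def normalise_code(code: str) -> str:
    """Return the three-letter form of a language code; unknown codes pass through."""
    code = code.lower()
    code = ISO_639_1_TO_3.get(code, code)
    return ALIASES.get(code, code)


def normalise_pair(pair: str) -> str:
    """Normalise both sides of a pair name such as 'tg-fa'."""
    return "-".join(normalise_code(part) for part in pair.split("-"))


if __name__ == "__main__":
    for name in ("tg-fa", "pes-glk", "eo-fa", "ur-fa"):
        print(name, "->", normalise_pair(name))
```

Note that normalising fa to fas still leaves open which individual variety (pes or prs) a given dictionary actually targets; that has to be checked against the data itself.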