Difference between revisions of "Iranian languages"

From Apertium
{{TOCD}}
The '''Iranian languages''' include [[Farsi|Iranian Persian]], [[Dari]], [[Tajik]] (three varieties of Modern Persian), [[Pashto]], Balochi, Kurdish, [[Ossetian]], Tat, and several dozen other languages.

The master plan involves generating independent finite-state transducers for each language, and then making individual dictionaries and transfer rules for every pair. The current status of these goals is listed below.

==Status==
The ultimate goal is to have multi-purpose transducers for a variety of Iranian languages. These can then be paired for X→Y translation by adding a [[Constraint Grammar|CG]] for language X and transfer rules and a dictionary for the pair X→Y. Development progress for each language's transducers and dictionary pairs is listed below.

===Transducers===
Once a transducer reaches ~80% coverage on a range of medium-to-large corpora, we can say it is "working". Above 90%, it can be considered "production".
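As a rough illustration of the thresholds above, the following sketch computes naive coverage from analyser output in the Apertium stream format, where each token appears as <code>^surface/analysis$</code> and an unknown token has an analysis starting with <code>*</code>. The function names, the sample string and the label "below working" are assumptions made for this example; this is not the script used to produce the figures in the tables.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def coverage(analysed):
    """Return the share of tokens with a known analysis, from 0.0 to 1.0."""
    units = LEXICAL_UNIT.findall(analysed)
    if not units:
        return 0.0
    # Unknown words are marked with a leading '*' in the analysis field.
    known = sum(1 for _surface, analyses in units if not analyses.startswith("*"))
    return known / len(units)


def state(cov):
    """Map a coverage ratio onto the labels used in this section."""
    if cov > 0.90:
        return "production"
    if cov >= 0.80:
        return "working"
    return "below working"  # hypothetical label, not used on this page


# Hypothetical analyser output: three known tokens and two unknown ones.
sample = "^ketab/ketab<n><sg>$ ^foo/*foo$ ^va/va<cnjcoo>$ ^xyz/*xyz$ ^dar/dar<pr>$"
print(coverage(sample), state(coverage(sample)))
```

Real coverage figures depend on the corpus and on how tokens are counted, so the numbers in the tables may have been obtained differently.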

{| class="wikitable sortable"
|-
|align="right"|
|align="center"|
|| [[apertium-pes]] ([[incubator]])
||
|-
|| <code>[[apertium-tgk]]</code>
|| Tajik
|| <code>tg</code>
|align="right"|
|align="center"|
|| [[apertium-glk]]&nbsp;([[incubator]])
|| ronl, [[User:Francis Tyers|Fran]]
|-
|| <code>[[apertium-oss]]</code>
|| Ossetian
|| <code>os</code>
|| <code>oss</code>
|| [[lttoolbox]]
|| development
|align="right"|
|align="center"|
|| [[apertium-oss]]&nbsp;([[incubator]])
||
||
|}


{| class="wikitable sortable"
|-
!rowspan=2| name
!rowspan=2| Language
!colspan=2 class="unsortable"| ISO 639
!rowspan=2| formalism
!rowspan=2| state
!rowspan=2| stems
!rowspan=2| paradigms
!rowspan=2| coverage
!rowspan=2| location
!rowspan=2 class="unsortable"| primary authors
|-class="sortbottom"
! -2
! -3
|-
|| <code>[[apertium-kmr]]</code>
|| [[Kurdish]] ([[Kurmanji]])
|| <code>ku</code>
|| <code>kmr</code>
|| [[lttoolbox]]
|| development
|align="right"| {{#lst:Apertium-kmr/stats|stems}}
|align="right"| {{#lst:Apertium-kmr/stats|paradigms}}
|align="center"| [[Apertium-kmr#Current_State|~{{:Apertium-kmr/stats/average}}%]]
|| [[apertium-kmr]] ([[languages]])
|| [[User:Francis Tyers|Fran]], [[User:Memduh|Memduh]]
|-
|| <code>[[apertium-pes]]</code>
|| [[Iranian Persian]] (Farsi)
|| <code>fa</code>
|| <code>pes</code>
|| [[lttoolbox]]
|| development
|align="right"| {{#lst:Apertium-pes/stats|stems}}
|align="right"| {{#lst:Apertium-pes/stats|paradigms}}
|align="center"| [[Apertium-pes#Current_State|~{{:Apertium-pes/stats/average}}%]]
|| [[apertium-pes]] ([[languages]])
|| [[User:Francis Tyers|Fran]], ...
|-
|| <code>[[apertium-tgk]]</code>
|| [[Tajik]]
|| <code>tg</code>
|| <code>tgk</code>
|| [[lttoolbox]]
|| development
|align="right"| {{#lst:Apertium-tgk/stats|stems}}
|align="right"| {{#lst:Apertium-tgk/stats|paradigms}}
|align="center"| [[Apertium-tgk#Current_State|~{{:Apertium-tgk/stats/average}}%]]
|| [[apertium-tgk]] ([[languages]])
|| [[User:Francis Tyers|Fran]], ...
|-
|| <code>[[apertium-oss]]</code>
|| [[Ossetian]]
|| <code>os</code>
|| <code>oss</code>
|| [[lttoolbox]]
|| prototype
|align="right"| {{#lst:Apertium-oss/stats|stems}}
|align="right"| {{#lst:Apertium-oss/stats|paradigms}}
|align="center"| [[Apertium-oss#Current_State|~{{:Apertium-oss/stats/average}}%]]
|| [[apertium-oss]] ([[languages]])
|| [[User:Francis Tyers|Fran]], ...
|-
|| <code>[[apertium-glk]]</code>
|| [[Gilaki]]
|| <code></code>
|| <code>glk</code>
|| [[lttoolbox]]
|| prototype
|align="right"| {{#lst:Apertium-glk/stats|stems}}
|align="right"| {{#lst:Apertium-glk/stats|paradigms}}
|align="center"| [[Apertium-glk#Current_State|~{{:Apertium-glk/stats/average}}%]]
|| [[apertium-glk]] ([[languages]])
|| [[User:Francis Tyers|Fran]], ronl
|-
|| <code>[[apertium-ckb]]</code>
|| [[Central Kurdish]] ([[Sorani]])
|| <code></code>
|| <code>ckb</code>
|| [[lttoolbox]]
|| prototype
|align="right"| {{#lst:Apertium-ckb/stats|stems}}
|align="right"| {{#lst:Apertium-ckb/stats|paradigms}}
|align="center"| [[Apertium-ckb#Current_State|~{{:Apertium-ckb/stats/average}}%]]
|| [[apertium-ckb]] ([[languages]])
|| [[User:Francis Tyers|Fran]], [[User:Memduh|Memduh]]
|}

=== Table of Existing Pairs ===
Text in ''italics'' denotes a language pair in the incubator, regular text a developing pair in nursery, '''bold''' a stable, well-working pair in trunk, and '''''bold and italics''''' a pair in staging. Bidix stem counts, as counted with [[dixcounter]], are shown below each pair name.

{| style="text-align: center;" class="wikitable dixtable"
|- style="background: #ececec"
! !! pes !! tgk !! glk !! oss !! fas
|-
| '''pes''' || - || [[Apertium-tgk-pes|tgk-pes]]<br>{{#lst:Apertium-tgk-pes/stats|tgk-pes_stems}} || ''[[Apertium-pes-glk|pes-glk]]''<br>{{#lst:Apertium-pes-glk/stats|pes-glk_stems}} || ||
|-
| '''tgk''' || [[Apertium-tgk-pes|tgk-pes]]<br>{{#lst:Apertium-tgk-pes/stats|tgk-pes_stems}} || - || || ||
|-
| '''glk''' || ''[[Apertium-pes-glk|pes-glk]]''<br>{{#lst:Apertium-pes-glk/stats|pes-glk_stems}} || || - || ||
|-
| '''oss''' || || || || - ||
|-
| '''fas''' || || || || || -
|-
| || || || || ||
|-
| '''eng''' || || ''[[Apertium-tg-en|tg-en]]''<br>{{#lst:Apertium-tg-en/stats|tg-en_stems}} || || ||
|-
| '''epo''' || || || || || ''[[Apertium-eo-fa|eo-fa]]''<br>{{#lst:Apertium-eo-fa/stats|eo-fa_stems}}
|-
| '''urd''' || || || || || ''[[Apertium-ur-fa|ur-fa]]''<br>{{#lst:Apertium-ur-fa/stats|ur-fa_stems}}
|}


== Language Codes ==
Note that <code>fas</code>(/<code>per</code>) and <code>fa</code> are macrocodes for Persian, which includes Farsi (Iranian Persian - <code>pes</code>), Dari (Afghan Persian - <code>prs</code>), and Tajik (<code>tgk</code>).
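The relationship between the macrocodes and the individual codes can be written down as a small lookup, for instance when deciding which language a resource tagged <code>fa</code> might belong to. This is only a sketch: the codes come from the note above, while the names <code>PERSIAN_MACRO_CODES</code>, <code>PERSIAN_VARIETIES</code> and <code>expand</code> are made up for the example.

```python
# Codes taken from the note above; the layout and names are illustrative only.
PERSIAN_MACRO_CODES = {"fa", "fas", "per"}

PERSIAN_VARIETIES = {
    "pes": "Iranian Persian (Farsi)",
    "prs": "Dari (Afghan Persian)",
    "tgk": "Tajik",
}


def expand(code):
    """Return the individual-language codes that a given code may refer to."""
    if code in PERSIAN_MACRO_CODES:
        return sorted(PERSIAN_VARIETIES)
    # Any other code is assumed to name a single language already.
    return [code]


print(expand("fas"))
print(expand("kmr"))
```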

== Samples ==
Article 1 of the Universal Declaration of Human Rights:

''All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience and should act towards one another in a spirit of brotherhood.''

{|class=wikitable
! Language !! Text
|-
|| Ossetian || Адӕймӕгтӕ се 'ппӕт дӕр райгуырынц сӕрибарӕй ӕмӕ ӕмхуызонӕй сӕ барты. Уыдон ӕххӕст сты зонд ӕмӕ намысӕй, ӕмӕ кӕрӕдзийӕн хъуамӕ уой ӕфсымӕрты хуызӕн.
|-
|| Pashto, Northern
|align="right"| د بشر ټول افراد آزاد نړۍ ته راځی او د حيثيت او حقوقو له پلوه سره برابر دی. ټول د عقل او وجدان خاوندان دی او يو له بل سره د ورورۍ په روحيې سره بايد چلند کړی.
|-
|| Kurdish, Northern || Hemû mirov azad û di weqar û mafan de wekhev tên dinyayê. Ew xwedî hiş û şuûr in û divê li hember hev bi zihniyeteke bratiyê bilivin.
|-
|| Tajik || Тамоми одамон озод ва аз лиҳози шарафу ҳуқуқ ба ҳам баробар ба дунё меоянд. Онҳо соҳиби ақлу виҷдонанд ва бояд бо якдигар муносибати бародарона дошта бошанд.
|-
|| Iranian Persian
|align="right"| تمام افراد بشر آزاد بدنیا میایند و از لحاظ حیثیت و حقوق با هم برابرند. همه دارای عقل و وجدان میباشند و باید نسبت بیکدیگر با روح برادری رفتار کنند.
|-
|| Dari
|align="right"| تمام افراد بشر آزاد به دنیا می‌آیند و از لحاظ حیثیت و حقوق با هم برابرند. همه دارای عقل و وجدان هستند و باید نسبت به یکدیگر با روح برادری رفتار کنند.
|}

== Vulnerability ==
This table summarizes the vulnerability of various Iranian languages. Data is derived from the ‘Atlas of the World’s Languages in Danger, © UNESCO, [http://www.unesco.org/culture/languages-atlas http://www.unesco.org/culture/languages-atlas]’ and [http://www.ethnologue.com/ Ethnologue].

{| class="wikitable sortable"
!rowspan=2| Language
!rowspan=2| ISO639-3
!rowspan=2| Location
!rowspan=2| Speakers
!colspan=2|Status
|-class="sortbottom"
! Ethnologue
! UNESCO
|-
|| Avestan
|align="center"| <code>[http://www.ethnologue.com/language/ave ave]</code>
|| Iran
|align="right"| 0
|| 10 (Extinct)
|| -
|-
|| Pahlavani
|align="center"| <code>[http://www.ethnologue.com/language/phv phv]</code>
|| Afghanistan
|align="right"| 0
|| 9 (Dormant)
|| -
|-
|| Koroshi
|align="center"| <code>[http://www.ethnologue.com/language/ktl ktl]</code>
|| Iran
|align="right"| 180
|| 8b (Nearly extinct)
|| 4 (Critically endangered)
|-
|| Kumzari
|align="center"| <code>[http://www.ethnologue.com/language/zum zum]</code>
|| Oman
|align="right"| 2,300
|| 8a (Moribund)
|| 3 (Severely endangered)
|-
|| Parachi
|align="center"| <code>[http://www.ethnologue.com/language/prc prc]</code>
|| Afghanistan
|align="right"| 3,500
|| 7 (Shifting)
|| 2 (Definitely endangered)
|-
|| Bashkardi
|align="center"| <code>[http://www.ethnologue.com/language/bsg bsg]</code>
|| Iran
|align="right"| 7,030
|| 7 (Shifting)
|| 2 (Definitely endangered)
|-
|| Gazi
|align="center"| <code>[http://www.ethnologue.com/language/gzi gzi]</code>
|| Iran
|align="right"| 7,030
|| 7 (Shifting)
|| 2 (Definitely endangered)
|-
|| Sivandi
|align="center"| <code>[http://www.ethnologue.com/language/siy siy]</code>
|| Iran
|align="right"| 7,030
|| 7 (Shifting)
|| -
|-
|| Fars, Northwestern
|align="center"| <code>[http://www.ethnologue.com/language/faz faz]</code>
|| Iran
|align="right"| 7,500
|| 7 (Shifting)
|| -
|-
|| Dari, Zoroastrian
|align="center"| <code>[http://www.ethnologue.com/language/gbz gbz]</code>
|| Iran
|align="right"| 8,000
|| 7 (Shifting)
|| 2 (Definitely endangered)
|-
|| Shabak
|align="center"| <code>[http://www.ethnologue.com/language/sdb sdb]</code>
|| Iraq
|align="right"| 15,000
|| 7 (Shifting)
|| -
|-
|| Karingani
|align="center"| <code>[http://www.ethnologue.com/language/kgn kgn]</code>
|| Iran
|align="right"| 17,600
|| 7 (Shifting)
|| -
|-
|| Vafsi
|align="center"| <code>[http://www.ethnologue.com/language/vaf vaf]</code>
|| Iran
|align="right"| 18,000
|| 7 (Shifting)
|| 2 (Definitely endangered)
|-
|| Bajelani
|align="center"| <code>[http://www.ethnologue.com/language/bjm bjm]</code>
|| Iraq
|align="right"| 20,000
|| 7 (Shifting)
|| -
|-
|| Ashtiani
|align="center"| <code>[http://www.ethnologue.com/language/atn atn]</code>
|| Iran
|align="right"| 21,100
|| 7 (Shifting)
|| 2 (Definitely endangered)
|-
|| Khunsari
|align="center"| <code>[http://www.ethnologue.com/language/kfm kfm]</code>
|| Iran
|align="right"| 21,100
|| 7 (Shifting)
|| 2 (Definitely endangered)
|-
|| Tat, Muslim
|align="center"| <code>[http://www.ethnologue.com/language/ttt ttt]</code>
|| Azerbaijan
|align="right"| 28,010
|| 7 (Shifting)
|| 3 (Severely endangered)
|-
|| Harzani
|align="center"| <code>[http://www.ethnologue.com/language/hrz hrz]</code>
|| Iran
|align="right"| 28,100
|| 7 (Shifting)
|| -
|-
|| Dzhidi
|align="center"| <code>[http://www.ethnologue.com/language/jpr jpr]</code>
|| Israel & Iran
|align="right"| 60,000
|| 7 (Shifting)
|| 2 (Definitely endangered)
|-
|| Fars, Southwestern
|align="center"| <code>[http://www.ethnologue.com/language/fay fay]</code>
|| Iran
|align="right"| 100,000
|| 7 (Shifting)
|| -
|-
|| Bukharic
|align="center"| <code>[http://www.ethnologue.com/language/bhh bhh]</code>
|| Israel & Uzbekistan
|align="right"| 110,000
|| 7 (Shifting)
|| 2 (Definitely endangered)
|-
|| Gurani
|align="center"| <code>[http://www.ethnologue.com/language/hac hac]</code>
|| Iraq
|align="right"| 200,000
|| 7 (Shifting)
|| -
|-
|| Takestani
|align="center"| <code>[http://www.ethnologue.com/language/tks tks]</code>
|| Iran
|align="right"| 220,000
|| 7 (Shifting)
|| -
|-
|| Mazanderani
|align="center"| <code>[http://www.ethnologue.com/language/mzn mzn]</code>
|| Iran
|align="right"| 3,270,000
|| 7 (Shifting)
|| -
|-
|| Alviri-Vidari
|align="center"| <code>[http://www.ethnologue.com/language/avd avd]</code>
|| Iran
|align="right"| ?
|| 7 (Shifting)
|| -
|-
|| Eshtehardi
|align="center"| <code>[http://www.ethnologue.com/language/esh esh]</code>
|| Iran
|align="right"| ?
|| 7 (Shifting)
|| -
|-
|| Gozarkhani
|align="center"| <code>[http://www.ethnologue.com/language/goz goz]</code>
|| Iran
|align="right"| ?
|| 7 (Shifting)
|| -
|-
|| Kabatei
|align="center"| <code>[http://www.ethnologue.com/language/xkp xkp]</code>
|| Iran
|align="right"| ?
|| 7 (Shifting)
|| -
|-
|| Kajali
|align="center"| <code>[http://www.ethnologue.com/language/xkj xkj]</code>
|| Iran
|align="right"| ?
|| 7 (Shifting)
|| -
|-
|| Kho’ini
|align="center"| <code>[http://www.ethnologue.com/language/xkc xkc]</code>
|| Iran
|align="right"| ?
|| 7 (Shifting)
|| -
|-
|| Koresh-e Rostam
|align="center"| <code>[http://www.ethnologue.com/language/okh okh]</code>
|| Iran
|align="right"| ?
|| 7 (Shifting)
|| -
|-
|| Maraghei
|align="center"| <code>[http://www.ethnologue.com/language/vmh vmh]</code>
|| Iran
|align="right"| ?
|| 7 (Shifting)
|| -
|-
|| Razajerdi
|align="center"| <code>[http://www.ethnologue.com/language/rat rat]</code>
|| Iran
|align="right"| ?
|| 7 (Shifting)
|| -
|-
|| Rudbari
|align="center"| <code>[http://www.ethnologue.com/language/rdb rdb]</code>
|| Iran
|align="right"| ?
|| 7 (Shifting)
|| -
|-
|| Shahrudi
|align="center"| <code>[http://www.ethnologue.com/language/shm shm]</code>
|| Iran
|align="right"| ?
|| 7 (Shifting)
|| -
|-
|| Taromi, Upper
|align="center"| <code>[http://www.ethnologue.com/language/tov tov]</code>
|| Iran
|align="right"| ?
|| 7 (Shifting)
|| -
|-
|| Sarli
|align="center"| <code>[http://www.ethnologue.com/language/sdf sdf]</code>
|| Iraq
|align="right"| Fewer than 20,000
|| 7 (Shifting)
|| -
|-
|| Ishkashimi
|align="center"| <code>[http://www.ethnologue.com/language/isk isk]</code>
|| Afghanistan
|align="right"| 3,000
|| 6b (Threatened)
|| -
|-
|| Yidgha
|align="center"| <code>[http://www.ethnologue.com/language/ydg ydg]</code>
|| Pakistan
|align="right"| 6,150
|| 6b (Threatened)
|| 2 (Definitely endangered)
|-
|| Yazgulyam
|align="center"| <code>[http://www.ethnologue.com/language/yah yah]</code>
|| Tajikistan
|align="right"| 9,000
|| 6b (Threatened)
|| 3 (Severely endangered)
|-
|| Yagnobi
|align="center"| <code>[http://www.ethnologue.com/language/yai yai]</code>
|| Tajikistan
|align="right"| 12,000
|| 6b (Threatened)
|| 2 (Definitely endangered)
|-
|| Sarikoli
|align="center"| <code>[http://www.ethnologue.com/language/srh srh]</code>
|| China
|align="right"| 16,000
|| 6b (Threatened)
|| 2 (Definitely endangered)
|-
|| Lasgerdi
|align="center"| <code>[http://www.ethnologue.com/language/lsa lsa]</code>
|| Iran
|align="right"| 1,000
|| 6a (Vigorous)
|| -
|-
|| Sanglechi
|align="center"| <code>[http://www.ethnologue.com/language/sgy sgy]</code>
|| Afghanistan
|align="right"| 2,200
|| 6a (Vigorous)
|| -
|-
|| Munji
|align="center"| <code>[http://www.ethnologue.com/language/mnj mnj]</code>
|| Afghanistan
|align="right"| 5,300
|| 6a (Vigorous)
|| 3 (Severely endangered)
|-
|| Ormuri
|align="center"| <code>[http://www.ethnologue.com/language/oru oru]</code>
|| Pakistan, Afghanistan
|align="right"| 6,050
|| 6a (Vigorous)
|| 2 (Definitely endangered)
|-
|| Natanzi
|align="center"| <code>[http://www.ethnologue.com/language/ntz ntz]</code>
|| Iran
|align="right"| 7,030
|| 6a (Vigorous)
|| 3 (Severely endangered)
|-
|| Nayini
|align="center"| <code>[http://www.ethnologue.com/language/nyq nyq]</code>
|| Iran
|align="right"| 7,030
|| 6a (Vigorous)
|| 3 (Severely endangered)
|-
|| Soi
|align="center"| <code>[http://www.ethnologue.com/language/soj soj]</code>
|| Iran
|align="right"| 7,030
|| 6a (Vigorous)
|| -
|-
|| Sorkhei
|align="center"| <code>[http://www.ethnologue.com/language/sqo sqo]</code>
|| Iran
|align="right"| 10,000
|| 6a (Vigorous)
|| -
|-
|| Dehwari
|align="center"| <code>[http://www.ethnologue.com/language/deh deh]</code>
|| Pakistan
|align="right"| 13,000
|| 6a (Vigorous)
|| -
|-
|| Sangisari
|align="center"| <code>[http://www.ethnologue.com/language/sgr sgr]</code>
|| Iran
|align="right"| 36,000
|| 6a (Vigorous)
|| -
|-
|| Wakhi
|align="center"| <code>[http://www.ethnologue.com/language/wbl wbl]</code>
|| China, Pakistan, Tajikistan, Afghanistan
|align="right"| 47,100
|| 6a (Vigorous)
|| 2 (Definitely endangered)
|-
|| Semnani
|align="center"| <code>[http://www.ethnologue.com/language/smy smy]</code>
|| Iran
|align="right"| 60,000
|| 6a (Vigorous)
|| -
|-
|| Shughni
|align="center"| <code>[http://www.ethnologue.com/language/sgh sgh]</code>
|| Tajikistan
|align="right"| 80,000
|| 6a (Vigorous)
|| 3 (Severely endangered)
|-
|| Lari
|align="center"| <code>[http://www.ethnologue.com/language/lrl lrl]</code>
|| Iran
|align="right"| 80,000
|| 6a (Vigorous)
|| -
|-
|| Waneci
|align="center"| <code>[http://www.ethnologue.com/language/wne wne]</code>
|| Pakistan
|align="right"| 95,000
|| 6a (Vigorous)
|| -
|-
|| Parsi
|align="center"| <code>[http://www.ethnologue.com/language/prp prp]</code>
|| India
|align="right"| 326,000
|| 6a (Vigorous)
|| -
|-
|| Parsi-Dari
|align="center"| <code>[http://www.ethnologue.com/language/prd prd]</code>
|| Iran
|align="right"| 350,000
|| 6a (Vigorous)
|| -
|-
|| Aimaq
|align="center"| <code>[http://www.ethnologue.com/language/aiq aiq]</code>
|| Afghanistan
|align="right"| 650,000
|| 6a (Vigorous)
|| -
|-
|| Luri, Southern
|align="center"| <code>[http://www.ethnologue.com/language/luz luz]</code>
|| Iran
|align="right"| 875,000
|| 6a (Vigorous)
|| -
|-
|| Laki
|align="center"| <code>[http://www.ethnologue.com/language/lki lki]</code>
|| Iran
|align="right"| 1,000,000
|| 6a (Vigorous)
|| -
|-
|| Bakhtiâri
|align="center"| <code>[http://www.ethnologue.com/language/bqi bqi]</code>
|| Iran
|align="right"| 1,000,000
|| 6a (Vigorous)
|| -
|-
|| Luri, Northern
|align="center"| <code>[http://www.ethnologue.com/language/lrc lrc]</code>
|| Iran
|align="right"| 1,500,000
|| 6a (Vigorous)
|| -
|-
|| Kurdish, Southern
|align="center"| <code>[http://www.ethnologue.com/language/sdh sdh]</code>
|| Iran
|align="right"| 3,000,000
|| 6a (Vigorous)
|| -
|-
|| Pashto, Central
|align="center"| <code>[http://www.ethnologue.com/language/pst pst]</code>
|| Pakistan
|align="right"| 7,920,000
|| 6a (Vigorous)
|| -
|-
|| Shahmirzadi
|align="center"| <code>[http://www.ethnologue.com/language/srz srz]</code>
|| Iran
|align="right"| ?
|| 6a (Vigorous)
|| -
|-
|| Dezfuli
|align="center"| <code>[http://www.ethnologue.com/language/def def]</code>
|| Iran
|align="right"| ?
|| 6a (Vigorous)
|| -
|-
|| Khalaj
|align="center"| <code>[http://www.ethnologue.com/language/kjf kjf]</code>
|| Azerbaijan
|align="right"| 42,100
|| 5 (Developing)
|| -
|-
|| Ossetic
|align="center"| <code>[http://www.ethnologue.com/language/oss oss]</code>
|| Georgia, Russian Federation
|align="right"| 577,450
|| 5 (Developing)
|| 1 (Vulnerable)
|-
|| Hazaragi
|align="center"| <code>[http://www.ethnologue.com/language/haz haz]</code>
|| Afghanistan
|align="right"| 2,210,000
|| 5 (Developing)
|| -
|-
|| Judeo-Tat
|align="center"| <code>[http://www.ethnologue.com/language/jdt jdt]</code>
|| Azerbaijan, Russian Federation
|align="right"| 2,010
|| 4 (Educational)
|| 2 (Definitely endangered)
|-
|| Zazaki, Northern
|align="center"| <code>[http://www.ethnologue.com/language/kiu kiu]</code>
|| Turkey
|align="right"| 140,000
|| 4 (Educational)
|| -
|-
|| Talysh
|align="center"| <code>[http://www.ethnologue.com/language/tly tly]</code>
|| Azerbaijan, Iran
|align="right"| 915,400
|| 4 (Educational)
|| 1 (Vulnerable)
|-
|| Zazaki, Southern
|align="center"| <code>[http://www.ethnologue.com/language/diq diq]</code>
|| Turkey
|align="right"| 1,500,000
|| 4 (Educational)
|| -
|-
|| Balochi, Western
|align="center"| <code>[http://www.ethnologue.com/language/bgn bgn]</code>
|| Pakistan
|align="right"| 1,799,840
|| 4 (Educational)
|| -
|-
|| Balochi, Eastern
|align="center"| <code>[http://www.ethnologue.com/language/bgp bgp]</code>
|| Pakistan
|align="right"| 1,800,800
|| 4 (Educational)
|| -
|-
|| Gilaki
|align="center"| <code>[http://www.ethnologue.com/language/glk glk]</code>
|| Iran
|align="right"| 3,270,000
|| 4 (Educational)
|| -
|-
|| Balochi, Southern
|align="center"| <code>[http://www.ethnologue.com/language/bcc bcc]</code>
|| Pakistan
|align="right"| 3,405,000
|| 4 (Educational)
|| -
|-
|| Pashto, Northern
|align="center"| <code>[http://www.ethnologue.com/language/pbu pbu]</code>
|| Pakistan
|align="right"| 11,430,000
|| 4 (Educational)
|| -
|-
|| Kurdish, Northern
|align="center"| <code>[http://www.ethnologue.com/language/kmr kmr]</code>
|| Turkey
|align="right"| 20,210,872
|| 3 (Wider communication)
|| -
|-
|| Kurdish, Central
|align="center"| <code>[http://www.ethnologue.com/language/ckb ckb]</code>
|| Iraq
|align="right"| 6,750,000
|| 2 (Provincial)
|| -
|-
|| Tajiki
|align="center"| <code>[http://www.ethnologue.com/language/tgk tgk]</code>
|| Tajikistan
|align="right"| 4,479,650
|| 1 (National)
|| -
|-
|| Pashto, Southern
|align="center"| <code>[http://www.ethnologue.com/language/pbt pbt]</code>
|| Afghanistan
|align="right"| 7,590,100
|| 1 (National)
|| -
|-
|| Dari
|align="center"| <code>[http://www.ethnologue.com/language/prs prs]</code>
|| Afghanistan
|align="right"| 9,600,000
|| 1 (National)
|| -
|-
|| Persian, Iranian
|align="center"| <code>[http://www.ethnologue.com/language/pes pes]</code>
|| Iran
|align="right"| 47,045,100
|| 1 (National)
|| -
|}

==Classification==

* Southwestern: [[Iranian Persian]], [[Tajik]]
* Northwestern: [[Kurdish]] ([[Kurmanji]], [[Sorani]])
* Southeastern: [[Pashto]]
* Northeastern: [[Ossetian]]


[[Category:Iranian languages]]

Latest revision as of 11:36, 30 July 2018

The Iranian languages include Iranian Persian, Dari, Tajik (three varieties of Modern Persian), Pashto, Balochi, Kurdish, Ossetian, Tat, and several dozen other languages.

The master plan involves generating independent finite-state transducers for each language, and then making individual dictionaries and transfer rules for every pair. The current status of these goals is listed below.

Status

The ultimate goal is to have multi-purpose transducers for a variety of Iranian languages. These can then be paired for X→Y translation by adding a CG for language X and transfer rules and a dictionary for the pair X→Y. Development progress for each language's transducers and dictionary pairs is listed below.

Transducers

Once a transducer reaches ~80% coverage on a range of medium-to-large corpora, we can say it is "working". Above 90%, it can be considered "production".

name Language ISO 639 formalism state stems coverage location primary authors
-2 -3
apertium-pes Iranian Persian (Farsi) pes lttoolbox development apertium-pes (incubator)
apertium-tgk Tajik tg tgk lttoolbox development
apertium-glk Gilaki glk lttoolbox development apertium-glk (incubator) ronl, Fran
apertium-oss Ossetian os oss lttoolbox development apertium-oss (incubator)


name Language ISO 639 formalism state stems paradigms coverage location primary authors
-2 -3
apertium-kmr Kurdish (Kurmanji) ku kmr lttoolbox development 17,771 157 [[Apertium-kmr#Current_State|~Apertium-kmr/stats/average%]] apertium-kmr (languages) Fran, Memduh
apertium-pes Iranian Persian (Farsi) fa pes lttoolbox development 13,167 113 [[Apertium-pes#Current_State|~Apertium-pes/stats/average%]] apertium-pes (languages) Fran, ...
apertium-tgk Tajik tg tgk lttoolbox development 2,784 79 [[Apertium-tgk#Current_State|~Apertium-tgk/stats/average%]] apertium-tgk (languages) Fran, ...
apertium-oss Ossetian os oss lttoolbox prototype 111 ~17% apertium-oss (languages) Fran, ...
apertium-glk Gilaki glk lttoolbox prototype 4 28 [[Apertium-glk#Current_State|~Apertium-glk/stats/average%]] apertium-glk (languages) Fran, ronl
apertium-ckb Central Kurdish (Sorani) ckb lttoolbox prototype 2 [[Apertium-ckb#Current_State|~Apertium-ckb/stats/average%]] apertium-ckb (languages) Fran, Memduh

Table of Existing Pairs

Text in italics denotes a language pair in the incubator, regular text a developing pair in nursery, bold a stable, well-working pair in trunk, and bold and italics a pair in staging. Bidix stem counts, as counted with dixcounter, are shown below each pair name.

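{| style="text-align: center;" class="wikitable dixtable"
! !! pes !! tgk !! glk !! oss !! fas
|-
| '''pes''' || - || tgk-pes<br>504 || ''pes-glk''<br>8 || ||
|-
| '''tgk''' || tgk-pes<br>504 || - || || ||
|-
| '''glk''' || ''pes-glk''<br>8 || || - || ||
|-
| '''oss''' || || || || - ||
|-
| '''fas''' || || || || || -
|-
| '''eng''' || || ''tg-en''<br>? || || ||
|-
| '''epo''' || || || || || ''eo-fa''<br>?
|-
| '''urd''' || || || || || ''ur-fa''<br>?
|}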

Language Codes

Note that fas(/per) and fa are macrocodes for Persian, which includes Farsi (Iranian Persian - pes), Dari (Afghan Persian - prs), and Tajik (tgk).

Samples

Article 1 of the Universal Declaration of Human Rights:

All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience and should act towards one another in a spirit of brotherhood.

Language Text
Ossetian Адӕймӕгтӕ се 'ппӕт дӕр райгуырынц сӕрибарӕй ӕмӕ ӕмхуызонӕй сӕ барты. Уыдон ӕххӕст сты зонд ӕмӕ намысӕй, ӕмӕ кӕрӕдзийӕн хъуамӕ уой ӕфсымӕрты хуызӕн.
Pashto, Northern د بشر ټول افراد آزاد نړۍ ته راځی او د حيثيت او حقوقو له پلوه سره برابر دی. ټول د عقل او وجدان خاوندان دی او يو له بل سره د ورورۍ په روحيې سره بايد چلند کړی.
Kurdish, Northern Hemû mirov azad û di weqar û mafan de wekhev tên dinyayê. Ew xwedî hiş û şuûr in û divê li hember hev bi zihniyeteke bratiyê bilivin.
Tajik Тамоми одамон озод ва аз лиҳози шарафу ҳуқуқ ба ҳам баробар ба дунё меоянд. Онҳо соҳиби ақлу виҷдонанд ва бояд бо якдигар муносибати бародарона дошта бошанд.
Iranian Persian تمام افراد بشر آزاد بدنیا میایند و از لحاظ حیثیت و حقوق با هم برابرند. همه دارای عقل و وجدان میباشند و باید نسبت بیکدیگر با روح برادری رفتار کنند.
Dari تمام افراد بشر آزاد به دنیا می‌آیند و از لحاظ حیثیت و حقوق با هم برابرند. همه دارای عقل و وجدان هستند و باید نسبت به یکدیگر با روح برادری رفتار کنند.

Vulnerability

This table summarizes the vulnerability of various Iranian languages. Data is derived from the ‘Atlas of the World’s Languages in Danger, © UNESCO, http://www.unesco.org/culture/languages-atlas’ and Ethnologue.

Language ISO639-3 Location Speakers Status
Ethnologue UNESCO
Avestan ave Iran 0 10 (Extinct) -
Pahlavani phv Afghanistan 0 9 (Dormant) -
Koroshi ktl Iran 180 8b (Nearly extinct) 4 (Critically endangered)
Kumzari zum Oman 2,300 8a (Moribund) 3 (Severely endangered)
Parachi prc Afghanistan 3,500 7 (Shifting) 2 (Definitely endangered)
Bashkardi bsg Iran 7,030 7 (Shifting) 2 (Definitely endangered)
Gazi gzi Iran 7,030 7 (Shifting) 2 (Definitely endangered)
Sivandi siy Iran 7,030 7 (Shifting) -
Fars, Northwestern faz Iran 7,500 7 (Shifting) -
Dari, Zoroastrian gbz Iran 8,000 7 (Shifting) 2 (Definitely endangered)
Shabak sdb Iraq 15,000 7 (Shifting) -
Karingani kgn Iran 17,600 7 (Shifting) -
Vafsi vaf Iran 18,000 7 (Shifting) 2 (Definitely endangered)
Bajelani bjm Iraq 20,000 7 (Shifting) -
Ashtiani atn Iran 21,100 7 (Shifting) 2 (Definitely endangered)
Khunsari kfm Iran 21,100 7 (Shifting) 2 (Definitely endangered)
Tat, Muslim ttt Azerbaijan 28,010 7 (Shifting) 3 (Severely endangered)
Harzani hrz Iran 28,100 7 (Shifting) -
Dzhidi jpr Israel & Iran 60,000 7 (Shifting) 2 (Definitely endangered)
Fars, Southwestern fay Iran 100,000 7 (Shifting) -
Bukharic bhh Israel & Uzbekistan 110,000 7 (Shifting) 2 (Definitely endangered)
Gurani hac Iraq 200,000 7 (Shifting) -
Takestani tks Iran 220,000 7 (Shifting) -
Mazanderani mzn Iran 3,270,000 7 (Shifting) -
Alviri-Vidari avd Iran ? 7 (Shifting) -
Eshtehardi esh Iran ? 7 (Shifting) -
Gozarkhani goz Iran ? 7 (Shifting) -
Kabatei xkp Iran ? 7 (Shifting) -
Kajali xkj Iran ? 7 (Shifting) -
Kho’ini xkc Iran ? 7 (Shifting) -
Koresh-e Rostam okh Iran ? 7 (Shifting) -
Maraghei vmh Iran ? 7 (Shifting) -
Razajerdi rat Iran ? 7 (Shifting) -
Rudbari rdb Iran ? 7 (Shifting) -
Shahrudi shm Iran ? 7 (Shifting) -
Taromi, Upper tov Iran ? 7 (Shifting) -
Sarli sdf Iraq Fewer than 20,000 7 (Shifting) -
Ishkashimi isk Afghanistan 3,000 6b (Threatened) -
Yidgha ydg Pakistan 6,150 6b (Threatened) 2 (Definitely endangered)
Yazgulyam yah Tajikistan 9,000 6b (Threatened) 3 (Severely endangered)
Yagnobi yai Tajikistan 12,000 6b (Threatened) 2 (Definitely endangered)
Sarikoli srh China 16,000 6b (Threatened) 2 (Definitely endangered)
Lasgerdi lsa Iran 1,000 6a (Vigorous) -
Sanglechi sgy Afghanistan 2,200 6a (Vigorous) -
Munji mnj Afghanistan 5,300 6a (Vigorous) 3 (Severely endangered)
Ormuri oru Pakistan, Afghanistan 6,050 6a (Vigorous) 2 (Definitely endangered)
Natanzi ntz Iran 7,030 6a (Vigorous) 3 (Severely endangered)
Nayini nyq Iran 7,030 6a (Vigorous) 3 (Severely endangered)
Soi soj Iran 7,030 6a (Vigorous) -
Sorkhei sqo Iran 10,000 6a (Vigorous) -
Dehwari deh Pakistan 13,000 6a (Vigorous) -
Sangisari sgr Iran 36,000 6a (Vigorous) -
Wakhi wbl China, Pakistan, Tajikistan, Afghanistan 47,100 6a (Vigorous) 2 (Definitely endangered)
Semnani smy Iran 60,000 6a (Vigorous) -
Shughni sgh Tajikistan 80,000 6a (Vigorous) 3 (Severely endangered)
Lari lrl Iran 80,000 6a (Vigorous) -
Waneci wne Pakistan 95,000 6a (Vigorous) -
Parsi prp India 326,000 6a (Vigorous) -
Parsi-Dari prd Iran 350,000 6a (Vigorous) -
Aimaq aiq Afghanistan 650,000 6a (Vigorous) -
Luri, Southern luz Iran 875,000 6a (Vigorous) -
Laki lki Iran 1,000,000 6a (Vigorous) -
Bakhtiâri bqi Iran 1,000,000 6a (Vigorous) -
Luri, Northern lrc Iran 1,500,000 6a (Vigorous) -
Kurdish, Southern sdh Iran 3,000,000 6a (Vigorous) -
Pashto, Central pst Pakistan 7,920,000 6a (Vigorous) -
Shahmirzadi srz Iran ? 6a (Vigorous) -
Dezfuli def Iran ? 6a (Vigorous) -
Khalaj kjf Azerbaijan 42,100 5 (Developing) -
Ossetic oss Georgia, Russian Federation 577,450 5 (Developing) 1 (Vulnerable)
Hazaragi haz Afghanistan 2,210,000 5 (Developing) -
Judeo-Tat jdt Azerbaijan, Russian Federation 2,010 4 (Educational) 2 (Definitely endangered)
Zazaki, Northern kiu Turkey 140,000 4 (Educational) -
Talysh tly Azerbaijan, Iran 915,400 4 (Educational) 1 (Vulnerable)
Zazaki, Southern diq Turkey 1,500,000 4 (Educational) -
Balochi, Western bgn Pakistan 1,799,840 4 (Educational) -
Balochi, Eastern bgp Pakistan 1,800,800 4 (Educational) -
Gilaki glk Iran 3,270,000 4 (Educational) -
Balochi, Southern bcc Pakistan 3,405,000 4 (Educational) -
Pashto, Northern pbu Pakistan 11,430,000 4 (Educational) -
Kurdish, Northern kmr Turkey 20,210,872 3 (Wider communication) -
Kurdish, Central ckb Iraq 6,750,000 2 (Provincial) -
Tajiki tgk Tajikistan 4,479,650 1 (National) -
Pashto, Southern pbt Afghanistan 7,590,100 1 (National) -
Dari prs Afghanistan 9,600,000 1 (National) -
Persian, Iranian pes Iran 47,045,100 1 (National) -

Classification