Difference between revisions of "Iranian languages"

From Apertium
 
(6 intermediate revisions by 2 users not shown)
Line 1: Line 1:
 
{{TOCD}}
 
{{TOCD}}
The '''Iranian languages''' include [[Farsi]], [[Dari]], [[Tajik]] (three varieties of Modern Persian), [[Pashto]], Balochi, Kurdish, [[Ossetian]], Tat, and several dozen other languages.
+
The '''Iranian languages''' include [[Farsi|Iranian Persian]], [[Dari]], [[Tajik]] (three varieties of Modern Persian), [[Pashto]], Balochi, Kurdish, [[Ossetian]], Tat, and several dozen other languages.
   
 
The master plan involves generating independent finite-state transducers for each language, and then making individual dictionaries and transfer rules for every pair. The current status of these goals is listed below.
 
The master plan involves generating independent finite-state transducers for each language, and then making individual dictionaries and transfer rules for every pair. The current status of these goals is listed below.
Line 68: Line 68:
 
|| [[apertium-oss]] ([[incubator]])
 
|| [[apertium-oss]] ([[incubator]])
 
||
 
||
  +
|}
  +
  +
  +
{| class="wikitable sortable"
  +
|-
  +
!rowspan=2| name
  +
!rowspan=2| Language
  +
!colspan=2 class="unsortable"| ISO 639
  +
!rowspan=2| formalism
  +
!rowspan=2| state
  +
!rowspan=2| stems
  +
!rowspan=2| paradigms
  +
!rowspan=2| coverage
  +
!rowspan=2| location
  +
!rowspan=2 class="unsortable"| primary authors
  +
|-class="sortbottom"
  +
! -1
  +
! -3
  +
|-
  +
|| <code>[[apertium-kmr]]</code>
  +
|| [[Kurdish]] ([[Kurmanji]])
  +
|| <code>ku</code>
  +
|| <code>kmr</code>
  +
|| [[lttoolbox]]
  +
|| development
  +
|align="right"| {{#lst:Apertium-kmr/stats|stems}}
  +
|align="right"| {{#lst:Apertium-kmr/stats|paradigms}}
  +
|align="center"| [[Apertium-kmr#Current_State|~{{:Apertium-kmr/stats/average}}%]]
  +
|| [[apertium-kmr]] ([[languages]])
  +
|| [[User:Francis Tyers|Fran]], [[User:Memduh|Memduh]]
  +
|-
  +
|| <code>[[apertium-pes]]</code>
  +
|| [[Iranian Persian]] (Farsi)
  +
|| <code>fa</code>
  +
|| <code>pes</code>
  +
|| [[lttoolbox]]
  +
|| development
  +
|align="right"| {{#lst:Apertium-pes/stats|stems}}
  +
|align="right"| {{#lst:Apertium-pes/stats|paradigms}}
  +
|align="center"| [[Apertium-pes#Current_State|~{{:Apertium-pes/stats/average}}%]]
  +
|| [[apertium-pes]] ([[languages]])
  +
|| [[User:Francis Tyers|Fran]], ...
  +
|-
  +
|| <code>[[apertium-tgk]]</code>
  +
|| [[Tajik]]
  +
|| <code>tg</code>
  +
|| <code>tgk</code>
  +
|| [[lttoolbox]]
  +
|| development
  +
|align="right"| {{#lst:Apertium-tgk/stats|stems}}
  +
|align="right"| {{#lst:Apertium-tgk/stats|paradigms}}
  +
|align="center"| [[Apertium-tgk#Current_State|~{{:Apertium-tgk/stats/average}}%]]
  +
|| [[apertium-tgk]] ([[languages]])
  +
|| [[User:Francis Tyers|Fran]], ...
  +
|-
  +
|| <code>[[apertium-oss]]</code>
  +
|| [[Ossetian]]
  +
|| <code>os</code>
  +
|| <code>oss</code>
  +
|| [[lttoolbox]]
  +
|| prototype
  +
|align="right"| {{#lst:Apertium-oss/stats|stems}}
  +
|align="right"| {{#lst:Apertium-oss/stats|paradigms}}
  +
|align="center"| [[Apertium-oss#Current_State|~{{:Apertium-oss/stats/average}}%]]
  +
|| [[apertium-oss]] ([[languages]])
  +
|| [[User:Francis Tyers|Fran]], ...
  +
|-
  +
|| <code>[[apertium-glk]]</code>
  +
|| [[Gilaki]]
  +
|| <code></code>
  +
|| <code>glk</code>
  +
|| [[lttoolbox]]
  +
|| prototype
  +
|align="right"| {{#lst:Apertium-glk/stats|stems}}
  +
|align="right"| {{#lst:Apertium-glk/stats|paradigms}}
  +
|align="center"| [[Apertium-glk#Current_State|~{{:Apertium-glk/stats/average}}%]]
  +
|| [[apertium-glk]] ([[languages]])
  +
|| [[User:Francis Tyers|Fran]], ronl
  +
|-
  +
|| <code>[[apertium-ckb]]</code>
  +
|| [[Central Kurdish]] ([[Sorani]])
  +
|| <code></code>
  +
|| <code>ckb</code>
  +
|| [[lttoolbox]]
  +
|| prototype
  +
|align="right"| {{#lst:Apertium-ckb/stats|stems}}
  +
|align="right"| {{#lst:Apertium-ckb/stats|paradigms}}
  +
|align="center"| [[Apertium-ckb#Current_State|~{{:Apertium-ckb/stats/average}}%]]
  +
|| [[apertium-ckb]] ([[languages]])
  +
|| [[User:Francis Tyers|Fran]], [[User:Memduh|Memduh]]
 
|}
 
|}
   
Line 107: Line 197:
 
! Language !! Text
 
! Language !! Text
 
|-
 
|-
|| Osetin || Адӕймӕгтӕ се ’ппӕт дӕр райгуырынц сӕрибарӕй ӕмӕ ӕмхуызонӕй сӕ барты. Уыдон ӕххӕст сты зонд ӕмӕ намысӕй, ӕмӕ кӕрӕдзийӕн хъуамӕ уой ӕфсымӕрты хуызӕн.
+
|| Ossetian || Адӕймӕгтӕ се ’ппӕт дӕр райгуырынц сӕрибарӕй ӕмӕ ӕмхуызонӕй сӕ барты. Уыдон ӕххӕст сты зонд ӕмӕ намысӕй, ӕмӕ кӕрӕдзийӕн хъуамӕ уой ӕфсымӕрты хуызӕн.
 
|-
 
|-
 
|| Pashto, Northern
 
|| Pashto, Northern
 
|align="right"| د بشر ټول افراد آزاد نړۍ ته راځی او د حيثيت او حقوقو له پلوه سره برابر دی. ټول د عقل او وجدان خاوندان دی او يو له بل سره د ورورۍ په روحيې سره بايد چلند کړی.
 
|align="right"| د بشر ټول افراد آزاد نړۍ ته راځی او د حيثيت او حقوقو له پلوه سره برابر دی. ټول د عقل او وجدان خاوندان دی او يو له بل سره د ورورۍ په روحيې سره بايد چلند کړی.
 
|-
 
|-
|| Kurdish, Central || Hemû mirov azad û di weqar û mafan de wekhev tên dinyayê. Ew xwedî hiş û şuûr in û divê li hember hev bi zihniyeteke bratiyê bilivin.
+
|| Kurdish, Northern || Hemû mirov azad û di weqar û mafan de wekhev tên dinyayê. Ew xwedî hiş û şuûr in û divê li hember hev bi zihniyeteke bratiyê bilivin.
 
|-
 
|-
  +
|| Tajik || Тамоми одамон озод ва аз лиҳози шарафу ҳуқуқ ба ҳам баробар ба дунё меоянд. Онҳо соҳиби ақлу виҷдонанд ва бояд бо якдигар муносибати бародарона дошта бошанд.
|| Samoan || O tagata soifua uma ua saoloto lo latou fananau mai, ma e tutusa o latou tulaga aloaia faapea a latou aia tatau. Ua faaeeina atu i a latou le mafaufau lelei ma le loto fuatiaifo ma e tatau ona faatino le agaga faauso i le va o le tasi i le isi,
 
 
|-
 
|-
  +
|| Iranian Persian
|| Tongan (Tonga) || Ko e kotoa ‘o ha’a tangata ‘oku fanau’i mai ‘oku tau’ataina pea tatau ‘i he ngeia mo e ngaahi totonu. Na’e fakanaunau’i kinautolu ‘aki ‘a e ‘atamai mo e konisenisi pea ‘oku totonu ke nau feohi ‘i he laumalie ‘o e nofo fakatautehina.
 
|-
 
|| Tajiki || Тамоми одамон озод ва аз лиҳози шарафу ҳуқуқ ба ҳам баробар ба дунё меоянд. Онҳо соҳиби ақлу виҷдонанд ва бояд бо якдигар муносибати бародарона дошта бошанд.
 
|-
 
|| Persian, Iranian
 
 
|align="right"| تمام افراد بشر آزاد بدنیا میایند و از لحاظ حیثیت و حقوق با هم برابرند. همه دارای عقل و وجدان میباشند و باید نسبت بیکدیگر با روح برادری رفتار کنند.
 
|align="right"| تمام افراد بشر آزاد بدنیا میایند و از لحاظ حیثیت و حقوق با هم برابرند. همه دارای عقل و وجدان میباشند و باید نسبت بیکدیگر با روح برادری رفتار کنند.
 
|-
 
|-
Line 742: Line 828:
 
|| -
 
|| -
 
|}
 
|}
  +
  +
==Classification==
  +
  +
* Southwestern: [[Iranian Persian]], [[Tajik]]
  +
* Northwestern: [[Kurdish]] ([[Kurmanji]], [[Sorani]])
  +
* Southeastern: [[Pashto]]
  +
* Northeastern: [[Ossetian]]
  +
   
 
[[Category:Iranian languages]]
 
[[Category:Iranian languages]]

Latest revision as of 11:36, 30 July 2018

The Iranian languages include Iranian Persian, Dari, Tajik (three varieties of Modern Persian), Pashto, Balochi, Kurdish, Ossetian, Tat, and several dozen other languages.

The master plan involves generating independent finite-state transducers for each language, and then making individual dictionaries and transfer rules for every pair. The current status of these goals is listed below.

Status

The ultimate goal is to have multi-purpose transducers for a variety of Iranian languages. These can then be paired for X→Y translation by adding a constraint grammar (CG) for language X, plus transfer rules and a bilingual dictionary for the pair X→Y. Development progress for each language's transducers and dictionary pairs is listed below.
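To give a sense of scale, the pairing step can be sketched as a small enumeration. This is only an illustration: the language list is taken from the transducer tables on this page, and it assumes one bilingual dictionary can serve both directions of a pair while transfer rules are written per direction.

```python
from itertools import combinations, permutations

# Language codes taken from the transducer tables on this page.
languages = ["kmr", "pes", "tgk", "oss", "glk", "ckb"]

# Assumption: one bilingual dictionary per unordered pair, shared by both directions.
bidix_pairs = list(combinations(languages, 2))

# One set of transfer rules per translation direction X→Y.
directions = list(permutations(languages, 2))

print(len(bidix_pairs))  # 15
print(len(directions))   # 30
```

With six transducers, covering every pair would therefore mean 15 dictionaries and 30 sets of transfer rules, which is why the reusable per-language transducers come first.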

Transducers

Once a transducer reaches ~80% coverage on a range of medium-to-large corpora, we can say it is "working". Above 90%, it can be considered "production".
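As a rough sketch of how these thresholds could be applied, the snippet below computes naive coverage from analyser output and maps it to a label. It assumes the Apertium stream format, in which each token appears as `^surface/analysis$` and unknown tokens carry a leading `*` in the analysis. The function names and the "in progress" label are hypothetical and not part of any Apertium tool.

```python
import re

# Matches one lexical unit in stream format: ^surface/analysis1/analysis2$
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def naive_coverage(analysed: str) -> float:
    """Return the fraction of lexical units that received an analysis.

    Assumes unknown words are marked with a leading '*', e.g. ^xyzzy/*xyzzy$.
    """
    units = LEXICAL_UNIT.findall(analysed)
    if not units:
        return 0.0
    known = sum(1 for _surface, analyses in units if not analyses.startswith("*"))
    return known / len(units)


def status(coverage: float) -> str:
    """Map a coverage fraction to the labels used in the text above."""
    if coverage > 0.90:
        return "production"
    if coverage >= 0.80:
        return "working"
    return "in progress"  # hypothetical label, not from this page


sample = "^the/the<det><def><sp>$ ^xyzzy/*xyzzy$ ^cat/cat<n><sg>$"
print(status(naive_coverage(sample)))
```

Here two of the three tokens are analysed, so the sample falls well below the 80% mark. A real measurement would run over whole corpora rather than a single line.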

name Language ISO 639-1 ISO 639-3 formalism state stems coverage location primary authors
apertium-pes Iranian Persian (Farsi) pes lttoolbox development apertium-pes (incubator)
apertium-tgk Tajik tg tgk lttoolbox development
apertium-glk Gilaki glk lttoolbox development apertium-glk (incubator) ronl, Fran
apertium-oss Ossetian os oss lttoolbox development apertium-oss (incubator)


name Language ISO 639-1 ISO 639-3 formalism state stems paradigms coverage location primary authors
apertium-kmr Kurdish (Kurmanji) ku kmr lttoolbox development 17,771 157 [[Apertium-kmr#Current_State|~Apertium-kmr/stats/average%]] apertium-kmr (languages) Fran, Memduh
apertium-pes Iranian Persian (Farsi) fa pes lttoolbox development 13,167 113 [[Apertium-pes#Current_State|~Apertium-pes/stats/average%]] apertium-pes (languages) Fran, ...
apertium-tgk Tajik tg tgk lttoolbox development 2,784 79 [[Apertium-tgk#Current_State|~Apertium-tgk/stats/average%]] apertium-tgk (languages) Fran, ...
apertium-oss Ossetian os oss lttoolbox prototype 111 ~17% apertium-oss (languages) Fran, ...
apertium-glk Gilaki glk lttoolbox prototype 4 28 [[Apertium-glk#Current_State|~Apertium-glk/stats/average%]] apertium-glk (languages) Fran, ronl
apertium-ckb Central Kurdish (Sorani) ckb lttoolbox prototype 2 [[Apertium-ckb#Current_State|~Apertium-ckb/stats/average%]] apertium-ckb (languages) Fran, Memduh

Table of Existing Pairs

Text in italics denotes language pairs in the incubator. Regular text denotes a developing language pair in nursery, while text in bold denotes a stable well-working language pair in trunk and text in bold and italics denotes a pair in staging. Bidix stems as counted with dixcounter are displayed below.
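For readers unfamiliar with how bidix stems might be counted, here is a minimal sketch that counts `<e>` entries inside the `<section>` elements of a dictionary in the `.dix` XML format. It is not the actual logic of dixcounter, which may count differently. The function name and the placeholder entries `a`, `b`, `c`, `d` are invented for illustration.

```python
import xml.etree.ElementTree as ET


def count_bidix_entries(dix_xml: str) -> int:
    """Count <e> entries that are direct children of any <section> element."""
    root = ET.fromstring(dix_xml)
    return sum(len(section.findall("e")) for section in root.iter("section"))


# Tiny placeholder dictionary; the entries are not real words.
sample_dix = """<dictionary>
  <alphabet/>
  <sdefs><sdef n="n"/></sdefs>
  <section id="main" type="standard">
    <e><p><l>a<s n="n"/></l><r>b<s n="n"/></r></p></e>
    <e><p><l>c<s n="n"/></l><r>d<s n="n"/></r></p></e>
  </section>
</dictionary>"""

print(count_bidix_entries(sample_dix))  # 2
```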

Columns: pes, tgk, glk, oss, fas
pes: -, tgk-pes (504), pes-glk (8)
tgk: tgk-pes (504), -
glk: pes-glk (8), -
oss: -
fas: -
eng: tg-en (?)
epo: eo-fa (?)
urd: ur-fa (?)

Language Codes

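Note that fas (ISO 639-2/T and 639-3; per in ISO 639-2/B) and fa (ISO 639-1) are macrolanguage codes for Persian, which in ISO 639-3 covers Iranian Persian (Farsi, pes) and Dari (Afghan Persian, prs). Tajik, although a variety of Modern Persian, has its own codes (tg, tgk) and is not listed under the macrolanguage.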
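When scripting against these modules, it can help to keep the two-letter and three-letter codes in one lookup. The sketch below uses only the code pairs that appear in the transducer table on this page. The helper name `package_name` is hypothetical, and mapping the broader codes `fa` and `ku` to `pes` and `kmr` follows that table rather than the ISO standard.

```python
# Two-letter to three-letter codes, as paired in the transducer table on this page.
# "fa" and "ku" are broader codes; mapping them to "pes" and "kmr" is a
# convention taken from the table, not something the standard prescribes.
TWO_TO_THREE = {"ku": "kmr", "fa": "pes", "tg": "tgk", "os": "oss"}


def package_name(code: str) -> str:
    """Return the apertium-xxx module name for a two- or three-letter code."""
    three = TWO_TO_THREE.get(code, code)
    return f"apertium-{three}"


print(package_name("fa"))   # apertium-pes
print(package_name("glk"))  # apertium-glk
```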

Samples

Article 1 of the Universal Declaration of Human Rights:

All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience and should act towards one another in a spirit of brotherhood.

Language Text
Ossetian Адӕймӕгтӕ се ’ппӕт дӕр райгуырынц сӕрибарӕй ӕмӕ ӕмхуызонӕй сӕ барты. Уыдон ӕххӕст сты зонд ӕмӕ намысӕй, ӕмӕ кӕрӕдзийӕн хъуамӕ уой ӕфсымӕрты хуызӕн.
Pashto, Northern د بشر ټول افراد آزاد نړۍ ته راځی او د حيثيت او حقوقو له پلوه سره برابر دی. ټول د عقل او وجدان خاوندان دی او يو له بل سره د ورورۍ په روحيې سره بايد چلند کړی.
Kurdish, Northern Hemû mirov azad û di weqar û mafan de wekhev tên dinyayê. Ew xwedî hiş û şuûr in û divê li hember hev bi zihniyeteke bratiyê bilivin.
Tajik Тамоми одамон озод ва аз лиҳози шарафу ҳуқуқ ба ҳам баробар ба дунё меоянд. Онҳо соҳиби ақлу виҷдонанд ва бояд бо якдигар муносибати бародарона дошта бошанд.
Iranian Persian تمام افراد بشر آزاد بدنیا میایند و از لحاظ حیثیت و حقوق با هم برابرند. همه دارای عقل و وجدان میباشند و باید نسبت بیکدیگر با روح برادری رفتار کنند.
Dari تمام افراد بشر آزاد به دنیا می‌آیند و از لحاظ حیثیت و حقوق با هم برابرند. همه دارای عقل و وجدان هستند و باید نسبت به یکدیگر با روح برادری رفتار کنند.

Vulnerability

This table summarizes the vulnerability of various Iranian languages. Data is derived from the ‘Atlas of the World’s Languages in Danger, © UNESCO, http://www.unesco.org/culture/languages-atlas’ and Ethnologue.

Language ISO 639-3 Location Speakers Status (Ethnologue) Status (UNESCO)
Avestan ave Iran 0 10 (Extinct) -
Pahlavani phv Afghanistan 0 9 (Dormant) -
Koroshi ktl Iran 180 8b (Nearly extinct) 4 (Critically endangered)
Kumzari zum Oman 2,300 8a (Moribund) 3 (Severely endangered)
Parachi prc Afghanistan 3,500 7 (Shifting) 2 (Definitely endangered)
Bashkardi bsg Iran 7,030 7 (Shifting) 2 (Definitely endangered)
Gazi gzi Iran 7,030 7 (Shifting) 2 (Definitely endangered)
Sivandi siy Iran 7,030 7 (Shifting) -
Fars, Northwestern faz Iran 7,500 7 (Shifting) -
Dari, Zoroastrian gbz Iran 8,000 7 (Shifting) 2 (Definitely endangered)
Shabak sdb Iraq 15,000 7 (Shifting) -
Karingani kgn Iran 17,600 7 (Shifting) -
Vafsi vaf Iran 18,000 7 (Shifting) 2 (Definitely endangered)
Bajelani bjm Iraq 20,000 7 (Shifting) -
Ashtiani atn Iran 21,100 7 (Shifting) 2 (Definitely endangered)
Khunsari kfm Iran 21,100 7 (Shifting) 2 (Definitely endangered)
Tat, Muslim ttt Azerbaijan 28,010 7 (Shifting) 3 (Severely endangered)
Harzani hrz Iran 28,100 7 (Shifting) -
Dzhidi jpr Israel & Iran 60,000 7 (Shifting) 2 (Definitely endangered)
Fars, Southwestern fay Iran 100,000 7 (Shifting) -
Bukharic bhh Israel & Uzbekistan 110,000 7 (Shifting) 2 (Definitely endangered)
Gurani hac Iraq 200,000 7 (Shifting) -
Takestani tks Iran 220,000 7 (Shifting) -
Mazanderani mzn Iran 3,270,000 7 (Shifting) -
Alviri-Vidari avd Iran ? 7 (Shifting) -
Eshtehardi esh Iran ? 7 (Shifting) -
Gozarkhani goz Iran ? 7 (Shifting) -
Kabatei xkp Iran ? 7 (Shifting) -
Kajali xkj Iran ? 7 (Shifting) -
Kho’ini xkc Iran ? 7 (Shifting) -
Koresh-e Rostam okh Iran ? 7 (Shifting) -
Maraghei vmh Iran ? 7 (Shifting) -
Razajerdi rat Iran ? 7 (Shifting) -
Rudbari rdb Iran ? 7 (Shifting) -
Shahrudi shm Iran ? 7 (Shifting) -
Taromi, Upper tov Iran ? 7 (Shifting) -
Sarli sdf Iraq Fewer than 20,000 7 (Shifting) -
Ishkashimi isk Afghanistan 3,000 6b (Threatened) -
Yidgha ydg Pakistan 6,150 6b (Threatened) 2 (Definitely endangered)
Yazgulyam yah Tajikistan 9,000 6b (Threatened) 3 (Severely endangered)
Yagnobi yai Tajikistan 12,000 6b (Threatened) 2 (Definitely endangered)
Sarikoli srh China 16,000 6b (Threatened) 2 (Definitely endangered)
Lasgerdi lsa Iran 1,000 6a (Vigorous) -
Sanglechi sgy Afghanistan 2,200 6a (Vigorous) -
Munji mnj Afghanistan 5,300 6a (Vigorous) 3 (Severely endangered)
Ormuri oru Pakistan, Afghanistan 6,050 6a (Vigorous) 2 (Definitely endangered)
Natanzi ntz Iran 7,030 6a (Vigorous) 3 (Severely endangered)
Nayini nyq Iran 7,030 6a (Vigorous) 3 (Severely endangered)
Soi soj Iran 7,030 6a (Vigorous) -
Sorkhei sqo Iran 10,000 6a (Vigorous) -
Dehwari deh Pakistan 13,000 6a (Vigorous) -
Sangisari sgr Iran 36,000 6a (Vigorous) -
Wakhi wbl China, Pakistan, Tajikistan, Afghanistan 47,100 6a (Vigorous) 2 (Definitely endangered)
Semnani smy Iran 60,000 6a (Vigorous) -
Shughni sgh Tajikistan 80,000 6a (Vigorous) 3 (Severely endangered)
Lari lrl Iran 80,000 6a (Vigorous) -
Waneci wne Pakistan 95,000 6a (Vigorous) -
Parsi prp India 326,000 6a (Vigorous) -
Parsi-Dari prd Iran 350,000 6a (Vigorous) -
Aimaq aiq Afghanistan 650,000 6a (Vigorous) -
Luri, Southern luz Iran 875,000 6a (Vigorous) -
Laki lki Iran 1,000,000 6a (Vigorous) -
Bakhtiâri bqi Iran 1,000,000 6a (Vigorous) -
Luri, Northern lrc Iran 1,500,000 6a (Vigorous) -
Kurdish, Southern sdh Iran 3,000,000 6a (Vigorous) -
Pashto, Central pst Pakistan 7,920,000 6a (Vigorous) -
Shahmirzadi srz Iran ? 6a (Vigorous) -
Dezfuli def Iran ? 6a (Vigorous) -
Khalaj kjf Azerbaijan 42,100 5 (Developing) -
Ossetic oss Georgia, Russian Federation 577,450 5 (Developing) 1 (Vulnerable)
Hazaragi haz Afghanistan 2,210,000 5 (Developing) -
Judeo-Tat jdt Azerbaijan, Russian Federation 2,010 4 (Educational) 2 (Definitely endangered)
Zazaki, Northern kiu Turkey 140,000 4 (Educational) -
Talysh tly Azerbaijan, Iran 915,400 4 (Educational) 1 (Vulnerable)
Zazaki, Southern diq Turkey 1,500,000 4 (Educational) -
Balochi, Western bgn Pakistan 1,799,840 4 (Educational) -
Balochi, Eastern bgp Pakistan 1,800,800 4 (Educational) -
Gilaki glk Iran 3,270,000 4 (Educational) -
Balochi, Southern bcc Pakistan 3,405,000 4 (Educational) -
Pashto, Northern pbu Pakistan 11,430,000 4 (Educational) -
Kurdish, Northern kmr Turkey 20,210,872 3 (Wider communication) -
Kurdish, Central ckb Iraq 6,750,000 2 (Provincial) -
Tajiki tgk Tajikistan 4,479,650 1 (National) -
Pashto, Southern pbt Afghanistan 7,590,100 1 (National) -
Dari prs Afghanistan 9,600,000 1 (National) -
Persian, Iranian pes Iran 47,045,100 1 (National) -
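
The Ethnologue column mixes plain numbers with lettered sublevels (6a, 6b, 8a, 8b), which appear to follow Ethnologue's EGIDS scale, so sorting the table by status needs a small key function. The following is one possible sketch. The function name is hypothetical and the four rows are copied from the table above.

```python
import re


def egids_key(status: str) -> tuple:
    """Turn a status such as '6b (Threatened)' into a sortable (level, suffix) pair."""
    match = re.match(r"(\d+)([ab]?)", status.strip())
    if match is None:
        raise ValueError(f"unrecognised status: {status!r}")
    return int(match.group(1)), match.group(2)


rows = [
    ("Gilaki", "4 (Educational)"),
    ("Avestan", "10 (Extinct)"),
    ("Yagnobi", "6b (Threatened)"),
    ("Laki", "6a (Vigorous)"),
]

# Most endangered first: higher levels (and 'b' before 'a') sort earlier.
ordered = sorted(rows, key=lambda row: egids_key(row[1]), reverse=True)
print([name for name, _ in ordered])
# ['Avestan', 'Yagnobi', 'Laki', 'Gilaki']
```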

Classification