User:Sushain/SemeticLanguages

From Apertium
Latest revision as of 07:52, 3 January 2014

The Semitic languages (sem) are a group of related languages forming a branch of the Afro-Asiatic language family. They are spoken by more than 470 million people, mainly in North Africa, the Horn of Africa, and Southwest Asia. The most widely spoken are Arabic, Amharic, Tigrigna, and Hebrew; Maltese is much smaller, with around 429,000 speakers.

The plan is to build an independent finite-state transducer (morphological analyser and generator) for each language, and then to write a bilingual dictionary and transfer rules for each language pair. The current status of this work is listed below.
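As a rough worked count of what this plan implies, assuming that each bilingual dictionary serves both directions of its pair while transfer rules are written once per translation direction, n languages need:

```latex
\underbrace{n}_{\text{transducers}}, \qquad
\underbrace{\binom{n}{2} = \frac{n(n-1)}{2}}_{\text{bilingual dictionaries}}, \qquad
\underbrace{n(n-1)}_{\text{translation directions with transfer rules}}
```

For the three languages with transducers (Arabic, Hebrew, Maltese), that is 3 transducers, 3 bilingual dictionaries, and 6 translation directions.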

Status

The ultimate goal is to have reusable, multi-purpose transducers for a variety of Semitic languages. Any two of them can then be paired for X→Y translation by adding a Constraint Grammar (CG) for language X and a bilingual dictionary and transfer rules for the pair X→Y. The development progress of each language's transducer and of each language pair is listed below.
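A toy sketch of the X→Y chain just described, assuming X = Maltese and Y = Arabic. Every stage is a hypothetical stand-in: Python dictionaries instead of compiled lttoolbox transducers, a "keep the first reading" function instead of a CG, and a lemma swap instead of real transfer rules. The `*` (unknown word), `@` (missing from the bilingual dictionary) and `#` (generation failure) prefixes imitate the marks Apertium puts in its output.

```python
from typing import Dict, List, Tuple

Analysis = Tuple[str, Tuple[str, ...]]  # (lemma, morphological tags)

# Hypothetical toy lexicons; not taken from apertium-mlt or apertium-mt-ar.
ANALYSER_X: Dict[str, List[Analysis]] = {
    "kelb": [("kelb", ("n", "m", "sg"))],
    "kbir": [("kbir", ("adj", "m", "sg"))],
}
BIDIX_XY: Dict[str, str] = {"kelb": "كلب", "kbir": "كبير"}
GENERATOR_Y: Dict[Analysis, str] = {
    ("كلب", ("n", "m", "sg")): "كلب",
    ("كبير", ("adj", "m", "sg")): "كبير",
}


def disambiguate(readings: List[Analysis]) -> Analysis:
    """Stand-in for a Constraint Grammar: keep the first reading.

    A real CG would remove readings using rules over the surrounding context.
    """
    return readings[0]


def transfer(analysis: Analysis) -> Analysis:
    """Stand-in for transfer rules: swap the lemma via the bidix, keep the tags."""
    lemma, tags = analysis
    return BIDIX_XY[lemma], tags


def translate(sentence: str) -> str:
    output: List[str] = []
    for token in sentence.split():
        readings = ANALYSER_X.get(token)
        if not readings:
            output.append("*" + token)  # unknown to the X analyser
            continue
        lemma, tags = disambiguate(readings)
        if lemma not in BIDIX_XY:
            output.append("@" + lemma)  # no entry in the X→Y dictionary
            continue
        target = transfer((lemma, tags))
        # Generate the Y surface form; '#' marks a failed generation.
        output.append(GENERATOR_Y.get(target, "#" + target[0]))
    return " ".join(output)


print(translate("kelb kbir"))    # كلب كبير
print(translate("kelb qattus"))  # كلب *qattus
```

The point of the split is that ANALYSER_X and GENERATOR_Y belong to the monolingual modules and can be reused by every pair, while only BIDIX_XY and the transfer step are specific to X→Y.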

Transducers

A transducer can be called "working" once it reaches about 80% coverage (the share of tokens that receive at least one analysis) on a range of medium-to-large corpora, and "production" once its coverage exceeds 90%.
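A minimal sketch of how the coverage figure and these thresholds could be computed, assuming the input is the text output of lt-proc, in which each token is a `^surface/analysis/...$` unit and an unknown word appears as `^word/*word$`. The function names and the "development" label for coverage below 80% are illustrative choices, not part of any Apertium tool.

```python
import re
from typing import List

# Matches one lexical unit ^...$ in lt-proc output, allowing backslash escapes.
LEXICAL_UNIT = re.compile(r"\^((?:\\.|[^\\$])*)\$")


def naive_coverage(lt_proc_output: str) -> float:
    """Share of tokens that received at least one analysis.

    Assumes an unknown word comes out as ^word/*word$, i.e. the text after
    the first '/' starts with '*'.
    """
    units: List[str] = LEXICAL_UNIT.findall(lt_proc_output)
    if not units:
        return 0.0
    unknown = 0
    for unit in units:
        parts = unit.split("/", 1)
        if len(parts) == 2 and parts[1].startswith("*"):
            unknown += 1
    return (len(units) - unknown) / len(units)


def transducer_state(coverage: float) -> str:
    """Apply the rough thresholds: about 80% working, over 90% production."""
    if coverage > 0.90:
        return "production"
    if coverage >= 0.80:
        return "working"
    return "development"


sample = "^kelb/kelb<n><m><sg>$ ^kbir/kbir<adj><m><sg>$ ^xyz/*xyz$ ^./.<sent>$"
cov = naive_coverage(sample)
print(f"{cov:.0%}", transducer_state(cov))  # 75% development
```

In practice the input might come from running something like `lt-proc mlt.automorf.bin < corpus.txt`; the file names there are illustrative.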

name | language | native name | ISO 639-1 | ISO 639-3 | formalism | state | stems | paradigms | coverage | location | primary authors
apertium-heb | Hebrew | עִבְרִית | he | heb | lttoolbox | development |  |  |  | apertium-ara-heb (incubator) | missmaryx
apertium-mlt | Maltese | Malti | mt | mlt | lttoolbox | development | 7,371 | 758 |  | apertium-mlt (languages) | Fran, Unhammer, Fronczak
apertium-ara | Arabic | العربية | ar | ara | lttoolbox | development |  |  |  | apertium-ara-heb (incubator) | missmaryx
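The stems and paradigms columns summarise the size of each monolingual dictionary. The page does not say exactly how they are counted; the sketch below assumes a stem is an `<e>` entry directly inside a `<section>` of an lttoolbox `.dix` file and a paradigm is a `<pardef>`, which may differ from the counting behind the published figures.

```python
import xml.etree.ElementTree as ET
from typing import Tuple


def count_monodix(root: ET.Element) -> Tuple[int, int]:
    """Return (stems, paradigms) for a monolingual lttoolbox dictionary.

    Assumed definitions: a stem is an <e> entry directly inside a <section>,
    a paradigm is a <pardef> inside <pardefs>.
    """
    stems = len(root.findall("./section/e"))
    paradigms = len(root.findall("./pardefs/pardef"))
    return stems, paradigms


def count_monodix_file(path: str) -> Tuple[int, int]:
    """Same count, reading the dictionary from a .dix file on disk."""
    return count_monodix(ET.parse(path).getroot())


# Hypothetical toy dictionary: two paradigms, three entries.
toy_dix = """
<dictionary>
  <sdefs><sdef n="n"/><sdef n="adj"/></sdefs>
  <pardefs>
    <pardef n="kelb__n"><e><p><l></l><r><s n="n"/></r></p></e></pardef>
    <pardef n="kbir__adj"><e><p><l></l><r><s n="adj"/></r></p></e></pardef>
  </pardefs>
  <section id="main" type="standard">
    <e lm="kelb"><i>kelb</i><par n="kelb__n"/></e>
    <e lm="kbir"><i>kbir</i><par n="kbir__adj"/></e>
    <e lm="qattus"><i>qattus</i><par n="kelb__n"/></e>
  </section>
</dictionary>
"""
print(count_monodix(ET.fromstring(toy_dix.strip())))  # (3, 2)
```

The `<e>` elements inside each `<pardef>` are deliberately not counted as stems, since they describe inflectional endings rather than lexical entries.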

Existing language pairs

Each pair is labelled with where it is developed: (incubator) for pairs in the incubator, (staging) for developing pairs in staging, and (trunk) for stable, well-working pairs in trunk. The number after each pair is its count of bidix stems, as counted with dixcounter.

Language | heb | mlt | ara
heb | - | mt-he (staging), 3,634 | ara-heb (incubator), 131
mlt | mt-he (staging), 3,634 | - | mt-ar (staging), 7,570
ara | ara-heb (incubator), 131 | mt-ar (staging), 7,570 | -
eng |  | en-mt (incubator), 814 | 
epo | eo-he (incubator), 1,505 |  | 
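The matrix above is symmetric because one bilingual dictionary (bidix) serves both translation directions of a pair. The sketch below is a hypothetical approximation of a bidix stem count, not dixcounter's actual logic: it counts `<e>` entries directly inside `<section>` elements and, assuming the lttoolbox convention that `r="LR"` or `r="RL"` restricts an entry to one direction, also reports how many entries each direction can use.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def count_bidix(root: ET.Element) -> Dict[str, int]:
    """Count bidix entries overall and per translation direction.

    Assumptions: entries are <e> elements directly inside <section>; an entry
    with r="LR" is used only left-to-right, r="RL" only right-to-left, and an
    entry without r is used in both directions.
    """
    counts = {"total": 0, "LR": 0, "RL": 0}
    for entry in root.findall("./section/e"):
        counts["total"] += 1
        restriction = entry.get("r")
        if restriction in (None, "LR"):
            counts["LR"] += 1
        if restriction in (None, "RL"):
            counts["RL"] += 1
    return counts


# Hypothetical toy Maltese-Arabic bidix with one unrestricted entry
# and one entry restricted to each direction.
toy_bidix = """
<dictionary>
  <section id="main" type="standard">
    <e><p><l>kelb<s n="n"/></l><r>كلب<s n="n"/></r></p></e>
    <e r="LR"><p><l>kbir<s n="adj"/></l><r>كبير<s n="adj"/></r></p></e>
    <e r="RL"><p><l>kbir<s n="adj"/></l><r>عظيم<s n="adj"/></r></p></e>
  </section>
</dictionary>
""".strip()
print(count_bidix(ET.fromstring(toy_bidix)))  # {'total': 3, 'LR': 2, 'RL': 2}
```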

Semitic languages by subgroup

There are six fairly uncontroversial nodes within the Semitic languages:

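  • East Semitic languages: Akkadian and Eblaite (both extinct)
  • Central Semitic languages
    • Northwest Semitic languages: Aramaic, Canaanite languages, Hebrew
    • Arabic languages: Classical Arabic, Standard Arabic, Maltese, etc.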
  • South Semitic languages
    • Western: Ethiopic languages (Amharic, Tigrinya, etc.) and Old South Arabian languages (Sabaean, Minaean, Qatabānian, Ḥaḑramitic, etc.)
    • Eastern: Modern South Arabian languages (Bathari, Harsusi, Hobyót, Mehri, Shehri, Soqotri)

Samples

Article 1 of the Universal Declaration of Human Rights:

All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience and should act towards one another in a spirit of brotherhood.

Language Text
Arabic يولد جميع الناس أحرارًا متساوين في الكرامة والحقوق. وقد وهبوا عقلاً وضميرًا وعليهم أن يعامل بعضهم بعضًا بروح الإخاء.
Maltese Il-bnedmin kollha jitwieldu ħielsa u ugwali fid-dinjità u d-drittijiet. Huma mogħnija bir-raġuni u bil-kuxjenza u għandhom iġibu ruħhom ma’ xulxin bi spirtu ta’ aħwa.
Hebrew כל בני אדם נולדו בני חורין ושווים בערכם ובזכויותיהם. כולם חוננו בתבונה ובמצפון, לפיכך חובה עליהם לנהוג איש ברעהו ברוח של אחוה.
Amharic የሰው፡ልጅ፡ሁሉ፡ሲወለድ፡ነጻና፡በክብርና፡በመብትም፡እኩልነት፡ያለው፡ነው።፡የተፈጥሮ፡ማስተዋልና፡ሕሊና፡ስላለው፡አንዱ፡ሌላውን፡በወንድማማችነት፡መንፈስ፡መመልከት፡ይገባዋል።
Tigrigna ብመንፅር ክብርን መሰልን ኩሎም ሰባት እንትውለዱ ነፃን ማዕሪን እዮም፡፡ ምስትውዓልን ሕልናን ዝተዓደሎም ብምዃኖም ንሕድሕዶም ብሕውነታዊ መንፈስ ክተሓላለዩ ኦለዎም፡፡

Vulnerability

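This table summarizes the vulnerability of various Semitic languages. The data are taken from the ‘Atlas of the World’s Languages in Danger, © UNESCO, http://www.unesco.org/culture/languages-atlas’ and from Ethnologue. The Ethnologue column uses the EGIDS scale, from 0 (International) to 10 (Extinct); the UNESCO column uses the Atlas's degrees of endangerment, where 5 (Extinct) is the most severe. On both scales a higher number means greater endangerment, and a dash means the UNESCO Atlas gives no rating.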
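The rows of the table appear to be ordered from most to least endangered on the EGIDS scale, with fewer speakers first within a level; this is inferred from the data, not stated by the sources. The sketch below reproduces that ordering. The `Row` shape is a hypothetical simplification, and encoding the "a"/"b" sublevels (6a/6b, 8a/8b) as half-steps is an arbitrary but order-preserving choice.

```python
import math
import re
from typing import List, Tuple

Row = Tuple[str, str, str]  # (language, Ethnologue status, speakers)


def egids_rank(status: str) -> float:
    """'8b (Nearly extinct)' -> 8.5; higher means more endangered."""
    match = re.match(r"\s*(\d+)([ab]?)", status)
    if match is None:
        raise ValueError(f"not an EGIDS status: {status!r}")
    level, sublevel = int(match.group(1)), match.group(2)
    return level + (0.5 if sublevel == "b" else 0.0)


def speakers(value: str) -> float:
    """'10,350' -> 10350.0; an unknown count such as '?' sorts last."""
    value = value.replace(",", "").strip()
    return float(value) if value.isdigit() else math.inf


def sort_rows(rows: List[Row]) -> List[Row]:
    # Most endangered first, then fewest speakers first within a level.
    return sorted(rows, key=lambda r: (-egids_rank(r[1]), speakers(r[2])))


rows = [
    ("Mehri", "7 (Shifting)", "115,200"),
    ("Zay", "6b (Threatened)", "4,880"),
    ("Bathari", "8b (Nearly extinct)", "200"),
    ("Harsusi", "7 (Shifting)", "600"),
    ("Hobyót", "8a (Moribund)", "100"),
]
print([name for name, _, _ in sort_rows(rows)])
# ['Bathari', 'Hobyót', 'Harsusi', 'Mehri', 'Zay']
```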

Language | ISO 639-3 | Location | Speakers | Status (Ethnologue) | Status (UNESCO)
Jewish Babylonian Aramaic | tmr | Iraq | 0 | 10 (Extinct) | -
Mlahsö | lhs | Syrian Arab Republic | 0 | 10 (Extinct) | 5 (Extinct)
Mandaic, Classical | myz | Iran | 0 | 10 (Extinct) | -
Mesmes | mys | Ethiopia | 0 | 10 (Extinct) | -
Syriac | syc | Turkey | 0 | 9 (Dormant) | -
Hebrew, Ancient | hbo | Israel | 0 | 9 (Dormant) | -
Geez | gez | Ethiopia | 0 | 9 (Second language only) | 5 (Extinct)
Samaritan Aramaic | sam | Palestine | 620 | 9 (Dormant) | -
Samaritan | smp | Palestine | 620 | 9 (Dormant) | -
Barzani Jewish Neo-Aramaic | bjf | Israel & Iraq | 20 | 8b (Nearly extinct) | 5 (Extinct)
Bathari | bhm | Oman | 200 | 8b (Nearly extinct) | 4 (Critically endangered)
Senaya | syn | Iran | 460 | 8b (Nearly extinct) | 4 (Critically endangered)
Hobyót | hoh | Oman, Yemen | 100 | 8a (Moribund) | 3 (Severely endangered)
Arabic, Uzbeki Spoken | auz | Uzbekistan | 700 | 8a (Moribund) | -
Hulaulá | huy | Israel & Iran | 10,350 | 8a (Moribund) | 5 (Extinct)
Soqotri | sqt | Yemen | 64,000 | 8a (Moribund) | 3 (Severely endangered)
Harsusi | hss | Oman | 600 | 7 (Shifting) | 2 (Definitely endangered)
Bohtan Neo-Aramaic | bhn | Georgia, Russian Federation | 1,000 | 7 (Shifting) | 3 (Severely endangered)
Arabic, Cypriot Spoken | acy | Cyprus | 1,300 | 7 (Shifting) | 3 (Severely endangered)
Lishanid Noshan | aij | Israel & Iraq | 2,200 | 7 (Shifting) | 5 (Extinct)
Lishán Didán | trg | Israel & Iran | 4,450 | 7 (Shifting) | 5 (Extinct)
Mandaic | mid | Iran, Iraq | 5,500 | 7 (Shifting) | 4 (Critically endangered)
Lishana Deni | lsd | Israel & Iraq | 7,500 | 7 (Shifting) | 5 (Extinct)
Western Neo-Aramaic | amw | Syrian Arab Republic | 15,000 | 7 (Shifting) | 2 (Definitely endangered)
Arabic, Judeo-Tripolitanian | yud | Israel | 35,000 | 7 (Shifting) | -
Arabic, Judeo-Tunisian | ajt | Israel | 45,500 | 7 (Shifting) | 3 (Severely endangered)
Mehri | gdq | Oman, Yemen | 115,200 | 7 (Shifting) | 2 (Definitely endangered)
Arabic, Judeo-Iraqi | yhd | Israel | 151,820 | 7 (Shifting) | -
Chaldean Neo-Aramaic | cld | Iraq | 206,000 | 7 (Shifting) | -
Arabic, Judeo-Moroccan | aju | Israel | 258,930 | 7 (Shifting) | 2 (Definitely endangered)
Zay | zwa | Ethiopia | 4,880 | 6b (Threatened) | 3 (Severely endangered)
Arabic, Tajiki Spoken | abh | Tajikistan | 6,000 | 6b (Threatened) | -
Shehri | shv | Oman | 25,000 | 6b (Threatened) | 3 (Severely endangered)
Argobba | agj | Ethiopia | 43,700 | 6b (Threatened) | 4 (Critically endangered)
Turoyo | tru | Syrian Arab Republic, Turkey | 62,000 | 6b (Threatened) | 3 (Severely endangered)
Assyrian Neo-Aramaic | aii | Iraq | 232,300 | 6b (Threatened) | -
Koy Sanjaq Surat | kqd | Iraq | 800 | 6a (Vigorous) | -
Hértevin | hrt | Turkey | 1,000 | 6a (Vigorous) | 4 (Critically endangered)
Dahalik | dlk | Eritrea | 2,500 | 6a (Vigorous) | -
Harari | har | Ethiopia | 25,800 | 6a (Vigorous) | -
Arabic, Shihhi Spoken | ssh | United Arab Emirates | 27,000 | 6a (Vigorous) | -
Arabic, Judeo-Yemeni | jye | Israel | 51,000 | 6a (Vigorous) | -
Arabic, Dhofari Spoken | adf | Oman | 70,000 | 6a (Vigorous) | -
Arabic, Algerian Saharan Spoken | aao | Algeria | 130,500 | 6a (Vigorous) | -
Mesqan mvz Ethiopia 195,000 6a (Vigorous) -
Kistane gru Ethiopia 255,000 6a (Vigorous) -
Inor ior Ethiopia 280,000 6a (Vigorous) -
Arabic, Hadrami Spoken ayh Yemen 410,000 6a (Vigorous) -
Arabic, Eastern Egyptian Bedawi Spoken avl Egypt 1,690,000 6a (Vigorous) -
Arabic, Gulf Spoken afb Iraq 3,601,000 6a (Vigorous) -
Arabic, Hijazi Spoken acw Saudi Arabia 6,023,900 6a (Vigorous) -
Arabic, North Mesopotamian Spoken ayp Iraq 6,300,000 6a (Vigorous) -
Arabic, Ta’izzi-Adeni Spoken acq Yemen 7,078,500 6a (Vigorous) -
Arabic, Sanaani Spoken ayn Yemen 7,600,000 6a (Vigorous) -
Arabic, Sa’idi Spoken aec Egypt 19,000,000 6a (Vigorous) -
Wolane wle Ethiopia ? 6a (Vigorous) -
Sebat Bet Gurage sgw Ethiopia 440,000 5 (Developing) -
Arabic, Omani Spoken acx Oman 853,900 5 (Developing) -
Silt’e stv Ethiopia 935,000 4 (Educational) -
Tigré tig Eritrea 1,050,000 4 (Educational) -
Arabic, Baharna Spoken abv Bahrain 310,000 3 (Wider communication) -
Arabic, Chadian Spoken shu Chad 1,139,100 3 (Wider communication) -
Arabic, Sudanese Spoken apd Sudan 1,833,000 3 (Wider communication) -
Hassaniyya mey Mauritania 3,278,190 3 (Wider communication) -
Arabic, Libyan Spoken ayl Libya 4,320,500 3 (Wider communication) -
Arabic, South Levantine Spoken ajp Jordan 6,200,000 3 (Wider communication) -
Arabic, Tunisian Spoken aeb Tunisia 9,406,900 3 (Wider communication) -
Arabic, Najdi Spoken ars Saudi Arabia 9,670,000 3 (Wider communication) -
Arabic, North Levantine Spoken apc Syria 14,426,540 3 (Wider communication) -
Arabic, Mesopotamian Spoken acm Iraq 15,100,000 3 (Wider communication) -
Arabic, Moroccan Spoken ary Morocco 21,048,600 3 (Wider communication) -
Arabic, Algerian Spoken arq Algeria 27,997,000 3 (Wider communication) -
Arabic, Egyptian Spoken arz Egypt 53,990,000 3 (Wider communication) -
Tigrigna tir Ethiopia 6,915,000 2 (Provincial) -
Maltese mlt Malta 429,000 1 (National) -
Hebrew heb Israel 5,302,770 1 (National) -
Amharic amh Ethiopia 21,811,560 1 (National) -
Arabic, Standard arb Saudi Arabia 206,000,000 1 (National) -

This article uses material from the Wikipedia article "Semitic languages", which is released under the Creative Commons Attribution-Share-Alike License 3.0.