User:Sushain/SemeticLanguages
The Semitic languages (sem) constitute a group of related languages and a branch of the Afro-Asiatic language family. They are spoken by more than 470 million people throughout North Africa and Southwest Asia; the most widely spoken are Arabic, Maltese, Hebrew, Amharic, and Tigrigna.
The master plan involves generating independent finite-state transducers for each language, and then making individual dictionaries and transfer rules for every pair. The current status of these goals is listed below.
Status
The ultimate goal is to have multi-purpose transducers for a variety of Semitic languages. These can then be paired for X→Y translation by adding a constraint grammar (CG) for language X and transfer rules and a dictionary for the pair X→Y. Development progress for each language's transducers and dictionary pairs is listed below.
Transducers
Once a transducer reaches ~80% coverage on a range of medium-to-large corpora, we can say it is "working". Above 90%, it can be considered "production".
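The coverage thresholds above can be sketched in a few lines of Python. This is only an illustration, assuming the analyser output is in Apertium stream format, where each token appears as `^surface/analysis$` and an unknown word is marked with a leading asterisk (`^word/*word$`). The function names `coverage` and `state` are hypothetical, the sketch ignores escaped characters, and the label "development" for anything below 80% is an assumption borrowed from the state column of the table below.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# Simplification: escaped characters such as \/ or \$ are not handled.
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def coverage(analysed_text: str) -> float:
    """Return the fraction of tokens that received at least one analysis."""
    total = 0
    known = 0
    for _surface, analyses in LEXICAL_UNIT.findall(analysed_text):
        total += 1
        # Unknown words carry a leading asterisk in the analysis slot.
        if not analyses.startswith("*"):
            known += 1
    return known / total if total else 0.0


def state(cov: float) -> str:
    """Map a coverage fraction to the labels used on this page."""
    if cov > 0.90:
        return "production"
    if cov >= 0.80:
        return "working"
    # Assumption: the page does not name this case; "development" is
    # taken from the state column of the transducer table.
    return "development"


if __name__ == "__main__":
    # Made-up sample; the tags are illustrative, not taken from apertium-mlt.
    sample = "^kelb/kelb<n><m><sg>$ ^xyz/*xyz$ ^u/u<cnjcoo>$ ^il/il<det><def>$"
    cov = coverage(sample)
    print(f"{cov:.2%} -> {state(cov)}")
```

In practice the figure should be averaged over several corpora, since the page asks for coverage "on a range of medium-large corpora" rather than on a single text.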
name | language | native name | ISO 639-2 | ISO 639-3 | formalism | state | stems | paradigms | coverage | location | primary authors |
---|---|---|---|---|---|---|---|---|---|---|---|
apertium-heb | Hebrew | עִבְרִית | he | heb | lttoolbox | development | | | | apertium-ara-heb (incubator) | missmaryx |
apertium-mlt | Maltese | Malti | mt | mlt | lttoolbox | development | 7,371 | 758 | | apertium-mlt (languages) | Fran, Unhammer, Fronczak |
apertium-ara | Arabic | العربية | ar | ara | lttoolbox | development | | | | apertium-ara-heb (incubator) | missmaryx |
Existing language pairs
Text in italic denotes language pairs under development / in the incubator. Regular text denotes a functioning language pair in staging, while text in bold denotes a stable well-working language pair in trunk.
language | mlt | heb | ara |
---|---|---|---|
mlt | - | mt-he | mt-ar |
heb | mt-he | - | ara-heb |
ara | mt-ar | ara-heb | - |
eng | en-mt | | |
epo | | eo-he | |
Semitic languages by subgroup
There are six fairly uncontroversial nodes within the Semitic languages:
- East Semitic languages: Akkadian, Eblaite (extinct)
- Central Semitic languages
  - Northwest Semitic languages: Aramaic, Canaanite languages, Hebrew
  - Arabic languages: Classical Arabic, Standard Arabic, Maltese, etc.
- South Semitic languages
  - Western: Ethiopic languages (Amharic, Tigrinya, etc.) and Old South Arabian languages (Sabaean, Minaean, Qatabānian, Ḥaḑramitic, etc.)
  - Eastern: Modern South Arabian languages (Bathari, Harsusi, Hobyót, Mehri, Shehri, Soqotri)
Samples
Article 1 of the Universal Declaration of Human Rights:
All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience and should act towards one another in a spirit of brotherhood.
Language | Text |
---|---|
Arabic | يولد جميع الناس أحرارًا متساوين في الكرامة والحقوق. وقد وهبوا عقلاً وضميرًا وعليهم أن يعامل بعضهم بعضًا بروح الإخاء. |
Maltese | Il-bnedmin kollha jitwieldu ħielsa u ugwali fid-dinjità u d-drittijiet. Huma mogħnija bir-raġuni u bil-kuxjenza u għandhom iġibu ruħhom ma’ xulxin bi spirtu ta’ aħwa. |
Hebrew | כל בני אדם נולדו בני חורין ושווים בערכם ובזכויותיהם. כולם חוננו בתבונה ובמצפון, לפיכך חובה עליהם לנהוג איש ברעהו ברוח של אחוה. |
Amharic | የሰው፡ልጅ፡ሁሉ፡ሲወለድ፡ነጻና፡በክብርና፡በመብትም፡እኩልነት፡ያለው፡ነው።፡የተፈጥሮ፡ማስተዋልና፡ሕሊና፡ስላለው፡አንዱ፡ሌላውን፡በወንድማማችነት፡መንፈስ፡መመልከት፡ይገባዋል። |
Tigrigna | ብመንፅር ክብርን መሰልን ኩሎም ሰባት እንትውለዱ ነፃን ማዕሪን እዮም፡፡ ምስትውዓልን ሕልናን ዝተዓደሎም ብምዃኖም ንሕድሕዶም ብሕውነታዊ መንፈስ ክተሓላለዩ ኦለዎም፡፡ |
Vulnerability
This table summarizes the vulnerability of various Semitic languages. Data is derived from the ‘Atlas of the World’s Languages in Danger, © UNESCO, http://www.unesco.org/culture/languages-atlas’ and Ethnologue.
Language | ISO639-3 | Areas | Vulnerability |
---|---|---|---|
Ge'ez | gez | Ethiopia | 5 - Extinct |
Mlahso (Syria) | lhs | Syrian Arab Republic | 5 - Extinct |
Lishanid Noshan (Iraq) | aij | Iraq | 5 - Extinct |
Lishana Deni (Iraq) | lsd | Iraq | 5 - Extinct |
Lishan Didan (Iran) | trg | Iran (Islamic Republic of) | 5 - Extinct |
Hulaula (Iran) | huy | Iran (Islamic Republic of) | 5 - Extinct |
Barzani Jewish Neo-Aramaic (Iraq) | bjf | Iraq | 5 - Extinct |
Bathari | bhm | Oman | 4 - Critically endangered |
Argobba | agj | Ethiopia | 4 - Critically endangered |
Mandaic | mid | Iran (Islamic Republic of), Iraq | 4 - Critically endangered |
Senaya | syn | Iran (Islamic Republic of) | 4 - Critically endangered |
Hértevin | hrt | Turkey | 4 - Critically endangered |
Soqotri | sqt | Yemen | 3 - Severely endangered |
Jibbali | shv | Oman | 3 - Severely endangered |
Hobyot | hoh | Oman, Yemen | 3 - Severely endangered |
Zay | zwa | Ethiopia | 3 - Severely endangered |
Tunisian Judeo-Arabic (Israel) | ajt | Israel | 3 - Severely endangered |
Cypriot Arabic | acy | Cyprus | 3 - Severely endangered |
Turoyo | tru | Syrian Arab Republic, Turkey | 3 - Severely endangered |
Bohtan Neo-Aramaic | bhn | Georgia, Russian Federation | 3 - Severely endangered |
Mehri | gdq | Oman, Yemen | 2 - Definitely endangered |
Harsusi | hss | Oman | 2 - Definitely endangered |
Moroccan Judeo-Arabic (Israel) | aju | Israel | 2 - Definitely endangered |
Western Neo-Aramaic | amw | Syrian Arab Republic | 2 - Definitely endangered |
Language | ISO639-3 | UNESCO areas | UNESCO vulnerability | Ethnologue speakers | Ethnologue status | Ethnologue location |
---|---|---|---|---|---|---|
Mesmes | mys | ? | ? | No remaining speakers. | 10 (Extinct) | Ethiopia |
Mandaic, Classical | myz | ? | ? | No remaining speakers. | 10 (Extinct) | Iran |
Mlahsö | lhs | Syrian Arab Republic | 5 (Extinct) | No remaining speakers. | 10 (Extinct) | Syria |
Jewish Babylonian Aramaic | tmr | ? | ? | No remaining speakers. | 10 (Extinct) | Iraq |
Geez | gez | Ethiopia | 5 (Extinct) | No known L1 speakers in Ethiopia. | 9 (Second language only) | Ethiopia |
Samaritan | smp | ? | ? | No known L1 speakers in Palestine. Ethnic population: 620 (1999 H. Mutzafi). | 9 (Dormant) | Palestine |
Hebrew, Ancient | hbo | ? | ? | No known L1 speakers. | 9 (Dormant) | Israel |
Samaritan Aramaic | sam | ? | ? | No known L1 speakers in Palestine. Ethnic population: 620 (1999 H. Mutzafi). | 9 (Dormant) | Palestine |
Syriac | syc | ? | ? | No known L1 speakers. | 9 (Dormant) | Turkey |
Soqotri | sqt | Yemen | 3 (Severely endangered) | Population total all countries: 64,000. | 8a (Moribund) | Yemen |
Hobyót | hoh | Oman, Yemen | 3 (Severely endangered) | 100 (1998 H. Mutzafi). | 8a (Moribund) | Oman |
Bathari | bhm | Oman | 4 (Critically endangered) | 200 in Oman (2011). | 8b (Nearly extinct) | Oman |
Arabic, Uzbeki Spoken | auz | ? | ? | 700. | 8a (Moribund) | Uzbekistan |
Senaya | syn | Iran (Islamic Republic of) | 4 (Critically endangered) | Population total all countries: 460. | 8b (Nearly extinct) | Iran |
Hulaulá | huy | Iran (Islamic Republic of) | 5 (Extinct) | Population total all countries: 10,350. | 8a (Moribund) | Israel |
Barzani Jewish Neo-Aramaic | bjf | Iraq | 5 (Extinct) | 20 (2004 H. Mutzafi). | 8b (Nearly extinct) | Israel |
Mehri | gdq | Oman, Yemen | 2 (Definitely endangered) | Population total all countries: 115,200. | 7 (Shifting) | Yemen |
Harsusi | hss | Oman | 2 (Definitely endangered) | 600 (2011). | 7 (Shifting) | Oman |
Arabic, Judeo-Tunisian | ajt | Israel | 3 (Severely endangered) | Population total all countries: 45,500. | 7 (Shifting) | Israel |
Arabic, Judeo-Tripolitanian | yud | ? | ? | Population total all countries: 35,000. | 7 (Shifting) | Israel |
Arabic, Judeo-Moroccan | aju | Israel | 2 (Definitely endangered) | Population total all countries: 258,930. | 7 (Shifting) | Israel |
Arabic, Judeo-Iraqi | yhd | ? | ? | Population total all countries: 151,820. | 7 (Shifting) | Israel |
Arabic, Cypriot Spoken | acy | Cyprus | 3 (Severely endangered) | 1,300 (1995). Ethnic population: 6,000 in Cypriot Maronite ethnic group, 140 Maronites in Kormatiki, 80 to 100 in Limassol, the rest in the Maronite community in Nicosia. | 7 (Shifting) | Cyprus |
Western Neo-Aramaic | amw | Syrian Arab Republic | 2 (Definitely endangered) | 15,000 (1996). 8,000 in Maaloula. | 7 (Shifting) | Syria |
Mandaic | mid | Iran (Islamic Republic of), Iraq | 4 (Critically endangered) | Population total all countries: 5,500. | 7 (Shifting) | Iraq |
Lishanid Noshan | aij | Iraq | 5 (Extinct) | 2,200 (1994 H. Mutzafi). | 7 (Shifting) | Israel |
Lishana Deni | lsd | Iraq | 5 (Extinct) | 7,500 (1999 H. Mutzafi). Ethnic population: 9,060. | 7 (Shifting) | Israel |
Lishán Didán | trg | Iran (Islamic Republic of) | 5 (Extinct) | Population total all countries: 4,450. | 7 (Shifting) | Israel |
Chaldean Neo-Aramaic | cld | ? | ? | Population total all countries: 206,000. | 7 (Shifting) | Iraq |
Bohtan Neo-Aramaic | bhn | Georgia, Russian Federation | 3 (Severely endangered) | 1,000 in Georgia (1999 S. Fox). | 7 (Shifting) | Georgia |
Shehri | shv | Oman | 3 (Severely endangered) | 25,000 (1993 census). | 6b (Threatened) | Oman |
Zay | zwa | Ethiopia | 3 (Severely endangered) | 4,880 (1994 SIL), decreasing. Ethnic population: 4,880. | 6b (Threatened) | Ethiopia |
Wolane | wle | ? | ? | ? | 6a (Vigorous) | Ethiopia |
Harari | har | ? | ? | 25,800 (2007 census). 2,350 monolinguals. 20,000 in Addis Ababa, outside Harar City (Hetzron 1997:486). | 6a (Vigorous) | Ethiopia |
Argobba | agj | Ethiopia | 4 (Critically endangered) | 43,700 (2007 census). 100 monolinguals. | 6b (Threatened) | Ethiopia |
Mesqan | mvz | ? | ? | 195,000 (2007 SIL). Ethnic population: 205,000 (Woreda Farmers’ Cooperatives Office). | 6a (Vigorous) | Ethiopia |
Inor | ior | ? | ? | 280,000. 50,000 Endegeny. | 6a (Vigorous) | Ethiopia |
Kistane | gru | ? | ? | 255,000 (1994 census). Ethnic population: 364,000 (1994 census) including 4,000 Gogot. | 6a (Vigorous) | Ethiopia |
Dahalik | dlk | ? | ? | 2,500 (2012 J. McLaughlin). | 6a (Vigorous) | Eritrea |
Arabic, Ta’izzi-Adeni Spoken | acq | ? | ? | Population total all countries: 7,078,500. | 6a (Vigorous) | Yemen |
Arabic, Tajiki Spoken | abh | ? | ? | Population total all countries: 6,000. | 6b (Threatened) | Tajikistan |
Arabic, Shihhi Spoken | ssh | ? | ? | Population total all countries: 27,000. | 6a (Vigorous) | United Arab Emirates |
Arabic, Sa’idi Spoken | aec | ? | ? | 19,000,000 (2006). | 6a (Vigorous) | Egypt |
Arabic, Sanaani Spoken | ayn | ? | ? | 7,600,000 (1996). | 6a (Vigorous) | Yemen |
Arabic, North Mesopotamian Spoken | ayp | ? | ? | Population total all countries: 6,300,000. | 6a (Vigorous) | Iraq |
Arabic, Judeo-Yemeni | jye | ? | ? | Population total all countries: 51,000. | 6a (Vigorous) | Israel |
Arabic, Hijazi Spoken | acw | ? | ? | Population total all countries: 6,023,900. | 6a (Vigorous) | Saudi Arabia |
Arabic, Hadrami Spoken | ayh | ? | ? | Population total all countries: 410,000. | 6a (Vigorous) | Yemen |
Arabic, Gulf Spoken | afb | ? | ? | Population total all countries: 3,601,000. | 6a (Vigorous) | Iraq |
Arabic, Eastern Egyptian Bedawi Spoken | avl | ? | ? | Population total all countries: 1,690,000. | 6a (Vigorous) | Egypt |
Arabic, Dhofari Spoken | adf | ? | ? | 70,000 (1996). | 6a (Vigorous) | Oman |
Arabic, Algerian Saharan Spoken | aao | ? | ? | Population total all countries: 130,500. | 6a (Vigorous) | Algeria |
Turoyo | tru | Syrian Arab Republic, Turkey | 3 (Severely endangered) | Population total all countries: 62,000. | 6b (Threatened) | Turkey |
Koy Sanjaq Surat | kqd | ? | ? | 800 (1995 H. Mutzafi). | 6a (Vigorous) | Iraq |
Hértevin | hrt | Turkey | 4 (Critically endangered) | 1,000 (1999 H. Mutzafi). | 6a (Vigorous) | Turkey |
Assyrian Neo-Aramaic | aii | ? | ? | Population total all countries: 232,300. | 6b (Threatened) | Iraq |
Sebat Bet Gurage | sgw | ? | ? | 440,000. Chaha 130,000, Gura 20,000, Muher 90,000, Gyeto 80,000, Ezha 120,000. | 5 (Developing) | Ethiopia |
Arabic, Omani Spoken | acx | ? | ? | Population total all countries: 853,900. | 5 (Developing) | Oman |
Silt’e | stv | ? | ? | 935,000 (2007 census). | 4 (Educational) | Ethiopia |
Tigré | tig | ? | ? | 1,050,000 in Eritrea (2006), increasing. | 4 (Educational) | Eritrea |
Hassaniyya | mey | ? | ? | Population total all countries: 3,278,190. | 3 (Wider communication) | Mauritania |
Arabic, Tunisian Spoken | aeb | ? | ? | Population total all countries: 9,406,900. | 3 (Wider communication) | Tunisia |
Arabic, Sudanese Spoken | apd | ? | ? | Population total all countries: 1,833,000. | 3 (Wider communication) | Sudan |
Arabic, South Levantine Spoken | ajp | ? | ? | Population total all countries: 6,200,000. | 3 (Wider communication) | Jordan |
Arabic, North Levantine Spoken | apc | ? | ? | Population total all countries: 14,426,540. | 3 (Wider communication) | Syria |
Arabic, Najdi Spoken | ars | ? | ? | Population total all countries: 9,670,000. | 3 (Wider communication) | Saudi Arabia |
Arabic, Moroccan Spoken | ary | ? | ? | Population total all countries: 21,048,600. | 3 (Wider communication) | Morocco |
Arabic, Mesopotamian Spoken | acm | ? | ? | Population total all countries: 15,100,000. | 3 (Wider communication) | Iraq |
Arabic, Libyan Spoken | ayl | ? | ? | Population total all countries: 4,320,500. | 3 (Wider communication) | Libya |
Arabic, Egyptian Spoken | arz | ? | ? | Population total all countries: 53,990,000. | 3 (Wider communication) | Egypt |
Arabic, Chadian Spoken | shu | ? | ? | Population total all countries: 1,139,100. | 3 (Wider communication) | Chad |
Arabic, Baharna Spoken | abv | ? | ? | Population total all countries: 310,000. | 3 (Wider communication) | Bahrain |
Arabic, Algerian Spoken | arq | ? | ? | Population total all countries: 27,997,000. | 3 (Wider communication) | Algeria |
Tigrigna | tir | ? | ? | Population total all countries: 6,915,000. | 2 (Provincial) | Ethiopia |
Amharic | amh | ? | ? | Population total all countries: 21,811,560. | 1 (National) | Ethiopia |
Hebrew | heb | ? | ? | Population total all countries: 5,302,770. | 1 (National) | Israel |
Maltese | mlt | ? | ? | Population total all countries: 429,000. | 1 (National) | Malta |
Arabic, Standard | arb | ? | ? | 206,000,000 L1 speakers of all Arabic varieties (Wiesenfeld 1999). | 1 (National) | Saudi Arabia |
This article uses material from the Wikipedia article "Semitic languages", which is released under the Creative Commons Attribution-Share-Alike License 3.0.