Difference between revisions of "User:Sushain/SemeticLanguages"
Revision as of 22:42, 2 January 2014
The Semitic languages (sem) are a branch of the Afro-Asiatic language family. They are spoken by more than 470 million people throughout North Africa and Southwest Asia; the most widely spoken Semitic languages are Arabic, Maltese, Hebrew, Amharic, and Tigrigna.
The master plan involves generating independent finite-state transducers for each language, and then making individual dictionaries and transfer rules for every pair. The current status of these goals is listed below.
Status
The ultimate goal is to have multi-purpose transducers for a variety of Semitic languages. These can then be paired for X→Y translation by adding a constraint grammar (CG) for language X and transfer rules and a dictionary for the pair X→Y. Development progress for each language's transducer and for each pair's dictionary is listed below.
Transducers
Once a transducer reaches about 80% coverage on a range of medium-to-large corpora, we consider it "working". Above 90%, it can be considered "production".
name | language | native name | ISO 639-1 | ISO 639-3 | formalism | state | stems | paradigms | coverage | location | primary authors |
---|---|---|---|---|---|---|---|---|---|---|---|
apertium-ara | Arabic | العربية | ar | ara | ? | ? | ? | ? | ? | ? | |
apertium-heb | Hebrew | עִבְרִית | he | heb | ? | ? | ? | ? | ? | ? | |
apertium-mlt | Maltese | Malti | mt | mlt | ? | ? | ? | ? | ? | ? | |
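The coverage thresholds above can be sketched in code. The following is a minimal, naive sketch, assuming the analyser output is in the Apertium stream format (for example as produced by an analyser such as `lt-proc`), where each lexical unit looks like `^surface/analysis$` and an unknown form is marked with a leading `*`. The function names `coverage` and `classify` and the label "in development" are hypothetical, introduced only for illustration; escaped characters in the stream are not handled.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# (naive: backslash-escaped characters inside a unit are not handled)
LEXICAL_UNIT = re.compile(r"\^([^/$^]*)((?:/[^/$^]*)*)\$")


def coverage(analysed_text: str) -> float:
    """Return the fraction of lexical units with at least one known analysis."""
    total = 0
    known = 0
    for match in LEXICAL_UNIT.finditer(analysed_text):
        total += 1
        analyses = [a for a in match.group(2).split("/") if a]
        # An unknown form is echoed back with a leading asterisk.
        if analyses and not analyses[0].startswith("*"):
            known += 1
    return known / total if total else 0.0


def classify(cov: float) -> str:
    """Map a coverage ratio onto the states described in the text."""
    if cov > 0.90:
        return "production"
    if cov >= 0.80:
        return "working"
    return "in development"  # hypothetical label, not from the page
```

In practice the figure would be averaged over several medium-to-large corpora rather than taken from a single text, since the text asks for coverage "on a range" of corpora.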
Semitic languages by subgroup
There are six fairly uncontroversial nodes within the Semitic languages:
- East Semitic languages: Akkadian, Eblaite (extinct)
- Central Semitic languages
  - Northwest Semitic languages: Aramaic, Canaanite languages, Hebrew
  - Arabic languages: Classical Arabic, Standard Arabic, Maltese, etc.
- South Semitic languages
  - Western: Ethiopic languages (Amharic, Tigrinya, etc.) and Old South Arabian languages (Sabaean, Minaean, Qatabānian, Ḥaḑramitic, etc.)
  - Eastern: Modern South Arabian languages (Bathari, Harsusi, Hobyót, Mehri, Shehri, Soqotri)
Existing language pairs
Text in italic denotes language pairs under development / in the incubator. Regular text denotes a functioning language pair in staging, while text in bold denotes a stable well-working language pair in trunk.
 | mlt | heb | ara |
---|---|---|---|
mlt | - | mt-he | mt-ar |
heb | mt-he | - | ara-heb |
ara | mt-ar | ara-heb | - |
eng | en-mt | | |
epo | | eo-he | |
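The legend and table above can also be kept as a small lookup structure. This is only a sketch: the `Status` enum, the `PAIRS` mapping and the `lookup` helper are hypothetical names, and the statuses are a snapshot read from the italic/regular markup of this table in the page's wikitext (italic for ara-heb, en-mt and eo-he; regular for mt-he and mt-ar), so they may be out of date.

```python
from enum import Enum
from typing import Optional, Tuple


class Status(Enum):
    INCUBATOR = "italic"  # under development / in the incubator
    STAGING = "regular"   # functioning pair in staging
    TRUNK = "bold"        # stable, well-working pair in trunk


# Keyed by an unordered pair of ISO 639-3 codes, since the table is symmetric.
PAIRS = {
    frozenset({"mlt", "heb"}): ("mt-he", Status.STAGING),
    frozenset({"mlt", "ara"}): ("mt-ar", Status.STAGING),
    frozenset({"ara", "heb"}): ("ara-heb", Status.INCUBATOR),
    frozenset({"eng", "mlt"}): ("en-mt", Status.INCUBATOR),
    frozenset({"epo", "heb"}): ("eo-he", Status.INCUBATOR),
}


def lookup(lang_a: str, lang_b: str) -> Optional[Tuple[str, Status]]:
    """Return (pair name, status) for two languages, or None if no pair exists."""
    return PAIRS.get(frozenset({lang_a, lang_b}))
```

For example, `lookup("heb", "mlt")` and `lookup("mlt", "heb")` both return the mt-he entry, while `lookup("ara", "eng")` returns `None` because no such pair is listed.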
Samples
Article 1 of the Universal Declaration of Human Rights:
All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience and should act towards one another in a spirit of brotherhood.
Language | Text |
---|---|
Maltese | Il-bnedmin kollha jitwieldu ħielsa u ugwali fid-dinjità u d-drittijiet. Huma mogħnija bir-raġuni u bil-kuxjenza u għandhom iġibu ruħhom ma’ xulxin bi spirtu ta’ aħwa. |
Hebrew | כל בני אדם נולדו בני חורין ושווים בערכם ובזכויותיהם. כולם חוננו בתבונה ובמצפון, לפיכך חובה עליהם לנהוג איש ברעהו ברוח של אחוה. |
Arabic | يولد جميع الناس أحرارًا متساوين في الكرامة والحقوق. وقد وهبوا عقلاً وضميرًا وعليهم أن يعامل بعضهم بعضًا بروح الإخاء. |
Vulnerability
The tables below summarize the vulnerability of various Semitic languages. Data is derived from the ‘Atlas of the World’s Languages in Danger, © UNESCO, http://www.unesco.org/culture/languages-atlas’.
Language | ISO 639-3 | UNESCO areas | UNESCO vulnerability | Ethnologue speakers | Ethnologue status | Ethnologue location |
---|---|---|---|---|---|---|
Tigrigna | tir | ? | ? | 4,320,000 in Ethiopia (2007 census). 2,820,000 monolinguals. Population total all countries: 6,915,000. | 2 (Provincial) | Ethiopia |
Amharic | amh | ? | ? | 21,600,000 in Ethiopia (2007 census). 14,750,000 monolinguals. Population total all countries: 21,811,560. | 1 (National) | Ethiopia |
Hebrew | heb | ? | ? | 4,850,000 in Israel (1998). Population total all countries: 5,302,770. | 1 (National) | Israel |
Maltese | mlt | ? | ? | 300,000 in Malta (Katzner 1975). Population total all countries: 429,000. | 1 (National) | Malta |
Arabic, Standard | arb | ? | ? | 206,000,000 L1 speakers of all Arabic varieties (Wiesenfeld 1999). | 1 (National) | Saudi Arabia |
Language | ISO 639-3 | Areas | Vulnerability |
---|---|---|---|
Ge'ez | gez | Ethiopia | 5 - Extinct |
Mlahso (Syria) | lhs | Syrian Arab Republic | 5 - Extinct |
Lishanid Noshan (Iraq) | aij | Iraq | 5 - Extinct |
Lishana Deni (Iraq) | lsd | Iraq | 5 - Extinct |
Lishan Didan (Iran) | trg | Iran (Islamic Republic of) | 5 - Extinct |
Hulaula (Iran) | huy | Iran (Islamic Republic of) | 5 - Extinct |
Barzani Jewish Neo-Aramaic (Iraq) | bjf | Iraq | 5 - Extinct |
Bathari | bhm | Oman | 4 - Critically endangered |
Argobba | agj | Ethiopia | 4 - Critically endangered |
Mandaic | mid | Iran (Islamic Republic of), Iraq | 4 - Critically endangered |
Senaya | syn | Iran (Islamic Republic of) | 4 - Critically endangered |
Hértevin | hrt | Turkey | 4 - Critically endangered |
Soqotri | sqt | Yemen | 3 - Severely endangered |
Jibbali | shv | Oman | 3 - Severely endangered |
Hobyot | hoh | Oman, Yemen | 3 - Severely endangered |
Zay | zwa | Ethiopia | 3 - Severely endangered |
Tunisian Judeo-Arabic (Israel) | ajt | Israel | 3 - Severely endangered |
Cypriot Arabic | acy | Cyprus | 3 - Severely endangered |
Turoyo | tru | Syrian Arab Republic, Turkey | 3 - Severely endangered |
Bohtan Neo-Aramaic | bhn | Georgia, Russian Federation | 3 - Severely endangered |
Mehri | gdq | Oman, Yemen | 2 - Definitely endangered |
Harsusi | hss | Oman | 2 - Definitely endangered |
Moroccan Judeo-Arabic (Israel) | aju | Israel | 2 - Definitely endangered |
Western Neo-Aramaic | amw | Syrian Arab Republic | 2 - Definitely endangered |
Language | ISO 639-3 | UNESCO areas | UNESCO vulnerability | Ethnologue speakers | Ethnologue status | Ethnologue location |
---|---|---|---|---|---|---|
Mesmes | mys | ? | ? | No remaining speakers. | 10 (Extinct) | Ethiopia |
Mandaic, Classical | myz | ? | ? | No remaining speakers. | 10 (Extinct) | Iran |
Mlahsö | lhs | Syrian Arab Republic | 5 (Extinct) | No remaining speakers. | 10 (Extinct) | Syria |
Jewish Babylonian Aramaic | tmr | ? | ? | No remaining speakers. | 10 (Extinct) | Iraq |
Geez | gez | Ethiopia | 5 (Extinct) | No known L1 speakers in Ethiopia. | 9 (Second language only) | Ethiopia |
Samaritan | smp | ? | ? | No known L1 speakers in Palestine. Ethnic population: 620 (1999 H. Mutzafi). | 9 (Dormant) | Palestine |
Hebrew, Ancient | hbo | ? | ? | No known L1 speakers. | 9 (Dormant) | Israel |
Samaritan Aramaic | sam | ? | ? | No known L1 speakers in Palestine. Ethnic population: 620 (1999 H. Mutzafi). | 9 (Dormant) | Palestine |
Syriac | syc | ? | ? | No known L1 speakers. | 9 (Dormant) | Turkey |
Soqotri | sqt | Yemen | 3 (Severely endangered) | 57,000 in Yemen (1990 census). Population total all countries: 64,000. | 8a (Moribund) | Yemen |
Hobyót | hoh | Oman, Yemen | 3 (Severely endangered) | 100 (1998 H. Mutzafi). | 8a (Moribund) | Oman |
Bathari | bhm | Oman | 4 (Critically endangered) | 200 in Oman (2011). | 8b (Nearly extinct) | Oman |
Arabic, Uzbeki Spoken | auz | ? | ? | 700. | 8a (Moribund) | Uzbekistan |
Senaya | syn | Iran (Islamic Republic of) | 4 (Critically endangered) | 60 in Iran (1997 H. Mutzafi). Population total all countries: 460. | 8b (Nearly extinct) | Iran |
Hulaulá | huy | Iran (Islamic Republic of) | 5 (Extinct) | 10,000 in Israel (1999 H. Mutzafi). Population total all countries: 10,350. | 8a (Moribund) | Israel |
Barzani Jewish Neo-Aramaic | bjf | Iraq | 5 (Extinct) | 20 (2004 H. Mutzafi). | 8b (Nearly extinct) | Israel |
Mehri | gdq | Oman, Yemen | 2 (Definitely endangered) | 50,000 in Yemen (2011). Population total all countries: 115,200. | 7 (Shifting) | Yemen |
Harsusi | hss | Oman | 2 (Definitely endangered) | 600 (2011). | 7 (Shifting) | Oman |
Arabic, Judeo-Tunisian | ajt | Israel | 3 (Severely endangered) | 45,000 in Israel (1995 H. Mutzafi). Population total all countries: 45,500. | 7 (Shifting) | Israel |
Arabic, Judeo-Tripolitanian | yud | ? | ? | 30,000 in Israel (1994 H. Mutzafi). Population total all countries: 35,000. | 7 (Shifting) | Israel |
Arabic, Judeo-Moroccan | aju | Israel | 2 (Definitely endangered) | 250,000 in Israel (1992 H. Mutzafi). Population total all countries: 258,930. | 7 (Shifting) | Israel |
Arabic, Judeo-Iraqi | yhd | ? | ? | 100,000 in Israel (1994). Population total all countries: 151,820. | 7 (Shifting) | Israel |
Arabic, Cypriot Spoken | acy | Cyprus | 3 (Severely endangered) | 1,300 (1995). Ethnic population: 6,000 in Cypriot Maronite ethnic group, 140 Maronites in Kormatiki, 80 to 100 in Limassol, the rest in the Maronite community in Nicosia. | 7 (Shifting) | Cyprus |
Western Neo-Aramaic | amw | Syrian Arab Republic | 2 (Definitely endangered) | 15,000 (1996). 8,000 in Maaloula. | 7 (Shifting) | Syria |
Mandaic | mid | Iran (Islamic Republic of), Iraq | 4 (Critically endangered) | 5,000 in Iraq (2006). Population total all countries: 5,500. Ethnic population: 30,000. | 7 (Shifting) | Iraq |
Lishanid Noshan | aij | Iraq | 5 (Extinct) | 2,200 (1994 H. Mutzafi). | 7 (Shifting) | Israel |
Lishana Deni | lsd | Iraq | 5 (Extinct) | 7,500 (1999 H. Mutzafi). Ethnic population: 9,060. | 7 (Shifting) | Israel |
Lishán Didán | trg | Iran (Islamic Republic of) | 5 (Extinct) | 4,230 in Israel (2001). Population total all countries: 4,450. | 7 (Shifting) | Israel |
Chaldean Neo-Aramaic | cld | ? | ? | 100,000 in Iraq (1994 H. Mutzafi). Population total all countries: 206,000. | 7 (Shifting) | Iraq |
Bohtan Neo-Aramaic | bhn | Georgia, Russian Federation | 3 (Severely endangered) | 1,000 in Georgia (1999 S. Fox). | 7 (Shifting) | Georgia |
Shehri | shv | Oman | 3 (Severely endangered) | 25,000 (1993 census). | 6b (Threatened) | Oman |
Zay | zwa | Ethiopia | 3 (Severely endangered) | 4,880 (1994 SIL), decreasing. Ethnic population: 4,880. | 6b (Threatened) | Ethiopia |
Wolane | wle | ? | ? | ? | 6a (Vigorous) | Ethiopia |
Harari | har | ? | ? | 25,800 (2007 census). 2,350 monolinguals. 20,000 in Addis Ababa, outside Harar City (Hetzron 1997:486). | 6a (Vigorous) | Ethiopia |
Argobba | agj | Ethiopia | 4 (Critically endangered) | 43,700 (2007 census). 100 monolinguals. | 6b (Threatened) | Ethiopia |
Mesqan | mvz | ? | ? | 195,000 (2007 SIL). Ethnic population: 205,000 (Woreda Farmers’ Cooperatives Office). | 6a (Vigorous) | Ethiopia |
Inor | ior | ? | ? | 280,000. 50,000 Endegeny. | 6a (Vigorous) | Ethiopia |
Kistane | gru | ? | ? | 255,000 (1994 census). Ethnic population: 364,000 (1994 census) including 4,000 Gogot. | 6a (Vigorous) | Ethiopia |
Dahalik | dlk | ? | ? | 2,500 (2012 J. McLaughlin). | 6a (Vigorous) | Eritrea |
Arabic, Ta’izzi-Adeni Spoken | acq | ? | ? | 6,760,000 in Yemen (1996). Population total all countries: 7,078,500. | 6a (Vigorous) | Yemen |
Arabic, Tajiki Spoken | abh | ? | ? | 1,000 in Tajikistan. Population total all countries: 6,000. | 6b (Threatened) | Tajikistan |
Arabic, Shihhi Spoken | ssh | ? | ? | 5,000 in United Arab Emirates (1995). Population total all countries: 27,000. | 6a (Vigorous) | United Arab Emirates |
Arabic, Sa’idi Spoken | aec | ? | ? | 19,000,000 (2006). | 6a (Vigorous) | Egypt |
Arabic, Sanaani Spoken | ayn | ? | ? | 7,600,000 (1996). | 6a (Vigorous) | Yemen |
Arabic, North Mesopotamian Spoken | ayp | ? | ? | 5,400,000 in Iraq (1992). Population total all countries: 6,300,000. | 6a (Vigorous) | Iraq |
Arabic, Judeo-Yemeni | jye | ? | ? | 50,000 in Israel (1995 Y. Kara). Population total all countries: 51,000. | 6a (Vigorous) | Israel |
Arabic, Hijazi Spoken | acw | ? | ? | 6,000,000 in Saudi Arabia (1996). Population total all countries: 6,023,900. | 6a (Vigorous) | Saudi Arabia |
Arabic, Hadrami Spoken | ayh | ? | ? | 300,000 in Yemen (1995). Population total all countries: 410,000. | 6a (Vigorous) | Yemen |
Arabic, Gulf Spoken | afb | ? | ? | 40,000 in Iraq. Population total all countries: 3,601,000. | 6a (Vigorous) | Iraq |
Arabic, Eastern Egyptian Bedawi Spoken | avl | ? | ? | 860,000 in Egypt (2006). Population total all countries: 1,690,000. | 6a (Vigorous) | Egypt |
Arabic, Dhofari Spoken | adf | ? | ? | 70,000 (1996). | 6a (Vigorous) | Oman |
Arabic, Algerian Saharan Spoken | aao | ? | ? | 100,000 in Algeria (1996). Population total all countries: 130,500. | 6a (Vigorous) | Algeria |
Turoyo | tru | Syrian Arab Republic, Turkey | 3 (Severely endangered) | 3,000 in Turkey (1994 H. Mutzafi). Population total all countries: 62,000. Ethnic population: 50,000–70,000 (1994). | 6b (Threatened) | Turkey |
Koy Sanjaq Surat | kqd | ? | ? | 800 (1995 H. Mutzafi). | 6a (Vigorous) | Iraq |
Hértevin | hrt | Turkey | 4 (Critically endangered) | 1,000 (1999 H. Mutzafi). | 6a (Vigorous) | Turkey |
Assyrian Neo-Aramaic | aii | ? | ? | 30,000 in Iraq (1994). Population total all countries: 232,300. Ethnic population: 4,250,000 (1994). | 6b (Threatened) | Iraq |
Sebat Bet Gurage | sgw | ? | ? | 440,000. Chaha 130,000, Gura 20,000, Muher 90,000, Gyeto 80,000, Ezha 120,000. | 5 (Developing) | Ethiopia |
Arabic, Omani Spoken | acx | ? | ? | 720,000 in Oman (1996). Population total all countries: 853,900. | 5 (Developing) | Oman |
Silt’e | stv | ? | ? | 935,000 (2007 census). | 4 (Educational) | Ethiopia |
Tigré | tig | ? | ? | 1,050,000 in Eritrea (2006), increasing. | 4 (Educational) | Eritrea |
Hassaniyya | mey | ? | ? | 2,770,000 in Mauritania (2006), increasing. Population total all countries: 3,278,190. | 3 (Wider communication) | Mauritania |
Arabic, Tunisian Spoken | aeb | ? | ? | 9,000,000 in Tunisia (1995). Population total all countries: 9,406,900. | 3 (Wider communication) | Tunisia |
Arabic, Sudanese Spoken | apd | ? | ? | 15,000,000 in South Sudan and Sudan. Population total all countries: 1,833,000. | 3 (Wider communication) | Sudan |
Arabic, South Levantine Spoken | ajp | ? | ? | 3,500,000 in Jordan (1996). Population total all countries: 6,200,000. | 3 (Wider communication) | Jordan |
Arabic, North Levantine Spoken | apc | ? | ? | 8,800,000 in Syria (1991). 6,000,000 in Lebanese-Central Syrian, 1,000,000 in North Syrian. Population total all countries: 14,426,540. | 3 (Wider communication) | Syria |
Arabic, Najdi Spoken | ars | ? | ? | 8,000,000 in Saudi Arabia. Population total all countries: 9,670,000. | 3 (Wider communication) | Saudi Arabia |
Arabic, Moroccan Spoken | ary | ? | ? | 18,800,000 in Morocco (1995). Population total all countries: 21,048,600. | 3 (Wider communication) | Morocco |
Arabic, Mesopotamian Spoken | acm | ? | ? | 11,500,000 in Iraq. Population total all countries: 15,100,000. | 3 (Wider communication) | Iraq |
Arabic, Libyan Spoken | ayl | ? | ? | 4,000,000 in Libya (2006), increasing. Population total all countries: 4,320,500. | 3 (Wider communication) | Libya |
Arabic, Egyptian Spoken | arz | ? | ? | 52,500,000 in Egypt (2006). Population total all countries: 53,990,000. | 3 (Wider communication) | Egypt |
Arabic, Chadian Spoken | shu | ? | ? | 896,100 in Chad (2006), increasing. Population total all countries: 1,139,100. | 3 (Wider communication) | Chad |
Arabic, Baharna Spoken | abv | ? | ? | 300,000 in Bahrain (1995). Population total all countries: 310,000. | 3 (Wider communication) | Bahrain |
Arabic, Algerian Spoken | arq | ? | ? | 26,000,000 in Algeria (2012 Sherbrooke University), increasing. Population total all countries: 27,997,000. | 3 (Wider communication) | Algeria |
Tigrigna | tir | ? | ? | 4,320,000 in Ethiopia (2007 census). 2,820,000 monolinguals. Population total all countries: 6,915,000. | 2 (Provincial) | Ethiopia |
Amharic | amh | ? | ? | 21,600,000 in Ethiopia (2007 census). 14,750,000 monolinguals. Population total all countries: 21,811,560. | 1 (National) | Ethiopia |
Hebrew | heb | ? | ? | 4,850,000 in Israel (1998). Population total all countries: 5,302,770. | 1 (National) | Israel |
Maltese | mlt | ? | ? | 300,000 in Malta (Katzner 1975). Population total all countries: 429,000. | 1 (National) | Malta |
Arabic, Standard | arb | ? | ? | 206,000,000 L1 speakers of all Arabic varieties (Wiesenfeld 1999). | 1 (National) | Saudi Arabia |
This article uses material from the Wikipedia article "Semitic languages", which is released under the Creative Commons Attribution-Share-Alike License 3.0.