Dravidian languages
The Semitic languages (sem) constitute a group of related languages and a branch of the Afro-Asiatic language family. Spoken by more than 470 million people throughout North Africa and Southwest Asia, the most widely spoken Semitic languages are Arabic, Maltese, Hebrew, Amharic, and Tigrigna.
The master plan involves generating independent finite-state transducers for each language, and then making individual dictionaries and transfer rules for every pair. The current status of these goals is listed below.
Status
The ultimate goal is to have multi-purpose transducers for a variety of Semitic languages. These can then be paired for X→Y translation by adding a Constraint Grammar (CG) for language X and transfer rules and a dictionary for the pair X→Y. Development progress for each language's transducer and for each dictionary pair is listed below.
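The pairing described above can be pictured as a chain of four stages: analysis with the language-X transducer, disambiguation, transfer through the X→Y dictionary, and generation with the language-Y transducer. The following is only a toy sketch of that chain. Every name and every entry in it (`ANALYSER_X`, `BIDIX_XY`, `GENERATOR_Y`, `wordx`, `wordy`) is a made-up placeholder, and the dictionaries stand in for real finite-state transducers, a real CG, and real transfer rules.

```python
from typing import Dict, List

# All entries below are invented placeholders, not data from any real package.
ANALYSER_X: Dict[str, List[str]] = {
    "wordx": ["lemmax<n><sg>", "lemmax<vblex><inf>"],  # ambiguous surface form
}
BIDIX_XY: Dict[str, str] = {"lemmax<n>": "lemmay<n>"}
GENERATOR_Y: Dict[str, str] = {"lemmay<n><sg>": "wordy"}


def analyse(token: str) -> List[str]:
    """Stage 1: look up all readings of a surface form in language X."""
    return ANALYSER_X.get(token, [])


def disambiguate(readings: List[str]) -> str:
    """Stage 2: stand-in for a CG. Prefer a noun reading, else the first one."""
    for reading in readings:
        if "<n>" in reading:
            return reading
    return readings[0]


def transfer(reading: str) -> str:
    """Stage 3: replace lemma and part of speech, keep the remaining tags."""
    for source, target in BIDIX_XY.items():
        if reading.startswith(source):
            return target + reading[len(source):]
    return "*" + reading  # no dictionary entry: marked with * here


def generate(reading: str) -> str:
    """Stage 4: produce a surface form in language Y."""
    return GENERATOR_Y.get(reading, "#" + reading)  # marked with # on failure


def translate(token: str) -> str:
    readings = analyse(token)
    if not readings:
        return "*" + token  # unknown to the language-X analyser
    return generate(transfer(disambiguate(readings)))
```

Because the analyser and generator for a language do not depend on the other language in the pair, the same `ANALYSER_X` could in principle be reused with a different dictionary and generator, which is the point of building the transducers independently.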
Transducers
Once a transducer reaches about 80% coverage on a range of medium-to-large corpora, we can say it is "working". Above 90%, it can be considered "production".
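As a rough illustration of these thresholds, the sketch below computes naive coverage, taken here to mean the share of corpus tokens that receive at least one analysis, and maps it to a state. The function names, the `is_known` callback, and the "development" label for anything below 80% are assumptions made for this example, not part of any existing tool.

```python
from typing import Callable, Iterable


def naive_coverage(tokens: Iterable[str], is_known: Callable[[str], bool]) -> float:
    """Return the fraction of tokens for which is_known(token) is true."""
    token_list = list(tokens)
    if not token_list:
        return 0.0  # an empty corpus is treated as zero coverage
    known = sum(1 for token in token_list if is_known(token))
    return known / len(token_list)


def classify(coverage: float) -> str:
    """Map a coverage fraction (0.0 to 1.0) to a state label."""
    if coverage > 0.90:
        return "production"
    if coverage >= 0.80:
        return "working"
    return "development"  # assumed label for anything below 80%


# Hypothetical lexicon standing in for a real transducer lookup.
lexicon = {"a", "b", "c", "d"}
corpus = "a b c d e".split()
state = classify(naive_coverage(corpus, lexicon.__contains__))
```

In practice the figure should be averaged over several corpora, as the text says, since a single small corpus can give a misleading result.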
name | language | native name | ISO 639-2 | ISO 639-3 | formalism | state | stems | paradigms | coverage | location | primary authors |
---|---|---|---|---|---|---|---|---|---|---|---|
apertium-heb | Hebrew | עִבְרִית | he | heb | lttoolbox | development | | | | apertium-ara-heb (incubator) | missmaryx |
apertium-mlt | Maltese | Malti | mt | mlt | lttoolbox | development | 7,371 | 758 | | apertium-mlt (languages) | Fran, Unhammer, Fronczak |
apertium-ara | Arabic | العربية | ar | ara | lttoolbox | development | | | | apertium-ara-heb (incubator) | missmaryx |
Existing language pairs
Text in italic denotes language pairs under development / in the incubator. Regular text denotes a functioning language pair in staging, while text in bold denotes a stable well-working language pair in trunk.
 | tel | mal |
---|---|---|
tel | - | |
mal | | - |
eng | eng-tel 1 | mal-eng 109 |
Dravidian languages by subgroup
The Dravidian languages form a close-knit family. Most scholars agree on four groups: North, Central (Kolami–Parji), South-Central (Telugu–Kui) and South Dravidian:
- Central (Kolami–Parji) languages
  - Kolami–Naiki languages (Kolami)
  - Parji–Gadaba languages (Duruwa, Gadaba)
- Northern languages (Brahui, Kurux, Sauria)
- Southern (Tamil–Tulu) languages
- South-Central (Telugu–Kui) languages
  - Gondi–Kui languages (Gondi, Konda-Kui)
  - Telugu languages (Telugu, Savara, Chenchu)
Samples
Article 1 of the Universal Declaration of Human Rights:
All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience and should act towards one another in a spirit of brotherhood.
Language | Text |
---|---|
Kannada | ಎಲ್ಲಾ ಮಾನವರೂ ಸ್ವತಂತ್ರರಾಗಿಯೇ ಜನಿಸಿದ್ಧಾರೆ. ಹಾಗೂ ಘನತೆ ಮತ್ತು ಹಕ್ಕುಗಳಲ್ಲಿ ಸಮಾನರಾಗಿದ್ದಾರೆ. ವಿವೇಕ ಮತ್ತು ಅಂತಃಕರಣ ಗಳನ್ನು ಪದೆದವರಾದ್ದ ರಿಂದ ಅವರು ಪರಸ್ಪರ ಸಹೋದರ ಭಾವದಿಂದ ವರ್ತಿಸಚೀಕು. |
Malayalam | മനുഷ്യരെല്ലാവരും തുല്യാവകാശങ്ങളോടും അന്തസ്സോടും സ്വാതന്ത്ര്യത്തോടുംകൂടി ജനിച്ചിട്ടുള്ളവരാണ്. അന്യോന്യം ഭ്രാതൃഭാവത്തോടെ പെരുമാറുവാനാണ് മനുഷ്യന്നു വിവേകബുദ്ധിയും മനസ്സാക്ഷിയും സിദ്ധമായിരിക്കുന്നത്. |
Tamil | மனிதப் பிறிவியினர் சகலரும் சுதந்திரமாகவே பிறக்கின்றனர் ; அவர்கள் மதிப்பிலும், உரிமைகளிலும் சமமானவர்கள், அவர்கள் நியாயத்தையும் மனச்சாட்சியையும் இயற்பண்பாகப் பெற்றவர்கள். அவர்கள் ஒருவருடனொருவர் சகோதர உணர்வுப் பாங்கில் நடந்துகொள்ளல் வேண்டும். |
Vulnerability
This table summarizes the vulnerability of various Dravidian languages. Data is derived from the ‘Atlas of the World’s Languages in Danger, © UNESCO, http://www.unesco.org/culture/languages-atlas’ and Ethnologue.
Language | ISO 639-3 | UNESCO areas | UNESCO vulnerability | Ethnologue speakers | Ethnologue status | Ethnologue location |
---|---|---|---|---|---|---|
Nagarchal | nbg | ? | ? | No remaining speakers. | 10 (Extinct) | India |
Ullatan | ull | ? | ? | No known L1 speakers. Ethnic population: 16,700 (2001 census). | 9 (Dormant) | India |
Malaryan | mjq | ? | ? | No known L1 speakers. Ethnic population: 35,000 (2001 census). | 9 (Dormant) | India |
Urali | url | ? | ? | No known L1 speakers. Ethnic population: 6,440 (2001 census). | 9 (Dormant) | India |
Bazigar | bfr | ? | ? | 58,200 (1981 census). Ethnic population: 800,000. | 7 (Shifting) | India |
Bellari | brw | India | 4 (Critically endangered) | 1,000 (Van Driem 2007). | 7 (Shifting) | India |
Manna-Dora | mju | ? | ? | Population unknown. Ethnic population: 30,000. | 7 (Shifting) | India |
Vishavan | vis | ? | ? | 150 (Shashi and Shri 1994). | 6b (Threatened) | India |
Malankuravan | mjo | ? | ? | 18,600 (2001 census). | 6a (Vigorous) | India |
Kamar | keq | ? | ? | 40,000 (2003 BI). | 6a (Vigorous) | India |
Bharia | bha | ? | ? | 197,000 (1981 census). | 6a (Vigorous) | India |
Allar | all | ? | ? | 350 (Shashi and Shri 1994). | 6b (Threatened) | India |
Thachanadan | thn | ? | ? | 3,000 (2004 survey). | 6b (Threatened) | India |
Malasar | ymr | ? | ? | 7,760 (2001 census). | 6a (Vigorous) | India |
Mala Malasar | ima | ? | ? | 1,000 (2004). | 6a (Vigorous) | India |
Koraga, Mudu | vmd | ? | ? | ? | 6b (Threatened) | India |
Koraga, Korra | kfd | India | 4 (Critically endangered) | 14,000 (2007 census). | 6b (Threatened) | India |
Kudiya | kfg | ? | ? | 2,800 (2007). | 6a (Vigorous) | India |
Toda | tcx | India | 4 (Critically endangered) | 1,560 (2001 census). | 6a (Vigorous) | India |
Sholaga | sle | ? | ? | 24,000 (2006 IMB). | 6a (Vigorous) | India |
Kanikkaran | kev | ? | ? | 19,000 (2007). Ethnic population: 19,000. | 6a (Vigorous) | India |
Kaikadi | kep | ? | ? | 23,700 (2001 census). | 6a (Vigorous) | India |
Eravallan | era | ? | ? | 5,000 (2001). Ethnic population: 5,440 (2001 census). 3,890 in Kerala and 1,560 in Tamil Nadu. | 6a (Vigorous) | India |
Paliyan | pcf | ? | ? | 9,520 (2001 census). | 6a (Vigorous) | India |
Malavedan | mjr | ? | ? | Total population unknown. 6,190 in Kerala, 6,410 in Tamil Nadu (2001 census). Ethnic population: 12,600 (2001 census). | 6b (Threatened) | India |
Malapandaram | mjp | ? | ? | 5,850 (2001 census). | 6a (Vigorous) | India |
Kadar | kej | ? | ? | 1,960 (2004 survey), decreasing. | 6b (Threatened) | India |
Aranadan | aaf | ? | ? | 200 (2001 census). | 6b (Threatened) | India |
Mannan | mjv | ? | ? | 7,850 (2001 census). | 6b (Threatened) | India |
Kurumba, Mullu | kpb | ? | ? | 26,000 (2004 survey). 25,000 in Wayanad; 1,000 in Gudalur of Nilgiri. | 6a (Vigorous) | India |
Kurumba, Alu | xua | ? | ? | 2,500 (1997). | 6a (Vigorous) | India |
Holiya | hoy | ? | ? | 500 (2002 survey). | 6a (Vigorous) | India |
Pathiya | pty | ? | ? | 1,000 (2004 SIL). | 6a (Vigorous) | India |
Kurichiya | kfh | ? | ? | 29,400 (2004 survey). Ethnic population: 32,800 (2001 census). | 6b (Threatened) | India |
Kunduvadi | wku | ? | ? | 1,000 (2004 SIL). | 6a (Vigorous) | India |
Kumbaran | wkb | ? | ? | 10,000 (2004 NLCI). | 6b (Threatened) | India |
Kalanadi | wkl | ? | ? | 750 (2004 survey). | 6a (Vigorous) | India |
Waddar | wbq | ? | ? | 172,000 (2001 census). Ethnic population: In India, Pakistan, Nepal, and Sri Lanka is about 3 million (2003 IMA). | 6a (Vigorous) | India |
Savara | svr | ? | ? | 253,000 (2001). | 6a (Vigorous) | India |
Chenchu | cde | ? | ? | 26,000 (2007). | 6a (Vigorous) | India |
Pengo | peg | India | 4 (Critically endangered) | Ethnic population: 350,000 (2000). | 6a (Vigorous) | India |
Manda | mha | India | 4 (Critically endangered) | 4,040 (2000). | 6a (Vigorous) | India |
Kui | kxu | India | 1 (Vulnerable) | 916,000 (2001 census). Ethnic population: 410,000 ethnic Kui Khond who speak Kui plus additional ethnic groups. | 6b (Threatened) | India |
Mukha-Dora | mmk | ? | ? | 29,700 (1991 census). | 6a (Vigorous) | India |
Pardhan | pch | ? | ? | 135,000 (2007). Ethnic population: 347,000. | 6b (Threatened) | India |
Muria, Western | mut | ? | ? | 400,000 (2000 IICCC). | 6a (Vigorous) | India |
Muria, Far Western | fmu | ? | ? | 400,000 (2007). | 6a (Vigorous) | India |
Muria, Eastern | emu | ? | ? | 200,000 (2007). | 6a (Vigorous) | India |
Maria, Dandami | daq | ? | ? | 200,000 (2000). | 6a (Vigorous) | India |
Maria | mrr | ? | ? | 165,000 (2000). 141,000 Maria and 23,700 Hill Maria. | 6a (Vigorous) | India |
Khirwar | kwx | ? | ? | 34,300. | 6a (Vigorous) | India |
Kurux, Nepali | kxl | ? | ? | 28,600 (2001 census), decreasing. No monolinguals (2002 UNESCO). Ethnic population: 41,800 Dhagar (Jhagar). | 6b (Threatened) | Nepal |
Kumarbhag Paharia | kmj | India | 2 (Definitely endangered) | 12,500 (Bhaskararao 2006). | 6a (Vigorous) | India |
Gadaba, Pottangi Ollar | gdb | ? | ? | 15,000 (2002 M. Kurian). 4,000–7,000 in Koraput District, Pottangi block (1995). | 6a (Vigorous) | India |
Duruwa | pci | India | 4 (Critically endangered) | 51,200 (2001 census). Ethnic population: 100,000 (1986); 65% in Bastar, 35% in Koraput. | 6a (Vigorous) | India |
Kolami, Southeastern | nit | India | 4 (Critically endangered) | 10,000 (1989 F. Blair). 1,500 speakers of Naiki (Van Driem 2007). | 6a (Vigorous) | India |
Tulu | tcy | India | 1 (Vulnerable) | 1,720,000 (2001 census). | 5 (Developing) | India |
Chetti, Wayanad | ctt | ? | ? | 5,000 (2004). | 5 (Developing) | India |
Kota | kfe | India | 4 (Critically endangered) | 930 (2001 census). Ethnic population: 1,400. | 5 (Developing) | India |
Yerukula | yeu | India | 2 (Definitely endangered) | 69,500 (2001 census). | 5 (Developing) | India |
Muthuvan | muv | ? | ? | 16,800 (2006 IMB). | 5 (Developing) | India |
Kurumba, Betta | xub | ? | ? | 32,000 (2003 NLCI), increasing. | 5 (Developing) | India |
Irula | iru | India | 1 (Vulnerable) | 200,000 (2003 E. Udayakumar). | 5 (Developing) | India |
Ravula | yea | ? | ? | 26,900 (2007). Ethnic population: 47,000 (2007). | 5 (Developing) | India |
Paniya | pcg | ? | ? | 94,000 (2003). | 5 (Developing) | India |
Kurumba, Kannada | kfi | India | 4 (Critically endangered) | 180,000 (2000). | 5 (Developing) | India |
Kurumba, Jennu | xuj | ? | ? | 35,000 (IMA 1997). | 5 (Developing) | India |
Kodava | kfa | India | 2 (Definitely endangered) | 200,000 (2001). Ethnic population: 100,000 in Kodagu District plus 100,000 in other districts of Karnataka and major cities of India. | 5 (Developing) | India |
Badaga | bfq | India | 2 (Definitely endangered) | 135,000 (2001 census). | 5 (Developing) | India |
Muduga | udg | ? | ? | 3,370 (1991 census). | 5 (Developing) | India |
Kurumba, Attapady | pkr | ? | ? | 1,370 (1991 census). | 5 (Developing) | India |
Kuvi | kxv | India | 2 (Definitely endangered) | 158,000 (2001 census). | 5 (Developing) | India |
Koya | kff | ? | ? | 362,000 (2001 census). | 5 (Developing) | India |
Konda-Dora | kfc | India | 2 (Definitely endangered) | 20,000 (2007 WFA). | 5 (Developing) | India |
Gondi, Southern | ggo | ? | ? | 100,000 (2004 SIL). | 5 (Developing) | India |
Gondi, Northern | gno | ? | ? | 1,950,000 (1997 BSI). 2,630,000 all Gondi. | 5 (Developing) | India |
Sauria Paharia | mjt | ? | ? | 54,000 in India (Bhaskararao 2006). Population total all countries: 61,000. | 5 (Developing) | India |
Kurux | kru | Bangladesh | 1 (Vulnerable) | 1,890,000 in India (2001 census). 1,750,000 Kurukh or Oraon, 141,000 Kisan. Population total all countries: 1,944,200. | 5 (Developing) | India |
Brahui | brh | Iran (Islamic Republic of), Pakistan, Afghanistan | 1 (Vulnerable) | 4,000,000 in Pakistan (2011). Population total all countries: 4,220,000. | 5 (Developing) | Pakistan |
Gadaba, Mudhili | gau | ? | ? | 8,000 (2000 IICCC). | 5 (Developing) | India |
Kolami, Northwestern | kfb | India | 2 (Definitely endangered) | 122,000 (2001 census). All Kolami 115,000 (1997). | 5 (Developing) | India |
Tamil | tam | ? | ? | 60,700,000 in India (2001 census). Population total all countries: 68,763,360. | 2 (Provincial) | India |
Malayalam | mal | ? | ? | 33,000,000 in India (2001 census). Population total all countries: 33,534,600. | 2 (Provincial) | India |
Kannada | kan | ? | ? | 37,700,000 in India (2001 census). Population total all countries: 37,739,040. | 2 (Provincial) | India |
Telugu | tel | ? | ? | 73,800,000 in India (2001 census). Population total all countries: 74,049,000. | 2 (Provincial) | India |
This article uses material from the Wikipedia article "Semitic languages", which is released under the Creative Commons Attribution-Share-Alike License 3.0.