Difference between revisions of "Dravidian languages"

From Apertium

Revision as of 02:08, 3 January 2014

The Semitic languages (sem) are a group of related languages forming a branch of the Afro-Asiatic language family. They are spoken by more than 470 million people throughout North Africa and Southwest Asia; the most widely spoken are Arabic, Maltese, Hebrew, Amharic, and Tigrinya.

The master plan involves generating independent finite-state transducers for each language, and then making individual dictionaries and transfer rules for every pair. The current status of these goals is listed below.

Status

The ultimate goal is to have multi-purpose transducers for a variety of Semitic languages. These can then be paired for X→Y translation by adding a CG (constraint grammar) for language X and transfer rules and a dictionary for the pair X→Y. Development progress for each language's transducers and dictionary pairs is listed below.

Transducers

Once a transducer reaches roughly 80% coverage on a range of medium-to-large corpora, it counts as "working". Above 90%, it can be considered "production".
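The thresholds above can be turned into a rough check. The following is a minimal sketch, not the project's official tooling: it assumes the analysed corpus is in the Apertium stream format, where each lexical unit looks like `^surface/analysis$` and an unknown word is marked with a leading `*` (for example `^xyzzy/*xyzzy$`). The function names `coverage` and `state` are hypothetical, the label "development" for anything below 80% is an assumption borrowed from the state column of the table below, and escaped characters in the stream are ignored for simplicity.

```python
import re

# One lexical unit in the Apertium stream format: ^surface/analysis1/analysis2$
# Simplification (assumption): escaped characters such as \/ are not handled.
LEXICAL_UNIT = re.compile(r"\^([^/^$]*)/([^$]*)\$")


def coverage(analysed: str) -> float:
    """Return the fraction of lexical units that received at least one analysis."""
    units = LEXICAL_UNIT.findall(analysed)
    if not units:
        return 0.0
    # An unknown word has a single "analysis" that starts with an asterisk.
    known = sum(1 for _surface, analyses in units if not analyses.startswith("*"))
    return known / len(units)


def state(cov: float) -> str:
    """Map a coverage fraction onto the labels used on this page."""
    if cov > 0.90:
        return "production"
    if cov >= 0.80:
        return "working"
    return "development"  # assumed label for anything below the 80% mark


if __name__ == "__main__":
    sample = "^cat/cat<n><sg>$ ^xyzzy/*xyzzy$ ^sat/sit<vblex><past>$ ^on/on<pr>$ ^mat/*mat$"
    cov = coverage(sample)
    print(f"{cov:.0%} -> {state(cov)}")  # 3 of 5 units are known: 60% -> development
```

A single small sample like this says little; the figure is only meaningful when averaged over several medium-to-large corpora, as described above.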

{| class="wikitable sortable"
! name !! language !! native name !! ISO 639-1 !! ISO 639-3 !! formalism !! state !! stems !! paradigms !! coverage !! location !! primary authors
|-
| apertium-heb || Hebrew || עִבְרִית || he || heb || lttoolbox || development || || || || apertium-ara-heb (incubator) || missmaryx
|-
| apertium-mlt || Maltese || Malti || mt || mlt || lttoolbox || development || 7,371 || 758 || || apertium-mlt (languages) || Fran, Unhammer, Fronczak
|-
| apertium-ara || Arabic || العربية || ar || ara || lttoolbox || development || || || || apertium-ara-heb (incubator) || missmaryx
|}

Existing language pairs

Text in italic denotes language pairs under development / in the incubator. Regular text denotes a functioning language pair in staging, while text in bold denotes a stable well-working language pair in trunk.

{| class="wikitable"
!
! tel
! mal
|-
! tel
| -
|
|-
! mal
|
| -
|-
! eng
| eng-tel (1)
| mal-eng (109)
|}

Semitic languages by subgroup

There are six fairly uncontroversial nodes within the Semitic languages:

  • East Semitic languages: Akkadian, Eblaite (extinct)
  • Central Semitic languages
  • South Semitic languages
    • Western: Ethiopic languages (Amharic, Tigrinya, etc.) and Old South Arabian languages (Sabaean, Minaean, Qatabānian, Ḥaḑramitic, etc.)
    • Eastern: Modern South Arabian languages (Bathari, Harsusi, Hobyót, Mehri, Shehri, Soqotri)

Samples

Article 1 of the Universal Declaration of Human Rights:

All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience and should act towards one another in a spirit of brotherhood.

Language Text
Kannada ಎಲ್ಲಾ ಮಾನವರೂ ಸ್ವತಂತ್ರರಾಗಿಯೇ ಜನಿಸಿದ್ಧಾರೆ. ಹಾಗೂ ಘನತೆ ಮತ್ತು ಹಕ್ಕುಗಳಲ್ಲಿ ಸಮಾನರಾಗಿದ್ದಾರೆ. ವಿವೇಕ ಮತ್ತು ಅಂತಃಕರಣ ಗಳನ್ನು ಪದೆದವರಾದ್ದ ರಿಂದ ಅವರು ಪರಸ್ಪರ ಸಹೋದರ ಭಾವದಿಂದ ವರ್ತಿಸಚೀಕು.
Malayalam മനുഷ്യരെല്ലാവരും തുല്യാവകാശങ്ങളോടും അന്തസ്സോടും സ്വാതന്ത്ര്യത്തോടുംകൂടി ജനിച്ചിട്ടുള്ളവരാണ്‌. അന്യോന്യം ഭ്രാതൃഭാവത്തോടെ പെരുമാറുവാനാണ്‌ മനുഷ്യന്നു വിവേകബുദ്ധിയും മനസ്സാക്ഷിയും സിദ്ധമായിരിക്കുന്നത്‌.
Tamil மனிதப் பிறிவியினர் சகலரும் சுதந்திரமாகவே பிறக்கின்றனர் ; அவர்கள் மதிப்பிலும், உரிமைகளிலும் சமமானவர்கள், அவர்கள் நியாயத்தையும் மனச்சாட்சியையும் இயற்பண்பாகப் பெற்றவர்கள். அவர்கள் ஒருவருடனொருவர் சகோதர உணர்வுப் பாங்கில் நடந்துகொள்ளல் வேண்டும்.

Vulnerability

This table summarizes the vulnerability of various Dravidian languages. Data is derived from the ‘Atlas of the World’s Languages in Danger, © UNESCO, http://www.unesco.org/culture/languages-atlas’ and Ethnologue.

Language, ISO 639-3, UNESCO areas, UNESCO vulnerability, Ethnologue speakers, Ethnologue status, Ethnologue location
Nagarchal nbg ? ? No remaining speakers. 10 (Extinct) India
Ullatan ull ? ? No known L1 speakers. Ethnic population: 16,700 (2001 census). 9 (Dormant) India
Malaryan mjq ? ? No known L1 speakers. Ethnic population: 35,000 (2001 census). 9 (Dormant) India
Urali url ? ? No known L1 speakers. Ethnic population: 6,440 (2001 census). 9 (Dormant) India
Bazigar bfr ? ? 58,200 (1981 census). Ethnic population: 800,000. 7 (Shifting) India
Bellari brw India 4 (Critically endangered) 1,000 (Van Driem 2007). 7 (Shifting) India
Manna-Dora mju ? ? Population unknown. Ethnic population: 30,000. 7 (Shifting) India
Vishavan vis ? ? 150 (Shashi and Shri 1994). 6b (Threatened) India
Malankuravan mjo ? ? 18,600 (2001 census). 6a (Vigorous) India
Kamar keq ? ? 40,000 (2003 BI). 6a (Vigorous) India
Bharia bha ? ? 197,000 (1981 census). 6a (Vigorous) India
Allar all ? ? 350 (Shashi and Shri 1994). 6b (Threatened) India
Thachanadan thn ? ? 3,000 (2004 survey). 6b (Threatened) India
Malasar ymr ? ? 7,760 (2001 census). 6a (Vigorous) India
Mala Malasar ima ? ? 1,000 (2004). 6a (Vigorous) India
Koraga, Mudu vmd ? ? ? 6b (Threatened) India
Koraga, Korra kfd India 4 (Critically endangered) 14,000 (2007 census). 6b (Threatened) India
Kudiya kfg ? ? 2,800 (2007). 6a (Vigorous) India
Toda tcx India 4 (Critically endangered) 1,560 (2001 census). 6a (Vigorous) India
Sholaga sle ? ? 24,000 (2006 IMB). 6a (Vigorous) India
Kanikkaran kev ? ? 19,000 (2007). Ethnic population: 19,000. 6a (Vigorous) India
Kaikadi kep ? ? 23,700 (2001 census). 6a (Vigorous) India
Eravallan era ? ? 5,000 (2001). Ethnic population: 5,440 (2001 census). 3,890 in Kerala and 1,560 in Tamil Nadu. 6a (Vigorous) India
Paliyan pcf ? ? 9,520 (2001 census). 6a (Vigorous) India
Malavedan mjr ? ? Total population unknown. 6,190 in Kerala, 6,410 in Tamil Nadu (2001 census). Ethnic population: 12,600 (2001 census). 6b (Threatened) India
Malapandaram mjp ? ? 5,850 (2001 census). 6a (Vigorous) India
Kadar kej ? ? 1,960 (2004 survey), decreasing. 6b (Threatened) India
Aranadan aaf ? ? 200 (2001 census). 6b (Threatened) India
Mannan mjv ? ? 7,850 (2001 census). 6b (Threatened) India
Kurumba, Mullu kpb ? ? 26,000 (2004 survey). 25,000 in Wayanad; 1,000 in Gudalur of Nilgiri. 6a (Vigorous) India
Kurumba, Alu xua ? ? 2,500 (1997). 6a (Vigorous) India
Holiya hoy ? ? 500 (2002 survey). 6a (Vigorous) India
Pathiya pty ? ? 1,000 (2004 SIL). 6a (Vigorous) India
Kurichiya kfh ? ? 29,400 (2004 survey). Ethnic population: 32,800 (2001 census). 6b (Threatened) India
Kunduvadi wku ? ? 1,000 (2004 SIL). 6a (Vigorous) India
Kumbaran wkb ? ? 10,000 (2004 NLCI). 6b (Threatened) India
Kalanadi wkl ? ? 750 (2004 survey). 6a (Vigorous) India
Waddar wbq ? ? 172,000 (2001 census). Ethnic population: In India, Pakistan, Nepal, and Sri Lanka is about 3 million (2003 IMA). 6a (Vigorous) India
Savara svr ? ? 253,000 (2001). 6a (Vigorous) India
Chenchu cde ? ? 26,000 (2007). 6a (Vigorous) India
Pengo peg India 4 (Critically endangered) Ethnic population: 350,000 (2000). 6a (Vigorous) India
Manda mha India 4 (Critically endangered) 4,040 (2000). 6a (Vigorous) India
Kui kxu India 1 (Vulnerable) 916,000 (2001 census). Ethnic population: 410,000 ethnic Kui Khond who speak Kui plus additional ethnic groups. 6b (Threatened) India
Mukha-Dora mmk ? ? 29,700 (1991 census). 6a (Vigorous) India
Pardhan pch ? ? 135,000 (2007). Ethnic population: 347,000. 6b (Threatened) India
Muria, Western mut ? ? 400,000 (2000 IICCC). 6a (Vigorous) India
Muria, Far Western fmu ? ? 400,000 (2007). 6a (Vigorous) India
Muria, Eastern emu ? ? 200,000 (2007). 6a (Vigorous) India
Maria, Dandami daq ? ? 200,000 (2000). 6a (Vigorous) India
Maria mrr ? ? 165,000 (2000). 141,000 Maria and 23,700 Hill Maria. 6a (Vigorous) India
Khirwar kwx ? ? 34,300. 6a (Vigorous) India
Kurux, Nepali kxl ? ? 28,600 (2001 census), decreasing. No monolinguals (2002 UNESCO). Ethnic population: 41,800 Dhagar (Jhagar). 6b (Threatened) Nepal
Kumarbhag Paharia kmj India 2 (Definitely endangered) 12,500 (Bhaskararao 2006). 6a (Vigorous) India
Gadaba, Pottangi Ollar gdb ? ? 15,000 (2002 M. Kurian). 4,000–7,000 in Koraput District, Pottangi block (1995). 6a (Vigorous) India
Duruwa pci India 4 (Critically endangered) 51,200 (2001 census). Ethnic population: 100,000 (1986); 65% in Bastar, 35% in Koraput. 6a (Vigorous) India
Kolami, Southeastern nit India 4 (Critically endangered) 10,000 (1989 F. Blair). 1,500 speakers of Naiki (Van Driem 2007). 6a (Vigorous) India
Tulu tcy India 1 (Vulnerable) 1,720,000 (2001 census). 5 (Developing) India
Chetti, Wayanad ctt ? ? 5,000 (2004). 5 (Developing) India
Kota kfe India 4 (Critically endangered) 930 (2001 census). Ethnic population: 1,400. 5 (Developing) India
Yerukula yeu India 2 (Definitely endangered) 69,500 (2001 census). 5 (Developing) India
Muthuvan muv ? ? 16,800 (2006 IMB). 5 (Developing) India
Kurumba, Betta xub ? ? 32,000 (2003 NLCI), increasing. 5 (Developing) India
Irula iru India 1 (Vulnerable) 200,000 (2003 E. Udayakumar). 5 (Developing) India
Ravula yea ? ? 26,900 (2007). Ethnic population: 47,000 (2007). 5 (Developing) India
Paniya pcg ? ? 94,000 (2003). 5 (Developing) India
Kurumba, Kannada kfi India 4 (Critically endangered) 180,000 (2000). 5 (Developing) India
Kurumba, Jennu xuj ? ? 35,000 (IMA 1997). 5 (Developing) India
Kodava kfa India 2 (Definitely endangered) 200,000 (2001). Ethnic population: 100,000 in Kodagu District plus 100,000 in other districts of Karnataka and major cities of India. 5 (Developing) India
Badaga bfq India 2 (Definitely endangered) 135,000 (2001 census). 5 (Developing) India
Muduga udg ? ? 3,370 (1991 census). 5 (Developing) India
Kurumba, Attapady pkr ? ? 1,370 (1991 census). 5 (Developing) India
Kuvi kxv India 2 (Definitely endangered) 158,000 (2001 census). 5 (Developing) India
Koya kff ? ? 362,000 (2001 census). 5 (Developing) India
Konda-Dora kfc India 2 (Definitely endangered) 20,000 (2007 WFA). 5 (Developing) India
Gondi, Southern ggo ? ? 100,000 (2004 SIL). 5 (Developing) India
Gondi, Northern gno ? ? 1,950,000 (1997 BSI). 2,630,000 all Gondi. 5 (Developing) India
Sauria Paharia mjt ? ? 54,000 in India (Bhaskararao 2006). Population total all countries: 61,000. 5 (Developing) India
Kurux kru Bangladesh 1 (Vulnerable) 1,890,000 in India (2001 census). 1,750,000 Kurukh or Oraon, 141,000 Kisan. Population total all countries: 1,944,200. 5 (Developing) India
Brahui brh Iran (Islamic Republic of), Pakistan, Afghanistan 1 (Vulnerable) 4,000,000 in Pakistan (2011). Population total all countries: 4,220,000. 5 (Developing) Pakistan
Gadaba, Mudhili gau ? ? 8,000 (2000 IICCC). 5 (Developing) India
Kolami, Northwestern kfb India 2 (Definitely endangered) 122,000 (2001 census). All Kolami 115,000 (1997). 5 (Developing) India
Tamil tam ? ? 60,700,000 in India (2001 census). Population total all countries: 68,763,360. 2 (Provincial) India
Malayalam mal ? ? 33,000,000 in India (2001 census). Population total all countries: 33,534,600. 2 (Provincial) India
Kannada kan ? ? 37,700,000 in India (2001 census). Population total all countries: 37,739,040. 2 (Provincial) India
Telugu tel ? ? 73,800,000 in India (2001 census). Population total all countries: 74,049,000. 2 (Provincial) India

This article uses material from the Wikipedia article "Semitic languages", which is released under the Creative Commons Attribution-Share-Alike License 3.0.