https://wiki.apertium.org/w/api.php?action=feedcontributions&user=Rocky+734&feedformat=atomApertium - User contributions [en]2024-03-28T12:09:59ZUser contributionsMediaWiki 1.34.1https://wiki.apertium.org/w/index.php?title=Apertium-sat&diff=73856Apertium-sat2022-01-30T05:06:30Z<p>Rocky 734: Added help.</p>
<hr />
<div>[[Category:Santali]]<br />
<br />
This package contains many files, of which the .dix file is the most important. If you only need to add new words, they are added in the .dix file alone. There are also other files, such as the .rlx file in which the Constraint Grammar rules are written; these files need to be edited much less often.<br />
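The .dix file is an [[lttoolbox]] dictionary, and adding a word usually means adding one `<e>` entry that links a stem to an inflection paradigm. The minimal Python sketch below, using only the standard library, shows the shape of such an entry. The lemma `example` and the paradigm name `example__n` are hypothetical placeholders; a real entry must name a `<pardef>` that already exists in the package's .dix file.

```python
import xml.etree.ElementTree as ET


def make_dix_entry(lemma: str, stem: str, paradigm: str) -> str:
    """Build one lttoolbox <e> entry linking a stem to a paradigm.

    lemma    -- value of the lm attribute (the dictionary form)
    stem     -- invariant part of the word, placed in <i>
    paradigm -- name of a <pardef> assumed to exist in the .dix file
    """
    entry = ET.Element("e", {"lm": lemma})
    ET.SubElement(entry, "i").text = stem
    ET.SubElement(entry, "par", {"n": paradigm})
    # encoding="unicode" returns a str and leaves Ol Chiki characters unescaped
    return ET.tostring(entry, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical word and paradigm names, for illustration only.
    print(make_dix_entry("example", "example", "example__n"))
```

In practice such an entry is usually typed by hand into the main `<section>` of the .dix file; the sketch only illustrates the structure. ElementTree writes the empty element as `<par n="example__n" />`, which is equivalent XML to `<par n="example__n"/>`.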
<br />
<br />
The Apertium-sat package is an Apertium monolingual package for Santali. With it you can do the following:<br />
<br />
* Morphological analysis of Santali<br />
* Morphological generation of Santali<br />
* Part-of-speech tagging of Santali<br />
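Analysis and tagging both produce output in the Apertium stream format, where each lexical unit is written `^surface/reading1/reading2$` and each reading is a lemma followed by tags such as `<n><pl>`. The minimal Python sketch below splits such a line into surface forms and (lemma, tags) pairs. It assumes the simple case (no escaped characters, superblanks or joined multiwords), and the English input is only an illustration, not output from apertium-sat.

```python
import re
from typing import List, Tuple

# One lexical unit between ^ and $ (assumes no escaped ^, $, / or < in the text).
LU_PATTERN = re.compile(r"\^(.*?)\$")
TAG_PATTERN = re.compile(r"<([^>]+)>")


def parse_reading(reading: str) -> Tuple[str, List[str]]:
    """Split 'lemma<tag1><tag2>' into ('lemma', ['tag1', 'tag2'])."""
    lemma = reading.split("<", 1)[0]
    return lemma, TAG_PATTERN.findall(reading)


def parse_stream(line: str) -> List[Tuple[str, List[Tuple[str, List[str]]]]]:
    """Return (surface form, readings) for every lexical unit in the line."""
    units = []
    for body in LU_PATTERN.findall(line):
        surface, *readings = body.split("/")
        units.append((surface, [parse_reading(r) for r in readings]))
    return units


if __name__ == "__main__":
    # Illustrative input; the analyser marks unknown words with a leading '*'.
    sample = "^dogs/dog<n><pl>$ ^xyz/*xyz$"
    for surface, readings in parse_stream(sample):
        print(surface, readings)
```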
<br />
== Help ==<br />
<br />
If you run into any difficulty, or there is something you do not understand, you can ask questions in the main language (English) on the [https://wiki.apertium.org/wiki/IRC IRC] channel.<br />
<br />
On that IRC channel you can usually get quick answers to your questions. Logs of past IRC chats are available at this [https://tinodidriksen.com/pisg/OFTC/logs/#apertium/ link].<br />
<br />
== Also see ==<br />
*[[Santali]]<br />
*[[Indic languages]]</div>Rocky 734https://wiki.apertium.org/w/index.php?title=Apertium-sat&diff=73855Apertium-sat2022-01-30T04:57:34Z<p>Rocky 734: Added text.</p>
<hr />
<div>[[Category:Santali]]<br />
<br />
This package contains many files, of which the .dix file is the most important. If you only need to add new words, they are added in the .dix file alone. There are also other files, such as the .rlx file in which the Constraint Grammar rules are written; these files need to be edited much less often.<br />
<br />
<br />
The Apertium-sat package is an Apertium monolingual package for Santali. With it you can do the following:<br />
<br />
* Morphological analysis of Santali<br />
* Morphological generation of Santali<br />
* Part-of-speech tagging of Santali<br />
<br />
== Also see ==<br />
*[[Santali]]<br />
*[[Indic languages]]</div>Rocky 734https://wiki.apertium.org/w/index.php?title=Apertium-eng-sat&diff=73852Apertium-eng-sat2022-01-18T09:35:53Z<p>Rocky 734: </p>
<hr />
<div>[[Category:Santali]]<br />
<br />
This is an Apertium language pair for translation from English to Santali. With it you can do the following:<br />
* Translation between English and Santali<br />
* Morphological analysis of English and Santali<br />
* Part-of-speech tagging of English and Santali<br />
<br />
== Also see ==<br />
*[[Santali]]<br />
*[[Indic languages]]</div>Rocky 734https://wiki.apertium.org/w/index.php?title=Santali&diff=73851Santali2022-01-18T09:26:58Z<p>Rocky 734: /* Books */</p>
<hr />
<div>[[Category:Santali]]<br />
[[Category:Languages]]<br />
<br />
'''Santali''' or '''Santhali''' is the most widely spoken language of the Munda subfamily of the Austroasiatic languages. It is related to Ho and Mundari and is spoken mainly in the Indian states of Assam, Bihar, Jharkhand, Mizoram, Odisha, Tripura and West Bengal. It is a recognized regional language of India under the Eighth Schedule of the Indian Constitution. It is spoken by around 7.6&nbsp;million people in India, Bangladesh, Bhutan and Nepal, making it the third most-spoken Austroasiatic language after Vietnamese and Khmer. Santali was mainly an oral language until '''Pandit Raghunath Murmu''' developed the Ol Chiki script in 1925. Ol Chiki is an alphabetic script that shares none of the syllabic properties of the other Indic scripts, and it is now widely used to write Santali in India. Before Ol Chiki was invented, Santali was written in the Latin (Roman), Devanagari and Kalinga scripts.<br />
<br />
== Resources ==<br />
=== Apertium Resources ===<br />
* One monolingual dictionary: [[Apertium-sat]]<br />
* One English-Santali bilingual dictionary: [[Apertium-eng-sat]]<br />
=== Literature ===<br />
* Neukom, L. (2000). Argument marking in Santali. Mon-Khmer Studies, 95-114.<br />
* Marandi, C., & Maringanti, H. B. [https://d1wqtxts1xzle7.cloudfront.net/54279769/barii_pda_university_conference_papaer-with-cover-page-v2.pdf?Expires=1641826728&Signature=DS5JzbqCFQRw9hEPjL~xTaFLfKURQiaiZzOoMI5sxfBX5lTPNKC5o88s8a4bwvcAesVu9zu1qBPcpaQ1UPdND3hOuh9g41xGu2VqnFviBY0t29CHC9ZTh05D9qmbRh7uuNArYatcI-xvG0cF3Mr2VUk4lR7DAGwAVrNnrQrjrxHnEC-0stsujQQXvDB-3rR9mYTb6fFm6OIA1T-CN4hzbTdiSY-86UuPHXyRgmipxticu0Ss2D~SO7~Fz5kXcHZXApBehMbT-nw0QScCigLtFoHb~BoaxhUiEPZ38-DNfETNjQEQ3pgoaRtXsZybnpcn1wCk72Ofpw75jwBUlf03jw__&Key-Pair-Id=APKAJLOHF5GGSLRBV4ZA#page=73 Santali Morphological Analysis]. Prof.(Dr.) HIMA BINDU MARINGANTI, 52. pg. no: 74<br />
* Akhtar, M. A. K., Kumar, M., & Sahoo, G. (2017, September). [https://ieeexplore.ieee.org/abstract/document/8125962 Automata for santali language processing]. In 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI) (pp. 939-943). IEEE.<br />
* Sahoo, S. K., Mishra, B. K., Parida, S., Dash, S. R., Besra, J. N., & Tello, E. V. [https://publications.idiap.ch/attachments/papers/2021/Sahoo_OITSINTERNATIONALCONFERENCEONINFORMATIONTECHNOLOGY(OCIT)_2021.pdf Automatic Dialect Detection for Low Resource Santali Language].<br />
* Dash, S., Sunil Sahoo, Brojo Kishore Mishra, Shantipriya Parida, Jatindra Nath Besra, & Atul Kr. Ojha. (2021). Universal Dependency Treebank for Santali Language. SPAST Abstracts, 1(01). Retrieved from https://spast.org/techrep/article/view/2111<br />
* Basu, J., Hrangkhawl, T. R., Basu, T. K., & Majumder, S. (2021, June). Identification of two tribal languages of India: An experimental study. In Artificial Intelligence and Speech Technology: Proceedings of the 2nd International Conference on Artificial Intelligence and Speech Technology (AIST2020), 19-20 November, 2020, Delhi, India (p. 221). CRC Press.<br />
<br />
=== Books ===<br />
* Puxley, E. L. (1868). [https://books.google.co.in/books?hl=en&lr=&id=kKcIAAAAQAAJ&oi=fnd&pg=PA1&dq=santali+machine&ots=yrcw6-Z_nv&sig=QKA3nGTTM-8BAIjetrbO9kdw0O8&redir_esc=y#v=onepage&q&f=false A Vocabulary of the Santali Language]. WM Watts.<br />
<br />
*[https://repository.tribal.gov.in/handle/123456789/74101?viewItem=search&cat_handle=123456789/73706 Trilingual Multilingual Odia Tribal Language Dictionary Santali]<br />
<br />
=== Dictionary ===<br />
* Campbell, Andrew. A Santali-English Dictionary. Santal Mission Press, 1899.<br />
* Campbell, A., & Macphail, R. M. (1933). A Santali-English and English-Santali Dictionary... Edited by RM Macphail. Santal Mission Press.<br />
* Bodding, P. O. 1932–1936. A Santali dictionary (5 volumes).<br />
* Bhaduri, Manindra Bhusan. A Mundari-English Dictionary. Asian Educational Services, 1994.<br />
* Hansdah, R. C., and N. C. Murmu. "A Concise Santali-English Dictionary." (2003).<br />
<br />
=== Dictionaries of closely related languages ===<br />
* Ho Dictionary - Deeney, J. J. (1978). Ho-English dictionary. Xavier Ho Publications.<br />
<br />
== Also See ==<br />
*[[Indic languages]]</div>Rocky 734https://wiki.apertium.org/w/index.php?title=Apertium-sat&diff=73850Apertium-sat2022-01-18T09:23:46Z<p>Rocky 734: corrected spelling</p>
<hr />
<div>[[Category:Santali]]<br />
<br />
This is an Apertium monolingual package for Santali. With it you can do the following:<br />
<br />
* Morphological analysis of Santali<br />
* Morphological generation of Santali<br />
* Part-of-speech tagging of Santali<br />
<br />
== Also see ==<br />
*[[Santali]]<br />
*[[Indic languages]]</div>Rocky 734https://wiki.apertium.org/w/index.php?title=Indic_languages&diff=73849Indic languages2022-01-18T09:06:59Z<p>Rocky 734: /* Samples */ added Santali sentence and corrected Panjabi to Punjabi</p>
<hr />
<div>{{TOCD}}<br />
The '''Indic languages''' include [[Hindi]], [[Urdu]], [[Bengali]], [[Sanskrit]], and a number of other languages. They form the dominant language family of the Indian subcontinent and are spoken by more than 900 million people.<br />
<br />
The master plan involves generating independent finite-state transducers for each language, and then making individual dictionaries and transfer rules for every pair. The current status of these goals is listed below.<br />
<br />
==Status==<br />
The ultimate goal is to have multipurpose transducers for a variety of Indic languages. These can then be paired for X→Y translation by adding a [[Constraint Grammar|CG]] for language X and transfer rules and a bilingual dictionary for the pair X→Y. The development progress of each language's transducer and of the dictionary pairs is listed below.<br />
<br />
=== Transducers ===<br />
Once a transducer reaches ~80% coverage on a range of medium-to-large corpora, it can be considered "working"; above 90%, it can be considered "production".<br />
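Coverage here means the share of corpus tokens that the transducer can analyse. The minimal Python sketch below assumes analyser output in the Apertium stream format, where unknown words carry a leading `*` (as in `^xyz/*xyz$`); it computes that ratio and maps it onto the thresholds above. The label `below working` for anything under 80% is a hypothetical placeholder, since this page defines no name for that range.

```python
import re

# One lexical unit between ^ and $ in Apertium stream format.
LU_PATTERN = re.compile(r"\^(.*?)\$")


def coverage(stream: str) -> float:
    """Fraction of lexical units that received at least one analysis."""
    units = LU_PATTERN.findall(stream)
    if not units:
        return 0.0
    # An unknown word looks like ^word/*word$, so '/*' marks a miss.
    known = sum(1 for unit in units if "/*" not in unit)
    return known / len(units)


def status(cov: float) -> str:
    """Map a coverage ratio onto the thresholds described on this page."""
    if cov > 0.90:
        return "production"
    if cov >= 0.80:
        return "working"
    return "below working"  # hypothetical label, not defined on this page


if __name__ == "__main__":
    sample = "^dogs/dog<n><pl>$ ^bark/bark<vblex><pres>$ ^xyz/*xyz$"
    cov = coverage(sample)
    print(f"{cov:.1%} -> {status(cov)}")
```

Real coverage figures are computed over large corpora rather than single lines, so the sample only exercises the logic.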
<br />
{| class="wikitable sortable"<br />
|-<br />
!rowspan=2| name<br />
!rowspan=2| Language<br />
!colspan=2 class="unsortable"| ISO 639<br />
!rowspan=2| formalism<br />
!rowspan=2| state<br />
!rowspan=2| stems<br />
!rowspan=2| paradigms<br />
!rowspan=2| coverage<br />
!rowspan=2| location<br />
!rowspan=2 class="unsortable"| primary authors<br />
|-class="sortbottom"<br />
! -2<br />
! -3<br />
|-<br />
|| <code>[[apertium-san]]</code><br />
|| [[Sanskrit]]<br />
|| <code>sa</code> <br />
|| <code>san</code> <br />
|| [[lttoolbox]] <br />
|| working <br />
|align="right"| {{#lst:Apertium-san/stats|stems}}<br />
|align="right"| {{#lst:Apertium-san/stats|paradigms}}<br />
|align="center"| -<br />
|| [[apertium-san]] ([[languages]])<br />
|| Amba Kulkarni<br />
|-<br />
<br />
|| <code>[[apertium-hin]]</code><br />
|| [[Hindi]] <br />
|| <code>hi</code> <br />
|| <code>hin</code> <br />
|| [[lttoolbox]] <br />
|| working <br />
|align="right"| {{#lst:apertium-hin/stats|stems}}<br />
|align="right"| {{#lst:apertium-hin/stats|paradigms}}<br />
|align="center"| ~{{#lst:apertium-hin/stats/average}}%<br />
|| [[apertium-hin]]&nbsp;([[languages]])<br />
|| [[User:Nikant|Nikant]], [[User:darthxaher|Abu Zaher Md. Faridee]], [[User:Francis Tyers|Fran]]<br />
|-<br />
|| <code>[[apertium-ben]]</code><br />
|| [[Bengali]] <br />
|| <code>bn</code> <br />
|| <code>ben</code> <br />
|| [[lttoolbox]] <br />
|| development <br />
|align="right"| {{#lst:apertium-ben/stats|stems}}<br />
|align="right"| {{#lst:apertium-ben/stats|paradigms}}<br />
|align="center"| ~{{#lst:apertium-ben/stats/average}}%<br />
|| [[apertium-ben]]&nbsp;([[languages]])<br />
|| [[User:darthxaher|Abu Zaher Md. Faridee]]<br />
|-<br />
|| <code>[[apertium-urd]]</code><br />
|| [[Urdu]] <br />
|| <code>ur</code> <br />
|| <code>urd</code> <br />
|| [[lttoolbox]]<br />
|| development <br />
|align="right"| {{#lst:apertium-urd/stats|stems}}<br />
|align="right"| {{#lst:apertium-urd/stats|paradigms}}<br />
|align="center"| ~{{#lst:apertium-urd/stats/average}}%<br />
|| [[apertium-urd]]&nbsp;([[languages]])<br />
|| Muhammad Humayoun<br />
|-<br />
|| <code>[[apertium-nep]]</code><br />
|| [[Nepali]]<br />
|| <code>ne</code><br />
|| <code>nep</code><br />
|| [[lttoolbox]]<br />
|| prototype<br />
|align="right"| {{#lst:apertium-nep/stats|stems}}<br />
|align="right"| {{#lst:apertium-nep/stats|paradigms}}<br />
|align="center"| -<br />
|| [[apertium-ne-en]]&nbsp;([[incubator]])<br />[[apertium-eo-ne]]&nbsp;([[incubator]])<br />
||<br />
|-<br />
|| <code>[[apertium-mar]]</code><br />
|| [[Marathi]]<br />
|| <code>mr</code><br />
|| <code>mar</code><br />
|| [[lttoolbox]]<br />
|| prototype<br />
|align="right"| {{#lst:apertium-mar/stats|stems}}<br />
|align="right"| {{#lst:apertium-mar/stats|paradigms}}<br />
|align="center"| -<br />
|| [[apertium-mar-eng]]&nbsp;([[incubator]])<br />[[apertium-mr-hi]]&nbsp;([[incubator]])<br />
||<br />
|-<br />
|| <code>[[apertium-sin]]</code><br />
|| [[Sinhala]]<br />
|| <code>si</code><br />
|| <code>sin</code><br />
|| [[lttoolbox]]<br />
|| {{#lst:apertium-sin/stats|state}}<br />
|align="right"| {{#lst:apertium-sin/stats|stems}}<br />
|align="right"| {{#lst:apertium-sin/stats|paradigms}}<br />
|align="center"| -<br />
|| {{#lst:apertium-sin/stats|location}}<br />
|| {{#lst:apertium-sin/stats|authors}}<br />
|-<br />
|-<br />
|| <code>[[apertium-pan]]</code><br />
|| [[Punjabi]]<br />
|| <code>pa</code><br />
|| <code>pan</code><br />
|| —<br />
|| possibly non-existent<br />
|align="right"| -<br />
|align="right"| -<br />
|align="center"| -<br />
|| [[apertium-pa-hi]]&nbsp;([[incubator]])<br />[[apertium-ur-pa]]&nbsp;([[incubator]])<br />
|-<br />
|-<br />
|| <code>[[apertium-sat]]</code><br />
|| [[Santali]]<br />
|| <code>-</code><br />
|| <code>sat</code><br />
|| [[lttoolbox]]<br />
|| development<br />
|align="right"| -<br />
|align="right"| -<br />
|align="center"| -<br />
|| [[apertium-eng-sat]]<br />
|| [[User:Rocky 734|Prasanta Hembram]]<br />
|}<br />
<br />
=== Indic Language Classification ===<br />
* Dardic: [[Pashayi]], [[Khowar]], [[Kohistani]], [[Shina language]], [[Kashmiri]]<br />
* Austroasiatic: [[Santali]]<br />
* Northern Zone: <br />
**Central Pahari: [[Garhwali]], [[Kumauni]]<br />
**Eastern Pahari: [[Nepali]]<br />
* North-Western Zone: [[Punjabi]], [[Lahnda]], [[Sindhi]]<br />
**Dogri-Kangri: [[Dogri]], [[Kangri]], [[Mandeali]], etc.<br />
* Western Zone: [[Gujarati]], [[Bhil]], [[Khandeshi]], [[Domari-Romani]]<br />
** Rajasthani: [[Marwari]], [[Rajasthani]]<br />
* [[Hindi]]<br />
* [[Sanskrit]]<br />
* Southern Zone: [[Marathi]], [[Konkani]], [[Urdu]]<br />
** Insular Indic: [[Sinhala]], [[Maldivian]]<br />
* Eastern Zone: [[Bengali]], [[Oriya]], [[Tharu]]<br />
** Bihari: [[Bhojpuri]], [[Maithili]], etc.<br />
<br />
=== Existing language pairs ===<br />
Text in ''italics'' denotes language pairs in the incubator. Regular text denotes a developing language pair in the nursery, text in '''bold''' denotes a stable, well-working language pair in trunk, and text in '''''bold and italics''''' denotes a pair in staging. Bidix stems, as counted with [[dixcounter]], are displayed below.<br />
<br />
{| style="text-align: center;" class="wikitable dixtable"<br />
|- style="background: #ececec"<br />
! !! hin !! ben !! urd !! san !! nep !! mar !! pan !! sin !! asm !! eng !! epo !! fas !! sat<br />
|-<br />
| '''hin''' || - || ''[[Apertium-bn-hi|bn-hi]]''<br>{{#lst:Apertium-bn-hi/stats|bn-hi_stems}} || '''[[Apertium-urd-hin|urd-hin]]'''<br>'''{{#lst:Apertium-urd-hin/stats|urd-hin_stems}}''' || || || ''[[Apertium-mar-hin|mar-hin]]''<br>{{#lst:Apertium-mar-hin/stats|mar-hin_stems}} || ''[[Apertium-pa-hi|pa-hi]]''<br>{{#lst:Apertium-pa-hi/stats|pa-hi_stems}} || || || || || ||<br />
|-<br />
| '''ben''' || ''[[Apertium-bn-hi|bn-hi]]''<br>{{#lst:Apertium-bn-hi/stats|bn-hi_stems}} || - || || || || || || || || || || ||<br />
|-<br />
| '''urd''' || '''[[Apertium-urd-hin|urd-hin]]'''<br>'''{{#lst:Apertium-urd-hin/stats|urd-hin_stems}}''' || || - || || || || ''[[Apertium-ur-pa|ur-pa]]''<br>{{#lst:Apertium-ur-pa/stats|ur-pa_stems}} || || || || || ||<br />
|-<br />
| '''san''' || || || || - || || || || || || || || ||<br />
|-<br />
| '''nep''' || || || || || - || || || || || || || ||<br />
|-<br />
| '''mar''' || ''[[Apertium-mar-hin|mar-hin]]''<br>{{#lst:Apertium-mar-hin/stats|mar-hin_stems}} || || || || || - || || || || || || ||<br />
|-<br />
| '''pan''' || ''[[Apertium-pa-hi|pa-hi]]''<br>{{#lst:Apertium-pa-hi/stats|pa-hi_stems}} || || ''[[Apertium-ur-pa|ur-pa]]''<br>{{#lst:Apertium-ur-pa/stats|ur-pa_stems}} || || || || - || || || || || ||<br />
|-<br />
| '''sin''' || || || || || || || || - || || || || ||<br />
|-<br />
| '''asm''' || ''[[Apertium-as-hi|as-hi]]''<br>{{#lst:Apertium-as-hi/stats|as-hi_stems}} || ''[[Apertium-asm-ben|asm-ben]]''<br>{{#lst:Apertium-asm-ben/stats|asm-ben_stems}} || || || || || || || || || || ||<br />
|-<br />
| '''eng''' || [[Apertium-eng-hin|eng-hin]]<br>{{#lst:Apertium-eng-hin/stats|eng-hin_stems}} || ''[[Apertium-bn-en|bn-en]]''<br>{{#lst:Apertium-bn-en/stats|bn-en_stems}} || || || ''[[Apertium-ne-en|ne-en]]''<br>{{#lst:Apertium-ne-en/stats|ne-en_stems}} || ''[[Apertium-mar-eng|mar-eng]]''<br>{{#lst:Apertium-mar-eng/stats|mar-eng_stems}} || || ''[[Apertium-si-en|si-en]]''<br>{{#lst:Apertium-si-en/stats|si-en_stems}} || || || || || ''[[Apertium-eng-sat|eng-sat]]''<br />
|-<br />
| '''epo''' || || || || || ''[[Apertium-eo-ne|eo-ne]]''<br>{{#lst:Apertium-eo-ne/stats|eo-ne_stems}} || || || || || || || ||<br />
|-<br />
| '''fas''' || || || ''[[Apertium-ur-fa|ur-fa]]''<br>{{#lst:Apertium-ur-fa/stats|ur-fa_stems}} || || || || || || || || || || <br />
|-<br />
| '''sat''' || || || || || || || || || || ''[[Apertium-eng-sat|sat-eng]]'' || || ||<br />
|}<br />
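The stem counts in the table above come from [[dixcounter]]. As a rough stand-in only (this is not dixcounter's own code, and its counting rules may differ, for instance for direction-restricted entries), the Python sketch below counts the `<e>` entries inside the `<section>` elements of a bilingual .dix file using the standard library's ElementTree. The two entries in the example are illustrative, not taken from apertium-eng-sat.

```python
import xml.etree.ElementTree as ET


def count_bidix_entries(root: ET.Element) -> int:
    """Count <e> elements inside the <section> elements of a .dix tree.

    Entries inside <pardefs> are skipped. Every <e> counts once, whatever
    its direction restriction (r="LR" / r="RL"); this is an approximation.
    """
    return sum(1 for section in root.iter("section") for _ in section.iter("e"))


if __name__ == "__main__":
    # Illustrative bidix with two entries in the main section.
    sample = """<dictionary>
  <alphabet/>
  <sdefs><sdef n="n"/></sdefs>
  <pardefs><pardef n="demo"><e><p><l/><r/></p></e></pardef></pardefs>
  <section id="main" type="standard">
    <e><p><l>house<s n="n"/></l><r>ᱚᱲᱟᱜ<s n="n"/></r></p></e>
    <e r="LR"><p><l>home<s n="n"/></l><r>ᱚᱲᱟᱜ<s n="n"/></r></p></e>
  </section>
</dictionary>"""
    print(count_bidix_entries(ET.fromstring(sample)))  # 2
```

For a file on disk, `ET.parse(path).getroot()` gives the root element to pass in.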
<br />
==Samples==<br />
Article 1 of the Universal Declaration of Human Rights:<br />
<br />
''All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience and should act towards one another in a spirit of brotherhood.''<br />
<br />
{|class=wikitable<br />
! Language !! Text<br />
|-<br />
|| Bengali || সমস্ত মানুষ স্বাধীনভাবে সমান মর্যাদা এবং অধিকার নিয়ে জন্মগ্রহণ করে। তাঁদের বিবেক এবং বুদ্ধি আছে ; সুতরাং সকলেরই একে অপরের প্রতি ভ্রাতৃত্বসুলভ মনোভাব নিয়ে আচরণ করা উচিত্।<br />
|-<br />
|| Hindi || सभी मनुष्यों को गौरव और अधिकारों के मामले में जन्मजात स्वतन्त्रता और समानता प्राप्त है । उन्हें बुद्धि और अन्तरात्मा की देन प्राप्त है और परस्पर उन्हें भाईचारे के भाव से बर्ताव करना चाहिए ।<br />
|-<br />
|| Marathi || सर्व मानवी व्यक्ति जन्मतःच स्वतंत्र आहेत व त्यांना समान प्रतिष्ठा व समान अधिकार आहेत. त्यांना विचारशक्ति व सदसविद्वेकबुद्धि लाभलेली आहे. व त्यांनी एकमेकांशी बंधुत्याच्या भावनेने आचरण करावे.<br />
|-<br />
|| Nepali || सबै व्यक्ति हरू जन्मजात स्वतन्त्र हुन ती सबैको समान अधिकार र महत्व छ। निजहरूमा विचार शक्ति र सद्धिचार भएकोले निजहरूले आपसमा भातृत्वको भावना बाट व्यवहार गर्नु पर्छ।<br />
|-<br />
|| Punjabi, Eastern || ਸਾਰਾ ਮਨੁੱਖੀ ਪਰਿਵਾਰ ਆਪਣੀ ਮਹਿਮਾ, ਸ਼ਾਨ ਅਤੇ ਹੱਕਾਂ ਦੇ ਪੱਖੋਂ ਜਨਮ ਤੋਂ ਹੀ ਆਜ਼ਾਦ ਹੈ ਅਤੇ ਸੁਤੇ ਸਿੱਧ ਸਾਰੇ ਲੋਕ ਬਰਾਬਰ ਹਨ । ਉਨ੍ਹਾਂ ਸਭਨਾ ਨੂੰ ਤਰਕ ਅਤੇ ਜ਼ਮੀਰ ਦੀ ਸੌਗਾਤ ਮਿਲੀ ਹੋਈ ਹੈ ਅਤੇ ਉਨ੍ਹਾਂ ਨੂੰ ਭਰਾਤਰੀਭਾਵ ਦੀ ਭਾਵਨਾ ਰਖਦਿਆਂ ਆਪਸ ਵਿਚ ਵਿਚਰਣਾ ਚਾਹੀਦਾ ਹੈ ।<br />
|-<br />
|| Sanskrit || सर्वे मानवाः स्वतन्त्राः समुत्पन्नाः वर्तन्ते अपि च, गौरवदृशा अधिकारदृशा च समानाः एव वर्तन्ते। एते सर्वे चेतना-तर्क-शक्तिभ्यां सुसम्पन्नाः सन्ति। अपि च, सर्वेऽपि बन्धुत्व-भावनया परस्परं व्यवहरन्तु।<br />
|-<br />
|| Santali (Ol Chiki) || ᱡᱷᱚᱛᱚ ᱦᱚᱲ ᱠᱚ ᱜᱚᱨᱚᱡᱽ ᱟᱨ ᱚᱫᱷᱤᱠᱟᱹᱨ ᱢᱟᱹᱢᱞᱟᱹ ᱨᱮ ᱡᱟᱱᱟᱢ ᱡᱟᱛ ᱥᱟᱢᱮᱞᱟᱭ ᱧᱟᱢ ᱠᱟᱱᱟ ᱾ ᱩᱱᱤᱠᱩ ᱠᱚ ᱛᱚᱨᱠ ᱟᱨ ᱣᱤᱣᱮᱠ ᱛᱮ ᱯᱩᱨᱟᱹᱣ ᱠᱟᱱᱟ ᱠᱚ ᱟᱨ ᱩᱱᱠᱩ ᱫᱚ ᱵᱚᱭᱦᱟ ᱦᱤᱥᱟᱹᱵ ᱛᱮ ᱱᱤᱡᱚᱨ ᱦᱤᱥᱟᱹᱵ ᱛᱮ ᱠᱟᱹᱢᱤ ᱠᱟᱛᱷᱟ ᱾<br />
|-<br />
|| Urdu || تمام انسان آزاد اور حقوق و عزت کے اعتبار سے برابر پیدا ہوئے ہیں۔ انہیں ضمیر اور عقل ودیعت ہوئی ہے۔ اس لئے انہیں ایک دوسرے کے ساتھ بھائی چارے کا سلوک کرنا چاہیئے۔<br />
|}<br />
<br />
== Also see ==<br />
*[[Languages of India]]<br />
<br />
== Helpful Pages ==<br />
*[[List_of_symbols|The General Tagset List]]<br />
<!--<br />
==Tagset==<br />
<br />
Rough guide to tagsets in various Indic language transducers, with an eye to keeping stuff that is basically the same tagged the same (see also [[List_of_symbols|The General Tagset List]]).<br />
<br />
{|class="wikitable"<br />
! Phenomenon !! Morphology !! Description !! Tag(s) !! Language(s) !! Notes <br />
|-<br />
|colspan=6 align="center"|'''Part of speech'''<br />
|-<br />
| Noun || || || {{tag|n}} || ||<br />
|}<br />
--><br />
<br />
[[Category:Indic languages]]</div>Rocky 734https://wiki.apertium.org/w/index.php?title=Languages_of_India&diff=73848Languages of India2022-01-12T19:27:11Z<p>Rocky 734: Corrected spellings and added Also see</p>
<hr />
<div>The Indian languages belong to six language families: Indo-Aryan language family,Austroasiatic language family,Dravidian language family, Sino-Tibetan language family,Tai–Kadai language family and Great Andamanese languages.<br />
<br />
== Status ==<br />
{| class="wikitable"<br />
|-<br />
! Name<br />
! Language<br />
! Stems<br />
! Coverage<br />
! Location<br />
|-<br />
| Apertium-hin<br />
| Hindi<br />
| 37,833<br />
| ~83.1%<br />
| [http://wiki.apertium.org/wiki/Apertium-hin Apertium-hin]<br />
|-<br />
| Apertium-urd<br />
| Urdu<br />
| 14,943<br />
| ~64.6%<br />
| [http://wiki.apertium.org/wiki/Apertium-urd Apertium-urd]<br />
|-<br />
|}<br />
<br />
<br />
== Existing language pairs ==<br />
Text in ''italics'' denotes language pairs in the incubator. Regular text denotes a developing language pair in the nursery, text in '''bold''' denotes a stable, well-working language pair in trunk, and text in '''''bold and italics''''' denotes a pair in staging. Bidix stems, as counted with [[dixcounter]], are displayed below.<br />
<br />
<br />
{| style="text-align: center;" class="wikitable dixtable"<br />
|- style="background: #ececec"<br />
! !! eng !! hin !! asm !! ben !! guj !! mal !! mar !! pan !! tel !! urd<br />
|-<br />
| '''eng''' || - || [[Apertium-eng-hin|eng-hin]]<br>{{#lst:Apertium-eng-hin/stats|eng-hin_stems}} || ''[[Apertium-asm-eng|asm-eng]]''<br>{{#lst:Apertium-asm-eng/stats|asm-eng_stems}} || ''[[Apertium-bn-en|bn-en]]''<br>{{#lst:Apertium-bn-en/stats|bn-en_stems}} || || ''[[Apertium-mal-eng|mal-eng]]''<br>{{#lst:Apertium-mal-eng/stats|mal-eng_stems}} || ''[[Apertium-mar-eng|mar-eng]]''<br>{{#lst:Apertium-mar-eng/stats|mar-eng_stems}} || || ''[[Apertium-eng-tel|eng-tel]]''||<br />
|-<br />
| '''hin''' || [[Apertium-eng-hin|eng-hin]]<br>{{#lst:Apertium-eng-hin/stats|eng-hin_stems}} || - || ''[[Apertium-as-hi|as-hi]]''<br>{{#lst:Apertium-as-hi/stats|as-hi_stems}} || ''[[Apertium-bn-hi|bn-hi]]''<br>{{#lst:Apertium-bn-hi/stats|bn-hi_stems}} || ''[[Apertium-guj-hin|guj-hin]]''<br> || || ''[[Apertium-mar-hin|mar-hin]]''<br>{{#lst:Apertium-mar-hin/stats|mar-hin_stems}} || ''[[Apertium-pa-hi|pa-hi]]''<br>{{#lst:Apertium-pa-hi/stats|pa-hi_stems}} || || '''[[Apertium-urd-hin|urd-hin]]'''<br>'''{{#lst:Apertium-urd-hin/stats|urd-hin_stems}}'''<br />
|-<br />
| '''asm''' || ''[[Apertium-asm-eng|asm-eng]]''<br>{{#lst:Apertium-asm-eng/stats|asm-eng_stems}} || ''[[Apertium-as-hi|as-hi]]''<br>{{#lst:Apertium-as-hi/stats|as-hi_stems}} || - || ''[[Apertium-asm-ben|asm-ben]]''<br>{{#lst:Apertium-asm-ben/stats|asm-ben_stems}} || || || || || ||<br />
|-<br />
| '''ben''' || ''[[Apertium-bn-en|bn-en]]''<br>{{#lst:Apertium-bn-en/stats|bn-en_stems}} || ''[[Apertium-bn-hi|bn-hi]]''<br>{{#lst:Apertium-bn-hi/stats|bn-hi_stems}} || ''[[Apertium-asm-ben|asm-ben]]''<br>{{#lst:Apertium-asm-ben/stats|asm-ben_stems}} || - || || || || || ||<br />
|-<br />
| '''guj''' || || ''[[Apertium-guj-hin|guj-hin]]''<br>{{#lst:Apertium-guj-hin/stats|guj-hin_stems}} || || || - || || || || ||<br />
|-<br />
| '''mal''' || ''[[Apertium-mal-eng|mal-eng]]''<br>{{#lst:Apertium-mal-eng/stats|mal-eng_stems}} || || || || || - || || || ||<br />
|-<br />
| '''mar''' || ''[[Apertium-mar-eng|mar-eng]]''<br>{{#lst:Apertium-mar-eng/stats|mar-eng_stems}} || ''[[Apertium-mar-hin|mar-hin]]''<br>{{#lst:Apertium-mar-hin/stats|mar-hin_stems}} || || || || || - || || ''[[Apertium-tel-mar|tel-mar]]'' ||<br />
|-<br />
| '''pan''' || || ''[[Apertium-pa-hi|pa-hi]]''<br>{{#lst:Apertium-pa-hi/stats|pa-hi_stems}} || || || || || || - || || ''[[Apertium-ur-pa|ur-pa]]''<br />
|-<br />
| '''tel''' || ''[[Apertium-eng-tel|eng-tel]]''<br>{{#lst:Apertium-eng-tel/stats|eng-tel_stems}} || || || || || || || || - ||<br />
|-<br />
| '''urd''' || || '''[[Apertium-urd-hin|urd-hin]]'''<br>'''{{#lst:Apertium-urd-hin/stats|urd-hin_stems}}''' || || || || || || ''[[Apertium-ur-pa|ur-pa]]'' || || -<br />
|-<br />
|}<br />
<br />
== Indian languages by subgroup ==<br />
<br />
*Indo-Aryan language family: [[Hindi]], [[Bengali]], [[Marathi]], [[Urdu]], [[Gujarati]], [[Punjabi]], [[Kashmiri]], [[Rajasthani]], [[Sindhi]], [[Assamese]], [[Maithili]] and [[Odia]].<br />
*Dravidian language family: [[Telugu]], [[Tamil]], [[Kannada]] and [[Malayalam]].<br />
*Austroasiatic language family: [[Khasi]], [[Munda]] and [[Santali]].<br />
*Sino-Tibetan language family: [[Meitei]], [[Bodo]], [[Karbi]] and [[Lepcha]].<br />
*Tai-Kadai language family: [[Ahom]]<br />
*Great Andamanese language family: [[Sentinelese]], [[Önge]] and [[Jarawa]].<br />
<br />
<br />
Most widely spoken languages of India<br />
#Hindi<br />
#English<br />
#Bengali<br />
#Marathi<br />
#Telugu<br />
#Tamil<br />
#Malayalam<br />
#Kashmiri<br />
#Urdu<br />
#Sanskrit<br />
<br />
==Samples==<br />
Article 1 of the Universal Declaration of Human Rights:<br />
<br />
''All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience and should act towards one another in a spirit of brotherhood.''<br />
<br />
{|class=wikitable<br />
! Language !! Text<br />
|-<br />
|| Hindi || सभी मनुष्यों को गौरव और अधिकारों के मामले में जन्मजात स्वतन्त्रता और समानता प्राप्त है । उन्हें बुद्धि और अन्तरात्मा की देन प्राप्त है और परस्पर उन्हें भाईचारे के भाव से बर्ताव करना चाहिए ।<br />
|-<br />
|| Marathi || सर्व मानवी व्यक्ति जन्मतःच स्वतंत्र आहेत व त्यांना समान प्रतिष्ठा व समान अधिकार आहेत. त्यांना विचारशक्ति व सदसविद्वेकबुद्धि लाभलेली आहे. व त्यांनी एकमेकांशी बंधुत्याच्या भावनेने आचरण करावे.<br />
|-<br />
|| Sanskrit || सर्वे मानवाः स्वतन्त्राः समुत्पन्नाः वर्तन्ते अपि च, गौरवदृशा अधिकारदृशा च समानाः एव वर्तन्ते। एते सर्वे चेतना-तर्क-शक्तिभ्यां सुसम्पन्नाः सन्ति। अपि च, सर्वेऽपि बन्धुत्व-भावनया परस्परं व्यवहरन्तु।<br />
|-<br />
|| Urdu || تمام انسان آزاد اور حقوق و عزت کے اعتبار سے برابر پیدا ہوئے ہیں۔ انہیں ضمیر اور عقل ودیعت ہوئی ہے۔ اس لئے انہیں ایک دوسرے کے ساتھ بھائی چارے کا سلوک کرنا چاہیئے۔<br />
|-<br />
|| Bangla || সমস্ত মানুষ স্বাধীনভাবে সমান মর্যাদা এবং অধিকার নিয়ে জন্মগ্রহণ করে। তাঁদের বিবেক এবং বুদ্ধি আছে; সুতরাং সকলেরই একে অপরের প্রতি ভ্রাতৃত্বসুলভ মনোভাব নিয়ে আচরণ করা উচিত।<br />
|-<br />
|| Gujarati || પ્રતિષ્ઠા અને અધિકારોની દૃષ્ટિએ સર્વ માનવો જન્મથી સ્વતંત્ર અને સમાન હોય છે. તેમનામાં વિચારશક્તિ અને અંતઃકરણ હોય છે અને તેમણે પરસ્પર બંધુત્વની ભાવનાથી વર્તવું જોઇએ.<br />
|-<br />
|| Punjabi || ਸਾਰਾ ਮਨੁੱਖੀ ਪਰਿਵਾਰ ਆਪਣੀ ਮਹਿਮਾ, ਸ਼ਾਨ ਅਤੇ ਹੱਕਾਂ ਦੇ ਪੱਖੋਂ ਜਨਮ ਤੋਂ ਹੀ ਆਜ਼ਾਦ ਹੈ ਅਤੇ ਸੁਤੇ ਸਿੱਧ ਸਾਰੇ ਲੋਕ ਬਰਾਬਰ ਹਨ । ਉਨ੍ਹਾਂ ਸਭਨਾ ਨੂੰ ਤਰਕ ਅਤੇ ਜ਼ਮੀਰ ਦੀ ਸੌਗਾਤ ਮਿਲੀ ਹੋਈ ਹੈ ਅਤੇ ਉਨ੍ਹਾਂ ਨੂੰ ਭਰਾਤਰੀਭਾਵ ਦੀ ਭਾਵਨਾ ਰਖਦਿਆਂ ਆਪਸ ਵਿਚ ਵਿਚਰਣਾ ਚਾਹੀਦਾ ਹੈ ।<br />
|-<br />
|| Kannada || ಎಲ್ಲಾ ಮಾನವರೂ ಸ್ವತಂತ್ರರಾಗಿಯೇ ಜನಿಸಿದ್ದಾರೆ. ಹಾಗೂ ಘನತೆ ಮತ್ತು ಹಕ್ಕುಗಳಲ್ಲಿ ಸಮಾನರಾಗಿದ್ದಾರೆ. ವಿವೇಕ ಮತ್ತು ಅಂತಃಕರಣಗಳನ್ನು ಪಡೆದವರಾದ್ದರಿಂದ ಅವರು ಪರಸ್ಪರ ಸಹೋದರ ಭಾವದಿಂದ ವರ್ತಿಸಬೇಕು.<br />
|-<br />
|| Malayalam || മനുഷ്യരെല്ലാവരും തുല്യാവകാശങ്ങളോടും അന്തസ്സോടും സ്വാതന്ത്ര്യത്തോടുംകൂടി ജനിച്ചിട്ടുള്ളവരാണ്. അന്യോന്യം ഭ്രാതൃഭാവത്തോടെ പെരുമാറുവാനാണ് മനുഷ്യന്നു വിവേകബുദ്ധിയും മനസ്സാക്ഷിയും സിദ്ധമായിരിക്കുന്നത്.<br />
|-<br />
|| Telugu || ప్రతిపత్తిస్వత్వముల విషయమున మానవులెల్లరును జన్మతః స్వతంత్రులును సమానులును నగుదురు. వారు వివేచన-అంతఃకరణ సంపన్నులగుటచే పరస్పరము భ్రాతృభావముతో వర్తింపవలయును.<br />
|-<br />
|| Tamil || எல்லா மனிதர்களும் ஒருவருக்கொருவர் தொடர்பில் சுதந்திரமாகவும், சமமாகவும், சமமாகவும் பிறந்தவர்கள். அவர்கள் புத்திஜீவிகள் வாரியாக இருக்க பரஸ்பரம் சாய்ந்திருக்க வேண்டும்.<br />
|-<br />
|}<br />
<br />
== Also See ==<br />
*[[Indic languages]]<br />
<br />
[[Category:Language families]]</div>Rocky 734https://wiki.apertium.org/w/index.php?title=Indic_languages&diff=73847Indic languages2022-01-12T19:05:38Z<p>Rocky 734: Added 'Added see' and hided the unfinished work (instead replaced with Helpful pages)</p>
<hr />
<div>{{TOCD}}<br />
The '''Indic languages''' include [[Hindi]], [[Urdu]], [[Bengali]], [[Sanskrit]], and a number of other languages. They form the dominant language family of the Indian subcontinent and are spoken by more than 900 million people.<br />
<br />
The master plan involves generating independent finite-state transducers for each language, and then making individual dictionaries and transfer rules for every pair. The current status of these goals is listed below.<br />
<br />
==Status==<br />
The ultimate goal is to have multipurpose transducers for a variety of Indic languages. These can then be paired for X→Y translation by adding a [[Constraint Grammar|CG]] for language X and transfer rules and a bilingual dictionary for the pair X→Y. The development progress of each language's transducer and of the dictionary pairs is listed below.<br />
<br />
=== Transducers ===<br />
Once a transducer reaches ~80% coverage on a range of medium-to-large corpora, it can be considered "working"; above 90%, it can be considered "production".<br />
<br />
{| class="wikitable sortable"<br />
|-<br />
!rowspan=2| name<br />
!rowspan=2| Language<br />
!colspan=2 class="unsortable"| ISO 639<br />
!rowspan=2| formalism<br />
!rowspan=2| state<br />
!rowspan=2| stems<br />
!rowspan=2| paradigms<br />
!rowspan=2| coverage<br />
!rowspan=2| location<br />
!rowspan=2 class="unsortable"| primary authors<br />
|-class="sortbottom"<br />
! -2<br />
! -3<br />
|-<br />
|| <code>[[apertium-san]]</code><br />
|| [[Sanskrit]]<br />
|| <code>sa</code> <br />
|| <code>san</code> <br />
|| [[lttoolbox]] <br />
|| working <br />
|align="right"| {{#lst:Apertium-san/stats|stems}}<br />
|align="right"| {{#lst:Apertium-san/stats|paradigms}}<br />
|align="center"| -<br />
|| [[apertium-san]] ([[languages]])<br />
|| Amba Kulkarni<br />
|-<br />
<br />
|| <code>[[apertium-hin]]</code><br />
|| [[Hindi]] <br />
|| <code>hi</code> <br />
|| <code>hin</code> <br />
|| [[lttoolbox]] <br />
|| working <br />
|align="right"| {{#lst:apertium-hin/stats|stems}}<br />
|align="right"| {{#lst:apertium-hin/stats|paradigms}}<br />
|align="center"| ~{{#lst:apertium-hin/stats/average}}%<br />
|| [[apertium-hin]]&nbsp;([[languages]])<br />
|| [[User:Nikant|Nikant]], [[User:darthxaher|Abu Zaher Md. Faridee]], [[User:Francis Tyers|Fran]]<br />
|-<br />
|| <code>[[apertium-ben]]</code><br />
|| [[Bengali]] <br />
|| <code>bn</code> <br />
|| <code>ben</code> <br />
|| [[lttoolbox]] <br />
|| development <br />
|align="right"| {{#lst:apertium-ben/stats|stems}}<br />
|align="right"| {{#lst:apertium-ben/stats|paradigms}}<br />
|align="center"| ~{{#lst:apertium-ben/stats/average}}%<br />
|| [[apertium-ben]]&nbsp;([[languages]])<br />
|| [[User:darthxaher|Abu Zaher Md. Faridee]]<br />
|-<br />
|| <code>[[apertium-urd]]</code><br />
|| [[Urdu]] <br />
|| <code>ur</code> <br />
|| <code>urd</code> <br />
|| [[lttoolbox]]<br />
|| development <br />
|align="right"| {{#lst:apertium-urd/stats|stems}}<br />
|align="right"| {{#lst:apertium-urd/stats|paradigms}}<br />
|align="center"| ~{{#lst:apertium-urd/stats/average}}%<br />
|| [[apertium-urd]]&nbsp;([[languages]])<br />
|| Muhammad Humayoun<br />
|-<br />
|| <code>[[apertium-nep]]</code><br />
|| [[Nepali]]<br />
|| <code>ne</code><br />
|| <code>nep</code><br />
|| [[lttoolbox]]<br />
|| prototype<br />
|align="right"| {{#lst:apertium-nep/stats|stems}}<br />
|align="right"| {{#lst:apertium-nep/stats|paradigms}}<br />
|align="center"| -<br />
|| [[apertium-ne-en]]&nbsp;([[incubator]])<br />[[apertium-eo-ne]]&nbsp;([[incubator]])<br />
||<br />
|-<br />
|| <code>[[apertium-mar]]</code><br />
|| [[Marathi]]<br />
|| <code>mr</code><br />
|| <code>mar</code><br />
|| [[lttoolbox]]<br />
|| prototype<br />
|align="right"| {{#lst:apertium-mar/stats|stems}}<br />
|align="right"| {{#lst:apertium-mar/stats|paradigms}}<br />
|align="center"| -<br />
|| [[apertium-mar-eng]]&nbsp;([[incubator]])<br />[[apertium-mr-hi]]&nbsp;([[incubator]])<br />
||<br />
|-<br />
|| <code>[[apertium-sin]]</code><br />
|| [[Sinhala]]<br />
|| <code>si</code><br />
|| <code>sin</code><br />
|| [[lttoolbox]]<br />
|| {{#lst:apertium-sin/stats|state}}<br />
|align="right"| {{#lst:apertium-sin/stats|stems}}<br />
|align="right"| {{#lst:apertium-sin/stats|paradigms}}<br />
|align="center"| -<br />
|| {{#lst:apertium-sin/stats|location}}<br />
|| {{#lst:apertium-sin/stats|authors}}<br />
|-<br />
|| <code>[[apertium-pan]]</code><br />
|| [[Punjabi]]<br />
|| <code>pa</code><br />
|| <code>pan</code><br />
|| —<br />
|| possibly non-existent<br />
|align="right"| -<br />
|align="right"| -<br />
|align="center"| -<br />
|| [[apertium-pa-hi]]&nbsp;([[incubator]])<br />[[apertium-ur-pa]]&nbsp;([[incubator]])<br />
||<br />
|-<br />
|| <code>[[apertium-sat]]</code><br />
|| [[Santali]]<br />
|| <code>-</code><br />
|| <code>sat</code><br />
|| [[lttoolbox]]<br />
|| development<br />
|align="right"| -<br />
|align="right"| -<br />
|align="center"| -<br />
|| [[apertium-eng-sat]]<br />
|| [[User:Rocky 734|Prasanta Hembram]]<br />
|}<br />
<br />
=== Indic Language Classification ===<br />
* Dardic: [[Pashayi]], [[Khowar]], [[Kohistani]], [[Shina language]], [[Kashmiri]]<br />
* Austroasiatic: [[Santali]]<br />
* Northern Zone: <br />
**Central Pahari: [[Garhwali]], [[Kumauni]]<br />
**Eastern Pahari: [[Nepali]]<br />
* North-Western Zone: [[Panjabi]], [[Lahnda]], [[Sindhi]] <br />
**Dogri-Kangri: [[Dogri]], [[Kangri]], [[Mandeali]], etc.<br />
* Western Zone: [[Gujarati]], [[Bhil]], [[Khandeshi]], [[Domari-Romani]]<br />
** Rajasthani: [[Marwari]], [[Rajasthani]]<br />
* Central Zone: [[Hindi]], [[Urdu]]<br />
* [[Sanskrit]]<br />
* Southern Zone: [[Marathi]], [[Konkani]]<br />
** Insular Indic: [[Sinhala]], [[Maldivian]]<br />
* Eastern Zone: [[Bengali]], [[Oriya]], [[Tharu]]<br />
** Bihari: [[Bhojpuri]], [[Maithili]], etc.<br />
<br />
=== Existing language pairs ===<br />
Language pairs in ''italics'' are in the incubator, pairs in regular text are under development in the nursery, pairs in '''bold''' are stable, well-working pairs in trunk, and pairs in '''''bold italics''''' are in staging. Where available, each cell also shows the number of bidix stems as counted with [[dixcounter]].<br />
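For a rough idea of what such a count involves, here is a minimal Python sketch. It counts the `<e>` entries inside the `<section>` elements of a bilingual `.dix` file and skips the `<e>` elements that belong to paradigm definitions. It is an approximation written for illustration, not dixcounter's own method, so its totals may differ from the figures in the table. The function name and the sample dictionary are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample bidix: one paradigm entry and two stem entries.
SAMPLE = """<dictionary>
  <alphabet/>
  <sdefs><sdef n="n"/></sdefs>
  <pardefs>
    <pardef n="example__n"><e><p><l/><r/></p></e></pardef>
  </pardefs>
  <section id="main" type="standard">
    <e><p><l>kitab<s n="n"/></l><r>book<s n="n"/></r></p></e>
    <e r="LR"><p><l>ghar<s n="n"/></l><r>house<s n="n"/></r></p></e>
  </section>
</dictionary>"""


def count_bidix_entries(dix_xml: str) -> int:
    """Count the <e> entries that are direct children of <section> elements.

    <e> elements inside <pardefs> belong to paradigms, not stems, so they are
    skipped. Direction-restricted entries (r="LR" / r="RL") are counted like
    any other entry.
    """
    root = ET.fromstring(dix_xml)
    return sum(len(section.findall("e")) for section in root.iter("section"))


if __name__ == "__main__":
    print(count_bidix_entries(SAMPLE))  # 2
```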
<br />
{| style="text-align: center;" class="wikitable dixtable"<br />
|- style="background: #ececec"<br />
! !! hin !! ben !! urd !! san !! nep !! mar !! pan !! sin !! asm !! eng !! epo !! fas !! sat<br />
|-<br />
| '''hin''' || - || ''[[Apertium-bn-hi|bn-hi]]''<br>{{#lst:Apertium-bn-hi/stats|bn-hi_stems}} || '''[[Apertium-urd-hin|urd-hin]]'''<br>'''{{#lst:Apertium-urd-hin/stats|urd-hin_stems}}''' || || || ''[[Apertium-mar-hin|mar-hin]]''<br>{{#lst:Apertium-mar-hin/stats|mar-hin_stems}} || ''[[Apertium-pa-hi|pa-hi]]''<br>{{#lst:Apertium-pa-hi/stats|pa-hi_stems}} || || || || || ||<br />
|-<br />
| '''ben''' || ''[[Apertium-bn-hi|bn-hi]]''<br>{{#lst:Apertium-bn-hi/stats|bn-hi_stems}} || - || || || || || || || || || || ||<br />
|-<br />
| '''urd''' || '''[[Apertium-urd-hin|urd-hin]]'''<br>'''{{#lst:Apertium-urd-hin/stats|urd-hin_stems}}''' || || - || || || || ''[[Apertium-ur-pa|ur-pa]]''<br>{{#lst:Apertium-ur-pa/stats|ur-pa_stems}} || || || || || ||<br />
|-<br />
| '''san''' || || || || - || || || || || || || || ||<br />
|-<br />
| '''nep''' || || || || || - || || || || || || || ||<br />
|-<br />
| '''mar''' || ''[[Apertium-mar-hin|mar-hin]]''<br>{{#lst:Apertium-mar-hin/stats|mar-hin_stems}} || || || || || - || || || || || || ||<br />
|-<br />
| '''pan''' || ''[[Apertium-pa-hi|pa-hi]]''<br>{{#lst:Apertium-pa-hi/stats|pa-hi_stems}} || || ''[[Apertium-ur-pa|ur-pa]]''<br>{{#lst:Apertium-ur-pa/stats|ur-pa_stems}} || || || || - || || || || || ||<br />
|-<br />
| '''sin''' || || || || || || || || - || || || || ||<br />
|-<br />
| '''asm''' || ''[[Apertium-as-hi|as-hi]]''<br>{{#lst:Apertium-as-hi/stats|as-hi_stems}} || ''[[Apertium-asm-ben|asm-ben]]''<br>{{#lst:Apertium-asm-ben/stats|asm-ben_stems}} || || || || || || || || || || ||<br />
|-<br />
| '''eng''' || [[Apertium-eng-hin|eng-hin]]<br>{{#lst:Apertium-eng-hin/stats|eng-hin_stems}} || ''[[Apertium-bn-en|bn-en]]''<br>{{#lst:Apertium-bn-en/stats|bn-en_stems}} || || || ''[[Apertium-ne-en|ne-en]]''<br>{{#lst:Apertium-ne-en/stats|ne-en_stems}} || ''[[Apertium-mar-eng|mar-eng]]''<br>{{#lst:Apertium-mar-eng/stats|mar-eng_stems}} || || ''[[Apertium-si-en|si-en]]''<br>{{#lst:Apertium-si-en/stats|si-en_stems}} || || || || || ''[[Apertium-eng-sat|eng-sat]]''<br />
|-<br />
| '''epo''' || || || || || ''[[Apertium-eo-ne|eo-ne]]''<br>{{#lst:Apertium-eo-ne/stats|eo-ne_stems}} || || || || || || || ||<br />
|-<br />
| '''fas''' || || || ''[[Apertium-ur-fa|ur-fa]]''<br>{{#lst:Apertium-ur-fa/stats|ur-fa_stems}} || || || || || || || || || || <br />
|-<br />
| '''sat''' || || || || || || || || || || ''[[Apertium-eng-sat|sat-eng]]'' || || ||<br />
|}<br />
<br />
==Samples==<br />
Article 1 of the Universal Declaration of Human Rights:<br />
<br />
''All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience and should act towards one another in a spirit of brotherhood.''<br />
<br />
{|class=wikitable<br />
! Language !! Text<br />
|-<br />
|| Sanskrit || सर्वे मानवा: स्वतन्त्रा: समुत्पन्ना: वर्तन्ते अपि च, गौरवदृशा अधिकारदृशा च समाना: एव वर्तन्ते। एते सर्वे चेतना-तर्क-शक्तिभ्यां सुसम्पन्ना: सन्ति। अपि च, सर्वेऽपि बन्धुत्व-भावनया परस्परं व्यवहरन्तु।<br />
|-<br />
|| Hindi || सभी मनुष्यों को गौरव और अधिकारों के मामले में जन्मजात स्वतन्त्रता और समानता प्राप्त है । उन्हें बुद्घि और अन्तरात्मा की देन प्राप्त है और परस्पर उन्हें भाईचारे के भाव से बर्ताव करना चाहिए ।<br />
|-<br />
|| Bengali || সমস্ত মানুষ স্বাধীনভাবে সমান মর্যাদা এবং অধিকার নিয়ে জন্মগ্রহণ করে। তাঁদের বিবেক এবং বুদ্ধি আছে ; সুতরাং সকলেরই একে অপরের প্রতি ভ্রাতৃত্বসুলভ মনোভাব নিয়ে আচরণ করা উচিত্।<br />
|-<br />
|| Urdu || تمام انسان آزادی اور حقوق و عزت کے اعتبار سے برابر پیدا ہویٔے ہیں۔ انہیں ضمیر اور عقل و دیعت ہویٔی ہے۔ اس لیٔے انہیں ایک دوسرے کے ساتھ بھایٔی چارے کا سلوک کرنا چاہیء۔<br />
|-<br />
|| Nepali || सबै व्यक्ति हरू जन्मजात स्वतन्त्र हुन ती सबैको समान अधिकार र महत्व छ। निजहरूमा विचार शक्ति र सद्धिचार भएकोले निजहरूले आपसमा भातृत्वको भावना बाट व्यवहार गर्नु पर्छ।<br />
|-<br />
|| Marathi || सर्व मानवी व्यक्ति जन्मतःच स्वतंत्र आहेत व त्यांना समान प्रतिष्ठा व समान अधिकार आहेत. त्यांना विचारशक्ति व सदसविद्वेकबुद्धि लाभलेली आहे. व त्यांनी एकमेकांशी बंधुत्याच्या भावनेने आचरण करावे.<br />
|-<br />
|| Panjabi, Eastern || ਸਾਰਾ ਮਨੁੱਖੀ ਪਰਿਵਾਰ ਆਪਣੀ ਮਹਿਮਾ, ਸ਼ਾਨ ਅਤੇ ਹੱਕਾਂ ਦੇ ਪੱਖੋਂ ਜਨਮ ਤੋਂ ਹੀ ਆਜ਼ਾਦ ਹੈ ਅਤੇ ਸੁਤੇ ਸਿੱਧ ਸਾਰੇ ਲੋਕ ਬਰਾਬਰ ਹਨ । ਉਨ੍ਹਾਂ ਸਭਨਾ ਨੂੰ ਤਰਕ ਅਤੇ ਜ਼ਮੀਰ ਦੀ ਸੌਗਾਤ ਮਿਲੀ ਹੋਈ ਹੈ ਅਤੇ ਉਨ੍ਹਾਂ ਨੂੰ ਭਰਾਤਰੀਭਾਵ ਦੀ ਭਾਵਨਾ ਰਖਦਿਆਂ ਆਪਸ ਵਿਚ ਵਿਚਰਣਾ ਚਾਹੀਦਾ ਹੈ ।<br />
|}<br />
<br />
== See also ==<br />
*[[Languages of India]]<br />
<br />
== Helpful Pages ==<br />
*[[List_of_symbols|The General Tagset List]]<br />
<!--<br />
==Tagset==<br />
<br />
A rough guide to the tagsets used in the various Indic language transducers, aiming to tag phenomena that are essentially the same in the same way across languages (see also [[List_of_symbols|The General Tagset List]]).<br />
<br />
{|class="wikitable"<br />
! Phenomenon !! Morphology !! Description !! Tag(s) !! Language(s) !! Notes <br />
|-<br />
|colspan=6 align="center"|'''Part of speech'''<br />
|-<br />
| Noun || || || {{tag|n}} || ||<br />
|}<br />
--><br />
<br />
[[Category:Indic languages]]</div>Rocky 734https://wiki.apertium.org/w/index.php?title=Indic_languages&diff=73844Indic languages2022-01-10T18:14:01Z<p>Rocky 734: /* Existing language pairs */ added data</p>
<hr />
<div>{{TOCD}}<br />
The '''Indic languages''' include [[Hindi]], [[Urdu]], [[Bengali]], [[Sanskrit]], and a number of other languages. They form the dominant language family of the Indian subcontinent and are spoken by more than 900 million people.<br />
<br />
The plan is to build an independent finite-state transducer for each language, and then to write a bilingual dictionary and transfer rules for each pair. The current status of this work is listed below.<br />
<br />
==Status==<br />
The ultimate goal is to have multi-purpose transducers for a variety of Indic languages. These can then be paired for X→Y translation by adding a [[Constraint Grammar|CG]] for language X and a bilingual dictionary and transfer rules for the pair X→Y. The development progress of each language's transducer and of each dictionary pair is listed below.<br />
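The division of labour described above can be sketched as a list of pipeline stages. The minimal Python example below assumes the usual Apertium file-naming conventions: monolingual files such as `apertium-hin.hin.dix` and `apertium-hin.hin.rlx`, and pair files such as `apertium-urd-hin.urd-hin.dix` and `apertium-urd-hin.hin-urd.t1x`. The function name is hypothetical, and real pairs often add further stages, such as extra transfer levels.

```python
from typing import List, Tuple


def pair_components(src: str, trg: str, pair: Tuple[str, str]) -> List[Tuple[str, str]]:
    """Return (stage, file) tuples, in pipeline order, for translating src -> trg.

    `pair` is the fixed order of the pair repository, e.g. ("urd", "hin"),
    because one repository such as apertium-urd-hin serves both directions.
    """
    if {src, trg} != set(pair):
        raise ValueError("src and trg must be the two languages of the pair")
    repo = f"apertium-{pair[0]}-{pair[1]}"
    return [
        # Monolingual components, shared by every pair involving the language.
        ("morphological analysis", f"apertium-{src}.{src}.dix"),
        ("disambiguation (CG)", f"apertium-{src}.{src}.rlx"),
        # Pair-specific components: one bidix, direction-specific transfer rules.
        ("lexical transfer (bidix)", f"{repo}.{pair[0]}-{pair[1]}.dix"),
        ("structural transfer", f"{repo}.{src}-{trg}.t1x"),
        # The target-language dictionary, used in the generation direction.
        ("morphological generation", f"apertium-{trg}.{trg}.dix"),
    ]


if __name__ == "__main__":
    for stage, filename in pair_components("urd", "hin", ("urd", "hin")):
        print(f"{stage:28} {filename}")
```

Only the bidix and the transfer rules are specific to the pair; the analyser, the CG and the generator are reused by every pair that involves the same language.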
<br />
=== Transducers ===<br />
A transducer with about 80% coverage on a range of medium-to-large corpora can be called "working"; with over 90% coverage it can be considered "production".<br />
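One way to make these thresholds concrete is sketched below. It assumes the analyser's output is in Apertium's stream format as produced by `lt-proc`, where each token appears as `^surface/analysis$` and an unknown word is marked as `^surface/*surface$`. Coverage is taken here as the share of tokens with at least one analysis. The helper names are hypothetical, and the label returned below 80% is an assumption, since the text only defines "working" and "production".

```python
import re

# Each lexical unit in the stream format looks like ^surface/analysis...$
LEXICAL_UNIT = re.compile(r"\^(.*?)\$")


def coverage(stream: str) -> float:
    """Fraction of lexical units that received at least one analysis.

    Assumes unknown words are marked as ^surface/*surface$.
    """
    units = LEXICAL_UNIT.findall(stream)
    if not units:
        return 0.0
    known = sum(1 for unit in units if "/*" not in unit)
    return known / len(units)


def transducer_state(cov: float) -> str:
    """Map a coverage fraction to the labels used on this page."""
    if cov > 0.90:
        return "production"
    if cov >= 0.80:
        return "working"
    # Assumption: the page does not name a label for coverage below ~80%.
    return "development"


if __name__ == "__main__":
    sample = "^the/the<det><def><sp>$ ^cats/cat<n><pl>$ ^xyzzy/*xyzzy$"
    cov = coverage(sample)
    print(f"{cov:.1%}", transducer_state(cov))  # 66.7% development
```

Running it over the analyser's output for a large corpus, rather than a single sentence, gives a figure comparable to the coverage column.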
<br />
{| class="wikitable sortable"<br />
|-<br />
!rowspan=2| name<br />
!rowspan=2| Language<br />
!colspan=2 class="unsortable"| ISO 639<br />
!rowspan=2| formalism<br />
!rowspan=2| state<br />
!rowspan=2| stems<br />
!rowspan=2| paradigms<br />
!rowspan=2| coverage<br />
!rowspan=2| location<br />
!rowspan=2 class="unsortable"| primary authors<br />
|-class="sortbottom"<br />
! -1<br />
! -3<br />
|-<br />
|| <code>[[apertium-san]]</code><br />
|| [[Sanskrit]]<br />
|| <code>sa</code> <br />
|| <code>san</code> <br />
|| [[lttoolbox]] <br />
|| working <br />
|align="right"| {{#lst:Apertium-san/stats|stems}}<br />
|align="right"| {{#lst:Apertium-san/stats|paradigms}}<br />
|align="center"| -<br />
|| [[apertium-san]] ([[languages]])<br />
|| Amba Kulkarni<br />
|-<br />
<br />
|| <code>[[apertium-hin]]</code><br />
|| [[Hindi]] <br />
|| <code>hi</code> <br />
|| <code>hin</code> <br />
|| [[lttoolbox]] <br />
|| working <br />
|align="right"| {{#lst:apertium-hin/stats|stems}}<br />
|align="right"| {{#lst:apertium-hin/stats|paradigms}}<br />
|align="center"| ~{{#lst:apertium-hin/stats/average}}%<br />
|| [[apertium-hin]]&nbsp;([[languages]])<br />
|| [[User:Nikant|Nikant]], [[User:darthxaher|Abu Zaher Md. Faridee]], [[User:Francis Tyers|Fran]]<br />
|-<br />
|| <code>[[apertium-ben]]</code><br />
|| [[Bengali]] <br />
|| <code>bn</code> <br />
|| <code>ben</code> <br />
|| [[lttoolbox]] <br />
|| development <br />
|align="right"| {{#lst:apertium-ben/stats|stems}}<br />
|align="right"| {{#lst:apertium-ben/stats|paradigms}}<br />
|align="center"| ~{{#lst:apertium-ben/stats/average}}%<br />
|| [[apertium-ben]]&nbsp;([[languages]])<br />
|| [[User:darthxaher|Abu Zaher Md. Faridee]]<br />
|-<br />
|| <code>[[apertium-urd]]</code><br />
|| [[Urdu]] <br />
|| <code>ur</code> <br />
|| <code>urd</code> <br />
|| [[lttoolbox]]<br />
|| development <br />
|align="right"| {{#lst:apertium-urd/stats|stems}}<br />
|align="right"| {{#lst:apertium-urd/stats|paradigms}}<br />
|align="center"| ~{{#lst:apertium-urd/stats/average}}%<br />
|| [[apertium-urd]]&nbsp;([[languages]])<br />
|| Muhammad Humayoun<br />
|-<br />
|| <code>[[apertium-nep]]</code><br />
|| [[Nepali]]<br />
|| <code>ne</code><br />
|| <code>nep</code><br />
|| [[lttoolbox]]<br />
|| prototype<br />
|align="right"| {{#lst:apertium-nep/stats|stems}}<br />
|align="right"| {{#lst:apertium-nep/stats|paradigms}}<br />
|align="center"| -<br />
|| [[apertium-ne-en]]&nbsp;([[incubator]])<br />[[apertium-eo-ne]]&nbsp;([[incubator]])<br />
||<br />
|-<br />
|| <code>[[apertium-mar]]</code><br />
|| [[Marathi]]<br />
|| <code>mr</code><br />
|| <code>mar</code><br />
|| [[lttoolbox]]<br />
|| prototype<br />
|align="right"| {{#lst:apertium-mar/stats|stems}}<br />
|align="right"| {{#lst:apertium-mar/stats|paradigms}}<br />
|align="center"| -<br />
|| [[apertium-mar-eng]]&nbsp;([[incubator]])<br />[[apertium-mr-hi]]&nbsp;([[incubator]])<br />
||<br />
|-<br />
|| <code>[[apertium-sin]]</code><br />
|| [[Sinhala]]<br />
|| <code>si</code><br />
|| <code>sin</code><br />
|| [[lttoolbox]]<br />
|| {{#lst:apertium-sin/stats|state}}<br />
|align="right"| {{#lst:apertium-sin/stats|stems}}<br />
|align="right"| {{#lst:apertium-sin/stats|paradigms}}<br />
|align="center"| -<br />
|| {{#lst:apertium-sin/stats|location}}<br />
|| {{#lst:apertium-sin/stats|authors}}<br />
|-<br />
|| <code>[[apertium-pan]]</code><br />
|| [[Punjabi]]<br />
|| <code>pa</code><br />
|| <code>pan</code><br />
|| —<br />
|| possibly non-existent<br />
|align="right"| -<br />
|align="right"| -<br />
|align="center"| -<br />
|| [[apertium-pa-hi]]&nbsp;([[incubator]])<br />[[apertium-ur-pa]]&nbsp;([[incubator]])<br />
||<br />
|-<br />
|| <code>[[apertium-sat]]</code><br />
|| [[Santali]]<br />
|| <code>-</code><br />
|| <code>sat</code><br />
|| [[lttoolbox]]<br />
|| development<br />
|align="right"| -<br />
|align="right"| -<br />
|align="center"| -<br />
|| [[apertium-eng-sat]]<br />
|| [[User:Rocky 734|Prasanta Hembram]]<br />
|}<br />
<br />
=== Indic Language Classification ===<br />
* Dardic: [[Pashayi]], [[Khowar]], [[Kohistani]], [[Shina language]], [[Kashmiri]]<br />
* Austroasiatic: [[Santali]]<br />
* Northern Zone: <br />
**Central Pahari: [[Garhwali]], [[Kumauni]]<br />
**Eastern Pahari: [[Nepali]]<br />
* North-Western Zone: [[Panjabi]], [[Lahnda]], [[Sindhi]] <br />
**Dogri-Kangri: [[Dogri]], [[Kangri]], [[Mandeali]], etc.<br />
* Western Zone: [[Gujarati]], [[Bhil]], [[Khandeshi]], [[Domari-Romani]]<br />
** Rajasthani: [[Marwari]], [[Rajasthani]]<br />
* Central Zone: [[Hindi]], [[Urdu]]<br />
* [[Sanskrit]]<br />
* Southern Zone: [[Marathi]], [[Konkani]]<br />
** Insular Indic: [[Sinhala]], [[Maldivian]]<br />
* Eastern Zone: [[Bengali]], [[Oriya]], [[Tharu]]<br />
** Bihari: [[Bhojpuri]], [[Maithili]], etc.<br />
<br />
=== Existing language pairs ===<br />
Language pairs in ''italics'' are in the incubator, pairs in regular text are under development in the nursery, pairs in '''bold''' are stable, well-working pairs in trunk, and pairs in '''''bold italics''''' are in staging. Where available, each cell also shows the number of bidix stems as counted with [[dixcounter]].<br />
<br />
{| style="text-align: center;" class="wikitable dixtable"<br />
|- style="background: #ececec"<br />
! !! hin !! ben !! urd !! san !! nep !! mar !! pan !! sin !! asm !! eng !! epo !! fas !! sat<br />
|-<br />
| '''hin''' || - || ''[[Apertium-bn-hi|bn-hi]]''<br>{{#lst:Apertium-bn-hi/stats|bn-hi_stems}} || '''[[Apertium-urd-hin|urd-hin]]'''<br>'''{{#lst:Apertium-urd-hin/stats|urd-hin_stems}}''' || || || ''[[Apertium-mar-hin|mar-hin]]''<br>{{#lst:Apertium-mar-hin/stats|mar-hin_stems}} || ''[[Apertium-pa-hi|pa-hi]]''<br>{{#lst:Apertium-pa-hi/stats|pa-hi_stems}} || || || || || ||<br />
|-<br />
| '''ben''' || ''[[Apertium-bn-hi|bn-hi]]''<br>{{#lst:Apertium-bn-hi/stats|bn-hi_stems}} || - || || || || || || || || || || ||<br />
|-<br />
| '''urd''' || '''[[Apertium-urd-hin|urd-hin]]'''<br>'''{{#lst:Apertium-urd-hin/stats|urd-hin_stems}}''' || || - || || || || ''[[Apertium-ur-pa|ur-pa]]''<br>{{#lst:Apertium-ur-pa/stats|ur-pa_stems}} || || || || || ||<br />
|-<br />
| '''san''' || || || || - || || || || || || || || ||<br />
|-<br />
| '''nep''' || || || || || - || || || || || || || ||<br />
|-<br />
| '''mar''' || ''[[Apertium-mar-hin|mar-hin]]''<br>{{#lst:Apertium-mar-hin/stats|mar-hin_stems}} || || || || || - || || || || || || ||<br />
|-<br />
| '''pan''' || ''[[Apertium-pa-hi|pa-hi]]''<br>{{#lst:Apertium-pa-hi/stats|pa-hi_stems}} || || ''[[Apertium-ur-pa|ur-pa]]''<br>{{#lst:Apertium-ur-pa/stats|ur-pa_stems}} || || || || - || || || || || ||<br />
|-<br />
| '''sin''' || || || || || || || || - || || || || ||<br />
|-<br />
| '''asm''' || ''[[Apertium-as-hi|as-hi]]''<br>{{#lst:Apertium-as-hi/stats|as-hi_stems}} || ''[[Apertium-asm-ben|asm-ben]]''<br>{{#lst:Apertium-asm-ben/stats|asm-ben_stems}} || || || || || || || || || || ||<br />
|-<br />
| '''eng''' || [[Apertium-eng-hin|eng-hin]]<br>{{#lst:Apertium-eng-hin/stats|eng-hin_stems}} || ''[[Apertium-bn-en|bn-en]]''<br>{{#lst:Apertium-bn-en/stats|bn-en_stems}} || || || ''[[Apertium-ne-en|ne-en]]''<br>{{#lst:Apertium-ne-en/stats|ne-en_stems}} || ''[[Apertium-mar-eng|mar-eng]]''<br>{{#lst:Apertium-mar-eng/stats|mar-eng_stems}} || || ''[[Apertium-si-en|si-en]]''<br>{{#lst:Apertium-si-en/stats|si-en_stems}} || || || || || ''[[Apertium-eng-sat|eng-sat]]''<br />
|-<br />
| '''epo''' || || || || || ''[[Apertium-eo-ne|eo-ne]]''<br>{{#lst:Apertium-eo-ne/stats|eo-ne_stems}} || || || || || || || ||<br />
|-<br />
| '''fas''' || || || ''[[Apertium-ur-fa|ur-fa]]''<br>{{#lst:Apertium-ur-fa/stats|ur-fa_stems}} || || || || || || || || || || <br />
|-<br />
| '''sat''' || || || || || || || || || || ''[[Apertium-eng-sat|sat-eng]]'' || || ||<br />
|}<br />
<br />
==Samples==<br />
Article 1 of the Universal Declaration of Human Rights:<br />
<br />
''All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience and should act towards one another in a spirit of brotherhood.''<br />
<br />
{|class=wikitable<br />
! Language !! Text<br />
|-<br />
|| Sanskrit || सर्वे मानवा: स्वतन्त्रा: समुत्पन्ना: वर्तन्ते अपि च, गौरवदृशा अधिकारदृशा च समाना: एव वर्तन्ते। एते सर्वे चेतना-तर्क-शक्तिभ्यां सुसम्पन्ना: सन्ति। अपि च, सर्वेऽपि बन्धुत्व-भावनया परस्परं व्यवहरन्तु।<br />
|-<br />
|| Hindi || सभी मनुष्यों को गौरव और अधिकारों के मामले में जन्मजात स्वतन्त्रता और समानता प्राप्त है । उन्हें बुद्घि और अन्तरात्मा की देन प्राप्त है और परस्पर उन्हें भाईचारे के भाव से बर्ताव करना चाहिए ।<br />
|-<br />
|| Bengali || সমস্ত মানুষ স্বাধীনভাবে সমান মর্যাদা এবং অধিকার নিয়ে জন্মগ্রহণ করে। তাঁদের বিবেক এবং বুদ্ধি আছে ; সুতরাং সকলেরই একে অপরের প্রতি ভ্রাতৃত্বসুলভ মনোভাব নিয়ে আচরণ করা উচিত্।<br />
|-<br />
|| Urdu || تمام انسان آزادی اور حقوق و عزت کے اعتبار سے برابر پیدا ہویٔے ہیں۔ انہیں ضمیر اور عقل و دیعت ہویٔی ہے۔ اس لیٔے انہیں ایک دوسرے کے ساتھ بھایٔی چارے کا سلوک کرنا چاہیء۔<br />
|-<br />
|| Nepali || सबै व्यक्ति हरू जन्मजात स्वतन्त्र हुन ती सबैको समान अधिकार र महत्व छ। निजहरूमा विचार शक्ति र सद्धिचार भएकोले निजहरूले आपसमा भातृत्वको भावना बाट व्यवहार गर्नु पर्छ।<br />
|-<br />
|| Marathi || सर्व मानवी व्यक्ति जन्मतःच स्वतंत्र आहेत व त्यांना समान प्रतिष्ठा व समान अधिकार आहेत. त्यांना विचारशक्ति व सदसविद्वेकबुद्धि लाभलेली आहे. व त्यांनी एकमेकांशी बंधुत्याच्या भावनेने आचरण करावे.<br />
|-<br />
|| Panjabi, Eastern || ਸਾਰਾ ਮਨੁੱਖੀ ਪਰਿਵਾਰ ਆਪਣੀ ਮਹਿਮਾ, ਸ਼ਾਨ ਅਤੇ ਹੱਕਾਂ ਦੇ ਪੱਖੋਂ ਜਨਮ ਤੋਂ ਹੀ ਆਜ਼ਾਦ ਹੈ ਅਤੇ ਸੁਤੇ ਸਿੱਧ ਸਾਰੇ ਲੋਕ ਬਰਾਬਰ ਹਨ । ਉਨ੍ਹਾਂ ਸਭਨਾ ਨੂੰ ਤਰਕ ਅਤੇ ਜ਼ਮੀਰ ਦੀ ਸੌਗਾਤ ਮਿਲੀ ਹੋਈ ਹੈ ਅਤੇ ਉਨ੍ਹਾਂ ਨੂੰ ਭਰਾਤਰੀਭਾਵ ਦੀ ਭਾਵਨਾ ਰਖਦਿਆਂ ਆਪਸ ਵਿਚ ਵਿਚਰਣਾ ਚਾਹੀਦਾ ਹੈ ।<br />
|}<br />
<br />
==Tagset==<br />
<br />
A rough guide to the tagsets used in the various Indic language transducers, aiming to tag phenomena that are essentially the same in the same way across languages (see also [[List_of_symbols|the general tagset list]]).<br />
<br />
{|class="wikitable"<br />
! Phenomenon !! Morphology !! Description !! Tag(s) !! Language(s) !! Notes <br />
|-<br />
|colspan=6 align="center"|'''Part of speech'''<br />
|-<br />
| Noun || || || {{tag|n}} || ||<br />
|}<br />
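Tags such as {{tag|n}} appear in analyser output inside angle brackets. The minimal Python sketch below splits a lexical unit in Apertium's stream format, such as `^books/book<n><pl>$`, into its surface form and a list of (lemma, tags) pairs, which makes it easy to compare how different transducers tag the same phenomenon. It ignores escaped characters and multiword analyses joined with `+`; the function name is hypothetical.

```python
import re
from typing import List, Tuple

# A tag is anything between < and > that contains no angle brackets itself.
TAG = re.compile(r"<([^<>]+)>")


def parse_unit(unit: str) -> Tuple[str, List[Tuple[str, List[str]]]]:
    """Split ^surface/analysis1/analysis2$ into the surface form and (lemma, tags) pairs.

    Simplified: ignores escaping and multiword analyses joined with '+'.
    Unknown words (^x/*x$) come back as a lemma starting with '*' and no tags.
    """
    body = unit.strip()
    if not (body.startswith("^") and body.endswith("$")):
        raise ValueError("not a lexical unit: " + unit)
    surface, *analyses = body[1:-1].split("/")
    parsed = []
    for analysis in analyses:
        first_tag = analysis.find("<")
        lemma = analysis if first_tag == -1 else analysis[:first_tag]
        parsed.append((lemma, TAG.findall(analysis)))
    return surface, parsed


if __name__ == "__main__":
    print(parse_unit("^books/book<n><pl>/book<vblex><pres><p3><sg>$"))
```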
<br />
[[Category:Indic languages]]</div>Rocky 734https://wiki.apertium.org/w/index.php?title=Indic_languages&diff=73843Indic languages2022-01-10T16:09:21Z<p>Rocky 734: /* Existing language pairs */ added santali in table</p>
<hr />
<div>{{TOCD}}<br />
The '''Indic languages''' include [[Hindi]], [[Urdu]], [[Bengali]], [[Sanskrit]], and a number of other languages. They form the dominant language family of the Indian subcontinent and are spoken by more than 900 million people.<br />
<br />
The plan is to build an independent finite-state transducer for each language, and then to write a bilingual dictionary and transfer rules for each pair. The current status of this work is listed below.<br />
<br />
==Status==<br />
The ultimate goal is to have multi-purpose transducers for a variety of Indic languages. These can then be paired for X→Y translation by adding a [[Constraint Grammar|CG]] for language X and a bilingual dictionary and transfer rules for the pair X→Y. The development progress of each language's transducer and of each dictionary pair is listed below.<br />
<br />
=== Transducers ===<br />
A transducer with about 80% coverage on a range of medium-to-large corpora can be called "working"; with over 90% coverage it can be considered "production".<br />
<br />
{| class="wikitable sortable"<br />
|-<br />
!rowspan=2| name<br />
!rowspan=2| Language<br />
!colspan=2 class="unsortable"| ISO 639<br />
!rowspan=2| formalism<br />
!rowspan=2| state<br />
!rowspan=2| stems<br />
!rowspan=2| paradigms<br />
!rowspan=2| coverage<br />
!rowspan=2| location<br />
!rowspan=2 class="unsortable"| primary authors<br />
|-class="sortbottom"<br />
! -1<br />
! -3<br />
|-<br />
|| <code>[[apertium-san]]</code><br />
|| [[Sanskrit]]<br />
|| <code>sa</code> <br />
|| <code>san</code> <br />
|| [[lttoolbox]] <br />
|| working <br />
|align="right"| {{#lst:Apertium-san/stats|stems}}<br />
|align="right"| {{#lst:Apertium-san/stats|paradigms}}<br />
|align="center"| -<br />
|| [[apertium-san]] ([[languages]])<br />
|| Amba Kulkarni<br />
|-<br />
<br />
|| <code>[[apertium-hin]]</code><br />
|| [[Hindi]] <br />
|| <code>hi</code> <br />
|| <code>hin</code> <br />
|| [[lttoolbox]] <br />
|| working <br />
|align="right"| {{#lst:apertium-hin/stats|stems}}<br />
|align="right"| {{#lst:apertium-hin/stats|paradigms}}<br />
|align="center"| ~{{#lst:apertium-hin/stats/average}}%<br />
|| [[apertium-hin]]&nbsp;([[languages]])<br />
|| [[User:Nikant|Nikant]], [[User:darthxaher|Abu Zaher Md. Faridee]], [[User:Francis Tyers|Fran]]<br />
|-<br />
|| <code>[[apertium-ben]]</code><br />
|| [[Bengali]] <br />
|| <code>bn</code> <br />
|| <code>ben</code> <br />
|| [[lttoolbox]] <br />
|| development <br />
|align="right"| {{#lst:apertium-ben/stats|stems}}<br />
|align="right"| {{#lst:apertium-ben/stats|paradigms}}<br />
|align="center"| ~{{#lst:apertium-ben/stats/average}}%<br />
|| [[apertium-ben]]&nbsp;([[languages]])<br />
|| [[User:darthxaher|Abu Zaher Md. Faridee]]<br />
|-<br />
|| <code>[[apertium-urd]]</code><br />
|| [[Urdu]] <br />
|| <code>ur</code> <br />
|| <code>urd</code> <br />
|| [[lttoolbox]]<br />
|| development <br />
|align="right"| {{#lst:apertium-urd/stats|stems}}<br />
|align="right"| {{#lst:apertium-urd/stats|paradigms}}<br />
|align="center"| ~{{#lst:apertium-urd/stats/average}}%<br />
|| [[apertium-urd]]&nbsp;([[languages]])<br />
|| Muhammad Humayoun<br />
|-<br />
|| <code>[[apertium-nep]]</code><br />
|| [[Nepali]]<br />
|| <code>ne</code><br />
|| <code>nep</code><br />
|| [[lttoolbox]]<br />
|| prototype<br />
|align="right"| {{#lst:apertium-nep/stats|stems}}<br />
|align="right"| {{#lst:apertium-nep/stats|paradigms}}<br />
|align="center"| -<br />
|| [[apertium-ne-en]]&nbsp;([[incubator]])<br />[[apertium-eo-ne]]&nbsp;([[incubator]])<br />
||<br />
|-<br />
|| <code>[[apertium-mar]]</code><br />
|| [[Marathi]]<br />
|| <code>mr</code><br />
|| <code>mar</code><br />
|| [[lttoolbox]]<br />
|| prototype<br />
|align="right"| {{#lst:apertium-mar/stats|stems}}<br />
|align="right"| {{#lst:apertium-mar/stats|paradigms}}<br />
|align="center"| -<br />
|| [[apertium-mar-eng]]&nbsp;([[incubator]])<br />[[apertium-mr-hi]]&nbsp;([[incubator]])<br />
||<br />
|-<br />
|| <code>[[apertium-sin]]</code><br />
|| [[Sinhala]]<br />
|| <code>si</code><br />
|| <code>sin</code><br />
|| [[lttoolbox]]<br />
|| {{#lst:apertium-sin/stats|state}}<br />
|align="right"| {{#lst:apertium-sin/stats|stems}}<br />
|align="right"| {{#lst:apertium-sin/stats|paradigms}}<br />
|align="center"| -<br />
|| {{#lst:apertium-sin/stats|location}}<br />
|| {{#lst:apertium-sin/stats|authors}}<br />
|-<br />
|| <code>[[apertium-pan]]</code><br />
|| [[Punjabi]]<br />
|| <code>pa</code><br />
|| <code>pan</code><br />
|| —<br />
|| possibly non-existent<br />
|align="right"| -<br />
|align="right"| -<br />
|align="center"| -<br />
|| [[apertium-pa-hi]]&nbsp;([[incubator]])<br />[[apertium-ur-pa]]&nbsp;([[incubator]])<br />
||<br />
|-<br />
|| <code>[[apertium-sat]]</code><br />
|| [[Santali]]<br />
|| <code>-</code><br />
|| <code>sat</code><br />
|| [[lttoolbox]]<br />
|| development<br />
|align="right"| -<br />
|align="right"| -<br />
|align="center"| -<br />
|| [[apertium-eng-sat]]<br />
|| [[User:Rocky 734|Prasanta Hembram]]<br />
|}<br />
<br />
=== Indic Language Classification ===<br />
* Dardic: [[Pashayi]], [[Khowar]], [[Kohistani]], [[Shina language]], [[Kashmiri]]<br />
* Austroasiatic: [[Santali]]<br />
* Northern Zone: <br />
**Central Pahari: [[Garhwali]], [[Kumauni]]<br />
**Eastern Pahari: [[Nepali]]<br />
* North-Western Zone: [[Panjabi]], [[Lahnda]], [[Sindhi]] <br />
**Dogri-Kangri: [[Dogri]], [[Kangri]], [[Mandeali]], etc.<br />
* Western Zone: [[Gujarati]], [[Bhil]], [[Khandeshi]], [[Domari-Romani]]<br />
** Rajasthani: [[Marwari]], [[Rajasthani]]<br />
* Central Zone: [[Hindi]], [[Urdu]]<br />
* [[Sanskrit]]<br />
* Southern Zone: [[Marathi]], [[Konkani]]<br />
** Insular Indic: [[Sinhala]], [[Maldivian]]<br />
* Eastern Zone: [[Bengali]], [[Oriya]], [[Tharu]]<br />
** Bihari: [[Bhojpuri]], [[Maithili]], etc.<br />
<br />
=== Existing language pairs ===<br />
Language pairs in ''italics'' are in the incubator, pairs in regular text are under development in the nursery, pairs in '''bold''' are stable, well-working pairs in trunk, and pairs in '''''bold italics''''' are in staging. Where available, each cell also shows the number of bidix stems as counted with [[dixcounter]].<br />
<br />
{| style="text-align: center;" class="wikitable dixtable"<br />
|- style="background: #ececec"<br />
! !! hin !! ben !! urd !! san !! nep !! mar !! pan !! sin !! asm !! eng !! epo !! fas !! sat<br />
|-<br />
| '''hin''' || - || ''[[Apertium-bn-hi|bn-hi]]''<br>{{#lst:Apertium-bn-hi/stats|bn-hi_stems}} || '''[[Apertium-urd-hin|urd-hin]]'''<br>'''{{#lst:Apertium-urd-hin/stats|urd-hin_stems}}''' || || || ''[[Apertium-mar-hin|mar-hin]]''<br>{{#lst:Apertium-mar-hin/stats|mar-hin_stems}} || ''[[Apertium-pa-hi|pa-hi]]''<br>{{#lst:Apertium-pa-hi/stats|pa-hi_stems}} || || || || || ||<br />
|-<br />
| '''ben''' || ''[[Apertium-bn-hi|bn-hi]]''<br>{{#lst:Apertium-bn-hi/stats|bn-hi_stems}} || - || || || || || || || || || || ||<br />
|-<br />
| '''urd''' || '''[[Apertium-urd-hin|urd-hin]]'''<br>'''{{#lst:Apertium-urd-hin/stats|urd-hin_stems}}''' || || - || || || || ''[[Apertium-ur-pa|ur-pa]]''<br>{{#lst:Apertium-ur-pa/stats|ur-pa_stems}} || || || || || ||<br />
|-<br />
| '''san''' || || || || - || || || || || || || || ||<br />
|-<br />
| '''nep''' || || || || || - || || || || || || || ||<br />
|-<br />
| '''mar''' || ''[[Apertium-mar-hin|mar-hin]]''<br>{{#lst:Apertium-mar-hin/stats|mar-hin_stems}} || || || || || - || || || || || || ||<br />
|-<br />
| '''pan''' || ''[[Apertium-pa-hi|pa-hi]]''<br>{{#lst:Apertium-pa-hi/stats|pa-hi_stems}} || || ''[[Apertium-ur-pa|ur-pa]]''<br>{{#lst:Apertium-ur-pa/stats|ur-pa_stems}} || || || || - || || || || || ||<br />
|-<br />
| '''sin''' || || || || || || || || - || || || || ||<br />
|-<br />
| '''asm''' || ''[[Apertium-as-hi|as-hi]]''<br>{{#lst:Apertium-as-hi/stats|as-hi_stems}} || ''[[Apertium-asm-ben|asm-ben]]''<br>{{#lst:Apertium-asm-ben/stats|asm-ben_stems}} || || || || || || || || || || ||<br />
|-<br />
| '''eng''' || [[Apertium-eng-hin|eng-hin]]<br>{{#lst:Apertium-eng-hin/stats|eng-hin_stems}} || ''[[Apertium-bn-en|bn-en]]''<br>{{#lst:Apertium-bn-en/stats|bn-en_stems}} || || || ''[[Apertium-ne-en|ne-en]]''<br>{{#lst:Apertium-ne-en/stats|ne-en_stems}} || ''[[Apertium-mar-eng|mar-eng]]''<br>{{#lst:Apertium-mar-eng/stats|mar-eng_stems}} || || ''[[Apertium-si-en|si-en]]''<br>{{#lst:Apertium-si-en/stats|si-en_stems}} || || || || ||<br />
|-<br />
| '''epo''' || || || || || ''[[Apertium-eo-ne|eo-ne]]''<br>{{#lst:Apertium-eo-ne/stats|eo-ne_stems}} || || || || || || || ||<br />
|-<br />
| '''fas''' || || || ''[[Apertium-ur-fa|ur-fa]]''<br>{{#lst:Apertium-ur-fa/stats|ur-fa_stems}} || || || || || || || || || || <br />
|-<br />
| '''sat''' || || || || || || || || || || || || ||<br />
|}<br />
<br />
==Samples==<br />
Article 1 of the Universal Declaration of Human Rights:<br />
<br />
''All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience and should act towards one another in a spirit of brotherhood.''<br />
<br />
{|class=wikitable<br />
! Language !! Text<br />
|-<br />
|| Sanskrit || सर्वे मानवा: स्वतन्त्रा: समुत्पन्ना: वर्तन्ते अपि च, गौरवदृशा अधिकारदृशा च समाना: एव वर्तन्ते। एते सर्वे चेतना-तर्क-शक्तिभ्यां सुसम्पन्ना: सन्ति। अपि च, सर्वेऽपि बन्धुत्व-भावनया परस्परं व्यवहरन्तु।<br />
|-<br />
|| Hindi || सभी मनुष्यों को गौरव और अधिकारों के मामले में जन्मजात स्वतन्त्रता और समानता प्राप्त है । उन्हें बुद्घि और अन्तरात्मा की देन प्राप्त है और परस्पर उन्हें भाईचारे के भाव से बर्ताव करना चाहिए ।<br />
|-<br />
|| Bengali || সমস্ত মানুষ স্বাধীনভাবে সমান মর্যাদা এবং অধিকার নিয়ে জন্মগ্রহণ করে। তাঁদের বিবেক এবং বুদ্ধি আছে ; সুতরাং সকলেরই একে অপরের প্রতি ভ্রাতৃত্বসুলভ মনোভাব নিয়ে আচরণ করা উচিত্।<br />
|-<br />
|| Urdu || تمام انسان آزادی اور حقوق و عزت کے اعتبار سے برابر پیدا ہویٔے ہیں۔ انہیں ضمیر اور عقل و دیعت ہویٔی ہے۔ اس لیٔے انہیں ایک دوسرے کے ساتھ بھایٔی چارے کا سلوک کرنا چاہیء۔<br />
|-<br />
|| Nepali || सबै व्यक्ति हरू जन्मजात स्वतन्त्र हुन ती सबैको समान अधिकार र महत्व छ। निजहरूमा विचार शक्ति र सद्धिचार भएकोले निजहरूले आपसमा भातृत्वको भावना बाट व्यवहार गर्नु पर्छ।<br />
|-<br />
|| Marathi || सर्व मानवी व्यक्ति जन्मतःच स्वतंत्र आहेत व त्यांना समान प्रतिष्ठा व समान अधिकार आहेत. त्यांना विचारशक्ति व सदसविद्वेकबुद्धि लाभलेली आहे. व त्यांनी एकमेकांशी बंधुत्याच्या भावनेने आचरण करावे.<br />
|-<br />
|| Panjabi, Eastern || ਸਾਰਾ ਮਨੁੱਖੀ ਪਰਿਵਾਰ ਆਪਣੀ ਮਹਿਮਾ, ਸ਼ਾਨ ਅਤੇ ਹੱਕਾਂ ਦੇ ਪੱਖੋਂ ਜਨਮ ਤੋਂ ਹੀ ਆਜ਼ਾਦ ਹੈ ਅਤੇ ਸੁਤੇ ਸਿੱਧ ਸਾਰੇ ਲੋਕ ਬਰਾਬਰ ਹਨ । ਉਨ੍ਹਾਂ ਸਭਨਾ ਨੂੰ ਤਰਕ ਅਤੇ ਜ਼ਮੀਰ ਦੀ ਸੌਗਾਤ ਮਿਲੀ ਹੋਈ ਹੈ ਅਤੇ ਉਨ੍ਹਾਂ ਨੂੰ ਭਰਾਤਰੀਭਾਵ ਦੀ ਭਾਵਨਾ ਰਖਦਿਆਂ ਆਪਸ ਵਿਚ ਵਿਚਰਣਾ ਚਾਹੀਦਾ ਹੈ ।<br />
|}<br />
<br />
==Tagset==<br />
<br />
A rough guide to the tagsets used in the various Indic language transducers, aiming to tag phenomena that are essentially the same in the same way across languages (see also [[List_of_symbols|the general tagset list]]).<br />
<br />
{|class="wikitable"<br />
! Phenomenon !! Morphology !! Description !! Tag(s) !! Language(s) !! Notes <br />
|-<br />
|colspan=6 align="center"|'''Part of speech'''<br />
|-<br />
| Noun || || || {{tag|n}} || ||<br />
|}<br />
<br />
[[Category:Indic languages]]</div>Rocky 734https://wiki.apertium.org/w/index.php?title=Indic_languages&diff=73842Indic languages2022-01-10T15:59:43Z<p>Rocky 734: Added Santali language</p>
<hr />
<div>{{TOCD}}<br />
The '''Indic languages''' include [[Hindi]], [[Urdu]], [[Bengali]], [[Sanskrit]], and a number of other languages. They form the dominant language family of the Indian subcontinent and are spoken by more than 900 million people.<br />
<br />
The plan is to build an independent finite-state transducer for each language, and then to write a bilingual dictionary and transfer rules for each pair. The current status of this work is listed below.<br />
<br />
==Status==<br />
The ultimate goal is to have multi-purpose transducers for a variety of Indic languages. These can then be paired for X→Y translation by adding a [[Constraint Grammar|CG]] for language X and a bilingual dictionary and transfer rules for the pair X→Y. The development progress of each language's transducer and of each dictionary pair is listed below.<br />
<br />
=== Transducers ===<br />
A transducer with about 80% coverage on a range of medium-to-large corpora can be called "working"; with over 90% coverage it can be considered "production".<br />
<br />
{| class="wikitable sortable"<br />
|-<br />
!rowspan=2| name<br />
!rowspan=2| Language<br />
!colspan=2 class="unsortable"| ISO 639<br />
!rowspan=2| formalism<br />
!rowspan=2| state<br />
!rowspan=2| stems<br />
!rowspan=2| paradigms<br />
!rowspan=2| coverage<br />
!rowspan=2| location<br />
!rowspan=2 class="unsortable"| primary authors<br />
|-class="sortbottom"<br />
! -1<br />
! -3<br />
|-<br />
|| <code>[[apertium-san]]</code><br />
|| [[Sanskrit]]<br />
|| <code>sa</code> <br />
|| <code>san</code> <br />
|| [[lttoolbox]] <br />
|| working <br />
|align="right"| {{#lst:Apertium-san/stats|stems}}<br />
|align="right"| {{#lst:Apertium-san/stats|paradigms}}<br />
|align="center"| -<br />
|| [[apertium-san]] ([[languages]])<br />
|| Amba Kulkarni<br />
|-<br />
<br />
|| <code>[[apertium-hin]]</code><br />
|| [[Hindi]] <br />
|| <code>hi</code> <br />
|| <code>hin</code> <br />
|| [[lttoolbox]] <br />
|| working <br />
|align="right"| {{#lst:apertium-hin/stats|stems}}<br />
|align="right"| {{#lst:apertium-hin/stats|paradigms}}<br />
|align="center"| ~{{#lst:apertium-hin/stats/average}}%<br />
|| [[apertium-hin]]&nbsp;([[languages]])<br />
|| [[User:Nikant|Nikant]], [[User:darthxaher|Abu Zaher Md. Faridee]], [[User:Francis Tyers|Fran]]<br />
|-<br />
|| <code>[[apertium-ben]]</code><br />
|| [[Bengali]] <br />
|| <code>bn</code> <br />
|| <code>ben</code> <br />
|| [[lttoolbox]] <br />
|| development <br />
|align="right"| {{#lst:apertium-ben/stats|stems}}<br />
|align="right"| {{#lst:apertium-ben/stats|paradigms}}<br />
|align="center"| ~{{#lst:apertium-ben/stats/average}}%<br />
|| [[apertium-ben]]&nbsp;([[languages]])<br />
|| [[User:darthxaher|Abu Zaher Md. Faridee]]<br />
|-<br />
|| <code>[[apertium-urd]]</code><br />
|| [[Urdu]] <br />
|| <code>ur</code> <br />
|| <code>urd</code> <br />
|| [[lttoolbox]]<br />
|| development <br />
|align="right"| {{#lst:apertium-urd/stats|stems}}<br />
|align="right"| {{#lst:apertium-urd/stats|paradigms}}<br />
|align="center"| ~{{#lst:apertium-urd/stats/average}}%<br />
|| [[apertium-urd]]&nbsp;([[languages]])<br />
|| Muhammad Humayoun<br />
|-<br />
|| <code>[[apertium-nep]]</code><br />
|| [[Nepali]]<br />
|| <code>ne</code><br />
|| <code>nep</code><br />
|| [[lttoolbox]]<br />
|| prototype<br />
|align="right"| {{#lst:apertium-nep/stats|stems}}<br />
|align="right"| {{#lst:apertium-nep/stats|paradigms}}<br />
|align="center"| -<br />
|| [[apertium-ne-en]]&nbsp;([[incubator]])<br />[[apertium-eo-ne]]&nbsp;([[incubator]])<br />
||<br />
|-<br />
|| <code>[[apertium-mar]]</code><br />
|| [[Marathi]]<br />
|| <code>mr</code><br />
|| <code>mar</code><br />
|| [[lttoolbox]]<br />
|| prototype<br />
|align="right"| {{#lst:apertium-mar/stats|stems}}<br />
|align="right"| {{#lst:apertium-mar/stats|paradigms}}<br />
|align="center"| -<br />
|| [[apertium-mar-eng]]&nbsp;([[incubator]])<br />[[apertium-mr-hi]]&nbsp;([[incubator]])<br />
||<br />
|-<br />
|| <code>[[apertium-sin]]</code><br />
|| [[Sinhala]]<br />
|| <code>si</code><br />
|| <code>sin</code><br />
|| [[lttoolbox]]<br />
|| {{#lst:apertium-sin/stats|state}}<br />
|align="right"| {{#lst:apertium-sin/stats|stems}}<br />
|align="right"| {{#lst:apertium-sin/stats|paradigms}}<br />
|align="center"| -<br />
|| {{#lst:apertium-sin/stats|location}}<br />
|| {{#lst:apertium-sin/stats|authors}}<br />
|-<br />
|| <code>[[apertium-pan]]</code><br />
|| [[Punjabi]]<br />
|| <code>pa</code><br />
|| <code>pan</code><br />
|| —<br />
|| possibly non-existent<br />
|align="right"| -<br />
|align="right"| -<br />
|align="center"| -<br />
|| [[apertium-pa-hi]]&nbsp;([[incubator]])<br />[[apertium-ur-pa]]&nbsp;([[incubator]])<br />
||<br />
|-<br />
|| <code>[[apertium-sat]]</code><br />
|| [[Santali]]<br />
|| <code>-</code><br />
|| <code>sat</code><br />
|| [[lttoolbox]]<br />
|| development<br />
|align="right"| -<br />
|align="right"| -<br />
|align="center"| -<br />
|| [[apertium-eng-sat]]<br />
|| [[User:Rocky 734|Prasanta Hembram]]<br />
|}<br />
<br />
=== Indic Language Classification ===<br />
* Dardic: [[Pashayi]], [[Khowar]], [[Kohistani]], [[Shina language]], [[Kashmiri]]<br />
* Austroasiatic (Munda, not Indic): [[Santali]]<br />
* Northern Zone: <br />
**Central Pahari: [[Garhwali]], [[Kumauni]]<br />
**Eastern Pahari: [[Nepali]]<br />
* North-Western Zone: [[Panjabi]], [[Lahnda]], [[Sindhi]] <br />
**Dogri-Kangri: [[Dogri]], [[Kangri]], [[Mandeali]], etc.<br />
* Western Zone: [[Gujarati]], [[Bhil]], [[Khandeshi]], [[Domari-Romani]]<br />
** Rajasthani: [[Marwari]], [[Rajasthani]]<br />
* [[Hindi]]<br />
* [[Sanskrit]]<br />
* Southern Zone: [[Marathi]], [[Konkani]], [[Urdu]]<br />
** Insular Indic: [[Sinhala]], [[Maldivian]]<br />
* Eastern Zone: [[Bengali]], [[Oriya]], [[Tharu]]<br />
** Bihari: [[Bhojpuri]], [[Maithili]], etc.<br />
<br />
=== Existing language pairs ===<br />
Pairs in ''italics'' are in the incubator, pairs in regular text are in development in the nursery, pairs in '''''bold italics''''' are in staging, and pairs in '''bold''' are stable, well-working pairs in trunk. The bidix stem counts shown below were computed with [[dixcounter]].<br />
<br />
{| style="text-align: center;" class="wikitable dixtable"<br />
|- style="background: #ececec"<br />
! !! hin !! ben !! urd !! san !! nep !! mar !! pan !! sin<br />
|-<br />
| '''hin''' || - || ''[[Apertium-bn-hi|bn-hi]]''<br>{{#lst:Apertium-bn-hi/stats|bn-hi_stems}} || '''[[Apertium-urd-hin|urd-hin]]'''<br>'''{{#lst:Apertium-urd-hin/stats|urd-hin_stems}}''' || || || ''[[Apertium-mar-hin|mar-hin]]''<br>{{#lst:Apertium-mar-hin/stats|mar-hin_stems}} || ''[[Apertium-pa-hi|pa-hi]]''<br>{{#lst:Apertium-pa-hi/stats|pa-hi_stems}} || <br />
|-<br />
| '''ben''' || ''[[Apertium-bn-hi|bn-hi]]''<br>{{#lst:Apertium-bn-hi/stats|bn-hi_stems}} || - || || || || || || <br />
|-<br />
| '''urd''' || '''[[Apertium-urd-hin|urd-hin]]'''<br>'''{{#lst:Apertium-urd-hin/stats|urd-hin_stems}}''' || || - || || || || ''[[Apertium-ur-pa|ur-pa]]''<br>{{#lst:Apertium-ur-pa/stats|ur-pa_stems}} || <br />
|-<br />
| '''san''' || || || || - || || || || <br />
|-<br />
| '''nep''' || || || || || - || || || <br />
|-<br />
| '''mar''' || ''[[Apertium-mar-hin|mar-hin]]''<br>{{#lst:Apertium-mar-hin/stats|mar-hin_stems}} || || || || || - || || <br />
|-<br />
| '''pan''' || ''[[Apertium-pa-hi|pa-hi]]''<br>{{#lst:Apertium-pa-hi/stats|pa-hi_stems}} || || ''[[Apertium-ur-pa|ur-pa]]''<br>{{#lst:Apertium-ur-pa/stats|ur-pa_stems}} || || || || - || <br />
|-<br />
| '''sin''' || || || || || || || || -<br />
|-<br />
| || || || || || || || || <br />
|-<br />
| '''asm''' || ''[[Apertium-as-hi|as-hi]]''<br>{{#lst:Apertium-as-hi/stats|as-hi_stems}} || ''[[Apertium-asm-ben|asm-ben]]''<br>{{#lst:Apertium-asm-ben/stats|asm-ben_stems}} || || || || || || <br />
|-<br />
| '''eng''' || [[Apertium-eng-hin|eng-hin]]<br>{{#lst:Apertium-eng-hin/stats|eng-hin_stems}} || ''[[Apertium-bn-en|bn-en]]''<br>{{#lst:Apertium-bn-en/stats|bn-en_stems}} || || || ''[[Apertium-ne-en|ne-en]]''<br>{{#lst:Apertium-ne-en/stats|ne-en_stems}} || ''[[Apertium-mar-eng|mar-eng]]''<br>{{#lst:Apertium-mar-eng/stats|mar-eng_stems}} || || ''[[Apertium-si-en|si-en]]''<br>{{#lst:Apertium-si-en/stats|si-en_stems}}<br />
|-<br />
| '''epo''' || || || || || ''[[Apertium-eo-ne|eo-ne]]''<br>{{#lst:Apertium-eo-ne/stats|eo-ne_stems}} || || || <br />
|-<br />
| '''fas''' || || || ''[[Apertium-ur-fa|ur-fa]]''<br>{{#lst:Apertium-ur-fa/stats|ur-fa_stems}} || || || || || <br />
|}<br />
<br />
==Samples==<br />
Article 1 of the Universal Declaration of Human Rights:<br />
<br />
''All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience and should act towards one another in a spirit of brotherhood.''<br />
<br />
{|class=wikitable<br />
! Language !! Text<br />
|-<br />
|| Sanskrit || सर्वे मानवा: स्वतन्त्रा: समुत्पन्ना: वर्तन्ते अपि च, गौरवदृशा अधिकारदृशा च समाना: एव वर्तन्ते। एते सर्वे चेतना-तर्क-शक्तिभ्यां सुसम्पन्ना: सन्ति। अपि च, सर्वेऽपि बन्धुत्व-भावनया परस्परं व्यवहरन्तु।<br />
|-<br />
|| Hindi || सभी मनुष्यों को गौरव और अधिकारों के मामले में जन्मजात स्वतन्त्रता और समानता प्राप्त है । उन्हें बुद्घि और अन्तरात्मा की देन प्राप्त है और परस्पर उन्हें भाईचारे के भाव से बर्ताव करना चाहिए ।<br />
|-<br />
|| Bengali || সমস্ত মানুষ স্বাধীনভাবে সমান মর্যাদা এবং অধিকার নিয়ে জন্মগ্রহণ করে। তাঁদের বিবেক এবং বুদ্ধি আছে ; সুতরাং সকলেরই একে অপরের প্রতি ভ্রাতৃত্বসুলভ মনোভাব নিয়ে আচরণ করা উচিত্।<br />
|-<br />
|| Urdu || تمام انسان آزادی اور حقوق و عزت کے اعتبار سے برابر پیدا ہویٔے ہیں۔ انہیں ضمیر اور عقل و دیعت ہویٔی ہے۔ اس لیٔے انہیں ایک دوسرے کے ساتھ بھایٔی چارے کا سلوک کرنا چاہیء۔<br />
|-<br />
|| Nepali || सबै व्यक्ति हरू जन्मजात स्वतन्त्र हुन ती सबैको समान अधिकार र महत्व छ। निजहरूमा विचार शक्ति र सद्धिचार भएकोले निजहरूले आपसमा भातृत्वको भावना बाट व्यवहार गर्नु पर्छ।<br />
|-<br />
|| Marathi || सर्व मानवी व्यक्ति जन्मतःच स्वतंत्र आहेत व त्यांना समान प्रतिष्ठा व समान अधिकार आहेत. त्यांना विचारशक्ति व सदसविद्वेकबुद्धि लाभलेली आहे. व त्यांनी एकमेकांशी बंधुत्याच्या भावनेने आचरण करावे.<br />
|-<br />
|| Panjabi, Eastern || ਸਾਰਾ ਮਨੁੱਖੀ ਪਰਿਵਾਰ ਆਪਣੀ ਮਹਿਮਾ, ਸ਼ਾਨ ਅਤੇ ਹੱਕਾਂ ਦੇ ਪੱਖੋਂ ਜਨਮ ਤੋਂ ਹੀ ਆਜ਼ਾਦ ਹੈ ਅਤੇ ਸੁਤੇ ਸਿੱਧ ਸਾਰੇ ਲੋਕ ਬਰਾਬਰ ਹਨ । ਉਨ੍ਹਾਂ ਸਭਨਾ ਨੂੰ ਤਰਕ ਅਤੇ ਜ਼ਮੀਰ ਦੀ ਸੌਗਾਤ ਮਿਲੀ ਹੋਈ ਹੈ ਅਤੇ ਉਨ੍ਹਾਂ ਨੂੰ ਭਰਾਤਰੀਭਾਵ ਦੀ ਭਾਵਨਾ ਰਖਦਿਆਂ ਆਪਸ ਵਿਚ ਵਿਚਰਣਾ ਚਾਹੀਦਾ ਹੈ ।<br />
|}<br />
<br />
==Tagset==<br />
<br />
A rough guide to the tagsets of the various Indic-language transducers, with the aim of tagging the same phenomena with the same tags across languages (see also [[List_of_symbols|the general tagset list]]).<br />
<br />
{|class="wikitable"<br />
! Phenomenon !! Morphology !! Description !! Tag(s) !! Language(s) !! Notes <br />
|-<br />
|colspan=6 align="center"|'''Part of speech'''<br />
|-<br />
| Noun || || || {{tag|n}} || ||<br />
|}<br />
<br />
[[Category:Indic languages]]</div>Rocky 734https://wiki.apertium.org/w/index.php?title=Apertium-eng-sat&diff=73841Apertium-eng-sat2022-01-10T15:50:39Z<p>Rocky 734: Added Category Category:Santali</p>
<hr />
<div>[[Category:Santali]]<br />
<br />
This is an Apertium language pair for translating from English to Santali. With it you can do the following:<br />
* Translation between English and Santali<br />
* Morphological analysis of English and Santali<br />
* Part-of-speech tagger for English and Santali<br />
<br />
== See also ==<br />
*[[Santali]]<br />
*[[Indic languages]]</div>Rocky 734https://wiki.apertium.org/w/index.php?title=Apertium-sat&diff=73840Apertium-sat2022-01-10T15:49:59Z<p>Rocky 734: Added one category Category:Santali</p>
<hr />
<div>[[Category:Santali]]<br />
<br />
This is an Apertium monolingual package for Santali. With it you can do the following:<br />
<br />
* Morphological analysis of Santali<br />
* Morphological generation of Santali<br />
* Part-of-speech tagging of Santali<br />
<br />
== See also ==<br />
*[[Santali]]<br />
*[[Indic languages]]</div>Rocky 734https://wiki.apertium.org/w/index.php?title=Santali&diff=73839Santali2022-01-10T15:45:21Z<p>Rocky 734: /* Resources */ added Apertium resources</p>
<hr />
<div>[[Category:Santali]]<br />
[[Category:Languages]]<br />
<br />
'''Santali''' or '''Santhali''' is the most widely spoken language of the Munda subfamily of the Austroasiatic languages. It is related to Ho and Mundari and is spoken mainly in the Indian states of Assam, Bihar, Jharkhand, Mizoram, Odisha, Tripura and West Bengal. It is a recognized regional language of India under the Eighth Schedule of the Indian Constitution. It is spoken by around 7.6&nbsp;million people in India, Bangladesh, Bhutan and Nepal, making it the third most-spoken Austroasiatic language after Vietnamese and Khmer. Santali was mainly an oral language until '''Pandit Raghunath Murmu''' developed the Ol Chiki script in 1925. Ol Chiki is an alphabetic script that shares none of the syllabic properties of the other Indic scripts, and it is now widely used to write Santali in India. Before Ol Chiki was invented, Santali was written in the Latin (Roman), Devanagari and Kalinga scripts.<br />
<br />
== Resources ==<br />
=== Apertium Resources ===<br />
* Monolingual dictionary: [[Apertium-sat]]<br />
* English-Santali bilingual dictionary: [[Apertium-eng-sat]]<br />
=== Literature ===<br />
* Neukom, L. (2000). Argument marking in Santali. Mon-Khmer Studies, 95-114.<br />
* Marandi, C., & Maringanti, H. B. [https://d1wqtxts1xzle7.cloudfront.net/54279769/barii_pda_university_conference_papaer-with-cover-page-v2.pdf?Expires=1641826728&Signature=DS5JzbqCFQRw9hEPjL~xTaFLfKURQiaiZzOoMI5sxfBX5lTPNKC5o88s8a4bwvcAesVu9zu1qBPcpaQ1UPdND3hOuh9g41xGu2VqnFviBY0t29CHC9ZTh05D9qmbRh7uuNArYatcI-xvG0cF3Mr2VUk4lR7DAGwAVrNnrQrjrxHnEC-0stsujQQXvDB-3rR9mYTb6fFm6OIA1T-CN4hzbTdiSY-86UuPHXyRgmipxticu0Ss2D~SO7~Fz5kXcHZXApBehMbT-nw0QScCigLtFoHb~BoaxhUiEPZ38-DNfETNjQEQ3pgoaRtXsZybnpcn1wCk72Ofpw75jwBUlf03jw__&Key-Pair-Id=APKAJLOHF5GGSLRBV4ZA#page=73 Santali Morphological Analysis]. Prof.(Dr.) HIMA BINDU MARINGANTI, 52. pg. no: 74<br />
* Akhtar, M. A. K., Kumar, M., & Sahoo, G. (2017, September). [https://ieeexplore.ieee.org/abstract/document/8125962 Automata for santali language processing]. In 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI) (pp. 939-943). IEEE.<br />
* Sahoo, S. K., Mishra, B. K., Parida, S., Dash, S. R., Besra, J. N., & Tello, E. V. [https://publications.idiap.ch/attachments/papers/2021/Sahoo_OITSINTERNATIONALCONFERENCEONINFORMATIONTECHNOLOGY(OCIT)_2021.pdf Automatic Dialect Detection for Low Resource Santali Language].<br />
* Dash, S., Sahoo, S., Mishra, B. K., Parida, S., Besra, J. N., & Ojha, A. K. (2021). Universal Dependency Treebank for Santali Language. SPAST Abstracts, 1(01). Retrieved from https://spast.org/techrep/article/view/2111<br />
* Basu, J., Hrangkhawl, T. R., Basu, T. K., & Majumder, S. (2021, June). Identification of two tribal languages of India: An experimental study. In Artificial Intelligence and Speech Technology: Proceedings of the 2nd International Conference on Artificial Intelligence and Speech Technology (AIST2020), 19-20 November, 2020, Delhi, India (p. 221). CRC Press.<br />
<br />
=== Books ===<br />
* Puxley, E. L. (1868). [https://books.google.co.in/books?hl=en&lr=&id=kKcIAAAAQAAJ&oi=fnd&pg=PA1&dq=santali+machine&ots=yrcw6-Z_nv&sig=QKA3nGTTM-8BAIjetrbO9kdw0O8&redir_esc=y#v=onepage&q&f=false A Vocabulary of the Santali Language]. WM Watts.<br />
<br />
=== Dictionary ===<br />
* Campbell, A. (1899). A Santali-English Dictionary. Santal Mission Press.<br />
* Campbell, A., & Macphail, R. M. (1933). A Santali-English and English-Santali Dictionary... Edited by R. M. Macphail. Santal Mission Press.<br />
* Bodding, P. O. (1932–1936). A Santali Dictionary (5 volumes).<br />
* Bhaduri, M. B. (1994). A Mundari-English Dictionary. Asian Educational Services.<br />
* Hansdah, R. C., & Murmu, N. C. (2003). A Concise Santali-English Dictionary.<br />
<br />
=== Dictionaries of closely related languages ===<br />
* Ho: Deeney, J. J. (1978). Ho-English Dictionary. Xavier Ho Publications.<br />
<br />
== See also ==<br />
*[[Indic languages]]</div>Rocky 734https://wiki.apertium.org/w/index.php?title=Apertium-sat&diff=73838Apertium-sat2022-01-10T15:43:50Z<p>Rocky 734: Created a small guide in native script</p>
<hr />
<div>This is an Apertium monolingual package for Santali. With it you can do the following:<br />
<br />
* Morphological analysis of Santali<br />
* Morphological generation of Santali<br />
* Part-of-speech tagging of Santali<br />
<br />
== See also ==<br />
*[[Santali]]<br />
*[[Indic languages]]</div>Rocky 734https://wiki.apertium.org/w/index.php?title=Apertium-eng-sat&diff=73837Apertium-eng-sat2022-01-10T15:36:49Z<p>Rocky 734: </p>
<hr />
<div>This is an Apertium language pair for translating from English to Santali. With it you can do the following:<br />
* Translation between English and Santali<br />
* Morphological analysis of English and Santali<br />
* Part-of-speech tagger for English and Santali<br />
<br />
== See also ==<br />
*[[Santali]]<br />
*[[Indic languages]]</div>Rocky 734https://wiki.apertium.org/w/index.php?title=Apertium-eng-sat&diff=73836Apertium-eng-sat2022-01-10T15:33:36Z<p>Rocky 734: Created a New page for guidelines in native language</p>
<hr />
<div>This is an Apertium language pair for translating from English to Santali. With it you can do the following:<br />
* Translation between English and Santali<br />
* Morphological analysis of English and Santali<br />
* Part-of-speech tagger for English and Santali</div>Rocky 734https://wiki.apertium.org/w/index.php?title=Santali&diff=73835Santali2022-01-10T15:27:55Z<p>Rocky 734: Added one page link</p>
<hr />
<div>[[Category:Santali]]<br />
[[Category:Languages]]<br />
<br />
'''Santali''' or '''Santhali''' is the most widely spoken language of the Munda subfamily of the Austroasiatic languages. It is related to Ho and Mundari and is spoken mainly in the Indian states of Assam, Bihar, Jharkhand, Mizoram, Odisha, Tripura and West Bengal. It is a recognized regional language of India under the Eighth Schedule of the Indian Constitution. It is spoken by around 7.6&nbsp;million people in India, Bangladesh, Bhutan and Nepal, making it the third most-spoken Austroasiatic language after Vietnamese and Khmer. Santali was mainly an oral language until '''Pandit Raghunath Murmu''' developed the Ol Chiki script in 1925. Ol Chiki is an alphabetic script that shares none of the syllabic properties of the other Indic scripts, and it is now widely used to write Santali in India. Before Ol Chiki was invented, Santali was written in the Latin (Roman), Devanagari and Kalinga scripts.<br />
<br />
== Resources ==<br />
=== Literature ===<br />
* Neukom, L. (2000). Argument marking in Santali. Mon-Khmer Studies, 95-114.<br />
* Marandi, C., & Maringanti, H. B. [https://d1wqtxts1xzle7.cloudfront.net/54279769/barii_pda_university_conference_papaer-with-cover-page-v2.pdf?Expires=1641826728&Signature=DS5JzbqCFQRw9hEPjL~xTaFLfKURQiaiZzOoMI5sxfBX5lTPNKC5o88s8a4bwvcAesVu9zu1qBPcpaQ1UPdND3hOuh9g41xGu2VqnFviBY0t29CHC9ZTh05D9qmbRh7uuNArYatcI-xvG0cF3Mr2VUk4lR7DAGwAVrNnrQrjrxHnEC-0stsujQQXvDB-3rR9mYTb6fFm6OIA1T-CN4hzbTdiSY-86UuPHXyRgmipxticu0Ss2D~SO7~Fz5kXcHZXApBehMbT-nw0QScCigLtFoHb~BoaxhUiEPZ38-DNfETNjQEQ3pgoaRtXsZybnpcn1wCk72Ofpw75jwBUlf03jw__&Key-Pair-Id=APKAJLOHF5GGSLRBV4ZA#page=73 Santali Morphological Analysis]. Prof.(Dr.) HIMA BINDU MARINGANTI, 52. pg. no: 74<br />
* Akhtar, M. A. K., Kumar, M., & Sahoo, G. (2017, September). [https://ieeexplore.ieee.org/abstract/document/8125962 Automata for santali language processing]. In 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI) (pp. 939-943). IEEE.<br />
* Sahoo, S. K., Mishra, B. K., Parida, S., Dash, S. R., Besra, J. N., & Tello, E. V. [https://publications.idiap.ch/attachments/papers/2021/Sahoo_OITSINTERNATIONALCONFERENCEONINFORMATIONTECHNOLOGY(OCIT)_2021.pdf Automatic Dialect Detection for Low Resource Santali Language].<br />
* Dash, S., Sahoo, S., Mishra, B. K., Parida, S., Besra, J. N., & Ojha, A. K. (2021). Universal Dependency Treebank for Santali Language. SPAST Abstracts, 1(01). Retrieved from https://spast.org/techrep/article/view/2111<br />
* Basu, J., Hrangkhawl, T. R., Basu, T. K., & Majumder, S. (2021, June). Identification of two tribal languages of India: An experimental study. In Artificial Intelligence and Speech Technology: Proceedings of the 2nd International Conference on Artificial Intelligence and Speech Technology (AIST2020), 19-20 November, 2020, Delhi, India (p. 221). CRC Press.<br />
<br />
=== Books ===<br />
* Puxley, E. L. (1868). [https://books.google.co.in/books?hl=en&lr=&id=kKcIAAAAQAAJ&oi=fnd&pg=PA1&dq=santali+machine&ots=yrcw6-Z_nv&sig=QKA3nGTTM-8BAIjetrbO9kdw0O8&redir_esc=y#v=onepage&q&f=false A Vocabulary of the Santali Language]. WM Watts.<br />
<br />
=== Dictionary ===<br />
* Campbell, A. (1899). A Santali-English Dictionary. Santal Mission Press.<br />
* Campbell, A., & Macphail, R. M. (1933). A Santali-English and English-Santali Dictionary... Edited by R. M. Macphail. Santal Mission Press.<br />
* Bodding, P. O. (1932–1936). A Santali Dictionary (5 volumes).<br />
* Bhaduri, M. B. (1994). A Mundari-English Dictionary. Asian Educational Services.<br />
* Hansdah, R. C., & Murmu, N. C. (2003). A Concise Santali-English Dictionary.<br />
<br />
=== Dictionaries of closely related languages ===<br />
* Ho: Deeney, J. J. (1978). Ho-English Dictionary. Xavier Ho Publications.<br />
<br />
== See also ==<br />
*[[Indic languages]]</div>Rocky 734https://wiki.apertium.org/w/index.php?title=Calculating_coverage&diff=73834Calculating coverage2022-01-10T15:18:35Z<p>Rocky 734: /* See also= */ fixed a wiki markup</p>
<hr />
<div><br />
[[Calculer la couverture|En français]]<br />
<br />
==Simple bidix-trimmed coverage testing==<br />
<br />
First install apertium-cleanstream:<br />
<br />
svn checkout https://svn.code.sf.net/p/apertium/svn/trunk/apertium-tools/apertium-cleanstream<br />
cd apertium-cleanstream<br />
make<br />
sudo cp apertium-cleanstream /usr/local/bin<br />
<br />
'''Note - After Apertium's migration to GitHub, this tool is read-only on the SourceForge repository and does not exist on GitHub. If you are interested in migrating this tool to GitHub, see [[Migrating tools to GitHub]].'''<br />
<br />
Then save this as coverage.sh:<br />
<br />
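#!/bin/bash<br />
# Usage: cat corpus.txt | bash coverage.sh MODE<br />
mode=$1<br />
outfile="/tmp/$mode.clean"<br />
# Analyse stdin with the given mode; cleanstream -n puts one token per line<br />
apertium -d . "$mode" | apertium-cleanstream -n > "$outfile"<br />
total=$(grep -c '^\^' "$outfile")<br />
# '*' marks words unknown to the analyser, '@' marks words missing from the bidix<br />
unknown=$(grep -c '/\*' "$outfile")<br />
bidix_unknown=$(grep -c '/@' "$outfile")<br />
if [ "$total" -eq 0 ]; then<br />
echo "No tokens in $outfile; is the mode name correct?" >&2<br />
exit 1<br />
fi<br />
known_percent=$(calc -p "round( 100*($total-$unknown-$bidix_unknown)/$total, 3)")<br />
echo "$known_percent % known tokens ($unknown unknown, $bidix_unknown bidix-unknown of total $total tokens)"<br />
echo "Top unknown words:"<br />
grep '/[*@]' "$outfile" | sort | uniq -c | sort -nr | head<br />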
<br />
Then run it like this:<br />
<br />
cat asm.corpus | bash coverage.sh asm-eng-biltrans<br />
<br />
(The bidix-unknown count should always be 0 if your pair uses [[lt-trim|automatic analyser trimming]].)<br />
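coverage.sh uses `calc` (apcalc) for a single division. If `calc` is not installed, the same arithmetic can be done in awk, which is usually available. The sketch below shows one way to do it. The function name `pct_known` is hypothetical, and awk's `%.3f` formatting stands in for calc's `round(x, 3)`; the two may differ in how they round exact halves.<br />

```bash
#!/bin/bash
# Hypothetical replacement for the calc call in coverage.sh:
# percentage of tokens that are neither analyser-unknown (*) nor bidix-unknown (@),
# printed with three decimals.
pct_known() {
    local total=$1 unknown=$2 bidix_unknown=$3
    awk -v t="$total" -v u="$unknown" -v b="$bidix_unknown" 'BEGIN {
        if (t == 0) { print "0.000"; exit }   # avoid division by zero on empty output
        printf "%.3f\n", 100 * (t - u - b) / t
    }'
}

pct_known 2000 150 10   # 92.000
```

In coverage.sh, the `calc` line would then become `known_percent=$(pct_known "$total" "$unknown" "$bidix_unknown")`.<br />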
<br />
==TODO: paradigm-coverage (less naïve)==<br />
On an analysed corpus, we can sum frequencies into bins for each lemma+mainpos, so if the analysed corpus contains<br />
<br />
<pre><br />
musa/mus<n><f><sg><def>/muse<vblex><past><br />
mus/mus<n><f><sg><ind>/mus<n><f><pl><ind>/muse<vblex><imp><br />
musene/mus<n><f><pl><def><br />
</pre><br />
then output has<br />
<pre><br />
3 mus<n><f><br />
2 muse<vblex><br />
</pre><br />
and we can find paradigms that are likely to mess up disambiguation, or where we need to ensure that the bidix contains the highest-frequency paradigm (since the bidix is typically smaller than the monodix).<br />
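The sketch below shows this binning in awk. It assumes one analysed token per line in the surface/reading/reading format shown above, without the ^ and $ stream delimiters. The text does not say exactly which tags make up the "main PoS". As an assumption, the sketch keeps the first tag plus a directly following gender tag (m, f, nt, mf, ut), which reproduces the example output. The function name `paradigm_bins` is hypothetical.<br />

```bash
#!/bin/bash
# Sketch: for each lemma+main-PoS bin, count how many tokens have at least
# one reading in that bin.
# Input: one token per line, e.g. musa/mus<n><f><sg><def>/muse<vblex><past>
# Assumption: bin = lemma + first tag (+ second tag if it is a gender tag).
paradigm_bins() {
    awk -F'/' '
    {
        split("", seen)                  # count each bin at most once per token
        for (i = 2; i <= NF; i++) {      # field 1 is the surface form
            r = $i
            lt = index(r, "<")
            if (lt == 0) continue        # no tags (e.g. *unknown): skip
            lemma = substr(r, 1, lt - 1)
            n = split(substr(r, lt + 1), tags, "><")
            sub(/>$/, "", tags[n])       # strip the final ">"
            key = lemma "<" tags[1] ">"
            if (n >= 2 && tags[2] ~ /^(m|f|nt|mf|ut)$/)
                key = key "<" tags[2] ">"
            if (!(key in seen)) { seen[key] = 1; count[key]++ }
        }
    }
    END { for (k in count) print count[k], k }
    ' | sort -k1,1nr -k2
}

printf '%s\n' \
    'musa/mus<n><f><sg><def>/muse<vblex><past>' \
    'mus/mus<n><f><sg><ind>/mus<n><f><pl><ind>/muse<vblex><imp>' \
    'musene/mus<n><f><pl><def>' | paradigm_bins
# 3 mus<n><f>
# 2 muse<vblex>
```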
<br />
We could also weight these numbers by the number of unique forms in the pardef. If the verb pardef has 6 unique forms and the noun pardef only 3, the above output becomes even more skewed:<br />
<pre><br />
1.00 mus<n><f><br />
0.33 muse<vblex><br />
</pre><br />
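The weighting can be sketched as follows, assuming it means dividing each bin count by the number of unique forms in the bin's paradigm. That gives 3/3 = 1.00 for mus<n><f> and 2/6 ≈ 0.33 for muse<vblex>. The form counts must come from elsewhere, for example by expanding the monodix. Here they are read from a hypothetical two-column file, and the function name `weight_by_forms` is likewise hypothetical.<br />

```bash
#!/bin/bash
# Sketch: weight bin counts by the number of unique forms in each paradigm.
# stdin: "count lemma<pos>..." lines (the "3 mus<n><f>" style output above)
# $1:    hypothetical file of "lemma<pos>... number_of_unique_forms" lines
# Bins with no (or zero) form count in the file are skipped.
weight_by_forms() {
    awk 'NR == FNR { forms[$1] = $2; next }   # first input: the form-count file
         ($2 in forms) && forms[$2] > 0 {
             printf "%.2f %s\n", $1 / forms[$2], $2
         }' "$1" - | sort -k1,1nr
}

forms_file=$(mktemp)
printf 'mus<n><f> 3\nmuse<vblex> 6\n' > "$forms_file"
printf '3 mus<n><f>\n2 muse<vblex>\n' | weight_by_forms "$forms_file"
# 1.00 mus<n><f>
# 0.33 muse<vblex>
rm -f "$forms_file"
```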
<br />
==Faster coverage testing with frequency lists==<br />
<br />
If words appear several times in your corpus, there is no need to analyse them several times. Instead, we can first make a frequency list, analyse each distinct word once, and add up the frequencies. This script does some very naive tokenisation and creates a frequency list:<br />
<br />
make-freqlist.sh:<br />
<pre><br />
#!/bin/bash<br />
<br />
if [[ -t 0 ]]; then<br />
echo "Expecting a corpus on stdin"<br />
exit 2<br />
fi<br />
<br />
# Split on whitespace and punctuation, drop empty lines, then count and sort by frequency<br />
tr '[:space:][:punct:]' '\n' | grep . | sort | uniq -c | sort -nr<br />
</pre><br />
And this script runs your analyser, summing up the frequencies:<br />
<br />
freqlist-coverage.sh:<br />
<pre><br />
#!/bin/bash<br />
<br />
set -e -u<br />
<br />
if [[ $# -eq 0 || -t 0 ]]; then<br />
echo "Expecting apertium arguments and a 'sort|uniq -c|sort -nr' style frequency list on stdin"<br />
echo "For example:"<br />
echo "\$ < spa.freqlist $0 -d . spa-morph"<br />
exit 2<br />
fi<br />
<br />
sed 's%^ *%<apertium-notrans>%;s% %</apertium-notrans>%;s%$% .%' |<br />
apertium -f html-noent "$@" |<br />
awk -F'</?apertium-notrans>| *\\^\\./\\.<sent><clb>\\$' '<br />
/[/][*@]/ {<br />
unknown+=$2<br />
if(!printed) print "Top unknown tokens:"<br />
if(++printed<10) print $2,$3<br />
next<br />
}<br />
{<br />
known+=$2<br />
}<br />
END {<br />
total=known+unknown<br />
known_pct=100*known/total<br />
unk_pct=100*unknown/total<br />
print known_pct" % known of total "total" tokens"<br />
}'<br />
</pre> <br />
<br />
Usage:<br />
<br />
$ chmod +x make-freqlist.sh freqlist-coverage.sh<br />
$ bzcat ~/corpora/nno.txt.bz2 |./make-freqlist.sh > nno.freqlist<br />
$ < nno.freqlist ./freqlist-coverage.sh -d ~/apertium-svn/languages/apertium-nno/ nno-morph<br />
<br />
==coverage.py==<br />
<br />
https://svn.code.sf.net/p/apertium/svn/trunk/apertium-tools/coverage.py is a coverage script that wraps curl and bzcat. <br />
<br />
<br />
'''Note - After Apertium's migration to GitHub, this tool is read-only on the SourceForge repository and does not exist on GitHub. If you are interested in migrating this tool to GitHub, see [[Migrating tools to GitHub]].'''<br />
<br />
== See also ==<br />
<br />
* [[Wikipedia dumps]]<br />
* [[Cleanstream]]<br />
<br />
[[Category:Documentation]]<br />
[[Category:Documentation in English]]</div>Rocky 734https://wiki.apertium.org/w/index.php?title=Category:Santali&diff=73833Category:Santali2022-01-10T15:11:04Z<p>Rocky 734: Added Santali language to there respective categories</p>
<hr />
<div>[[Category:Languages|Santali]]<br />
[[Category:Indic languages]]</div>Rocky 734https://wiki.apertium.org/w/index.php?title=Santali&diff=73832Santali2022-01-10T14:56:44Z<p>Rocky 734: Added some resources</p>
<hr />
<div>[[Category:Santali]]<br />
[[Category:Languages]]<br />
<br />
'''Santali''' or '''Santhali''' is the most widely spoken language of the Munda subfamily of the Austroasiatic languages. It is related to Ho and Mundari and is spoken mainly in the Indian states of Assam, Bihar, Jharkhand, Mizoram, Odisha, Tripura and West Bengal. It is a recognized regional language of India under the Eighth Schedule of the Indian Constitution. It is spoken by around 7.6&nbsp;million people in India, Bangladesh, Bhutan and Nepal, making it the third most-spoken Austroasiatic language after Vietnamese and Khmer. Santali was mainly an oral language until '''Pandit Raghunath Murmu''' developed the Ol Chiki script in 1925. Ol Chiki is an alphabetic script that shares none of the syllabic properties of the other Indic scripts, and it is now widely used to write Santali in India. Before Ol Chiki was invented, Santali was written in the Latin (Roman), Devanagari and Kalinga scripts.<br />
<br />
== Resources ==<br />
=== Literature ===<br />
* Neukom, L. (2000). Argument marking in Santali. Mon-Khmer Studies, 95-114.<br />
* Marandi, C., & Maringanti, H. B. [https://d1wqtxts1xzle7.cloudfront.net/54279769/barii_pda_university_conference_papaer-with-cover-page-v2.pdf?Expires=1641826728&Signature=DS5JzbqCFQRw9hEPjL~xTaFLfKURQiaiZzOoMI5sxfBX5lTPNKC5o88s8a4bwvcAesVu9zu1qBPcpaQ1UPdND3hOuh9g41xGu2VqnFviBY0t29CHC9ZTh05D9qmbRh7uuNArYatcI-xvG0cF3Mr2VUk4lR7DAGwAVrNnrQrjrxHnEC-0stsujQQXvDB-3rR9mYTb6fFm6OIA1T-CN4hzbTdiSY-86UuPHXyRgmipxticu0Ss2D~SO7~Fz5kXcHZXApBehMbT-nw0QScCigLtFoHb~BoaxhUiEPZ38-DNfETNjQEQ3pgoaRtXsZybnpcn1wCk72Ofpw75jwBUlf03jw__&Key-Pair-Id=APKAJLOHF5GGSLRBV4ZA#page=73 Santali Morphological Analysis]. Prof.(Dr.) HIMA BINDU MARINGANTI, 52. pg. no: 74<br />
* Akhtar, M. A. K., Kumar, M., & Sahoo, G. (2017, September). [https://ieeexplore.ieee.org/abstract/document/8125962 Automata for santali language processing]. In 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI) (pp. 939-943). IEEE.<br />
* Sahoo, S. K., Mishra, B. K., Parida, S., Dash, S. R., Besra, J. N., & Tello, E. V. [https://publications.idiap.ch/attachments/papers/2021/Sahoo_OITSINTERNATIONALCONFERENCEONINFORMATIONTECHNOLOGY(OCIT)_2021.pdf Automatic Dialect Detection for Low Resource Santali Language].<br />
* Dash, S., Sahoo, S., Mishra, B. K., Parida, S., Besra, J. N., & Ojha, A. K. (2021). Universal Dependency Treebank for Santali Language. SPAST Abstracts, 1(01). Retrieved from https://spast.org/techrep/article/view/2111<br />
* Basu, J., Hrangkhawl, T. R., Basu, T. K., & Majumder, S. (2021, June). Identification of two tribal languages of India: An experimental study. In Artificial Intelligence and Speech Technology: Proceedings of the 2nd International Conference on Artificial Intelligence and Speech Technology (AIST2020), 19-20 November, 2020, Delhi, India (p. 221). CRC Press.<br />
<br />
=== Books ===<br />
* Puxley, E. L. (1868). [https://books.google.co.in/books?hl=en&lr=&id=kKcIAAAAQAAJ&oi=fnd&pg=PA1&dq=santali+machine&ots=yrcw6-Z_nv&sig=QKA3nGTTM-8BAIjetrbO9kdw0O8&redir_esc=y#v=onepage&q&f=false A Vocabulary of the Santali Language]. WM Watts.<br />
<br />
=== Dictionary ===<br />
* Campbell, A. (1899). A Santali-English Dictionary. Santal Mission Press.<br />
* Campbell, A., & Macphail, R. M. (1933). A Santali-English and English-Santali Dictionary... Edited by R. M. Macphail. Santal Mission Press.<br />
* Bodding, P. O. (1932–1936). A Santali Dictionary (5 volumes).<br />
* Bhaduri, M. B. (1994). A Mundari-English Dictionary. Asian Educational Services.<br />
* Hansdah, R. C., & Murmu, N. C. (2003). A Concise Santali-English Dictionary.<br />
<br />
=== Dictionaries of closely related languages ===<br />
* Ho: Deeney, J. J. (1978). Ho-English Dictionary. Xavier Ho Publications.<br />
</div>Rocky 734https://wiki.apertium.org/w/index.php?title=Santali&diff=73827Santali2022-01-10T08:42:07Z<p>Rocky 734: added alternative name of Santali language</p>
<hr />
<div>[[Category:Santali]]<br />
[[Category:Languages]]<br />
<br />
'''Santali''' or '''Santhali''' is the most widely spoken language of the Munda subfamily of the Austroasiatic languages. It is related to Ho and Mundari and is spoken mainly in the Indian states of Assam, Bihar, Jharkhand, Mizoram, Odisha, Tripura and West Bengal. It is a recognized regional language of India under the Eighth Schedule of the Indian Constitution. It is spoken by around 7.6&nbsp;million people in India, Bangladesh, Bhutan and Nepal, making it the third most-spoken Austroasiatic language after Vietnamese and Khmer. Santali was mainly an oral language until '''Pandit Raghunath Murmu''' developed the Ol Chiki script in 1925. Ol Chiki is an alphabetic script that shares none of the syllabic properties of the other Indic scripts, and it is now widely used to write Santali in India. Before Ol Chiki was invented, Santali was written in the Latin (Roman), Devanagari and Kalinga scripts.</div>Rocky 734https://wiki.apertium.org/w/index.php?title=Santali&diff=73826Santali2022-01-10T08:39:21Z<p>Rocky 734: Added Santali language Information</p>
<hr />
<div>[[Category:Santali]]<br />
[[Category:Languages]]<br />
<br />
'''Santali''' is the most widely spoken language of the Munda subfamily of the Austroasiatic languages. It is related to Ho and Mundari and is spoken mainly in the Indian states of Assam, Bihar, Jharkhand, Mizoram, Odisha, Tripura and West Bengal. It is a recognized regional language of India under the Eighth Schedule of the Indian Constitution. It is spoken by around 7.6&nbsp;million people in India, Bangladesh, Bhutan and Nepal, making it the third most-spoken Austroasiatic language after Vietnamese and Khmer. Santali was mainly an oral language until '''Pandit Raghunath Murmu''' developed the Ol Chiki script in 1925. Ol Chiki is an alphabetic script that shares none of the syllabic properties of the other Indic scripts, and it is now widely used to write Santali in India. Before Ol Chiki was invented, Santali was written in the Latin (Roman), Devanagari and Kalinga scripts.</div>Rocky 734https://wiki.apertium.org/w/index.php?title=User:Rocky_734&diff=73825User:Rocky 7342022-01-10T08:25:32Z<p>Rocky 734: Added a short bio</p>
<hr />
<div>Hi, I am Prasanta Hembram. I like working on open-source projects, and Apertium will soon become one of my favourite projects. I am mostly interested in improving Indian language pairs, and I am currently working on the English-Santali and English-Hindi pairs.</div>Rocky 734