https://wiki.apertium.org/w/api.php?action=feedcontributions&user=Dharjunior&feedformat=atom
Apertium - User contributions [en]
2024-03-28T13:50:58Z
User contributions
MediaWiki 1.34.1
https://wiki.apertium.org/w/index.php?title=Hindi&diff=68309
Hindi
2018-12-07T00:45:22Z
<p>Dharjunior: </p>
<hr />
<div><br />
{{TOCD}}<br />
<br />
<br />
<br />
==Apertium Resources==<br />
<br />
=== Morphological transducer & disambiguator ===<br />
* [[Apertium-hin]]<br />
* [https://github.com/apertium/apertium-hin apertium-hin Github Repository]<br />
<br />
=== Language pairs ===<br />
<br />
====Trunk====<br />
*[https://github.com/apertium/apertium-urd-hin?files=1 apertium-urd-hin] Linguistic data for the Apertium Urdu-Hindi machine translator<br />
<br />
====Nursery====<br />
<br />
*[https://github.com/apertium/apertium-eng-hin apertium-eng-hin] Linguistic data for the Apertium English-Hindi machine translator<br />
<br />
====Incubator====<br />
<br />
*[https://github.com/apertium/apertium-mar-hin?files=1 apertium-mar-hin] Linguistic data for the Apertium Marathi-Hindi machine translator<br />
*[https://github.com/apertium/apertium-bn-hi apertium-bn-hi] Linguistic data for the Apertium Bangla-Hindi machine translator<br />
*[https://github.com/apertium/apertium-hin-pan apertium-hin-pan] Linguistic data for the Apertium Punjabi-Hindi machine translator<br />
*[https://github.com/apertium/apertium-guj-hin apertium-guj-hin] Linguistic data for the Apertium Gujarati-Hindi machine translator<br />
*[https://github.com/apertium/apertium-as-hi apertium-as-hi] Linguistic data for the Apertium Assamese-Hindi machine translator<br />
*[https://github.com/apertium/apertium-snd-hin apertium-snd-hin] Linguistic data for the Apertium Sindhi-Hindi machine translator<br />
<br />
<br />
==External Resources==<br />
<br />
===General===<br />
<br />
* [http://www.aclweb.org/anthology/C10-2147 Urdu and Hindi: Translation and sharing of linguistic resources] Copyright © 2018 ACM, Inc<br />
* [http://www.cal.org/heritage/pdfs/voices-hindi-language.pdf Introduction to Hindi] ©2009 Center for Applied Linguistics <br />
* [https://www.researchgate.net/publication/313030658_Learning_of_Hindi_Phonology_as_a_Foreigner Learning of Hindi Phonology as a Foreigner] © Professor Ram Lakhan Meena<br />
* [https://benjamins.com/catalog/loall.12 Hindi by Yamuna Kachru] © John Benjamins Publishing Company<br />
* [https://www.jstor.org/stable/42931249 Problems In Developing Lexical Resources For Computing by Rita Mathur]<br />
* [https://doi.org/10.1017/S0267190500000659 Contrastive Analysis of English and Hindi by Yamuna Kachru] © Cambridge University Press 1982<br />
<br />
<br />
<br />
===Grammars===<br />
<br />
* [http://www.koausa.org/iils/pdf/ModernHindiGrammar.pdf Modern Hindi Grammar, Omkar N. Koul] © 2008 by McNeil Technologies, Inc<br />
* [http://shodhganga.inflibnet.ac.in/bitstream/10603/78191/11/11_chapter%204.pdf Grammatical & Inflectional Analysis of Hindi and Dogri] <br />
* [https://surface.syr.edu/cgi/viewcontent.cgi?article=1170&context=suscholar The Oldest Grammar of Hindustani, Tej K. Bhatia]<br />
* [http://esl.fis.edu/grammar/langdiff/hindi.htm Differences between Hindi and Urdu Grammar] © Copyright Frankfurt International School<br />
* Hindi Grammar and Reader by Ernest Bender<br />
<br />
<br />
<br />
<br />
<br />
===Machine Translation===<br />
<br />
* [https://www.hindawi.com/journals/tswj/2014/485737/ Quantum Neural Network Based Machine Translator for Hindi to English] © 2014 Ravi Narayan et al Creative Commons Attribution License '''(Open Source)'''<br />
<br />
*[http://web2py.iiit.ac.in/research_centres/publications/download/mastersthesis.pdf.af2224b7bc18088c.4b756e616c2d5468657369732d46696e616c2e706466.pdf Hindi to English Machine Translation, Kunal Sachdeva]<br />
<br />
* [https://arxiv.org/ftp/arxiv/papers/1702/1702.01587.pdf A Hybrid Approach For Hindi-English Machine Translation]<br />
<br />
* [https://airccj.org/CSCP/vol7/csit77206.pdf Experiments On Different Recurrent Neural Networks For English-Hindi Machine Translation]<br />
<br />
* [https://www.cse.iitb.ac.in/~pb/papers/eng-hindi-mt.pdf Interlingua based English-Hindi Machine Translation and Language Divergence]<br />
<br />
* [http://aircconline.com/ijaia/V8N5/8517ijaia04.pdf Building An Effective MT System For English-Hindi Using RNNs Ruchit Agrawal & Dipti Misra Sharma] <br />
<br />
* [https://www.researchgate.net/publication/319351932_Neural_Machine_Translation_of_Indian_Languages Neural Machine Translation of Indian Languages]<br />
<br />
<br />
* [https://hlt.fbk.eu/sites/hlt.fbk.eu/files/prashant-mathur-camera-ready.pdf Automatic Translation of Nominal Compounds from English to Hindi by Prashant Mathur, Soma Paul]<br />
<br />
* [https://arxiv.org/abs/1404.3992 Assessing the Quality of MT Systems for Hindi to English Translation, Aditi Kalyani, Hemant Kumud, Shashi Pal Singh, Ajai Kumar]<br />
<br />
<br />
====Languages Other than English====<br />
<br />
* [http://h2p.learnpunjabi.org/ Hindi-Punjabi Machine Translation]<br />
* [http://www.academia.edu/3275646/A_Hybrid_Approach_for_Bengali_to_Hindi_Machine_Translation A Hybrid Approach for Bengali to Hindi Machine Translation, Sanjay Chatterji]<br />
<br />
<br />
===Morphology===<br />
<br />
* [http://www.ijsrp.org/research-paper-0613/ijsrp-p18124.pdf Morphology: Indian Languages and European Languages (contains morphological information for both English and Hindi)]<br />
<br />
* [http://web.stanford.edu/group/cslipublications/cslipublications/HPSG/2010/singh-sarma.pdf Hindi Noun Inflection and Distributed Morphology, Smriti Singh and Vaijayanthi M Sarma]<br />
<br />
* [http://learnpunjabi.org/pdf/gslehal-pap21.pdf Hindi Morphological Analyzer and Generator]<br />
<br />
* [http://aclweb.org/anthology/W12-2302 Hindi Derivational Morphological Analyzer]<br />
<br />
<br />
<br />
===Dictionaries===<br />
<br />
* http://hindi-english.org/ Released under GNU FDL ('''Open Source''')<br />
* http://www.cfilt.iitb.ac.in/~hdict/webinterface_user/index.php Copyright - CFILT, CSE Department, IIT Bombay<br />
* http://e-mahashabdkosh.rb-aai.in/ ©2008 Department of Official Language (DOL) and Centre for Development of Advanced Computing (C-DAC). All rights reserved.<br />
* http://www.aamboli.com/ Copyright 2018 © Aamboli.com<br />
* [http://tdil-dc.in/index.php?option=com_download&task=showresourceDetails&toolid=1552&lang=en Bilingual Dictionary Marathi to Hindi]<br />
<br />
====Multilingual====<br />
<br />
* [http://troindia.in/journal/ijcesr/vol3isss9/74-80.pdf An English To Assamese, Bengali And Hindi Multilingual E-Dictionary]<br />
<br />
<br />
<br />
===Corpora===<br />
* [http://corpora.uni-leipzig.de/en?corpusId=hin_news_2011 Hindi News Corpus] Creative Commons License CC BY-NC '''(Open Source)'''<br />
* [http://opus.nlpl.eu/ Hindi-English Parallel Corpora] Jörg Tiedemann, 2012, Parallel Data, Tools and Interfaces in OPUS - To be cited if any part of the corpus is used<br />
* [http://www.lrec-conf.org/proceedings/lrec2014/pdf/835_Paper.pdf HindEnCorp – Hindi-English and Hindi-only Corpus for Machine Translation]<br />
* [https://arxiv.org/abs/1710.02855 The IIT Bombay English-Hindi Parallel Corpus]<br />
* [https://catalog.ldc.upenn.edu/LDC2008L02 Hindi WordNet] © 2007 IIT Bombay, © 2008 Trustees of the University of Pennsylvania<br />
<br />
<br />
==Devanagari and Unicode==<br />
<br />
<br />
<br />
[[Category:Hindi|*]]</div>
Dharjunior
https://wiki.apertium.org/w/index.php?title=Hindi&diff=68308
Hindi
2018-12-07T00:45:01Z
<p>Dharjunior: </p>
<hr />
<div><br />
{{TOCD}}<br />
<br />
<br />
<br />
==Apertium Resources==<br />
<br />
=== Morphological transducer & disambiguator ===<br />
* [[Apertium-hin]]<br />
* [https://github.com/apertium/apertium-hin apertium-hin Github Repository]<br />
<br />
=== Language pairs ===<br />
<br />
====Trunk====<br />
*[https://github.com/apertium/apertium-urd-hin?files=1 apertium-urd-hin] Linguistic data for the Apertium Urdu-Hindi machine translator<br />
<br />
====Nursery====<br />
<br />
*[https://github.com/apertium/apertium-eng-hin apertium-eng-hin] Linguistic data for the Apertium English-Hindi machine translator<br />
<br />
====Incubator====<br />
<br />
*[https://github.com/apertium/apertium-mar-hin?files=1 apertium-mar-hin] Linguistic data for the Apertium Marathi-Hindi machine translator<br />
*[https://github.com/apertium/apertium-bn-hi apertium-bn-hi] Linguistic data for the Apertium Bangla-Hindi machine translator<br />
*[https://github.com/apertium/apertium-hin-pan apertium-hin-pan] Linguistic data for the Apertium Punjabi-Hindi machine translator<br />
*[https://github.com/apertium/apertium-guj-hin apertium-guj-hin] Linguistic data for the Apertium Gujarati-Hindi machine translator<br />
*[https://github.com/apertium/apertium-as-hi apertium-as-hi] Linguistic data for the Apertium Assamese-Hindi machine translator<br />
*[https://github.com/apertium/apertium-snd-hin apertium-snd-hin] Linguistic data for the Apertium Sindhi-Hindi machine translator<br />
<br />
<br />
==External Resources==<br />
<br />
===General===<br />
<br />
* [http://www.aclweb.org/anthology/C10-2147 Urdu and Hindi: Translation and sharing of linguistic resources] Copyright © 2018 ACM, Inc<br />
* [http://www.cal.org/heritage/pdfs/voices-hindi-language.pdf Introduction to Hindi] ©2009 Center for Applied Linguistics <br />
* [https://www.researchgate.net/publication/313030658_Learning_of_Hindi_Phonology_as_a_Foreigner Learning of Hindi Phonology as a Foreigner] © Professor Ram Lakhan Meena<br />
* [https://benjamins.com/catalog/loall.12 Hindi by Yamuna Kachru] © John Benjamins Publishing Company<br />
* [https://www.jstor.org/stable/42931249 Problems In Developing Lexical Resources For Computing by Rita Mathur]<br />
* [https://doi.org/10.1017/S0267190500000659 Contrastive Analysis of English and Hindi by Yamuna Kachru] © Cambridge University Press 1982<br />
<br />
<br />
<br />
===Grammars===<br />
<br />
* [http://www.koausa.org/iils/pdf/ModernHindiGrammar.pdf Modern Hindi Grammar, Omkar N. Koul] © 2008 by McNeil Technologies, Inc<br />
* [http://shodhganga.inflibnet.ac.in/bitstream/10603/78191/11/11_chapter%204.pdf Grammatical & Inflectional Analysis of Hindi and Dogri] <br />
* [https://surface.syr.edu/cgi/viewcontent.cgi?article=1170&context=suscholar The Oldest Grammar of Hindustani Tej K. Bhatia]<br />
* [http://esl.fis.edu/grammar/langdiff/hindi.htm Differences between Hindi and Urdu Grammar] © Copyright Frankfurt International School<br />
* Hindi Grammar and Reader by Ernest Bender<br />
<br />
<br />
<br />
<br />
<br />
===Machine Translation===<br />
<br />
* [https://www.hindawi.com/journals/tswj/2014/485737/ Quantum Neural Network Based Machine Translator for Hindi to English] © 2014 Ravi Narayan et al Creative Commons Attribution License '''(Open Source)'''<br />
<br />
*[http://web2py.iiit.ac.in/research_centres/publications/download/mastersthesis.pdf.af2224b7bc18088c.4b756e616c2d5468657369732d46696e616c2e706466.pdf Hindi to English Machine Translation, Kunal Sachdeva]<br />
<br />
* [https://arxiv.org/ftp/arxiv/papers/1702/1702.01587.pdf A Hybrid Approach For Hindi-English Machine Translation]<br />
<br />
* [https://airccj.org/CSCP/vol7/csit77206.pdf Experiments On Different Recurrent Neural Networks For English-Hindi Machine Translation]<br />
<br />
* [https://www.cse.iitb.ac.in/~pb/papers/eng-hindi-mt.pdf Interlingua based English-Hindi Machine Translation and Language Divergence]<br />
<br />
* [http://aircconline.com/ijaia/V8N5/8517ijaia04.pdf Building An Effective MT System For English-Hindi Using RNNs Ruchit Agrawal & Dipti Misra Sharma] <br />
<br />
* [https://www.researchgate.net/publication/319351932_Neural_Machine_Translation_of_Indian_Languages Neural Machine Translation of Indian Languages]<br />
<br />
<br />
* [https://hlt.fbk.eu/sites/hlt.fbk.eu/files/prashant-mathur-camera-ready.pdf Automatic Translation of Nominal Compounds from English to Hindi by Prashant Mathur, Soma Paul]<br />
<br />
* [https://arxiv.org/abs/1404.3992 Assessing the Quality of MT Systems for Hindi to English Translation Aditi Kalyani, Hemant Kumud, Shashi Pal Singh, Ajai Kumar]<br />
<br />
<br />
====Languages Other than English====<br />
<br />
* [http://h2p.learnpunjabi.org/ Hindi-Punjabi Machine Translation]<br />
* [http://www.academia.edu/3275646/A_Hybrid_Approach_for_Bengali_to_Hindi_Machine_Translation A Hybrid Approach for Bengali to Hindi Machine Translation, Sanjay Chatterji]<br />
<br />
<br />
===Morphology===<br />
<br />
* [http://www.ijsrp.org/research-paper-0613/ijsrp-p18124.pdf Morphology: Indian Languages and European Languages (contains morphological information for both English and Hindi)]<br />
<br />
* [http://web.stanford.edu/group/cslipublications/cslipublications/HPSG/2010/singh-sarma.pdf Hindi Noun Inflection and Distributed Morphology, Smriti Singh and Vaijayanthi M Sarma]<br />
<br />
* [http://learnpunjabi.org/pdf/gslehal-pap21.pdf Hindi Morphological Analyzer and Generator]<br />
<br />
* [http://aclweb.org/anthology/W12-2302 Hindi Derivational Morphological Analyzer]<br />
<br />
<br />
<br />
===Dictionaries===<br />
<br />
* http://hindi-english.org/ Released under GNU FDL '''Open Source'''<br />
* http://www.cfilt.iitb.ac.in/~hdict/webinterface_user/index.php Copyright - CFILT, CSE Department, IIT Bombay<br />
* http://e-mahashabdkosh.rb-aai.in/ ©2008 Department of Official Language (DOL) and Centre for Development of Advanced Computing (C-DAC). All rights reserved.<br />
* http://www.aamboli.com/ Copyright 2018 © Aamboli.com<br />
* [http://tdil-dc.in/index.php?option=com_download&task=showresourceDetails&toolid=1552&lang=en Bilingual Dictionary Marathi to Hindi]<br />
<br />
====Multilingual====<br />
<br />
* [http://troindia.in/journal/ijcesr/vol3isss9/74-80.pdf An English To Assamese, Bengali And Hindi Multilingual E-Dictionary]<br />
<br />
<br />
<br />
===Corpora===<br />
* [http://corpora.uni-leipzig.de/en?corpusId=hin_news_2011 Hindi News Corpus] Creative Commons License CC BY-NC '''(Open Source)'''<br />
* [http://opus.nlpl.eu/ Hindi-English Parallel Corpora] Jörg Tiedemann, 2012, Parallel Data, Tools and Interfaces in OPUS - To be cited if any part of the corpus is used<br />
* [http://www.lrec-conf.org/proceedings/lrec2014/pdf/835_Paper.pdf HindEnCorp – Hindi-English and Hindi-only Corpus for Machine Translation]<br />
* [https://arxiv.org/abs/1710.02855 The IIT Bombay English-Hindi Parallel Corpus]<br />
* [https://catalog.ldc.upenn.edu/LDC2008L02 Hindi WordNet] © 2007 IIT Bombay, © 2008 Trustees of the University of Pennsylvania<br />
<br />
<br />
==Devanagari and Unicode==<br />
<br />
<br />
<br />
[[Category:Hindi|*]]</div>
Dharjunior
https://wiki.apertium.org/w/index.php?title=Hindi&diff=68265
Hindi
2018-12-05T09:17:54Z
<p>Dharjunior: </p>
<hr />
<div><br />
{{TOCD}}<br />
<br />
<br />
<br />
==Apertium Resources==<br />
<br />
=== Morphological transducer & disambiguator ===<br />
* [[Apertium-hin]]<br />
* [https://github.com/apertium/apertium-hin apertium-hin Github Repository]<br />
<br />
=== Language pairs ===<br />
<br />
====Trunk====<br />
*[https://github.com/apertium/apertium-urd-hin?files=1 apertium-urd-hin] Linguistic data for the Apertium Urdu-Hindi machine translator<br />
<br />
====Nursery====<br />
<br />
*[https://github.com/apertium/apertium-eng-hin apertium-eng-hin] Linguistic data for the Apertium English-Hindi machine translator<br />
<br />
====Incubator====<br />
<br />
*[https://github.com/apertium/apertium-mar-hin?files=1 apertium-mar-hin] Linguistic data for the Apertium Marathi-Hindi machine translator<br />
*[https://github.com/apertium/apertium-bn-hi apertium-bn-hi] Linguistic data for the Apertium Bangla-Hindi machine translator<br />
*[https://github.com/apertium/apertium-hin-pan apertium-hin-pan] Linguistic data for the Apertium Punjabi-Hindi machine translator<br />
*[https://github.com/apertium/apertium-guj-hin apertium-guj-hin] Linguistic data for the Apertium Gujarati-Hindi machine translator<br />
*[https://github.com/apertium/apertium-as-hi apertium-as-hi] Linguistic data for the Apertium Assamese-Hindi machine translator<br />
*[https://github.com/apertium/apertium-snd-hin apertium-snd-hin] Linguistic data for the Apertium Sindhi-Hindi machine translator<br />
<br />
<br />
==External Resources==<br />
<br />
===General===<br />
<br />
* [http://www.aclweb.org/anthology/C10-2147 Urdu and Hindi: Translation and sharing of linguistic resources] Copyright © 2018 ACM, Inc<br />
* [http://www.cal.org/heritage/pdfs/voices-hindi-language.pdf Introduction to Hindi] ©2009 Center for Applied Linguistics <br />
* [https://www.researchgate.net/publication/313030658_Learning_of_Hindi_Phonology_as_a_Foreigner Learning of Hindi Phonology as a Foreigner] © Professor Ram Lakhan Meena<br />
* [https://benjamins.com/catalog/loall.12 Hindi by Yamuna Kachru] © John Benjamins Publishing Company<br />
* [https://www.jstor.org/stable/42931249 Problems In Developing Lexical Resources For Computing by Rita Mathur]<br />
* [https://doi.org/10.1017/S0267190500000659 Contrastive Analysis of English and Hindi by Yamuna Kachru] © Cambridge University Press 1982<br />
<br />
<br />
<br />
===Grammars===<br />
<br />
* [http://www.koausa.org/iils/pdf/ModernHindiGrammar.pdf Modern Hindi Grammar, Omkar N. Koul] © 2008 by McNeil Technologies, Inc<br />
* [http://shodhganga.inflibnet.ac.in/bitstream/10603/78191/11/11_chapter%204.pdf Grammatical & Inflectional Analysis of Hindi and Dogri] <br />
* [https://surface.syr.edu/cgi/viewcontent.cgi?article=1170&context=suscholar The Oldest Grammar of Hindustani Tej K. Bhatia]<br />
* [http://esl.fis.edu/grammar/langdiff/hindi.htm Differences between Hindi and Urdu Grammar] © Copyright Frankfurt International School<br />
* Hindi Grammar and Reader by Ernest Bender<br />
<br />
<br />
<br />
<br />
<br />
===Machine Translation===<br />
<br />
* [https://www.hindawi.com/journals/tswj/2014/485737/ Quantum Neural Network Based Machine Translator for Hindi to English] © 2014 Ravi Narayan et al Creative Commons Attribution License '''(Open Source)'''<br />
<br />
*[http://web2py.iiit.ac.in/research_centres/publications/download/mastersthesis.pdf.af2224b7bc18088c.4b756e616c2d5468657369732d46696e616c2e706466.pdf Hindi to English Machine Translation, Kunal Sachdeva]<br />
<br />
* [https://arxiv.org/ftp/arxiv/papers/1702/1702.01587.pdf A Hybrid Approach For Hindi-English Machine Translation]<br />
<br />
* [https://airccj.org/CSCP/vol7/csit77206.pdf Experiments On Different Recurrent Neural Networks For English-Hindi Machine Translation]<br />
<br />
* [https://www.cse.iitb.ac.in/~pb/papers/eng-hindi-mt.pdf Interlingua based English-Hindi Machine Translation and Language Divergence]<br />
<br />
* [http://aircconline.com/ijaia/V8N5/8517ijaia04.pdf Building An Effective MT System For English-Hindi Using RNNs Ruchit Agrawal & Dipti Misra Sharma] <br />
<br />
* [https://www.researchgate.net/publication/319351932_Neural_Machine_Translation_of_Indian_Languages Neural Machine Translation of Indian Languages]<br />
<br />
<br />
* [https://hlt.fbk.eu/sites/hlt.fbk.eu/files/prashant-mathur-camera-ready.pdf Automatic Translation of Nominal Compounds from English to Hindi by Prashant Mathur, Soma Paul]<br />
<br />
* [https://arxiv.org/abs/1404.3992 Assessing the Quality of MT Systems for Hindi to English Translation Aditi Kalyani, Hemant Kumud, Shashi Pal Singh, Ajai Kumar]<br />
<br />
<br />
====Languages Other than English====<br />
<br />
* [http://h2p.learnpunjabi.org/ Hindi-Punjabi Machine Translation]<br />
* [http://www.academia.edu/3275646/A_Hybrid_Approach_for_Bengali_to_Hindi_Machine_Translation A Hybrid Approach for Bengali to Hindi Machine Translation, Sanjay Chatterji]<br />
<br />
<br />
===Morphology===<br />
<br />
* [http://www.ijsrp.org/research-paper-0613/ijsrp-p18124.pdf Morphology: Indian Languages and European Languages (contains morphological information for both English and Hindi)]<br />
<br />
* [http://web.stanford.edu/group/cslipublications/cslipublications/HPSG/2010/singh-sarma.pdf Hindi Noun Inflection and Distributed Morphology, Smriti Singh and Vaijayanthi M Sarma]<br />
<br />
* [http://learnpunjabi.org/pdf/gslehal-pap21.pdf Hindi Morphological Analyzer and Generator]<br />
<br />
* [http://aclweb.org/anthology/W12-2302 Hindi Derivational Morphological Analyzer]<br />
<br />
<br />
<br />
===Dictionaries===<br />
<br />
* http://hindi-english.org/ Released under GNU FDL<br />
* http://www.cfilt.iitb.ac.in/~hdict/webinterface_user/index.php Copyright - CFILT, CSE Department, IIT Bombay<br />
* http://e-mahashabdkosh.rb-aai.in/ ©2008 Department of Official Language (DOL) and Centre for Development of Advanced Computing (C-DAC). All rights reserved.<br />
* http://www.aamboli.com/ Copyright 2018 © Aamboli.com<br />
* [http://tdil-dc.in/index.php?option=com_download&task=showresourceDetails&toolid=1552&lang=en Bilingual Dictionary Marathi to Hindi]<br />
<br />
====Multilingual====<br />
<br />
* [http://troindia.in/journal/ijcesr/vol3isss9/74-80.pdf An English To Assamese, Bengali And Hindi Multilingual E-Dictionary]<br />
<br />
<br />
<br />
===Corpora===<br />
* [http://corpora.uni-leipzig.de/en?corpusId=hin_news_2011 Hindi News Corpus] Creative Commons License CC BY-NC '''(Open Source)'''<br />
* [http://opus.nlpl.eu/ Hindi-English Parallel Corpora] Jörg Tiedemann, 2012, Parallel Data, Tools and Interfaces in OPUS - To be cited if any part of the corpus is used<br />
* [http://www.lrec-conf.org/proceedings/lrec2014/pdf/835_Paper.pdf HindEnCorp – Hindi-English and Hindi-only Corpus for Machine Translation]<br />
* [https://arxiv.org/abs/1710.02855 The IIT Bombay English-Hindi Parallel Corpus]<br />
* [https://catalog.ldc.upenn.edu/LDC2008L02 Hindi WordNet] © 2007 IIT Bombay, © 2008 Trustees of the University of Pennsylvania<br />
<br />
<br />
==Devanagari and Unicode==<br />
<br />
<br />
<br />
[[Category:Hindi|*]]</div>
Dharjunior
https://wiki.apertium.org/w/index.php?title=Hindi&diff=68264
Hindi
2018-12-05T09:16:13Z
<p>Dharjunior: </p>
<hr />
<div><br />
{{TOCD}}<br />
<br />
<br />
<br />
==Apertium Resources==<br />
<br />
=== Morphological transducer & disambiguator ===<br />
* [[Apertium-hin]]<br />
* [https://github.com/apertium/apertium-hin apertium-hin Github Repository]<br />
<br />
=== Language pairs ===<br />
<br />
====Trunk====<br />
*[https://github.com/apertium/apertium-urd-hin?files=1 apertium-urd-hin] Linguistic data for the Apertium Urdu-Hindi machine translator<br />
<br />
====Nursery====<br />
<br />
*[https://github.com/apertium/apertium-eng-hin apertium-eng-hin] Linguistic data for the Apertium English-Hindi machine translator<br />
<br />
====Incubator====<br />
<br />
*[https://github.com/apertium/apertium-mar-hin?files=1 apertium-mar-hin] Linguistic data for the Apertium Marathi-Hindi machine translator<br />
*[https://github.com/apertium/apertium-bn-hi apertium-bn-hi] Linguistic data for the Apertium Bangla-Hindi machine translator<br />
*[https://github.com/apertium/apertium-hin-pan apertium-hin-pan] Linguistic data for the Apertium Punjabi-Hindi machine translator<br />
*[https://github.com/apertium/apertium-guj-hin apertium-guj-hin] Linguistic data for the Apertium Gujarati-Hindi machine translator<br />
*[https://github.com/apertium/apertium-as-hi apertium-as-hi] Linguistic data for the Apertium Assamese-Hindi machine translator<br />
*[https://github.com/apertium/apertium-snd-hin apertium-snd-hin] Linguistic data for the Apertium Sindhi-Hindi machine translator<br />
<br />
<br />
==External Resources==<br />
<br />
===General===<br />
<br />
* [http://www.aclweb.org/anthology/C10-2147 Urdu and Hindi: Translation and sharing of linguistic resources] Copyright © 2018 ACM, Inc<br />
* [http://www.cal.org/heritage/pdfs/voices-hindi-language.pdf Introduction to Hindi] ©2009 Center for Applied Linguistics <br />
* [https://www.researchgate.net/publication/313030658_Learning_of_Hindi_Phonology_as_a_Foreigner Learning of Hindi Phonology as a Foreigner] © Professor Ram Lakhan Meena<br />
* [https://benjamins.com/catalog/loall.12 Hindi by Yamuna Kachru] © John Benjamins Publishing Company<br />
* [https://www.jstor.org/stable/42931249 Problems In Developing Lexical Resources For Computing by Rita Mathur]<br />
* [https://doi.org/10.1017/S0267190500000659 Contrastive Analysis of English and Hindi by Yamuna Kachru] © Cambridge University Press 1982<br />
<br />
<br />
<br />
===Grammars===<br />
<br />
* [http://www.koausa.org/iils/pdf/ModernHindiGrammar.pdf Modern Hindi Grammar, Omkar N. Koul] © 2008 by McNeil Technologies, Inc<br />
* [http://shodhganga.inflibnet.ac.in/bitstream/10603/78191/11/11_chapter%204.pdf Grammatical & Inflectional Analysis of Hindi and Dogri] <br />
* [https://surface.syr.edu/cgi/viewcontent.cgi?article=1170&context=suscholar The Oldest Grammar of Hindustani Tej K. Bhatia]<br />
* [http://esl.fis.edu/grammar/langdiff/hindi.htm Differences between Hindi and Urdu Grammar] © Copyright Frankfurt International School<br />
* Hindi Grammar and Reader by Ernest Bender<br />
<br />
<br />
<br />
<br />
<br />
===Machine Translation===<br />
<br />
* [http://web2py.iiit.ac.in/research_centres/publications/download/mastersthesis.pdf.af2224b7bc18088c.4b756e616c2d5468657369732d46696e616c2e706466.pdf Hindi to English Machine Translation, Kunal Sachdeva]<br />
<br />
* [https://arxiv.org/ftp/arxiv/papers/1702/1702.01587.pdf A Hybrid Approach For Hindi-English Machine Translation]<br />
<br />
* [https://airccj.org/CSCP/vol7/csit77206.pdf Experiments On Different Recurrent Neural Networks For English-Hindi Machine Translation]<br />
<br />
* [https://www.cse.iitb.ac.in/~pb/papers/eng-hindi-mt.pdf Interlingua based English-Hindi Machine Translation and Language Divergence]<br />
<br />
* [http://aircconline.com/ijaia/V8N5/8517ijaia04.pdf Building An Effective MT System For English-Hindi Using RNNs Ruchit Agrawal & Dipti Misra Sharma] <br />
<br />
* [https://www.researchgate.net/publication/319351932_Neural_Machine_Translation_of_Indian_Languages Neural Machine Translation of Indian Languages]<br />
<br />
* [https://www.hindawi.com/journals/tswj/2014/485737/ Quantum Neural Network Based Machine Translator for Hindi to English] © 2014 Ravi Narayan et al Creative Commons Attribution License<br />
<br />
* [https://hlt.fbk.eu/sites/hlt.fbk.eu/files/prashant-mathur-camera-ready.pdf Automatic Translation of Nominal Compounds from English to Hindi by Prashant Mathur, Soma Paul]<br />
<br />
* [https://arxiv.org/abs/1404.3992 Assessing the Quality of MT Systems for Hindi to English Translation Aditi Kalyani, Hemant Kumud, Shashi Pal Singh, Ajai Kumar]<br />
<br />
<br />
====Languages Other than English====<br />
<br />
* [http://h2p.learnpunjabi.org/ Hindi-Punjabi Machine Translation]<br />
* [http://www.academia.edu/3275646/A_Hybrid_Approach_for_Bengali_to_Hindi_Machine_Translation A Hybrid Approach for Bengali to Hindi Machine Translation, Sanjay Chatterji]<br />
<br />
<br />
===Morphology===<br />
<br />
* [http://www.ijsrp.org/research-paper-0613/ijsrp-p18124.pdf Morphology: Indian Languages and European Languages (contains morphological information for both English and Hindi)]<br />
<br />
* [http://web.stanford.edu/group/cslipublications/cslipublications/HPSG/2010/singh-sarma.pdf Hindi Noun Inflection and Distributed Morphology, Smriti Singh and Vaijayanthi M Sarma]<br />
<br />
* [http://learnpunjabi.org/pdf/gslehal-pap21.pdf Hindi Morphological Analyzer and Generator]<br />
<br />
* [http://aclweb.org/anthology/W12-2302 Hindi Derivational Morphological Analyzer]<br />
<br />
<br />
<br />
===Dictionaries===<br />
<br />
* http://hindi-english.org/ Released under GNU FDL<br />
* http://www.cfilt.iitb.ac.in/~hdict/webinterface_user/index.php Copyright - CFILT, CSE Department, IIT Bombay<br />
* http://e-mahashabdkosh.rb-aai.in/ ©2008 Department of Official Language (DOL) and Centre for Development of Advanced Computing (C-DAC). All rights reserved.<br />
* http://www.aamboli.com/ Copyright 2018 © Aamboli.com<br />
* [http://tdil-dc.in/index.php?option=com_download&task=showresourceDetails&toolid=1552&lang=en Bilingual Dictionary Marathi to Hindi]<br />
<br />
====Multilingual====<br />
<br />
* [http://troindia.in/journal/ijcesr/vol3isss9/74-80.pdf An English To Assamese, Bengali And Hindi Multilingual E-Dictionary]<br />
<br />
<br />
<br />
===Corpora===<br />
* [http://corpora.uni-leipzig.de/en?corpusId=hin_news_2011 Hindi News Corpus] Creative Commons License CC BY-NC '''(Open Source)'''<br />
* [http://opus.nlpl.eu/ Hindi-English Parallel Corpora] Jörg Tiedemann, 2012, Parallel Data, Tools and Interfaces in OPUS - To be cited if any part of the corpus is used<br />
* [http://www.lrec-conf.org/proceedings/lrec2014/pdf/835_Paper.pdf HindEnCorp – Hindi-English and Hindi-only Corpus for Machine Translation]<br />
* [https://arxiv.org/abs/1710.02855 The IIT Bombay English-Hindi Parallel Corpus]<br />
* [https://catalog.ldc.upenn.edu/LDC2008L02 Hindi WordNet] © 2007 IIT Bombay, © 2008 Trustees of the University of Pennsylvania<br />
<br />
<br />
==Devanagari and Unicode==<br />
<br />
<br />
<br />
[[Category:Hindi|*]]</div>
Dharjunior
https://wiki.apertium.org/w/index.php?title=Hindi&diff=68263
Hindi
2018-12-05T09:15:39Z
<p>Dharjunior: </p>
<hr />
<div><br />
{{TOCD}}<br />
<br />
<br />
<br />
==Apertium Resources==<br />
<br />
=== Morphological transducer & disambiguator ===<br />
* [[Apertium-hin]]<br />
* [https://github.com/apertium/apertium-hin apertium-hin Github Repository]<br />
<br />
=== Language pairs ===<br />
<br />
====Trunk====<br />
*[https://github.com/apertium/apertium-urd-hin?files=1 apertium-urd-hin] Linguistic data for the Apertium Urdu-Hindi machine translator<br />
<br />
====Nursery====<br />
<br />
*[https://github.com/apertium/apertium-eng-hin apertium-eng-hin] Linguistic data for the Apertium English-Hindi machine translator<br />
<br />
====Incubator====<br />
<br />
*[https://github.com/apertium/apertium-mar-hin?files=1 apertium-mar-hin] Linguistic data for the Apertium Marathi-Hindi machine translator<br />
*[https://github.com/apertium/apertium-bn-hi apertium-bn-hi] Linguistic data for the Apertium Bangla-Hindi machine translator<br />
*[https://github.com/apertium/apertium-hin-pan apertium-hin-pan] Linguistic data for the Apertium Punjabi-Hindi machine translator<br />
*[https://github.com/apertium/apertium-guj-hin apertium-guj-hin] Linguistic data for the Apertium Gujarati-Hindi machine translator<br />
*[https://github.com/apertium/apertium-as-hi apertium-as-hi] Linguistic data for the Apertium Assamese-Hindi machine translator<br />
*[https://github.com/apertium/apertium-snd-hin apertium-snd-hin] Linguistic data for the Apertium Sindhi-Hindi machine translator<br />
<br />
<br />
==External Resources==<br />
<br />
===General===<br />
<br />
* [http://www.aclweb.org/anthology/C10-2147 Urdu and Hindi: Translation and sharing of linguistic resources] Copyright © 2018 ACM, Inc<br />
* [http://www.cal.org/heritage/pdfs/voices-hindi-language.pdf Introduction to Hindi] ©2009 Center for Applied Linguistics <br />
* [https://www.researchgate.net/publication/313030658_Learning_of_Hindi_Phonology_as_a_Foreigner Learning of Hindi Phonology as a Foreigner] © Professor Ram Lakhan Meena<br />
* [https://benjamins.com/catalog/loall.12 Hindi by Yamuna Kachru] © John Benjamins Publishing Company<br />
* [https://www.jstor.org/stable/42931249 Problems In Developing Lexical Resources For Computing by Rita Mathur]<br />
* [https://doi.org/10.1017/S0267190500000659 Contrastive Analysis of English and Hindi by Yamuna Kachru] © Cambridge University Press 1982<br />
<br />
<br />
<br />
===Grammars===<br />
<br />
* [http://www.koausa.org/iils/pdf/ModernHindiGrammar.pdf Modern Hindi Grammar, Omkar N. Koul] © 2008 by McNeil Technologies, Inc<br />
* [http://shodhganga.inflibnet.ac.in/bitstream/10603/78191/11/11_chapter%204.pdf Grammatical & Inflectional Analysis of Hindi and Dogri] <br />
* [https://surface.syr.edu/cgi/viewcontent.cgi?article=1170&context=suscholar The Oldest Grammar of Hindustani Tej K. Bhatia]<br />
* [http://esl.fis.edu/grammar/langdiff/hindi.htm Differences between Hindi and Urdu Grammar] © Copyright Frankfurt International School<br />
* Hindi Grammar and Reader by Ernest Bender<br />
<br />
<br />
<br />
<br />
<br />
===Machine Translation===<br />
<br />
* [http://web2py.iiit.ac.in/research_centres/publications/download/mastersthesis.pdf.af2224b7bc18088c.4b756e616c2d5468657369732d46696e616c2e706466.pdf Hindi to English Machine Translation, Kunal Sachdeva]<br />
<br />
* [https://arxiv.org/ftp/arxiv/papers/1702/1702.01587.pdf A Hybrid Approach For Hindi-English Machine Translation]<br />
<br />
* [https://airccj.org/CSCP/vol7/csit77206.pdf Experiments On Different Recurrent Neural Networks For English-Hindi Machine Translation]<br />
<br />
* [https://www.cse.iitb.ac.in/~pb/papers/eng-hindi-mt.pdf Interlingua based English-Hindi Machine Translation and Language Divergence]<br />
<br />
* [http://aircconline.com/ijaia/V8N5/8517ijaia04.pdf Building An Effective MT System For English-Hindi Using RNNs Ruchit Agrawal & Dipti Misra Sharma] <br />
<br />
* [https://www.researchgate.net/publication/319351932_Neural_Machine_Translation_of_Indian_Languages Neural Machine Translation of Indian Languages]<br />
<br />
* [https://www.hindawi.com/journals/tswj/2014/485737/ Quantum Neural Network Based Machine Translator for Hindi to English] © 2014 Ravi Narayan et al Creative Commons Attribution License<br />
<br />
* [https://hlt.fbk.eu/sites/hlt.fbk.eu/files/prashant-mathur-camera-ready.pdf Automatic Translation of Nominal Compounds from English to Hindi by Prashant Mathur, Soma Paul]<br />
<br />
* [https://arxiv.org/abs/1404.3992 Assessing the Quality of MT Systems for Hindi to English Translation Aditi Kalyani, Hemant Kumud, Shashi Pal Singh, Ajai Kumar]<br />
<br />
<br />
====Languages Other than English====<br />
<br />
* [http://h2p.learnpunjabi.org/ Hindi-Punjabi Machine Translation]<br />
* [http://www.academia.edu/3275646/A_Hybrid_Approach_for_Bengali_to_Hindi_Machine_Translation A Hybrid Approach for Bengali to Hindi Machine Translation, Sanjay Chatterji]<br />
<br />
<br />
===Morphology===<br />
<br />
* [http://www.ijsrp.org/research-paper-0613/ijsrp-p18124.pdf Morphology: Indian Languages and European Languages (contains morphological information for both English and Hindi)]<br />
<br />
* [http://web.stanford.edu/group/cslipublications/cslipublications/HPSG/2010/singh-sarma.pdf Hindi Noun Inflection and Distributed Morphology, Smriti Singh and Vaijayanthi M Sarma]<br />
<br />
* [http://learnpunjabi.org/pdf/gslehal-pap21.pdf Hindi Morphological Analyzer and Generator]<br />
<br />
* [http://aclweb.org/anthology/W12-2302 Hindi Derivational Morphological Analyzer]<br />
<br />
<br />
<br />
===Dictionaries===<br />
<br />
* http://hindi-english.org/ Released under GNU FDL<br />
* http://www.cfilt.iitb.ac.in/~hdict/webinterface_user/index.php Copyright - CFILT, CSE Department, IIT Bombay<br />
* http://e-mahashabdkosh.rb-aai.in/ ©2008 Department of Official Language (DOL) and Centre for Development of Advanced Computing (C-DAC). All rights reserved.<br />
* http://www.aamboli.com/ Copyright 2018 © Aamboli.com<br />
* [http://tdil-dc.in/index.php?option=com_download&task=showresourceDetails&toolid=1552&lang=en Bilingual Dictionary Marathi to Hindi]<br />
<br />
====Multilingual====<br />
<br />
* [http://troindia.in/journal/ijcesr/vol3isss9/74-80.pdf An English To Assamese, Bengali And Hindi Multilingual E-Dictionary]<br />
<br />
<br />
<br />
===Corpora===<br />
* [http://corpora.uni-leipzig.de/en?corpusId=hin_news_2011 Hindi News Corpus] Creative Commons License CC BY-NC '''(Open Source)'''<br />
*[http://opus.nlpl.eu/ Hindi-English Parallel Corpora] Jörg Tiedemann, 2012, Parallel Data, Tools and Interfaces in OPUS - To be cited if any part of the corpus is used<br />
* [http://www.lrec-conf.org/proceedings/lrec2014/pdf/835_Paper.pdf HindEnCorp – Hindi-English and Hindi-only Corpus for Machine Translation]<br />
* [https://arxiv.org/abs/1710.02855 The IIT Bombay English-Hindi Parallel Corpus]<br />
* [https://catalog.ldc.upenn.edu/LDC2008L02 Hindi WordNet] © 2007 IIT Bombay, © 2008 Trustees of the University of Pennsylvania<br />
<br />
<br />
==Devanagari and Unicode==<br />
<br />
<br />
<br />
[[Category:Hindi|*]]</div>
Dharjunior
https://wiki.apertium.org/w/index.php?title=Hindi&diff=68240
Hindi
2018-12-04T17:48:23Z
<p>Dharjunior: </p>
<hr />
<div><br />
{{TOCD}}<br />
<br />
<br />
<br />
==Apertium Resources==<br />
<br />
=== Morphological transducer & disambiguator ===<br />
* [[Apertium-hin]]<br />
* [https://github.com/apertium/apertium-hin apertium-hin GitHub Repository]<br />
<br />
=== Language pairs ===<br />
<br />
====Trunk====<br />
*[https://github.com/apertium/apertium-urd-hin?files=1 apertium-urd-hin] Linguistic data for the Apertium Urdu-Hindi machine translator<br />
<br />
====Nursery====<br />
<br />
*[https://github.com/apertium/apertium-eng-hin apertium-eng-hin] Linguistic data for the Apertium English-Hindi machine translator<br />
<br />
====Incubator====<br />
<br />
*[https://github.com/apertium/apertium-mar-hin?files=1 apertium-mar-hin] Linguistic data for the Apertium Marathi-Hindi machine translator<br />
*[https://github.com/apertium/apertium-bn-hi apertium-bn-hi] Linguistic data for the Apertium Bangla-Hindi machine translator<br />
*[https://github.com/apertium/apertium-hin-pan apertium-hin-pan] Linguistic data for the Apertium Punjabi-Hindi machine translator<br />
*[https://github.com/apertium/apertium-guj-hin apertium-guj-hin] Linguistic data for the Apertium Gujarati-Hindi machine translator<br />
*[https://github.com/apertium/apertium-as-hi apertium-as-hi] Linguistic data for the Apertium Assamese-Hindi machine translator<br />
*[https://github.com/apertium/apertium-snd-hin apertium-snd-hin] Linguistic data for the Apertium Sindhi-Hindi machine translator<br />
<br />
<br />
==External Resources==<br />
<br />
===General===<br />
<br />
* [http://www.aclweb.org/anthology/C10-2147 Urdu and Hindi: Translation and sharing of linguistic resources] Copyright © 2018 ACM, Inc<br />
* [http://www.cal.org/heritage/pdfs/voices-hindi-language.pdf Introduction to Hindi] ©2009 Center for Applied Linguistics <br />
* [https://www.researchgate.net/publication/313030658_Learning_of_Hindi_Phonology_as_a_Foreigner Learning of Hindi Phonology as a Foreigner] © Professor Ram Lakhan Meena<br />
* [https://benjamins.com/catalog/loall.12 Hindi by Yamuna Kachru] © John Benjamins Publishing Company<br />
* [https://www.jstor.org/stable/42931249 Problems In Developing Lexical Resources For Computing by Rita Mathur]<br />
* [https://doi.org/10.1017/S0267190500000659 Contrastive Analysis of English and Hindi by Yamuna Kachru] © Cambridge University Press 1982<br />
<br />
<br />
<br />
===Grammars===<br />
<br />
* [http://www.koausa.org/iils/pdf/ModernHindiGrammar.pdf Modern Hindi Grammar, Omkar N. Koul] © 2008 by McNeil Technologies, Inc<br />
* [http://shodhganga.inflibnet.ac.in/bitstream/10603/78191/11/11_chapter%204.pdf Grammatical & Inflectional Analysis of Hindi and Dogri] <br />
* [https://surface.syr.edu/cgi/viewcontent.cgi?article=1170&context=suscholar The Oldest Grammar of Hindustani Tej K. Bhatia]<br />
* [http://esl.fis.edu/grammar/langdiff/hindi.htm Differences between Hindi and Urdu Grammar] © Copyright Frankfurt International School<br />
* Hindi Grammar and Reader by Ernest Bender<br />
<br />
<br />
<br />
<br />
<br />
===Machine Translation===<br />
<br />
* [http://web2py.iiit.ac.in/research_centres/publications/download/mastersthesis.pdf.af2224b7bc18088c.4b756e616c2d5468657369732d46696e616c2e706466.pdf Hindi to English Machine Translation, Kunal Sachdeva]<br />
<br />
* [https://arxiv.org/ftp/arxiv/papers/1702/1702.01587.pdf A Hybrid Approach For Hindi-English Machine Translation]<br />
<br />
* [https://airccj.org/CSCP/vol7/csit77206.pdf Experiments On Different Recurrent Neural Networks For English-Hindi Machine Translation]<br />
<br />
* [https://www.cse.iitb.ac.in/~pb/papers/eng-hindi-mt.pdf Interlingua based English-Hindi Machine Translation and Language Divergence]<br />
<br />
* [http://aircconline.com/ijaia/V8N5/8517ijaia04.pdf Building An Effective MT System For English-Hindi Using RNNs Ruchit Agrawal & Dipti Misra Sharma] <br />
<br />
* [https://www.researchgate.net/publication/319351932_Neural_Machine_Translation_of_Indian_Languages Neural Machine Translation of Indian Languages]<br />
<br />
* [https://www.hindawi.com/journals/tswj/2014/485737/ Quantum Neural Network Based Machine Translator for Hindi to English] © 2014 Ravi Narayan et al Creative Commons Attribution License<br />
<br />
* [https://hlt.fbk.eu/sites/hlt.fbk.eu/files/prashant-mathur-camera-ready.pdf Automatic Translation of Nominal Compounds from English to Hindi by Prashant Mathur, Soma Paul]<br />
<br />
* [https://arxiv.org/abs/1404.3992 Assessing the Quality of MT Systems for Hindi to English Translation Aditi Kalyani, Hemant Kumud, Shashi Pal Singh, Ajai Kumar]<br />
<br />
<br />
====Languages Other than English====<br />
<br />
* [http://h2p.learnpunjabi.org/ Hindi-Punjabi Machine Translation]<br />
* [http://www.academia.edu/3275646/A_Hybrid_Approach_for_Bengali_to_Hindi_Machine_Translation A Hybrid Approach for Bengali to Hindi Machine Translation, Sanjay Chatterji]<br />
<br />
<br />
===Morphology===<br />
<br />
* [http://www.ijsrp.org/research-paper-0613/ijsrp-p18124.pdf Morphology: Indian Languages and European Languages (contains morphological information for both English and Hindi)]<br />
<br />
* [http://web.stanford.edu/group/cslipublications/cslipublications/HPSG/2010/singh-sarma.pdf Hindi Noun Inflection and Distributed Morphology, Smriti Singh and Vaijayanthi M Sarma]<br />
<br />
* [http://learnpunjabi.org/pdf/gslehal-pap21.pdf Hindi Morphological Analyzer and Generator]<br />
<br />
* [http://aclweb.org/anthology/W12-2302 Hindi Derivational Morphological Analyzer]<br />
<br />
<br />
<br />
===Dictionaries===<br />
<br />
* http://hindi-english.org/ Released under GNU FDL<br />
* http://www.cfilt.iitb.ac.in/~hdict/webinterface_user/index.php Copyright - CFILT, CSE Department, IIT Bombay<br />
* http://e-mahashabdkosh.rb-aai.in/ ©2008 Department of Official Language (DOL) and Centre for Development of Advanced Computing (C-DAC). All rights reserved.<br />
* http://www.aamboli.com/ Copyright 2018 © Aamboli.com<br />
* [http://tdil-dc.in/index.php?option=com_download&task=showresourceDetails&toolid=1552&lang=en Bilingual Dictionary Marathi to Hindi]<br />
<br />
====Multilingual====<br />
<br />
* [http://troindia.in/journal/ijcesr/vol3isss9/74-80.pdf An English To Assamese, Bengali And Hindi Multilingual E-Dictionary]<br />
<br />
<br />
<br />
===Corpora===<br />
<br />
* [http://opus.nlpl.eu/ Hindi-English Parallel Corpora] Jörg Tiedemann, 2012, Parallel Data, Tools and Interfaces in OPUS - To be cited if any part of the corpus is used<br />
* [http://www.lrec-conf.org/proceedings/lrec2014/pdf/835_Paper.pdf HindEnCorp – Hindi-English and Hindi-only Corpus for Machine Translation]<br />
* [https://arxiv.org/abs/1710.02855 The IIT Bombay English-Hindi Parallel Corpus]<br />
* [https://catalog.ldc.upenn.edu/LDC2008L02 Hindi WordNet] © 2007 IIT Bombay, © 2008 Trustees of the University of Pennsylvania<br />
* [http://corpora.uni-leipzig.de/en?corpusId=hin_news_2011 Hindi News Corpus] Creative Commons License CC BY-NC<br />
<br />
==Devanagari and Unicode==<br />
<br />
<br />
<br />
[[Category:Hindi|*]]</div>
Dharjunior
https://wiki.apertium.org/w/index.php?title=Hindi&diff=68239
Hindi
2018-12-04T17:39:09Z
<p>Dharjunior: </p>
<hr />
<div><br />
{{TOCD}}<br />
<br />
<br />
<br />
==Apertium Resources==<br />
<br />
=== Morphological transducer & disambiguator ===<br />
* [[Apertium-hin]]<br />
* [https://github.com/apertium/apertium-hin apertium-hin GitHub Repository]<br />
<br />
=== Language pairs ===<br />
<br />
====Trunk====<br />
*[https://github.com/apertium/apertium-urd-hin?files=1 apertium-urd-hin] Linguistic data for the Apertium Urdu-Hindi machine translator<br />
<br />
====Nursery====<br />
<br />
*[https://github.com/apertium/apertium-eng-hin apertium-eng-hin] Linguistic data for the Apertium English-Hindi machine translator<br />
<br />
====Incubator====<br />
<br />
*[https://github.com/apertium/apertium-mar-hin?files=1 apertium-mar-hin] Linguistic data for the Apertium Marathi-Hindi machine translator<br />
*[https://github.com/apertium/apertium-bn-hi apertium-bn-hi] Linguistic data for the Apertium Bangla-Hindi machine translator<br />
*[https://github.com/apertium/apertium-hin-pan apertium-hin-pan] Linguistic data for the Apertium Punjabi-Hindi machine translator<br />
*[https://github.com/apertium/apertium-guj-hin apertium-guj-hin] Linguistic data for the Apertium Gujarati-Hindi machine translator<br />
*[https://github.com/apertium/apertium-as-hi apertium-as-hi] Linguistic data for the Apertium Assamese-Hindi machine translator<br />
*[https://github.com/apertium/apertium-snd-hin apertium-snd-hin] Linguistic data for the Apertium Sindhi-Hindi machine translator<br />
<br />
<br />
==External Resources==<br />
<br />
===General===<br />
<br />
* [http://www.aclweb.org/anthology/C10-2147 Urdu and Hindi: Translation and sharing of linguistic resources] Copyright © 2018 ACM, Inc<br />
* [http://www.cal.org/heritage/pdfs/voices-hindi-language.pdf Introduction to Hindi] ©2009 Center for Applied Linguistics <br />
* [https://www.researchgate.net/publication/313030658_Learning_of_Hindi_Phonology_as_a_Foreigner Learning of Hindi Phonology as a Foreigner] © Professor Ram Lakhan Meena<br />
* [https://benjamins.com/catalog/loall.12 Hindi by Yamuna Kachru] © John Benjamins Publishing Company<br />
* [https://www.jstor.org/stable/42931249 Problems In Developing Lexical Resources For Computing by Rita Mathur]<br />
* [https://doi.org/10.1017/S0267190500000659 Contrastive Analysis of English and Hindi by Yamuna Kachru] © Cambridge University Press 1982<br />
<br />
<br />
<br />
===Grammars===<br />
<br />
* [http://www.koausa.org/iils/pdf/ModernHindiGrammar.pdf Modern Hindi Grammar, Omkar N. Koul] © 2008 by McNeil Technologies, Inc<br />
* [http://shodhganga.inflibnet.ac.in/bitstream/10603/78191/11/11_chapter%204.pdf Grammatical & Inflectional Analysis of Hindi and Dogri] <br />
* [https://surface.syr.edu/cgi/viewcontent.cgi?article=1170&context=suscholar The Oldest Grammar of Hindustani Tej K. Bhatia]<br />
* [http://esl.fis.edu/grammar/langdiff/hindi.htm Differences between Hindi and Urdu Grammar] © Copyright Frankfurt International School<br />
* Hindi Grammar and Reader by Ernest Bender<br />
<br />
<br />
<br />
<br />
<br />
===Machine Translation===<br />
<br />
* [http://web2py.iiit.ac.in/research_centres/publications/download/mastersthesis.pdf.af2224b7bc18088c.4b756e616c2d5468657369732d46696e616c2e706466.pdf Hindi to English Machine Translation, Kunal Sachdeva]<br />
<br />
* [https://arxiv.org/ftp/arxiv/papers/1702/1702.01587.pdf A Hybrid Approach For Hindi-English Machine Translation]<br />
<br />
* [https://airccj.org/CSCP/vol7/csit77206.pdf Experiments On Different Recurrent Neural Networks For English-Hindi Machine Translation]<br />
<br />
* [https://www.cse.iitb.ac.in/~pb/papers/eng-hindi-mt.pdf Interlingua based English-Hindi Machine Translation and Language Divergence]<br />
<br />
* [http://aircconline.com/ijaia/V8N5/8517ijaia04.pdf Building An Effective MT System For English-Hindi Using RNNs Ruchit Agrawal & Dipti Misra Sharma] <br />
<br />
* [https://www.researchgate.net/publication/319351932_Neural_Machine_Translation_of_Indian_Languages Neural Machine Translation of Indian Languages]<br />
<br />
* [https://www.hindawi.com/journals/tswj/2014/485737/ Quantum Neural Network Based Machine Translator for Hindi to English]<br />
<br />
* [https://hlt.fbk.eu/sites/hlt.fbk.eu/files/prashant-mathur-camera-ready.pdf Automatic Translation of Nominal Compounds from English to Hindi by Prashant Mathur, Soma Paul]<br />
<br />
* [https://arxiv.org/abs/1404.3992 Assessing the Quality of MT Systems for Hindi to English Translation Aditi Kalyani, Hemant Kumud, Shashi Pal Singh, Ajai Kumar]<br />
<br />
<br />
====Languages Other than English====<br />
<br />
* [http://h2p.learnpunjabi.org/ Hindi-Punjabi Machine Translation]<br />
* [http://www.academia.edu/3275646/A_Hybrid_Approach_for_Bengali_to_Hindi_Machine_Translation A Hybrid Approach for Bengali to Hindi Machine Translation, Sanjay Chatterji]<br />
<br />
<br />
===Morphology===<br />
<br />
* [http://www.ijsrp.org/research-paper-0613/ijsrp-p18124.pdf Morphology: Indian Languages and European Languages (contains morphological information for both English and Hindi)]<br />
<br />
* [http://web.stanford.edu/group/cslipublications/cslipublications/HPSG/2010/singh-sarma.pdf Hindi Noun Inflection and Distributed Morphology, Smriti Singh and Vaijayanthi M Sarma]<br />
<br />
* [http://learnpunjabi.org/pdf/gslehal-pap21.pdf Hindi Morphological Analyzer and Generator]<br />
<br />
* [http://aclweb.org/anthology/W12-2302 Hindi Derivational Morphological Analyzer]<br />
<br />
<br />
<br />
===Dictionaries===<br />
<br />
* http://hindi-english.org/<br />
* http://e-mahashabdkosh.rb-aai.in/<br />
* https://shabdkosh.raftaar.in/Hindi-English-Dictionary<br />
* http://www.aamboli.com/<br />
* [http://tdil-dc.in/index.php?option=com_download&task=showresourceDetails&toolid=1552&lang=en Bilingual Dictionary Marathi to Hindi]<br />
<br />
====Multilingual====<br />
<br />
* [http://troindia.in/journal/ijcesr/vol3isss9/74-80.pdf An English To Assamese, Bengali And Hindi Multilingual E-Dictionary]<br />
<br />
<br />
<br />
===Corpora===<br />
<br />
* [http://opus.nlpl.eu/ Hindi-English Parallel Corpora] <br />
* [http://www.lrec-conf.org/proceedings/lrec2014/pdf/835_Paper.pdf HindEnCorp – Hindi-English and Hindi-only Corpus for Machine Translation]<br />
* [https://arxiv.org/abs/1710.02855 The IIT Bombay English-Hindi Parallel Corpus]<br />
* [https://catalog.ldc.upenn.edu/LDC2008L02 Hindi WordNet]<br />
* [http://corpora.uni-leipzig.de/en?corpusId=hin_news_2011 Hindi News Corpus]<br />
<br />
==Devanagari and Unicode==<br />
<br />
<br />
<br />
[[Category:Hindi|*]]</div>
Dharjunior
https://wiki.apertium.org/w/index.php?title=Hindi&diff=68225
Hindi
2018-12-04T05:23:01Z
<p>Dharjunior: </p>
<hr />
<div><br />
{{TOCD}}<br />
<br />
<br />
<br />
==Apertium Resources==<br />
<br />
=== Morphological transducer & disambiguator ===<br />
* [[Apertium-hin]]<br />
* [https://github.com/apertium/apertium-hin apertium-hin GitHub Repository]<br />
<br />
=== Language pairs ===<br />
<br />
====Trunk====<br />
*[https://github.com/apertium/apertium-urd-hin?files=1 apertium-urd-hin] Linguistic data for the Apertium Urdu-Hindi machine translator<br />
<br />
====Nursery====<br />
<br />
*[https://github.com/apertium/apertium-eng-hin apertium-eng-hin] Linguistic data for the Apertium English-Hindi machine translator<br />
<br />
====Incubator====<br />
<br />
*[https://github.com/apertium/apertium-mar-hin?files=1 apertium-mar-hin] Linguistic data for the Apertium Marathi-Hindi machine translator<br />
*[https://github.com/apertium/apertium-bn-hi apertium-bn-hi] Linguistic data for the Apertium Bangla-Hindi machine translator<br />
*[https://github.com/apertium/apertium-hin-pan apertium-hin-pan] Linguistic data for the Apertium Punjabi-Hindi machine translator<br />
*[https://github.com/apertium/apertium-guj-hin apertium-guj-hin] Linguistic data for the Apertium Gujarati-Hindi machine translator<br />
*[https://github.com/apertium/apertium-as-hi apertium-as-hi] Linguistic data for the Apertium Assamese-Hindi machine translator<br />
*[https://github.com/apertium/apertium-snd-hin apertium-snd-hin] Linguistic data for the Apertium Sindhi-Hindi machine translator<br />
<br />
<br />
==External Resources==<br />
<br />
===General===<br />
<br />
* [http://www.aclweb.org/anthology/C10-2147 Urdu and Hindi: Translation and sharing of linguistic resources] Copyright © 2018 ACM, Inc<br />
* [http://www.cal.org/heritage/pdfs/voices-hindi-language.pdf Introduction to Hindi] ©2009 Center for Applied Linguistics <br />
* [https://www.researchgate.net/publication/313030658_Learning_of_Hindi_Phonology_as_a_Foreigner Learning of Hindi Phonology as a Foreigner] © Professor Ram Lakhan Meena<br />
* [https://benjamins.com/catalog/loall.12 Hindi by Yamuna Kachru]<br />
* [https://www.jstor.org/stable/42931249 Problems In Developing Lexical Resources For Computing by Rita Mathur]<br />
* [https://doi.org/10.1017/S0267190500000659 Contrastive Analysis of English and Hindi by Yamuna Kachru]<br />
<br />
<br />
<br />
===Grammars===<br />
<br />
* [http://www.koausa.org/iils/pdf/ModernHindiGrammar.pdf Modern Hindi Grammar, Omkar N. Koul]<br />
* [http://shodhganga.inflibnet.ac.in/bitstream/10603/78191/11/11_chapter%204.pdf Grammatical & Inflectional Analysis of Hindi and Dogri] <br />
* [https://surface.syr.edu/cgi/viewcontent.cgi?article=1170&context=suscholar The Oldest Grammar of Hindustani Tej K. Bhatia]<br />
* [http://esl.fis.edu/grammar/langdiff/hindi.htm Differences between Hindi and Urdu Grammar]<br />
* Hindi Grammar and Reader by Ernest Bender<br />
<br />
<br />
<br />
<br />
<br />
===Machine Translation===<br />
<br />
* [http://web2py.iiit.ac.in/research_centres/publications/download/mastersthesis.pdf.af2224b7bc18088c.4b756e616c2d5468657369732d46696e616c2e706466.pdf Hindi to English Machine Translation, Kunal Sachdeva]<br />
<br />
* [https://arxiv.org/ftp/arxiv/papers/1702/1702.01587.pdf A Hybrid Approach For Hindi-English Machine Translation]<br />
<br />
* [https://airccj.org/CSCP/vol7/csit77206.pdf Experiments On Different Recurrent Neural Networks For English-Hindi Machine Translation]<br />
<br />
* [https://www.cse.iitb.ac.in/~pb/papers/eng-hindi-mt.pdf Interlingua based English-Hindi Machine Translation and Language Divergence]<br />
<br />
* [http://aircconline.com/ijaia/V8N5/8517ijaia04.pdf Building An Effective MT System For English-Hindi Using RNNs Ruchit Agrawal & Dipti Misra Sharma] <br />
<br />
* [https://www.researchgate.net/publication/319351932_Neural_Machine_Translation_of_Indian_Languages Neural Machine Translation of Indian Languages]<br />
<br />
* [https://www.hindawi.com/journals/tswj/2014/485737/ Quantum Neural Network Based Machine Translator for Hindi to English]<br />
<br />
* [https://hlt.fbk.eu/sites/hlt.fbk.eu/files/prashant-mathur-camera-ready.pdf Automatic Translation of Nominal Compounds from English to Hindi by Prashant Mathur, Soma Paul]<br />
<br />
* [https://arxiv.org/abs/1404.3992 Assessing the Quality of MT Systems for Hindi to English Translation Aditi Kalyani, Hemant Kumud, Shashi Pal Singh, Ajai Kumar]<br />
<br />
<br />
====Languages Other than English====<br />
<br />
* [http://h2p.learnpunjabi.org/ Hindi-Punjabi Machine Translation]<br />
* [http://www.academia.edu/3275646/A_Hybrid_Approach_for_Bengali_to_Hindi_Machine_Translation A Hybrid Approach for Bengali to Hindi Machine Translation, Sanjay Chatterji]<br />
<br />
<br />
===Morphology===<br />
<br />
* [http://www.ijsrp.org/research-paper-0613/ijsrp-p18124.pdf Morphology: Indian Languages and European Languages (contains morphological information for both English and Hindi)]<br />
<br />
* [http://web.stanford.edu/group/cslipublications/cslipublications/HPSG/2010/singh-sarma.pdf Hindi Noun Inflection and Distributed Morphology, Smriti Singh and Vaijayanthi M Sarma]<br />
<br />
* [http://learnpunjabi.org/pdf/gslehal-pap21.pdf Hindi Morphological Analyzer and Generator]<br />
<br />
* [http://aclweb.org/anthology/W12-2302 Hindi Derivational Morphological Analyzer]<br />
<br />
<br />
<br />
===Dictionaries===<br />
<br />
* http://hindi-english.org/<br />
* http://e-mahashabdkosh.rb-aai.in/<br />
* https://shabdkosh.raftaar.in/Hindi-English-Dictionary<br />
* http://www.aamboli.com/<br />
* [http://tdil-dc.in/index.php?option=com_download&task=showresourceDetails&toolid=1552&lang=en Bilingual Dictionary Marathi to Hindi]<br />
<br />
====Multilingual====<br />
<br />
* [http://troindia.in/journal/ijcesr/vol3isss9/74-80.pdf An English To Assamese, Bengali And Hindi Multilingual E-Dictionary]<br />
<br />
<br />
<br />
===Corpora===<br />
<br />
* [http://opus.nlpl.eu/ Hindi-English Parallel Corpora] <br />
* [http://www.lrec-conf.org/proceedings/lrec2014/pdf/835_Paper.pdf HindEnCorp – Hindi-English and Hindi-only Corpus for Machine Translation]<br />
* [https://arxiv.org/abs/1710.02855 The IIT Bombay English-Hindi Parallel Corpus]<br />
* [https://catalog.ldc.upenn.edu/LDC2008L02 Hindi WordNet]<br />
* [http://corpora.uni-leipzig.de/en?corpusId=hin_news_2011 Hindi News Corpus]<br />
<br />
==Devanagari and Unicode==<br />
<br />
<br />
<br />
[[Category:Hindi|*]]</div>
Dharjunior
https://wiki.apertium.org/w/index.php?title=Hindi&diff=68223
Hindi
2018-12-04T05:21:51Z
<p>Dharjunior: </p>
<hr />
<div><br />
{{TOCD}}<br />
<br />
<br />
<br />
==Apertium Resources==<br />
<br />
=== Morphological transducer & disambiguator ===<br />
* [[Apertium-hin]]<br />
* [https://github.com/apertium/apertium-hin apertium-hin GitHub Repository]<br />
<br />
=== Language pairs ===<br />
<br />
====Trunk====<br />
*[https://github.com/apertium/apertium-urd-hin?files=1 apertium-urd-hin] Linguistic data for the Apertium Urdu-Hindi machine translator<br />
<br />
====Nursery====<br />
<br />
*[https://github.com/apertium/apertium-eng-hin apertium-eng-hin] Linguistic data for the Apertium English-Hindi machine translator<br />
<br />
====Incubator====<br />
<br />
*[https://github.com/apertium/apertium-mar-hin?files=1 apertium-mar-hin] Linguistic data for the Apertium Marathi-Hindi machine translator<br />
*[https://github.com/apertium/apertium-bn-hi apertium-bn-hi] Linguistic data for the Apertium Bangla-Hindi machine translator<br />
*[https://github.com/apertium/apertium-hin-pan apertium-hin-pan] Linguistic data for the Apertium Punjabi-Hindi machine translator<br />
*[https://github.com/apertium/apertium-guj-hin apertium-guj-hin] Linguistic data for the Apertium Gujarati-Hindi machine translator<br />
*[https://github.com/apertium/apertium-as-hi apertium-as-hi] Linguistic data for the Apertium Assamese-Hindi machine translator<br />
*[https://github.com/apertium/apertium-snd-hin apertium-snd-hin] Linguistic data for the Apertium Sindhi-Hindi machine translator<br />
<br />
<br />
==External Resources==<br />
<br />
===General===<br />
<br />
* [http://www.aclweb.org/anthology/C10-2147 Urdu and Hindi: Translation and sharing of linguistic resources] Copyright © 2018 ACM, Inc<br />
* [http://www.cal.org/heritage/pdfs/voices-hindi-language.pdf Introduction to Hindi] ©2009 Center for Applied Linguistics <br />
* [https://www.researchgate.net/publication/313030658_Learning_of_Hindi_Phonology_as_a_Foreigner Learning of Hindi Phonology as a Foreigner] Professor Ram Lakhan Meena<br />
* [https://benjamins.com/catalog/loall.12 Hindi by Yamuna Kachru]<br />
* [https://www.jstor.org/stable/42931249 Problems In Developing Lexical Resources For Computing by Rita Mathur]<br />
* [https://doi.org/10.1017/S0267190500000659 Contrastive Analysis of English and Hindi by Yamuna Kachru]<br />
<br />
<br />
<br />
===Grammars===<br />
<br />
* [http://www.koausa.org/iils/pdf/ModernHindiGrammar.pdf Modern Hindi Grammar, Omkar N. Koul]<br />
* [http://shodhganga.inflibnet.ac.in/bitstream/10603/78191/11/11_chapter%204.pdf Grammatical & Inflectional Analysis of Hindi and Dogri] <br />
* [https://surface.syr.edu/cgi/viewcontent.cgi?article=1170&context=suscholar The Oldest Grammar of Hindustani Tej K. Bhatia]<br />
* [http://esl.fis.edu/grammar/langdiff/hindi.htm Differences between Hindi and Urdu Grammar]<br />
* Hindi Grammar and Reader by Ernest Bender<br />
<br />
<br />
<br />
<br />
<br />
===Machine Translation===<br />
<br />
* [http://web2py.iiit.ac.in/research_centres/publications/download/mastersthesis.pdf.af2224b7bc18088c.4b756e616c2d5468657369732d46696e616c2e706466.pdf Hindi to English Machine Translation, Kunal Sachdeva]<br />
<br />
* [https://arxiv.org/ftp/arxiv/papers/1702/1702.01587.pdf A Hybrid Approach For Hindi-English Machine Translation]<br />
<br />
* [https://airccj.org/CSCP/vol7/csit77206.pdf Experiments On Different Recurrent Neural Networks For English-Hindi Machine Translation]<br />
<br />
* [https://www.cse.iitb.ac.in/~pb/papers/eng-hindi-mt.pdf Interlingua based English-Hindi Machine Translation and Language Divergence]<br />
<br />
* [http://aircconline.com/ijaia/V8N5/8517ijaia04.pdf Building An Effective MT System For English-Hindi Using RNNs Ruchit Agrawal & Dipti Misra Sharma] <br />
<br />
* [https://www.researchgate.net/publication/319351932_Neural_Machine_Translation_of_Indian_Languages Neural Machine Translation of Indian Languages]<br />
<br />
* [https://www.hindawi.com/journals/tswj/2014/485737/ Quantum Neural Network Based Machine Translator for Hindi to English]<br />
<br />
* [https://hlt.fbk.eu/sites/hlt.fbk.eu/files/prashant-mathur-camera-ready.pdf Automatic Translation of Nominal Compounds from English to Hindi by Prashant Mathur, Soma Paul]<br />
<br />
* [https://arxiv.org/abs/1404.3992 Assessing the Quality of MT Systems for Hindi to English Translation Aditi Kalyani, Hemant Kumud, Shashi Pal Singh, Ajai Kumar]<br />
<br />
<br />
====Languages Other than English====<br />
<br />
* [http://h2p.learnpunjabi.org/ Hindi-Punjabi Machine Translation]<br />
* [http://www.academia.edu/3275646/A_Hybrid_Approach_for_Bengali_to_Hindi_Machine_Translation A Hybrid Approach for Bengali to Hindi Machine Translation, Sanjay Chatterji]<br />
<br />
<br />
===Morphology===<br />
<br />
* [http://www.ijsrp.org/research-paper-0613/ijsrp-p18124.pdf Morphology: Indian Languages and European Languages (contains morphological information for both English and Hindi)]<br />
<br />
* [http://web.stanford.edu/group/cslipublications/cslipublications/HPSG/2010/singh-sarma.pdf Hindi Noun Inflection and Distributed Morphology, Smriti Singh and Vaijayanthi M Sarma]<br />
<br />
* [http://learnpunjabi.org/pdf/gslehal-pap21.pdf Hindi Morphological Analyzer and Generator]<br />
<br />
* [http://aclweb.org/anthology/W12-2302 Hindi Derivational Morphological Analyzer]<br />
<br />
<br />
<br />
===Dictionaries===<br />
<br />
* http://hindi-english.org/<br />
* http://e-mahashabdkosh.rb-aai.in/<br />
* https://shabdkosh.raftaar.in/Hindi-English-Dictionary<br />
* http://www.aamboli.com/<br />
* [http://tdil-dc.in/index.php?option=com_download&task=showresourceDetails&toolid=1552&lang=en Bilingual Dictionary Marathi to Hindi]<br />
<br />
====Multilingual====<br />
<br />
* [http://troindia.in/journal/ijcesr/vol3isss9/74-80.pdf An English To Assamese, Bengali And Hindi Multilingual E-Dictionary]<br />
<br />
<br />
<br />
===Corpora===<br />
<br />
* [http://opus.nlpl.eu/ Hindi-English Parallel Corpora] <br />
* [http://www.lrec-conf.org/proceedings/lrec2014/pdf/835_Paper.pdf HindEnCorp – Hindi-English and Hindi-only Corpus for Machine Translation]<br />
* [https://arxiv.org/abs/1710.02855 The IIT Bombay English-Hindi Parallel Corpus]<br />
* [https://catalog.ldc.upenn.edu/LDC2008L02 Hindi WordNet]<br />
* [http://corpora.uni-leipzig.de/en?corpusId=hin_news_2011 Hindi News Corpus]<br />
<br />
==Devanagari and Unicode==<br />
<br />
<br />
<br />
[[Category:Hindi|*]]</div>
Dharjunior
https://wiki.apertium.org/w/index.php?title=Hindi&diff=68222
Hindi
2018-12-04T05:13:35Z
<p>Dharjunior: </p>
<hr />
<div><br />
{{TOCD}}<br />
<br />
<br />
<br />
==Apertium Resources==<br />
<br />
=== Morphological transducer & disambiguator ===<br />
* [[Apertium-hin]]<br />
* [https://github.com/apertium/apertium-hin apertium-hin]<br />
<br />
=== Language pairs ===<br />
<br />
*[https://github.com/apertium/apertium-eng-hin apertium-eng-hin] Linguistic data for the Apertium English-Hindi machine translator<br />
*[https://github.com/apertium/apertium-mar-hin?files=1 apertium-mar-hin] Linguistic data for the Apertium Marathi-Hindi machine translator<br />
*[https://github.com/apertium/apertium-urd-hin?files=1 apertium-urd-hin] Linguistic data for the Apertium Urdu-Hindi machine translator<br />
<br />
==External Resources==<br />
<br />
===General===<br />
<br />
* [http://www.aclweb.org/anthology/C10-2147 Urdu and Hindi: Translation and sharing of linguistic resources] Copyright © 2018 ACM, Inc<br />
* [http://www.cal.org/heritage/pdfs/voices-hindi-language.pdf Introduction to Hindi] ©2009 Center for Applied Linguistics <br />
* [https://www.researchgate.net/publication/313030658_Learning_of_Hindi_Phonology_as_a_Foreigner Learning of Hindi Phonology as a Foreigner] Professor Ram Lakhan Meena<br />
* [https://benjamins.com/catalog/loall.12 Hindi by Yamuna Kachru]<br />
* [https://www.jstor.org/stable/42931249 Problems In Developing Lexical Resources For Computing by Rita Mathur]<br />
* [https://doi.org/10.1017/S0267190500000659 Contrastive Analysis of English and Hindi by Yamuna Kachru]<br />
<br />
<br />
<br />
===Grammars===<br />
<br />
* [http://www.koausa.org/iils/pdf/ModernHindiGrammar.pdf Modern Hindi Grammar, Omkar N. Koul]<br />
* [http://shodhganga.inflibnet.ac.in/bitstream/10603/78191/11/11_chapter%204.pdf Grammatical & Inflectional Analysis of Hindi and Dogri] <br />
* [https://surface.syr.edu/cgi/viewcontent.cgi?article=1170&context=suscholar The Oldest Grammar of Hindustani Tej K. Bhatia]<br />
* [http://esl.fis.edu/grammar/langdiff/hindi.htm Differences between Hindi and Urdu Grammar]<br />
* Hindi Grammar and Reader by Ernest Bender<br />
<br />
<br />
<br />
<br />
<br />
===Machine Translation===<br />
<br />
* [http://web2py.iiit.ac.in/research_centres/publications/download/mastersthesis.pdf.af2224b7bc18088c.4b756e616c2d5468657369732d46696e616c2e706466.pdf Hindi to English Machine Translation, Kunal Sachdeva]<br />
<br />
* [https://arxiv.org/ftp/arxiv/papers/1702/1702.01587.pdf A Hybrid Approach For Hindi-English Machine Translation]<br />
<br />
* [https://airccj.org/CSCP/vol7/csit77206.pdf Experiments On Different Recurrent Neural Networks For English-Hindi Machine Translation]<br />
<br />
* [https://www.cse.iitb.ac.in/~pb/papers/eng-hindi-mt.pdf Interlingua based English-Hindi Machine Translation and Language Divergence]<br />
<br />
* [http://aircconline.com/ijaia/V8N5/8517ijaia04.pdf Building An Effective MT System For English-Hindi Using RNNs Ruchit Agrawal & Dipti Misra Sharma] <br />
<br />
* [https://www.researchgate.net/publication/319351932_Neural_Machine_Translation_of_Indian_Languages Neural Machine Translation of Indian Languages]<br />
<br />
* [https://www.hindawi.com/journals/tswj/2014/485737/ Quantum Neural Network Based Machine Translator for Hindi to English]<br />
<br />
* [https://hlt.fbk.eu/sites/hlt.fbk.eu/files/prashant-mathur-camera-ready.pdf Automatic Translation of Nominal Compounds from English to Hindi by Prashant Mathur, Soma Paul]<br />
<br />
* [https://arxiv.org/abs/1404.3992 Assessing the Quality of MT Systems for Hindi to English Translation Aditi Kalyani, Hemant Kumud, Shashi Pal Singh, Ajai Kumar]<br />
<br />
<br />
====Languages Other than English====<br />
<br />
* [http://h2p.learnpunjabi.org/ Hindi-Punjabi Machine Translation]<br />
* [http://www.academia.edu/3275646/A_Hybrid_Approach_for_Bengali_to_Hindi_Machine_Translation A Hybrid Approach for Bengali to Hindi Machine Translation, Sanjay Chatterji]<br />
<br />
<br />
===Morphology===<br />
<br />
* [http://www.ijsrp.org/research-paper-0613/ijsrp-p18124.pdf Morphology: Indian Languages and European Languages (contains morphological information for both English and Hindi)]<br />
<br />
* [http://web.stanford.edu/group/cslipublications/cslipublications/HPSG/2010/singh-sarma.pdf Hindi Noun Inflection and Distributed Morphology, Smriti Singh and Vaijayanthi M Sarma]<br />
<br />
* [http://learnpunjabi.org/pdf/gslehal-pap21.pdf Hindi Morphological Analyzer and Generator]<br />
<br />
* [http://aclweb.org/anthology/W12-2302 Hindi Derivational Morphological Analyzer]<br />
<br />
<br />
<br />
===Dictionaries===<br />
<br />
* http://hindi-english.org/<br />
* http://e-mahashabdkosh.rb-aai.in/<br />
* https://shabdkosh.raftaar.in/Hindi-English-Dictionary<br />
* http://www.aamboli.com/<br />
* [http://tdil-dc.in/index.php?option=com_download&task=showresourceDetails&toolid=1552&lang=en Bilingual Dictionary Marathi to Hindi]<br />
<br />
====Multilingual====<br />
<br />
* [http://troindia.in/journal/ijcesr/vol3isss9/74-80.pdf An English To Assamese, Bengali And Hindi Multilingual E-Dictionary]<br />
<br />
<br />
<br />
===Corpora===<br />
<br />
* [http://opus.nlpl.eu/ Hindi-English Parallel Corpora] <br />
* [http://www.lrec-conf.org/proceedings/lrec2014/pdf/835_Paper.pdf HindEnCorp – Hindi-English and Hindi-only Corpus for Machine Translation]<br />
* [https://arxiv.org/abs/1710.02855 The IIT Bombay English-Hindi Parallel Corpus]<br />
* [https://catalog.ldc.upenn.edu/LDC2008L02 Hindi WordNet]<br />
* [http://corpora.uni-leipzig.de/en?corpusId=hin_news_2011 Hindi News Corpus]<br />
<br />
==Devanagari and Unicode==<br />
<br />
<br />
<br />
[[Category:Hindi|*]]</div>
Dharjunior
https://wiki.apertium.org/w/index.php?title=Hindi&diff=68221
Hindi
2018-12-04T05:11:12Z
<p>Dharjunior: </p>
<hr />
<div><br />
{{TOCD}}<br />
<br />
<br />
<br />
==Apertium Resources==<br />
*[https://github.com/apertium/apertium-hin apertium-hin]<br />
*[https://github.com/apertium/apertium-eng-hin apertium-eng-hin]<br />
*[https://github.com/apertium/apertium-mar-hin?files=1 apertium-mar-hin]<br />
*[https://github.com/apertium/apertium-urd-hin?files=1 apertium-urd-hin]<br />
<br />
==External Resources==<br />
<br />
===General===<br />
<br />
* [http://www.aclweb.org/anthology/C10-2147 Urdu and Hindi: Translation and sharing of linguistic resources] COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics Copyright © 2018 ACM, Inc<br />
<br />
Paper on getting reasonable translations between Hindi and Urdu without the use of a parallel corpus.<br />
<br />
* [http://www.cal.org/heritage/pdfs/voices-hindi-language.pdf Introduction to Hindi] ©2009 Center for Applied Linguistics <br />
* [https://www.researchgate.net/publication/313030658_Learning_of_Hindi_Phonology_as_a_Foreigner Learning of Hindi Phonology as a Foreigner] Professor Ram Lakhan Meena<br />
* [https://benjamins.com/catalog/loall.12 Hindi by Yamuna Kachru]<br />
* [https://www.jstor.org/stable/42931249 Problems In Developing Lexical Resources For Computing by Rita Mathur]<br />
* [https://doi.org/10.1017/S0267190500000659 Contrastive Analysis of English and Hindi by Yamuna Kachru]<br />
<br />
<br />
<br />
===Grammars===<br />
<br />
* [http://www.koausa.org/iils/pdf/ModernHindiGrammar.pdf Modern Hindi Grammar, Omkar N. Koul]<br />
* [http://shodhganga.inflibnet.ac.in/bitstream/10603/78191/11/11_chapter%204.pdf Grammatical & Inflectional Analysis of Hindi and Dogri] <br />
* [https://surface.syr.edu/cgi/viewcontent.cgi?article=1170&context=suscholar The Oldest Grammar of Hindustani Tej K. Bhatia]<br />
* [http://esl.fis.edu/grammar/langdiff/hindi.htm Differences between Hindi and Urdu Grammar]<br />
* Hindi Grammar and Reader by Ernest Bender<br />
<br />
<br />
<br />
<br />
<br />
===Machine Translation===<br />
<br />
* [http://web2py.iiit.ac.in/research_centres/publications/download/mastersthesis.pdf.af2224b7bc18088c.4b756e616c2d5468657369732d46696e616c2e706466.pdf Hindi to English Machine Translation, Kunal Sachdeva]<br />
<br />
* [https://arxiv.org/ftp/arxiv/papers/1702/1702.01587.pdf A Hybrid Approach For Hindi-English Machine Translation]<br />
<br />
* [https://airccj.org/CSCP/vol7/csit77206.pdf Experiments On Different Recurrent Neural Networks For English-Hindi Machine Translation]<br />
<br />
* [https://www.cse.iitb.ac.in/~pb/papers/eng-hindi-mt.pdf Interlingua based English-Hindi Machine Translation and Language Divergence]<br />
<br />
* [http://aircconline.com/ijaia/V8N5/8517ijaia04.pdf Building An Effective MT System For English-Hindi Using RNNs Ruchit Agrawal & Dipti Misra Sharma] <br />
<br />
* [https://www.researchgate.net/publication/319351932_Neural_Machine_Translation_of_Indian_Languages Neural Machine Translation of Indian Languages]<br />
<br />
* [https://www.hindawi.com/journals/tswj/2014/485737/ Quantum Neural Network Based Machine Translator for Hindi to English]<br />
<br />
* [https://hlt.fbk.eu/sites/hlt.fbk.eu/files/prashant-mathur-camera-ready.pdf Automatic Translation of Nominal Compounds from English to Hindi by Prashant Mathur, Soma Paul]<br />
<br />
* [https://arxiv.org/abs/1404.3992 Assessing the Quality of MT Systems for Hindi to English Translation Aditi Kalyani, Hemant Kumud, Shashi Pal Singh, Ajai Kumar]<br />
<br />
<br />
====Languages Other than English====<br />
<br />
* [http://h2p.learnpunjabi.org/ Hindi-Punjabi Machine Translation]<br />
* [http://www.academia.edu/3275646/A_Hybrid_Approach_for_Bengali_to_Hindi_Machine_Translation A Hybrid Approach for Bengali to Hindi Machine Translation, Sanjay Chatterji]<br />
<br />
<br />
===Morphology===<br />
<br />
* [http://www.ijsrp.org/research-paper-0613/ijsrp-p18124.pdf Morphology: Indian Languages and European Languages (contains morphological information for both English and Hindi)]<br />
<br />
* [http://web.stanford.edu/group/cslipublications/cslipublications/HPSG/2010/singh-sarma.pdf Hindi Noun Inflection and Distributed Morphology, Smriti Singh and Vaijayanthi M Sarma]<br />
<br />
* [http://learnpunjabi.org/pdf/gslehal-pap21.pdf Hindi Morphological Analyzer and Generator]<br />
<br />
* [http://aclweb.org/anthology/W12-2302 Hindi Derivational Morphological Analyzer]<br />
<br />
<br />
<br />
===Dictionaries===<br />
<br />
* http://hindi-english.org/<br />
* http://e-mahashabdkosh.rb-aai.in/<br />
* https://shabdkosh.raftaar.in/Hindi-English-Dictionary<br />
* http://www.aamboli.com/<br />
* [http://tdil-dc.in/index.php?option=com_download&task=showresourceDetails&toolid=1552&lang=en Bilingual Dictionary Marathi to Hindi]<br />
<br />
====Multilingual====<br />
<br />
* [http://troindia.in/journal/ijcesr/vol3isss9/74-80.pdf An English To Assamese, Bengali And Hindi Multilingual E-Dictionary]<br />
<br />
<br />
<br />
===Corpora===<br />
<br />
* [http://opus.nlpl.eu/ Hindi-English Parallel Corpora] <br />
* [http://www.lrec-conf.org/proceedings/lrec2014/pdf/835_Paper.pdf HindEnCorp – Hindi-English and Hindi-only Corpus for Machine Translation]<br />
* [https://arxiv.org/abs/1710.02855 The IIT Bombay English-Hindi Parallel Corpus]<br />
* [https://catalog.ldc.upenn.edu/LDC2008L02 Hindi WordNet]<br />
* [http://corpora.uni-leipzig.de/en?corpusId=hin_news_2011 Hindi News Corpus]<br />
<br />
==Devanagari and Unicode==<br />
<br />
<br />
<br />
[[Category:Hindi|*]]</div>
Dharjunior
https://wiki.apertium.org/w/index.php?title=Hindi&diff=68208
Hindi
2018-12-03T14:48:39Z
<p>Dharjunior: </p>
<hr />
<div><br />
{{TOCD}}<br />
<br />
<br />
<br />
==Apertium Resources==<br />
*[https://github.com/apertium/apertium-hin apertium-hin]<br />
*[https://github.com/apertium/apertium-eng-hin apertium-eng-hin]<br />
*[https://github.com/apertium/apertium-mar-hin?files=1 apertium-mar-hin]<br />
*[https://github.com/apertium/apertium-urd-hin?files=1 apertium-urd-hin]<br />
<br />
==External Resources==<br />
<br />
===General===<br />
<br />
* [http://www.aclweb.org/anthology/C10-2147 Urdu and Hindi: Translation and sharing of linguistic resources]<br />
* [http://www.cal.org/heritage/pdfs/voices-hindi-language.pdf Introduction to Hindi]<br />
* [https://www.researchgate.net/publication/313030658_Learning_of_Hindi_Phonology_as_a_Foreigner Learning of Hindi Phonology as a Foreigner]<br />
* [https://benjamins.com/catalog/loall.12 Hindi by Yamuna Kachru]<br />
* [https://www.jstor.org/stable/42931249 Problems In Developing Lexical Resources For Computing by Rita Mathur]<br />
* [https://doi.org/10.1017/S0267190500000659 Contrastive Analysis of English and Hindi by Yamuna Kachru]<br />
<br />
<br />
<br />
===Grammars===<br />
<br />
* [http://www.koausa.org/iils/pdf/ModernHindiGrammar.pdf Modern Hindi Grammar, Omkar N. Koul]<br />
* [http://shodhganga.inflibnet.ac.in/bitstream/10603/78191/11/11_chapter%204.pdf Grammatical & Inflectional Analysis of Hindi and Dogri] <br />
* [https://surface.syr.edu/cgi/viewcontent.cgi?article=1170&context=suscholar The Oldest Grammar of Hindustani Tej K. Bhatia]<br />
* [http://esl.fis.edu/grammar/langdiff/hindi.htm Differences between Hindi and Urdu Grammar]<br />
* Hindi Grammar and Reader by Ernest Bender<br />
<br />
<br />
<br />
<br />
<br />
===Machine Translation===<br />
<br />
* [http://web2py.iiit.ac.in/research_centres/publications/download/mastersthesis.pdf.af2224b7bc18088c.4b756e616c2d5468657369732d46696e616c2e706466.pdf Hindi to English Machine Translation, Kunal Sachdeva]<br />
<br />
* [https://arxiv.org/ftp/arxiv/papers/1702/1702.01587.pdf A Hybrid Approach For Hindi-English Machine Translation]<br />
<br />
* [https://airccj.org/CSCP/vol7/csit77206.pdf Experiments On Different Recurrent Neural Networks For English-Hindi Machine Translation]<br />
<br />
* [https://www.cse.iitb.ac.in/~pb/papers/eng-hindi-mt.pdf Interlingua based English-Hindi Machine Translation and Language Divergence]<br />
<br />
* [http://aircconline.com/ijaia/V8N5/8517ijaia04.pdf Building An Effective MT System For English-Hindi Using RNNs Ruchit Agrawal & Dipti Misra Sharma] <br />
<br />
* [https://www.researchgate.net/publication/319351932_Neural_Machine_Translation_of_Indian_Languages Neural Machine Translation of Indian Languages]<br />
<br />
* [https://www.hindawi.com/journals/tswj/2014/485737/ Quantum Neural Network Based Machine Translator for Hindi to English]<br />
<br />
* [https://hlt.fbk.eu/sites/hlt.fbk.eu/files/prashant-mathur-camera-ready.pdf Automatic Translation of Nominal Compounds from English to Hindi by Prashant Mathur, Soma Paul]<br />
<br />
* [https://arxiv.org/abs/1404.3992 Assessing the Quality of MT Systems for Hindi to English Translation Aditi Kalyani, Hemant Kumud, Shashi Pal Singh, Ajai Kumar]<br />
<br />
<br />
====Languages Other than English====<br />
<br />
* [http://h2p.learnpunjabi.org/ Hindi-Punjabi Machine Translation]<br />
* [http://www.academia.edu/3275646/A_Hybrid_Approach_for_Bengali_to_Hindi_Machine_Translation A Hybrid Approach for Bengali to Hindi Machine Translation, Sanjay Chatterji]<br />
<br />
<br />
===Morphology===<br />
<br />
* [http://www.ijsrp.org/research-paper-0613/ijsrp-p18124.pdf Morphology: Indian Languages and European Languages (contains morphological information for both English and Hindi)]<br />
<br />
* [http://web.stanford.edu/group/cslipublications/cslipublications/HPSG/2010/singh-sarma.pdf Hindi Noun Inflection and Distributed Morphology, Smriti Singh and Vaijayanthi M Sarma]<br />
<br />
* [http://learnpunjabi.org/pdf/gslehal-pap21.pdf Hindi Morphological Analyzer and Generator]<br />
<br />
* [http://aclweb.org/anthology/W12-2302 Hindi Derivational Morphological Analyzer]<br />
<br />
<br />
<br />
===Dictionaries===<br />
<br />
* http://hindi-english.org/<br />
* http://e-mahashabdkosh.rb-aai.in/<br />
* https://shabdkosh.raftaar.in/Hindi-English-Dictionary<br />
* http://www.aamboli.com/<br />
* [http://tdil-dc.in/index.php?option=com_download&task=showresourceDetails&toolid=1552&lang=en Bilingual Dictionary Marathi to Hindi]<br />
<br />
====Multilingual====<br />
<br />
* [http://troindia.in/journal/ijcesr/vol3isss9/74-80.pdf An English To Assamese, Bengali And Hindi Multilingual E-Dictionary]<br />
<br />
<br />
<br />
===Corpora===<br />
<br />
* [http://opus.nlpl.eu/ Hindi-English Parallel Corpora] <br />
* [http://www.lrec-conf.org/proceedings/lrec2014/pdf/835_Paper.pdf HindEnCorp – Hindi-English and Hindi-only Corpus for Machine Translation]<br />
* [https://arxiv.org/abs/1710.02855 The IIT Bombay English-Hindi Parallel Corpus]<br />
* [https://catalog.ldc.upenn.edu/LDC2008L02 Hindi WordNet]<br />
* [http://corpora.uni-leipzig.de/en?corpusId=hin_news_2011 Hindi News Corpus]<br />
<br />
==Devanagari and Unicode==<br />
<br />
<br />
<br />
[[Category:Hindi|*]]</div>
Dharjunior
https://wiki.apertium.org/w/index.php?title=Hindi&diff=68207
Hindi
2018-12-03T14:46:01Z
<p>Dharjunior: </p>
<hr />
<div><br />
{{TOCD}}<br />
<br />
<br />
<br />
==Apertium Resources==<br />
*[https://github.com/apertium/apertium-hin apertium-hin]<br />
*[https://github.com/apertium/apertium-eng-hin apertium-eng-hin]<br />
*[https://github.com/apertium/apertium-mar-hin?files=1 apertium-mar-hin]<br />
*[https://github.com/apertium/apertium-urd-hin?files=1 apertium-urd-hin]<br />
<br />
==External Resources==<br />
<br />
===General===<br />
<br />
* [http://www.aclweb.org/anthology/C10-2147 Urdu and Hindi: Translation and sharing of linguistic resources]<br />
* [http://www.cal.org/heritage/pdfs/voices-hindi-language.pdf Introduction to Hindi]<br />
* [https://www.researchgate.net/publication/313030658_Learning_of_Hindi_Phonology_as_a_Foreigner Learning of Hindi Phonology as a Foreigner]<br />
* [https://benjamins.com/catalog/loall.12 Hindi by Yamuna Kachru]<br />
* [https://www.jstor.org/stable/42931249 Problems In Developing Lexical Resources For Computing by Rita Mathur]<br />
<br />
<br />
<br />
===Grammars===<br />
<br />
* [http://www.koausa.org/iils/pdf/ModernHindiGrammar.pdf Modern Hindi Grammar, Omkar N. Koul]<br />
* [http://shodhganga.inflibnet.ac.in/bitstream/10603/78191/11/11_chapter%204.pdf Grammatical & Inflectional Analysis of Hindi and Dogri] <br />
* [https://surface.syr.edu/cgi/viewcontent.cgi?article=1170&context=suscholar The Oldest Grammar of Hindustani Tej K. Bhatia]<br />
* [http://esl.fis.edu/grammar/langdiff/hindi.htm Differences between Hindi and Urdu Grammar]<br />
* Hindi Grammar and Reader by Ernest Bender<br />
<br />
<br />
<br />
<br />
<br />
===Machine Translation===<br />
<br />
* [http://web2py.iiit.ac.in/research_centres/publications/download/mastersthesis.pdf.af2224b7bc18088c.4b756e616c2d5468657369732d46696e616c2e706466.pdf Hindi to English Machine Translation, Kunal Sachdeva]<br />
<br />
* [https://arxiv.org/ftp/arxiv/papers/1702/1702.01587.pdf A Hybrid Approach For Hindi-English Machine Translation]<br />
<br />
* [https://airccj.org/CSCP/vol7/csit77206.pdf Experiments On Different Recurrent Neural Networks For English-Hindi Machine Translation]<br />
<br />
* [https://www.cse.iitb.ac.in/~pb/papers/eng-hindi-mt.pdf Interlingua based English-Hindi Machine Translation and Language Divergence]<br />
<br />
* [http://aircconline.com/ijaia/V8N5/8517ijaia04.pdf Building An Effective MT System For English-Hindi Using RNNs Ruchit Agrawal & Dipti Misra Sharma] <br />
<br />
* [https://www.researchgate.net/publication/319351932_Neural_Machine_Translation_of_Indian_Languages Neural Machine Translation of Indian Languages]<br />
<br />
* [https://www.hindawi.com/journals/tswj/2014/485737/ Quantum Neural Network Based Machine Translator for Hindi to English]<br />
<br />
* [https://hlt.fbk.eu/sites/hlt.fbk.eu/files/prashant-mathur-camera-ready.pdf Automatic Translation of Nominal Compounds from English to Hindi by Prashant Mathur, Soma Paul]<br />
<br />
* [https://arxiv.org/abs/1404.3992 Assessing the Quality of MT Systems for Hindi to English Translation Aditi Kalyani, Hemant Kumud, Shashi Pal Singh, Ajai Kumar]<br />
<br />
<br />
====Languages Other than English====<br />
<br />
* [http://h2p.learnpunjabi.org/ Hindi-Punjabi Machine Translation]<br />
* [http://www.academia.edu/3275646/A_Hybrid_Approach_for_Bengali_to_Hindi_Machine_Translation A Hybrid Approach for Bengali to Hindi Machine Translation, Sanjay Chatterji]<br />
<br />
<br />
===Morphology===<br />
<br />
* [http://www.ijsrp.org/research-paper-0613/ijsrp-p18124.pdf Morphology: Indian Languages and European Languages (contains morphological information for both English and Hindi)]<br />
<br />
* [http://web.stanford.edu/group/cslipublications/cslipublications/HPSG/2010/singh-sarma.pdf Hindi Noun Inflection and Distributed Morphology, Smriti Singh and Vaijayanthi M Sarma]<br />
<br />
* [http://learnpunjabi.org/pdf/gslehal-pap21.pdf Hindi Morphological Analyzer and Generator]<br />
<br />
* [http://aclweb.org/anthology/W12-2302 Hindi Derivational Morphological Analyzer]<br />
<br />
<br />
<br />
===Dictionaries===<br />
<br />
* http://hindi-english.org/<br />
* http://e-mahashabdkosh.rb-aai.in/<br />
* https://shabdkosh.raftaar.in/Hindi-English-Dictionary<br />
* http://www.aamboli.com/<br />
* [http://tdil-dc.in/index.php?option=com_download&task=showresourceDetails&toolid=1552&lang=en Bilingual Dictionary Marathi to Hindi]<br />
<br />
====Multilingual====<br />
<br />
* [http://troindia.in/journal/ijcesr/vol3isss9/74-80.pdf An English To Assamese, Bengali And Hindi Multilingual E-Dictionary]<br />
<br />
<br />
<br />
===Corpora===<br />
<br />
* [http://opus.nlpl.eu/ Hindi-English Parallel Corpora] <br />
* [http://www.lrec-conf.org/proceedings/lrec2014/pdf/835_Paper.pdf HindEnCorp – Hindi-English and Hindi-only Corpus for Machine Translation]<br />
* [https://arxiv.org/abs/1710.02855 The IIT Bombay English-Hindi Parallel Corpus]<br />
* [https://catalog.ldc.upenn.edu/LDC2008L02 Hindi WordNet]<br />
* [http://corpora.uni-leipzig.de/en?corpusId=hin_news_2011 Hindi News Corpus]<br />
<br />
==Devanagari and Unicode==<br />
<br />
<br />
<br />
[[Category:Hindi|*]]</div>
Dharjunior
https://wiki.apertium.org/w/index.php?title=Hindi&diff=68206
Hindi
2018-12-03T14:40:24Z
<p>Dharjunior: </p>
<hr />
<div><br />
{{TOCD}}<br />
<br />
<br />
<br />
==Apertium Resources==<br />
*[https://github.com/apertium/apertium-hin apertium-hin]<br />
*[https://github.com/apertium/apertium-eng-hin apertium-eng-hin]<br />
*[https://github.com/apertium/apertium-mar-hin?files=1 apertium-mar-hin]<br />
*[https://github.com/apertium/apertium-urd-hin?files=1 apertium-urd-hin]<br />
<br />
==External Resources==<br />
<br />
===General===<br />
<br />
* [http://www.aclweb.org/anthology/C10-2147 Urdu and Hindi: Translation and sharing of linguistic resources]<br />
* [http://www.cal.org/heritage/pdfs/voices-hindi-language.pdf Introduction to Hindi]<br />
* [https://www.researchgate.net/publication/313030658_Learning_of_Hindi_Phonology_as_a_Foreigner Learning of Hindi Phonology as a Foreigner]<br />
* [https://benjamins.com/catalog/loall.12 Hindi by Yamuna Kachru]<br />
* [https://www.jstor.org/stable/42931249 Problems In Developing Lexical Resources For Computing by Rita Mathur]<br />
<br />
<br />
<br />
===Grammars===<br />
<br />
* [http://www.koausa.org/iils/pdf/ModernHindiGrammar.pdf Modern Hindi Grammar, Omkar N. Koul]<br />
* [http://shodhganga.inflibnet.ac.in/bitstream/10603/78191/11/11_chapter%204.pdf Grammatical & Inflectional Analysis of Hindi and Dogri] <br />
* [https://surface.syr.edu/cgi/viewcontent.cgi?article=1170&context=suscholar The Oldest Grammar of Hindustani Tej K. Bhatia]<br />
* [http://esl.fis.edu/grammar/langdiff/hindi.htm Differences between Hindi and Urdu Grammar]<br />
* Hindi Grammar and Reader by Ernest Bender<br />
<br />
<br />
<br />
<br />
<br />
===Machine Translation===<br />
<br />
* [http://web2py.iiit.ac.in/research_centres/publications/download/mastersthesis.pdf.af2224b7bc18088c.4b756e616c2d5468657369732d46696e616c2e706466.pdf Hindi to English Machine Translation, Kunal Sachdeva]<br />
<br />
* [https://arxiv.org/ftp/arxiv/papers/1702/1702.01587.pdf A Hybrid Approach For Hindi-English Machine Translation]<br />
<br />
* [https://airccj.org/CSCP/vol7/csit77206.pdf Experiments On Different Recurrent Neural Networks For English-Hindi Machine Translation]<br />
<br />
* [https://www.cse.iitb.ac.in/~pb/papers/eng-hindi-mt.pdf Interlingua based English-Hindi Machine Translation and Language Divergence]<br />
<br />
* [http://aircconline.com/ijaia/V8N5/8517ijaia04.pdf Building An Effective MT System For English-Hindi Using RNNs Ruchit Agrawal & Dipti Misra Sharma] <br />
<br />
* [https://www.researchgate.net/publication/319351932_Neural_Machine_Translation_of_Indian_Languages Neural Machine Translation of Indian Languages]<br />
<br />
* [https://www.hindawi.com/journals/tswj/2014/485737/ Quantum Neural Network Based Machine Translator for Hindi to English]<br />
<br />
<br />
<br />
<br />
====Languages Other than English====<br />
<br />
* [http://h2p.learnpunjabi.org/ Hindi-Punjabi Machine Translation]<br />
* [http://www.academia.edu/3275646/A_Hybrid_Approach_for_Bengali_to_Hindi_Machine_Translation A Hybrid Approach for Bengali to Hindi Machine Translation, Sanjay Chatterji]<br />
<br />
<br />
===Morphology===<br />
<br />
* [http://www.ijsrp.org/research-paper-0613/ijsrp-p18124.pdf Morphology: Indian Languages and European Languages (contains morphological information for both English and Hindi)]<br />
<br />
* [http://web.stanford.edu/group/cslipublications/cslipublications/HPSG/2010/singh-sarma.pdf Hindi Noun Inflection and Distributed Morphology, Smriti Singh and Vaijayanthi M Sarma]<br />
<br />
* [http://learnpunjabi.org/pdf/gslehal-pap21.pdf Hindi Morphological Analyzer and Generator]<br />
<br />
* [http://aclweb.org/anthology/W12-2302 Hindi Derivational Morphological Analyzer]<br />
<br />
<br />
<br />
===Dictionaries===<br />
<br />
* http://hindi-english.org/<br />
* http://e-mahashabdkosh.rb-aai.in/<br />
* https://shabdkosh.raftaar.in/Hindi-English-Dictionary<br />
* http://www.aamboli.com/<br />
* [http://tdil-dc.in/index.php?option=com_download&task=showresourceDetails&toolid=1552&lang=en Bilingual Dictionary Marathi to Hindi]<br />
<br />
====Multilingual====<br />
<br />
* [http://troindia.in/journal/ijcesr/vol3isss9/74-80.pdf An English To Assamese, Bengali And Hindi Multilingual E-Dictionary]<br />
<br />
<br />
<br />
===Corpora===<br />
<br />
* [http://opus.nlpl.eu/ Hindi-English Parallel Corpora] <br />
* [http://www.lrec-conf.org/proceedings/lrec2014/pdf/835_Paper.pdf HindEnCorp – Hindi-English and Hindi-only Corpus for Machine Translation]<br />
* [https://arxiv.org/abs/1710.02855 The IIT Bombay English-Hindi Parallel Corpus]<br />
* [https://catalog.ldc.upenn.edu/LDC2008L02 Hindi WordNet]<br />
* [http://corpora.uni-leipzig.de/en?corpusId=hin_news_2011 Hindi News Corpus]<br />
<br />
==Devanagari and Unicode==<br />
<br />
<br />
<br />
[[Category:Hindi|*]]</div>
Dharjunior
https://wiki.apertium.org/w/index.php?title=Hindi&diff=68205
Hindi
2018-12-03T11:18:39Z
<p>Dharjunior: </p>
<hr />
<div><br />
{{TOCD}}<br />
<br />
<br />
<br />
==Apertium Resources==<br />
*[https://github.com/apertium/apertium-hin apertium-hin]<br />
*[https://github.com/apertium/apertium-eng-hin apertium-eng-hin]<br />
*[https://github.com/apertium/apertium-mar-hin?files=1 apertium-mar-hin]<br />
*[https://github.com/apertium/apertium-urd-hin?files=1 apertium-urd-hin]<br />
<br />
==External Resources==<br />
<br />
===General===<br />
<br />
* [http://www.aclweb.org/anthology/C10-2147 Urdu and Hindi: Translation and sharing of linguistic resources]<br />
* [http://www.cal.org/heritage/pdfs/voices-hindi-language.pdf Introduction to Hindi]<br />
* [https://www.researchgate.net/publication/313030658_Learning_of_Hindi_Phonology_as_a_Foreigner Learning of Hindi Phonology as a Foreigner]<br />
* [https://benjamins.com/catalog/loall.12 Hindi by Yamuna Kachru]<br />
<br />
===Grammars===<br />
<br />
* [http://www.koausa.org/iils/pdf/ModernHindiGrammar.pdf Modern Hindi Grammar, Omkar N. Koul]<br />
* [http://shodhganga.inflibnet.ac.in/bitstream/10603/78191/11/11_chapter%204.pdf Grammatical & Inflectional Analysis of Hindi and Dogri] <br />
* [https://surface.syr.edu/cgi/viewcontent.cgi?article=1170&context=suscholar The Oldest Grammar of Hindustani Tej K. Bhatia]<br />
* [http://esl.fis.edu/grammar/langdiff/hindi.htm Differences between Hindi and Urdu Grammar]<br />
<br />
<br />
<br />
<br />
===Machine Translation===<br />
<br />
* [http://web2py.iiit.ac.in/research_centres/publications/download/mastersthesis.pdf.af2224b7bc18088c.4b756e616c2d5468657369732d46696e616c2e706466.pdf Hindi to English Machine Translation, Kunal Sachdeva]<br />
<br />
* [https://arxiv.org/ftp/arxiv/papers/1702/1702.01587.pdf A Hybrid Approach For Hindi-English Machine Translation]<br />
<br />
* [https://airccj.org/CSCP/vol7/csit77206.pdf Experiments On Different Recurrent Neural Networks For English-Hindi Machine Translation]<br />
<br />
* [https://www.cse.iitb.ac.in/~pb/papers/eng-hindi-mt.pdf Interlingua based English-Hindi Machine Translation and Language Divergence]<br />
<br />
* [http://aircconline.com/ijaia/V8N5/8517ijaia04.pdf Building An Effective MT System For English-Hindi Using RNNs Ruchit Agrawal & Dipti Misra Sharma] <br />
<br />
* [https://www.researchgate.net/publication/319351932_Neural_Machine_Translation_of_Indian_Languages Neural Machine Translation of Indian Languages]<br />
<br />
* [https://www.hindawi.com/journals/tswj/2014/485737/ Quantum Neural Network Based Machine Translator for Hindi to English]<br />
<br />
<br />
<br />
<br />
====Languages Other than English====<br />
<br />
* [http://h2p.learnpunjabi.org/ Hindi-Punjabi Machine Translation]<br />
* [http://www.academia.edu/3275646/A_Hybrid_Approach_for_Bengali_to_Hindi_Machine_Translation A Hybrid Approach for Bengali to Hindi Machine Translation, Sanjay Chatterji]<br />
<br />
<br />
===Morphology===<br />
<br />
* [http://www.ijsrp.org/research-paper-0613/ijsrp-p18124.pdf Morphology: Indian Languages and European Languages (contains morphological information for both English and Hindi)]<br />
<br />
* [http://web.stanford.edu/group/cslipublications/cslipublications/HPSG/2010/singh-sarma.pdf Hindi Noun Inflection and Distributed Morphology, Smriti Singh and Vaijayanthi M Sarma]<br />
<br />
* [http://learnpunjabi.org/pdf/gslehal-pap21.pdf Hindi Morphological Analyzer and Generator]<br />
<br />
* [http://aclweb.org/anthology/W12-2302 Hindi Derivational Morphological Analyzer]<br />
<br />
<br />
<br />
===Dictionaries===<br />
<br />
* http://hindi-english.org/<br />
* http://e-mahashabdkosh.rb-aai.in/<br />
* https://shabdkosh.raftaar.in/Hindi-English-Dictionary<br />
* http://www.aamboli.com/<br />
* [http://tdil-dc.in/index.php?option=com_download&task=showresourceDetails&toolid=1552&lang=en Bilingual Dictionary Marathi to Hindi]<br />
<br />
====Multilingual====<br />
<br />
* [http://troindia.in/journal/ijcesr/vol3isss9/74-80.pdf An English To Assamese, Bengali And Hindi Multilingual E-Dictionary]<br />
<br />
<br />
<br />
===Corpora===<br />
<br />
* [http://opus.nlpl.eu/ Hindi-English Parallel Corpora] <br />
* [http://www.lrec-conf.org/proceedings/lrec2014/pdf/835_Paper.pdf HindEnCorp – Hindi-English and Hindi-only Corpus for Machine Translation]<br />
* [https://arxiv.org/abs/1710.02855 The IIT Bombay English-Hindi Parallel Corpus]<br />
* [https://catalog.ldc.upenn.edu/LDC2008L02 Hindi WordNet]<br />
* [http://corpora.uni-leipzig.de/en?corpusId=hin_news_2011 Hindi News Corpus]<br />
<br />
==Devanagari and Unicode==<br />
<br />
<br />
<br />
[[Category:Hindi|*]]</div>
Dharjunior
https://wiki.apertium.org/w/index.php?title=Hindi_and_English&diff=68204
Hindi and English
2018-12-03T11:17:45Z
<p>Dharjunior: </p>
<hr />
<div>{{TOCD}}<br />
<br />
==English and Hindi for GSoC==<br />
<br />
English–Hindi is sometimes/often discouraged as a GSoC project because:<br />
<br />
# it has been done unsuccessfully at least 3 times already;<br />
# it's hard;<br />
# Google already does it well (the GSoC project is "reaching state-of-the-art");<br />
# ...<br />
<br />
==Todo list==<br />
{{main|/Work plan|Work plan}}<br />
<br />
* <s>Check that [https://apertium.svn.sourceforge.net/svnroot/apertium/branches/xupaixkar/rasskaz/en.txt the story] is correctly translated from English to Hindi and that there are no missing sentences.</s><br />
* <s>Fix spelling errors in the story.</s><br />
* <s>Add other personal pronouns on the model of 'I' -- in both the Hindi dictionary and the bilingual dictionary.</s><br />
* <s>Come up with short example sentences (each should include a verb) for common postpositions, e.g. "with the cat", "under the table".</s><br />
* Every word in the bidix should have a POS tag<br />
* <s>Every noun on the Hindi side should have gender.</s><br />
* Adjectives in English in the bidix should be marked for 'sint'.<br />
* Check if there are any forms missing from [[#Verb_conjugation]]<br />
<br />
==Disambiguation==<br />
<br />
<pre><br />
^बच्चों/बच्चा<n><m><pl><obl>$ <br />
^की/का<post><f><sg><nom>/का<post><f><sg><obl>/का<post><f><pl><nom>/का<post><f><pl><obl>$ <br />
^मॉं/मॉं<n><f><sg><nom>/मॉं<n><f><sg><obl>$ <br />
</pre><br />
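<br />
The analyses above are in the Apertium stream format: each lexical unit is written <code>^surface/analysis/analysis$</code>, and each analysis is a lemma followed by tags in angle brackets. A minimal Python sketch of splitting one unit into its candidate readings might look like the following. The function name <code>parse_lexical_unit</code> is made up for illustration, and the sketch ignores escaped characters and multiword units.<br />

```python
import re

# Matches one <tag> and captures the tag name.
TAG_RE = re.compile(r"<([^<>]+)>")

def parse_lexical_unit(unit):
    """Split '^surface/lemma<tag>.../...$' into (surface, [(lemma, [tags]), ...]).

    Hypothetical helper: no handling of escapes or multiword units.
    """
    unit = unit.strip()
    if not (unit.startswith("^") and unit.endswith("$")):
        raise ValueError("not a lexical unit: %r" % unit)
    # The first field is the surface form, the rest are analyses.
    surface, *analyses = unit[1:-1].split("/")
    readings = []
    for analysis in analyses:
        lemma = analysis.split("<", 1)[0]
        tags = TAG_RE.findall(analysis)
        readings.append((lemma, tags))
    return surface, readings

surface, readings = parse_lexical_unit(
    "^की/का<post><f><sg><nom>/का<post><f><sg><obl>/का<post><f><pl><nom>/का<post><f><pl><obl>$"
)
print(surface, len(readings))  # the disambiguator has to pick one of these readings
```

Listing the readings this way makes it easy to see which tags differ between them (here only number and case), which is what a disambiguation rule has to decide.<br />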
<br />
==Grammar stuff==<br />
<br />
===Noun phrase===<br />
<br />
===Postpositional phrase===<br />
<br />
<pre><br />
NP-obl post<br />
</pre><br />
<br />
A postposition can be simple, or compound (kā + adverb/noun).<br />
<br />
===Genitive phrase===<br />
<br />
<pre><br />
<br />
SN-obl kā-$GEN.$NBR.nom SN-$GEN.$NBR.nom<br />
<br />
</pre><br />
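<br />
One way to read the pattern above: the possessor phrase is in the oblique, and kā takes the gender and number of the head noun that follows it. The Python sketch below only illustrates that agreement for a nominative head. The table <code>KA_FORMS</code> and the function <code>genitive_phrase</code> are hypothetical names, and the caller is assumed to supply the possessor already inflected for the oblique.<br />

```python
# Forms of kā for a nominative head noun, keyed by (gender, number).
# Hypothetical lookup table for illustration only.
KA_FORMS = {
    ("m", "sg"): "का",
    ("m", "pl"): "के",
    ("f", "sg"): "की",
    ("f", "pl"): "की",
}

def genitive_phrase(possessor_obl, head, head_gender, head_number):
    """Join an oblique possessor, the agreeing form of kā, and the head noun."""
    ka = KA_FORMS[(head_gender, head_number)]
    return " ".join([possessor_obl, ka, head])

# Possessor and head taken from the example in the Disambiguation section.
print(genitive_phrase("बच्चों", "मॉं", "f", "sg"))
```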
<br />
===Verb conjugation===<br />
<div style="float:right"><br />
{|class=wikitable<br />
! Tag !! Description<br />
|-<br />
| {{tag|impf}} || Imperfective participle<br />
|-<br />
| {{tag|perf}} || Perfective participle<br />
|-<br />
| {{tag|inf}} || Infinitive<br />
|-<br />
| {{tag|prs}} || Subjunctive<br />
|-<br />
| {{tag|fut}} || Future<br />
|-<br />
| {{tag|imp}} || Imperative<br />
|-<br />
| ... || ...<br />
|- <br />
|}<br />
</div><br />
; Transitive<br />
<br />
* Imperfective participle: -taa, -tii, -te, -tii<br />
* Perfective participle: -aa, -ii, -e, -iim<br />
* Infinitive: -naa<br />
* Subjunctive: -uum, -e, -e, -em, -o, -em<br />
* Future: subjunctive + -gaa, -ge, -gii, -gii<br />
* Imperative:<br />
* Verbal adverbs: -kar / -0 kar<br />
<br />
<pre><br />
बोल; बोलना; inf.nom; vblex.tv <br />
बोल; बोलने; inf.obl; vblex.tv <br />
बोल; बोलता; impf.m.sg; vblex.tv<br />
बोल; बोलती; impf.f.sg; vblex.tv<br />
बोल; बोलते; impf.m.pl; vblex.tv<br />
बोल; बोलती; impf.f.pl; vblex.tv<br />
बोल; बोला; perf.m.sg; vblex.tv<br />
बोल; बोली; perf.f.sg; vblex.tv<br />
बोल; बोले; perf.m.pl; vblex.tv<br />
बोल; बोली; perf.f.pl; vblex.tv<br />
बोल; बोलूं; prs.p1.sg; vblex.tv<br />
बोल; बोले; prs.p2.sg; vblex.tv<br />
बोल; बोले; prs.p3.sg; vblex.tv<br />
बोल; बोलें; prs.p1.pl; vblex.tv<br />
बोल; बोलो; prs.p2.pl; vblex.tv<br />
बोल; बोलें; prs.p3.pl; vblex.tv<br />
बोल; बोलूंगा; fut.p1.m.sg; vblex.tv<br />
बोल; बोलेगा; fut.p2.m.sg; vblex.tv<br />
बोल; बोलेगा; fut.p3.m.sg; vblex.tv<br />
बोल; बोलेंगे; fut.p1.m.pl; vblex.tv<br />
बोल; बोलोगे; fut.p2.m.pl; vblex.tv<br />
बोल; बोलेंगे; fut.p3.m.pl; vblex.tv<br />
बोल; बोलूंगी; fut.p1.f.sg; vblex.tv<br />
बोल; बोलेगी; fut.p2.f.sg; vblex.tv<br />
बोल; बोलेगी; fut.p3.f.sg; vblex.tv<br />
बोल; बोलेंगी; fut.p1.f.pl; vblex.tv<br />
बोल; बोलोगी; fut.p2.f.pl; vblex.tv<br />
बोल; बोलेंगी; fut.p3.f.pl; vblex.tv<br />
बोल; बोल; imp.p2.sg; vblex.tv<br />
बोल; बोलो; imp.p2.pl; vblex.tv<br />
बोल; बोलिए; imp.p2.frm.pl; vblex.tv<br />
बोल; बोलकर; gna; vblex.tv<br />
बोल; बोलके; gna; vblex.tv<br />
बोल; बोलनेवाला; agnt.m.sg; vblex.tv<br />
बोल; बोलनेवाली; agnt.f.sg; vblex.tv<br />
बोल; बोलनेवाले; agnt.m.pl; vblex.tv<br />
बोल; बोलनेवाली; agnt.f.pl; vblex.tv<br />
</pre><br />
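<br />
The rule "future = subjunctive + gender/number ending" from the list above can be checked against the table with a short Python sketch. The names <code>SUBJUNCTIVE</code>, <code>FUTURE</code>, <code>subjunctive</code> and <code>future</code> are invented here, and the plain concatenation is only assumed to hold for consonant-final stems such as बोल.<br />

```python
# Subjunctive endings, keyed by (person, number); written as vowel signs
# so they attach directly to a consonant-final stem.
SUBJUNCTIVE = {
    ("p1", "sg"): "ूं",
    ("p2", "sg"): "े",
    ("p3", "sg"): "े",
    ("p1", "pl"): "ें",
    ("p2", "pl"): "ो",
    ("p3", "pl"): "ें",
}

# Future endings added after the subjunctive form, keyed by (gender, number).
FUTURE = {
    ("m", "sg"): "गा",
    ("m", "pl"): "गे",
    ("f", "sg"): "गी",
    ("f", "pl"): "गी",
}

def subjunctive(stem, person, number):
    """Subjunctive form of a consonant-final stem (assumption: no stem changes)."""
    return stem + SUBJUNCTIVE[(person, number)]

def future(stem, person, gender, number):
    """Future form: the subjunctive plus the gender/number ending."""
    return subjunctive(stem, person, number) + FUTURE[(gender, number)]

for person in ("p1", "p2", "p3"):
    print(person, future("बोल", person, "m", "sg"), future("बोल", person, "f", "pl"))
```

Comparing generated forms with the table in this way is a cheap check for forms that are missing or mistyped.<br />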
<br />
; Intransitive<br />
<br />
===Verb patterns===<br />
<br />
; impf participle + "be" present<br />
<br />
* pres<br />
<br />
; stem + "raha" impf participle + "be" present<br />
<br />
* "be" present + ger<br />
<br />
; perf participle <br />
<br />
* past<br />
<br />
; perf participle + "be" present <br />
<br />
* "have" present + pp<br />
<br />
; perf participle + "be" past<br />
<br />
* past<br />
* "have" past + pp<br />
<br />
; impf participle + "be" past<br />
<br />
* past<br />
* "used to" + inf<br />
<br />
; impf participle + "raha" present + "be" past<br />
<br />
* "be" past + ger<br />
<br />
; future<br />
<br />
* "will" present + inf<br />
<br />
; impf participle + "raha" future<br />
<br />
* "will be" + ger<br />
<br />
; prs<br />
<br />
* "may" present + inf<br />
<br />
==Apertium Resources==<br />
*[https://github.com/apertium/apertium-hin apertium-hin]<br />
*[https://github.com/apertium/apertium-eng-hin apertium-eng-hin]<br />
*[https://github.com/apertium/apertium-mar-hin?files=1 apertium-mar-hin]<br />
*[https://github.com/apertium/apertium-urd-hin?files=1 apertium-urd-hin]<br />
<br />
==External Resources==<br />
<br />
===General===<br />
<br />
* [http://www.aclweb.org/anthology/C10-2147 Urdu and Hindi: Translation and sharing of linguistic resources]<br />
* [http://www.cal.org/heritage/pdfs/voices-hindi-language.pdf Introduction to Hindi]<br />
* [https://www.researchgate.net/publication/313030658_Learning_of_Hindi_Phonology_as_a_Foreigner Learning of Hindi Phonology as a Foreigner]<br />
* [https://benjamins.com/catalog/loall.12 Hindi by Yamuna Kachru]<br />
<br />
===Grammars===<br />
<br />
* [http://www.koausa.org/iils/pdf/ModernHindiGrammar.pdf Modern Hindi Grammar, Omkar N. Koul]<br />
* [http://shodhganga.inflibnet.ac.in/bitstream/10603/78191/11/11_chapter%204.pdf Grammatical & Inflectional Analysis of Hindi and Dogri] <br />
* [https://surface.syr.edu/cgi/viewcontent.cgi?article=1170&context=suscholar The Oldest Grammar of Hindustani Tej K. Bhatia]<br />
* [http://esl.fis.edu/grammar/langdiff/hindi.htm Differences between Hindi and Urdu Grammar]<br />
<br />
<br />
<br />
<br />
===Machine Translation===<br />
<br />
* [http://web2py.iiit.ac.in/research_centres/publications/download/mastersthesis.pdf.af2224b7bc18088c.4b756e616c2d5468657369732d46696e616c2e706466.pdf Hindi to English Machine Translation, Kunal Sachdeva]<br />
<br />
* [https://arxiv.org/ftp/arxiv/papers/1702/1702.01587.pdf A Hybrid Approach For Hindi-English Machine Translation]<br />
<br />
* [https://airccj.org/CSCP/vol7/csit77206.pdf Experiments On Different Recurrent Neural Networks For English-Hindi Machine Translation]<br />
<br />
* [https://www.cse.iitb.ac.in/~pb/papers/eng-hindi-mt.pdf Interlingua based English-Hindi Machine Translation and Language Divergence]<br />
<br />
* [http://aircconline.com/ijaia/V8N5/8517ijaia04.pdf Building An Effective MT System For English-Hindi Using RNNs Ruchit Agrawal & Dipti Misra Sharma] <br />
<br />
* [https://www.researchgate.net/publication/319351932_Neural_Machine_Translation_of_Indian_Languages Neural Machine Translation of Indian Languages]<br />
<br />
* [https://www.hindawi.com/journals/tswj/2014/485737/ Quantum Neural Network Based Machine Translator for Hindi to English]<br />
<br />
<br />
<br />
<br />
====Languages Other than English====<br />
<br />
* [http://h2p.learnpunjabi.org/ Hindi-Punjabi Machine Translation]<br />
* [http://www.academia.edu/3275646/A_Hybrid_Approach_for_Bengali_to_Hindi_Machine_Translation A Hybrid Approach for Bengali to Hindi Machine Translation, Sanjay Chatterji]<br />
<br />
<br />
===Morphology===<br />
<br />
* [http://www.ijsrp.org/research-paper-0613/ijsrp-p18124.pdf Morphology: Indian Languages and European Languages (contains morphological information for both English and Hindi)]<br />
<br />
* [http://web.stanford.edu/group/cslipublications/cslipublications/HPSG/2010/singh-sarma.pdf Hindi Noun Inflection and Distributed Morphology, Smriti Singh and Vaijayanthi M Sarma]<br />
<br />
* [http://learnpunjabi.org/pdf/gslehal-pap21.pdf Hindi Morphological Analyzer and Generator]<br />
<br />
* [http://aclweb.org/anthology/W12-2302 Hindi Derivational Morphological Analyzer]<br />
<br />
<br />
<br />
===Dictionaries===<br />
<br />
* http://hindi-english.org/<br />
* http://e-mahashabdkosh.rb-aai.in/<br />
* https://shabdkosh.raftaar.in/Hindi-English-Dictionary<br />
* http://www.aamboli.com/<br />
* [http://tdil-dc.in/index.php?option=com_download&task=showresourceDetails&toolid=1552&lang=en Bilingual Dictionary Marathi to Hindi]<br />
<br />
====Multilingual====<br />
<br />
* [http://troindia.in/journal/ijcesr/vol3isss9/74-80.pdf An English To Assamese, Bengali And Hindi Multilingual E-Dictionary]<br />
<br />
<br />
<br />
===Corpora===<br />
<br />
* [http://opus.nlpl.eu/ Hindi-English Parallel Corpora] <br />
* [http://www.lrec-conf.org/proceedings/lrec2014/pdf/835_Paper.pdf HindEnCorp – Hindi-English and Hindi-only Corpus for Machine Translation]<br />
* [https://arxiv.org/abs/1710.02855 The IIT Bombay English-Hindi Parallel Corpus]<br />
* [https://catalog.ldc.upenn.edu/LDC2008L02 Hindi WordNet]<br />
* [http://corpora.uni-leipzig.de/en?corpusId=hin_news_2011 Hindi News Corpus]<br />
<br />
<br />
<br />
<br />
==See also==<br />
<br />
* [[/Pending tests|Pending tests]]<br />
* [[Hindi and Urdu]]<br />
* [[Hindi]]<br />
<br />
[[Category:Hindi and English]]</div>
Dharjunior
https://wiki.apertium.org/w/index.php?title=Hindi_and_English&diff=68203
Hindi and English
2018-12-03T11:08:44Z
<p>Dharjunior: </p>
<hr />
<div>{{TOCD}}<br />
<br />
==English and Hindi for GSoC==<br />
<br />
English–Hindi is often discouraged as a GSoC project because:<br />
<br />
# it has been done unsuccessfully at least 3 times already;<br />
# it's hard;<br />
# Google already does it well (the GSoC project is "reaching state-of-the-art");<br />
# ...<br />
<br />
==Todo list==<br />
{{main|/Work plan|Work plan}}<br />
<br />
* <s>Check that [https://apertium.svn.sourceforge.net/svnroot/apertium/branches/xupaixkar/rasskaz/en.txt the story] is correctly translated from English to Hindi and that there are no missing sentences.</s><br />
* <s>Fix spelling errors in the story.</s><br />
* <s>Add other personal pronouns on the model of 'I' -- in both the Hindi dictionary and the bilingual dictionary.</s><br />
* <s>Come up with short example sentences (each should include a verb) for common postpositions, e.g. "with the cat", "under the table".</s><br />
* Every word in the bidix should have a POS tag<br />
* <s>Every noun on the Hindi side should have gender.</s><br />
* Adjectives in English in the bidix should be marked for 'sint'.<br />
* Check if there are any forms missing from [[#Verb_conjugation]]<br />
<br />
==Disambiguation==<br />
<br />
<pre><br />
^बच्चों/बच्चा<n><m><pl><obl>$ <br />
^की/का<post><f><sg><nom>/का<post><f><sg><obl>/का<post><f><pl><nom>/का<post><f><pl><obl>$ <br />
^मॉं/मॉं<n><f><sg><nom>/मॉं<n><f><sg><obl>$ <br />
</pre><br />
<br />
==Grammar stuff==<br />
<br />
===Noun phrase===<br />
<br />
===Postpositional phrase===<br />
<br />
<pre><br />
NP-obl post<br />
</pre><br />
<br />
A postposition can be simple, or compound (kā + adverb/noun).<br />
<br />
===Genitive phrase===<br />
<br />
<pre><br />
<br />
SN-obl kā-$GEN.$NBR.nom SN-$GEN.$NBR.nom<br />
<br />
</pre><br />
<br />
===Verb conjugation===<br />
<div style="float:right"><br />
{|class=wikitable<br />
! Tag !! Description<br />
|-<br />
| {{tag|impf}} || Imperfective participle<br />
|-<br />
| {{tag|perf}} || Perfective participle<br />
|-<br />
| {{tag|inf}} || Infinitive<br />
|-<br />
| {{tag|prs}} || Subjunctive<br />
|-<br />
| {{tag|fut}} || Future<br />
|-<br />
| {{tag|imp}} || Imperative<br />
|-<br />
| ... || ...<br />
|- <br />
|}<br />
</div><br />
; Transitive<br />
<br />
* Imperfective participle: -taa, -tii, -te, -tii<br />
* Perfective participle: -aa, -ii, -e, -iim<br />
* Infinitive: -naa<br />
* Subjunctive: -uum, -e, -e, -em, -o, -em<br />
* Future: subjunctive + -gaa, -ge, -gii, -gii<br />
* Imperative:<br />
* Verbal adverbs: -kar / -0 kar<br />
<br />
<pre><br />
बोल; बोलना; inf.nom; vblex.tv <br />
बोल; बोलने; inf.obl; vblex.tv <br />
बोल; बोलता; impf.m.sg; vblex.tv<br />
बोल; बोलती; impf.f.sg; vblex.tv<br />
बोल; बोलते; impf.m.pl; vblex.tv<br />
बोल; बोलती; impf.f.pl; vblex.tv<br />
बोल; बोला; perf.m.sg; vblex.tv<br />
बोल; बोली; perf.f.sg; vblex.tv<br />
बोल; बोले; perf.m.pl; vblex.tv<br />
बोल; बोली; perf.f.pl; vblex.tv<br />
बोल; बोलूं; prs.p1.sg; vblex.tv<br />
बोल; बोले; prs.p2.sg; vblex.tv<br />
बोल; बोले; prs.p3.sg; vblex.tv<br />
बोल; बोलें; prs.p1.pl; vblex.tv<br />
बोल; बोलो; prs.p2.pl; vblex.tv<br />
बोल; बोलें; prs.p3.pl; vblex.tv<br />
बोल; बोलूंगा; fut.p1.m.sg; vblex.tv<br />
बोल; बोलेगा; fut.p2.m.sg; vblex.tv<br />
बोल; बोलेगा; fut.p3.m.sg; vblex.tv<br />
बोल; बोलेंगे; fut.p1.m.pl; vblex.tv<br />
बोल; बोलोगे; fut.p2.m.pl; vblex.tv<br />
बोल; बोलेंगे; fut.p3.m.pl; vblex.tv<br />
बोल; बोलूंगी; fut.p1.f.sg; vblex.tv<br />
बोल; बोलेगी; fut.p2.f.sg; vblex.tv<br />
बोल; बोलेगी; fut.p3.f.sg; vblex.tv<br />
बोल; बोलेंगी; fut.p1.f.pl; vblex.tv<br />
बोल; बोलोगी; fut.p2.f.pl; vblex.tv<br />
बोल; बोलेंगी; fut.p3.f.pl; vblex.tv<br />
बोल; बोल; imp.p2.sg; vblex.tv<br />
बोल; बोलो; imp.p2.pl; vblex.tv<br />
बोल; बोलिए; imp.p2.frm.pl; vblex.tv<br />
बोल; बोलकर; gna; vblex.tv<br />
बोल; बोलके; gna; vblex.tv<br />
बोल; बोलनेवाला; agnt.m.sg; vblex.tv<br />
बोल; बोलनेवाली; agnt.f.sg; vblex.tv<br />
बोल; बोलनेवाले; agnt.m.pl; vblex.tv<br />
बोल; बोलनेवाली; agnt.f.pl; vblex.tv<br />
</pre><br />
<br />
; Intransitive<br />
<br />
===Verb patterns===<br />
<br />
; impf participle + "be" present<br />
<br />
* pres<br />
<br />
; stem + "raha" impf participle + "be" present<br />
<br />
* "be" present + ger<br />
<br />
; perf participle <br />
<br />
* past<br />
<br />
; perf participle + "be" present <br />
<br />
* "have" present + pp<br />
<br />
; perf participle + "be" past<br />
<br />
* past<br />
* "have" past + pp<br />
<br />
; impf participle + "be" past<br />
<br />
* past<br />
* "used to" + inf<br />
<br />
; impf participle + "raha" present + "be" past<br />
<br />
* "be" past + ger<br />
<br />
; future<br />
<br />
* "will" present + inf<br />
<br />
; impf participle + "raha" future<br />
<br />
* "will be" + ger<br />
<br />
; prs<br />
<br />
* "may" present + inf<br />
<br />
==Apertium Resources==<br />
*[https://github.com/apertium/apertium-hin apertium-hin]<br />
*[https://github.com/apertium/apertium-eng-hin apertium-eng-hin]<br />
*[https://github.com/apertium/apertium-mar-hin?files=1 apertium-mar-hin]<br />
*[https://github.com/apertium/apertium-urd-hin?files=1 apertium-urd-hin]<br />
<br />
==External Resources==<br />
<br />
===General===<br />
<br />
* [http://www.aclweb.org/anthology/C10-2147 Urdu and Hindi: Translation and sharing of linguistic resources]<br />
* [http://www.cal.org/heritage/pdfs/voices-hindi-language.pdf Introduction to Hindi]<br />
* [https://www.researchgate.net/publication/313030658_Learning_of_Hindi_Phonology_as_a_Foreigner Learning of Hindi Phonology as a Foreigner]<br />
<br />
===Grammars===<br />
<br />
* [http://www.koausa.org/iils/pdf/ModernHindiGrammar.pdf Modern Hindi Grammar, Omkar N. Koul]<br />
* [http://shodhganga.inflibnet.ac.in/bitstream/10603/78191/11/11_chapter%204.pdf Grammatical & Inflectional Analysis of Hindi and Dogri] <br />
* [https://surface.syr.edu/cgi/viewcontent.cgi?article=1170&context=suscholar The Oldest Grammar of Hindustani Tej K. Bhatia]<br />
<br />
<br />
<br />
<br />
===Machine Translation===<br />
<br />
* [http://web2py.iiit.ac.in/research_centres/publications/download/mastersthesis.pdf.af2224b7bc18088c.4b756e616c2d5468657369732d46696e616c2e706466.pdf Hindi to English Machine Translation, Kunal Sachdeva]<br />
<br />
* [https://arxiv.org/ftp/arxiv/papers/1702/1702.01587.pdf A Hybrid Approach For Hindi-English Machine Translation]<br />
<br />
* [https://airccj.org/CSCP/vol7/csit77206.pdf Experiments On Different Recurrent Neural Networks For English-Hindi Machine Translation]<br />
<br />
* [https://www.cse.iitb.ac.in/~pb/papers/eng-hindi-mt.pdf Interlingua based English-Hindi Machine Translation and Language Divergence]<br />
<br />
* [http://aircconline.com/ijaia/V8N5/8517ijaia04.pdf Building An Effective MT System For English-Hindi Using RNNs Ruchit Agrawal & Dipti Misra Sharma] <br />
<br />
* [https://www.researchgate.net/publication/319351932_Neural_Machine_Translation_of_Indian_Languages Neural Machine Translation of Indian Languages]<br />
<br />
<br />
<br />
<br />
====Languages Other than English====<br />
<br />
* [http://h2p.learnpunjabi.org/ Hindi-Punjabi Machine Translation]<br />
* [http://www.academia.edu/3275646/A_Hybrid_Approach_for_Bengali_to_Hindi_Machine_Translation A Hybrid Approach for Bengali to Hindi Machine Translation, Sanjay Chatterji]<br />
<br />
<br />
<br />
<br />
<br />
<br />
===Morphology===<br />
<br />
* [http://www.ijsrp.org/research-paper-0613/ijsrp-p18124.pdf Morphology: Indian Languages and European Languages (contains morphological information for both English and Hindi)]<br />
<br />
* [http://web.stanford.edu/group/cslipublications/cslipublications/HPSG/2010/singh-sarma.pdf Hindi Noun Inflection and Distributed Morphology, Smriti Singh and Vaijayanthi M Sarma]<br />
<br />
* [http://learnpunjabi.org/pdf/gslehal-pap21.pdf Hindi Morphological Analyzer and Generator]<br />
<br />
* [http://aclweb.org/anthology/W12-2302 Hindi Derivational Morphological Analyzer]<br />
<br />
<br />
<br />
===Dictionaries===<br />
<br />
* http://hindi-english.org/<br />
* http://e-mahashabdkosh.rb-aai.in/<br />
* https://shabdkosh.raftaar.in/Hindi-English-Dictionary<br />
* http://www.aamboli.com/<br />
* [http://tdil-dc.in/index.php?option=com_download&task=showresourceDetails&toolid=1552&lang=en Bilingual Dictionary Marathi to Hindi]<br />
<br />
====Multilingual====<br />
<br />
* [http://troindia.in/journal/ijcesr/vol3isss9/74-80.pdf An English To Assamese, Bengali And Hindi Multilingual E-Dictionary]<br />
<br />
<br />
<br />
===Corpora===<br />
<br />
* [http://opus.nlpl.eu/ Hindi-English Parallel Corpora] <br />
* [http://www.lrec-conf.org/proceedings/lrec2014/pdf/835_Paper.pdf HindEnCorp – Hindi-English and Hindi-only Corpus for Machine Translation]<br />
* [https://arxiv.org/abs/1710.02855 The IIT Bombay English-Hindi Parallel Corpus]<br />
* [https://catalog.ldc.upenn.edu/LDC2008L02 Hindi WordNet]<br />
* [http://corpora.uni-leipzig.de/en?corpusId=hin_news_2011 Hindi News Corpus]<br />
<br />
<br />
<br />
<br />
==See also==<br />
<br />
* [[/Pending tests|Pending tests]]<br />
* [[Hindi and Urdu]]<br />
* [[Hindi]]<br />
<br />
[[Category:Hindi and English]]</div>
Dharjunior
https://wiki.apertium.org/w/index.php?title=Hindi_and_English&diff=68202
Hindi and English
2018-12-03T10:57:57Z
<p>Dharjunior: </p>
<hr />
<div>{{TOCD}}<br />
<br />
==English and Hindi for GSoC==<br />
<br />
English–Hindi is often discouraged as a GSoC project because:<br />
<br />
# it has been done unsuccessfully at least 3 times already;<br />
# it's hard;<br />
# Google already does it well (the GSoC project is "reaching state-of-the-art");<br />
# ...<br />
<br />
==Todo list==<br />
{{main|/Work plan|Work plan}}<br />
<br />
* <s>Check that [https://apertium.svn.sourceforge.net/svnroot/apertium/branches/xupaixkar/rasskaz/en.txt the story] is correctly translated from English to Hindi and that there are no missing sentences.</s><br />
* <s>Fix spelling errors in the story.</s><br />
* <s>Add other personal pronouns on the model of 'I' -- in both the Hindi dictionary and the bilingual dictionary.</s><br />
* <s>Come up with short example sentences (each should include a verb) for common postpositions, e.g. "with the cat", "under the table".</s><br />
* Every word in the bidix should have a POS tag<br />
* <s>Every noun on the Hindi side should have gender.</s><br />
* Adjectives in English in the bidix should be marked for 'sint'.<br />
* Check if there are any forms missing from [[#Verb_conjugation]]<br />
<br />
==Disambiguation==<br />
<br />
<pre><br />
^बच्चों/बच्चा<n><m><pl><obl>$ <br />
^की/का<post><f><sg><nom>/का<post><f><sg><obl>/का<post><f><pl><nom>/का<post><f><pl><obl>$ <br />
^मॉं/मॉं<n><f><sg><nom>/मॉं<n><f><sg><obl>$ <br />
</pre><br />
<br />
==Grammar stuff==<br />
<br />
===Noun phrase===<br />
<br />
===Postpositional phrase===<br />
<br />
<pre><br />
NP-obl post<br />
</pre><br />
<br />
A postposition can be simple, or compound (kā + adverb/noun).<br />
<br />
===Genitive phrase===<br />
<br />
<pre><br />
<br />
SN-obl kā-$GEN.$NBR.nom SN-$GEN.$NBR.nom<br />
<br />
</pre><br />
<br />
===Verb conjugation===<br />
<div style="float:right"><br />
{|class=wikitable<br />
! Tag !! Description<br />
|-<br />
| {{tag|impf}} || Imperfective participle<br />
|-<br />
| {{tag|perf}} || Perfective participle<br />
|-<br />
| {{tag|inf}} || Infinitive<br />
|-<br />
| {{tag|prs}} || Subjunctive<br />
|-<br />
| {{tag|fut}} || Future<br />
|-<br />
| {{tag|imp}} || Imperative<br />
|-<br />
| ... || ...<br />
|- <br />
|}<br />
</div><br />
; Transitive<br />
<br />
* Imperfective participle: -taa, -tii, -te, -tii<br />
* Perfective participle: -aa, -ii, -e, -iim<br />
* Infinitive: -naa<br />
* Subjunctive: -uum, -e, -e, -em, -o, -em<br />
* Future: subjunctive + -gaa, -ge, -gii, -gii<br />
* Imperative:<br />
* Verbal adverbs: -kar / -0 kar<br />
<br />
<pre><br />
बोल; बोलना; inf.nom; vblex.tv <br />
बोल; बोलने; inf.obl; vblex.tv <br />
बोल; बोलता; impf.m.sg; vblex.tv<br />
बोल; बोलती; impf.f.sg; vblex.tv<br />
बोल; बोलते; impf.m.pl; vblex.tv<br />
बोल; बोलती; impf.f.pl; vblex.tv<br />
बोल; बोला; perf.m.sg; vblex.tv<br />
बोल; बोली; perf.f.sg; vblex.tv<br />
बोल; बोले; perf.m.pl; vblex.tv<br />
बोल; बोली; perf.f.pl; vblex.tv<br />
बोल; बोलूं; prs.p1.sg; vblex.tv<br />
बोल; बोले; prs.p2.sg; vblex.tv<br />
बोल; बोले; prs.p3.sg; vblex.tv<br />
बोल; बोलें; prs.p1.pl; vblex.tv<br />
बोल; बोलो; prs.p2.pl; vblex.tv<br />
बोल; बोलें; prs.p3.pl; vblex.tv<br />
बोल; बोलूंगा; fut.p1.m.sg; vblex.tv<br />
बोल; बोलेगा; fut.p2.m.sg; vblex.tv<br />
बोल; बोलेगा; fut.p3.m.sg; vblex.tv<br />
बोल; बोलेंगे; fut.p1.m.pl; vblex.tv<br />
बोल; बोलोगे; fut.p2.m.pl; vblex.tv<br />
बोल; बोलेंगे; fut.p3.m.pl; vblex.tv<br />
बोल; बोलूंगी; fut.p1.f.sg; vblex.tv<br />
बोल; बोलेगी; fut.p2.f.sg; vblex.tv<br />
बोल; बोलेगी; fut.p3.f.sg; vblex.tv<br />
बोल; बोलेंगी; fut.p1.f.pl; vblex.tv<br />
बोल; बोलोगी; fut.p2.f.pl; vblex.tv<br />
बोल; बोलेंगी; fut.p3.f.pl; vblex.tv<br />
बोल; बोल; imp.p2.sg; vblex.tv<br />
बोल; बोलो; imp.p2.pl; vblex.tv<br />
बोल; बोलिए; imp.p2.frm.pl; vblex.tv<br />
बोल; बोलकर; gna; vblex.tv<br />
बोल; बोलके; gna; vblex.tv<br />
बोल; बोलनेवाला; agnt.m.sg; vblex.tv<br />
बोल; बोलनेवाली; agnt.f.sg; vblex.tv<br />
बोल; बोलनेवाले; agnt.m.pl; vblex.tv<br />
बोल; बोलनेवाली; agnt.f.pl; vblex.tv<br />
</pre><br />
<br />
; Intransitive<br />
<br />
===Verb patterns===<br />
<br />
; impf participle + "be" present<br />
<br />
* pres<br />
<br />
; stem + "raha" impf participle + "be" present<br />
<br />
* "be" present + ger<br />
<br />
; perf participle <br />
<br />
* past<br />
<br />
; perf participle + "be" present <br />
<br />
* "have" present + pp<br />
<br />
; perf participle + "be" past<br />
<br />
* past<br />
* "have" past + pp<br />
<br />
; impf participle + "be" past<br />
<br />
* past<br />
* "used to" + inf<br />
<br />
; impf participle + "raha" present + "be" past<br />
<br />
* "be" past + ger<br />
<br />
; future<br />
<br />
* "will" present + inf<br />
<br />
; impf participle + "raha" future<br />
<br />
* "will be" + ger<br />
<br />
; prs<br />
<br />
* "may" present + inf<br />
<br />
==Apertium Resources==<br />
*[https://github.com/apertium/apertium-hin apertium-hin]<br />
*[https://github.com/apertium/apertium-eng-hin apertium-eng-hin]<br />
*[https://github.com/apertium/apertium-mar-hin?files=1 apertium-mar-hin]<br />
*[https://github.com/apertium/apertium-urd-hin?files=1 apertium-urd-hin]<br />
<br />
==External Resources==<br />
<br />
===General===<br />
<br />
* [http://www.aclweb.org/anthology/C10-2147 Urdu and Hindi: Translation and sharing of linguistic resources]<br />
* [http://www.cal.org/heritage/pdfs/voices-hindi-language.pdf Introduction to Hindi]<br />
* [https://www.researchgate.net/publication/313030658_Learning_of_Hindi_Phonology_as_a_Foreigner Learning of Hindi Phonology as a Foreigner]<br />
<br />
===Grammars===<br />
<br />
* [http://www.koausa.org/iils/pdf/ModernHindiGrammar.pdf Modern Hindi Grammar, Omkar N. Koul]<br />
* [http://shodhganga.inflibnet.ac.in/bitstream/10603/78191/11/11_chapter%204.pdf Grammatical & Inflectional Analysis of Hindi and Dogri] <br />
<br />
<br />
<br />
===Machine Translation===<br />
<br />
* [http://web2py.iiit.ac.in/research_centres/publications/download/mastersthesis.pdf.af2224b7bc18088c.4b756e616c2d5468657369732d46696e616c2e706466.pdf Hindi to English Machine Translation, Kunal Sachdeva]<br />
<br />
* [https://arxiv.org/ftp/arxiv/papers/1702/1702.01587.pdf A Hybrid Approach For Hindi-English Machine Translation]<br />
<br />
* [https://airccj.org/CSCP/vol7/csit77206.pdf Experiments On Different Recurrent Neural Networks For English-Hindi Machine Translation]<br />
<br />
* [https://www.cse.iitb.ac.in/~pb/papers/eng-hindi-mt.pdf Interlingua based English-Hindi Machine Translation and Language Divergence]<br />
<br />
* [https://ieeexplore.ieee.org/document/7732363 An efficient English to Hindi machine translation system using hybrid mechanism]<br />
<br />
<br />
<br />
===Morphology===<br />
<br />
* [http://www.ijsrp.org/research-paper-0613/ijsrp-p18124.pdf Morphology: Indian Languages and European Languages (contains morphological information for both English and Hindi)]<br />
<br />
* [http://web.stanford.edu/group/cslipublications/cslipublications/HPSG/2010/singh-sarma.pdf Hindi Noun Inflection and Distributed Morphology, Smriti Singh and Vaijayanthi M Sarma]<br />
<br />
* [http://learnpunjabi.org/pdf/gslehal-pap21.pdf Hindi Morphological Analyzer and Generator]<br />
<br />
* [http://aclweb.org/anthology/W12-2302 Hindi Derivational Morphological Analyzer]<br />
<br />
<br />
<br />
===Dictionaries===<br />
<br />
* http://hindi-english.org/<br />
* http://e-mahashabdkosh.rb-aai.in/<br />
* https://shabdkosh.raftaar.in/Hindi-English-Dictionary<br />
* http://www.aamboli.com/<br />
* [http://tdil-dc.in/index.php?option=com_download&task=showresourceDetails&toolid=1552&lang=en Bilingual Dictionary Marathi to Hindi]<br />
<br />
====Multilingual====<br />
<br />
* [http://troindia.in/journal/ijcesr/vol3isss9/74-80.pdf An English To Assamese, Bengali And Hindi Multilingual E-Dictionary]<br />
<br />
<br />
<br />
===Corpora===<br />
<br />
* [http://opus.nlpl.eu/ Hindi-English Parallel Corpora] <br />
* [http://www.lrec-conf.org/proceedings/lrec2014/pdf/835_Paper.pdf HindEnCorp – Hindi-English and Hindi-only Corpus for Machine Translation]<br />
* [https://arxiv.org/abs/1710.02855 The IIT Bombay English-Hindi Parallel Corpus]<br />
* [https://catalog.ldc.upenn.edu/LDC2008L02 Hindi WordNet]<br />
* [http://corpora.uni-leipzig.de/en?corpusId=hin_news_2011 Hindi News Corpus]<br />
<br />
<br />
<br />
<br />
==See also==<br />
<br />
* [[/Pending tests|Pending tests]]<br />
* [[Hindi and Urdu]]<br />
* [[Hindi]]<br />
<br />
[[Category:Hindi and English]]</div>
Dharjunior
https://wiki.apertium.org/w/index.php?title=Hindi_and_English&diff=68201
Hindi and English
2018-12-03T10:48:14Z
<p>Dharjunior: </p>
<hr />
<div>{{TOCD}}<br />
<br />
==English and Hindi for GSoC==<br />
<br />
English–Hindi is often discouraged as a GSoC project because:<br />
<br />
# it has been done unsuccessfully at least 3 times already;<br />
# it's hard;<br />
# Google already does it well (the GSoC project is "reaching state-of-the-art");<br />
# ...<br />
<br />
==Todo list==<br />
{{main|/Work plan|Work plan}}<br />
<br />
* <s>Check that [https://apertium.svn.sourceforge.net/svnroot/apertium/branches/xupaixkar/rasskaz/en.txt the story] is correctly translated from English to Hindi and that there are no missing sentences.</s><br />
* <s>Fix spelling errors in the story.</s><br />
* <s>Add other personal pronouns on the model of 'I' -- in both the Hindi dictionary and the bilingual dictionary.</s><br />
* <s>Come up with short example sentences (each should include a verb) for common postpositions, e.g. "with the cat", "under the table".</s><br />
* Every word in the bidix should have a POS tag<br />
* <s>Every noun on the Hindi side should have gender.</s><br />
* Adjectives in English in the bidix should be marked for 'sint'.<br />
* Check if there are any forms missing from [[#Verb_conjugation]]<br />
<br />
==Disambiguation==<br />
<br />
<pre><br />
^बच्चों/बच्चा<n><m><pl><obl>$ <br />
^की/का<post><f><sg><nom>/का<post><f><sg><obl>/का<post><f><pl><nom>/का<post><f><pl><obl>$ <br />
^मॉं/मॉं<n><f><sg><nom>/मॉं<n><f><sg><obl>$ <br />
</pre><br />
<br />
==Grammar stuff==<br />
<br />
===Noun phrase===<br />
<br />
===Postpositional phrase===<br />
<br />
<pre><br />
NP-obl post<br />
</pre><br />
<br />
A postposition can be simple, or compound (kā + adverb/noun).<br />
<br />
===Genitive phrase===<br />
<br />
<pre><br />
<br />
SN-obl kā-$GEN.$NBR.nom SN-$GEN.$NBR.nom<br />
<br />
</pre><br />
<br />
===Verb conjugation===<br />
<div style="float:right"><br />
{|class=wikitable<br />
! Tag !! Description<br />
|-<br />
| {{tag|impf}} || Imperfective participle<br />
|-<br />
| {{tag|perf}} || Perfective participle<br />
|-<br />
| {{tag|inf}} || Infinitive<br />
|-<br />
| {{tag|prs}} || Subjunctive<br />
|-<br />
| {{tag|fut}} || Future<br />
|-<br />
| {{tag|imp}} || Imperative<br />
|-<br />
| ... || ...<br />
|- <br />
|}<br />
</div><br />
; Transitive<br />
<br />
* Imperfective participle: -taa, -tii, -te, -tii<br />
* Perfective participle: -aa, -ii, -e, -iim<br />
* Infinitive: -naa<br />
* Subjunctive: -uum, -e, -e, -em, -o, -em<br />
* Future: subjunctive + -gaa, -ge, -gii, -gii<br />
* Imperative:<br />
* Verbal adverbs: -kar / -0 kar<br />
<br />
<pre><br />
बोल; बोलना; inf.nom; vblex.tv <br />
बोल; बोलने; inf.obl; vblex.tv <br />
बोल; बोलता; impf.m.sg; vblex.tv<br />
बोल; बोलती; impf.f.sg; vblex.tv<br />
बोल; बोलते; impf.m.pl; vblex.tv<br />
बोल; बोलती; impf.f.pl; vblex.tv<br />
बोल; बोला; perf.m.sg; vblex.tv<br />
बोल; बोली; perf.f.sg; vblex.tv<br />
बोल; बोले; perf.m.pl; vblex.tv<br />
बोल; बोली; perf.f.pl; vblex.tv<br />
बोल; बोलूं; prs.p1.sg; vblex.tv<br />
बोल; बोले; prs.p2.sg; vblex.tv<br />
बोल; बोले; prs.p3.sg; vblex.tv<br />
बोल; बोलें; prs.p1.pl; vblex.tv<br />
बोल; बोलो; prs.p2.pl; vblex.tv<br />
बोल; बोलें; prs.p3.pl; vblex.tv<br />
बोल; बोलूंगा; fut.p1.m.sg; vblex.tv<br />
बोल; बोलेगा; fut.p2.m.sg; vblex.tv<br />
बोल; बोलेगा; fut.p3.m.sg; vblex.tv<br />
बोल; बोलेंगे; fut.p1.m.pl; vblex.tv<br />
बोल; बोलोगे; fut.p2.m.pl; vblex.tv<br />
बोल; बोलेंगे; fut.p3.m.pl; vblex.tv<br />
बोल; बोलूंगी; fut.p1.f.sg; vblex.tv<br />
बोल; बोलेगी; fut.p2.f.sg; vblex.tv<br />
बोल; बोलेगी; fut.p3.f.sg; vblex.tv<br />
बोल; बोलेंगी; fut.p1.f.pl; vblex.tv<br />
बोल; बोलोगी; fut.p2.f.pl; vblex.tv<br />
बोल; बोलेंगी; fut.p3.f.pl; vblex.tv<br />
बोल; बोल; imp.p2.sg; vblex.tv<br />
बोल; बोलो; imp.p2.pl; vblex.tv<br />
बोल; बोलिए; imp.p2.frm.pl; vblex.tv<br />
बोल; बोलकर; gna; vblex.tv<br />
बोल; बोलके; gna; vblex.tv<br />
बोल; बोलनेवाला; agnt.m.sg; vblex.tv<br />
बोल; बोलनेवाली; agnt.f.sg; vblex.tv<br />
बोल; बोलनेवाले; agnt.m.pl; vblex.tv<br />
बोल; बोलनेवाली; agnt.f.pl; vblex.tv<br />
</pre><br />
<br />
; Intransitive<br />
<br />
===Verb patterns===<br />
<br />
; impf participle + "be" present<br />
<br />
* pres<br />
<br />
; stem + "raha" impf participle + "be" present<br />
<br />
* "be" present + ger<br />
<br />
; perf participle <br />
<br />
* past<br />
<br />
; perf participle + "be" present <br />
<br />
* "have" present + pp<br />
<br />
; perf participle + "be" past<br />
<br />
* past<br />
* "have" past + pp<br />
<br />
; impf participle + "be" past<br />
<br />
* past<br />
* "used to" + inf<br />
<br />
; impf participle + "raha" present + "be" past<br />
<br />
* "be" past + ger<br />
<br />
; future<br />
<br />
* "will" present + inf<br />
<br />
; impf participle + "raha" future<br />
<br />
* "will be" + ger<br />
<br />
; prs<br />
<br />
* "may" present + inf<br />
<br />
==Apertium Resources==<br />
*[https://github.com/apertium/apertium-hin apertium-hin]<br />
*[https://github.com/apertium/apertium-eng-hin apertium-eng-hin]<br />
*[https://github.com/apertium/apertium-mar-hin?files=1 apertium-mar-hin]<br />
*[https://github.com/apertium/apertium-urd-hin?files=1 apertium-urd-hin]<br />
<br />
==External Resources==<br />
<br />
===General===<br />
<br />
* [http://www.aclweb.org/anthology/C10-2147 Urdu and Hindi: Translation and sharing of linguistic resources]<br />
* [http://www.cal.org/heritage/pdfs/voices-hindi-language.pdf Introduction to Hindi]<br />
<br />
===Grammars===<br />
<br />
* [http://www.koausa.org/iils/pdf/ModernHindiGrammar.pdf Modern Hindi Grammar, Omkar N. Koul]<br />
* [http://shodhganga.inflibnet.ac.in/bitstream/10603/78191/11/11_chapter%204.pdf Grammatical & Inflectional Analysis of Hindi and Dogri] <br />
<br />
<br />
<br />
===Machine Translation===<br />
<br />
* [http://web2py.iiit.ac.in/research_centres/publications/download/mastersthesis.pdf.af2224b7bc18088c.4b756e616c2d5468657369732d46696e616c2e706466.pdf Hindi to English Machine Translation, Kunal Sachdeva]<br />
<br />
* [https://arxiv.org/ftp/arxiv/papers/1702/1702.01587.pdf A Hybrid Approach For Hindi-English Machine Translation]<br />
<br />
* [https://airccj.org/CSCP/vol7/csit77206.pdf Experiments On Different Recurrent Neural Networks For English-Hindi Machine Translation]<br />
<br />
* [https://www.cse.iitb.ac.in/~pb/papers/eng-hindi-mt.pdf Interlingua based English-Hindi Machine Translation and Language Divergence]<br />
<br />
* [https://ieeexplore.ieee.org/document/7732363 An efficient English to Hindi machine translation system using hybrid mechanism]<br />
<br />
<br />
<br />
===Morphology===<br />
<br />
* [http://www.ijsrp.org/research-paper-0613/ijsrp-p18124.pdf Morphology: Indian Languages and European Languages (contains morphological information for both English and Hindi)]<br />
<br />
* [http://web.stanford.edu/group/cslipublications/cslipublications/HPSG/2010/singh-sarma.pdf Hindi Noun Inflection and Distributed Morphology, Smriti Singh and Vaijayanthi M Sarma]<br />
<br />
* [http://learnpunjabi.org/pdf/gslehal-pap21.pdf Hindi Morphological Analyzer and Generator]<br />
<br />
* [http://aclweb.org/anthology/W12-2302 Hindi Derivational Morphological Analyzer]<br />
<br />
<br />
<br />
===Dictionaries===<br />
<br />
* http://hindi-english.org/<br />
<br />
* https://shabdkosh.raftaar.in/Hindi-English-Dictionary<br />
<br />
<br />
===Corpora===<br />
<br />
* [http://opus.nlpl.eu/ Hindi-English Parallel Corpora] <br />
* [http://www.lrec-conf.org/proceedings/lrec2014/pdf/835_Paper.pdf HindEnCorp – Hindi-English and Hindi-only Corpus for Machine Translation]<br />
* [https://arxiv.org/abs/1710.02855 The IIT Bombay English-Hindi Parallel Corpus]<br />
* [https://catalog.ldc.upenn.edu/LDC2008L02 Hindi WordNet]<br />
* [http://corpora.uni-leipzig.de/en?corpusId=hin_news_2011 Hindi News Corpus]<br />
<br />
<br />
<br />
<br />
==See also==<br />
<br />
* [[/Pending tests|Pending tests]]<br />
* [[Hindi and Urdu]]<br />
* [[Hindi]]<br />
<br />
[[Category:Hindi and English]]</div>
Dharjunior
https://wiki.apertium.org/w/index.php?title=Hindi_and_English&diff=68200
Hindi and English
2018-12-03T10:43:54Z
<p>Dharjunior: </p>
<hr />
<div>{{TOCD}}<br />
<br />
==English and Hindi for GSoC==<br />
<br />
English–Hindi is sometimes/often discouraged as a GSoC project because:<br />
<br />
# it has been done unsuccessfully at least 3 times already;<br />
# it's hard;<br />
# Google already does it well (the GSoC project is "reaching state-of-the-art");<br />
# ...<br />
<br />
==Todo list==<br />
{{main|/Work plan|Work plan}}<br />
<br />
* <s>Check that [https://apertium.svn.sourceforge.net/svnroot/apertium/branches/xupaixkar/rasskaz/en.txt the story] is correctly translated from English to Hindi and that there are no missing sentences.</s><br />
* <s>Fix spelling errors in the story.</s><br />
* <s>Add other personal pronouns on the model of 'I' -- in both the Hindi dictionary and the bilingual dictionary.</s><br />
* <s>Come up with short example sentences (they should include a verb) for common postpositions "with the cat" "under the table". etc.</s><br />
* Every word in the bidix should have a POS tag<br />
* <s>Every noun on the Hindi side should have gender.</s><br />
* Adjectives in English in the bidix should be marked for 'sint'.<br />
* Check if there are any forms missing from [[#Verb_conjugation]]<br />
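The "every word in the bidix should have a POS tag" item can be checked mechanically. The following is a minimal sketch, assuming the usual bilingual dictionary layout of <code>e</code> entries with <code>l</code> and <code>r</code> sides that carry <code>s</code> tag elements. The function name <code>entries_missing_pos</code> and the sample entries are made up for illustration and are not taken from apertium-eng-hin.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-entry bidix fragment; the second entry lacks a tag on the left side.
SAMPLE = """<dictionary>
  <section id="main" type="standard">
    <e><p><l>cat<s n="n"/></l><r>बिल्ली<s n="n"/><s n="f"/></r></p></e>
    <e><p><l>under</l><r>नीचे<s n="post"/></r></p></e>
  </section>
</dictionary>"""


def entries_missing_pos(bidix_xml):
    """Return (side, text) for every <l> or <r> element without any <s> child."""
    root = ET.fromstring(bidix_xml)
    missing = []
    for entry in root.iter("e"):
        for side in entry.iter():
            if side.tag in ("l", "r") and side.find("s") is None:
                missing.append((side.tag, "".join(side.itertext())))
    return missing


if __name__ == "__main__":
    for side, text in entries_missing_pos(SAMPLE):
        print(side, text)
```

For a real dictionary, the file contents would be read and passed in as a string. Identity entries written with <code>i</code> are ignored by this sketch.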
<br />
==Disambiguation==<br />
<br />
<pre><br />
^बच्चों/बच्चा<n><m><pl><obl>$ <br />
^की/का<post><f><sg><nom>/का<post><f><sg><obl>/का<post><f><pl><nom>/का<post><f><pl><obl>$ <br />
^मॉं/मॉं<n><f><sg><nom>/मॉं<n><f><sg><obl>$ <br />
</pre><br />
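The analyses above are in the Apertium stream format: each lexical unit is written as <code>^surface/analysis/analysis$</code>, and tags are enclosed in angle brackets. As a rough sketch, the units can be split into surface form, lemma and tags as shown below. The helper <code>parse_lexical_units</code> is hypothetical (it is not part of the Apertium tools) and it assumes that the input contains no escaped <code>/</code>, <code>^</code> or <code>$</code> characters.

```python
import re

# One lexical unit: ^surface/analysis1/analysis2$
LEXICAL_UNIT = re.compile(r"\^([^/$]*)((?:/[^/$]*)*)\$")
# One tag: <n>, <f>, <obl>, ...
TAG = re.compile(r"<([^>]+)>")


def parse_lexical_units(text):
    """Return a list of (surface, [(lemma, [tags]), ...]) tuples."""
    units = []
    for match in LEXICAL_UNIT.finditer(text):
        surface = match.group(1)
        analyses = []
        # group(2) starts with "/", so the first split element is empty.
        for analysis in match.group(2).split("/")[1:]:
            lemma = analysis.split("<", 1)[0]
            analyses.append((lemma, TAG.findall(analysis)))
        units.append((surface, analyses))
    return units


sample = (
    "^बच्चों/बच्चा<n><m><pl><obl>$ "
    "^की/का<post><f><sg><nom>/का<post><f><sg><obl>"
    "/का<post><f><pl><nom>/का<post><f><pl><obl>$"
)

if __name__ == "__main__":
    for surface, analyses in parse_lexical_units(sample):
        print(surface, len(analyses), "analysis/analyses")
```

The count of analyses per unit gives a quick view of how much ambiguity the disambiguator has to resolve: here the noun has one reading and the postposition has four.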
<br />
==Grammar stuff==<br />
<br />
===Noun phrase===<br />
<br />
===Postpositional phrase===<br />
<br />
<pre><br />
NP-obl post<br />
</pre><br />
<br />
A postposition can be simple or compound (kā + adverb/noun).<br />
<br />
===Genitive phrase===<br />
<br />
<pre><br />
<br />
SN-obl kā-$GEN.$NBR.nom SN-$GEN.$NBR.nom<br />
<br />
</pre><br />
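In the pattern above, kā takes its gender and number from the head noun that follows it, while the possessor stays in the oblique. A small sketch of that agreement follows. The function names are illustrative only, and the form table assumes the standard paradigm: का for masculine singular nominative, के for the other masculine forms, and की for all feminine forms.

```python
def genitive_form(gender, number, case="nom"):
    """Pick the surface form of the genitive postposition for the head noun's features."""
    if gender == "f":
        return "की"  # feminine: same form in every number and case
    if number == "sg" and case == "nom":
        return "का"  # masculine singular nominative
    return "के"      # masculine plural, or masculine oblique


def genitive_phrase(possessor_obl, head, gender, number, case="nom"):
    """Build 'possessor-obl + kā + head'; gender/number/case describe the head noun."""
    return " ".join([possessor_obl, genitive_form(gender, number, case), head])


if __name__ == "__main__":
    # Possessor in the oblique plural, feminine singular head noun.
    print(genitive_phrase("बच्चों", "मॉं", "f", "sg"))
```

The example reuses the words from the disambiguation sample, so the chosen form matches the की seen there.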
<br />
===Verb conjugation===<br />
<div style="float:right"><br />
{|class=wikitable<br />
! Tag !! Description<br />
|-<br />
| {{tag|impf}} || Imperfective participle<br />
|-<br />
| {{tag|perf}} || Perfective participle<br />
|-<br />
| {{tag|inf}} || Infinitive<br />
|-<br />
| {{tag|prs}} || Subjunctive<br />
|-<br />
| {{tag|fut}} || Future<br />
|-<br />
| {{tag|imp}} || Imperative<br />
|-<br />
| ... || ...<br />
|- <br />
|}<br />
</div><br />
; Transitive<br />
<br />
* Imperfective participle: -taa, -tii, -te, -tii<br />
* Perfective participle: -aa, -ii, -e, -iim<br />
* Infinitive: -naa<br />
* Subjunctive: -uum, -e, -e, -em, -o, -em<br />
* Future: subjunctive + -gaa, -ge, -gii, -gii<br />
* Imperative: -0, -o, -ie<br />
* Verbal adverbs: -kar / -0 kar<br />
<br />
<pre><br />
बोल; बोलना; inf.nom; vblex.tv <br />
बोल; बोलने; inf.obl; vblex.tv <br />
बोल; बोलता; impf.m.sg; vblex.tv<br />
बोल; बोलती; impf.f.sg; vblex.tv<br />
बोल; बोलते; impf.m.pl; vblex.tv<br />
बोल; बोलती; impf.f.pl; vblex.tv<br />
बोल; बोला; perf.m.sg; vblex.tv<br />
बोल; बोली; perf.f.sg; vblex.tv<br />
बोल; बोले; perf.m.pl; vblex.tv<br />
बोल; बोलीं; perf.f.pl; vblex.tv<br />
बोल; बोलूं; prs.p1.sg; vblex.tv<br />
बोल; बोले; prs.p2.sg; vblex.tv<br />
बोल; बोले; prs.p3.sg; vblex.tv<br />
बोल; बोलें; prs.p1.pl; vblex.tv<br />
बोल; बोलो; prs.p2.pl; vblex.tv<br />
बोल; बोलें; prs.p3.pl; vblex.tv<br />
बोल; बोलूंगा; fut.p1.m.sg; vblex.tv<br />
बोल; बोलेगा; fut.p2.m.sg; vblex.tv<br />
बोल; बोलेगा; fut.p3.m.sg; vblex.tv<br />
बोल; बोलेंगे; fut.p1.m.pl; vblex.tv<br />
बोल; बोलोगे; fut.p2.m.pl; vblex.tv<br />
बोल; बोलेंगे; fut.p3.m.pl; vblex.tv<br />
बोल; बोलूंगी; fut.p1.f.sg; vblex.tv<br />
बोल; बोलेगी; fut.p2.f.sg; vblex.tv<br />
बोल; बोलेगी; fut.p3.f.sg; vblex.tv<br />
बोल; बोलेंगी; fut.p1.f.pl; vblex.tv<br />
बोल; बोलोगी; fut.p2.f.pl; vblex.tv<br />
बोल; बोलेंगी; fut.p3.f.pl; vblex.tv<br />
बोल; बोल; imp.p2.sg; vblex.tv<br />
बोल; बोलो; imp.p2.pl; vblex.tv<br />
बोल; बोलिए; imp.p2.frm.pl; vblex.tv<br />
बोल; बोलकर; gna; vblex.tv<br />
बोल; बोलके; gna; vblex.tv<br />
बोल; बोलनेवाला; agnt.m.sg; vblex.tv<br />
बोल; बोलनेवाली; agnt.f.sg; vblex.tv<br />
बोल; बोलनेवाले; agnt.m.pl; vblex.tv<br />
बोल; बोलनेवाली; agnt.f.pl; vblex.tv<br />
</pre><br />
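The rule "future = subjunctive + -gaa, -ge, -gii, -gii" can be sketched as string concatenation. This is only an illustration, assuming a consonant-final stem such as बोल so that the subjunctive endings can be attached as dependent vowel signs. The names <code>subjunctive</code>, <code>future</code> and <code>future_paradigm</code> are made up here, and vowel-final stems would need extra handling.

```python
# Subjunctive endings for a consonant-final stem, keyed by person and number.
SUBJUNCTIVE = {
    "p1.sg": "ूं", "p2.sg": "े", "p3.sg": "े",
    "p1.pl": "ें", "p2.pl": "ो", "p3.pl": "ें",
}

# Future marker added after the subjunctive form, keyed by gender and number.
FUTURE_ENDING = {"m.sg": "गा", "m.pl": "गे", "f.sg": "गी", "f.pl": "गी"}


def subjunctive(stem, person, number):
    return stem + SUBJUNCTIVE[f"{person}.{number}"]


def future(stem, person, gender, number):
    return subjunctive(stem, person, number) + FUTURE_ENDING[f"{gender}.{number}"]


def future_paradigm(stem, pos="vblex.tv"):
    """Return rows in the same 'stem; form; tags; pos' layout as the table above."""
    rows = []
    for gender in ("m", "f"):
        for number in ("sg", "pl"):
            for person in ("p1", "p2", "p3"):
                form = future(stem, person, gender, number)
                rows.append(f"{stem}; {form}; fut.{person}.{gender}.{number}; {pos}")
    return rows


if __name__ == "__main__":
    print("\n".join(future_paradigm("बोल")))
```

Comparing the generated rows with the hand-written table is one way to approach the "check if there are any forms missing" item in the todo list.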
<br />
; Intransitive<br />
<br />
===Verb patterns===<br />
<br />
; impf participle + "be" present<br />
<br />
* pres<br />
<br />
; stem + "raha" impf participle + "be" present<br />
<br />
* "be" present + ger<br />
<br />
; perf participle <br />
<br />
* past<br />
<br />
; perf participle + "be" present <br />
<br />
* "have" present + pp<br />
<br />
; perf participle + "be" past<br />
<br />
* past<br />
* "have" past + pp<br />
<br />
; impf participle + "be" past<br />
<br />
* past<br />
* "used to" + inf<br />
<br />
; impf participle + "raha" present + "be" past<br />
<br />
* "be" past + ger<br />
<br />
; future<br />
<br />
* "will" present + inf<br />
<br />
; impf participle + "raha" future<br />
<br />
* "will be" + ger<br />
<br />
; prs<br />
<br />
* "may" present + inf<br />
<br />
==Apertium Resources==<br />
*[https://github.com/apertium/apertium-hin apertium-hin]<br />
*[https://github.com/apertium/apertium-eng-hin apertium-eng-hin]<br />
*[https://github.com/apertium/apertium-mar-hin?files=1 apertium-mar-hin]<br />
*[https://github.com/apertium/apertium-urd-hin?files=1 apertium-urd-hin]<br />
<br />
==External Resources==<br />
<br />
===General===<br />
<br />
* [http://www.aclweb.org/anthology/C10-2147 Urdu and Hindi: Translation and sharing of linguistic resources]<br />
<br />
===Grammars===<br />
<br />
* [http://www.koausa.org/iils/pdf/ModernHindiGrammar.pdf Modern Hindi Grammar, Omkar N. Koul]<br />
* [http://shodhganga.inflibnet.ac.in/bitstream/10603/78191/11/11_chapter%204.pdf Grammatical & Inflectional Analysis of Hindi and Dogri] <br />
<br />
<br />
<br />
===Machine Translation===<br />
<br />
* [http://web2py.iiit.ac.in/research_centres/publications/download/mastersthesis.pdf.af2224b7bc18088c.4b756e616c2d5468657369732d46696e616c2e706466.pdf Hindi to English Machine Translation, Kunal Sachdeva]<br />
<br />
* [https://arxiv.org/ftp/arxiv/papers/1702/1702.01587.pdf A Hybrid Approach For Hindi-English Machine Translation]<br />
<br />
* [https://airccj.org/CSCP/vol7/csit77206.pdf Experiments On Different Recurrent Neural Networks For English-Hindi Machine Translation]<br />
<br />
<br />
<br />
===Morphology===<br />
<br />
* [http://www.ijsrp.org/research-paper-0613/ijsrp-p18124.pdf Morphology: Indian Languages and European Languages (contains morphological information for both English and Hindi)]<br />
<br />
* [http://web.stanford.edu/group/cslipublications/cslipublications/HPSG/2010/singh-sarma.pdf Hindi Noun Inflection and Distributed Morphology, Smriti Singh and Vaijayanthi M Sarma]<br />
<br />
* [http://learnpunjabi.org/pdf/gslehal-pap21.pdf Hindi Morphological Analyzer and Generator]<br />
<br />
* [http://aclweb.org/anthology/W12-2302 Hindi Derivational Morphological Analyzer]<br />
<br />
<br />
<br />
===Dictionaries===<br />
<br />
* http://hindi-english.org/<br />
<br />
* https://shabdkosh.raftaar.in/Hindi-English-Dictionary<br />
<br />
<br />
===Corpora===<br />
<br />
* [http://opus.nlpl.eu/ Hindi-English Parallel Corpora] <br />
* [http://www.lrec-conf.org/proceedings/lrec2014/pdf/835_Paper.pdf HindEnCorp – Hindi-English and Hindi-only Corpus for Machine Translation]<br />
* [https://arxiv.org/abs/1710.02855 The IIT Bombay English-Hindi Parallel Corpus]<br />
* [https://catalog.ldc.upenn.edu/LDC2008L02 Hindi WordNet]<br />
* [http://corpora.uni-leipzig.de/en?corpusId=hin_news_2011 Hindi News Corpus]<br />
<br />
<br />
<br />
<br />
==See also==<br />
<br />
* [[/Pending tests|Pending tests]]<br />
* [[Hindi and Urdu]]<br />
* [[Hindi]]<br />
<br />
[[Category:Hindi and English]]</div>
Dharjunior
https://wiki.apertium.org/w/index.php?title=Hindi_and_English&diff=68196
Hindi and English
2018-12-03T06:26:37Z
<p>Dharjunior: </p>
<hr />
<div>{{TOCD}}<br />
<br />
==English and Hindi for GSoC==<br />
<br />
English–Hindi is sometimes/often discouraged as a GSoC project because:<br />
<br />
# it has been done unsuccessfully at least 3 times already;<br />
# it's hard;<br />
# Google already does it well (the GSoC project is "reaching state-of-the-art");<br />
# ...<br />
<br />
==Todo list==<br />
{{main|/Work plan|Work plan}}<br />
<br />
* <s>Check that [https://apertium.svn.sourceforge.net/svnroot/apertium/branches/xupaixkar/rasskaz/en.txt the story] is correctly translated from English to Hindi and that there are no missing sentences.</s><br />
* <s>Fix spelling errors in the story.</s><br />
* <s>Add other personal pronouns on the model of 'I' -- in both the Hindi dictionary and the bilingual dictionary.</s><br />
* <s>Come up with short example sentences (they should include a verb) for common postpositions "with the cat" "under the table". etc.</s><br />
* Every word in the bidix should have a POS tag<br />
* <s>Every noun on the Hindi side should have gender.</s><br />
* Adjectives in English in the bidix should be marked for 'sint'.<br />
* Check if there are any forms missing from [[#Verb_conjugation]]<br />
<br />
==Disambiguation==<br />
<br />
<pre><br />
^बच्चों/बच्चा<n><m><pl><obl>$ <br />
^की/का<post><f><sg><nom>/का<post><f><sg><obl>/का<post><f><pl><nom>/का<post><f><pl><obl>$ <br />
^मॉं/मॉं<n><f><sg><nom>/मॉं<n><f><sg><obl>$ <br />
</pre><br />
<br />
==Grammar stuff==<br />
<br />
===Noun phrase===<br />
<br />
===Postpositional phrase===<br />
<br />
<pre><br />
NP-obl post<br />
</pre><br />
<br />
A postposition can be simple or compound (kā + adverb/noun).<br />
<br />
===Genitive phrase===<br />
<br />
<pre><br />
<br />
SN-obl kā-$GEN.$NBR.nom SN-$GEN.$NBR.nom<br />
<br />
</pre><br />
<br />
===Verb conjugation===<br />
<div style="float:right"><br />
{|class=wikitable<br />
! Tag !! Description<br />
|-<br />
| {{tag|impf}} || Imperfective participle<br />
|-<br />
| {{tag|perf}} || Perfective participle<br />
|-<br />
| {{tag|inf}} || Infinitive<br />
|-<br />
| {{tag|prs}} || Subjunctive<br />
|-<br />
| {{tag|fut}} || Future<br />
|-<br />
| {{tag|imp}} || Imperative<br />
|-<br />
| ... || ...<br />
|- <br />
|}<br />
</div><br />
; Transitive<br />
<br />
* Imperfective participle: -taa, -tii, -te, -tii<br />
* Perfective participle: -aa, -ii, -e, -iim<br />
* Infinitive: -naa<br />
* Subjunctive: -uum, -e, -e, -em, -o, -em<br />
* Future: subjunctive + -gaa, -ge, -gii, -gii<br />
* Imperative: -0, -o, -ie<br />
* Verbal adverbs: -kar / -0 kar<br />
<br />
<pre><br />
बोल; बोलना; inf.nom; vblex.tv <br />
बोल; बोलने; inf.obl; vblex.tv <br />
बोल; बोलता; impf.m.sg; vblex.tv<br />
बोल; बोलती; impf.f.sg; vblex.tv<br />
बोल; बोलते; impf.m.pl; vblex.tv<br />
बोल; बोलती; impf.f.pl; vblex.tv<br />
बोल; बोला; perf.m.sg; vblex.tv<br />
बोल; बोली; perf.f.sg; vblex.tv<br />
बोल; बोले; perf.m.pl; vblex.tv<br />
बोल; बोलीं; perf.f.pl; vblex.tv<br />
बोल; बोलूं; prs.p1.sg; vblex.tv<br />
बोल; बोले; prs.p2.sg; vblex.tv<br />
बोल; बोले; prs.p3.sg; vblex.tv<br />
बोल; बोलें; prs.p1.pl; vblex.tv<br />
बोल; बोलो; prs.p2.pl; vblex.tv<br />
बोल; बोलें; prs.p3.pl; vblex.tv<br />
बोल; बोलूंगा; fut.p1.m.sg; vblex.tv<br />
बोल; बोलेगा; fut.p2.m.sg; vblex.tv<br />
बोल; बोलेगा; fut.p3.m.sg; vblex.tv<br />
बोल; बोलेंगे; fut.p1.m.pl; vblex.tv<br />
बोल; बोलोगे; fut.p2.m.pl; vblex.tv<br />
बोल; बोलेंगे; fut.p3.m.pl; vblex.tv<br />
बोल; बोलूंगी; fut.p1.f.sg; vblex.tv<br />
बोल; बोलेगी; fut.p2.f.sg; vblex.tv<br />
बोल; बोलेगी; fut.p3.f.sg; vblex.tv<br />
बोल; बोलेंगी; fut.p1.f.pl; vblex.tv<br />
बोल; बोलोगी; fut.p2.f.pl; vblex.tv<br />
बोल; बोलेंगी; fut.p3.f.pl; vblex.tv<br />
बोल; बोल; imp.p2.sg; vblex.tv<br />
बोल; बोलो; imp.p2.pl; vblex.tv<br />
बोल; बोलिए; imp.p2.frm.pl; vblex.tv<br />
बोल; बोलकर; gna; vblex.tv<br />
बोल; बोलके; gna; vblex.tv<br />
बोल; बोलनेवाला; agnt.m.sg; vblex.tv<br />
बोल; बोलनेवाली; agnt.f.sg; vblex.tv<br />
बोल; बोलनेवाले; agnt.m.pl; vblex.tv<br />
बोल; बोलनेवाली; agnt.f.pl; vblex.tv<br />
</pre><br />
<br />
; Intransitive<br />
<br />
===Verb patterns===<br />
<br />
; impf participle + "be" present<br />
<br />
* pres<br />
<br />
; stem + "raha" impf participle + "be" present<br />
<br />
* "be" present + ger<br />
<br />
; perf participle <br />
<br />
* past<br />
<br />
; perf participle + "be" present <br />
<br />
* "have" present + pp<br />
<br />
; perf participle + "be" past<br />
<br />
* past<br />
* "have" past + pp<br />
<br />
; impf participle + "be" past<br />
<br />
* past<br />
* "used to" + inf<br />
<br />
; impf participle + "raha" present + "be" past<br />
<br />
* "be" past + ger<br />
<br />
; future<br />
<br />
* "will" present + inf<br />
<br />
; impf participle + "raha" future<br />
<br />
* "will be" + ger<br />
<br />
; prs<br />
<br />
* "may" present + inf<br />
<br />
==Apertium Resources==<br />
*[https://github.com/apertium/apertium-hin apertium-hin]<br />
*[https://github.com/apertium/apertium-eng-hin apertium-eng-hin]<br />
*[https://github.com/apertium/apertium-mar-hin?files=1 apertium-mar-hin]<br />
*[https://github.com/apertium/apertium-urd-hin?files=1 apertium-urd-hin]<br />
<br />
==External Resources==<br />
<br />
===Grammars===<br />
<br />
* [http://www.koausa.org/iils/pdf/ModernHindiGrammar.pdf Modern Hindi Grammar, Omkar N. Koul]<br />
* [http://shodhganga.inflibnet.ac.in/bitstream/10603/78191/11/11_chapter%204.pdf Grammatical & Inflectional Analysis of Hindi and Dogri] <br />
<br />
<br />
<br />
===Machine Translation===<br />
<br />
* [http://web2py.iiit.ac.in/research_centres/publications/download/mastersthesis.pdf.af2224b7bc18088c.4b756e616c2d5468657369732d46696e616c2e706466.pdf Hindi to English Machine Translation, Kunal Sachdeva]<br />
<br />
* [https://arxiv.org/ftp/arxiv/papers/1702/1702.01587.pdf A Hybrid Approach For Hindi-English Machine Translation]<br />
<br />
<br />
<br />
===Morphology===<br />
<br />
* [http://www.ijsrp.org/research-paper-0613/ijsrp-p18124.pdf Morphology: Indian Languages and European Languages (contains morphological information for both English and Hindi)]<br />
<br />
* [http://web.stanford.edu/group/cslipublications/cslipublications/HPSG/2010/singh-sarma.pdf Hindi Noun Inflection and Distributed Morphology, Smriti Singh and Vaijayanthi M Sarma]<br />
<br />
* [http://learnpunjabi.org/pdf/gslehal-pap21.pdf Hindi Morphological Analyzer and Generator]<br />
<br />
* [http://aclweb.org/anthology/W12-2302 Hindi Derivational Morphological Analyzer]<br />
<br />
<br />
<br />
===Dictionaries===<br />
<br />
* http://hindi-english.org/<br />
<br />
* https://shabdkosh.raftaar.in/Hindi-English-Dictionary<br />
<br />
<br />
===Corpora===<br />
<br />
* [http://opus.nlpl.eu/ Hindi-English Parallel Corpora] <br />
* [http://www.lrec-conf.org/proceedings/lrec2014/pdf/835_Paper.pdf HindEnCorp – Hindi-English and Hindi-only Corpus for Machine Translation]<br />
* [https://arxiv.org/abs/1710.02855 The IIT Bombay English-Hindi Parallel Corpus]<br />
<br />
<br />
<br />
==See also==<br />
<br />
* [[/Pending tests|Pending tests]]<br />
* [[Hindi and Urdu]]<br />
* [[Hindi]]<br />
<br />
[[Category:Hindi and English]]</div>
Dharjunior
https://wiki.apertium.org/w/index.php?title=Hindi_and_English&diff=68189
Hindi and English
2018-12-03T06:06:45Z
<p>Dharjunior: </p>
<hr />
<div>{{TOCD}}<br />
<br />
==English and Hindi for GSoC==<br />
<br />
English–Hindi is sometimes/often discouraged as a GSoC project because:<br />
<br />
# it has been done unsuccessfully at least 3 times already;<br />
# it's hard;<br />
# Google already does it well (the GSoC project is "reaching state-of-the-art");<br />
# ...<br />
<br />
==Todo list==<br />
{{main|/Work plan|Work plan}}<br />
<br />
* <s>Check that [https://apertium.svn.sourceforge.net/svnroot/apertium/branches/xupaixkar/rasskaz/en.txt the story] is correctly translated from English to Hindi and that there are no missing sentences.</s><br />
* <s>Fix spelling errors in the story.</s><br />
* <s>Add other personal pronouns on the model of 'I' -- in both the Hindi dictionary and the bilingual dictionary.</s><br />
* <s>Come up with short example sentences (they should include a verb) for common postpositions "with the cat" "under the table". etc.</s><br />
* Every word in the bidix should have a POS tag<br />
* <s>Every noun on the Hindi side should have gender.</s><br />
* Adjectives in English in the bidix should be marked for 'sint'.<br />
* Check if there are any forms missing from [[#Verb_conjugation]]<br />
<br />
==Disambiguation==<br />
<br />
<pre><br />
^बच्चों/बच्चा<n><m><pl><obl>$ <br />
^की/का<post><f><sg><nom>/का<post><f><sg><obl>/का<post><f><pl><nom>/का<post><f><pl><obl>$ <br />
^मॉं/मॉं<n><f><sg><nom>/मॉं<n><f><sg><obl>$ <br />
</pre><br />
<br />
==Grammar stuff==<br />
<br />
===Noun phrase===<br />
<br />
===Postpositional phrase===<br />
<br />
<pre><br />
NP-obl post<br />
</pre><br />
<br />
A postposition can be simple or compound (kā + adverb/noun).<br />
<br />
===Genitive phrase===<br />
<br />
<pre><br />
<br />
SN-obl kā-$GEN.$NBR.nom SN-$GEN.$NBR.nom<br />
<br />
</pre><br />
<br />
===Verb conjugation===<br />
<div style="float:right"><br />
{|class=wikitable<br />
! Tag !! Description<br />
|-<br />
| {{tag|impf}} || Imperfective participle<br />
|-<br />
| {{tag|perf}} || Perfective participle<br />
|-<br />
| {{tag|inf}} || Infinitive<br />
|-<br />
| {{tag|prs}} || Subjunctive<br />
|-<br />
| {{tag|fut}} || Future<br />
|-<br />
| {{tag|imp}} || Imperative<br />
|-<br />
| ... || ...<br />
|- <br />
|}<br />
</div><br />
; Transitive<br />
<br />
* Imperfective participle: -taa, -tii, -te, -tii<br />
* Perfective participle: -aa, -ii, -e, -iim<br />
* Infinitive: -naa<br />
* Subjunctive: -uum, -e, -e, -em, -o, -em<br />
* Future: subjunctive + -gaa, -ge, -gii, -gii<br />
* Imperative: -0, -o, -ie<br />
* Verbal adverbs: -kar / -0 kar<br />
<br />
<pre><br />
बोल; बोलना; inf.nom; vblex.tv <br />
बोल; बोलने; inf.obl; vblex.tv <br />
बोल; बोलता; impf.m.sg; vblex.tv<br />
बोल; बोलती; impf.f.sg; vblex.tv<br />
बोल; बोलते; impf.m.pl; vblex.tv<br />
बोल; बोलती; impf.f.pl; vblex.tv<br />
बोल; बोला; perf.m.sg; vblex.tv<br />
बोल; बोली; perf.f.sg; vblex.tv<br />
बोल; बोले; perf.m.pl; vblex.tv<br />
बोल; बोलीं; perf.f.pl; vblex.tv<br />
बोल; बोलूं; prs.p1.sg; vblex.tv<br />
बोल; बोले; prs.p2.sg; vblex.tv<br />
बोल; बोले; prs.p3.sg; vblex.tv<br />
बोल; बोलें; prs.p1.pl; vblex.tv<br />
बोल; बोलो; prs.p2.pl; vblex.tv<br />
बोल; बोलें; prs.p3.pl; vblex.tv<br />
बोल; बोलूंगा; fut.p1.m.sg; vblex.tv<br />
बोल; बोलेगा; fut.p2.m.sg; vblex.tv<br />
बोल; बोलेगा; fut.p3.m.sg; vblex.tv<br />
बोल; बोलेंगे; fut.p1.m.pl; vblex.tv<br />
बोल; बोलोगे; fut.p2.m.pl; vblex.tv<br />
बोल; बोलेंगे; fut.p3.m.pl; vblex.tv<br />
बोल; बोलूंगी; fut.p1.f.sg; vblex.tv<br />
बोल; बोलेगी; fut.p2.f.sg; vblex.tv<br />
बोल; बोलेगी; fut.p3.f.sg; vblex.tv<br />
बोल; बोलेंगी; fut.p1.f.pl; vblex.tv<br />
बोल; बोलोगी; fut.p2.f.pl; vblex.tv<br />
बोल; बोलेंगी; fut.p3.f.pl; vblex.tv<br />
बोल; बोल; imp.p2.sg; vblex.tv<br />
बोल; बोलो; imp.p2.pl; vblex.tv<br />
बोल; बोलिए; imp.p2.frm.pl; vblex.tv<br />
बोल; बोलकर; gna; vblex.tv<br />
बोल; बोलके; gna; vblex.tv<br />
बोल; बोलनेवाला; agnt.m.sg; vblex.tv<br />
बोल; बोलनेवाली; agnt.f.sg; vblex.tv<br />
बोल; बोलनेवाले; agnt.m.pl; vblex.tv<br />
बोल; बोलनेवाली; agnt.f.pl; vblex.tv<br />
</pre><br />
<br />
; Intransitive<br />
<br />
===Verb patterns===<br />
<br />
; impf participle + "be" present<br />
<br />
* pres<br />
<br />
; stem + "raha" impf participle + "be" present<br />
<br />
* "be" present + ger<br />
<br />
; perf participle <br />
<br />
* past<br />
<br />
; perf participle + "be" present <br />
<br />
* "have" present + pp<br />
<br />
; perf participle + "be" past<br />
<br />
* past<br />
* "have" past + pp<br />
<br />
; impf participle + "be" past<br />
<br />
* past<br />
* "used to" + inf<br />
<br />
; impf participle + "raha" present + "be" past<br />
<br />
* "be" past + ger<br />
<br />
; future<br />
<br />
* "will" present + inf<br />
<br />
; impf participle + "raha" future<br />
<br />
* "will be" + ger<br />
<br />
; prs<br />
<br />
* "may" present + inf<br />
<br />
==Apertium Resources==<br />
*[https://github.com/apertium/apertium-hin apertium-hin]<br />
*[https://github.com/apertium/apertium-eng-hin apertium-eng-hin]<br />
<br />
==External Resources==<br />
<br />
===Grammars===<br />
<br />
* [http://www.koausa.org/iils/pdf/ModernHindiGrammar.pdf Modern Hindi Grammar, Omkar N. Koul]<br />
* [http://shodhganga.inflibnet.ac.in/bitstream/10603/78191/11/11_chapter%204.pdf Grammatical & Inflectional Analysis of Hindi and Dogri] <br />
<br />
<br />
<br />
===Machine Translation===<br />
<br />
* [http://web2py.iiit.ac.in/research_centres/publications/download/mastersthesis.pdf.af2224b7bc18088c.4b756e616c2d5468657369732d46696e616c2e706466.pdf Hindi to English Machine Translation, Kunal Sachdeva]<br />
<br />
* [https://arxiv.org/ftp/arxiv/papers/1702/1702.01587.pdf A Hybrid Approach For Hindi-English Machine Translation]<br />
<br />
<br />
<br />
===Morphology===<br />
<br />
* [http://www.ijsrp.org/research-paper-0613/ijsrp-p18124.pdf Morphology: Indian Languages and European Languages (contains morphological information for both English and Hindi)]<br />
<br />
* [http://web.stanford.edu/group/cslipublications/cslipublications/HPSG/2010/singh-sarma.pdf Hindi Noun Inflection and Distributed Morphology, Smriti Singh and Vaijayanthi M Sarma]<br />
<br />
* [http://learnpunjabi.org/pdf/gslehal-pap21.pdf Hindi Morphological Analyzer and Generator]<br />
<br />
* [http://aclweb.org/anthology/W12-2302 Hindi Derivational Morphological Analyzer]<br />
<br />
<br />
<br />
===Dictionaries===<br />
<br />
* http://hindi-english.org/<br />
<br />
* https://shabdkosh.raftaar.in/Hindi-English-Dictionary<br />
<br />
<br />
===Corpora===<br />
<br />
* [http://opus.nlpl.eu/ Hindi-English Parallel Corpora] <br />
* [http://www.lrec-conf.org/proceedings/lrec2014/pdf/835_Paper.pdf HindEnCorp – Hindi-English and Hindi-only Corpus for Machine Translation]<br />
* [https://arxiv.org/abs/1710.02855 The IIT Bombay English-Hindi Parallel Corpus]<br />
<br />
<br />
<br />
==See also==<br />
<br />
* [[/Pending tests|Pending tests]]<br />
* [[Hindi and Urdu]]<br />
* [[Hindi]]<br />
<br />
[[Category:Hindi and English]]</div>
Dharjunior
https://wiki.apertium.org/w/index.php?title=Hindi_and_English&diff=68094
Hindi and English
2018-12-01T14:49:59Z
<p>Dharjunior: </p>
<hr />
<div>{{TOCD}}<br />
<br />
==English and Hindi for GSoC==<br />
<br />
English–Hindi is sometimes/often discouraged as a GSoC project because:<br />
<br />
# it has been done unsuccessfully at least 3 times already;<br />
# it's hard;<br />
# Google already does it well (the GSoC project is "reaching state-of-the-art");<br />
# ...<br />
<br />
==Todo list==<br />
{{main|/Work plan|Work plan}}<br />
<br />
* <s>Check that [https://apertium.svn.sourceforge.net/svnroot/apertium/branches/xupaixkar/rasskaz/en.txt the story] is correctly translated from English to Hindi and that there are no missing sentences.</s><br />
* <s>Fix spelling errors in the story.</s><br />
* <s>Add other personal pronouns on the model of 'I' -- in both the Hindi dictionary and the bilingual dictionary.</s><br />
* <s>Come up with short example sentences (they should include a verb) for common postpositions "with the cat" "under the table". etc.</s><br />
* Every word in the bidix should have a POS tag<br />
* <s>Every noun on the Hindi side should have gender.</s><br />
* Adjectives in English in the bidix should be marked for 'sint'.<br />
* Check if there are any forms missing from [[#Verb_conjugation]]<br />
<br />
==Disambiguation==<br />
<br />
<pre><br />
^बच्चों/बच्चा<n><m><pl><obl>$ <br />
^की/का<post><f><sg><nom>/का<post><f><sg><obl>/का<post><f><pl><nom>/का<post><f><pl><obl>$ <br />
^मॉं/मॉं<n><f><sg><nom>/मॉं<n><f><sg><obl>$ <br />
</pre><br />
<br />
==Grammar stuff==<br />
<br />
===Noun phrase===<br />
<br />
===Postpositional phrase===<br />
<br />
<pre><br />
NP-obl post<br />
</pre><br />
<br />
A postposition can be simple or compound (kā + adverb/noun).<br />
<br />
===Genitive phrase===<br />
<br />
<pre><br />
<br />
SN-obl kā-$GEN.$NBR.nom SN-$GEN.$NBR.nom<br />
<br />
</pre><br />
<br />
===Verb conjugation===<br />
<div style="float:right"><br />
{|class=wikitable<br />
! Tag !! Description<br />
|-<br />
| {{tag|impf}} || Imperfective participle<br />
|-<br />
| {{tag|perf}} || Perfective participle<br />
|-<br />
| {{tag|inf}} || Infinitive<br />
|-<br />
| {{tag|prs}} || Subjunctive<br />
|-<br />
| {{tag|fut}} || Future<br />
|-<br />
| {{tag|imp}} || Imperative<br />
|-<br />
| ... || ...<br />
|- <br />
|}<br />
</div><br />
; Transitive<br />
<br />
* Imperfective participle: -taa, -tii, -te, -tii<br />
* Perfective participle: -aa, -ii, -e, -iim<br />
* Infinitive: -naa<br />
* Subjunctive: -uum, -e, -e, -em, -o, -em<br />
* Future: subjunctive + -gaa, -ge, -gii, -gii<br />
* Imperative: -0, -o, -ie<br />
* Verbal adverbs: -kar / -0 kar<br />
<br />
<pre><br />
बोल; बोलना; inf.nom; vblex.tv <br />
बोल; बोलने; inf.obl; vblex.tv <br />
बोल; बोलता; impf.m.sg; vblex.tv<br />
बोल; बोलती; impf.f.sg; vblex.tv<br />
बोल; बोलते; impf.m.pl; vblex.tv<br />
बोल; बोलती; impf.f.pl; vblex.tv<br />
बोल; बोला; perf.m.sg; vblex.tv<br />
बोल; बोली; perf.f.sg; vblex.tv<br />
बोल; बोले; perf.m.pl; vblex.tv<br />
बोल; बोलीं; perf.f.pl; vblex.tv<br />
बोल; बोलूं; prs.p1.sg; vblex.tv<br />
बोल; बोले; prs.p2.sg; vblex.tv<br />
बोल; बोले; prs.p3.sg; vblex.tv<br />
बोल; बोलें; prs.p1.pl; vblex.tv<br />
बोल; बोलो; prs.p2.pl; vblex.tv<br />
बोल; बोलें; prs.p3.pl; vblex.tv<br />
बोल; बोलूंगा; fut.p1.m.sg; vblex.tv<br />
बोल; बोलेगा; fut.p2.m.sg; vblex.tv<br />
बोल; बोलेगा; fut.p3.m.sg; vblex.tv<br />
बोल; बोलेंगे; fut.p1.m.pl; vblex.tv<br />
बोल; बोलोगे; fut.p2.m.pl; vblex.tv<br />
बोल; बोलेंगे; fut.p3.m.pl; vblex.tv<br />
बोल; बोलूंगी; fut.p1.f.sg; vblex.tv<br />
बोल; बोलेगी; fut.p2.f.sg; vblex.tv<br />
बोल; बोलेगी; fut.p3.f.sg; vblex.tv<br />
बोल; बोलेंगी; fut.p1.f.pl; vblex.tv<br />
बोल; बोलोगी; fut.p2.f.pl; vblex.tv<br />
बोल; बोलेंगी; fut.p3.f.pl; vblex.tv<br />
बोल; बोल; imp.p2.sg; vblex.tv<br />
बोल; बोलो; imp.p2.pl; vblex.tv<br />
बोल; बोलिए; imp.p2.frm.pl; vblex.tv<br />
बोल; बोलकर; gna; vblex.tv<br />
बोल; बोलके; gna; vblex.tv<br />
बोल; बोलनेवाला; agnt.m.sg; vblex.tv<br />
बोल; बोलनेवाली; agnt.f.sg; vblex.tv<br />
बोल; बोलनेवाले; agnt.m.pl; vblex.tv<br />
बोल; बोलनेवाली; agnt.f.pl; vblex.tv<br />
</pre><br />
<br />
; Intransitive<br />
<br />
===Verb patterns===<br />
<br />
; impf participle + "be" present<br />
<br />
* pres<br />
<br />
; stem + "raha" impf participle + "be" present<br />
<br />
* "be" present + ger<br />
<br />
; perf participle <br />
<br />
* past<br />
<br />
; perf participle + "be" present <br />
<br />
* "have" present + pp<br />
<br />
; perf participle + "be" past<br />
<br />
* past<br />
* "have" past + pp<br />
<br />
; impf participle + "be" past<br />
<br />
* past<br />
* "used to" + inf<br />
<br />
; impf participle + "raha" present + "be" past<br />
<br />
* "be" past + ger<br />
<br />
; future<br />
<br />
* "will" present + inf<br />
<br />
; impf participle + "raha" future<br />
<br />
* "will be" + ger<br />
<br />
; prs<br />
<br />
* "may" present + inf<br />
<br />
<br />
==Resources==<br />
<br />
===Grammars===<br />
<br />
* [http://www.koausa.org/iils/pdf/ModernHindiGrammar.pdf Modern Hindi Grammar, Omkar N. Koul]<br />
* [http://shodhganga.inflibnet.ac.in/bitstream/10603/78191/11/11_chapter%204.pdf Grammatical & Inflectional Analysis of Hindi and Dogri] <br />
<br />
<br />
<br />
===Machine Translation===<br />
<br />
* [http://web2py.iiit.ac.in/research_centres/publications/download/mastersthesis.pdf.af2224b7bc18088c.4b756e616c2d5468657369732d46696e616c2e706466.pdf Hindi to English Machine Translation, Kunal Sachdeva]<br />
<br />
* [https://arxiv.org/ftp/arxiv/papers/1702/1702.01587.pdf A Hybrid Approach For Hindi-English Machine Translation]<br />
<br />
<br />
<br />
===Morphology===<br />
<br />
* [http://www.ijsrp.org/research-paper-0613/ijsrp-p18124.pdf Morphology: Indian Languages and European Languages (contains morphological information for both English and Hindi)]<br />
<br />
* [http://web.stanford.edu/group/cslipublications/cslipublications/HPSG/2010/singh-sarma.pdf Hindi Noun Inflection and Distributed Morphology, Smriti Singh and Vaijayanthi M Sarma]<br />
<br />
* [http://learnpunjabi.org/pdf/gslehal-pap21.pdf Hindi Morphological Analyzer and Generator]<br />
<br />
* [http://aclweb.org/anthology/W12-2302 Hindi Derivational Morphological Analyzer]<br />
<br />
<br />
<br />
===Dictionaries===<br />
<br />
* http://hindi-english.org/<br />
<br />
* https://shabdkosh.raftaar.in/Hindi-English-Dictionary<br />
<br />
<br />
===Corpora===<br />
<br />
* [http://opus.nlpl.eu/ Hindi-English Parallel Corpora] <br />
* [http://www.lrec-conf.org/proceedings/lrec2014/pdf/835_Paper.pdf HindEnCorp – Hindi-English and Hindi-only Corpus for Machine Translation]<br />
* [https://arxiv.org/abs/1710.02855 The IIT Bombay English-Hindi Parallel Corpus]<br />
<br />
<br />
<br />
==See also==<br />
<br />
* [[/Pending tests|Pending tests]]<br />
* [[Hindi and Urdu]]<br />
* [[Hindi]]<br />
<br />
[[Category:Hindi and English]]</div>
Dharjunior
https://wiki.apertium.org/w/index.php?title=Hindi&diff=68093
Hindi
2018-12-01T14:42:34Z
<p>Dharjunior: </p>
<hr />
<div><br />
{{TOCD}}<br />
<br />
<br />
<br />
==Resources==<br />
<br />
===Grammars===<br />
<br />
* [http://www.koausa.org/iils/pdf/ModernHindiGrammar.pdf Modern Hindi Grammar, Omkar N. Koul]<br />
<br />
<br />
===Machine Translation===<br />
<br />
* [http://web2py.iiit.ac.in/research_centres/publications/download/mastersthesis.pdf.af2224b7bc18088c.4b756e616c2d5468657369732d46696e616c2e706466.pdf Hindi to English Machine Translation, Kunal Sachdeva]<br />
<br />
* [https://arxiv.org/ftp/arxiv/papers/1702/1702.01587.pdf A Hybrid Approach For Hindi-English Machine Translation]<br />
<br />
<br />
<br />
===Morphology===<br />
<br />
* [http://www.ijsrp.org/research-paper-0613/ijsrp-p18124.pdf Morphology: Indian Languages and European Languages (contains morphological information for both English and Hindi)]<br />
<br />
* [http://web.stanford.edu/group/cslipublications/cslipublications/HPSG/2010/singh-sarma.pdf Hindi Noun Inflection and Distributed Morphology, Smriti Singh and Vaijayanthi M Sarma]<br />
<br />
* [http://learnpunjabi.org/pdf/gslehal-pap21.pdf Hindi Morphological Analyzer and Generator]<br />
<br />
* [http://aclweb.org/anthology/W12-2302 Hindi Derivational Morphological Analyzer]<br />
<br />
<br />
<br />
===Dictionaries===<br />
<br />
* http://hindi-english.org/<br />
<br />
* https://shabdkosh.raftaar.in/Hindi-English-Dictionary<br />
<br />
<br />
===Corpora===<br />
<br />
* [http://opus.nlpl.eu/ Hindi-English Parallel Corpora] <br />
* [http://www.lrec-conf.org/proceedings/lrec2014/pdf/835_Paper.pdf HindEnCorp – Hindi-English and Hindi-only Corpus for Machine Translation]<br />
* [https://arxiv.org/abs/1710.02855 The IIT Bombay English-Hindi Parallel Corpus]<br />
<br />
<br />
<br />
<br />
==Devanagari and Unicode==<br />
<br />
<br />
<br />
[[Category:Hindi|*]]</div>
Dharjunior
https://wiki.apertium.org/w/index.php?title=Hindi_and_English&diff=68092
Hindi and English
2018-12-01T14:41:51Z
<p>Dharjunior: </p>
<hr />
<div>{{TOCD}}<br />
<br />
==English and Hindi for GSOC==<br />
<br />
English–Hindi is sometimes/often discouraged as a GSOC project because:<br />
<br />
# it has been done unsuccessfully at least 3 times already<br />
# hard.<br />
# Google already does it well (GSOC project is "reaching state-of-the-art")<br />
# ...<br />
<br />
==Todo list==<br />
{{main|/Work plan|Work plan}}<br />
<br />
* <s>Check that [https://apertium.svn.sourceforge.net/svnroot/apertium/branches/xupaixkar/rasskaz/en.txt the story] is correctly translated from English to Hindi and that there are no missing sentences.</s><br />
* <s>Fix spelling errors in the story.</s><br />
* <s>Add other personal pronouns on the model of 'I' -- in both the Hindi dictionary and the bilingual dictionary.</s><br />
* <s>Come up with short example sentences (they should include a verb) for common postpositions "with the cat" "under the table". etc.</s><br />
* Every word in the bidix should have a POS tag<br />
* <s>Every noun on the Hindi side should have gender.</s><br />
* Adjectives in English in the bidix should be marked for 'sint'.<br />
* Check if there are any forms missing from [[#Verb_conjugation]]<br />
<br />
==Disambiguation==<br />
<br />
<pre><br />
^बच्चों/बच्चा<n><m><pl><obl>$ <br />
^की/का<post><f><sg><nom>/का<post><f><sg><obl>/का<post><f><pl><nom>/का<post><f><pl><obl>$ <br />
^मॉं/मॉं<n><f><sg><nom>/मॉं<n><f><sg><obl>$ <br />
</pre><br />
<br />
==Grammar stuff==<br />
<br />
===Noun phrase===<br />
<br />
===Postpositional phrase===<br />
<br />
<pre><br />
NP-obl post<br />
</pre><br />
<br />
A postposition can be simple or compound (kā + adverb/noun).<br />
<br />
===Genitive phrase===<br />
<br />
<pre><br />
<br />
SN-obl kā-$GEN.$NBR.nom SN-$GEN.$NBR.nom<br />
<br />
</pre><br />
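<br />
The agreement in this pattern can be illustrated with a small shell sketch: the genitive marker takes the gender and number of the head noun that follows it, not of the possessor. The function names <code>ka_form</code> and <code>genitive_phrase</code> are hypothetical helpers made up for this illustration, and only the nominative forms of kā are covered.<br />
<br />
```shell
#!/bin/bash
# Sketch only: ka_form and genitive_phrase are hypothetical helpers,
# not part of any Apertium package.

# ka_form GENDER NUMBER: nominative form of the genitive marker kā
ka_form() {
    case "$1.$2" in
        m.sg)      echo का ;;
        m.pl)      echo के ;;
        f.sg|f.pl) echo की ;;
        *)         echo "usage: ka_form m|f sg|pl" >&2; return 2 ;;
    esac
}

# genitive_phrase POSSESSOR_OBL HEAD_NOM HEAD_GENDER HEAD_NUMBER
# The marker agrees with the head noun (arguments 3 and 4).
genitive_phrase() {
    echo "$1 $(ka_form "$3" "$4") $2"
}

genitive_phrase बच्चों मॉं f sg
```
<br />
The call at the end rebuilds the phrase used in the [[#Disambiguation]] section: an oblique plural possessor followed by the feminine form of the marker, because the head noun is feminine.<br />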
<br />
===Verb conjugation===<br />
<div style="float:right"><br />
{|class=wikitable<br />
! Tag !! Description<br />
|-<br />
| {{tag|impf}} || Imperfective participle<br />
|-<br />
| {{tag|perf}} || Perfective participle<br />
|-<br />
| {{tag|inf}} || Infinitive<br />
|-<br />
| {{tag|prs}} || Subjunctive<br />
|-<br />
| {{tag|fut}} || Future<br />
|-<br />
| {{tag|imp}} || Imperative<br />
|-<br />
| ... || ...<br />
|- <br />
|}<br />
</div><br />
; Transitive<br />
<br />
* Imperfective participle: -taa, -tii, -te, -tii<br />
* Perfective participle: -aa, -ii, -e, -iim<br />
* Infinitive: -naa<br />
* Subjunctive: -uum, -e, -e, -em, -o, -em<br />
* Future: subjunctive + -gaa, -ge, -gii, -gii<br />
* Imperative: -0, -o, -ie<br />
* Verbal adverbs: -kar / -0 kar<br />
<br />
<pre><br />
बोल; बोलना; inf.nom; vblex.tv <br />
बोल; बोलने; inf.obl; vblex.tv <br />
बोल; बोलता; impf.m.sg; vblex.tv<br />
बोल; बोलती; impf.f.sg; vblex.tv<br />
बोल; बोलते; impf.m.pl; vblex.tv<br />
बोल; बोलती; impf.f.pl; vblex.tv<br />
बोल; बोला; perf.m.sg; vblex.tv<br />
बोल; बोली; perf.f.sg; vblex.tv<br />
बोल; बोले; perf.m.pl; vblex.tv<br />
बोल; बोली; perf.f.pl; vblex.tv<br />
बोल; बोलूं; prs.p1.sg; vblex.tv<br />
बोल; बोले; prs.p2.sg; vblex.tv<br />
बोल; बोले; prs.p3.sg; vblex.tv<br />
बोल; बोलें; prs.p1.pl; vblex.tv<br />
बोल; बोलो; prs.p2.pl; vblex.tv<br />
बोल; बोलें; prs.p3.pl; vblex.tv<br />
बोल; बोलूंगा; fut.p1.m.sg; vblex.tv<br />
बोल; बोलेगा; fut.p2.m.sg; vblex.tv<br />
बोल; बोलेगा; fut.p3.m.sg; vblex.tv<br />
बोल; बोलेंगे; fut.p1.m.pl; vblex.tv<br />
बोल; बोलोगे; fut.p2.m.pl; vblex.tv<br />
बोल; बोलेंगे; fut.p3.m.pl; vblex.tv<br />
बोल; बोलूंगी; fut.p1.f.sg; vblex.tv<br />
बोल; बोलेगी; fut.p2.f.sg; vblex.tv<br />
बोल; बोलेगी; fut.p3.f.sg; vblex.tv<br />
बोल; बोलेंगी; fut.p1.f.pl; vblex.tv<br />
बोल; बोलोगी; fut.p2.f.pl; vblex.tv<br />
बोल; बोलेंगी; fut.p3.f.pl; vblex.tv<br />
बोल; बोल; imp.p2.sg; vblex.tv<br />
बोल; बोलो; imp.p2.pl; vblex.tv<br />
बोल; बोलिए; imp.p2.frm.pl; vblex.tv<br />
बोल; बोलकर; gna; vblex.tv<br />
बोल; बोलके; gna; vblex.tv<br />
बोल; बोलनेवाला; agnt.m.sg; vblex.tv<br />
बोल; बोलनेवाली; agnt.f.sg; vblex.tv<br />
बोल; बोलनेवाले; agnt.m.pl; vblex.tv<br />
बोल; बोलनेवाली; agnt.f.pl; vblex.tv<br />
</pre><br />
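<br />
The rule "future = subjunctive + ending" from the list above can be checked against the table with a short shell sketch. It assumes the endings split as the table suggests: -gaa for masculine singular, -ge for masculine plural and -gii for feminine in both numbers. The function name <code>future_forms</code> is a hypothetical helper, and the subjunctive forms are copied from the table rather than generated.<br />
<br />
```shell
#!/bin/bash
# Sketch only: future_forms is a hypothetical helper, not an Apertium tool.

# future_forms GENDER: print "form; fut.PERSON.GENDER.NUMBER" for each
# subjunctive form of बोल, in the order used by the table.
future_forms() {
    gender=$1
    # each item is SUBJUNCTIVE_FORM:PERSON.NUMBER
    for pair in बोलूं:p1.sg बोले:p2.sg बोले:p3.sg बोलें:p1.pl बोलो:p2.pl बोलें:p3.pl; do
        form=${pair%%:*}
        tag=${pair#*:}
        person=${tag%%.*}
        number=${tag##*.}
        case "$gender.$number" in
            f.*)  ending=गी ;;   # feminine: -gii in singular and plural
            m.sg) ending=गा ;;   # masculine singular: -gaa
            m.pl) ending=गे ;;   # masculine plural: -ge
        esac
        echo "$form$ending; fut.$person.$gender.$number"
    done
}

future_forms m
future_forms f
```
<br />
Comparing the output with the <code>fut</code> rows of the table is a quick way to spot a missing or mistyped form.<br />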
<br />
; Intransitive<br />
<br />
===Verb patterns===<br />
<br />
; impf participle + "be" present<br />
<br />
* pres<br />
<br />
; stem + "raha" impf participle + "be" present<br />
<br />
* "be" present + ger<br />
<br />
; perf participle <br />
<br />
* past<br />
<br />
; perf participle + "be" present <br />
<br />
* "have" present + pp<br />
<br />
; perf participle + "be" past<br />
<br />
* past<br />
* "have" past + pp<br />
<br />
; impf participle + "be" past<br />
<br />
* past<br />
* "used to" + inf<br />
<br />
; impf participle + "raha" present + "be" past<br />
<br />
* "be" past + ger<br />
<br />
; future<br />
<br />
* "will" present + inf<br />
<br />
; impf participle + "raha" future<br />
<br />
* "will be" + ger<br />
<br />
; prs<br />
<br />
* "may" present + inf<br />
<br />
<br />
==Resources==<br />
<br />
===Grammars===<br />
<br />
* [http://www.koausa.org/iils/pdf/ModernHindiGrammar.pdf Modern Hindi Grammar, Omkar N. Koul]<br />
<br />
<br />
===Machine Translation===<br />
<br />
* [http://web2py.iiit.ac.in/research_centres/publications/download/mastersthesis.pdf.af2224b7bc18088c.4b756e616c2d5468657369732d46696e616c2e706466.pdf Hindi to English Machine Translation, Kunal Sachdeva]<br />
<br />
* [https://arxiv.org/ftp/arxiv/papers/1702/1702.01587.pdf A Hybrid Approach For Hindi-English Machine Translation]<br />
<br />
<br />
<br />
===Morphology===<br />
<br />
* [http://www.ijsrp.org/research-paper-0613/ijsrp-p18124.pdf Morphology: Indian Languages and European Languages (contains morphological information for both English and Hindi)]<br />
<br />
* [http://web.stanford.edu/group/cslipublications/cslipublications/HPSG/2010/singh-sarma.pdf Hindi Noun Inflection and Distributed Morphology, Smriti Singh and Vaijayanthi M Sarma]<br />
<br />
* [http://learnpunjabi.org/pdf/gslehal-pap21.pdf Hindi Morphological Analyzer and Generator]<br />
<br />
* [http://aclweb.org/anthology/W12-2302 Hindi Derivational Morphological Analyzer]<br />
<br />
<br />
<br />
===Dictionaries===<br />
<br />
* http://hindi-english.org/<br />
<br />
* https://shabdkosh.raftaar.in/Hindi-English-Dictionary<br />
<br />
<br />
===Corpora===<br />
<br />
* [http://opus.nlpl.eu/ Hindi-English Parallel Corpora] <br />
* [http://www.lrec-conf.org/proceedings/lrec2014/pdf/835_Paper.pdf HindEnCorp – Hindi-English and Hindi-only Corpus for Machine Translation]<br />
* [https://arxiv.org/abs/1710.02855 The IIT Bombay English-Hindi Parallel Corpus]<br />
<br />
<br />
<br />
==See also==<br />
<br />
* [[/Pending tests|Pending tests]]<br />
* [[Hindi and Urdu]]<br />
* [[Hindi]]<br />
<br />
[[Category:Hindi and English]]</div>
Dharjunior
https://wiki.apertium.org/w/index.php?title=Hindi_and_English&diff=68091
Hindi and English
2018-12-01T14:27:32Z
<p>Dharjunior: </p>
<hr />
<div>{{TOCD}}<br />
<br />
==English and Hindi for GSOC==<br />
<br />
English–Hindi is sometimes/often discouraged as a GSOC project because:<br />
<br />
# it has been done unsuccessfully at least 3 times already<br />
# hard.<br />
# Google already does it well (GSOC project is "reaching state-of-the-art")<br />
# ...<br />
<br />
==Todo list==<br />
{{main|/Work plan|Work plan}}<br />
<br />
* <s>Check that [https://apertium.svn.sourceforge.net/svnroot/apertium/branches/xupaixkar/rasskaz/en.txt the story] is correctly translated from English to Hindi and that there are no missing sentences.</s><br />
* <s>Fix spelling errors in the story.</s><br />
* <s>Add other personal pronouns on the model of 'I' -- in both the Hindi dictionary and the bilingual dictionary.</s><br />
* <s>Come up with short example sentences (they should include a verb) for common postpositions "with the cat" "under the table". etc.</s><br />
* Every word in the bidix should have a POS tag<br />
* <s>Every noun on the Hindi side should have gender.</s><br />
* Adjectives in English in the bidix should be marked for 'sint'.<br />
* Check if there are any forms missing from [[#Verb_conjugation]]<br />
<br />
==Disambiguation==<br />
<br />
<pre><br />
^बच्चों/बच्चा<n><m><pl><obl>$ <br />
^की/का<post><f><sg><nom>/का<post><f><sg><obl>/का<post><f><pl><nom>/का<post><f><pl><obl>$ <br />
^मॉं/मॉं<n><f><sg><nom>/मॉं<n><f><sg><obl>$ <br />
</pre><br />
<br />
==Grammar stuff==<br />
<br />
===Noun phrase===<br />
<br />
===Postpositional phrase===<br />
<br />
<pre><br />
NP-obl post<br />
</pre><br />
<br />
A postposition can be simple or compound (kā + adverb/noun).<br />
<br />
===Genitive phrase===<br />
<br />
<pre><br />
<br />
SN-obl kā-$GEN.$NBR.nom SN-$GEN.$NBR.nom<br />
<br />
</pre><br />
<br />
===Verb conjugation===<br />
<div style="float:right"><br />
{|class=wikitable<br />
! Tag !! Description<br />
|-<br />
| {{tag|impf}} || Imperfective participle<br />
|-<br />
| {{tag|perf}} || Perfective participle<br />
|-<br />
| {{tag|inf}} || Infinitive<br />
|-<br />
| {{tag|prs}} || Subjunctive<br />
|-<br />
| {{tag|fut}} || Future<br />
|-<br />
| {{tag|imp}} || Imperative<br />
|-<br />
| ... || ...<br />
|- <br />
|}<br />
</div><br />
; Transitive<br />
<br />
* Imperfective participle: -taa, -tii, -te, -tii<br />
* Perfective participle: -aa, -ii, -e, -iim<br />
* Infinitive: -naa<br />
* Subjunctive: -uum, -e, -e, -em, -o, -em<br />
* Future: subjunctive + -gaa, -ge, -gii, -gii<br />
* Imperative: -0, -o, -ie<br />
* Verbal adverbs: -kar / -0 kar<br />
<br />
<pre><br />
बोल; बोलना; inf.nom; vblex.tv <br />
बोल; बोलने; inf.obl; vblex.tv <br />
बोल; बोलता; impf.m.sg; vblex.tv<br />
बोल; बोलती; impf.f.sg; vblex.tv<br />
बोल; बोलते; impf.m.pl; vblex.tv<br />
बोल; बोलती; impf.f.pl; vblex.tv<br />
बोल; बोला; perf.m.sg; vblex.tv<br />
बोल; बोली; perf.f.sg; vblex.tv<br />
बोल; बोले; perf.m.pl; vblex.tv<br />
बोल; बोली; perf.f.pl; vblex.tv<br />
बोल; बोलूं; prs.p1.sg; vblex.tv<br />
बोल; बोले; prs.p2.sg; vblex.tv<br />
बोल; बोले; prs.p3.sg; vblex.tv<br />
बोल; बोलें; prs.p1.pl; vblex.tv<br />
बोल; बोलो; prs.p2.pl; vblex.tv<br />
बोल; बोलें; prs.p3.pl; vblex.tv<br />
बोल; बोलूंगा; fut.p1.m.sg; vblex.tv<br />
बोल; बोलेगा; fut.p2.m.sg; vblex.tv<br />
बोल; बोलेगा; fut.p3.m.sg; vblex.tv<br />
बोल; बोलेंगे; fut.p1.m.pl; vblex.tv<br />
बोल; बोलोगे; fut.p2.m.pl; vblex.tv<br />
बोल; बोलेंगे; fut.p3.m.pl; vblex.tv<br />
बोल; बोलूंगी; fut.p1.f.sg; vblex.tv<br />
बोल; बोलेगी; fut.p2.f.sg; vblex.tv<br />
बोल; बोलेगी; fut.p3.f.sg; vblex.tv<br />
बोल; बोलेंगी; fut.p1.f.pl; vblex.tv<br />
बोल; बोलोगी; fut.p2.f.pl; vblex.tv<br />
बोल; बोलेंगी; fut.p3.f.pl; vblex.tv<br />
बोल; बोल; imp.p2.sg; vblex.tv<br />
बोल; बोलो; imp.p2.pl; vblex.tv<br />
बोल; बोलिए; imp.p2.frm.pl; vblex.tv<br />
बोल; बोलकर; gna; vblex.tv<br />
बोल; बोलके; gna; vblex.tv<br />
बोल; बोलनेवाला; agnt.m.sg; vblex.tv<br />
बोल; बोलनेवाली; agnt.f.sg; vblex.tv<br />
बोल; बोलनेवाले; agnt.m.pl; vblex.tv<br />
बोल; बोलनेवाली; agnt.f.pl; vblex.tv<br />
</pre><br />
<br />
; Intransitive<br />
<br />
===Verb patterns===<br />
<br />
; impf participle + "be" present<br />
<br />
* pres<br />
<br />
; stem + "raha" impf participle + "be" present<br />
<br />
* "be" present + ger<br />
<br />
; perf participle <br />
<br />
* past<br />
<br />
; perf participle + "be" present <br />
<br />
* "have" present + pp<br />
<br />
; perf participle + "be" past<br />
<br />
* past<br />
* "have" past + pp<br />
<br />
; impf participle + "be" past<br />
<br />
* past<br />
* "used to" + inf<br />
<br />
; impf participle + "raha" present + "be" past<br />
<br />
* "be" past + ger<br />
<br />
; future<br />
<br />
* "will" present + inf<br />
<br />
; impf participle + "raha" future<br />
<br />
* "will be" + ger<br />
<br />
; prs<br />
<br />
* "may" present + inf<br />
<br />
<br />
==Resources==<br />
<br />
===Grammars===<br />
<br />
* [http://www.koausa.org/iils/pdf/ModernHindiGrammar.pdf Modern Hindi Grammar, Omkar N. Koul]<br />
<br />
<br />
===Machine Translation===<br />
<br />
* [http://web2py.iiit.ac.in/research_centres/publications/download/mastersthesis.pdf.af2224b7bc18088c.4b756e616c2d5468657369732d46696e616c2e706466.pdf Hindi to English Machine Translation, Kunal Sachdeva]<br />
<br />
* [https://arxiv.org/ftp/arxiv/papers/1702/1702.01587.pdf A Hybrid Approach For Hindi-English Machine Translation]<br />
<br />
<br />
<br />
===Morphology===<br />
<br />
* [http://www.ijsrp.org/research-paper-0613/ijsrp-p18124.pdf Morphology: Indian Languages and European Languages (contains morphological information for both English and Hindi)]<br />
<br />
* [http://web.stanford.edu/group/cslipublications/cslipublications/HPSG/2010/singh-sarma.pdf Hindi Noun Inflection and Distributed Morphology, Smriti Singh and Vaijayanthi M Sarma]<br />
<br />
* [http://learnpunjabi.org/pdf/gslehal-pap21.pdf Hindi Morphological Analyzer and Generator]<br />
<br />
* [http://aclweb.org/anthology/W12-2302 Hindi Derivational Morphological Analyzer]<br />
<br />
<br />
<br />
===Dictionaries===<br />
<br />
* http://hindi-english.org/<br />
<br />
* https://shabdkosh.raftaar.in/Hindi-English-Dictionary<br />
<br />
<br />
<br />
<br />
==See also==<br />
<br />
* [[/Pending tests|Pending tests]]<br />
* [[Hindi and Urdu]]<br />
* [[Hindi]]<br />
<br />
[[Category:Hindi and English]]</div>
Dharjunior
https://wiki.apertium.org/w/index.php?title=Hindi_and_English&diff=68090
Hindi and English
2018-12-01T11:29:01Z
<p>Dharjunior: </p>
<hr />
<div>{{TOCD}}<br />
<br />
==English and Hindi for GSOC==<br />
<br />
English–Hindi is sometimes/often discouraged as a GSOC project because:<br />
<br />
# it has been done unsuccessfully at least 3 times already<br />
# hard.<br />
# Google already does it well (GSOC project is "reaching state-of-the-art")<br />
# ...<br />
<br />
==Todo list==<br />
{{main|/Work plan|Work plan}}<br />
<br />
* <s>Check that [https://apertium.svn.sourceforge.net/svnroot/apertium/branches/xupaixkar/rasskaz/en.txt the story] is correctly translated from English to Hindi and that there are no missing sentences.</s><br />
* <s>Fix spelling errors in the story.</s><br />
* <s>Add other personal pronouns on the model of 'I' -- in both the Hindi dictionary and the bilingual dictionary.</s><br />
* <s>Come up with short example sentences (they should include a verb) for common postpositions "with the cat" "under the table". etc.</s><br />
* Every word in the bidix should have a POS tag<br />
* <s>Every noun on the Hindi side should have gender.</s><br />
* Adjectives in English in the bidix should be marked for 'sint'.<br />
* Check if there are any forms missing from [[#Verb_conjugation]]<br />
<br />
==Disambiguation==<br />
<br />
<pre><br />
^बच्चों/बच्चा<n><m><pl><obl>$ <br />
^की/का<post><f><sg><nom>/का<post><f><sg><obl>/का<post><f><pl><nom>/का<post><f><pl><obl>$ <br />
^मॉं/मॉं<n><f><sg><nom>/मॉं<n><f><sg><obl>$ <br />
</pre><br />
<br />
==Grammar stuff==<br />
<br />
===Noun phrase===<br />
<br />
===Postpositional phrase===<br />
<br />
<pre><br />
NP-obl post<br />
</pre><br />
<br />
A postposition can be simple or compound (kā + adverb/noun).<br />
<br />
===Genitive phrase===<br />
<br />
<pre><br />
<br />
SN-obl kā-$GEN.$NBR.nom SN-$GEN.$NBR.nom<br />
<br />
</pre><br />
<br />
===Verb conjugation===<br />
<div style="float:right"><br />
{|class=wikitable<br />
! Tag !! Description<br />
|-<br />
| {{tag|impf}} || Imperfective participle<br />
|-<br />
| {{tag|perf}} || Perfective participle<br />
|-<br />
| {{tag|inf}} || Infinitive<br />
|-<br />
| {{tag|prs}} || Subjunctive<br />
|-<br />
| {{tag|fut}} || Future<br />
|-<br />
| {{tag|imp}} || Imperative<br />
|-<br />
| ... || ...<br />
|- <br />
|}<br />
</div><br />
; Transitive<br />
<br />
* Imperfective participle: -taa, -tii, -te, -tii<br />
* Perfective participle: -aa, -ii, -e, -iim<br />
* Infinitive: -naa<br />
* Subjunctive: -uum, -e, -e, -em, -o, -em<br />
* Future: subjunctive + -gaa, -ge, -gii, -gii<br />
* Imperative: -0, -o, -ie<br />
* Verbal adverbs: -kar / -0 kar<br />
<br />
<pre><br />
बोल; बोलना; inf.nom; vblex.tv <br />
बोल; बोलने; inf.obl; vblex.tv <br />
बोल; बोलता; impf.m.sg; vblex.tv<br />
बोल; बोलती; impf.f.sg; vblex.tv<br />
बोल; बोलते; impf.m.pl; vblex.tv<br />
बोल; बोलती; impf.f.pl; vblex.tv<br />
बोल; बोला; perf.m.sg; vblex.tv<br />
बोल; बोली; perf.f.sg; vblex.tv<br />
बोल; बोले; perf.m.pl; vblex.tv<br />
बोल; बोली; perf.f.pl; vblex.tv<br />
बोल; बोलूं; prs.p1.sg; vblex.tv<br />
बोल; बोले; prs.p2.sg; vblex.tv<br />
बोल; बोले; prs.p3.sg; vblex.tv<br />
बोल; बोलें; prs.p1.pl; vblex.tv<br />
बोल; बोलो; prs.p2.pl; vblex.tv<br />
बोल; बोलें; prs.p3.pl; vblex.tv<br />
बोल; बोलूंगा; fut.p1.m.sg; vblex.tv<br />
बोल; बोलेगा; fut.p2.m.sg; vblex.tv<br />
बोल; बोलेगा; fut.p3.m.sg; vblex.tv<br />
बोल; बोलेंगे; fut.p1.m.pl; vblex.tv<br />
बोल; बोलोगे; fut.p2.m.pl; vblex.tv<br />
बोल; बोलेंगे; fut.p3.m.pl; vblex.tv<br />
बोल; बोलूंगी; fut.p1.f.sg; vblex.tv<br />
बोल; बोलेगी; fut.p2.f.sg; vblex.tv<br />
बोल; बोलेगी; fut.p3.f.sg; vblex.tv<br />
बोल; बोलेंगी; fut.p1.f.pl; vblex.tv<br />
बोल; बोलोगी; fut.p2.f.pl; vblex.tv<br />
बोल; बोलेंगी; fut.p3.f.pl; vblex.tv<br />
बोल; बोल; imp.p2.sg; vblex.tv<br />
बोल; बोलो; imp.p2.pl; vblex.tv<br />
बोल; बोलिए; imp.p2.frm.pl; vblex.tv<br />
बोल; बोलकर; gna; vblex.tv<br />
बोल; बोलके; gna; vblex.tv<br />
बोल; बोलनेवाला; agnt.m.sg; vblex.tv<br />
बोल; बोलनेवाली; agnt.f.sg; vblex.tv<br />
बोल; बोलनेवाले; agnt.m.pl; vblex.tv<br />
बोल; बोलनेवाली; agnt.f.pl; vblex.tv<br />
</pre><br />
<br />
; Intransitive<br />
<br />
===Verb patterns===<br />
<br />
; impf participle + "be" present<br />
<br />
* pres<br />
<br />
; stem + "raha" impf participle + "be" present<br />
<br />
* "be" present + ger<br />
<br />
; perf participle <br />
<br />
* past<br />
<br />
; perf participle + "be" present <br />
<br />
* "have" present + pp<br />
<br />
; perf participle + "be" past<br />
<br />
* past<br />
* "have" past + pp<br />
<br />
; impf participle + "be" past<br />
<br />
* past<br />
* "used to" + inf<br />
<br />
; impf participle + "raha" present + "be" past<br />
<br />
* "be" past + ger<br />
<br />
; future<br />
<br />
* "will" present + inf<br />
<br />
; impf participle + "raha" future<br />
<br />
* "will be" + ger<br />
<br />
; prs<br />
<br />
* "may" present + inf<br />
<br />
<br />
==Resources==<br />
<br />
<br />
===Morphology===<br />
<br />
* [http://www.ijsrp.org/research-paper-0613/ijsrp-p18124.pdf Morphology: Indian Languages and European Languages (contains morphological information for both English and Hindi)]<br />
<br />
<br />
* [http://web.stanford.edu/group/cslipublications/cslipublications/HPSG/2010/singh-sarma.pdf Hindi Noun Inflection and Distributed Morphology, Smriti Singh and Vaijayanthi M Sarma]<br />
<br />
* [http://learnpunjabi.org/pdf/gslehal-pap21.pdf Hindi Morphological Analyzer and Generator]<br />
<br />
* [http://aclweb.org/anthology/W12-2302 Hindi Derivational Morphological Analyzer]<br />
<br />
<br />
<br />
<br />
<br />
==See also==<br />
<br />
* [[/Pending tests|Pending tests]]<br />
* [[Hindi and Urdu]]<br />
* [[Hindi]]<br />
<br />
[[Category:Hindi and English]]</div>
Dharjunior
https://wiki.apertium.org/w/index.php?title=Hindi_and_English&diff=68089
Hindi and English
2018-12-01T11:23:03Z
<p>Dharjunior: </p>
<hr />
<div>{{TOCD}}<br />
<br />
==English and Hindi for GSOC==<br />
<br />
English–Hindi is sometimes/often discouraged as a GSOC project because:<br />
<br />
# it has been done unsuccessfully at least 3 times already<br />
# hard.<br />
# Google already does it well (GSOC project is "reaching state-of-the-art")<br />
# ...<br />
<br />
==Todo list==<br />
{{main|/Work plan|Work plan}}<br />
<br />
* <s>Check that [https://apertium.svn.sourceforge.net/svnroot/apertium/branches/xupaixkar/rasskaz/en.txt the story] is correctly translated from English to Hindi and that there are no missing sentences.</s><br />
* <s>Fix spelling errors in the story.</s><br />
* <s>Add other personal pronouns on the model of 'I' -- in both the Hindi dictionary and the bilingual dictionary.</s><br />
* <s>Come up with short example sentences (they should include a verb) for common postpositions "with the cat" "under the table". etc.</s><br />
* Every word in the bidix should have a POS tag<br />
* <s>Every noun on the Hindi side should have gender.</s><br />
* Adjectives in English in the bidix should be marked for 'sint'.<br />
* Check if there are any forms missing from [[#Verb_conjugation]]<br />
<br />
==Disambiguation==<br />
<br />
<pre><br />
^बच्चों/बच्चा<n><m><pl><obl>$ <br />
^की/का<post><f><sg><nom>/का<post><f><sg><obl>/का<post><f><pl><nom>/का<post><f><pl><obl>$ <br />
^मॉं/मॉं<n><f><sg><nom>/मॉं<n><f><sg><obl>$ <br />
</pre><br />
<br />
==Grammar stuff==<br />
<br />
===Noun phrase===<br />
<br />
===Postpositional phrase===<br />
<br />
<pre><br />
NP-obl post<br />
</pre><br />
<br />
A postposition can be simple or compound (kā + adverb/noun).<br />
<br />
===Genitive phrase===<br />
<br />
<pre><br />
<br />
SN-obl kā-$GEN.$NBR.nom SN-$GEN.$NBR.nom<br />
<br />
</pre><br />
<br />
===Verb conjugation===<br />
<div style="float:right"><br />
{|class=wikitable<br />
! Tag !! Description<br />
|-<br />
| {{tag|impf}} || Imperfective participle<br />
|-<br />
| {{tag|perf}} || Perfective participle<br />
|-<br />
| {{tag|inf}} || Infinitive<br />
|-<br />
| {{tag|prs}} || Subjunctive<br />
|-<br />
| {{tag|fut}} || Future<br />
|-<br />
| {{tag|imp}} || Imperative<br />
|-<br />
| ... || ...<br />
|- <br />
|}<br />
</div><br />
; Transitive<br />
<br />
* Imperfective participle: -taa, -tii, -te, -tii<br />
* Perfective participle: -aa, -ii, -e, -iim<br />
* Infinitive: -naa<br />
* Subjunctive: -uum, -e, -e, -em, -o, -em<br />
* Future: subjunctive + -gaa, -ge, -gii, -gii<br />
* Imperative: -0, -o, -ie<br />
* Verbal adverbs: -kar / -0 kar<br />
<br />
<pre><br />
बोल; बोलना; inf.nom; vblex.tv <br />
बोल; बोलने; inf.obl; vblex.tv <br />
बोल; बोलता; impf.m.sg; vblex.tv<br />
बोल; बोलती; impf.f.sg; vblex.tv<br />
बोल; बोलते; impf.m.pl; vblex.tv<br />
बोल; बोलती; impf.f.pl; vblex.tv<br />
बोल; बोला; perf.m.sg; vblex.tv<br />
बोल; बोली; perf.f.sg; vblex.tv<br />
बोल; बोले; perf.m.pl; vblex.tv<br />
बोल; बोली; perf.f.pl; vblex.tv<br />
बोल; बोलूं; prs.p1.sg; vblex.tv<br />
बोल; बोले; prs.p2.sg; vblex.tv<br />
बोल; बोले; prs.p3.sg; vblex.tv<br />
बोल; बोलें; prs.p1.pl; vblex.tv<br />
बोल; बोलो; prs.p2.pl; vblex.tv<br />
बोल; बोलें; prs.p3.pl; vblex.tv<br />
बोल; बोलूंगा; fut.p1.m.sg; vblex.tv<br />
बोल; बोलेगा; fut.p2.m.sg; vblex.tv<br />
बोल; बोलेगा; fut.p3.m.sg; vblex.tv<br />
बोल; बोलेंगे; fut.p1.m.pl; vblex.tv<br />
बोल; बोलोगे; fut.p2.m.pl; vblex.tv<br />
बोल; बोलेंगे; fut.p3.m.pl; vblex.tv<br />
बोल; बोलूंगी; fut.p1.f.sg; vblex.tv<br />
बोल; बोलेगी; fut.p2.f.sg; vblex.tv<br />
बोल; बोलेगी; fut.p3.f.sg; vblex.tv<br />
बोल; बोलेंगी; fut.p1.f.pl; vblex.tv<br />
बोल; बोलोगी; fut.p2.f.pl; vblex.tv<br />
बोल; बोलेंगी; fut.p3.f.pl; vblex.tv<br />
बोल; बोल; imp.p2.sg; vblex.tv<br />
बोल; बोलो; imp.p2.pl; vblex.tv<br />
बोल; बोलिए; imp.p2.frm.pl; vblex.tv<br />
बोल; बोलकर; gna; vblex.tv<br />
बोल; बोलके; gna; vblex.tv<br />
बोल; बोलनेवाला; agnt.m.sg; vblex.tv<br />
बोल; बोलनेवाली; agnt.f.sg; vblex.tv<br />
बोल; बोलनेवाले; agnt.m.pl; vblex.tv<br />
बोल; बोलनेवाली; agnt.f.pl; vblex.tv<br />
</pre><br />
<br />
; Intransitive<br />
<br />
===Verb patterns===<br />
<br />
; impf participle + "be" present<br />
<br />
* pres<br />
<br />
; stem + "raha" impf participle + "be" present<br />
<br />
* "be" present + ger<br />
<br />
; perf participle <br />
<br />
* past<br />
<br />
; perf participle + "be" present <br />
<br />
* "have" present + pp<br />
<br />
; perf participle + "be" past<br />
<br />
* past<br />
* "have" past + pp<br />
<br />
; impf participle + "be" past<br />
<br />
* past<br />
* "used to" + inf<br />
<br />
; impf participle + "raha" present + "be" past<br />
<br />
* "be" past + ger<br />
<br />
; future<br />
<br />
* "will" present + inf<br />
<br />
; impf participle + "raha" future<br />
<br />
* "will be" + ger<br />
<br />
; prs<br />
<br />
* "may" present + inf<br />
<br />
<br />
==Resources==<br />
<br />
<br />
===Morphology===<br />
<br />
* [http://web.stanford.edu/group/cslipublications/cslipublications/HPSG/2010/singh-sarma.pdf Hindi Noun Inflection and Distributed Morphology, Smriti Singh and Vaijayanthi M Sarma]<br />
<br />
<br />
<br />
<br />
==See also==<br />
<br />
* [[/Pending tests|Pending tests]]<br />
* [[Hindi and Urdu]]<br />
* [[Hindi]]<br />
<br />
[[Category:Hindi and English]]</div>
Dharjunior
https://wiki.apertium.org/w/index.php?title=Calculating_coverage&diff=68088
Calculating coverage
2018-12-01T10:08:20Z
<p>Dharjunior: </p>
<hr />
<div><br />
[[Calculer la couverture|En français]]<br />
<br />
==Simple bidix-trimmed coverage testing==<br />
<br />
First install apertium-cleanstream:<br />
<br />
svn checkout https://svn.code.sf.net/p/apertium/svn/trunk/apertium-tools/apertium-cleanstream<br />
cd apertium-cleanstream<br />
make<br />
sudo cp apertium-cleanstream /usr/local/bin<br />
<br />
'''Note - After Apertium's migration to GitHub, this tool is read-only on the SourceForge repository and does not exist on GitHub. If you are interested in migrating this tool to GitHub, see [[Migrating tools to GitHub]].'''<br />
<br />
Then save this as coverage.sh:<br />
<br />
#!/bin/bash<br />
mode=$1<br />
outfile=/tmp/$mode.clean<br />
apertium -d . $mode | apertium-cleanstream -n > $outfile<br />
total=$(grep -c '^\^' $outfile)<br />
unknown=$(grep -c '/\*' $outfile)<br />
bidix_unknown=$(grep -c '/@' $outfile)<br />
known_percent=$(calc -p "round( 100*($total-$unknown-$bidix_unknown)/$total, 3)")<br />
echo "$known_percent % known tokens ($unknown unknown, $bidix_unknown bidix-unknown of total $total tokens)"<br />
echo "Top unknown words:"<br />
grep '/[*@]' $outfile | sort | uniq -c | sort -nr | head<br />
<br />
And run it like<br />
<br />
cat asm.corpus | bash coverage.sh asm-eng-biltrans<br />
<br />
(The bidix-unknown count should always be 0 if your pair uses [[lt-trim|automatic analyser trimming]].)<br />
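<br />
The percentage line in coverage.sh depends on the external <code>calc</code> program. Where that is not installed, the same arithmetic can be done with awk. This is a minimal sketch rather than a drop-in replacement: the function name <code>known_percent</code> is hypothetical, the output is always printed with three decimals, and an empty corpus is reported as 0 instead of failing on a division by zero.<br />
<br />
```shell
#!/bin/bash
# Sketch only: known_percent is a hypothetical helper, not part of coverage.sh.

# known_percent TOTAL UNKNOWN BIDIX_UNKNOWN
# Prints the share of known tokens as a percentage with three decimals.
known_percent() {
    LC_ALL=C awk -v total="$1" -v unknown="$2" -v bidix="$3" 'BEGIN {
        if (total == 0) { print "0.000"; exit }   # empty input: nothing to divide by
        printf "%.3f\n", 100 * (total - unknown - bidix) / total
    }'
}

# 2000 tokens, of which 150 are unknown and 50 are missing from the bidix
known_percent 2000 150 50
```
<br />
Inside coverage.sh this would be called as <code>known_percent "$total" "$unknown" "$bidix_unknown"</code>.<br />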
<br />
==TODO: paradigm-coverage (less naïve)==<br />
On an analysed corpus, we can sum frequencies into bins for each lemma+mainpos, so if the analysed corpus contains<br />
<br />
<pre><br />
musa/mus<n><f><sg><def>/muse<vblex><past><br />
mus/mus<n><f><sg><ind>/mus<n><f><pl><ind>/muse<vblex><imp><br />
musene/mus<n><f><pl><def><br />
</pre><br />
then output has<br />
<pre><br />
3 mus<n><f><br />
2 muse<vblex><br />
</pre><br />
and we can find paradigms that are likely to mess up disambiguation, or where we need to ensure that the bidix contains the highest-frequency paradigm (since the bidix is typically smaller than the monodix).<br />
<br />
We could also weight these numbers by the number of unique forms in the pardef; if the verb pardef has 6 unique forms and the noun only 3, then the above output should be even more skewed:<br />
<pre><br />
0.33 mus<n><f><br />
0.75 muse<vblex><br />
</pre><br />
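<br />
The unweighted binning step could be sketched with awk as follows. This is only an illustration of the idea, under a few assumptions: one token per line in <code>surface/analysis/analysis</code> form, each lemma+mainpos counted once per token, and the gender tag kept for nouns so that the keys match the example output above. The function name <code>paradigm_bins</code> is hypothetical, and the weighting by pardef size is not attempted.<br />
<br />
```shell
#!/bin/bash
# Sketch only: paradigm_bins is a hypothetical helper, not an Apertium tool.

# paradigm_bins: read analysed tokens on stdin, print "count lemma<pos>" bins.
paradigm_bins() {
    awk -F'/' '
    {
        sub(/^\^/, ""); sub(/\$$/, "")        # tolerate ^...$ stream delimiters
        split("", seen)                       # forget keys seen in the previous token
        for (i = 2; i <= NF; i++) {           # field 1 is the surface form
            lt = index($i, "<")
            if (lt == 0) continue             # untagged (unknown) analysis
            lemma = substr($i, 1, lt - 1)
            n = split(substr($i, lt), tags, ">")
            key = lemma tags[1] ">"
            if (tags[1] == "<n" && n > 1 && tags[2] ~ /^<(m|f|mf|nt)$/)
                key = key tags[2] ">"         # keep gender for nouns, as in the example
            if (!(key in seen)) { seen[key] = 1; count[key]++ }
        }
    }
    END { for (key in count) print count[key], key }
    ' | sort -k1,1nr -k2
}

# The three example tokens from above
printf '%s\n' \
    'musa/mus<n><f><sg><def>/muse<vblex><past>' \
    'mus/mus<n><f><sg><ind>/mus<n><f><pl><ind>/muse<vblex><imp>' \
    'musene/mus<n><f><pl><def>' | paradigm_bins
```
<br />
Counting each key once per token is what keeps the second line, with two noun readings, from adding 2 to the noun bin.<br />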
<br />
==Faster coverage testing with frequency lists==<br />
<br />
If words appear several times in your corpus, why bother analysing them several times? We can make a frequency list first and add together the frequencies. This script does some very stupid tokenisation and creates a frequency list:<br />
<br />
make-freqlist.sh:<br />
<pre><br />
#!/bin/bash<br />
<br />
if [[ -t 0 ]]; then<br />
    echo "Expecting a corpus on stdin"<br />
    exit 2<br />
fi<br />
<br />
tr '[:space:][:punct:]' '\n' | grep . | sort | uniq -c | sort -nr<br />
</pre><br />
And this script runs your analyser, summing up the frequencies:<br />
<br />
freqlist-coverage.sh:<br />
<pre><br />
#!/bin/bash<br />
<br />
set -e -u<br />
<br />
if [[ $# -eq 0 || -t 0 ]]; then<br />
    echo "Expecting apertium arguments and a 'sort|uniq -c|sort -nr' style frequency list on stdin"<br />
    echo "For example:"<br />
    echo "\$ < spa.freqlist $0 -d . spa-morph"<br />
    exit 2<br />
fi<br />
<br />
sed 's%^ *%<apertium-notrans>%;s% %</apertium-notrans>%;s%$% .%' |<br />
apertium -f html-noent "$@" |<br />
awk -F'</?apertium-notrans>| *\\^\\./\\.<sent><clb>\\$' '<br />
    /[/][*@]/ {<br />
        unknown+=$2<br />
        if(!printed) print "Top unknown tokens:"<br />
        if(++printed<10) print $2,$3<br />
        next<br />
    }<br />
    {<br />
        known+=$2<br />
    }<br />
    END {<br />
        total=known+unknown<br />
        known_pct=100*known/total<br />
        unk_pct=100*unknown/total<br />
        print known_pct" % known of total "total" tokens"<br />
    }'<br />
</pre> <br />
<br />
Usage:<br />
<br />
$ chmod +x make-freqlist.sh freqlist-coverage.sh<br />
$ bzcat ~/corpora/nno.txt.bz2 |./make-freqlist.sh > nno.freqlist<br />
$ <nno.freqlist ./freqlist-coverage.sh -d ~/apertium-svn/languages/apertium-nno/ nno-morph<br />
<br />
==coverage.py==<br />
<br />
https://svn.code.sf.net/p/apertium/svn/trunk/apertium-tools/coverage.py is a coverage script that wraps curl and bzcat. <br />
<br />
<br />
'''Note - After Apertium's migration to GitHub, this tool is read-only on the SourceForge repository and does not exist on GitHub. If you are interested in migrating this tool to GitHub, see [[Migrating tools to GitHub]].'''<br />
<br />
==See also==<br />
<br />
* [[Wikipedia dumps]]<br />
* [[Cleanstream]]<br />
<br />
[[Category:Documentation]]<br />
[[Category:Documentation in English]]</div>
Dharjunior
https://wiki.apertium.org/w/index.php?title=Calculating_coverage&diff=68087
Calculating coverage
2018-12-01T10:07:28Z
<p>Dharjunior: </p>
<hr />
<div><br />
[[Calculer la couverture|En français]]<br />
<br />
==Simple bidix-trimmed coverage testing==<br />
<br />
First install apertium-cleanstream:<br />
<br />
svn checkout https://svn.code.sf.net/p/apertium/svn/trunk/apertium-tools/apertium-cleanstream<br />
cd apertium-cleanstream<br />
make<br />
sudo cp apertium-cleanstream /usr/local/bin<br />
<br />
'''Note - After Apertium's migration to GitHub, this tool is read-only on the SourceForge repository and does not exist on GitHub. If you are interested in migrating this tool to GitHub, see [[Migrating tools to GitHub]].'''<br />
<br />
Then save this as coverage.sh:<br />
<br />
 #!/bin/bash<br />
 # Usage: cat corpus.txt | bash coverage.sh MODE<br />
 mode=$1<br />
 outfile="/tmp/$mode.clean"<br />
 # Run the mode on stdin and keep the cleaned stream for counting<br />
 apertium -d . "$mode" | apertium-cleanstream -n > "$outfile"<br />
 total=$(grep -c '^\^' "$outfile")<br />
 unknown=$(grep -c '/\*' "$outfile")        # lines with a * mark<br />
 bidix_unknown=$(grep -c '/@' "$outfile")   # lines with an @ mark<br />
 known_percent=$(calc -p "round( 100*($total-$unknown-$bidix_unknown)/$total, 3)")<br />
 echo "$known_percent % known tokens ($unknown unknown, $bidix_unknown bidix-unknown of total $total tokens)"<br />
 echo "Top unknown words:"<br />
 grep '/[*@]' "$outfile" | sort | uniq -c | sort -nr | head<br />
<br />
And run it like<br />
<br />
cat asm.corpus | bash coverage.sh asm-eng-biltrans<br />
<br />
(The bidix-unknown count should always be 0 if your pair uses [[lt-trim|automatic analyser trimming]].)<br />
<br />
==TODO: paradigm-coverage (less naïve)==<br />
On an analysed corpus, we can sum frequencies into bins for each lemma+mainpos, so if the analysed corpus contains<br />
<br />
<pre><br />
musa/mus<n><f><sg><def>/muse<vblex><past><br />
mus/mus<n><f><sg><ind>/mus<n><f><pl><ind>/muse<vblex><imp><br />
musene/mus<n><f><pl><def><br />
</pre><br />
then output has<br />
<pre><br />
3 mus<n><f><br />
2 muse<vblex><br />
</pre><br />
and we can find paradigms that are likely to mess up disambiguation, or where we need to ensure that the bidix contains the highest-frequency paradigm (since the bidix is typically smaller than the monodix).<br />
<br />
We could also weight these numbers by the number of unique forms in the pardef; if the verb pardef has 6 unique forms and the noun only 3, then the above output should be even more skewed:<br />
<pre><br />
0.33 mus<n><f><br />
0.75 muse<vblex><br />
</pre><br />
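
A minimal sketch of the binning step described above, assuming one analysed token per line in the surface/analysis/analysis form shown in the example. The function name paradigm_bins is made up for illustration, and unlike the example output it keeps only the first tag after the lemma (so mus<n> rather than mus<n><f>). Each bin is counted at most once per input line, and no weighting by pardef size is attempted.

```shell
#!/bin/sh
# paradigm_bins (hypothetical helper): read analysed tokens on stdin,
# one per line, and print "count lemma<mainpos>" sorted by count.
paradigm_bins() {
    awk -F'/' '
    {
        split("", seen)                  # empty the per-line set
        for (i = 2; i <= NF; i++) {      # field 1 is the surface form
            # lemma plus its first tag, e.g. mus<n>
            if (match($i, /^[^<]+<[^>]+>/)) {
                key = substr($i, RSTART, RLENGTH)
                if (!(key in seen)) {
                    seen[key] = 1
                    bins[key]++
                }
            }
        }
    }
    END {
        for (key in bins) print bins[key], key
    }' | sort -k1,1nr -k2,2
}
```

Fed the three example lines, this gives 3 for mus<n> and 2 for muse<vblex>.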
<br />
==Faster coverage testing with frequency lists==<br />
<br />
If words appear several times in your corpus, why bother analysing them several times? We can make a frequency list first and add together the frequencies. This script does some very stupid tokenisation and creates a frequency list:<br />
<br />
make-freqlist.sh:<br />
<pre><br />
#!/bin/bash<br />
<br />
if [[ -t 0 ]]; then<br />
echo "Expecting a corpus on stdin"<br />
exit 2<br />
fi<br />
<br />
tr '[:space:][:punct:]' '\n' | grep . | sort | uniq -c | sort -nr<br />
</pre><br />
And this script runs your analyser, summing up the frequencies:<br />
<br />
freqlist-coverage.sh:<br />
<pre><br />
#!/bin/bash<br />
<br />
set -e -u<br />
<br />
if [[ $# -eq 0 || -t 0 ]]; then<br />
echo "Expecting apertium arguments and a 'sort|uniq -c|sort -nr' style frequency list on stdin"<br />
echo "For example:"<br />
echo "\$ < spa.freqlist $0 -d . spa-morph"<br />
exit 2<br />
fi<br />
<br />
sed 's%^ *%<apertium-notrans>%;s% %</apertium-notrans>%;s%$% .%' |<br />
apertium -f html-noent "$@" |<br />
awk -F'</?apertium-notrans>| *\\^\\./\\.<sent><clb>\\$' '<br />
/[/][*@]/ {<br />
unknown+=$2<br />
if(!printed) print "Top unknown tokens:"<br />
if(++printed<10) print $2,$3<br />
next<br />
}<br />
{<br />
known+=$2<br />
}<br />
END {<br />
total=known+unknown<br />
known_pct=100*known/total<br />
unk_pct=100*unknown/total<br />
print known_pct" % known of total "total" tokens"<br />
}'<br />
</pre> <br />
<br />
Usage:<br />
<br />
$ chmod +x make-freqlist.sh freqlist-coverage.sh<br />
$ bzcat ~/corpora/nno.txt.bz2 |./make-freqlist.sh > nno.freqlist<br />
$ <nno.freqlist ./freqlist-coverage.sh -d ~/apertium-svn/languages/apertium-nno/ nno-morph<br />
<br />
==coverage.py==<br />
<br />
https://svn.code.sf.net/p/apertium/svn/trunk/apertium-tools/coverage.py is a coverage script that wraps curl and bzcat. <br />
<br />
<br />
'''Note - After Apertium's migration to GitHub, this tool is read-only on the SourceForge repository and does not exist on GitHub. If you are interested in migrating this tool to GitHub, see [[Migrating tools to GitHub]].'''<br />
<br />
==See also==<br />
<br />
* [[Wikipedia dumps]]<br />
* [[Cleanstream]]<br />
<br />
[[Category:Documentation]]<br />
[[Category:Documentation in English]]</div>
Dharjunior
https://wiki.apertium.org/w/index.php?title=Calculating_coverage&diff=68086
Calculating coverage
2018-12-01T10:05:23Z
<p>Dharjunior: </p>
<hr />
<div><br />
[[Calculer la couverture|En français]]<br />
<br />
==Simple bidix-trimmed coverage testing==<br />
<br />
First install apertium-cleanstream:<br />
<br />
svn checkout https://svn.code.sf.net/p/apertium/svn/trunk/apertium-tools/apertium-cleanstream<br />
cd apertium-cleanstream<br />
make<br />
sudo cp apertium-cleanstream /usr/local/bin<br />
<br />
'''Note - After Apertium's migration to GitHub, this tool is read-only on the SourceForge repository and does not exist on GitHub. If you are interested in migrating this tool to GitHub, see [[Migrating tools to GitHub]].'''<br />
<br />
Then save this as coverage.sh:<br />
<br />
 #!/bin/bash<br />
 # Usage: cat corpus.txt | bash coverage.sh MODE<br />
 mode=$1<br />
 outfile="/tmp/$mode.clean"<br />
 # Run the mode on stdin and keep the cleaned stream for counting<br />
 apertium -d . "$mode" | apertium-cleanstream -n > "$outfile"<br />
 total=$(grep -c '^\^' "$outfile")<br />
 unknown=$(grep -c '/\*' "$outfile")        # lines with a * mark<br />
 bidix_unknown=$(grep -c '/@' "$outfile")   # lines with an @ mark<br />
 known_percent=$(calc -p "round( 100*($total-$unknown-$bidix_unknown)/$total, 3)")<br />
 echo "$known_percent % known tokens ($unknown unknown, $bidix_unknown bidix-unknown of total $total tokens)"<br />
 echo "Top unknown words:"<br />
 grep '/[*@]' "$outfile" | sort | uniq -c | sort -nr | head<br />
<br />
And run it like<br />
<br />
cat asm.corpus | bash coverage.sh asm-eng-biltrans<br />
<br />
(The bidix-unknown count should always be 0 if your pair uses [[lt-trim|automatic analyser trimming]].)<br />
<br />
==TODO: paradigm-coverage (less naïve)==<br />
On an analysed corpus, we can sum frequencies into bins for each lemma+mainpos, so if the analysed corpus contains<br />
<br />
<pre><br />
musa/mus<n><f><sg><def>/muse<vblex><past><br />
mus/mus<n><f><sg><ind>/mus<n><f><pl><ind>/muse<vblex><imp><br />
musene/mus<n><f><pl><def><br />
</pre><br />
then output has<br />
<pre><br />
3 mus<n><f><br />
2 muse<vblex><br />
</pre><br />
and we can find paradigms that are likely to mess up disambiguation, or where we need to ensure that the bidix contains the highest-frequency paradigm (since the bidix is typically smaller than the monodix).<br />
<br />
We could also weight these numbers by the number of unique forms in the pardef; if the verb pardef has 6 unique forms and the noun only 3, then the above output should be even more skewed:<br />
<pre><br />
0.33 mus<n><f><br />
0.75 muse<vblex><br />
</pre><br />
<br />
==Faster coverage testing with frequency lists==<br />
<br />
If words appear several times in your corpus, why bother analysing them several times? We can make a frequency list first and add together the frequencies. This script does some very stupid tokenisation and creates a frequency list:<br />
<br />
make-freqlist.sh:<br />
<pre><br />
#!/bin/bash<br />
<br />
if [[ -t 0 ]]; then<br />
echo "Expecting a corpus on stdin"<br />
exit 2<br />
fi<br />
<br />
tr '[:space:][:punct:]' '\n' | grep . | sort | uniq -c | sort -nr<br />
</pre><br />
And this script runs your analyser, summing up the frequencies:<br />
<br />
freqlist-coverage.sh:<br />
<pre><br />
#!/bin/bash<br />
<br />
set -e -u<br />
<br />
if [[ $# -eq 0 || -t 0 ]]; then<br />
echo "Expecting apertium arguments and a 'sort|uniq -c|sort -nr' style frequency list on stdin"<br />
echo "For example:"<br />
echo "\$ < spa.freqlist $0 -d . spa-morph"<br />
exit 2<br />
fi<br />
<br />
sed 's%^ *%<apertium-notrans>%;s% %</apertium-notrans>%;s%$% .%' |<br />
apertium -f html-noent "$@" |<br />
awk -F'</?apertium-notrans>| *\\^\\./\\.<sent><clb>\\$' '<br />
/[/][*@]/ {<br />
unknown+=$2<br />
if(!printed) print "Top unknown tokens:"<br />
if(++printed<10) print $2,$3<br />
next<br />
}<br />
{<br />
known+=$2<br />
}<br />
END {<br />
total=known+unknown<br />
known_pct=100*known/total<br />
unk_pct=100*unknown/total<br />
print known_pct" % known of total "total" tokens"<br />
}'<br />
</pre> <br />
<br />
Usage:<br />
<br />
$ chmod +x make-freqlist.sh freqlist-coverage.sh<br />
$ bzcat ~/corpora/nno.txt.bz2 |./make-freqlist.sh > nno.freqlist<br />
$ <nno.freqlist ./freqlist-coverage.sh -d ~/apertium-svn/languages/apertium-nno/ nno-morph<br />
<br />
==coverage.py==<br />
<br />
https://svn.code.sf.net/p/apertium/svn/trunk/apertium-tools/coverage.py is a coverage script that wraps curl and bzcat (?)<br />
<br />
==See also==<br />
<br />
* [[Wikipedia dumps]]<br />
* [[Cleanstream]]<br />
<br />
[[Category:Documentation]]<br />
[[Category:Documentation in English]]</div>
Dharjunior
https://wiki.apertium.org/w/index.php?title=Specific_resources_per_language&diff=68073
Specific resources per language
2018-11-30T13:39:22Z
<p>Dharjunior: </p>
<hr />
<div>{{TOCD}}<br />
The incubator can be found in the 'incubator' column in https://apertium.github.io/apertium-on-github/source-browser.html. It houses language pairs which haven't completely matured and are still being worked on.<br />
<br />
<br />
==Specific resources per language==<br />
<br />
Here are some links to resources that might be useful for expanding on work in the Incubator. Below you can put resources which will be useful in the construction. Try and mark them for licence, or at least free/non-free.<br />
<br />
See also the individual language pages. <br />
<br />
===[[Albanian]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-sqi/blob/master/apertium-sqi.sqi.dix Albanian Monodix]''<br />
<br />
;Resources<br />
* http://www.albanianoverview.com/grammar.htm<br />
* http://www.idividi.com.mk/recnik/index.htm -- albanian--macedonian dictionary (non-free)<br />
<br />
===[[Armenian]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-hye/blob/master/apertium-hye.hye.dix Armenian Monodix]''<br />
<br />
;Resources<br />
<br />
* http://www.armeniapedia.org/index.php?title=Category:Armenian_Language_Lessons<br />
<br />
===[[Assamese and Hindi]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-as-hi/blob/91f3c38b0c636deb620cbd27725d63dd763c5f0b/apertium-as-hi.hi.dix Assamese-Hindi Bidix]''<br />
<br />
<br />
--- Anusuya<br />
<br />
===[[Belarusian]]=== <br />
<br />
* [http://www.vitba.org/fofmb/fofmb.html GFDL grammar of the language]<br />
<br />
===[[Bengali]]===<br />
<br />
* http://bengalinux.sourceforge.net/cgi-bin/anubadok/index.pl -- Free software translation for English→Bengali <br />
* http://anubadok.sf.net/ -- See above<br />
<br />
===[[Bulgarian]]===<br />
:''Dictionary: [https://raw.githubusercontent.com/apertium/apertium-bul/master/apertium-bul.bul.dix Bulgarian Monodix]''<br />
<br />
;Resources<br />
<br />
* [http://www.sfs.nphil.uni-tuebingen.de/iscl/Theses/zhechev.pdf Bulgarian verbal morphology]<br />
<br />
===[[Cornish]]===<br />
<br />
:''Dictionary: [https://sourceforge.net/projects/apertium/files/apertium-cy-en/0.1.0/ Cornish Monodix from SourceForge]''<br />
<br />
'''This resource has not been migrated to GitHub from SVN.'''<br />
<br />
;Resources<br />
<br />
* [http://www.cornishtranslator.com/ Cornish Translator]<br />
* [http://kevindonnelly.org.uk/kernewek/ Cornish-Welsh bilingual wordlist]<br />
<br />
===[[Czech]]===<br />
:''Dictionary: [http://apertium.svn.sourceforge.net/svnroot/apertium/incubator/apertium-pl-cs.cs.dix.xml apertium-pl-cs.cs.dix.xml]'' <br />
'''This resource has not been migrated to GitHub from SVN.'''<br />
<br />
:''Dictionary: [https://github.com/apertium/apertium-eo-cs/blob/c16fa21194a285941307a68e420c194a1825ebc3/apertium-eo-cs.eo-cs.dix Czech-Esperanto Bidix]''<br />
:''Dictionary: [https://github.com/apertium/apertium-cs-sl/tree/062fa172705e16f77302a8096df3733581079fb8 Czech-Slovenian Bidix]''<br />
;Resources<br />
<br />
* [http://nlp.fi.muni.cz/nlp/aisa/NlpCz/Frekvence_slov_lemmat.html Most frequent words] Also includes a list of the most frequent bi- and tri-grams, but these are of little use as multiwords<br />
* [http://users.ox.ac.uk/~tayl0010/links.html James Naughton's links]<br />
* [http://www.czech-language.cz/alphabet/alph-krtiny.html Some complications with diacritics]<br />
* [http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Morphology/index.html Czech morphological guesser] - 'free', but not open source<br />
<br />
===[[Faroese]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-fao/blob/master/apertium-fao.fao.dix Faroese Monodix]''<br />
<br />
;Resources<br />
* [http://giellatekno.uit.no/cgi/d-fao.eng.html U. Tromsø -- Faroese analyser ]<br />
* [https://github.com/apertium/apertium-fao-isl/blob/master/apertium-fao-isl.fao-isl.rlx Faroese Constraint Grammar]<br />
* [http://www.archive.org/details/frskanthologi00denmgoog Faroese-Danish dictionary from 1886]<br />
<br />
===[[Finnish]]===<br />
{{see-also|Omorfi}}<br />
;Resources<br />
<br />
* http://kaino.kotus.fi/sanat/nykysuomi/ &mdash; full form list for Finnish -- LGPL<br />
* [https://kitwiki.csc.fi/twiki/bin/view/KitWiki/OMorFiSFSTVersion#Installation Omorfi–Open Morphology for Finnish language]<br />
* [http://www.ling.helsinki.fi/kieliteknologia/tutkimus/hfst/ Helsinki Finite-State Transducer Technology (HFST)]<br />
<pre><br />
s = lemma<br />
hn = homonymy ref<br />
t = inflection info<br />
tn = inflection number (referring to table)<br />
av = ref to consonant gradation<br />
</pre><br />
<br />
===[[German and English]]===<br />
<br />
German-English bilingual dictionary (>216,000 entries), generated from linguistic data (GPL Version 2 or later) available for [http://www-user.tu-chemnitz.de/~fri/ding/ "Ding: A Dictionary LookUp program"] (version 1.5 2007-04-09) from Frank Richter, [http://tu-chemnitz.de Technische Universität Chemnitz]<br />
<br />
:''[https://github.com/apertium/apertium-eng-deu/blob/master/apertium-eng-deu.eng-deu.dix German-English Dictionary]''<br />
<br />
===[[Greek]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-ell/blob/master/apertium-ell.ell.dix Greek Monodix]''<br />
:''Greek-English Dictionary: [https://github.com/apertium/apertium-ell-eng/blob/master/apertium-ell-eng.eng.dix Greek-English Dictionary]''<br />
<br />
;Resources<br />
<br />
* Greek <-> Ukrainian, Russian, Polish Grammar & Dictionary: http://ellinika.gnu.org.ua/<br />
<br />
===[[Hebrew]]===<br />
<br />
;Resources<br />
<br />
* http://www.mila.cs.technion.ac.il/english/resources/lexicons/ lexicons for Hebrew, in weird XLS format -- GPL<br />
* http://www.mila.cs.technion.ac.il/english/resources/software_downloads/index.html Hebrew Morphological Analyzer (for Hebrew undotted text) -- GPL, but download link behind a password<br />
* http://www.cs.technion.ac.il/~barhaim/MorphTagger/ HMM-based part-of-speech tagger For Hebrew -- GPL<br />
* http://www.cs.technion.ac.il/~erelsgl/bxi/hmntx/teud.html Probabilistic Morphological Analyzer for Hebrew undotted text -- license unknown<br />
* http://hspell.ivrix.org.il/ The hspell Hebrew spell-checker has a mode for analyzing morphological data -- GPL<br />
* http://www.code972.com/blog/hebmorph/ HebMorph is the analyser powering hspell's capabilities -- GPL<br />
<br />
===[[Hindi]]===<br />
{{see-also|Hindi}}<br />
<br />
;Resources<br />
<br />
* POS tagged English-Hindi wordlist: http://indlinux.sourceforge.net/downloads/files/hindidict.txt.bz2<br />
<br />
* https://github.com/unhammer/apertium-en-hi/blob/master/apertium-en-hi.en.dix <br />
* https://github.com/apertium/apertium-hin/blob/master/apertium-hin.hin.dix <br />
* https://github.com/apertium/apertium-urd-hin/blob/master/dev/en-hi-ur.list<br />
* https://github.com/apertium/apertium-urd-hin/blob/master/apertium-urd-hin.urd-hin.dix<br />
<br />
<br />
<br />
===[[Iranian Persian]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-pes/blob/master/apertium-pes.pes.dix Persian Monodix]''<br />
<br />
;Resources<br />
<br />
* [http://books.google.com/books?vid=OCLC20216670&id=Ru1ncSqiRXkC&printsec=titlepage&hl=de#PPA24,M1 Grammar of Persian]<br />
<br />
===[[Ingush]]===<br />
<br />
; Resources<br />
<br />
* [http://www.linguistics.berkeley.edu/~ingush/database.html Lexical database] (non-free)<br />
* [http://books.google.com/books?id=J7wqVHeRWdwC&pg=PA5&lpg=PA5&dq=ingush+father&source=bl&ots=N8TDZudzGZ&sig=JO9X_Y9gio7dUhZWeyZX7j17iPw&hl=ca&ei=vfq4TM6CH86OjAfO94XaDg&sa=X&oi=book_result&ct=result&resnum=3&ved=0CB8Q6AEwAg#v=onepage&q=ingush%20father&f=false Ingush-English dict] (non-free)<br />
<br />
===[[Latvian]]===<br />
;Resources<br />
* https://github.com/PeterisP/morphology GPL full-form dictionary (https://github.com/PeterisP/morphology/blob/master/src/main/resources/Lexicon.xml)<br />
<br />
;See also<br />
* [[Latvian and Russian]]<br />
<br />
===[[Lithuanian]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-lit/blob/master/apertium-lit.lit.dix Lithuanian Monodix]''<br />
<br />
;Resources<br />
<br />
===[[Nogai]]===<br />
<br />
; Resources<br />
<br />
* [http://ksirov.ru/%D1%8F%D0%B7%D1%8B%D0%BA%D0%B8/%D0%BD%D0%BE%D0%B3%D0%B0%D0%B9%D1%81%D0%BA%D0%B8%D0%B9 Grammar Sketch and Russian-Nogai dictionary]<br />
<br />
===[[Ossetian]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-oss/blob/master/apertium-oss.oss.dix Ossetian Monodix]''<br />
<br />
;Resources<br />
<br />
* [http://www.azargoshnasp.net/languages/ossetian/grammersketchossetian.pdf Ossetian: Grammatical Sketch] &mdash; quite nice and comprehensive.<br />
* [http://www.ossetic-studies.org/ Ossetic National Corpus]<br />
<br />
===[[Piemontese]]===<br />
:''Dictionary: [https://sourceforge.net/p/apertium/svn/HEAD/tree/incubator/apertium-it-pms.pms.dix Piemontese Monodix from SourceForge]'' <br />
'''This resource has not been migrated to GitHub from SVN.'''<br />
<br />
;Resources<br />
<br />
* http://members.fortunecity.it/dotorcarlo/vocen1.html Piemontese--English -- public domain<br />
* http://digilander.libero.it/dotor43/indexit.html -- Piemontese grammar incl. 17k word Piemontese--Italian dictionary (POS tagged and partly annotated for inflection). site suggests "© These pages can be freely used for all purposes, but not for political reasons, and not against the laws (no matter what is the country)."<br />
<br />
===[[Portuguese]]===<br />
<br />
Even if Apertium has a stable es-pt pair, the coverage of the Brazilian Portuguese Dictionary built at NILC (Universidade de São Paulo) for Unitex is much better, and could perhaps be used to improve it.<br />
<br />
;Resources<br />
<br />
* [http://www.nilc.icmc.usp.br/nilc/projects/unitex-pb/web/dicionarios.html Recursos Lexicais Português do Brasil]<br />
<br />
We believe it has an LGPL license.<br />
<br />
===[[Punjabi]]===<br />
<br />
; Resources<br />
<br />
* [http://www.lama.univ-savoie.fr/~humayoun/punjabi/index.html Punjabi lexicon]<br />
<br />
===[[Quechua]]===<br />
<br />
;Resources<br />
<br />
* http://www.runasimipi.org/<br />
* AVENUE Quechua-Spanish system. (ask [[User:Francis Tyers|Francis Tyers]])<br />
<br />
===[[Russian]]===<br />
<br />
:''Dictionary: [https://github.com/apertium/apertium-rus/blob/master/apertium-rus.rus.dix monodix]''<br />
:''Bidix: [https://github.com/apertium/apertium-pol-rus/blob/master/apertium-pol-rus.pol-rus.dix Polish-Russian]''<br />
:''Bidix: [https://github.com/apertium/apertium-rus-eng/blob/master/apertium-ru-en.ru.dix English-Russian]''<br />
<br />
;Resources<br />
<br />
* http://www.alphadictionary.com/rusgrammar/<br />
* http://www.seelrc.org:8080/grammar/pdf/stand_alone_russian.pdf<br />
* [http://www.cic.ipn.mx/~sidorov/rmorph/index.html Russian analyser] - non-free, Windows only<br />
* [http://citeseer.ist.psu.edu/cache/papers/cs2/433/http:zSzzSzwww.ling.ohio-state.eduzSz~hanazSzbibliozSzHanaFeldmanBrew2004-RusMorphLite.pdf/hana04resourcelight.pdf Using Czech resources for the morphological analysis of Russian]<br />
* [http://sourceforge.net/projects/pere/ Pere] - free translator, including Russian<->Ukrainian<->English dictionaries. Built from alignments, low quality.<br />
* [http://www.revdanica.com/xdxf/tmp/Muzafarov/inXDXF/rus2taj.xdxf Russian--Tajik phrase dictionary, 41k entries].<br />
* [http://www.lugattj.com/news.php?tid=1&ln=en Another Tajik--Russian dictionary]<br />
<br />
===[[Sanskrit]] '''संस्कृतम्'''===<br />
:''Dictionary: [https://github.com/apertium/apertium-san/blob/master/apertium-san.san.dix Sanskrit Monodix]''<br />
<br />
;Resources<br />
* [http://www.sanskrit-lexicon.uni-koeln.de/ Sanskrit Lexicon at Uni-Koeln]<br />
* [http://www.sanskrit-lexicon.uni-koeln.de/aequery/index.html Apte's En-Sa] dictionary<br />
* [http://www.sanskrit-lexicon.uni-koeln.de/download.html Material available for download].<br />
<br />
===[[Slovakian]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-slk/blob/master/apertium-slk.slk.dix Slovak Monodix]''<br />
<br />
;Resources<br />
<br />
* http://old.bohemica.com/slovak/slovakgrammar.pdf (Slovakian, with some English)<br />
* http://pl.wiktionary.org/wiki/Aneks:J%C4%99zyk_s%C5%82owacki_-_tabele_koniugacji (In Polish)<br />
* http://www.angelfire.com/sk3/quality/Slovak_declension.html<br />
* http://www.juls.savba.sk/msj/<br />
<br />
===[[Thai]]===<br />
* https://github.com/veer66/Yaitron Yaitron English-Thai and Thai-English XML dictionary, license seems standard 4-clause<br />
<br />
===[[Urdu]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-urd/blob/master/apertium-urd.urd.dix Urdu Monodix]''<br />
:''Bidix: [https://github.com/apertium/apertium-urd-hin/blob/master/apertium-urd-hin.urd-hin.dix Hindi-Urdu Bidix]''<br />
<br />
;Resources<br />
* http://www.lama.univ-savoie.fr/~humayoun/UrduMorph/ &mdash; GPL analyser of Urdu<br />
* http://www.crulp.org/software/langproc/E2UMachineTranslationSystem.htm -- Urdu--English MT system<br />
<br />
<br />
==Github Migration==<br />
<br />
For languages whose resources are not yet on GitHub, you can use [[apertium-init]] to create the corresponding repository and add the files from SVN to that repository.<br />
<br />
<br />
<br />
<br />
[[Category:Development]]<br />
[[Category:Repository]]<br />
[[Category:Documentation in English]]</div>
Dharjunior
https://wiki.apertium.org/w/index.php?title=Testvoc&diff=68072
Testvoc
2018-11-30T13:37:43Z
<p>Dharjunior: </p>
<hr />
<div>[[Test de vocabulaire|En français]]<br />
{{TOCD}}<br />
A '''testvoc''' is literally a test of vocabulary. At the most basic level, it just expands an {{sc|sl}} dictionary and runs each possible analysed [[lexical form]] through all the translation stages, to check that every possible input yields a sensible translation in the {{sc|tl}} without <code>#</code> or <code>@</code> symbols.<br />
<br />
However, as transfer rules may introduce errors that are not visible when translating single lexical units, a release-quality language pair also needs testvoc on phrases consisting of several lexical units. Often one can find a lot of the errors by running a large corpus (with all @, / or # symbols removed) through the translator, with debug symbols on, and grepping for [@#/]. <br />
: It would be nice, however, to have a script that testvoc'ed all possible transfer rule runs (without having to run all possible combinations of lexical units, which would take forever). One problem is that transfer rules can refer not only to tags but also to lemmas, and that multi-stage transfer means you have to test fairly long sequences.<br />
<br />
==Trimmed testvoc==<br />
Most new Apertium pairs use automatically trimmed analysers from monolingual dependencies, e.g. with [[lt-trim]] if the analyser is lttoolbox-based.<br />
When using <code>lt-trim</code>, there's no need to testvoc the analyser→bidix step (the '@'-marks), since the analyser will only contain what the bidix contains.<br />
<br />
However, you still need to look for #'s and /'s with<br />
* Corpus testvoc to ensure your transfer rules are correct (see [[#Corpus testvoc]] below), and<br />
* Generation testvoc to ensure all the forms that are in both analyser and bidix also exist in your generator (see next section for real-life script).<br />
<br />
<br />
<small>Since the analyser dix file can be much larger than the trimmed analyser, testvoc scripts that don't take that into account will give false hits. That is, a command like <code>lt-expand complete-analyser.dix | lt-proc -b bidix.bin | apertium-transfer -b foo.t1x foo.t1x.bin | lt-proc -d gen.bin</code> will give lots of @'s that won't appear when running the real pipeline. The [[#Generation testvoc with lttoolbox analyser]] ignores any @ and assumes lt-trim just works.</small><br />
<br />
==Generation testvoc==<br />
<br />
===Generation testvoc with lttoolbox analyser===<br />
The script generation.sh in<br />
https://github.com/apertium/apertium-swe-dan/blob/master/dev/testvoc/generation.sh should work with any pipeline that uses lttoolbox on the analysis side. <br />
<br />
It tests that anything the analyser can produce will go through to generation without '/' or '#'-marks (that is, there is one and only one form generated for anything the analyser can produce). <br />
<br />
It doesn't test that the bidix contains everything the analyser has – we assume your Makefile uses lt-trim for that (all recent pairs with monolingual dependencies do).<br />
<br />
It also only tests single words separated by periods – any generation problem that crops up with more context (typically due to transfer rules) will require a [[#Corpus testvoc]]. But it's a nice and fairly quick way to catch most of your dictionary consistency issues.<br />
<br />
====HFST-based testvoc of lttoolbox analyser====<br />
Another way to testvoc a trimmed analyser, if you have [[HFST]] installed, is to replace <code>lt-expand ana.dix</code> in a simple testvoc pipeline with this sequence:<br />
<pre><br />
lt-print trimmed-analyser.bin |sed 's/ /@_SPACE_@/g' | hfst-txt2fst -e ε | hfst-project -p lower | hfst-fst2strings -c0<br />
</pre><br />
(The -c0 says to never follow cycles; you can also follow them at most once with -c1 etc., but this can take a while depending on how many {{tag|re}}'s you use.)<br />
<br />
If we call that command "expand", then the full testvoc pipeline would be something like<br />
<pre><br />
expand | sed 's/^/^/;s/$/$/' | apertium-pretransfer | apertium-transfer …bin …t1x | lt-proc -d …autogen.bin<br />
</pre><br />
<br />
which may be a more "complete" testvoc.<br />
<br />
Running https://github.com/apertium/apertium-swe-dan/blob/master/dev/testvoc/generation.sh with --hfst as the first argument will make it use this method.<br />
<br />
===Generation testvoc with HFST analyser===<br />
<br />
The Tatar-Bashkir language pair has a testvoc script for use with HFST, see https://github.com/apertium/apertium-tat-bak/blob/master/dev/inconsistency.sh which contains e.g.<br />
<pre><br />
hfst-fst2strings ../.deps/ba.LR-debug.hfst | sort -u | sed 's/:/%/g' | cut -f1 -d'%' | sed 's/^/^/g' | sed 's/$/$ ^.<sent>$/g' | tee $TMPDIR/tmp_testvoc1.txt |<br />
apertium-pretransfer|<br />
apertium-transfer ../apertium-tt-ba.ba-tt.t1x ../ba-tt.t1x.bin ../ba-tt.autobil.bin |<br />
apertium-transfer -n ../apertium-tt-ba.ba-tt.t2x ../ba-tt.t2x.bin | tee $TMPDIR/tmp_testvoc2.txt |<br />
hfst-proc -d ../ba-tt.autogen.hfst > $TMPDIR/tmp_testvoc3.txt<br />
paste -d _ $TMPDIR/tmp_testvoc1.txt $TMPDIR/tmp_testvoc2.txt $TMPDIR/tmp_testvoc3.txt | sed 's/\^.<sent>\$//g' | sed 's/_/ ---------> /g'<br />
</pre><br />
<br />
<br />
==Words in bidix but not in analyser==<br />
<br />
The script bidix-unknowns.sh in https://github.com/apertium/apertium-swe-dan/blob/master/dev/testvoc/ will look for entries in bidix that your analyser would never produce. It should work with any pipeline that uses lttoolbox on the analysis side.<br />
<br />
This is useful for making sure all your hard bidix work is actually useful. It may find lemmas that are completely missing from the analyser, or that simply have the wrong gender-tag or similar.<br />
<br />
<br />
==Corpus testvoc==<br />
<br />
Typically corpus testvoc consists of running a big corpus through your translator, and grepping for @'s, /'s or #'s. You can use a command like the below to first delete debug symbols from input (so you don't get false hits), run it through your translator (the "dgen" mode runs the generation step using lt-proc -d, which shows the full analysis when a word is not in the generator) and then grep for debug symbols (highlighting some context on either side just to make sure you see the symbol):<br />
<pre><br />
xzcat corpora/nno.xz | tr -d '#@/' | apertium -d . nno-nob-dgen | grep '.\{0,6\}[#@/].\{0,6\}'<br />
</pre><br />
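
To get a quick overview before reading individual hits, the translator output can be tallied by symbol. This is only a rough sketch: count_debug_symbols is a hypothetical helper name, and it counts every occurrence on stdin, including any / that legitimately survives in the text.

```shell
#!/bin/sh
# count_debug_symbols (hypothetical helper): tally @, # and / on stdin.
count_debug_symbols() {
    awk '{
        # gsub returns the number of substitutions; replacing each
        # symbol with itself leaves the line unchanged.
        at    += gsub("@", "@")
        hash  += gsub("#", "#")
        slash += gsub("/", "/")
    }
    END { printf "@ %d\n# %d\n/ %d\n", at, hash, slash }'
}
```

It would sit at the end of the pipeline in place of the grep, e.g. <code>... | apertium -d . nno-nob-dgen | count_debug_symbols</code>.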
<br />
<br />
However, sometimes you want to get to the original line in the corpus that gave that @ or #. <br />
<br />
This is one way of looking for @'s in a corpus while still being able to easily find the original line:<br />
<pre><br />
$ cat corpus.txt | apertium-destxt | nl | apertium -f none -d . sme-nob-interchunk1 |grep '\^@' <br />
</pre><br />
<br />
<code>nl</code> will number each line in corpus.txt, inside the superblank that is at each line-end. So if we now see<br />
<br />
<pre><br />
276 ]^part<part>{^å<part>$}$ ^verb<SV><inf><loc-for><m>{^@ballat<V><inf>$}$<br />
...<br />
</pre><br />
<br />
we can get the original line like this:<br />
<pre><br />
$ sed -n '276p' corpus.txt<br />
</pre><br />
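
When there are many hits, the lookup can be looped. A small sketch, assuming every hit line starts with the number that nl added, as in the example above; show_original_lines is a made-up name, and the corpus file is passed as its only argument.

```shell
#!/bin/sh
# show_original_lines (hypothetical helper): read numbered hit lines on
# stdin and print "N: original line" for each distinct N, taken from
# the corpus file given as $1.
show_original_lines() {
    corpus=$1
    awk '{ print $1 }' |          # leading field = line number from nl
    grep -E '^[0-9]+$' |          # ignore lines that do not start with a number
    sort -nu |
    while read -r n; do
        printf '%s: %s\n' "$n" "$(sed -n "${n}p" "$corpus")"
    done
}
```

For example, the grep output of the pipeline above can be piped straight into <code>show_original_lines corpus.txt</code>.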
<br />
<br />
==Testvoc without trimming==<br />
The following is a very simple script illustrating testvoc for 1-stage transfer. The tee command saves the output from transfer, which includes words (actually lexical units) that passed successfully through transfer and words that got an @ prepended. The last file is the output from generation, which includes words that were successfully generated, and words that have a # prepended (anything with an @ will also get a #):<br />
<pre><br />
MONODIX=apertium-nn-nb.nn.dix<br />
T1X=apertium-nn-nb.nn-nb.t1x<br />
BIDIXBIN=nn-nb.autobil.bin<br />
GENERATORBIN=nn-nb.autogen.bin<br />
ALPHABET="ABCDEFGHIJKLMNOPQRSTUVWXYZÆØÅabcdefghijklmnopqrstuvwxyzæøåcqwxzCQWXZéèêóòâôÉÊÈÓÔÒÂáàÁÀäÄöÖ" # from $MONODIX<br />
<br />
# The grep pattern is double-quoted so that $ALPHABET is expanded inside the brackets.<br />
# lt-proc -d runs generation and shows the full analysis for forms missing from the generator.<br />
lt-expand ${MONODIX} | grep -e ':<:' -e "[$ALPHABET]:[$ALPHABET]" |\<br />
sed 's/:<:/%/g' | sed 's/:/%/g' | cut -f2 -d'%' | sed 's/^/^/g' | sed 's/$/$ ^.<sent><clb>$/g' |\<br />
apertium-transfer ${T1X} ${T1X}.bin ${BIDIXBIN} | tee after-transfer.txt |\<br />
lt-proc -d ${GENERATORBIN} > after-generation.txt<br />
</pre><br />
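
To see what the grep/sed/cut stage of that script does, here is the same transformation as a stand-alone function that can be tried on a few hand-written lines in lt-expand's output format. This is only an illustration: expand_to_lus is a made-up name, and the POSIX class [[:alnum:]_] stands in for the alphabet taken from the monodix (so in a C locale it will miss letters like æøå).

```shell
#!/bin/sh
# expand_to_lus (hypothetical helper): turn lt-expand style lines
# (surface:analysis or surface:<:analysis) into lexical units, each
# followed by a sentence-end unit. Lines with :>: are dropped, as in
# the script above.
expand_to_lus() {
    grep -e ':<:' -e '[[:alnum:]_]:[[:alnum:]_]' |
    sed 's/:<:/%/g' | sed 's/:/%/g' |   # normalise the separator to %
    cut -f2 -d'%' |                     # keep only the analysis side
    sed 's/^/^/' | sed 's/$/$ ^.<sent><clb>$/'
}
```

So a line like <code>hund:hund<n><m><sg><ind></code> comes out as <code>^hund<n><m><sg><ind>$ ^.<sent><clb>$</code>, ready to be fed to apertium-transfer.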
<br />
<br />
The following is a real-life <code>inconsistency.sh</code> script from <code>apertium-br-fr</code> that expands the dictionary of Breton and passes it through the translator:<br />
<pre><br />
TMPDIR=/tmp<br />
<br />
lt-expand ../apertium-br-fr.br.dix | grep -v '<prn><enc>' | grep -e ':<:' -e '\w:\w' |\<br />
sed 's/:<:/%/g' | sed 's/:/%/g' | cut -f2 -d'%' | sed 's/^/^/g' | sed 's/$/$ ^.<sent>$/g' |\<br />
tee $TMPDIR/tmp_testvoc1.txt |\<br />
apertium-pretransfer|\<br />
apertium-transfer ../apertium-br-fr.br-fr.t1x ../br-fr.t1x.bin ../br-fr.autobil.bin |\<br />
apertium-interchunk ../apertium-br-fr.br-fr.t2x ../br-fr.t2x.bin |\<br />
apertium-postchunk ../apertium-br-fr.br-fr.t3x ../br-fr.t3x.bin |\<br />
tee $TMPDIR/tmp_testvoc2.txt |\<br />
lt-proc -d ../br-fr.autogen.bin > $TMPDIR/tmp_testvoc3.txt<br />
<br />
paste -d _ $TMPDIR/tmp_testvoc1.txt $TMPDIR/tmp_testvoc2.txt $TMPDIR/tmp_testvoc3.txt |\<br />
sed 's/\^.<sent>\$//g' | sed 's/_/ ---------> /g'<br />
<br />
<br />
</pre><br />
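
The three-column output of this script can be filtered so that only the problematic rows remain. This is a minimal Python sketch under the assumption that each row has exactly three columns joined by the arrow the script prints; <code>failed_rows</code> is a hypothetical helper, and the sample rows (including their tags) are invented.

```python
SEP = " ---------> "

def failed_rows(lines):
    """Yield (input, after_transfer, generated) rows whose output has '@' or '#'."""
    for line in lines:
        parts = line.rstrip("\n").split(SEP)
        if len(parts) != 3:
            continue  # not a three-column row; skip it
        source, transferred, generated = parts
        if "@" in transferred or "#" in generated:
            yield source, transferred, generated

# Invented rows in the shape the script prints.
rows = [
    "^ti<n><m><sg>$  ---------> ^maison<n><f><sg>$  ---------> maison ",
    "^kazh<n><m><sg>$  ---------> ^@kazh<n><m><sg>$  ---------> #kazh ",
]
for row in failed_rows(rows):
    print(row)
```

Note that the script joins columns with <code>_</code> before rewriting it as an arrow, so a lemma containing an underscore would produce extra columns; the sketch simply skips such rows.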
<br />
<br />
==See also==<br />
* [[Automatically trimming a monodix]]<br />
* [[Why we trim]]<br />
* [[Finding errors in dictionaries]]<br />
<br />
[[Category:Terminology]]<br />
[[Category:Quality control]]<br />
[[Category:Development]]<br />
[[Category:Documentation in English]]</div>
Dharjunior
https://wiki.apertium.org/w/index.php?title=Apertium-service&diff=68071
Apertium-service
2018-11-30T13:36:00Z
<p>Dharjunior: </p>
<hr />
<div>{{TOCD}}<br />
<br />
{{Github-unmigrated-tool}}<br />
<br />
==Introduction==<br />
<br />
Apertium-service runs apertium translation pairs as a service and provides '''translate''' and '''detect''' (language recognition) capabilities over an '''XML-RPC''' interface, as well as '''REST''' and '''SOAP''' wrappers. <br />
<br />
The service is implemented as a multi-threaded C++ program which uses libapertium and liblttoolbox to run translation modes (and works by redirecting the C FILE streams within the libraries, instead of starting separate processes and [[NUL flushing]] – see [[Daemon]] for discussion). It also manages a ''resource pool'' of e.g. language pair data, both (eagerly) pre-allocated and (lazily) allocated at need, up to a high water mark.<br />
<br />
A paper describing the service, its interfaces and internal architecture can be found here: http://rua.ua.es/dspace/handle/10045/12031 . The development was also documented on the wiki page [[Apertium going SOA]].<br />
<br />
==Compiling and Installing==<br />
<br />
<br />
This document covers compilation and installation of apertium-service on Unix and Unix-like systems only, but it can also be compiled on other systems that meet the requirements.<br />
<br />
apertium-service, like many other Open Source projects, uses GNU buildtools (like autoconf and automake) to create a build environment.<br />
<br />
<br />
===Requirements===<br />
<br />
<br />
You need the following software installed:<br />
<br />
* liblttoolbox3 - library for lttoolbox, a toolbox for lexical processing, morphological analysis and generation of words.<br />
* libapertium3 - library for apertium, a Free / Open-Source machine translation system.<br />
<br />
* [http://software.wise-guys.nl/libtextcat/ libtextcat0] - a library implementing the classification technique described in Cavnar & Trenkle, "N-Gram-Based Text Categorization". '''(optional)'''<br />
* libapertiumcombine1 - a library implementing a [[Multi-engine_translation_synthesiser|Multi-Engine Translation Synthesiser]] '''(optional)'''<br />
<br />
* [http://xmlrpc-c.sourceforge.net/ libxmlrpc-c3] - a lightweight RPC library based on XML and HTTP for C and C++. (>= 1.16.07-1)<br />
* libxml++2 - a C++ interface to libxml2, the GNOME XML library.<br />
* libboost - Boost C++ libraries are a collection of peer-reviewed, Open Source libraries that extend the functionality of C++. (>= 1.41.0)<br />
<br />
<br />
In particular, the following components from Boost C++ libraries are required:<br />
<br />
* libboost-thread - for portable C++ multi-threading.<br />
* libboost-filesystem - for portable filesystem operations in C++.<br />
* libboost-system - for dealing with system-specific error code values in C++.<br />
* libboost-date-time - for portable date/time operations in C++.<br />
* libboost-regex - regular expression library for C++. <br />
* libboost-program-options - program options library for C++.<br />
<br />
====Ubuntu====<br />
To install the xml and boost components on Ubuntu, use<br />
<pre><br />
sudo apt-get install libxml++2.6-dev libxmlrpc-c3-dev libboost-thread-dev libboost-filesystem-dev \<br />
libboost-system-dev libboost-date-time-dev libboost-regex-dev libboost-program-options-dev libcurl4-openssl-dev<br />
</pre><br />
<br />
====Arch Linux====<br />
To install the xml, boost and other components on Arch Linux, first do:<br />
<pre><br />
sudo pacman -S autoconf automake libtextcat libxml2 libxml++ boost <br />
</pre><br />
Note: libtextcat is optional. <br />
<br />
The other requirements are in AUR. If you have [http://aur.archlinux.org/packages.php?ID=5863 yaourt], you should be able to do:<br />
<pre><br />
sudo yaourt -S lttoolbox apertium xmlrpc-c-abyss<br />
</pre><br />
although you might first have to <code>sudo pacman -Rd xmlrpc-c</code> since that (outdated package) conflicts with xmlrpc-c-abyss (also, AMD64 users might have to use [http://aur.archlinux.org/packages.php?ID=32354 this patch]).<br />
<br />
To make sure apertium-service finds liblttoolbox, do<br />
<pre><br />
$ sudo vi /etc/ld.so.conf<br />
</pre><br />
and append<br />
<pre>/usr/lib</pre><br />
to the file, then run<br />
<pre>sudo ldconfig</pre><br />
<br />
===Checkout from SVN===<br />
'''Note:''' After Apertium's migration to GitHub, this tool is '''read-only''' on the SourceForge repository and does not exist on GitHub. If you are interested in migrating this tool to GitHub, see [[Migrating tools to GitHub]].<br />
<br />
apertium-service can be downloaded from the [[Using_SVN | Apertium SVN repository]] with the following command:<br />
<br />
<pre><br />
$ svn co http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-service<br />
</pre><br />
<br />
Immediately after an SVN checkout, you can generate the files required for building apertium-service with GNU autotools by running the following command:<br />
<br />
<pre><br />
$ ./autogen.sh<br />
</pre><br />
<br />
===Configuring the source tree===<br />
<br />
<br />
The next step is to configure the apertium-service source tree for your particular platform and personal requirements. This is done using the script configure included in the root directory of the distribution. (Developers downloading an unreleased version of the apertium-service source tree will need to have autoconf and libtool installed and will need to run the script autogen.sh before proceeding with the next steps. This is not necessary for official releases.)<br />
<br />
To configure the source tree using all the default options, simply type ./configure. To change the default options, configure accepts a variety of variables and command line options.<br />
<br />
The most important option is the location --prefix where the apertium-service is to be installed later, because apertium-service has to be configured for this location to work correctly. More fine-tuned control of the location of files is possible with additional configure options.<br />
<br />
In addition, it is sometimes necessary to provide the configure script with extra information about the location of your compiler, libraries, or header files. This is done by passing either environment variables or command line options to configure. For more information, see the configure manual page.<br />
<br />
For a short impression of what possibilities you have, here is a typical example which compiles apertium-service for the installation tree /sw/pkg/apertium-service with a particular compiler and flags:<br />
<br />
<pre><br />
$ CC="pgcc" CFLAGS="-O2" \<br />
./configure --prefix=/sw/pkg/apertium-service<br />
</pre><br />
<br />
When configure is run it will take a few seconds to test for the availability of features on your system and build Makefiles which will later be used to compile the server.<br />
<br />
Details on all the different configure options are available on the configure manual page.<br />
<br />
<br />
===Build===<br />
<br />
<br />
Now you can build the various parts which form the apertium-service package by simply running the command:<br />
<br />
<pre><br />
$ make<br />
</pre><br />
<br />
A base configuration takes a few minutes to compile and the time will vary widely depending on your hardware.<br />
<br />
<br />
===Install===<br />
<br />
<br />
Now it's time to install the package under the configured installation PREFIX (see --prefix option above) by running:<br />
<br />
<pre><br />
$ make install<br />
</pre><br />
<br />
<br />
===Customise===<br />
Next, you can customise your apertium-service by editing the configuration files under PREFIX/etc/apertium-service/.<br />
<br />
<pre><br />
$ vi PREFIX/etc/apertium-service/configuration.xml<br />
</pre><br />
<br />
The <code>users.xml</code> file is only needed if you want access control. The <code>configuration.xml</code> file is fairly straightforward:<br />
<br />
<pre><br />
<ApertiumServiceConfiguration><br />
<ServerPort>6173</ServerPort><br />
<ApertiumBase>/usr/local/share/apertium/modes</ApertiumBase><br />
</ApertiumServiceConfiguration><br />
</pre><br />
<br />
The supported fields of the configuration file are the following:<br />
<br />
* <code>ServerPort</code> sets the port on which the XML-RPC service listens.<br />
* <code>ApertiumBase</code> sets the directory where the service looks for the modes files.<br />
<br />
* <code>HighWaterMark</code> sets the high water mark (the maximum number of objects that can be allocated for each resource pool).<br />
<br />
* <code>MultiEngineMachineTranslation</code> is only needed if you want to enable the [[Multi-engine_translation_synthesiser|MEMT]] module (not yet stable). Within that, <br />
** <code><MonolingualDictionary srcLang="br" destLang="fr">/usr/local/share/apertium/apertium-br-fr/br-fr.automorf.bin</MonolingualDictionary></code> gives the path to an analyser for a given language. This analyser is used to lemmatise all the input sentences to the MEMT module to improve alignment. <br />
** <code><LanguageModel lang="de">/home/pasquale/gsoc/lm/europarl.de.blm</LanguageModel></code> gives the path to an [[Moses|IRSTLM]] language model used to score the final hypotheses from the MEMT module.<br />
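
To check what a configuration file actually sets, the simple fields can be read back with a few lines of Python. This is only a sketch: it assumes the file is well-formed XML with the fields as direct children of the root element, and the <code>HighWaterMark</code> value and the helper name <code>read_settings</code> are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical complete file, based on the fragment shown above; the
# HighWaterMark value is chosen only for illustration.
CONFIG = """<ApertiumServiceConfiguration>
  <ServerPort>6173</ServerPort>
  <ApertiumBase>/usr/local/share/apertium/modes</ApertiumBase>
  <HighWaterMark>4</HighWaterMark>
</ApertiumServiceConfiguration>"""

def read_settings(xml_text):
    """Return the top-level fields of the configuration as a dict of strings."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}

settings = read_settings(CONFIG)
print(settings["ServerPort"], settings["ApertiumBase"])
```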
<br />
===Test===<br />
<br />
<br />
Now you can start your apertium-service by immediately running:<br />
<br />
<pre><br />
$ PREFIX/bin/apertium-service<br />
</pre><br />
<br />
and then you should be able to make your first XML-RPC query via URL http://localhost:port/RPC2.<br />
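
To see what such a query looks like on the wire, the request body can be built without contacting any server. The sketch below uses Python's standard <code>xmlrpc.client</code> module and assumes the <code>translate(text, srcLang, destLang)</code> call shape used in the client samples on this page.

```python
import xmlrpc.client

# Serialise a call to translate(text, srcLang, destLang) without sending it.
payload = xmlrpc.client.dumps(("Això no és una prova.", "ca", "en"),
                              methodname="translate")
print(payload)

# Decoding the payload again recovers the parameters and the method name.
params, method = xmlrpc.client.loads(payload)
print(method, params)
```

The printed XML is what an XML-RPC client would POST to the <code>/RPC2</code> path.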
<br />
==Consuming the service==<br />
<br />
The following samples assume that the service you want to consume is located at the address <code>http://www.neuralnoise.com:6173/RPC2</code><br />
<br />
===Python===<br />
<br />
<pre><br />
#!/usr/bin/env python3<br />
<br />
import xmlrpc.client<br />
<br />
# Connect to the XML-RPC endpoint and translate from Catalan to English.<br />
proxy = xmlrpc.client.ServerProxy("http://www.neuralnoise.com:6173/RPC2")<br />
res = proxy.translate("Això no és una prova.", "ca", "en")<br />
print(res["translation"])<br />
</pre><br />
<br />
This should give the output:<br />
<br />
<pre><br />
$ python test.py <br />
This is not a test.<br />
</pre><br />
<br />
This assumes you have the ca-en pair installed. You can find which language pairs are detected with the method <code>languagePairs()</code>, for example:<br />
<br />
<pre><br />
import xmlrpc.client<br />
<br />
proxy = xmlrpc.client.ServerProxy("http://www.neuralnoise.com:6173/RPC2")<br />
<br />
# Print every detected pair as "src-dest", space-separated on one line.<br />
pairs = proxy.languagePairs()<br />
print(" ".join(pair["srcLang"] + "-" + pair["destLang"] for pair in pairs))<br />
<br />
===Ruby===<br />
<br />
<pre><br />
#!/usr/bin/ruby<br />
<br />
require 'xmlrpc/client'<br />
<br />
server = XMLRPC::Client.new("www.neuralnoise.com", "/RPC2", 6173)<br />
puts server.call("translate", "This is a test for the machine translation program.", "en", "es")["translation"]<br />
</pre><br />
<br />
===Perl===<br />
<br />
<pre><br />
#!/usr/bin/perl<br />
<br />
require RPC::XML;<br />
require RPC::XML::Client;<br />
<br />
my $client = RPC::XML::Client->new("http://www.neuralnoise.com:6173/RPC2");<br />
my $result = $client->send_request("translate", "This is a test for the machine translation service.", "en", "es");<br />
<br />
binmode(STDOUT, ":utf8");<br />
<br />
foreach my $key ( sort keys %{$result} ) {<br />
print $key . " = " . $result->value->{$key} . "\n";<br />
}<br />
</pre><br />
<br />
===Java===<br />
<br />
<pre><br />
import java.net.*;<br />
import java.util.*;<br />
<br />
import org.apache.xmlrpc.*;<br />
import org.apache.xmlrpc.client.*;<br />
<br />
public class TestCase {<br />
public static void main(String[] args) throws MalformedURLException, XmlRpcException {<br />
XmlRpcClientConfigImpl config = new XmlRpcClientConfigImpl();<br />
config.setServerURL(new URL("http://www.neuralnoise.com:6173/RPC2"));<br />
config.setBasicEncoding("UTF-8");<br />
<br />
XmlRpcClient client = new XmlRpcClient();<br />
client.setTransportFactory(new XmlRpcSunHttpTransportFactory(client));<br />
client.setConfig(config);<br />
<br />
Object[] params = { <br />
"This is a test for the machine translation service",<br />
"en", "es"};<br />
<br />
Map<String, String> ret = (Map<String, String>) client.execute("translate", params);<br />
System.out.println(ret.get("translation"));<br />
}<br />
}<br />
</pre><br />
<br />
===C++===<br />
<br />
<pre><br />
/*<br />
* g++ test.cc -o test -lxmlrpc_client++ -lxmlrpc++ -lxmlrpc_client -lxmlrpc_cpp -lxmlrpc_xmlparse -lxmlrpc_xmltok -lxmlrpc_server<br />
*/<br />
<br />
#include <cstdlib><br />
#include <string><br />
#include <iostream><br />
#include <map><br />
#include <xmlrpc-c/girerr.hpp><br />
#include <xmlrpc-c/base.hpp><br />
#include <xmlrpc-c/client_simple.hpp><br />
<br />
using namespace std;<br />
<br />
int<br />
main(int argc, char **) {<br />
try {<br />
string const serverUrl("http://www.neuralnoise.com:6173/RPC2");<br />
string const methodName("translate");<br />
<br />
xmlrpc_c::clientSimple myClient;<br />
xmlrpc_c::value result;<br />
<br />
myClient.call(serverUrl, methodName, "sss", &result, "test", "en", "es");<br />
<br />
map<string, xmlrpc_c::value> const resultStruct = xmlrpc_c::value_struct(result);<br />
map<string, xmlrpc_c::value>::const_iterator iter = resultStruct.find("translation");<br />
<br />
string ret = (string)xmlrpc_c::value_string(iter->second);<br />
<br />
cout << "Translation: " << ret << endl;<br />
} catch (exception const& e) {<br />
cerr << "Client threw error: " << e.what() << endl;<br />
} catch (...) {<br />
cerr << "Client threw unexpected error." << endl;<br />
}<br />
return 0;<br />
}<br />
</pre><br />
<br />
===Haskell===<br />
<br />
<pre><br />
import Network.XmlRpc.Client<br />
<br />
server = "http://www.neuralnoise.com:6173/RPC2"<br />
<br />
translate :: String -> String -> String -> String -> IO [(String,String)]<br />
translate url = remote url "translate"<br />
<br />
main = do<br />
let x = "This is a test for the machine translation service."<br />
y = "en"<br />
z = "es"<br />
ret <- translate server x y z<br />
print ret<br />
</pre><br />
<br />
===Emacs-Lisp===<br />
<pre><br />
; put http://www.emacswiki.org/emacs/xml-rpc.el into a member of load-path and<br />
(require 'xml-rpc)<br />
<br />
(defvar apertium-server "http://www.neuralnoise.com:6173/RPC2")<br />
<br />
(xml-rpc-method-call apertium-server 'translate "Això no és una prova." "ca" "en")<br />
<br />
<br />
(defun clean-apertium-pairs (pairs) ; optionally<br />
  "Make a list of pairs (\"from\" . \"to\")."<br />
  (setq apertium-pairs<br />
        (mapcar (lambda (pair)<br />
                  (cons (cdr (assoc "srcLang" pair))<br />
                        (cdr (assoc "destLang" pair))))<br />
                pairs)))<br />
<br />
(clean-apertium-pairs (xml-rpc-method-call<br />
apertium-server 'languagePairs))<br />
;; or async:<br />
(xml-rpc-method-call-async 'clean-apertium-pairs<br />
apertium-server 'languagePairs)<br />
</pre><br />
<br />
==Benchmarks==<br />
<br />
<br />
==See also==<br />
* [[Apertium going SOA]] - documentation of the development of <code>apertium-service</code><br />
* [[Apertium services]]<br />
<br />
[[Category:Services]]</div>
Dharjunior
https://wiki.apertium.org/w/index.php?title=Calculating_coverage&diff=68047
Calculating coverage
2018-11-30T00:20:44Z
<p>Dharjunior: </p>
<hr />
<div>{{Github-migration-check}}<br />
<br />
[[Calculer la couverture|En français]]<br />
<br />
==Simple bidix-trimmed coverage testing==<br />
<br />
First install apertium-cleanstream:<br />
<br />
svn checkout https://svn.code.sf.net/p/apertium/svn/trunk/apertium-tools/apertium-cleanstream<br />
cd apertium-cleanstream<br />
make<br />
sudo cp apertium-cleanstream /usr/local/bin<br />
<br />
'''Note - After Apertium's migration to GitHub, this tool is read-only on the SourceForge repository and does not exist on GitHub. If you are interested in migrating this tool to GitHub, see [[Migrating tools to GitHub]].'''<br />
<br />
Then save this as coverage.sh:<br />
<br />
#!/bin/bash<br />
mode=$1<br />
outfile=/tmp/$mode.clean<br />
apertium -d . $mode | apertium-cleanstream -n > $outfile<br />
total=$(grep -c '^\^' $outfile)<br />
unknown=$(grep -c '/\*' $outfile)<br />
bidix_unknown=$(grep -c '/@' $outfile)<br />
known_percent=$(calc -p "round( 100*($total-$unknown-$bidix_unknown)/$total, 3)")<br />
echo "$known_percent % known tokens ($unknown unknown, $bidix_unknown bidix-unknown of total $total tokens)"<br />
echo "Top unknown words:"<br />
grep '/[*@]' $outfile | sort | uniq -c | sort -nr | head<br />
<br />
And run it like<br />
<br />
cat asm.corpus | bash coverage.sh asm-eng-biltrans<br />
<br />
(The bidix-unknown count should always be 0 if your pair uses [[lt-trim|automatic analyser trimming]].)<br />
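
The arithmetic in <code>coverage.sh</code> can also be sketched in Python, which avoids the dependency on <code>calc</code>. This is an illustration rather than a replacement: it assumes one lexical unit per line, as produced by <code>apertium-cleanstream -n</code>, and the function name <code>coverage</code> and the sample lines are invented.

```python
def coverage(lines):
    """Return (known_percent, unknown, bidix_unknown, total) for cleaned stream lines."""
    total = unknown = bidix_unknown = 0
    for line in lines:
        if line.startswith("^"):
            total += 1          # every lexical unit starts with '^'
        if "/*" in line:
            unknown += 1        # not in the analyser
        if "/@" in line:
            bidix_unknown += 1  # not in the bidix
    if total == 0:
        return 0.0, unknown, bidix_unknown, total
    known = 100 * (total - unknown - bidix_unknown) / total
    return round(known, 3), unknown, bidix_unknown, total

# Invented sample lines.
sample = ["^hus<n>/house<n>$", "^*xyzzy/*xyzzy$",
          "^bil<n>/@bil<n>$", "^og<cnjcoo>/and<cnjcoo>$"]
print(coverage(sample))
```

Like the shell script, this counts lines rather than distinct words, so a frequent unknown word lowers the figure every time it occurs.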
<br />
==TODO: paradigm-coverage (less naïve)==<br />
On an analysed corpus, we can sum frequencies into bins for each lemma+mainpos, so if the analysed corpus contains<br />
<br />
<pre><br />
musa/mus<n><f><sg><def>/muse<vblex><past><br />
mus/mus<n><f><sg><ind>/mus<n><f><pl><ind>/muse<vblex><imp><br />
musene/mus<n><f><pl><def><br />
</pre><br />
then output has<br />
<pre><br />
3 mus<n><f><br />
2 muse<vblex><br />
</pre><br />
and we can find paradigms that are likely to mess up disambiguation, or where we need to ensure that the bidix contains the highest-frequency paradigm (since the bidix is typically smaller than the monodix).<br />
<br />
We could also weight these numbers by the number of unique forms in the pardef; if the verb pardef has 6 unique forms and the noun only 3, then the above output should be even more skewed:<br />
<pre><br />
0.33 mus<n><f><br />
0.75 muse<vblex><br />
</pre><br />
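
The unweighted binning described at the start of this section could be sketched as follows. The rule for what counts as "main POS" is an assumption made for this example (the first tag, plus a gender tag directly after <code>&lt;n&gt;</code>), and the names <code>bin_key</code>, <code>bin_counts</code> and <code>GENDERS</code> are hypothetical.

```python
import re
from collections import Counter

# Assumption: the "main POS" is the first tag, plus a gender tag when the
# first tag is <n>; adjust this rule to the tagset of your own analyser.
GENDERS = {"m", "f", "nt", "mf"}

def bin_key(analysis):
    """Reduce e.g. 'mus<n><f><sg><def>' to 'mus<n><f>'."""
    lemma = analysis.split("<", 1)[0]
    tags = re.findall(r"<([^>]+)>", analysis)
    keep = tags[:1]
    if tags[:1] == ["n"] and len(tags) > 1 and tags[1] in GENDERS:
        keep.append(tags[1])
    return lemma + "".join("<%s>" % t for t in keep)

def bin_counts(lines):
    """Count, per lemma+mainpos bin, how many tokens have a reading in that bin."""
    counts = Counter()
    for line in lines:
        readings = line.strip().split("/")[1:]  # drop the surface form
        counts.update({bin_key(r) for r in readings})  # once per token
    return counts

corpus = [
    "musa/mus<n><f><sg><def>/muse<vblex><past>",
    "mus/mus<n><f><sg><ind>/mus<n><f><pl><ind>/muse<vblex><imp>",
    "musene/mus<n><f><pl><def>",
]
for key, n in bin_counts(corpus).most_common():
    print(n, key)
```

Each bin is counted at most once per token, which is why the second line contributes only one count to the noun bin despite having two noun readings.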
<br />
==Faster coverage testing with frequency lists==<br />
<br />
If words appear several times in your corpus, why bother analysing them several times? We can make a frequency list first and add together the frequencies. This script does some very stupid tokenisation and creates a frequency list:<br />
<br />
make-freqlist.sh:<br />
<pre><br />
#!/bin/bash<br />
<br />
if [[ -t 0 ]]; then<br />
echo "Expecting a corpus on stdin"<br />
exit 2<br />
fi<br />
<br />
tr '[:space:][:punct:]' '\n' | grep . | sort | uniq -c | sort -nr<br />
</pre><br />
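
For comparison, roughly the same tokenisation and counting can be written in Python. This sketch splits on ASCII whitespace and punctuation only, which approximates the <code>tr</code> call but is not guaranteed to match it byte for byte; the function name <code>make_freqlist</code> and the sample sentence are invented.

```python
import re
import string
from collections import Counter

# Split on ASCII whitespace and ASCII punctuation, roughly like the tr call.
SPLIT = re.compile("[" + re.escape(string.whitespace + string.punctuation) + "]+")

def make_freqlist(text):
    """Return (count, token) pairs sorted by descending count."""
    counts = Counter(tok for tok in SPLIT.split(text) if tok)
    return [(n, tok) for tok, n in counts.most_common()]

for n, tok in make_freqlist("ei mus og ei muse, og ei mus."):
    print(n, tok)
```

Tokens with equal counts may come out in a different order than with <code>sort | uniq -c | sort -nr</code>.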
And this script runs your analyser, summing up the frequencies:<br />
<br />
freqlist-coverage.sh:<br />
<pre><br />
#!/bin/bash<br />
<br />
set -e -u<br />
<br />
if [[ $# -eq 0 || -t 0 ]]; then<br />
echo "Expecting apertium arguments and a 'sort|uniq -c|sort -nr' style frequency list on stdin"<br />
echo "For example:"<br />
echo "\$ < spa.freqlist $0 -d . spa-morph"<br />
exit 2<br />
fi<br />
<br />
sed 's%^ *%<apertium-notrans>%;s% %</apertium-notrans>%;s%$% .%' |<br />
apertium -f html-noent "$@" |<br />
awk -F'</?apertium-notrans>| *\\^\\./\\.<sent><clb>\\$' '<br />
/[/][*@]/ {<br />
unknown+=$2<br />
if(!printed) print "Top unknown tokens:"<br />
if(++printed<10) print $2,$3<br />
next<br />
}<br />
{<br />
known+=$2<br />
}<br />
END {<br />
total=known+unknown<br />
known_pct=100*known/total<br />
unk_pct=100*unknown/total<br />
print known_pct" % known of total "total" tokens"<br />
}'<br />
</pre> <br />
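
The summing step of the script comes down to weighting each distinct token by its frequency. A minimal Python sketch of that arithmetic, assuming the analyses have already been paired with their frequencies (the function name <code>weighted_coverage</code> and the sample entries are invented):

```python
def weighted_coverage(entries):
    """entries: iterable of (frequency, analysis) pairs, one per distinct token.

    Returns the percentage of running tokens whose analysis has no '/*' or '/@'.
    """
    known = unknown = 0
    for freq, analysis in entries:
        if "/*" in analysis or "/@" in analysis:
            unknown += freq
        else:
            known += freq
    total = known + unknown
    return 100 * known / total if total else 0.0

# Invented frequency list with analyses already attached.
entries = [(120, "^og/og<cnjcoo>$"),
           (30, "^mus/mus<n><f><sg><ind>$"),
           (50, "^xyzzy/*xyzzy$")]
print(weighted_coverage(entries))
```

Unlike the awk program, the sketch returns 0.0 for an empty list instead of dividing by zero.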
<br />
Usage:<br />
<br />
$ chmod +x make-freqlist.sh freqlist-coverage.sh<br />
$ bzcat ~/corpora/nno.txt.bz2 |./make-freqlist.sh > nno.freqlist<br />
$ <nno.freqlist ./freqlist-coverage.sh -d ~/apertium-svn/languages/apertium-nno/ nno-morph<br />
<br />
==coverage.py==<br />
<br />
https://svn.code.sf.net/p/apertium/svn/trunk/apertium-tools/coverage.py is a coverage script that wraps curl and bzcat (?)<br />
<br />
==See also==<br />
<br />
* [[Wikipedia dumps]]<br />
* [[Cleanstream]]<br />
<br />
[[Category:Documentation]]<br />
[[Category:Documentation in English]]</div>
Dharjunior
https://wiki.apertium.org/w/index.php?title=Calculating_coverage&diff=68046
Calculating coverage
2018-11-30T00:20:30Z
<p>Dharjunior: </p>
<hr />
<div>{{Github-migration-check}}<br />
<br />
[[Calculer la couverture|En français]]<br />
<br />
==Simple bidix-trimmed coverage testing==<br />
<br />
First install apertium-cleanstream:<br />
<br />
svn checkout https://svn.code.sf.net/p/apertium/svn/trunk/apertium-tools/apertium-cleanstream<br />
cd apertium-cleanstream<br />
make<br />
sudo cp apertium-cleanstream /usr/local/bin<br />
<br />
'''Note - After Apertium's migration to GitHub, this tool is read-only on the SourceForge repository and does not exist on GitHub. If you are interested in migrating this tool to GitHub, see [[Migrating tools to GitHub]].'''<br />
<br />
Then save this as coverage.sh:<br />
<br />
#!/bin/bash<br />
mode=$1<br />
outfile=/tmp/$mode.clean<br />
apertium -d . $mode | apertium-cleanstream -n > $outfile<br />
total=$(grep -c '^\^' $outfile)<br />
unknown=$(grep -c '/\*' $outfile)<br />
bidix_unknown=$(grep -c '/@' $outfile)<br />
known_percent=$(calc -p "round( 100*($total-$unknown-$bidix_unknown)/$total, 3)")<br />
echo "$known_percent % known tokens ($unknown unknown, $bidix_unknown bidix-unknown of total $total tokens)"<br />
echo "Top unknown words:"<br />
grep '/[*@]' $outfile | sort | uniq -c | sort -nr | head<br />
<br />
And run it like<br />
<br />
cat asm.corpus | bash coverage.sh asm-eng-biltrans<br />
<br />
(The bidix-unknown count should always be 0 if your pair uses [[lt-trim|automatic analyser trimming]].)<br />
<br />
==TODO: paradigm-coverage (less naïve)==<br />
On an analysed corpus, we can sum frequencies into bins for each lemma+mainpos, so if the analysed corpus contains<br />
<br />
<pre><br />
musa/mus<n><f><sg><def>/muse<vblex><past><br />
mus/mus<n><f><sg><ind>/mus<n><f><pl><ind>/muse<vblex><imp><br />
musene/mus<n><f><pl><def><br />
</pre><br />
then output has<br />
<pre><br />
3 mus<n><f><br />
2 muse<vblex><br />
</pre><br />
and we can find paradigms that are likely to mess up disambiguation, or where we need to ensure that the bidix contains the highest-frequency paradigm (since the bidix is typically smaller than the monodix).<br />
<br />
We could also weight these numbers by the number of unique forms in the pardef; if the verb pardef has 6 unique forms and the noun only 3, then the above output should be even more skewed:<br />
<pre><br />
0.33 mus<n><f><br />
0.75 muse<vblex><br />
</pre><br />
<br />
==Faster coverage testing with frequency lists==<br />
<br />
If words appear several times in your corpus, why bother analysing them several times? We can make a frequency list first and add together the frequencies. This script does some very stupid tokenisation and creates a frequency list:<br />
<br />
make-freqlist.sh:<br />
<pre><br />
#!/bin/bash<br />
<br />
if [[ -t 0 ]]; then<br />
echo "Expecting a corpus on stdin"<br />
exit 2<br />
fi<br />
<br />
tr '[:space:][:punct:]' '\n' | grep . | sort | uniq -c | sort -nr<br />
</pre><br />
And this script runs your analyser, summing up the frequencies:<br />
<br />
freqlist-coverage.sh:<br />
<pre><br />
#!/bin/bash<br />
<br />
set -e -u<br />
<br />
if [[ $# -eq 0 || -t 0 ]]; then<br />
echo "Expecting apertium arguments and a 'sort|uniq -c|sort -nr' style frequency list on stdin"<br />
echo "For example:"<br />
echo "\$ < spa.freqlist $0 -d . spa-morph"<br />
exit 2<br />
fi<br />
<br />
sed 's%^ *%<apertium-notrans>%;s% %</apertium-notrans>%;s%$% .%' |<br />
apertium -f html-noent "$@" |<br />
awk -F'</?apertium-notrans>| *\\^\\./\\.<sent><clb>\\$' '<br />
/[/][*@]/ {<br />
unknown+=$2<br />
if(!printed) print "Top unknown tokens:"<br />
if(++printed<10) print $2,$3<br />
next<br />
}<br />
{<br />
known+=$2<br />
}<br />
END {<br />
total=known+unknown<br />
known_pct=100*known/total<br />
unk_pct=100*unknown/total<br />
print known_pct" % known of total "total" tokens"<br />
}'<br />
</pre> <br />
<br />
Usage:<br />
<br />
$ chmod +x make-freqlist.sh freqlist-coverage.sh<br />
$ bzcat ~/corpora/nno.txt.bz2 |./make-freqlist.sh > nno.freqlist<br />
$ <nno.freqlist ./freqlist-coverage.sh -d ~/apertium-svn/languages/apertium-nno/ nno-morph<br />
<br />
==coverage.py==<br />
<br />
https://svn.code.sf.net/p/apertium/svn/trunk/apertium-tools/coverage.py is a coverage script that wraps curl and bzcat (?)<br />
<br />
==See also==<br />
<br />
* [[Wikipedia dumps]]<br />
* [[Cleanstream]]<br />
<br />
[[Category:Documentation]]<br />
[[Category:Documentation in English]]</div>
Dharjunior
https://wiki.apertium.org/w/index.php?title=Testvoc&diff=68045
Testvoc
2018-11-30T00:16:33Z
<p>Dharjunior: </p>
<hr />
<div>[[Test de vocabulaire|En français]]<br />
{{Github-migration-check}}<br />
{{TOCD}}<br />
A '''testvoc''' is literally a test of vocabulary. At the most basic level, it expands an {{sc|sl}} dictionary and runs each possible analysed [[lexical form]] through all the translation stages, to check that every possible input yields a sensible translation in the {{sc|tl}} without <code>#</code> or <code>@</code> symbols.<br />
<br />
However, as transfer rules may introduce errors that are not visible when translating single lexical units, a release-quality language pair also needs testvoc on phrases consisting of several lexical units. Often one can find a lot of the errors by running a large corpus (with all @, / or # symbols removed) through the translator, with debug symbols on, and grepping for [@#/]. <br />
: It would be nice, however, to have a script that testvoc'ed all possible transfer rule runs (without having to run all possible combinations of lexical units, which would take forever). One problem is that transfer rules can refer not only to tags but also to lemmas, and that multi-stage transfer means you have to test fairly long sequences.<br />
<br />
==Trimmed testvoc==<br />
Most new Apertium pairs use automatically trimmed analysers from monolingual dependencies, e.g. with [[lt-trim]] if the analyser is lttoolbox-based.<br />
When using <code>lt-trim</code>, there's no need to testvoc the analyser→bidix step (the '@'-marks), since the analyser will only contain what the bidix contains.<br />
<br />
However, you still need to look for #'s and /'s with<br />
* Corpus testvoc to ensure your transfer rules are correct (see [[#Corpus testvoc]] below), and<br />
* Generation testvoc to ensure all the forms that are in both analyser and bidix also exist in your generator (see next section for real-life script).<br />
<br />
<br />
<small>Since the analyser dix file can be much larger than the trimmed analyser, testvoc scripts that don't take that into account will give false hits. That is, a command like <code>lt-expand complete-analyser.dix | lt-proc -b bidix.bin | apertium-transfer -b foo.t1x foo.t1x.bin | lt-proc -d gen.bin</code> will give lots of @'s that won't appear when running the real pipeline. The [[#Generation testvoc with lttoolbox analyser]] ignores any @ and assumes lt-trim just works.</small><br />
<br />
==Generation testvoc==<br />
<br />
===Generation testvoc with lttoolbox analyser===<br />
The script generation.sh in<br />
https://github.com/apertium/apertium-swe-dan/blob/master/dev/testvoc/generation.sh should work with any pipeline that uses lttoolbox on the analysis side. <br />
<br />
It tests that anything the analyser can produce will go through to generation without '/' or '#'-marks (that is, there is one and only one form generated for anything the analyser can produce). <br />
<br />
It doesn't test that the bidix contains everything the analyser has – we assume your Makefile uses lt-trim for that (all recent pairs with monolingual dependencies do).<br />
<br />
It also only tests single words separated by periods – any generation problem that crops up with more context (typically due to transfer rules) will require a [[#Corpus testvoc]]. But it's a nice and fairly quick way to catch most of your dictionary consistency issues.<br />
<br />
====HFST-based testvoc of lttoolbox analyser====<br />
Another way to testvoc a trimmed analyser, if you have [[HFST]] installed, is to replace <code>lt-expand ana.dix</code> in a simple testvoc pipeline with this sequence:<br />
<pre><br />
lt-print trimmed-analyser.bin |sed 's/ /@_SPACE_@/g' | hfst-txt2fst -e ε | hfst-project -p lower | hfst-fst2strings -c0<br />
</pre><br />
(The -c0 says to never follow cycles; you can also follow them at most once with -c1 etc., but this can take a while depending on how many {{tag|re}}'s you use.)<br />
<br />
If we call that command "expand", then the full testvoc pipeline would be something like<br />
<pre><br />
expand | sed 's/^/^/;s/$/$/' | apertium-pretransfer | apertium-transfer …bin …t1x | lt-proc -d …autogen.bin<br />
</pre><br />
<br />
which may be a more "complete" testvoc.<br />
<br />
Running https://github.com/apertium/apertium-swe-dan/blob/master/dev/testvoc/generation.sh with --hfst as the first argument will make it use this method.<br />
<br />
===Generation testvoc with HFST analyser===<br />
<br />
The Tatar-Bashkir language pair has a testvoc script for use with HFST, see https://github.com/apertium/apertium-tat-bak/blob/master/dev/inconsistency.sh which contains e.g.<br />
<pre><br />
hfst-fst2strings ../.deps/ba.LR-debug.hfst | sort -u | sed 's/:/%/g' | cut -f1 -d'%' | sed 's/^/^/g' | sed 's/$/$ ^.<sent>$/g' | tee $TMPDIR/tmp_testvoc1.txt |<br />
apertium-pretransfer|<br />
apertium-transfer ../apertium-tt-ba.ba-tt.t1x ../ba-tt.t1x.bin ../ba-tt.autobil.bin |<br />
apertium-transfer -n ../apertium-tt-ba.ba-tt.t2x ../ba-tt.t2x.bin | tee $TMPDIR/tmp_testvoc2.txt |<br />
hfst-proc -d ../ba-tt.autogen.hfst > $TMPDIR/tmp_testvoc3.txt<br />
paste -d _ $TMPDIR/tmp_testvoc1.txt $TMPDIR/tmp_testvoc2.txt $TMPDIR/tmp_testvoc3.txt | sed 's/\^.<sent>\$//g' | sed 's/_/ ---------> /g'<br />
</pre><br />
<br />
<br />
==Words in bidix but not in analyser==<br />
<br />
The script bidix-unknowns.sh in https://github.com/apertium/apertium-swe-dan/blob/master/dev/testvoc/ will look for entries in bidix that your analyser would never produce. It should work with any pipeline that uses lttoolbox on the analysis side.<br />
<br />
This is useful for making sure all your hard bidix work is actually useful. It may find lemmas that are completely missing from the analyser, or that simply have the wrong gender-tag or similar.<br />
<br />
<br />
==Corpus testvoc==<br />
<br />
Typically corpus testvoc consists of running a big corpus through your translator, and grepping for @'s, /'s or #'s. You can use a command like the one below to first delete debug symbols from the input (so you don't get false hits), run it through your translator (the "dgen" mode runs the generation step using lt-proc -d, which shows the full analysis when a word is not in the generator) and then grep for debug symbols (highlighting some context on either side just to make sure you see the symbol):<br />
<pre><br />
xzcat corpora/nno.xz | tr -d '#@/' | apertium -d . nno-nob-dgen | grep '.\{0,6\}[#@/].\{0,6\}'<br />
</pre><br />
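
The final <code>grep</code> step can be mirrored in Python when you want to post-process the hits. The sketch below applies the same pattern (up to six characters of context on either side of a debug symbol) to one line of translator output; <code>debug_hits</code> is a hypothetical helper and the sample sentence is invented.

```python
import re

# Up to six characters of context on either side of a debug symbol.
CONTEXT = re.compile(r".{0,6}[#@/].{0,6}")

def debug_hits(line):
    """Return the snippets of `line` that contain '#', '@' or '/'."""
    return CONTEXT.findall(line)

print(debug_hits("Eg såg ein #mus i @huset i går."))
```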
<br />
<br />
However, sometimes you want to get to the original line in the corpus that gave that @ or #. <br />
<br />
This is one way of looking for @'s in a corpus while still being able to easily find the original line:<br />
<pre><br />
$ cat corpus.txt | apertium-destxt | nl | apertium -f none -d . sme-nob-interchunk1 |grep '\^@' <br />
</pre><br />
<br />
<code>nl</code> will number each line in corpus.txt, inside the superblank that is at each line-end. So if we now see<br />
<br />
<pre><br />
276 ]^part<part>{^å<part>$}$ ^verb<SV><inf><loc-for><m>{^@ballat<V><inf>$}$<br />
...<br />
</pre><br />
<br />
we can get the original line like this:<br />
<pre><br />
$ sed -n '276p' corpus.txt<br />
</pre><br />
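<br />
When many lines are flagged, looking them up one at a time gets tedious. A minimal sketch of doing it in one go, assuming the flagged output starts each line with the number added by <code>nl</code> (as in the example above); <code>show_original_lines</code> is a hypothetical name, not an Apertium tool:<br />
```shell
# Hypothetical helper: for every flagged line on stdin, print
# "N: <line N of the original corpus>".
# $1 = path to the original corpus file.
show_original_lines() {
    # Keep only the leading line number of each flagged line,
    # drop duplicates, then look each number up in the corpus.
    sed -n 's/^[[:space:]]*\([0-9][0-9]*\).*/\1/p' | sort -n -u |
    while read -r n; do
        printf '%s: %s\n' "$n" "$(sed -n "${n}p" "$1")"
    done
}

# Usage, reusing the pipeline from above:
#   cat corpus.txt | apertium-destxt | nl | apertium -f none -d . sme-nob-interchunk1 |
#     grep '\^@' | show_original_lines corpus.txt
```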
<br />
<br />
==Testvoc without trimming==<br />
The following is a very simple script illustrating testvoc for 1-stage transfer. The tee command saves the output from transfer, which includes words (actually lexical units) that passed successfully through transfer and words that got an @ prepended. The last file is the output from generation, which includes words that were successfully generated, and words that have a # prepended (anything with an @ will also get a #):<br />
<pre><br />
MONODIX=apertium-nn-nb.nn.dix<br />
T1X=apertium-nn-nb.nn-nb.t1x<br />
BIDIXBIN=nn-nb.autobil.bin<br />
GENERATORBIN=nn-nb.autogen.bin<br />
ALPHABET="ABCDEFGHIJKLMNOPQRSTUVWXYZÆØÅabcdefghijklmnopqrstuvwxyzæøåcqwxzCQWXZéèêóòâôÉÊÈÓÔÒÂáàÁÀäÄöÖ" # from $MONODIX<br />
<br />
lt-expand ${MONODIX} | grep -e ':<:' -e "[$ALPHABET]:[$ALPHABET]" |\<br />
sed 's/:<:/%/g' | sed 's/:/%/g' | cut -f2 -d'%' | sed 's/^/^/g' | sed 's/$/$ ^.<sent><clb>$/g' |\<br />
apertium-transfer ${T1X} ${T1X}.bin ${BIDIXBIN} | tee after-transfer.txt |\<br />
lt-proc ${GENERATORBIN} > after-generation.txt<br />
</pre><br />
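<br />
To see what the <code>sed</code>/<code>cut</code> chain in the middle of the script does, it can help to run it on a single expanded entry. The sketch below wraps just that chain in a function; the function name and the sample entry are invented for illustration:<br />
```shell
# Hypothetical helper: turn expanded dictionary lines of the form
# "surface:analysis" or "surface:<:analysis" into lexical units,
# each followed by a sentence marker, ready to be fed to transfer.
expanded_to_lexical_units() {
    sed 's/:<:/%/g' | sed 's/:/%/g' | cut -f2 -d'%' |
    sed 's/^/^/' | sed 's/$/$ ^.<sent><clb>$/'
}

# Example with a made-up entry:
#   printf '%s\n' 'hus:hus<n><nt><sg><ind>' | expanded_to_lexical_units
# prints:
#   ^hus<n><nt><sg><ind>$ ^.<sent><clb>$
```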
<br />
<br />
The following is a real-life <code>inconsistency.sh</code> script from <code>apertium-br-fr</code> that expands the dictionary of Breton and passes it through the translator:<br />
<pre><br />
TMPDIR=/tmp<br />
<br />
lt-expand ../apertium-br-fr.br.dix | grep -v '<prn><enc>' | grep -e ':<:' -e '\w:\w' |\<br />
sed 's/:<:/%/g' | sed 's/:/%/g' | cut -f2 -d'%' | sed 's/^/^/g' | sed 's/$/$ ^.<sent>$/g' |\<br />
tee $TMPDIR/tmp_testvoc1.txt |\<br />
apertium-pretransfer|\<br />
apertium-transfer ../apertium-br-fr.br-fr.t1x ../br-fr.t1x.bin ../br-fr.autobil.bin |\<br />
apertium-interchunk ../apertium-br-fr.br-fr.t2x ../br-fr.t2x.bin |\<br />
apertium-postchunk ../apertium-br-fr.br-fr.t3x ../br-fr.t3x.bin |\<br />
tee $TMPDIR/tmp_testvoc2.txt |\<br />
lt-proc -d ../br-fr.autogen.bin > $TMPDIR/tmp_testvoc3.txt<br />
<br />
paste -d _ $TMPDIR/tmp_testvoc1.txt $TMPDIR/tmp_testvoc2.txt $TMPDIR/tmp_testvoc3.txt |\<br />
sed 's/\^.<sent>\$//g' | sed 's/_/ ---------> /g'<br />
<br />
<br />
</pre><br />
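<br />
The output of such a script has one line per dictionary entry, so it is usually filtered down to the failures. A minimal sketch, assuming the three-column <code> ---------> </code> layout produced by the <code>paste</code> and <code>sed</code> commands above; <code>failing_entries</code> is a hypothetical name:<br />
```shell
# Hypothetical helper: keep only the entries that failed, i.e. lines
# whose transfer column (2nd) contains an @ or whose generation
# column (3rd) contains a #.
failing_entries() {
    awk -F ' ---------> ' '$2 ~ /@/ || $3 ~ /#/'
}

# Usage (assuming the script above is saved as inconsistency.sh):
#   sh inconsistency.sh | failing_entries
```
This is deliberately crude: an entry whose lemma legitimately contains @ or # would be reported as well.<br />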
<br />
<br />
==See also==<br />
* [[Automatically trimming a monodix]]<br />
* [[Why we trim]]<br />
* [[Finding errors in dictionaries]]<br />
<br />
[[Category:Terminology]]<br />
[[Category:Quality control]]<br />
[[Category:Development]]<br />
[[Category:Documentation in English]]</div>
Dharjunior
https://wiki.apertium.org/w/index.php?title=Specific_resources_per_language&diff=68044
Specific resources per language
2018-11-30T00:02:25Z
<p>Dharjunior: </p>
<hr />
<div>{{Github-migration-check}}<br />
{{TOCD}}<br />
The incubator can be found in the 'incubator' column in https://apertium.github.io/apertium-on-github/source-browser.html. It houses language pairs that have not fully matured and are still being worked on.<br />
<br />
<br />
==Specific resources per language==<br />
<br />
Here are some links to resources that might be useful for expanding on work in the Incubator. Below you can put resources which will be useful in the construction. Try and mark them for licence, or at least free/non-free.<br />
<br />
See also the individual language pages. <br />
<br />
===[[Albanian]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-sqi/blob/master/apertium-sqi.sqi.dix Albanian Monodix]''<br />
<br />
;Resources<br />
* http://www.albanianoverview.com/grammar.htm<br />
* http://www.idividi.com.mk/recnik/index.htm -- Albanian--Macedonian dictionary (non-free)<br />
<br />
===[[Armenian]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-hye/blob/master/apertium-hye.hye.dix Armenian Monodix]''<br />
<br />
;Resources<br />
<br />
* http://www.armeniapedia.org/index.php?title=Category:Armenian_Language_Lessons<br />
<br />
===[[Assamese and Hindi]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-as-hi/blob/91f3c38b0c636deb620cbd27725d63dd763c5f0b/apertium-as-hi.hi.dix Assamese-Hindi Bidix]''<br />
<br />
<br />
--- Anusuya<br />
<br />
===[[Belarusian]]=== <br />
<br />
* [http://www.vitba.org/fofmb/fofmb.html GFDL grammar of the language]<br />
<br />
===[[Bengali]]===<br />
<br />
* http://bengalinux.sourceforge.net/cgi-bin/anubadok/index.pl -- Free software translation for English→Bengali <br />
* http://anubadok.sf.net/ -- See above<br />
<br />
===[[Bulgarian]]===<br />
:''Dictionary: [https://raw.githubusercontent.com/apertium/apertium-bul/master/apertium-bul.bul.dix Bulgarian Monodix]''<br />
<br />
;Resources<br />
<br />
* [http://www.sfs.nphil.uni-tuebingen.de/iscl/Theses/zhechev.pdf Bulgarian verbal morphology]<br />
<br />
===[[Cornish]]===<br />
<br />
:''Dictionary: [https://sourceforge.net/projects/apertium/files/apertium-cy-en/0.1.0/ Cornish Monodix from SourceForge]''<br />
<br />
'''This resource has not been migrated to GitHub from SVN'''<br />
<br />
;Resources<br />
<br />
* [http://www.cornishtranslator.com/ Cornish Translator]<br />
* [http://kevindonnelly.org.uk/kernewek/ Cornish-Welsh bilingual wordlist]<br />
<br />
===[[Czech]]===<br />
:''Dictionary: [http://apertium.svn.sourceforge.net/svnroot/apertium/incubator/apertium-pl-cs.cs.dix.xml apertium-pl-cs.cs.dix.xml]'' <br />
'''This resource has not been migrated to GitHub from SVN'''<br />
<br />
:''Dictionary: [https://github.com/apertium/apertium-eo-cs/blob/c16fa21194a285941307a68e420c194a1825ebc3/apertium-eo-cs.eo-cs.dix Czech-Esperanto Bidix]''<br />
:''Dictionary: [https://github.com/apertium/apertium-cs-sl/tree/062fa172705e16f77302a8096df3733581079fb8 Czech-Slovenian Bidix]''<br />
;Resources<br />
<br />
* [http://nlp.fi.muni.cz/nlp/aisa/NlpCz/Frekvence_slov_lemmat.html Most frequent words] Also includes a list of the most frequent bi- and tri-grams, but these are of little use as multiwords<br />
* [http://users.ox.ac.uk/~tayl0010/links.html James Naughton's links]<br />
* [http://www.czech-language.cz/alphabet/alph-krtiny.html Some complications with diacritics]<br />
* [http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Morphology/index.html Czech morphological guesser] - 'free', but not open source<br />
<br />
===[[Faroese]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-fao/blob/master/apertium-fao.fao.dix Faroese Monodix]''<br />
<br />
;Resources<br />
* [http://giellatekno.uit.no/cgi/d-fao.eng.html U. Tromsø -- Faroese analyser ]<br />
* [https://github.com/apertium/apertium-fao-isl/blob/master/apertium-fao-isl.fao-isl.rlx Faroese Constraint Grammar]<br />
* [http://www.archive.org/details/frskanthologi00denmgoog Faroese-Danish dictionary from 1886]<br />
<br />
===[[Finnish]]===<br />
{{see-also|Omorfi}}<br />
;Resources<br />
<br />
* http://kaino.kotus.fi/sanat/nykysuomi/ &mdash; full form list for Finnish -- LGPL<br />
* [https://kitwiki.csc.fi/twiki/bin/view/KitWiki/OMorFiSFSTVersion#Installation Omorfi–Open Morphology for Finnish language]<br />
* [http://www.ling.helsinki.fi/kieliteknologia/tutkimus/hfst/ Helsinki Finite-State Transducer Technology (HFST)]<br />
<pre><br />
s = lemma<br />
hn = homonymy ref<br />
t = inflection info<br />
tn = inflection number (referring to table)<br />
av = ref to consonant gradation<br />
</pre><br />
<br />
===[[German and English]]===<br />
<br />
German-English bilingual dictionary (>216,000 entries), generated from linguistic data (GPL Version 2 or later) available for [http://www-user.tu-chemnitz.de/~fri/ding/ "Ding: A Dictionary LookUp program"] (version 1.5 2007-04-09) from Frank Richter, [http://tu-chemnitz.de Technische Universität Chemnitz]<br />
<br />
:''[https://github.com/apertium/apertium-eng-deu/blob/master/apertium-eng-deu.eng-deu.dix German-English Dictionary]''<br />
<br />
===[[Greek]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-ell/blob/master/apertium-ell.ell.dix Greek Monodix]''<br />
:''Greek-English Dictionary: [https://github.com/apertium/apertium-ell-eng/blob/master/apertium-ell-eng.eng.dix Greek-English Dictionary]''<br />
<br />
;Resources<br />
<br />
* Greek <-> Ukrainian, Russian, Polish Grammar & Dictionary: http://ellinika.gnu.org.ua/<br />
<br />
===[[Hebrew]]===<br />
<br />
;Resources<br />
<br />
* http://www.mila.cs.technion.ac.il/english/resources/lexicons/ lexicons for Hebrew, in weird XLS format -- GPL<br />
* http://www.mila.cs.technion.ac.il/english/resources/software_downloads/index.html Hebrew Morphological Analyzer (for Hebrew undotted text) -- GPL, but download link behind a password<br />
* http://www.cs.technion.ac.il/~barhaim/MorphTagger/ HMM-based part-of-speech tagger For Hebrew -- GPL<br />
* http://www.cs.technion.ac.il/~erelsgl/bxi/hmntx/teud.html Probabilistic Morphological Analyzer for Hebrew undotted text -- license unknown<br />
* http://hspell.ivrix.org.il/ The hspell Hebrew spell-checker has a mode for analyzing morphological data -- GPL<br />
* http://www.code972.com/blog/hebmorph/ HebMorph is the analyser powering hspell's capabilities -- GPL<br />
<br />
===[[Hindi]]===<br />
{{see-also|Hindi}}<br />
<br />
;Resources<br />
<br />
* POS tagged English-Hindi wordlist: http://indlinux.sourceforge.net/downloads/files/hindidict.txt.bz2<br />
<br />
* https://github.com/unhammer/apertium-en-hi/blob/master/apertium-en-hi.en.dix <br />
* https://github.com/apertium/apertium-hin/blob/master/apertium-hin.hin.dix <br />
* https://github.com/apertium/apertium-urd-hin/blob/master/dev/en-hi-ur.list<br />
* https://github.com/apertium/apertium-urd-hin/blob/master/apertium-urd-hin.urd-hin.dix<br />
<br />
<br />
<br />
===[[Iranian Persian]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-pes/blob/master/apertium-pes.pes.dix Persian Monodix]''<br />
<br />
;Resources<br />
<br />
* [http://books.google.com/books?vid=OCLC20216670&id=Ru1ncSqiRXkC&printsec=titlepage&hl=de#PPA24,M1 Grammar of Persian]<br />
<br />
===[[Ingush]]===<br />
<br />
; Resources<br />
<br />
* [http://www.linguistics.berkeley.edu/~ingush/database.html Lexical database] (non-free)<br />
* [http://books.google.com/books?id=J7wqVHeRWdwC&pg=PA5&lpg=PA5&dq=ingush+father&source=bl&ots=N8TDZudzGZ&sig=JO9X_Y9gio7dUhZWeyZX7j17iPw&hl=ca&ei=vfq4TM6CH86OjAfO94XaDg&sa=X&oi=book_result&ct=result&resnum=3&ved=0CB8Q6AEwAg#v=onepage&q=ingush%20father&f=false Ingush-English dict] (non-free)<br />
<br />
===[[Latvian]]===<br />
;Resources<br />
* https://github.com/PeterisP/morphology GPL full-form dictionary (https://github.com/PeterisP/morphology/blob/master/src/main/resources/Lexicon.xml)<br />
<br />
;See also<br />
* [[Latvian and Russian]]<br />
<br />
===[[Lithuanian]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-lit/blob/master/apertium-lit.lit.dix Lithuanian Monodix]''<br />
<br />
;Resources<br />
<br />
===[[Nogai]]===<br />
<br />
; Resources<br />
<br />
* [http://ksirov.ru/%D1%8F%D0%B7%D1%8B%D0%BA%D0%B8/%D0%BD%D0%BE%D0%B3%D0%B0%D0%B9%D1%81%D0%BA%D0%B8%D0%B9 Grammar Sketch and Russian-Nogai dictionary]<br />
<br />
===[[Ossetian]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-oss/blob/master/apertium-oss.oss.dix Ossetian Monodix]''<br />
<br />
;Resources<br />
<br />
* [http://www.azargoshnasp.net/languages/ossetian/grammersketchossetian.pdf Ossetian: Grammatical Sketch] &mdash; quite nice and comprehensive.<br />
* [http://www.ossetic-studies.org/ Ossetic National Corpus]<br />
<br />
===[[Piemontese]]===<br />
:''Dictionary: [https://sourceforge.net/p/apertium/svn/HEAD/tree/incubator/apertium-it-pms.pms.dix Piemontese Monodix from SourceForge]'' <br />
'''This resource has not been migrated to GitHub from SVN'''<br />
<br />
;Resources<br />
<br />
* http://members.fortunecity.it/dotorcarlo/vocen1.html Piemontese--English -- public domain<br />
* http://digilander.libero.it/dotor43/indexit.html -- Piemontese grammar incl. 17k word Piemontese--Italian dictionary (POS tagged and partly annotated for inflection). site suggests "© These pages can be freely used for all purposes, but not for political reasons, and not against the laws (no matter what is the country)."<br />
<br />
===[[Portuguese]]===<br />
<br />
Even if Apertium has a stable es-pt pair, the coverage of the Brazilian Portuguese Dictionary built at NILC (Universidade de Sao Paulo) for Unitex is much better, and could be used perhaps to improve it.<br />
<br />
;Resources<br />
<br />
* [http://www.nilc.icmc.usp.br/nilc/projects/unitex-pb/web/dicionarios.html Recursos Lexicais Português do Brasil]<br />
<br />
We believe it has a LGPL license.<br />
<br />
===[[Punjabi]]===<br />
<br />
; Resources<br />
<br />
* [http://www.lama.univ-savoie.fr/~humayoun/punjabi/index.html Punjabi lexicon]<br />
<br />
===[[Quechua]]===<br />
<br />
;Resources<br />
<br />
* http://www.runasimipi.org/<br />
* AVENUE Quechua-Spanish system. (ask [[User:Francis Tyers|Francis Tyers]])<br />
<br />
===[[Russian]]===<br />
<br />
:''Dictionary: [https://github.com/apertium/apertium-rus/blob/master/apertium-rus.rus.dix monodix]''<br />
:''Bidix: [https://github.com/apertium/apertium-pol-rus/blob/master/apertium-pol-rus.pol-rus.dix Polish-Russian]''<br />
:''Bidix: [https://github.com/apertium/apertium-rus-eng/blob/master/apertium-ru-en.ru.dix English-Russian]''<br />
<br />
;Resources<br />
<br />
* http://www.alphadictionary.com/rusgrammar/<br />
* http://www.seelrc.org:8080/grammar/pdf/stand_alone_russian.pdf<br />
* [http://www.cic.ipn.mx/~sidorov/rmorph/index.html Russian analyser] - non-free, Windows only<br />
* [http://citeseer.ist.psu.edu/cache/papers/cs2/433/http:zSzzSzwww.ling.ohio-state.eduzSz~hanazSzbibliozSzHanaFeldmanBrew2004-RusMorphLite.pdf/hana04resourcelight.pdf Using Czech resources for the morphological analysis of Russian]<br />
* [http://sourceforge.net/projects/pere/ Pere] - free translator, including Russian<->Ukrainian<->English dictionaries. Built from alignments, low quality.<br />
* [http://www.revdanica.com/xdxf/tmp/Muzafarov/inXDXF/rus2taj.xdxf Russian--Tajik phrase dictionary, 41k entries].<br />
* [http://www.lugattj.com/news.php?tid=1&ln=en Another Tajik--Russian dictionary]<br />
<br />
===[[Sanskrit]] '''संस्कृतम्'''===<br />
:''Dictionary: [https://github.com/apertium/apertium-san/blob/master/apertium-san.san.dix Sanskrit Monodix]''<br />
<br />
;Resources<br />
* [http://www.sanskrit-lexicon.uni-koeln.de/ Sanskrit Lexicon at Uni-Koeln]<br />
* [http://www.sanskrit-lexicon.uni-koeln.de/aequery/index.html Apte's En-Sa] dictionary<br />
* [http://www.sanskrit-lexicon.uni-koeln.de/download.html Material available for download].<br />
<br />
===[[Slovakian]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-slk/blob/master/apertium-slk.slk.dix Slovak Monodix]''<br />
<br />
;Resources<br />
<br />
* http://old.bohemica.com/slovak/slovakgrammar.pdf (Slovakian, with some English)<br />
* http://pl.wiktionary.org/wiki/Aneks:J%C4%99zyk_s%C5%82owacki_-_tabele_koniugacji (In Polish)<br />
* http://www.angelfire.com/sk3/quality/Slovak_declension.html<br />
* http://www.juls.savba.sk/msj/<br />
<br />
===[[Thai]]===<br />
* https://github.com/veer66/Yaitron Yaitron English-Thai and Thai-English XML dictionary, license seems standard 4-clause<br />
<br />
===[[Urdu]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-urd/blob/master/apertium-urd.urd.dix Urdu Monodix]''<br />
:''Bidix: [https://github.com/apertium/apertium-urd-hin/blob/master/apertium-urd-hin.urd-hin.dix Hindi-Urdu Bidix]''<br />
<br />
;Resources<br />
* http://www.lama.univ-savoie.fr/~humayoun/UrduMorph/ &mdash; GPL analyser of Urdu<br />
* http://www.crulp.org/software/langproc/E2UMachineTranslationSystem.htm -- Urdu--English MT system<br />
<br />
<br />
==Github Migration==<br />
<br />
For languages whose resources are not yet on GitHub, you can use [[apertium-init]] to create the corresponding repository and add the files from SVN to that repository.<br />
<br />
<br />
<br />
<br />
[[Category:Development]]<br />
[[Category:Repository]]<br />
[[Category:Documentation in English]]</div>
Dharjunior
https://wiki.apertium.org/w/index.php?title=Apertium-viewer&diff=68043
Apertium-viewer
2018-11-29T23:59:35Z
<p>Dharjunior: </p>
<hr />
<div>{{TOCD}}<br />
Apertium-viewer is a tool to view and edit the output of the various stages of an apertium translation.<br />
<br />
The various stages update ''while you type'', and a change made in any one pane updates the subsequent stages.<br />
<br />
[[Image:Screenshot-jApertiumView.png|thumb|400px|right|A screen shot. Some stages are hidden (split panes have been moved together)]]<br />
<br />
Such a tool is invaluable when you want to work with a language pair.<br />
<br />
== Installing and running apertium-viewer ==<br />
<br />
Make sure you have Java 7 or later installed.<br />
<br />
Download apertium-viewer.jar from [https://github.com/apertium/apertium-viewer/releases GitHub] and save it to your hard drive. Double-click on apertium-viewer.jar (or right-click on it and open with Java Runtime).<br />
<br />
Or, from the command line, type:<br />
<pre><br />
wget https://github.com/apertium/apertium-viewer/releases/download/2.5.4/apertium-viewer.jar -O apertium-viewer.jar<br />
java -Xmx500m -jar apertium-viewer.jar <br />
</pre><br />
<br />
(the parameter -Xmx500m is only necessary if you want to edit very large dictionary files)<br />
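<br />
Since the viewer needs Java 7 or later, it can be worth checking the installed version before launching. The sketch below is only an illustration: both function names are invented, and the parsing assumes the usual <code>version "..."</code> line printed by Oracle and OpenJDK builds of <code>java -version</code>:<br />
```shell
# Hypothetical helper: print the major version for a Java version
# string, e.g. "1.8.0_292" -> 8 (old 1.x scheme), "11.0.2" -> 11.
java_major_version() {
    case $1 in
        1.*) v=${1#1.}; printf '%s\n' "${v%%[._-]*}" ;;
        *)   printf '%s\n' "${1%%[._-]*}" ;;
    esac
}

# Hypothetical helper: extract the quoted version string from
# `java -version`, which writes to standard error.
installed_java_version() {
    java -version 2>&1 | sed -n 's/.* version "\([^"]*\)".*/\1/p'
}

# Usage:
#   [ "$(java_major_version "$(installed_java_version)")" -ge 7 ] || echo "Java 7 or later is required"
```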
<br />
== Testing unreleased language pairs from subversion ==<br />
At startup Apertium-viewer will scan for languages installed on the system, but you don't need to install your own language pairs anywhere to use them. Just compile the language pair with 'make' and point to the generated .mode file: choose File | Load mode and select the mode file from the language pair.<br />
[[Image:Screenshot-jApertiumView-OpenMode.png|thumb|300px|right|Opening a mode file]]<br />
<br />
The [[Language pair packages#List of ready-to-use packages|online language pairs]] are also supported. You can choose any of the 24 online pairs that are available and work with them as if they were installed locally, even if they aren't.<br />
<br />
<br />
==Keyboard shortcuts==<br />
* Alt-U: Set/unset 'mark unknown words'<br />
* Alt-I: Fit text: Automatically resizes panes. Stages with unchanged text are automatically collapsed.<br />
* Alt-C: Copy All: Puts text for all stages into clipboard<br />
* Alt-S: Hide/Show commands (for clearer view)<br />
* Ctrl-0/Ctrl-1 brings focus to the first pane (input).<br />
* Ctrl-2 brings focus to the second pane (etc).<br />
* Ctrl-9 brings focus to the last pane (output). It autoscrolls to make the panes fully visible.<br />
* Ctrl-PgUp, Ctrl-PgDn: Cycle through the text panes<br />
* Ctrl-Z/Ctrl-Y Undo/redo on a per text-pane/stage basis<br />
* Ctrl-T: Make test case: Text can be copied directly into a [[Regression testing]] wiki page (also using Tools | Make Test Case...).<br />
* Ctrl-I: Import Wiki test case<br />
<br />
<br />
==Features==<br />
<br />
* Syntax highlighting. If a surface form has an ambiguous analysis it's shown in red. If you click on an alternative it is selected (basically, the text between / /) and can be removed by pressing the Delete key.<br />
[[Image:Screenshot-jApertiumView-3.png|thumb|400px|right|A click on an ambiguous analysis selects one possibility (press Del to delete it). Also, the freeze button is shown.]]<br />
* Views can be frozen/paused to not propagate changes<br />
* Zoom button to get a detached window (particularly input and output windows).<br />
[[Image:Screenshot-Apertium-viewer-1.png|thumb|400px|right| When the text is the same as on the former stage it is shown with a yellow background. Commands have been hidden for a clearer view. <br />
Coloring scheme for version 1.4 is shown]]<br />
* Language pairs can be tested directly from the Github source directory, without installing them ('make install'). Unlike [[Apertium-view]], it doesn't use dbus. Rather, you can just point directly to a mode file and use it.<br />
* [[Language pair packages#List of ready-to-use packages|Online language pairs]] can be used within the application without the need of having them locally.<br />
<br />
<br />
=== Version 1.3 (dec 2008) ===<br />
<br />
[[Image:Wiki test case paste.png|thumb|400px|right|Apertium-viewer showing a Wiki [[Regression testing]] case text ready to be pasted.]]<br />
<br />
[[Image:Apertium-Viewer_Wiki_test_case_import.png|thumb|400px|right|Apertium-viewer import of a Wiki [[Regression testing]] case.]]<br />
<br />
* Up to 10 texts can be stored for later use<br />
* Text field with keyboard focus is highlighted<br />
<br />
=== Version 1.4 (apr 2010) ===<br />
<br />
* Much improved highlighting: Different colors for ambiguous and unrecognized words, and for chunks<br />
* A "Hide intermediate" button hides all but input and output text<br />
<br />
=== Version 1.5 (nov 2010) ===<br />
<br />
* Option to ignore error messages from commands (stderr) to make it usable for Gramtrans stuff<br />
<br />
=== Version 2.0 (aug 2012) ===<br />
<br />
* Completely based on [[lttoolbox-java]]. This removes the requirement of a local Apertium installation, and offers a much higher translation speed. External processing can still be enabled in the options.<br />
* Support for the 25 [[Language pair packages#List of ready-to-use packages|online language pairs]]. All these pairs can be used within the application without the need of having them locally.<br />
* Full and meaningful names for the modes (for instance, "Basque → Spanish" instead of "eu-es").<br />
<br />
=== Version 2.1 (april 2015) ===<br />
<br />
* More robust and user friendly startup and UI<br />
* Automatically store separate input text for each source language<br />
* One big JAR file (easier than the dist/lib/ folder). Java Web Start is dropped as it won't work with self-signed certificates.<br />
<br />
=== Version 2.3 (may 2015) ===<br />
<br />
* If you switch to a new language, a bundled example phrase is shown<br />
* Bugfixed Autofit and Hide intermediate<br />
* It's become a development platform! You can easily view/edit the concerned dictionaries and compile from within the tool!<br />
<br />
For developers that installed pairs from SVN source:<br />
* Click on a command to edit the source code.<br />
* The tool validates XML dictionary and transfer files.<br />
* After a change, you can recompile the pair and immediately see the result<br />
<br />
=== Version 2.4 (may 2015) ===<br />
<br />
* Support for trace of transfer/interchunk - with links to the applied rules - also works for online modes<br />
* Support for editing the .lexc source file from a HFST binary file<br />
* Add buttons to easily switch between Java and C++ version<br />
<br />
=== Version 2.5 (may 2015) ===<br />
<br />
* Easy tool to download and compile language pair from SVN (using [https://github.com/unhammer/apertium-get apertium-get])<br />
* "Edit | Search for development language pairs" will find most language pairs on your system so you don't have to load modes<br />
* List of modes can be arbitrarily large and contain comments<br />
<br />
<br />
=== Version 2.5.2 (august 2015) ===<br />
<br />
* Option to prefer editing source code in an external editor<br />
* About box: Show Environment variables and other stuff that might help locating problems<br />
* More user-friendly Download menu<br />
* Automatic checks if a new version of Apertium-viewer is available<br />
* Lots of bugfixes and smaller improvements<br />
<br />
=== Version 2.5.3 (july 2017) ===<br />
<br />
* Fix: Mode lines with quotes and spaces are now handled correctly<br />
<br />
=== Version 2.5.4 (july 2018) ===<br />
<br />
* Fix online language pairs (path had moved)<br />
<br />
<br />
== Troubleshooting ==<br />
<br />
=== Troubleshooting if it won't start ===<br />
<br />
If the viewer won't start up and you get something like this in the console<br />
<pre><br />
Unregognized parameter: ?<br />
LTProc3.2j: process a stream with a letter transducer<br />
USAGE: LTProc [-c] [-a|-g|-n|-d|-b|-p|-s|-t] fst_file [input_file [output_file]]<br />
</pre><br />
<br />
then it means that you've hit an internal bug that prevents the viewer from starting: The viewer is internally using lttoolbox-java for processing (when the viewer starts, it will try to use lttoolbox-java on the last used language pair, but if that pair is using an option unknown to lttoolbox-java, then the program will EXIT, making you unable to switch to another mode!).<br />
<br />
The solution is either to:<br />
A) delete the preferences file that apertium-viewer uses to remember the last used pair.<br />
On Linux the file to delete would be:<br />
<br />
rm -f ~/.java/.userPrefs/apertiumview/prefs.xml<br />
<br />
B) remove the problematic .mode. You can do so by uninstalling language pairs.<br />
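<br />
For option A, a slightly more cautious variant is to move the preferences file aside rather than delete it, so it can be restored later. A minimal sketch, assuming the Linux path given above as the default; <code>reset_viewer_prefs</code> is a hypothetical name:<br />
```shell
# Hypothetical helper: move the apertium-viewer preferences file to
# "<file>.bak" instead of deleting it.
# $1 = preferences file (optional; defaults to the Linux path above).
reset_viewer_prefs() {
    prefs=${1:-"$HOME/.java/.userPrefs/apertiumview/prefs.xml"}
    if [ -f "$prefs" ]; then
        mv "$prefs" "$prefs.bak" && echo "moved $prefs to $prefs.bak"
    else
        echo "no preferences file at $prefs"
    fi
}

# Usage:
#   reset_viewer_prefs
```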
<br />
===OSX troubleshooting ===<br />
Delete the preferences file: <br />
<br />
 /Users/<yourusername>/Library/Preferences/com.apple.java.util.prefs.plist<br />
<br />
If this still doesn't work, try recompiling the program from source (run <code>ant run</code>)<br />
--[[User:Jonasfromseier|Jonasfromseier]] 17:09, 6 May 2013 (UTC)<br />
<br />
=== Mac users ===<br />
Many modern Macs come with an old JDK 1.6 or earlier. Make sure JDK 8 is installed and paste<br />
<br />
'/Library/Internet Plug-ins/JavaAppletPlugin.plugin/Contents/Home/bin/java' -jar ~/apertium-viewer.jar<br />
<br />
into the terminal<br />
<br />
<br />
<br />
<br />
<br />
<br />
== Getting, compiling and running apertium-viewer from source ==<br />
<br />
<br />
Check out the source code (Netbeans project) from https://github.com/apertium/apertium-viewer. <br />
<br />
<pre><br />
git clone https://github.com/apertium/apertium-viewer<br />
cd apertium-viewer<br />
ant run<br />
</pre><br />
<br />
To run it's easiest just to type 'ant run' or use Netbeans to compile.<br />
You might need to specify where to look for JDK, like:<br />
<pre><br />
ant -Dplatforms.default_platform.home=/usr run<br />
</pre><br />
<br />
You need a copy of http://wiki.apertium.org/wiki/Lttoolbox-java (put lttoolbox.jar in lib/ or link the projects)<br />
<br />
== TODO ==<br />
<br />
<br />
<br />
See development in https://github.com/apertium/apertium-viewer/commits/master<br />
<br />
==== Feature requests/bugs ====<br />
<br />
* Even if a file is saved, the editor sometimes still warns about unsaved changes<br />
<br />
* Create a dedicated lexer (see http://jflex.de/) for our dictionary format<br />
<br />
* Polish the editor, create better autocompletion<br />
<br />
* Integrate tools from apertium-dixtools<br />
<br />
* If it can't auto-find the source, maybe it could ask for (and store) the location of the source? (Should probably also be editable for the auto-found sources, in case it finds the wrong file.). Most of the time I find the files, and if not I try some 'desperate searches' using wildcards. I'll do a select list if I get to the 'desperate' step and remember the decision. The problem is, what if the user chooses the wrong file?<br />
I would also need an option to select another file.<br />
<br />
* If I use Ctrl-F to search a word, and then click in the editor with the mouse, I jump back to where I was before searching<br />
<br />
==== Feature requests/bugs Gavin ====<br />
* make files available from UNTRIMMED as well as DGEN e.g. eng-sco dictionary, i.e. if file is in subdir .deps/ then search for a source file for the binary file as if it were in the parent directory<br />
* Show Java home and temporary dirs in About, to help people fix different things<br />
* Modes combobox is still garbled on my PC<br />
* Stack trace<br />
java.lang.IndexOutOfBoundsException: Index: 2103, Size: 360 at java.util.ArrayList.rangeCheck(ArrayList.java:653) at java.util.ArrayList.get(ArrayList.java:429) at org.apertium.lttoolbox.Alphabet.decode(Alphabet.java:394) at org.apertium.lttoolbox.process.TransducerExe.loadNode(TransducerExe.java:156) at org.apertium.lttoolbox.process.TransducerExe.getNode(TransducerExe.java:131) at org.apertium.lttoolbox.process.Node.transitions_getIterator(Node.java:76) at org.apertium.lttoolbox.process.State.epsilonClosure(State.java:320) at org.apertium.lttoolbox.process.State.step(State.java:353) at org.apertium.lttoolbox.process.State.step_case(State.java:372) at org.apertium.lttoolbox.process.FSTProcessor.analysis(FSTProcessor.java:915) at org.apertium.lttoolbox.LTProc.doMain(LTProc.java:296) at org.apertium.pipeline.Dispatcher.doLTProc(Dispatcher.java:180) at org.apertium.pipeline.Dispatcher.dispatch(Dispatcher.java:264) at apertiumview.Pipeline$PipelineTask.run(Pipeline.java:129) at apertiumview.Pipeline$1.run(Pipeline.java:40)<br />
<br />
==Related software==<br />
<br />
* [[Apertium-view]] is a simpler version of the same program, coded in Python instead of Java. It requires dbus and requires that you install your language pairs.<br />
* [[Apertium-view.sh]] is a short shell script that just displays output from all parts of the pipeline, no interactive features<br />
* [[Apertium-tolk]] is similar to, but much simpler than Apertium-viewer. It only has an input window and an output window. Where Apertium-viewer is aimed at developers, Apertium-tolk is intended to be as user friendly as possible.<br />
<br />
[[Category:Tools]]<br />
[[Category:User interfaces]]</div>
Dharjunior
https://wiki.apertium.org/w/index.php?title=Beginner%27s_Constraint_Grammar_HOWTO&diff=68042
Beginner's Constraint Grammar HOWTO
2018-11-29T23:54:39Z
<p>Dharjunior: </p>
<hr />
<div>[[Installation et fonctionnement de Constraint Grammar|En français]]<br />
<br />
''The installation instructions for Apertium and language pairs below refer to the Ubuntu distribution. For other Linux distributions or other operating systems, see the general [[Installation]] page''.<br />
<br />
==Download==<br />
<br />
;Apertium<br />
<br />
Sourced from [[Install Apertium core using packaging]]<br />
First, remove any Apertium packages you have installed from operating system repositories. They will be out-of-date, sometimes by years.<br />
<br />
Add the repository,<br />
<br />
<pre><br />
# Pick one:<br />
<br />
# Nightly, unstable, new, almost always use this:<br />
wget https://apertium.projectjj.com/apt/install-nightly.sh -O - | sudo bash<br />
<br />
# Release, stable, old:<br />
wget https://apertium.projectjj.com/apt/install-release.sh -O - | sudo bash<br />
</pre><br />
<br />
<br />
You should see messages. <br />
<br />
Install dev tools,<br />
<br />
<pre><br />
sudo apt-get -f install apertium-all-dev<br />
</pre><br />
<br />
====About the Debian repository install====<br />
Check that the script installed the Apertium repository details,<br />
<br />
<pre><br />
apt-cache policy | grep apertium<br />
</pre><br />
<br />
Unfortunately, due to the seamless upgrading of Debian packaging, it is difficult to see which packages the new repository has added, and where. Even Synaptic, the wonder GUI, has no way through. You could try this brute force commandline,<br />
<br />
<pre><br />
find /var/lib/apt/lists/ | grep 'projectjj.*Packages' | xargs grep -h Package<br />
</pre><br />
<br />
Which will, if nothing else, tell you a lot about the byways of the Apertium project.<br />
<br />
<br />
;Constraint grammar<br />
<br />
To use CG we must have lttoolbox (we have it), apertium (we have it too) and ICU (we have to install it now).<br />
<br />
How to install ICU for Ubuntu. Open terminal and copy/paste this code:<br />
<br />
 sudo apt-get install libicu-dev<br />
<br />
Now we can install apertium, lttoolbox and CG.<br />
<br />
==Install== <br />
<br />
;Apertium<br />
<br />
Before installing apertium we have to install lttoolbox (which was downloaded at the same time as apertium). To do that, copy/paste this code:<br />
<br />
'''cd apertium'''<br />
<br />
'''cd lttoolbox/'''<br />
<br />
'''PKG_CONFIG_PATH=/usr/local/lib/pkgconfig ./autogen.sh'''<br />
<br />
'''make'''<br />
<br />
'''sudo make install'''<br />
<br />
'''sudo ldconfig'''<br />
<br />
<br />
The terminal will ask for your password again: '''[sudo] password for user:''' Type it and press '''Enter'''.<br />
Wait until the terminal shows the prompt user@ubuntu:~/apertium/lttoolbox$ again, then run:<br />
<br />
'''cd ..'''<br />
<br />
'''cd apertium/'''<br />
<br />
'''PKG_CONFIG_PATH=/usr/local/lib/pkgconfig ./autogen.sh'''<br />
<br />
'''make'''<br />
<br />
'''sudo make install'''<br />
<br />
'''sudo ldconfig'''<br />
<br />
This will start installing apertium. You have to wait a few minutes. When the terminal shows <br />
<br />
'''vasil@ubuntu:~/apertium/apertium$ sudo ldconfig'''<br />
<br />
'''vasil@ubuntu:~/apertium/apertium$ '''<br />
<br />
the process is complete.<br />
<br />
<br />
<br />
;Constraint grammar<br />
<br />
<br />
To install CG, open a terminal and run:<br />
<br />
'''$ svn co --username anonymous --password anonymous http://beta.visl.sdu.dk/svn/visl/tools/vislcg3/trunk vislcg3'''<br />
<br />
'''$ cd vislcg3'''<br />
<br />
'''$ sh autogen.sh --prefix=<prefix>'''<br />
<br />
'''$ make''' <br />
<br />
'''$ make install'''<br />
<br />
If the prefix is a system directory, run the last command as '''sudo make install'''; it will then ask for your password ('''[sudo] password for user:'''). Type it and press '''Enter.'''<br />
<br />
We are ready.<br />
<br />
==Usage==<br />
<br />
For the examples below, we use the language pair apertium-es-ca, but the principles should be applicable to any language pair. First we have to compile this pair. Go to the directory where you installed Apertium, then<br />
<br />
cd apertium/apertium-es-ca<br />
sh autogen.sh<br />
make<br />
<br />
Let's check that what we installed works. First run:<br />
<br />
echo "vino a la playa" | lt-proc es-ca.automorf.bin<br />
<br />
This should give you:<br />
<br />
^vino/vino<n><m><sg>/venir<vblex><ifi><p3><sg>$ ^a/a<pr>$ ^la/el<det><def><f><sg>/lo<prn><pro><p3><f><sg>$ ^playa/playa<n><f><sg>$<br />
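The output above is in the Apertium stream format: each lexical unit sits between ^ and $, with the surface form first and the possible analyses separated by slashes. As a rough illustration only (a Python sketch that is not part of the Apertium toolchain; the function name parse_stream is hypothetical, and escaped characters in the stream are not handled), the ambiguous units can be listed like this:

```python
import re

# Matches one lexical unit: ^surface/analysis1/analysis2...$
LEXICAL_UNIT = re.compile(r"\^([^$]*)\$")

def parse_stream(line):
    """Return a list of (surface, [analyses]) pairs from lt-proc output."""
    units = []
    for body in LEXICAL_UNIT.findall(line):
        surface, *analyses = body.split("/")
        units.append((surface, analyses))
    return units

output = ("^vino/vino<n><m><sg>/venir<vblex><ifi><p3><sg>$ ^a/a<pr>$ "
          "^la/el<det><def><f><sg>/lo<prn><pro><p3><f><sg>$ ^playa/playa<n><f><sg>$")

units = parse_stream(output)
# A unit is ambiguous when it has more than one analysis.
ambiguous = [surface for surface, analyses in units if len(analyses) > 1]
print(ambiguous)  # ['vino', 'la']
```

These two surface forms are exactly the ones the rules in this section try to disambiguate.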
<br />
Here we have two ambiguities: one between a noun and a verb, and the other between a determiner and a pronoun. We can write rules that choose between the competing readings. First we define our categories; these can be tags, wordforms or lemmas. It might help to think of them as "coarse tags", which may involve a set of fine tags or lemmas. So, create a file grammar.txt, and add the following text: <br />
<br />
DELIMITERS = "<$.>" ;<br />
LIST NOUN = n;<br />
LIST VERB = vblex;<br />
LIST DET = det;<br />
LIST PRN = prn;<br />
LIST PREP = pr;<br />
SECTION<br />
<br />
The first rule states: "When the current lexical unit can be a pronoun or a determiner, and it is followed on the right by a lexical unit which could be a noun, choose the determiner". We have to add this rule to the file, and compile using cg-comp:<br />
<br />
rule:<br />
<br />
<br />
# 1<br />
SELECT DET IF<br />
(0 DET)<br />
(0 PRN)<br />
(1 NOUN) ;<br />
<br />
compile with:<br />
<br />
$ ./cg-comp grammar.txt grammar.bin<br />
Sections: 1, Rules: 1, Sets: 6, Tags: 7<br />
<br />
To test what we have done, run:<br />
<br />
$ echo "vino a la playa" | lt-proc es-ca.automorf.bin | cg-proc grammar.bin<br />
^vino/vino<n><m><sg>/venir<vblex><ifi><p3><sg>$ ^a/a<pr>$ ^la/el<det><def><f><sg>$ ^playa/playa<n><f><sg>$<br />
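To make the effect of rule 1 concrete, here is a small Python sketch that mimics what the SELECT rule does to the readings. It only illustrates the rule's logic under simplifying assumptions (a single pass, plain tag matching); it is not how vislcg3 is implemented, and the helper names are hypothetical.

```python
import re

def tags(reading):
    """Extract the set of tags from a reading such as 'el<det><def><f><sg>'."""
    return set(re.findall(r"<([^>]+)>", reading))

def has(cohort, tag):
    """True if any reading in the cohort carries the given tag."""
    return any(tag in tags(r) for r in cohort)

def select_det(cohorts):
    """Mimic: SELECT DET IF (0 DET) (0 PRN) (1 NOUN) ;"""
    result = []
    for i, cohort in enumerate(cohorts):
        nxt = cohorts[i + 1] if i + 1 < len(cohorts) else []
        if has(cohort, "det") and has(cohort, "prn") and has(nxt, "n"):
            # SELECT keeps only the readings that match the target.
            cohort = [r for r in cohort if "det" in tags(r)]
        result.append(cohort)
    return result

cohorts = [
    ["vino<n><m><sg>", "venir<vblex><ifi><p3><sg>"],
    ["a<pr>"],
    ["el<det><def><f><sg>", "lo<prn><pro><p3><f><sg>"],
    ["playa<n><f><sg>"],
]
for cohort in select_det(cohorts):
    print(cohort)
```

Only the third cohort ("la") changes: the pronoun reading is dropped, while "vino" keeps both of its readings, as in the cg-proc output above.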
<br />
<br />
The second rule states: "When the current lexical unit can be a noun or a verb, if the subsequent two units to the right are preposition and determiner, remove the noun reading." Now we have to add this rule:<br />
<br />
<br />
rule:<br />
<br />
# 2<br />
REMOVE NOUN IF<br />
(0 NOUN)<br />
(0 VERB)<br />
(1 PREP)<br />
(2 DET) ;<br />
<br />
re-compile the grammar and test:<br />
<br />
$ echo "vino a la playa" | lt-proc es-ca.automorf.bin | cg-proc grammar.bin<br />
^vino/venir<vblex><ifi><p3><sg>$ ^a/a<pr>$ ^la/el<det><def><f><sg>$ ^playa/playa<n><f><sg>$<br />
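The REMOVE rule can be illustrated in the same spirit. The Python sketch below mimics the logic of rule 2 on the readings left after rule 1; it is an assumption-laden illustration (single pass, hypothetical helper names), not the vislcg3 implementation.

```python
import re

def tags(reading):
    """Extract the set of tags from a reading such as 'vino<n><m><sg>'."""
    return set(re.findall(r"<([^>]+)>", reading))

def has(cohorts, index, tag):
    """True if position `index` exists and one of its readings carries `tag`."""
    return 0 <= index < len(cohorts) and any(tag in tags(r) for r in cohorts[index])

def remove_noun(cohorts):
    """Mimic: REMOVE NOUN IF (0 NOUN) (0 VERB) (1 PREP) (2 DET) ;"""
    result = []
    for i, cohort in enumerate(cohorts):
        if (has(cohorts, i, "n") and has(cohorts, i, "vblex")
                and has(cohorts, i + 1, "pr") and has(cohorts, i + 2, "det")):
            kept = [r for r in cohort if "n" not in tags(r)]
            cohort = kept or cohort  # keep at least one reading
        result.append(cohort)
    return result

# Readings as they stand after rule 1 has been applied.
cohorts = [
    ["vino<n><m><sg>", "venir<vblex><ifi><p3><sg>"],
    ["a<pr>"],
    ["el<det><def><f><sg>"],
    ["playa<n><f><sg>"],
]
print(remove_noun(cohorts)[0])  # ['venir<vblex><ifi><p3><sg>']
```

Unlike SELECT, which keeps the matching readings, REMOVE discards them and leaves everything else in place.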
<br />
The third rule states: "Remove the interjection reading if the preceding word is a modal verb."<br />
<br />
<br />
[[Category:Documentation in English]]</div>
Dharjunior
https://wiki.apertium.org/w/index.php?title=Specific_resources_per_language&diff=68041
Specific resources per language
2018-11-29T23:52:03Z
<p>Dharjunior: </p>
<hr />
<div>{{Github-migration-check}}<br />
{{TOCD}}<br />
The incubator can be found in the 'incubator' column in https://apertium.github.io/apertium-on-github/source-browser.html. It houses language pairs which haven't completely matured and are under work.<br />
<br />
<br />
==Specific resources per language==<br />
<br />
Here are some links to resources that might be useful for expanding on work in the Incubator. Below you can put resources which will be useful in the construction. Try and mark them for licence, or at least free/non-free.<br />
<br />
See also the individual language pages. <br />
<br />
===[[Albanian]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-sqi/blob/master/apertium-sqi.sqi.dix Albanian Monodix]''<br />
<br />
;Resources<br />
* http://www.albanianoverview.com/grammar.htm<br />
* http://www.idividi.com.mk/recnik/index.htm -- Albanian--Macedonian dictionary (non-free)<br />
<br />
===[[Armenian]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-hye/blob/master/apertium-hye.hye.dix Armenian Monodix]''<br />
<br />
;Resources<br />
<br />
* http://www.armeniapedia.org/index.php?title=Category:Armenian_Language_Lessons<br />
<br />
===[[Assamese and Hindi]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-as-hi/blob/91f3c38b0c636deb620cbd27725d63dd763c5f0b/apertium-as-hi.hi.dix Assamese-Hindi Bidix]''<br />
<br />
<br />
--- Anusuya<br />
<br />
===[[Belarusian]]=== <br />
<br />
* [http://www.vitba.org/fofmb/fofmb.html GFDL grammar of the language]<br />
<br />
===[[Bengali]]===<br />
<br />
* http://bengalinux.sourceforge.net/cgi-bin/anubadok/index.pl -- Free software translation for English→Bengali <br />
* http://anubadok.sf.net/ -- See above<br />
<br />
===[[Bulgarian]]===<br />
:''Dictionary: [https://raw.githubusercontent.com/apertium/apertium-bul/master/apertium-bul.bul.dix Bulgarian Monodix]''<br />
<br />
;Resources<br />
<br />
* [http://www.sfs.nphil.uni-tuebingen.de/iscl/Theses/zhechev.pdf Bulgarian verbal morphology]<br />
<br />
===[[Cornish]]===<br />
<br />
:''Dictionary: [https://sourceforge.net/projects/apertium/files/apertium-cy-en/0.1.0/ Cornish Monodix from SourceForge]''<br />
<br />
'''This resource has not been migrated to GitHub from SVN'''<br />
<br />
;Resources<br />
<br />
* [http://www.cornishtranslator.com/ Cornish Translator]<br />
* [http://kevindonnelly.org.uk/kernewek/ Cornish-Welsh bilingual wordlist]<br />
<br />
===[[Czech]]===<br />
:''Dictionary: [http://apertium.svn.sourceforge.net/svnroot/apertium/incubator/apertium-pl-cs.cs.dix.xml apertium-pl-cs.cs.dix.xml]'' <br />
'''This resource has not been migrated to GitHub from SVN'''<br />
<br />
:''Dictionary: [https://github.com/apertium/apertium-eo-cs/blob/c16fa21194a285941307a68e420c194a1825ebc3/apertium-eo-cs.eo-cs.dix Czech-Esperanto Bidix]''<br />
:''Dictionary: [https://github.com/apertium/apertium-cs-sl/tree/062fa172705e16f77302a8096df3733581079fb8 Czech-Slovenian Bidix]''<br />
;Resources<br />
<br />
* [http://nlp.fi.muni.cz/nlp/aisa/NlpCz/Frekvence_slov_lemmat.html Most frequent words] Also includes a list of the most frequent bi- and tri-grams, but these are of little use as multiwords<br />
* [http://users.ox.ac.uk/~tayl0010/links.html James Naughton's links]<br />
* [http://www.czech-language.cz/alphabet/alph-krtiny.html Some complications with diacritics]<br />
* [http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Morphology/index.html Czech morphological guesser] - 'free', but not open source<br />
<br />
===[[Faroese]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-fao/blob/master/apertium-fao.fao.dix Faroese Monodix]''<br />
<br />
;Resources<br />
* [http://giellatekno.uit.no/cgi/d-fao.eng.html U. Tromsø -- Faroese analyser ]<br />
* [https://github.com/apertium/apertium-fao-isl/blob/master/apertium-fao-isl.fao-isl.rlx Faroese Constraint Grammar]<br />
* [http://www.archive.org/details/frskanthologi00denmgoog Faroese-Danish dictionary from 1886]<br />
<br />
===[[Finnish]]===<br />
{{see-also|Omorfi}}<br />
;Resources<br />
<br />
* http://kaino.kotus.fi/sanat/nykysuomi/ &mdash; full form list for Finnish -- LGPL<br />
* [https://kitwiki.csc.fi/twiki/bin/view/KitWiki/OMorFiSFSTVersion#Installation Omorfi–Open Morphology for Finnish language]<br />
* [http://www.ling.helsinki.fi/kieliteknologia/tutkimus/hfst/ Helsinki Finite-State Transducer Technology (HFST)]<br />
<pre><br />
s = lemma<br />
hn = homonymy ref<br />
t = inflection info<br />
tn = inflection number (referring to table)<br />
av = ref to consonant gradation<br />
</pre><br />
<br />
===[[German and English]]===<br />
<br />
German-English bilingual dictionary (>216,000 entries), generated from linguistic data (GPL Version 2 or later) available for [http://www-user.tu-chemnitz.de/~fri/ding/ "Ding: A Dictionary LookUp program"] (version 1.5 2007-04-09) from Frank Richter, [http://tu-chemnitz.de Technische Universität Chemnitz]<br />
<br />
:''[https://github.com/apertium/apertium-eng-deu/blob/master/apertium-eng-deu.eng-deu.dix German-English Dictionary]''<br />
<br />
===[[Greek]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-ell/blob/master/apertium-ell.ell.dix Greek Monodix]''<br />
:''Greek-English Dictionary: [https://github.com/apertium/apertium-ell-eng/blob/master/apertium-ell-eng.eng.dix Greek-English Dictionary]''<br />
<br />
;Resources<br />
<br />
* Greek <-> Ukrainian, Russian, Polish Grammar & Dictionary: http://ellinika.gnu.org.ua/<br />
<br />
===[[Hebrew]]===<br />
<br />
;Resources<br />
<br />
* http://www.mila.cs.technion.ac.il/english/resources/lexicons/ lexicons for Hebrew, in weird XLS format -- GPL<br />
* http://www.mila.cs.technion.ac.il/english/resources/software_downloads/index.html Hebrew Morphological Analyzer (for Hebrew undotted text) -- GPL, but download link behind a password<br />
* http://www.cs.technion.ac.il/~barhaim/MorphTagger/ HMM-based part-of-speech tagger For Hebrew -- GPL<br />
* http://www.cs.technion.ac.il/~erelsgl/bxi/hmntx/teud.html Probabilistic Morphological Analyzer for Hebrew undotted text -- license unknown<br />
* http://hspell.ivrix.org.il/ The hspell Hebrew spell-checker has a mode for analyzing morphological data -- GPL<br />
* http://www.code972.com/blog/hebmorph/ HebMorph is the analyser powering hspell's capabilities -- GPL<br />
<br />
===[[Hindi]]===<br />
{{see-also|Hindi}}<br />
<br />
;Resources<br />
<br />
* POS tagged English-Hindi wordlist: http://indlinux.sourceforge.net/downloads/files/hindidict.txt.bz2<br />
<br />
* https://github.com/unhammer/apertium-en-hi/blob/master/apertium-en-hi.en.dix <br />
* https://github.com/apertium/apertium-hin/blob/master/apertium-hin.hin.dix <br />
* https://github.com/apertium/apertium-urd-hin/blob/master/dev/en-hi-ur.list<br />
* https://github.com/apertium/apertium-urd-hin/blob/master/apertium-urd-hin.urd-hin.dix<br />
<br />
<br />
<br />
===[[Iranian Persian]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-pes/blob/master/apertium-pes.pes.dix Persian Monodix]''<br />
<br />
;Resources<br />
<br />
* [http://books.google.com/books?vid=OCLC20216670&id=Ru1ncSqiRXkC&printsec=titlepage&hl=de#PPA24,M1 Grammar of Persian]<br />
<br />
===[[Ingush]]===<br />
<br />
; Resources<br />
<br />
* [http://www.linguistics.berkeley.edu/~ingush/database.html Lexical database] (non-free)<br />
* [http://books.google.com/books?id=J7wqVHeRWdwC&pg=PA5&lpg=PA5&dq=ingush+father&source=bl&ots=N8TDZudzGZ&sig=JO9X_Y9gio7dUhZWeyZX7j17iPw&hl=ca&ei=vfq4TM6CH86OjAfO94XaDg&sa=X&oi=book_result&ct=result&resnum=3&ved=0CB8Q6AEwAg#v=onepage&q=ingush%20father&f=false Ingush-English dict] (non-free)<br />
<br />
===[[Latvian]]===<br />
;Resources<br />
* https://github.com/PeterisP/morphology GPL full-form dictionary (https://github.com/PeterisP/morphology/blob/master/src/main/resources/Lexicon.xml)<br />
<br />
;See also<br />
* [[Latvian and Russian]]<br />
<br />
===[[Lithuanian]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-lit/blob/master/apertium-lit.lit.dix Lithuanian Monodix]''<br />
<br />
;Resources<br />
<br />
===[[Nogai]]===<br />
<br />
; Resources<br />
<br />
* [http://ksirov.ru/%D1%8F%D0%B7%D1%8B%D0%BA%D0%B8/%D0%BD%D0%BE%D0%B3%D0%B0%D0%B9%D1%81%D0%BA%D0%B8%D0%B9 Grammar Sketch and Russian-Nogai dictionary]<br />
<br />
===[[Ossetian]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-oss/blob/master/apertium-oss.oss.dix Ossetian Monodix]''<br />
<br />
;Resources<br />
<br />
* [http://www.azargoshnasp.net/languages/ossetian/grammersketchossetian.pdf Ossetian: Grammatical Sketch] &mdash; quite nice and comprehensive.<br />
* [http://www.ossetic-studies.org/ Ossetic National Corpus]<br />
<br />
===[[Piemontese]]===<br />
:''Dictionary: [https://sourceforge.net/p/apertium/svn/HEAD/tree/incubator/apertium-it-pms.pms.dix Piemontese Monodix from SourceForge]'' <br />
'''This resource has not been migrated to GitHub from SVN'''<br />
<br />
;Resources<br />
<br />
* http://members.fortunecity.it/dotorcarlo/vocen1.html Piemontese--English -- public domain<br />
* http://digilander.libero.it/dotor43/indexit.html -- Piemontese grammar incl. 17k word Piemontese--Italian dictionary (POS tagged and partly annotated for inflection). site suggests "© These pages can be freely used for all purposes, but not for political reasons, and not against the laws (no matter what is the country)."<br />
<br />
===[[Portuguese]]===<br />
<br />
Even if Apertium has a stable es-pt pair, the coverage of the Brazilian Portuguese Dictionary built at NILC (Universidade de Sao Paulo) for Unitex is much better, and could be used perhaps to improve it.<br />
<br />
;Resources<br />
<br />
* [http://www.nilc.icmc.usp.br/nilc/projects/unitex-pb/web/dicionarios.html Recursos Lexicais Português do Brasil]<br />
<br />
We believe it has an LGPL license.<br />
<br />
===[[Punjabi]]===<br />
<br />
; Resources<br />
<br />
* [http://www.lama.univ-savoie.fr/~humayoun/punjabi/index.html Punjabi lexicon]<br />
<br />
===[[Quechua]]===<br />
<br />
;Resources<br />
<br />
* http://www.runasimipi.org/<br />
* AVENUE Quechua-Spanish system. (ask [[User:Francis Tyers|Francis Tyers]])<br />
<br />
===[[Russian]]===<br />
<br />
:''Dictionary: [https://github.com/apertium/apertium-rus/blob/master/apertium-rus.rus.dix monodix]''<br />
:''Bidix: [https://github.com/apertium/apertium-pol-rus/blob/master/apertium-pol-rus.pol-rus.dix Polish-Russian]''<br />
:''Bidix: [https://github.com/apertium/apertium-rus-eng/blob/master/apertium-ru-en.ru.dix English-Russian]''<br />
<br />
;Resources<br />
<br />
* http://www.alphadictionary.com/rusgrammar/<br />
* http://www.seelrc.org:8080/grammar/pdf/stand_alone_russian.pdf<br />
* [http://www.cic.ipn.mx/~sidorov/rmorph/index.html Russian analyser] - non-free, Windows only<br />
* [http://citeseer.ist.psu.edu/cache/papers/cs2/433/http:zSzzSzwww.ling.ohio-state.eduzSz~hanazSzbibliozSzHanaFeldmanBrew2004-RusMorphLite.pdf/hana04resourcelight.pdf Using Czech resources for the morphological analysis of Russian]<br />
* [http://sourceforge.net/projects/pere/ Pere] - free translator, including Russian<->Ukrainian<->English dictionaries. Built from alignments, low quality.<br />
* [http://www.revdanica.com/xdxf/tmp/Muzafarov/inXDXF/rus2taj.xdxf Russian--Tajik phrase dictionary, 41k entries].<br />
* [http://www.lugattj.com/news.php?tid=1&ln=en Another Tajik--Russian dictionary]<br />
<br />
===[[Sanskrit]] '''संस्कृतम्'''===<br />
:''Dictionary: [https://github.com/apertium/apertium-san/blob/master/apertium-san.san.dix Sanskrit Monodix]''<br />
<br />
;Resources<br />
* [http://www.sanskrit-lexicon.uni-koeln.de/ Sanskrit Lexicon at Uni-Koeln]<br />
* [http://www.sanskrit-lexicon.uni-koeln.de/aequery/index.html Apte's En-Sa] dictionary<br />
* [http://www.sanskrit-lexicon.uni-koeln.de/download.html Material available for download].<br />
<br />
===[[Slovakian]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-slk/blob/master/apertium-slk.slk.dix Slovak Monodix]''<br />
<br />
;Resources<br />
<br />
* http://old.bohemica.com/slovak/slovakgrammar.pdf (Slovakian, with some English)<br />
* http://pl.wiktionary.org/wiki/Aneks:J%C4%99zyk_s%C5%82owacki_-_tabele_koniugacji (In Polish)<br />
* http://www.angelfire.com/sk3/quality/Slovak_declension.html<br />
* http://www.juls.savba.sk/msj/<br />
<br />
===[[Thai]]===<br />
* https://github.com/veer66/Yaitron Yaitron English-Thai and Thai-English XML dictionary, license seems standard 4-clause<br />
<br />
===[[Urdu]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-urd/blob/master/apertium-urd.urd.dix Urdu Monodix]''<br />
:''Bidix: [https://github.com/apertium/apertium-urd-hin/blob/master/apertium-urd-hin.urd-hin.dix Hindi-Urdu Bidix]''<br />
<br />
;Resources<br />
* http://www.lama.univ-savoie.fr/~humayoun/UrduMorph/ &mdash; GPL analyser of Urdu<br />
* http://www.crulp.org/software/langproc/E2UMachineTranslationSystem.htm -- Urdu--English MT system<br />
<br />
<br />
[[Category:Development]]<br />
[[Category:Repository]]<br />
[[Category:Documentation in English]]</div>
Dharjunior
https://wiki.apertium.org/w/index.php?title=Beginner%27s_Constraint_Grammar_HOWTO&diff=68032
Beginner's Constraint Grammar HOWTO
2018-11-28T04:04:56Z
<p>Dharjunior: </p>
<hr />
<div>[[Installation et fonctionnement de Constraint Grammar|En français]]<br />
<br />
''The installation instructions for Apertium and language pairs below refer to the Ubuntu distribution. For other Linux distributions or other operating systems, see the general [[Installation]] page''.<br />
<br />
==Download==<br />
<br />
;Apertium<br />
<br />
Sourced from [[Install Apertium core using packaging]]<br />
First, remove any Apertium packages you have installed from operating system repositories. They will be out-of-date, sometimes by years.<br />
<br />
Add the repository,<br />
<br />
<pre><br />
# Pick one:<br />
<br />
# Nightly, unstable, new, almost always use this:<br />
wget https://apertium.projectjj.com/apt/install-nightly.sh -O - | sudo bash<br />
<br />
# Release, stable, old:<br />
wget https://apertium.projectjj.com/apt/install-release.sh -O - | sudo bash<br />
</pre><br />
<br />
<br />
You should see messages. <br />
<br />
Install dev tools,<br />
<br />
<pre><br />
sudo apt-get -f install apertium-all-dev<br />
</pre><br />
<br />
====About the Debian repository install====<br />
Check the repository details that the script installed:<br />
<br />
<pre><br />
apt-cache policy | grep apertium<br />
</pre><br />
<br />
Unfortunately, due to the seamless upgrading of Debian packaging, it is difficult to see which packages the new repository has added, and where. Even Synaptic, the wonder GUI, has no way through. You could try this brute-force command line:<br />
<br />
<pre><br />
find /var/lib/apt/lists/ | grep 'projectjj.*Packages' | xargs grep -h Package<br />
</pre><br />
<br />
Which will, if nothing else, tell you a lot about byways of the Apertium project.<br />
<br />
<br />
;Constraint grammar<br />
<br />
To use CG we must have lttoolbox (we have it), apertium (we have it too) and ICU (we have to install it now).<br />
<br />
To install ICU on Ubuntu, open a terminal and run:<br />
<br />
 sudo apt-get install libicu-dev<br />
<br />
Now we can install apertium, lttoolbox and CG.<br />
<br />
==Install== <br />
<br />
;Apertium<br />
<br />
Before installing apertium we have to install lttoolbox (which was downloaded together with apertium). To do that, run the following commands:<br />
<br />
'''cd apertium'''<br />
<br />
'''cd lttoolbox/'''<br />
<br />
'''PKG_CONFIG_PATH=/usr/local/lib/pkgconfig ./autogen.sh'''<br />
<br />
'''make'''<br />
<br />
'''sudo make install'''<br />
<br />
'''sudo ldconfig'''<br />
<br />
<br />
The terminal will ask for your password again: '''[sudo] password for user:''' Type it and press '''Enter'''.<br />
Wait until the terminal shows the prompt user@ubuntu:~/apertium/lttoolbox$ again, then run:<br />
<br />
'''cd ..'''<br />
<br />
'''cd apertium/'''<br />
<br />
'''PKG_CONFIG_PATH=/usr/local/lib/pkgconfig ./autogen.sh'''<br />
<br />
'''make'''<br />
<br />
'''sudo make install'''<br />
<br />
'''sudo ldconfig'''<br />
<br />
This will start installing apertium. You have to wait a few minutes. When the terminal shows <br />
<br />
'''vasil@ubuntu:~/apertium/apertium$ sudo ldconfig'''<br />
<br />
'''vasil@ubuntu:~/apertium/apertium$ '''<br />
<br />
the process is complete.<br />
<br />
<br />
<br />
;Constraint grammar<br />
<br />
<br />
To install CG, open a terminal and run:<br />
<br />
'''$ svn co --username anonymous --password anonymous http://beta.visl.sdu.dk/svn/visl/tools/vislcg3/trunk vislcg3'''<br />
<br />
'''$ cd vislcg3'''<br />
<br />
'''$ sh autogen.sh --prefix=<prefix>'''<br />
<br />
'''$ make''' <br />
<br />
'''$ make install'''<br />
<br />
If the prefix is a system directory, run the last command as '''sudo make install'''; it will then ask for your password ('''[sudo] password for user:'''). Type it and press '''Enter.'''<br />
<br />
We are ready.<br />
<br />
==Usage==<br />
<br />
For the examples below, we use the language pair apertium-es-ca, but the principles should be applicable to any language pair. First we have to compile this pair. Go to the directory where you checked out the apertium SVN, then<br />
<br />
cd apertium/apertium-es-ca<br />
sh autogen.sh<br />
make<br />
<br />
Let's check that what we installed works. First run:<br />
<br />
echo "vino a la playa" | lt-proc es-ca.automorf.bin<br />
<br />
This should give you:<br />
<br />
^vino/vino<n><m><sg>/venir<vblex><ifi><p3><sg>$ ^a/a<pr>$ ^la/el<det><def><f><sg>/lo<prn><pro><p3><f><sg>$ ^playa/playa<n><f><sg>$<br />
<br />
Here we have two ambiguities: one between a noun and a verb, and the other between a determiner and a pronoun. We can write rules that choose between the competing readings. First we define our categories; these can be tags, wordforms or lemmas. It might help to think of them as "coarse tags", which may involve a set of fine tags or lemmas. So, create a file grammar.txt, and add the following text: <br />
<br />
DELIMITERS = "<$.>" ;<br />
LIST NOUN = n;<br />
LIST VERB = vblex;<br />
LIST DET = det;<br />
LIST PRN = prn;<br />
LIST PREP = pr;<br />
SECTION<br />
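One way to picture the "coarse tags" idea: each LIST names a set of fine tags, and a reading belongs to the set if it carries any of them. The Python sketch below mirrors the LIST lines above as a dictionary; it is only an illustration of the concept (the names LISTS and coarse_tags are hypothetical), not something CG itself requires.

```python
import re

# Set name -> member tags, mirroring the LIST lines of grammar.txt
LISTS = {
    "NOUN": {"n"},
    "VERB": {"vblex"},
    "DET": {"det"},
    "PRN": {"prn"},
    "PREP": {"pr"},
}

def coarse_tags(reading):
    """Return the names of every LIST that matches one reading."""
    fine = set(re.findall(r"<([^>]+)>", reading))
    return sorted(name for name, members in LISTS.items() if fine & members)

print(coarse_tags("venir<vblex><ifi><p3><sg>"))  # ['VERB']
print(coarse_tags("lo<prn><pro><p3><f><sg>"))    # ['PRN']
```

Tags are compared whole, so the tag pro in the second reading does not match the PREP set, whose only member is pr.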
<br />
The first rule states: "When the current lexical unit can be a pronoun or a determiner, and it is followed on the right by a lexical unit which could be a noun, choose the determiner". We have to add this rule to the file, and compile using cg-comp:<br />
<br />
rule:<br />
<br />
<br />
# 1<br />
SELECT DET IF<br />
(0 DET)<br />
(0 PRN)<br />
(1 NOUN) ;<br />
<br />
compile with:<br />
<br />
$ ./cg-comp grammar.txt grammar.bin<br />
Sections: 1, Rules: 1, Sets: 6, Tags: 7<br />
<br />
To test what we have done, run:<br />
<br />
$ echo "vino a la playa" | lt-proc es-ca.automorf.bin | cg-proc grammar.bin<br />
^vino/vino<n><m><sg>/venir<vblex><ifi><p3><sg>$ ^a/a<pr>$ ^la/el<det><def><f><sg>$ ^playa/playa<n><f><sg>$<br />
<br />
<br />
The second rule states: "When the current lexical unit can be a noun or a verb, if the subsequent two units to the right are preposition and determiner, remove the noun reading." Now we have to add this rule:<br />
<br />
<br />
rule:<br />
<br />
# 2<br />
REMOVE NOUN IF<br />
(0 NOUN)<br />
(0 VERB)<br />
(1 PREP)<br />
(2 DET) ;<br />
<br />
re-compile the grammar and test:<br />
<br />
$ echo "vino a la playa" | lt-proc es-ca.automorf.bin | cg-proc grammar.bin<br />
^vino/venir<vblex><ifi><p3><sg>$ ^a/a<pr>$ ^la/el<det><def><f><sg>$ ^playa/playa<n><f><sg>$<br />
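When a grammar grows, it can help to see exactly which readings the rules discarded. The following Python sketch compares the analyser output with the disambiguated output; it assumes each surface form occurs once per line and does not handle escaped characters, and the function names are hypothetical.

```python
import re

def readings(line):
    """Map each surface form to its list of analyses."""
    table = {}
    for body in re.findall(r"\^([^$]*)\$", line):
        surface, *analyses = body.split("/")
        table[surface] = analyses
    return table

def dropped(before, after):
    """Return {surface: [analyses removed by disambiguation]}."""
    old, new = readings(before), readings(after)
    return {s: [a for a in old[s] if a not in new.get(s, [])]
            for s in old if set(old[s]) - set(new.get(s, []))}

before = ("^vino/vino<n><m><sg>/venir<vblex><ifi><p3><sg>$ ^a/a<pr>$ "
          "^la/el<det><def><f><sg>/lo<prn><pro><p3><f><sg>$ ^playa/playa<n><f><sg>$")
after = ("^vino/venir<vblex><ifi><p3><sg>$ ^a/a<pr>$ "
         "^la/el<det><def><f><sg>$ ^playa/playa<n><f><sg>$")

print(dropped(before, after))
# {'vino': ['vino<n><m><sg>'], 'la': ['lo<prn><pro><p3><f><sg>']}
```

The two dropped readings correspond to rule 2 (the noun reading of "vino") and rule 1 (the pronoun reading of "la").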
<br />
The third rule states: "Remove the interjection reading if the preceding word is a modal verb."<br />
<br />
<br />
[[Category:Documentation in English]]</div>
Dharjunior
https://wiki.apertium.org/w/index.php?title=Specific_resources_per_language&diff=68031
Specific resources per language
2018-11-28T03:50:57Z
<p>Dharjunior: </p>
<hr />
<div>{{Github-migration-check}}<br />
{{TOCD}}<br />
The incubator can be found in the 'incubator' column in https://apertium.github.io/apertium-on-github/source-browser.html. It provides a place for people to put dictionaries and other stuff that is useful in constructing language pairs.<br />
<br />
<br />
==Specific resources per language==<br />
<br />
Here are some links to resources that might be useful for expanding on work in the Incubator. Below you can put resources which will be useful in the construction. Try and mark them for licence, or at least free/non-free.<br />
<br />
See also the individual language pages. <br />
<br />
===[[Albanian]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-sqi/blob/master/apertium-sqi.sqi.dix Albanian Monodix]''<br />
<br />
;Resources<br />
* http://www.albanianoverview.com/grammar.htm<br />
* http://www.idividi.com.mk/recnik/index.htm -- Albanian--Macedonian dictionary (non-free)<br />
<br />
===[[Armenian]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-hye/blob/master/apertium-hye.hye.dix Armenian Monodix]''<br />
<br />
;Resources<br />
<br />
* http://www.armeniapedia.org/index.php?title=Category:Armenian_Language_Lessons<br />
<br />
===[[Assamese and Hindi]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-as-hi/blob/91f3c38b0c636deb620cbd27725d63dd763c5f0b/apertium-as-hi.hi.dix Assamese-Hindi Bidix]''<br />
<br />
<br />
--- Anusuya<br />
<br />
===[[Belarusian]]=== <br />
<br />
* [http://www.vitba.org/fofmb/fofmb.html GFDL grammar of the language]<br />
<br />
===[[Bengali]]===<br />
<br />
* http://bengalinux.sourceforge.net/cgi-bin/anubadok/index.pl -- Free software translation for English→Bengali <br />
* http://anubadok.sf.net/ -- See above<br />
<br />
===[[Bulgarian]]===<br />
:''Dictionary: [https://raw.githubusercontent.com/apertium/apertium-bul/master/apertium-bul.bul.dix Bulgarian Monodix]''<br />
<br />
;Resources<br />
<br />
* [http://www.sfs.nphil.uni-tuebingen.de/iscl/Theses/zhechev.pdf Bulgarian verbal morphology]<br />
<br />
===[[Cornish]]===<br />
:''Dictionary: [https://sourceforge.net/p/apertium/svn/HEAD/tree/incubator/apertium-cy-kw.kw.dix apertium-cy-kw.kw.dix from SourceForge]''<br />
<br />
:''Dictionary: [https://sourceforge.net/projects/apertium/files/apertium-cy-en/0.1.0/ Cornish Monodix from SourceForge]''<br />
;Resources<br />
<br />
* [http://www.cornishtranslator.com/ Cornish Translator]<br />
* [http://kevindonnelly.org.uk/kernewek/ Cornish-Welsh bilingual wordlist]<br />
<br />
===[[Czech]]===<br />
:''Dictionary: [http://apertium.svn.sourceforge.net/svnroot/apertium/incubator/apertium-pl-cs.cs.dix.xml apertium-pl-cs.cs.dix.xml]''<br />
:''Dictionary: [https://github.com/apertium/apertium-eo-cs/blob/c16fa21194a285941307a68e420c194a1825ebc3/apertium-eo-cs.eo-cs.dix Czech-Esperanto Bidix]''<br />
:''Dictionary: [https://github.com/apertium/apertium-cs-sl/tree/062fa172705e16f77302a8096df3733581079fb8 Czech-Slovenian Bidix]''<br />
;Resources<br />
<br />
* [http://nlp.fi.muni.cz/nlp/aisa/NlpCz/Frekvence_slov_lemmat.html Most frequent words] Also includes a list of the most frequent bi- and tri-grams, but these are of little use as multiwords<br />
* [http://users.ox.ac.uk/~tayl0010/links.html James Naughton's links]<br />
* [http://www.czech-language.cz/alphabet/alph-krtiny.html Some complications with diacritics]<br />
* [http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Morphology/index.html Czech morphological guesser] - 'free', but not open source<br />
<br />
===[[Faroese]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-fao/blob/master/apertium-fao.fao.dix Faroese Monodix]''<br />
<br />
;Resources<br />
* [http://giellatekno.uit.no/cgi/d-fao.eng.html U. Tromsø -- Faroese analyser ]<br />
* [https://github.com/apertium/apertium-fao-isl/blob/master/apertium-fao-isl.fao-isl.rlx Faroese Constraint Grammar]<br />
* [http://www.archive.org/details/frskanthologi00denmgoog Faroese-Danish dictionary from 1886]<br />
<br />
===[[Finnish]]===<br />
{{see-also|Omorfi}}<br />
;Resources<br />
<br />
* http://kaino.kotus.fi/sanat/nykysuomi/ &mdash; full form list for Finnish -- LGPL<br />
* [https://kitwiki.csc.fi/twiki/bin/view/KitWiki/OMorFiSFSTVersion#Installation Omorfi–Open Morphology for Finnish language]<br />
* [http://www.ling.helsinki.fi/kieliteknologia/tutkimus/hfst/ Helsinki Finite-State Transducer Technology (HFST)]<br />
<pre><br />
s = lemma<br />
hn = homonymy ref<br />
t = inflection info<br />
tn = inflection number (referring to table)<br />
av = ref to consonant gradation<br />
</pre><br />
<br />
===[[German and English]]===<br />
<br />
German-English bilingual dictionary (>216,000 entries), generated from linguistic data (GPL Version 2 or later) available for [http://www-user.tu-chemnitz.de/~fri/ding/ "Ding: A Dictionary LookUp program"] (version 1.5 2007-04-09) from Frank Richter, [http://tu-chemnitz.de Technische Universität Chemnitz]<br />
<br />
:''[https://github.com/apertium/apertium-eng-deu/blob/master/apertium-eng-deu.eng-deu.dix German-English Dictionary]''<br />
<br />
===[[Greek]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-ell/blob/master/apertium-ell.ell.dix Greek Monodix]''<br />
:''Greek-English Dictionary: [https://github.com/apertium/apertium-ell-eng/blob/master/apertium-ell-eng.eng.dix Greek-English Dictionary]''<br />
<br />
;Resources<br />
<br />
* Greek <-> Ukrainian, Russian, Polish Grammar & Dictionary: http://ellinika.gnu.org.ua/<br />
<br />
===[[Hebrew]]===<br />
<br />
;Resources<br />
<br />
* http://www.mila.cs.technion.ac.il/english/resources/lexicons/ lexicons for Hebrew, in weird XLS format -- GPL<br />
* http://www.mila.cs.technion.ac.il/english/resources/software_downloads/index.html Hebrew Morphological Analyzer (for Hebrew undotted text) -- GPL, but download link behind a password<br />
* http://www.cs.technion.ac.il/~barhaim/MorphTagger/ HMM-based part-of-speech tagger For Hebrew -- GPL<br />
* http://www.cs.technion.ac.il/~erelsgl/bxi/hmntx/teud.html Probabilistic Morphological Analyzer for Hebrew undotted text -- license unknown<br />
* http://hspell.ivrix.org.il/ The hspell Hebrew spell-checker has a mode for analyzing morphological data -- GPL<br />
* http://www.code972.com/blog/hebmorph/ HebMorph is the analyser powering hspell's capabilities -- GPL<br />
<br />
===[[Hindi]]===<br />
{{see-also|Hindi}}<br />
<br />
;Resources<br />
<br />
* POS tagged English-Hindi wordlist: http://indlinux.sourceforge.net/downloads/files/hindidict.txt.bz2<br />
<br />
* https://github.com/unhammer/apertium-en-hi/blob/master/apertium-en-hi.en.dix <br />
* https://github.com/apertium/apertium-hin/blob/master/apertium-hin.hin.dix <br />
* https://github.com/apertium/apertium-urd-hin/blob/master/dev/en-hi-ur.list<br />
* https://github.com/apertium/apertium-urd-hin/blob/master/apertium-urd-hin.urd-hin.dix<br />
<br />
<br />
<br />
===[[Iranian Persian]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-pes/blob/master/apertium-pes.pes.dix Persian Monodix]''<br />
<br />
;Resources<br />
<br />
* [http://books.google.com/books?vid=OCLC20216670&id=Ru1ncSqiRXkC&printsec=titlepage&hl=de#PPA24,M1 Grammar of Persian]<br />
<br />
===[[Ingush]]===<br />
<br />
; Resources<br />
<br />
* [http://www.linguistics.berkeley.edu/~ingush/database.html Lexical database] (non-free)<br />
* [http://books.google.com/books?id=J7wqVHeRWdwC&pg=PA5&lpg=PA5&dq=ingush+father&source=bl&ots=N8TDZudzGZ&sig=JO9X_Y9gio7dUhZWeyZX7j17iPw&hl=ca&ei=vfq4TM6CH86OjAfO94XaDg&sa=X&oi=book_result&ct=result&resnum=3&ved=0CB8Q6AEwAg#v=onepage&q=ingush%20father&f=false Ingush-English dict] (non-free)<br />
<br />
===[[Latvian]]===<br />
;Resources<br />
* https://github.com/PeterisP/morphology GPL full-form dictionary (https://github.com/PeterisP/morphology/blob/master/src/main/resources/Lexicon.xml)<br />
<br />
;See also<br />
* [[Latvian and Russian]]<br />
<br />
===[[Lithuanian]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-lit/blob/master/apertium-lit.lit.dix Lithuanian Monodix]''<br />
<br />
;Resources<br />
<br />
===[[Nogai]]===<br />
<br />
; Resources<br />
<br />
* [http://ksirov.ru/%D1%8F%D0%B7%D1%8B%D0%BA%D0%B8/%D0%BD%D0%BE%D0%B3%D0%B0%D0%B9%D1%81%D0%BA%D0%B8%D0%B9 Grammar Sketch and Russian-Nogai dictionary]<br />
<br />
===[[Ossetian]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-oss/blob/master/apertium-oss.oss.dix Ossetian Monodix]''<br />
<br />
;Resources<br />
<br />
* [http://www.azargoshnasp.net/languages/ossetian/grammersketchossetian.pdf Ossetian: Grammatical Sketch] &mdash; quite nice and comprehensive.<br />
* [http://www.ossetic-studies.org/ Ossetic National Corpus]<br />
<br />
===[[Piemontese]]===<br />
:''Dictionary: [https://sourceforge.net/p/apertium/svn/HEAD/tree/incubator/apertium-it-pms.pms.dix Piemontese Monodix from SourceForge]''<br />
;Resources<br />
<br />
* http://members.fortunecity.it/dotorcarlo/vocen1.html Piemontese--English -- public domain<br />
* http://digilander.libero.it/dotor43/indexit.html -- Piemontese grammar incl. 17k word Piemontese--Italian dictionary (POS tagged and partly annotated for inflection). site suggests "© These pages can be freely used for all purposes, but not for political reasons, and not against the laws (no matter what is the country)."<br />
<br />
===[[Portuguese]]===<br />
<br />
Even if Apertium has a stable es-pt pair, the coverage of the Brazilian Portuguese Dictionary built at NILC (Universidade de Sao Paulo) for Unitex is much better, and could be used perhaps to improve it.<br />
<br />
;Resources<br />
<br />
* [http://www.nilc.icmc.usp.br/nilc/projects/unitex-pb/web/dicionarios.html Recursos Lexicais Português do Brasil]<br />
<br />
We believe it has an LGPL license.<br />
<br />
===[[Punjabi]]===<br />
<br />
; Resources<br />
<br />
* [http://www.lama.univ-savoie.fr/~humayoun/punjabi/index.html Punjabi lexicon]<br />
<br />
===[[Quechua]]===<br />
<br />
;Resources<br />
<br />
* http://www.runasimipi.org/<br />
* AVENUE Quechua-Spanish system. (ask [[User:Francis Tyers|Francis Tyers]])<br />
<br />
===[[Russian]]===<br />
<br />
:''Dictionary: [https://github.com/apertium/apertium-rus/blob/master/apertium-rus.rus.dix monodix]''<br />
:''Bidix: [https://github.com/apertium/apertium-pol-rus/blob/master/apertium-pol-rus.pol-rus.dix Polish-Russian]''<br />
:''Bidix: [https://github.com/apertium/apertium-rus-eng/blob/master/apertium-ru-en.ru.dix English-Russian]''<br />
<br />
;Resources<br />
<br />
* http://www.alphadictionary.com/rusgrammar/<br />
* http://www.seelrc.org:8080/grammar/pdf/stand_alone_russian.pdf<br />
* [http://www.cic.ipn.mx/~sidorov/rmorph/index.html Russian analyser] - non-free, Windows only<br />
* [http://citeseer.ist.psu.edu/cache/papers/cs2/433/http:zSzzSzwww.ling.ohio-state.eduzSz~hanazSzbibliozSzHanaFeldmanBrew2004-RusMorphLite.pdf/hana04resourcelight.pdf Using Czech resources for the morphological analysis of Russian]<br />
* [http://sourceforge.net/projects/pere/ Pere] - free translator, including Russian<->Ukrainian<->English dictionaries. Built from alignments, low quality.<br />
* [http://www.revdanica.com/xdxf/tmp/Muzafarov/inXDXF/rus2taj.xdxf Russian--Tajik phrase dictionary, 41k entries].<br />
* [http://www.lugattj.com/news.php?tid=1&ln=en Another Tajik--Russian dictionary]<br />
<br />
===[[Sanskrit]] '''संस्कृतम्'''===<br />
:''Dictionary: [https://github.com/apertium/apertium-san/blob/master/apertium-san.san.dix Sanskrit Monodix]''<br />
<br />
;Resources<br />
* [http://www.sanskrit-lexicon.uni-koeln.de/ Sanskrit Lexicon at Uni-Koeln]<br />
* [http://www.sanskrit-lexicon.uni-koeln.de/aequery/index.html Apte's En-Sa] dictionary<br />
* [http://www.sanskrit-lexicon.uni-koeln.de/download.html Material available for download].<br />
<br />
===[[Slovakian]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-slk/blob/master/apertium-slk.slk.dix Slovak Monodix]''<br />
<br />
;Resources<br />
<br />
* http://old.bohemica.com/slovak/slovakgrammar.pdf (Slovakian, with some English)<br />
* http://pl.wiktionary.org/wiki/Aneks:J%C4%99zyk_s%C5%82owacki_-_tabele_koniugacji (In Polish)<br />
* http://www.angelfire.com/sk3/quality/Slovak_declension.html<br />
* http://www.juls.savba.sk/msj/<br />
<br />
===[[Thai]]===<br />
* https://github.com/veer66/Yaitron Yaitron English-Thai and Thai-English XML dictionary, license seems standard 4-clause<br />
<br />
===[[Urdu]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-urd/blob/master/apertium-urd.urd.dix Urdu Monodix]''<br />
:''Bidix: [https://github.com/apertium/apertium-urd-hin/blob/master/apertium-urd-hin.urd-hin.dix Hindi-Urdu Bidix]''<br />
<br />
;Resources<br />
* http://www.lama.univ-savoie.fr/~humayoun/UrduMorph/ &mdash; GPL analyser of Urdu<br />
* http://www.crulp.org/software/langproc/E2UMachineTranslationSystem.htm -- Urdu--English MT system<br />
<br />
<br />
[[Category:Development]]<br />
[[Category:Repository]]<br />
[[Category:Documentation in English]]</div>
Dharjunior
https://wiki.apertium.org/w/index.php?title=Documentation&diff=68030
Documentation
2018-11-28T03:41:59Z
<p>Dharjunior: </p>
<hr />
<div>[[Documentation (français)|En français]]<br />
<br />
{{Main page header}}<br />
<br />
==Official==<br />
{{TOCD}}<br />
* '''[https://wiki.apertium.org/w/images/d/d0/Apertium2-documentation.pdf Apertium 2.0: Official documentation] (222 pages)'''<br />
::Overwhelmingly also applies to Apertium 3.0, except statements about non-Unicode support. Also, the older SLR/SRL method for lexical selection/disambiguation is covered, but this is now replaced by the [[Lexical selection]] module. <br />
<br />
* '''[[Publications]]''' &mdash; Conference and journal articles published about Apertium.<br />
<br />
==Undocumented Features==<br />
* [[Cascaded Interchunk]]<br />
<br />
==General==<br />
<br />
* [[Workflow diagram]]<br />
* [[Workflow reference]] &mdash; includes examples, file names, and links.<br />
* [[List of symbols]]<br />
** [[Liste de symboles]]<br />
* [[Apertium stream format]]<br />
<br />
==Apertium in 5 slides==<br />
<br />
Short presentations prepared by GCI students.<br />
<br />
* [http://slides.com/allysonallyson/deck#/ Apertium: A Free/Open-Source Machine Translation System], by Allyson Boone<br />
* [http://slides.com/darkgaia/deck#/ Apertium A free/open-source machine translation platform], by Darkgaia<br />
<br />
==Entry Test for Apertium==<br />
<br />
Test how well you know Apertium in this quiz! Are you knowledgeable enough about Apertium to be a stellar developer/GCI/GSoC student?<br />
<br />
* [https://www.learnclick.com/quiz/frame/14586 Entry Test for Apertium], by Darkgaia<br />
<br />
==GSoC / GCI students==<br />
<br />
* GSoC students: read [[Top tips for GSOC applications]] and [[Ideas_for_Google_Summer_of_Code]] <br />
* GCI students: read [[Top_tips_for_GSOC_applications#Other_tips]]<br />
<br />
==Creating a new pair==<br />
<br />
* '''[[Apertium New Language Pair HOWTO]]''' &mdash; step-by-step description of how to start a new language pair in Apertium.<br />
** [[Créer une nouvelle paire de langues]] <br />
** [[Uputstvo za novi jezički par za Apertium]]<br />
** [[Апертиум, как се създава нова езикова двойка]]<br />
** [[Kiel aldoni novan lingvan duon]]<br />
** [[Упатство за креирање на нови јазични парови]]<br />
** [[Amestar un par de llingües nuevu]]<br />
** [[Sevel ur c'houblad yezhoù nevez]]<br />
** [[अपर्टियम मे नई भाषा जोडे]]<br />
** [[अपर्टियम मा नायाँ भाषा जोड्नुहोस]]<br />
** [[Руководство по созданию новой языковой пары]]<br />
** [[Come scrivere una nuova coppia di lingue Apertium]]<br />
<br />
* [[Building dictionaries]] &mdash; some tips and tricks for building dictionaries.<br />
* [[Getting started with induction tools]] &mdash; how to install Apertium, GIZA++, etc, and create a bilingual dictionary.<br />
* [[Tagger training]] &mdash; how to train your part-of-speech tagger.<br />
* [[Cookbook]] &mdash; code-snippets for various "hard to work out" phenomena in some languages and families.<br />
* [[Using linguistic resources]] &mdash; a primer for newcomers.<br />
* [[Preparing to use apertium-transfer-tools]]<br />
<br />
==Contributing to an existing pair==<br />
<br />
* [[Contributing to an existing pair]] &mdash; some pointers for contributing to an existing pair.<br />
** [[Comment contribuer à une paire de langues existante]]<br />
* '''[[Become a language pair developer for Apertium]]''' &mdash; A quick, step-by-step guide for Ubuntu and Debian users on how to add entries to a language pair.<br />
<br />
==How it works==<br />
<br />
* [[Apertium for Dummies]]<br />
*[[Using Git]] &mdash; an introduction to using Git for Apertium.<br />
** [https://git-scm.com/doc Git Documentation]<br />
** [http://rogerdudler.github.io/git-guide/ Simple Guide to Using Git]<br />
* [[Code structure]] &mdash; a guide to the file structure of source downloads (and the SVN repository)<br />
* [[Modes introduction]]<br />
* [[Multiwords]]<br />
* [[Frequently Asked Questions]] &mdash; what it says in the link.<br />
** [[Questions fréquentes]]<br />
<br />
====Introductions to Modules====<br />
* [[Monodix basics]] &mdash; a basic step-by-step explanation of morphological dictionaries <br />
** [[Diccionariu morfolóxicu]]<br />
** [[Dictionnaire unilingue]]<br />
* [[Tips for working on bilingual dictionaries]] &mdash; hints about issues you may meet<br />
* [[Bilingual dictionary]] &mdash; some guidelines for writing bilingual dictionaries<br />
**[[Dictionnaire bilingue]]<br />
* [[How to get started with lexical selection rules]]<br />
* [[Chunking: A full example]]<br />
* [[Post-generator]]<br />
* [[Constraint Grammar]]<br />
<br />
Not used often now:<br />
* [[Part-of-speech tagging]] &mdash; information on HMMs for part-of-speech tagging<br />
**[[Balisage d'une partie de discours]]<br />
<br />
See also the [[Workflow reference]]<br />
<br />
<br />
==Indexes==<br />
* [[:Category:Documentation in Turkish|Documentation in Turkish]]<br />
* [[:Category:Documentation in English|Documentation in English]]<br />
* [[:Category:Documentación en castellano|Documentación en Castellano]]<br />
* [[:Category:Dokumentado en Esperanto|Dokumentado en Esperanto]]<br />
* [[:Category:Documentation en français|Documentation en Français]]<br />
* [[:Category:Documentazione in italiano|Documentazione in Italiano]]<br />
* [[:Category:Documentaţie în română|Documentaţie în Română]]<br />
* [[:Category:Машинный перевод для языков России|Машинный перевод для языков России]]<br />
* [[:Category:Документация на русском языке|Документация на русском языке]]<br />
<br />
<br />
[[Category:Documentation|*]]<br />
[[Category:Documentation in English]]</div>
Dharjunior
https://wiki.apertium.org/w/index.php?title=Wiki_regression_testing&diff=68029
Wiki regression testing
2018-11-28T03:35:49Z
<p>Dharjunior: </p>
<hr />
<div>{{TOCD}}<br />
{{Github-migration-check}}<br />
The term '''Regression testing'''<ref>See the Wikipedia article [http://en.wikipedia.org/wiki/Regression_testing Regression testing]</ref> is used in Apertium to describe a way of testing a language pair by translating a collection of "test phrases" and comparing the result to the expected output.<br />
<br />
This is most useful when you are working on a young/new language pair, where you are working with rules or dictionaries and want to change a rule or word and get an overview of the consequences, and check that your changes haven't broken anything. <br />
<br />
Most mature pairs abandon these tests (see discussion [https://plus.google.com/u/0/114804369744409916883/posts/JvozgQxVAyT here]) and use [[testvoc]] and [[corpus test]] to test improvements in the translator, although regression tests described here have an advantage of being deterministic - in the sense that they either pass or fail - whereas it's hard to achieve such determinism in case of a corpus test (especially if the corpus is large). <br />
<br />
The regression tests work by downloading a wiki page with the phrases and the expected output, like the page [[English_and_Esperanto/Regression_tests]], and then running the translator.<br />
<br />
==Installation and invocation==<br />
<br />
See https://github.com/unhammer/apertium-wiki-tests for how to set up your language pair with wiki-based regression tests.<br />
<br />
You should keep the test outputs (t/latest-{regression,pending}.results) on GitHub; that way you can tell more easily where regressions were introduced.<br />
<br />
==Output==<br />
<br />
For each test, the script prints 1) the source-language phrase, 2) the expected (correct) output and 3) the actual output from Apertium. Some test scripts print only the "failed" tests (where the expected and actual output diverge); others print all of the tests, followed by a summary.<br />
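<br />
The comparison step can be sketched roughly as follows. This is only an illustration of the idea, not the code of the actual test scripts: the function name <code>run_regression</code>, the tab-separated layout and the stand-in translator <code>FAKE_TRANSLATOR</code> are all assumptions made for the example.<br />
<br />
```python
def run_regression(tests, translate, direction="sv-da"):
    """Compare translations against expected output.

    tests: list of (source, expected) pairs.
    translate: any callable mapping a source phrase to a translation;
               here it stands in for a real Apertium pipeline.
    Returns (passed, total, report_lines).
    """
    lines = []
    passed = 0
    for source, expected in tests:
        actual = translate(source)
        lines.append(f"{direction}\t{source}")
        if actual == expected:
            passed += 1
            lines.append(f"WORKS\t{actual}")
        else:
            # Failed test: show expected (-) and actual (+) output.
            lines.append(f"-\t{expected}")
            lines.append(f"+\t{actual}")
        lines.append("")
    total = len(tests)
    lines.append(f"{passed} / {total}")
    if total:
        lines.append(f"~{100 * passed / total:.2f}%")
    return passed, total, lines


# Hypothetical stand-in for the translator, using two phrases from this page.
FAKE_TRANSLATOR = {
    "Jag vill gå en tur": "Jeg vil gå en tur",
    "Maten är äten": "#Mad er spist",
}.get

TESTS = [
    ("Jag vill gå en tur", "Jeg vil gå en tur"),
    ("Maten är äten", "Maden er spist"),
]

if __name__ == "__main__":
    _, _, report = run_regression(TESTS, FAKE_TRANSLATOR)
    print("\n".join(report))
```
<br />
Because each test is an exact string comparison, the result is deterministic: a test either passes or fails.<br />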
<br />
===All printing style===<br />
<br />
;Example<br />
<br />
<pre><br />
$ sh regression-tests.sh <br />
sv-da Jag vill gå en tur<br />
WORKS Jeg vil gå en tur<br />
<br />
sv-da Du vill gå en tur<br />
WORKS Du vil gå en tur<br />
<br />
....<br />
<br />
sv-da Maten är äten<br />
- Maden er spist<br />
+ #Mad er spist<br />
<br />
sv-da Äpplet är ätet<br />
WORKS Æblet er spist<br />
<br />
22 / 23<br />
~95.65%<br />
</pre><br />
<br />
===No-printing style===<br />
<br />
;Example:<br />
<br />
<pre><br />
en-eo In the 'Boards' section you can change the list of activities. <br />
- En la sekcio 'Tabuloj' vi povas ŝanĝi la liston de aktivecoj. <br />
+ En la 'Tabuloj' sekcio vi povas ŝanĝi la listo de aktivecoj.<br />
<br />
en-eo Just untoggle them in the treeview. <br />
- Simple malelektu ilin en la arbaspekto. <br />
+ Nur *untoggle ilin en la *treeview.<br />
<br />
en-eo You can save multiple configurations, and switch between them easily. <br />
- Vi povas sekurigi multajn agordojn, kaj ŝalti inter ili facile. <br />
+ Vi povas savi *multiple *configurations, kaj ŝanĝo inter ilin facile.<br />
<br />
en-eo You can add multiple profiles, with different lists of boards, and different languages. <br />
- Vi povas aldoni plurajn profilojn, kun diversaj listoj de tabuloj, kaj diversaj lingvoj. <br />
+ Vi povas aldoni *multiple profiloj, kun diferencaj listoj de tabuloj, kaj diferencaj langoj.<br />
</pre><br />
<br />
==See also==<br />
<br />
* [[Special:Search/Regression]] for some examples of regression tests.<br />
<br />
==References==<br />
<references/><br />
<br />
[[Category:Development]]<br />
[[Category:Quality control]]<br />
[[Category:Evaluation]]<br />
[[Category:Documentation in English]]</div>
Dharjunior
https://wiki.apertium.org/w/index.php?title=Specific_resources_per_language&diff=68028
Specific resources per language
2018-11-28T01:40:17Z
<p>Dharjunior: </p>
<hr />
<div>{{Github-migration-check}}<br />
{{TOCD}}<br />
The incubator can be found in the 'incubator' column in https://apertium.github.io/apertium-on-github/source-browser.html. It provides a place for people to put dictionaries and other material that is useful in constructing language pairs.<br />
<br />
<br />
==Specific resources per language==<br />
<br />
Here are some links to resources that might be useful for expanding on work in the Incubator. Add resources below that will be useful in constructing language pairs, and try to mark each one with its licence, or at least as free/non-free.<br />
<br />
See also the individual language pages. <br />
<br />
===[[Albanian]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-sqi/blob/master/apertium-sqi.sqi.dix Albanian Monodix]''<br />
<br />
;Resources<br />
* http://www.albanianoverview.com/grammar.htm<br />
* http://www.idividi.com.mk/recnik/index.htm -- Albanian--Macedonian dictionary (non-free)<br />
<br />
===[[Armenian]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-hye/blob/master/apertium-hye.hye.dix Armenian Monodix]''<br />
<br />
;Resources<br />
<br />
* http://www.armeniapedia.org/index.php?title=Category:Armenian_Language_Lessons<br />
<br />
===[[Assamese and Hindi]]===<br />
:''Dictionary: [https://sourceforge.net/p/apertium/svn/HEAD/tree/incubator/apertium-as-hi/apertium-as-hi.as-hi.dix apertium-as-hi.hi.dix apertium-as-hi.as-hi.dix apertium-as-hi.trules.xml from SourceForge]''<br />
<br />
<br />
--- Anusuya<br />
<br />
===[[Belarusian]]=== <br />
<br />
* [http://www.vitba.org/fofmb/fofmb.html GFDL grammar of the language]<br />
<br />
===[[Bengali]]===<br />
<br />
* http://bengalinux.sourceforge.net/cgi-bin/anubadok/index.pl -- Free software translation for English→Bengali <br />
* http://anubadok.sf.net/ -- See above<br />
<br />
===[[Bulgarian]]===<br />
:''Dictionary: [https://raw.githubusercontent.com/apertium/apertium-bul/master/apertium-bul.bul.dix Bulgarian Monodix]''<br />
<br />
;Resources<br />
<br />
* [http://www.sfs.nphil.uni-tuebingen.de/iscl/Theses/zhechev.pdf Bulgarian verbal morphology]<br />
<br />
===[[Cornish]]===<br />
:''Dictionary: [https://sourceforge.net/p/apertium/svn/HEAD/tree/incubator/apertium-cy-kw.kw.dix apertium-cy-kw.kw.dix from SourceForge]''<br />
<br />
[No Longer Accessible]<br />
<br />
:''Dictionary: [https://sourceforge.net/projects/apertium/files/apertium-cy-en/0.1.0/ Cornish Monodix from SourceForge]''<br />
;Resources<br />
<br />
* [http://www.cornishtranslator.com/ Cornish Translator]<br />
* [http://kevindonnelly.org.uk/kernewek/ Cornish-Welsh bilingual wordlist]<br />
<br />
===[[Czech]]===<br />
:''Dictionary: [http://apertium.svn.sourceforge.net/svnroot/apertium/incubator/apertium-pl-cs.cs.dix.xml apertium-pl-cs.cs.dix.xml]''<br />
;Resources<br />
<br />
* [http://nlp.fi.muni.cz/nlp/aisa/NlpCz/Frekvence_slov_lemmat.html Most frequent words] Also includes a list of the most frequent bi- and tri-grams, but these are of little use as multiwords<br />
* [http://users.ox.ac.uk/~tayl0010/links.html James Naughton's links]<br />
* [http://www.czech-language.cz/alphabet/alph-krtiny.html Some complications with diacritics]<br />
* [http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Morphology/index.html Czech morphological guesser] - 'free', but not open source<br />
<br />
===[[Faroese]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-fao/blob/master/apertium-fao.fao.dix Faroese Monodix]''<br />
<br />
;Resources<br />
* [http://giellatekno.uit.no/cgi/d-fao.eng.html U. Tromsø -- Faroese analyser ]<br />
* [https://github.com/apertium/apertium-fao-isl/blob/master/apertium-fao-isl.fao-isl.rlx Faroese Constraint Grammar]<br />
* [http://www.archive.org/details/frskanthologi00denmgoog Faroese-Danish dictionary from 1886]<br />
<br />
===[[Finnish]]===<br />
{{see-also|Omorfi}}<br />
;Resources<br />
<br />
* http://kaino.kotus.fi/sanat/nykysuomi/ &mdash; full form list for Finnish -- LGPL<br />
* [https://kitwiki.csc.fi/twiki/bin/view/KitWiki/OMorFiSFSTVersion#Installation Omorfi–Open Morphology for Finnish language]<br />
* [http://www.ling.helsinki.fi/kieliteknologia/tutkimus/hfst/ Helsinki Finite-State Transducer Technology (HFST)]<br />
<pre><br />
s = lemma<br />
hn = homonymy ref<br />
t = inflection info<br />
tn = inflection number (referring to table)<br />
av = ref to consonant gradation<br />
</pre><br />
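<br />
The legend above appears to describe the element names used in the Kotus word list linked at the top of this section. A minimal sketch of reading those fields follows. It assumes that each entry is an <code>st</code> element with <code>s</code> and <code>hn</code> as direct children and <code>tn</code> and <code>av</code> nested under <code>t</code>; that nesting, the root element name and the sample values are assumptions made for illustration, not taken from the real file.<br />
<br />
```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: element nesting and values are assumptions.
SAMPLE = """<wordlist>
  <st><s>aakkonen</s><t><tn>38</tn></t></st>
  <st><s>aamu</s><hn>1</hn><t><tn>1</tn></t></st>
  <st><s>aaltoilla</s><t><tn>67</tn><av>C</av></t></st>
</wordlist>"""


def parse_entries(xml_text):
    """Return one dict per <st> entry; missing fields are None."""
    root = ET.fromstring(xml_text)
    entries = []
    for st in root.iter("st"):
        entries.append({
            "lemma": st.findtext("s"),                # s  = lemma
            "homonym": st.findtext("hn"),             # hn = homonymy ref
            "inflection_number": st.findtext("t/tn"), # tn = inflection number
            "gradation": st.findtext("t/av"),         # av = consonant gradation ref
        })
    return entries


if __name__ == "__main__":
    for entry in parse_entries(SAMPLE):
        print(entry)
```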
<br />
===[[German and English]]===<br />
<br />
German-English bilingual dictionary (>216,000 entries), generated from linguistic data (GPL Version 2 or later) available for [http://www-user.tu-chemnitz.de/~fri/ding/ "Ding: A Dictionary LookUp program"] (version 1.5 2007-04-09) from Frank Richter, [http://tu-chemnitz.de Technische Universität Chemnitz]<br />
<br />
:''[https://github.com/apertium/apertium-eng-deu/blob/master/apertium-eng-deu.eng-deu.dix German-English Dictionary]''<br />
<br />
===[[Greek]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-ell/blob/master/apertium-ell.ell.dix Greek Monodix]''<br />
:''Greek-English Dictionary: [https://github.com/apertium/apertium-ell-eng/blob/master/apertium-ell-eng.eng.dix Greek-English Dictionary]''<br />
<br />
;Resources<br />
<br />
* Greek <-> Ukrainian, Russian, Polish Grammar & Dictionary: http://ellinika.gnu.org.ua/<br />
<br />
===[[Hebrew]]===<br />
<br />
;Resources<br />
<br />
* http://www.mila.cs.technion.ac.il/english/resources/lexicons/ lexicons for Hebrew, in weird XLS format -- GPL<br />
* http://www.mila.cs.technion.ac.il/english/resources/software_downloads/index.html Hebrew Morphological Analyzer (for Hebrew undotted text) -- GPL, but download link behind a password<br />
* http://www.cs.technion.ac.il/~barhaim/MorphTagger/ HMM-based part-of-speech tagger For Hebrew -- GPL<br />
* http://www.cs.technion.ac.il/~erelsgl/bxi/hmntx/teud.html Probabilistic Morphological Analyzer for Hebrew undotted text -- license unknown<br />
* http://hspell.ivrix.org.il/ The hspell Hebrew spell-checker has a mode for analyzing morphological data -- GPL<br />
* http://www.code972.com/blog/hebmorph/ HebMorph is the analyser powering hspell's capabilities -- GPL<br />
<br />
===[[Hindi]]===<br />
{{see-also|Hindi}}<br />
<br />
;Resources<br />
<br />
* POS tagged English-Hindi wordlist: http://indlinux.sourceforge.net/downloads/files/hindidict.txt.bz2<br />
<br />
* https://github.com/unhammer/apertium-en-hi/blob/master/apertium-en-hi.en.dix <br />
* https://github.com/apertium/apertium-hin/blob/master/apertium-hin.hin.dix <br />
* https://github.com/apertium/apertium-urd-hin/blob/master/dev/en-hi-ur.list<br />
* https://github.com/apertium/apertium-urd-hin/blob/master/apertium-urd-hin.urd-hin.dix<br />
<br />
<br />
<br />
===[[Iranian Persian]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-pes/blob/master/apertium-pes.pes.dix Persian Monodix]''<br />
<br />
;Resources<br />
<br />
* [http://books.google.com/books?vid=OCLC20216670&id=Ru1ncSqiRXkC&printsec=titlepage&hl=de#PPA24,M1 Grammar of Persian]<br />
<br />
===[[Ingush]]===<br />
<br />
; Resources<br />
<br />
* [http://www.linguistics.berkeley.edu/~ingush/database.html Lexical database] (non-free)<br />
* [http://books.google.com/books?id=J7wqVHeRWdwC&pg=PA5&lpg=PA5&dq=ingush+father&source=bl&ots=N8TDZudzGZ&sig=JO9X_Y9gio7dUhZWeyZX7j17iPw&hl=ca&ei=vfq4TM6CH86OjAfO94XaDg&sa=X&oi=book_result&ct=result&resnum=3&ved=0CB8Q6AEwAg#v=onepage&q=ingush%20father&f=false Ingush-English dict] (non-free)<br />
<br />
===[[Latvian]]===<br />
;Resources<br />
* https://github.com/PeterisP/morphology GPL full-form dictionary (https://github.com/PeterisP/morphology/blob/master/src/main/resources/Lexicon.xml)<br />
<br />
;See also<br />
* [[Latvian and Russian]]<br />
<br />
===[[Lithuanian]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-lit/blob/master/apertium-lit.lit.dix Lithuanian Monodix]''<br />
<br />
;Resources<br />
<br />
===[[Nogai]]===<br />
<br />
; Resources<br />
<br />
* [http://ksirov.ru/%D1%8F%D0%B7%D1%8B%D0%BA%D0%B8/%D0%BD%D0%BE%D0%B3%D0%B0%D0%B9%D1%81%D0%BA%D0%B8%D0%B9 Grammar Sketch and Russian-Nogai dictionary]<br />
<br />
===[[Ossetian]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-oss/blob/master/apertium-oss.oss.dix Ossetian Monodix]''<br />
<br />
;Resources<br />
<br />
* [http://www.azargoshnasp.net/languages/ossetian/grammersketchossetian.pdf Ossetian: Grammatical Sketch] &mdash; quite nice and comprehensive.<br />
* [http://www.ossetic-studies.org/ Ossetic National Corpus]<br />
<br />
===[[Piemontese]]===<br />
:''Dictionary: [https://sourceforge.net/p/apertium/svn/HEAD/tree/incubator/apertium-it-pms.pms.dix Piemontese Monodix from SourceForge]''<br />
;Resources<br />
<br />
* http://members.fortunecity.it/dotorcarlo/vocen1.html Piemontese--English -- public domain<br />
* http://digilander.libero.it/dotor43/indexit.html -- Piemontese grammar incl. 17k word Piemontese--Italian dictionary (POS tagged and partly annotated for inflection). site suggests "© These pages can be freely used for all purposes, but not for political reasons, and not against the laws (no matter what is the country)."<br />
<br />
===[[Portuguese]]===<br />
<br />
Even if Apertium has a stable es-pt pair, the coverage of the Brazilian Portuguese Dictionary built at NILC (Universidade de Sao Paulo) for Unitex is much better, and could be used perhaps to improve it.<br />
<br />
;Resources<br />
<br />
* [http://www.nilc.icmc.usp.br/nilc/projects/unitex-pb/web/dicionarios.html Recursos Lexicais Português do Brasil]<br />
<br />
We believe it has an LGPL license.<br />
<br />
===[[Punjabi]]===<br />
<br />
; Resources<br />
<br />
* [http://www.lama.univ-savoie.fr/~humayoun/punjabi/index.html Punjabi lexicon]<br />
<br />
===[[Quechua]]===<br />
<br />
;Resources<br />
<br />
* http://www.runasimipi.org/<br />
* AVENUE Quechua-Spanish system. (ask [[User:Francis Tyers|Francis Tyers]])<br />
<br />
===[[Russian]]===<br />
<br />
:''Dictionary: [https://github.com/apertium/apertium-rus/blob/master/apertium-rus.rus.dix monodix]''<br />
:''Bidix: [https://github.com/apertium/apertium-pol-rus/blob/master/apertium-pol-rus.pol-rus.dix Polish-Russian]''<br />
:''Bidix: [https://github.com/apertium/apertium-rus-eng/blob/master/apertium-ru-en.ru.dix English-Russian]''<br />
<br />
;Resources<br />
<br />
* http://www.alphadictionary.com/rusgrammar/<br />
* http://www.seelrc.org:8080/grammar/pdf/stand_alone_russian.pdf<br />
* [http://www.cic.ipn.mx/~sidorov/rmorph/index.html Russian analyser] - non-free, Windows only<br />
* [http://citeseer.ist.psu.edu/cache/papers/cs2/433/http:zSzzSzwww.ling.ohio-state.eduzSz~hanazSzbibliozSzHanaFeldmanBrew2004-RusMorphLite.pdf/hana04resourcelight.pdf Using Czech resources for the morphological analysis of Russian]<br />
* [http://sourceforge.net/projects/pere/ Pere] - free translator, including Russian<->Ukrainian<->English dictionaries. Built from alignments, low quality.<br />
* [http://www.revdanica.com/xdxf/tmp/Muzafarov/inXDXF/rus2taj.xdxf Russian--Tajik phrase dictionary, 41k entries].<br />
* [http://www.lugattj.com/news.php?tid=1&ln=en Another Tajik--Russian dictionary]<br />
<br />
===[[Sanskrit]] '''संस्कृतम्'''===<br />
:''Dictionary: [https://github.com/apertium/apertium-san/blob/master/apertium-san.san.dix Sanskrit Monodix]''<br />
<br />
;Resources<br />
* [http://www.sanskrit-lexicon.uni-koeln.de/ Sanskrit Lexicon at Uni-Koeln]<br />
* [http://www.sanskrit-lexicon.uni-koeln.de/aequery/index.html Apte's En-Sa] dictionary<br />
* [http://www.sanskrit-lexicon.uni-koeln.de/download.html Material available for download].<br />
<br />
===[[Slovakian]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-slk/blob/master/apertium-slk.slk.dix Slovak Monodix]''<br />
<br />
;Resources<br />
<br />
* http://old.bohemica.com/slovak/slovakgrammar.pdf (Slovakian, with some English)<br />
* http://pl.wiktionary.org/wiki/Aneks:J%C4%99zyk_s%C5%82owacki_-_tabele_koniugacji (In Polish)<br />
* http://www.angelfire.com/sk3/quality/Slovak_declension.html<br />
* http://www.juls.savba.sk/msj/<br />
<br />
===[[Thai]]===<br />
* https://github.com/veer66/Yaitron Yaitron English-Thai and Thai-English XML dictionary, license seems standard 4-clause<br />
<br />
===[[Urdu]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-urd/blob/master/apertium-urd.urd.dix Urdu Monodix]''<br />
:''Bidix: [https://github.com/apertium/apertium-urd-hin/blob/master/apertium-urd-hin.urd-hin.dix Hindi-Urdu Bidix]''<br />
<br />
;Resources<br />
* http://www.lama.univ-savoie.fr/~humayoun/UrduMorph/ &mdash; GPL analyser of Urdu<br />
* http://www.crulp.org/software/langproc/E2UMachineTranslationSystem.htm -- Urdu--English MT system<br />
<br />
<br />
[[Category:Development]]<br />
[[Category:Repository]]<br />
[[Category:Documentation in English]]</div>
Dharjunior
https://wiki.apertium.org/w/index.php?title=Specific_resources_per_language&diff=68027
Specific resources per language
2018-11-28T01:25:28Z
<p>Dharjunior: </p>
<hr />
<div>{{Github-migration-check}}<br />
{{TOCD}}<br />
The incubator can be found in the 'incubator' column in https://apertium.github.io/apertium-on-github/source-browser.html. It provides a place for people to put dictionaries and other material that is useful in constructing language pairs.<br />
<br />
<br />
==Specific resources per language==<br />
<br />
Here are some links to resources that might be useful for expanding on work in the Incubator. Add resources below that will be useful in constructing language pairs, and try to mark each one with its licence, or at least as free/non-free.<br />
<br />
See also the individual language pages. <br />
<br />
===[[Albanian]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-sqi/blob/master/apertium-sqi.sqi.dix Albanian Monodix]''<br />
<br />
;Resources<br />
* http://www.albanianoverview.com/grammar.htm<br />
* http://www.idividi.com.mk/recnik/index.htm -- Albanian--Macedonian dictionary (non-free)<br />
<br />
===[[Armenian]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-hye/blob/master/apertium-hye.hye.dix Armenian Monodix]''<br />
<br />
;Resources<br />
<br />
* http://www.armeniapedia.org/index.php?title=Category:Armenian_Language_Lessons<br />
<br />
===[[Assamese and Hindi]]===<br />
:''Dictionary: [https://sourceforge.net/p/apertium/svn/HEAD/tree/incubator/apertium-as-hi/apertium-as-hi.as-hi.dix apertium-as-hi.hi.dix apertium-as-hi.as-hi.dix apertium-as-hi.trules.xml from SourceForge]''<br />
<br />
<br />
--- Anusuya<br />
<br />
===[[Belarusian]]=== <br />
<br />
* [http://www.vitba.org/fofmb/fofmb.html GFDL grammar of the language]<br />
<br />
===[[Bengali]]===<br />
<br />
* http://bengalinux.sourceforge.net/cgi-bin/anubadok/index.pl -- Free software translation for English→Bengali <br />
* http://anubadok.sf.net/ -- See above<br />
<br />
===[[Bulgarian]]===<br />
:''Dictionary: [https://raw.githubusercontent.com/apertium/apertium-bul/master/apertium-bul.bul.dix Bulgarian Monodix]''<br />
<br />
;Resources<br />
<br />
* [http://www.sfs.nphil.uni-tuebingen.de/iscl/Theses/zhechev.pdf Bulgarian verbal morphology]<br />
<br />
===[[Cornish]]===<br />
:''Dictionary: [http://apertium.svn.sourceforge.net/svnroot/apertium/incubator/apertium-cy-kw.kw.dix apertium-cy-kw.kw.dix]''<br />
<br />
[No Longer Accessible]<br />
<br />
:''Dictionary: [https://sourceforge.net/projects/apertium/files/apertium-cy-en/0.1.0/ Cornish Monodix from SourceForge]''<br />
;Resources<br />
<br />
* [http://www.cornishtranslator.com/ Cornish Translator]<br />
* [http://kevindonnelly.org.uk/kernewek/ Cornish-Welsh bilingual wordlist]<br />
<br />
===[[Czech]]===<br />
:''Dictionary: [http://apertium.svn.sourceforge.net/svnroot/apertium/incubator/apertium-pl-cs.cs.dix.xml apertium-pl-cs.cs.dix.xml]''<br />
;Resources<br />
<br />
* [http://nlp.fi.muni.cz/nlp/aisa/NlpCz/Frekvence_slov_lemmat.html Most frequent words] Also includes a list of the most frequent bi- and tri-grams, but these are of little use as multiwords<br />
* [http://users.ox.ac.uk/~tayl0010/links.html James Naughton's links]<br />
* [http://www.czech-language.cz/alphabet/alph-krtiny.html Some complications with diacritics]<br />
* [http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Morphology/index.html Czech morphological guesser] - 'free', but not open source<br />
<br />
===[[Faroese]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-fao/blob/master/apertium-fao.fao.dix Faroese Monodix]''<br />
<br />
;Resources<br />
* [http://giellatekno.uit.no/cgi/d-fao.eng.html U. Tromsø -- Faroese analyser ]<br />
* [https://github.com/apertium/apertium-fao-isl/blob/master/apertium-fao-isl.fao-isl.rlx Faroese Constraint Grammar]<br />
* [http://www.archive.org/details/frskanthologi00denmgoog Faroese-Danish dictionary from 1886]<br />
<br />
===[[Finnish]]===<br />
{{see-also|Omorfi}}<br />
;Resources<br />
<br />
* http://kaino.kotus.fi/sanat/nykysuomi/ &mdash; full form list for Finnish -- LGPL<br />
* [https://kitwiki.csc.fi/twiki/bin/view/KitWiki/OMorFiSFSTVersion#Installation Omorfi–Open Morphology for Finnish language]<br />
* [http://www.ling.helsinki.fi/kieliteknologia/tutkimus/hfst/ Helsinki Finite-State Transducer Technology (HFST)]<br />
<pre><br />
s = lemma<br />
hn = homonymy ref<br />
t = inflection info<br />
tn = inflection number (referring to table)<br />
av = ref to consonant gradation<br />
</pre><br />
<br />
===[[German and English]]===<br />
<br />
German-English bilingual dictionary (>216,000 entries), generated from linguistic data (GPL Version 2 or later) available for [http://www-user.tu-chemnitz.de/~fri/ding/ "Ding: A Dictionary LookUp program"] (version 1.5 2007-04-09) from Frank Richter, [http://tu-chemnitz.de Technische Universität Chemnitz]<br />
<br />
:''[https://github.com/apertium/apertium-eng-deu/blob/master/apertium-eng-deu.eng-deu.dix German-English Dictionary]''<br />
<br />
===[[Greek]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-ell/blob/master/apertium-ell.ell.dix Greek Monodix]''<br />
:''Greek-English Dictionary: [https://github.com/apertium/apertium-ell-eng/blob/master/apertium-ell-eng.eng.dix Greek-English Dictionary]''<br />
<br />
;Resources<br />
<br />
* Greek <-> Ukrainian, Russian, Polish Grammar & Dictionary: http://ellinika.gnu.org.ua/<br />
<br />
===[[Hebrew]]===<br />
<br />
;Resources<br />
<br />
* http://www.mila.cs.technion.ac.il/english/resources/lexicons/ lexicons for Hebrew, in weird XLS format -- GPL<br />
* http://www.mila.cs.technion.ac.il/english/resources/software_downloads/index.html Hebrew Morphological Analyzer (for Hebrew undotted text) -- GPL, but download link behind a password<br />
* http://www.cs.technion.ac.il/~barhaim/MorphTagger/ HMM-based part-of-speech tagger for Hebrew -- GPL<br />
* http://www.cs.technion.ac.il/~erelsgl/bxi/hmntx/teud.html Probabilistic Morphological Analyzer for Hebrew undotted text -- license unknown<br />
* http://hspell.ivrix.org.il/ The hspell Hebrew spell-checker has a mode for analyzing morphological data -- GPL<br />
* http://www.code972.com/blog/hebmorph/ HebMorph is the analyser powering hspell's capabilities -- GPL<br />
<br />
===[[Hindi]]===<br />
{{see-also|Hindi}}<br />
<br />
;Resources<br />
<br />
* POS tagged English-Hindi wordlist: http://indlinux.sourceforge.net/downloads/files/hindidict.txt.bz2<br />
<br />
* https://github.com/unhammer/apertium-en-hi/blob/master/apertium-en-hi.en.dix <br />
* https://github.com/apertium/apertium-hin/blob/master/apertium-hin.hin.dix <br />
* https://github.com/apertium/apertium-urd-hin/blob/master/dev/en-hi-ur.list<br />
* https://github.com/apertium/apertium-urd-hin/blob/master/apertium-urd-hin.urd-hin.dix<br />
<br />
<br />
<br />
===[[Iranian Persian]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-pes/blob/master/apertium-pes.pes.dix Persian Monodix]''<br />
<br />
;Resources<br />
<br />
* [http://books.google.com/books?vid=OCLC20216670&id=Ru1ncSqiRXkC&printsec=titlepage&hl=de#PPA24,M1 Grammar of Persian]<br />
<br />
===[[Ingush]]===<br />
<br />
; Resources<br />
<br />
* [http://www.linguistics.berkeley.edu/~ingush/database.html Lexical database] (non-free)<br />
* [http://books.google.com/books?id=J7wqVHeRWdwC&pg=PA5&lpg=PA5&dq=ingush+father&source=bl&ots=N8TDZudzGZ&sig=JO9X_Y9gio7dUhZWeyZX7j17iPw&hl=ca&ei=vfq4TM6CH86OjAfO94XaDg&sa=X&oi=book_result&ct=result&resnum=3&ved=0CB8Q6AEwAg#v=onepage&q=ingush%20father&f=false Ingush-English dict] (non-free)<br />
<br />
===[[Latvian]]===<br />
;Resources<br />
* https://github.com/PeterisP/morphology GPL full-form dictionary (https://github.com/PeterisP/morphology/blob/master/src/main/resources/Lexicon.xml)<br />
<br />
;See also<br />
* [[Latvian and Russian]]<br />
<br />
===[[Lithuanian]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-lit/blob/master/apertium-lit.lit.dix Lithuanian Monodix]''<br />
<br />
;Resources<br />
<br />
===[[Nogai]]===<br />
<br />
; Resources<br />
<br />
* [http://ksirov.ru/%D1%8F%D0%B7%D1%8B%D0%BA%D0%B8/%D0%BD%D0%BE%D0%B3%D0%B0%D0%B9%D1%81%D0%BA%D0%B8%D0%B9 Grammar Sketch and Russian-Nogai dictionary]<br />
<br />
===[[Ossetian]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-oss/blob/master/apertium-oss.oss.dix Ossetian Monodix]''<br />
<br />
;Resources<br />
<br />
* [http://www.azargoshnasp.net/languages/ossetian/grammersketchossetian.pdf Ossetian: Grammatical Sketch] &mdash; quite nice and comprehensive.<br />
* [http://www.ossetic-studies.org/ Ossetic National Corpus]<br />
<br />
===[[Piemontese]]===<br />
:''Dictionary: [https://sourceforge.net/p/apertium/svn/HEAD/tree/incubator/apertium-it-pms.pms.dix Piemontese Monodix from SourceForge]''<br />
;Resources<br />
<br />
* http://members.fortunecity.it/dotorcarlo/vocen1.html Piemontese--English -- public domain<br />
* http://digilander.libero.it/dotor43/indexit.html -- Piemontese grammar incl. 17k word Piemontese--Italian dictionary (POS tagged and partly annotated for inflection). site suggests "© These pages can be freely used for all purposes, but not for political reasons, and not against the laws (no matter what is the country)."<br />
<br />
===[[Portuguese]]===<br />
<br />
Although Apertium has a stable es-pt pair, the Brazilian Portuguese dictionary built at NILC (Universidade de São Paulo) for Unitex has much better coverage and could perhaps be used to improve it.<br />
<br />
;Resources<br />
<br />
* [http://www.nilc.icmc.usp.br/nilc/projects/unitex-pb/web/dicionarios.html Recursos Lexicais Português do Brasil]<br />
<br />
We believe this resource is under an LGPL license.<br />
<br />
===[[Punjabi]]===<br />
<br />
; Resources<br />
<br />
* [http://www.lama.univ-savoie.fr/~humayoun/punjabi/index.html Punjabi lexicon]<br />
<br />
===[[Quechua]]===<br />
<br />
;Resources<br />
<br />
* http://www.runasimipi.org/<br />
* AVENUE Quechua-Spanish system. (ask [[User:Francis Tyers|Francis Tyers]])<br />
<br />
===[[Russian]]===<br />
<br />
:''Dictionary: [https://github.com/apertium/apertium-rus/blob/master/apertium-rus.rus.dix monodix]''<br />
:''Bidix: [https://github.com/apertium/apertium-pol-rus/blob/master/apertium-pol-rus.pol-rus.dix Polish-Russian]''<br />
:''Bidix: [https://github.com/apertium/apertium-rus-eng/blob/master/apertium-ru-en.ru.dix English-Russian]''<br />
<br />
;Resources<br />
<br />
* http://www.alphadictionary.com/rusgrammar/<br />
* http://www.seelrc.org:8080/grammar/pdf/stand_alone_russian.pdf<br />
* [http://www.cic.ipn.mx/~sidorov/rmorph/index.html Russian analyser] - non-free, Windows only<br />
* [http://citeseer.ist.psu.edu/cache/papers/cs2/433/http:zSzzSzwww.ling.ohio-state.eduzSz~hanazSzbibliozSzHanaFeldmanBrew2004-RusMorphLite.pdf/hana04resourcelight.pdf Using Czech resources for the morphological analysis of Russian]<br />
* [http://sourceforge.net/projects/pere/ Pere] - free translator, including Russian<->Ukrainian<->English dictionaries. Built from alignments, low quality.<br />
* [http://www.revdanica.com/xdxf/tmp/Muzafarov/inXDXF/rus2taj.xdxf Russian--Tajik phrase dictionary, 41k entries].<br />
* [http://www.lugattj.com/news.php?tid=1&ln=en Another Tajik--Russian dictionary]<br />
<br />
===[[Sanskrit]] '''संस्कृतम्'''===<br />
:''Dictionary: [https://github.com/apertium/apertium-san/blob/master/apertium-san.san.dix Sanskrit Monodix]''<br />
<br />
;Resources<br />
* [http://www.sanskrit-lexicon.uni-koeln.de/ Sanskrit Lexicon at Uni-Koeln]<br />
* [http://www.sanskrit-lexicon.uni-koeln.de/aequery/index.html Apte's En-Sa] dictionary<br />
* [http://www.sanskrit-lexicon.uni-koeln.de/download.html Material available for download].<br />
<br />
===[[Slovakian]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-slk/blob/master/apertium-slk.slk.dix Slovak Monodix]''<br />
<br />
;Resources<br />
<br />
* http://old.bohemica.com/slovak/slovakgrammar.pdf (Slovakian, with some English)<br />
* http://pl.wiktionary.org/wiki/Aneks:J%C4%99zyk_s%C5%82owacki_-_tabele_koniugacji (In Polish)<br />
* http://www.angelfire.com/sk3/quality/Slovak_declension.html<br />
* http://www.juls.savba.sk/msj/<br />
<br />
===[[Thai]]===<br />
* https://github.com/veer66/Yaitron Yaitron English-Thai and Thai-English XML dictionary, license seems standard 4-clause<br />
<br />
===[[Urdu]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-urd/blob/master/apertium-urd.urd.dix Urdu Monodix]''<br />
:''Bidix: [https://github.com/apertium/apertium-urd-hin/blob/master/apertium-urd-hin.urd-hin.dix Hindi-Urdu Bidix]''<br />
<br />
;Resources<br />
* http://www.lama.univ-savoie.fr/~humayoun/UrduMorph/ &mdash; GPL analyser of Urdu<br />
* http://www.crulp.org/software/langproc/E2UMachineTranslationSystem.htm -- Urdu--English MT system<br />
<br />
<br />
[[Category:Development]]<br />
[[Category:Repository]]<br />
[[Category:Documentation in English]]</div>
Dharjunior
https://wiki.apertium.org/w/index.php?title=Specific_resources_per_language&diff=68023
Specific resources per language
2018-11-27T17:52:13Z
<p>Dharjunior: Updated a few links from SVN to Github wherever possible.</p>
<hr />
<div>{{Github-migration-check}}<br />
{{TOCD}}<br />
The incubator can be found in the 'incubator' column in https://apertium.github.io/apertium-on-github/source-browser.html. It provides a place for people to put dictionaries and other material that is useful in constructing language pairs.<br />
<br />
<br />
==Specific resources per language==<br />
<br />
Here are some links to resources that might be useful for expanding on work in the Incubator. Add any resources below that will be useful in constructing language pairs, and try to mark each with its licence, or at least as free or non-free.<br />
<br />
See also the individual language pages. <br />
<br />
===[[Albanian]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-sqi/blob/master/apertium-sqi.sqi.dix Albanian Monodix]''<br />
<br />
;Resources<br />
* http://www.albanianoverview.com/grammar.htm<br />
* http://www.idividi.com.mk/recnik/index.htm -- Albanian--Macedonian dictionary (non-free)<br />
<br />
===[[Armenian]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-hye/blob/master/apertium-hye.hye.dix Armenian Monodix]''<br />
<br />
;Resources<br />
<br />
* http://www.armeniapedia.org/index.php?title=Category:Armenian_Language_Lessons<br />
<br />
===[[Assamese and Hindi]]===<br />
:''Dictionary: [http://apertium.svn.sourceforge.net/svnroot/apertium/incubator/apertium-as-hi.as.dix apertium-as-hi.hi.dix apertium-as-hi.as-hi.dix apertium-as-hi.trules.xml]''<br />
<br />
[No Longer Accessible]<br />
<br />
--- Anusuya<br />
<br />
===[[Belarusian]]=== <br />
<br />
* [http://www.vitba.org/fofmb/fofmb.html GFDL grammar of the language]<br />
<br />
===[[Bengali]]===<br />
<br />
* http://bengalinux.sourceforge.net/cgi-bin/anubadok/index.pl -- Free software translation for English→Bengali <br />
* http://anubadok.sf.net/ -- See above<br />
<br />
===[[Bulgarian]]===<br />
:''Dictionary: [https://raw.githubusercontent.com/apertium/apertium-bul/master/apertium-bul.bul.dix Bulgarian Monodix]''<br />
<br />
;Resources<br />
<br />
* [http://www.sfs.nphil.uni-tuebingen.de/iscl/Theses/zhechev.pdf Bulgarian verbal morphology]<br />
<br />
===[[Cornish]]===<br />
:''Dictionary: [http://apertium.svn.sourceforge.net/svnroot/apertium/incubator/apertium-cy-kw.kw.dix apertium-cy-kw.kw.dix]''<br />
<br />
[No Longer Accessible]<br />
<br />
:''Dictionary: [https://sourceforge.net/projects/apertium/files/apertium-cy-en/0.1.0/ Cornish Monodix from SourceForge]''<br />
;Resources<br />
<br />
* [http://www.cornishtranslator.com/ Cornish Translator]<br />
* [http://kevindonnelly.org.uk/kernewek/ Cornish-Welsh bilingual wordlist]<br />
<br />
===[[Czech]]===<br />
:''Dictionary: [http://apertium.svn.sourceforge.net/svnroot/apertium/incubator/apertium-pl-cs.cs.dix.xml apertium-pl-cs.cs.dix.xml]''<br />
;Resources<br />
<br />
* [http://nlp.fi.muni.cz/nlp/aisa/NlpCz/Frekvence_slov_lemmat.html Most frequent words] Also includes a list of the most frequent bi- and tri-grams, but these are of little use as multiwords<br />
* [http://users.ox.ac.uk/~tayl0010/links.html James Naughton's links]<br />
* [http://www.czech-language.cz/alphabet/alph-krtiny.html Some complications with diacritics]<br />
* [http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Morphology/index.html Czech morphological guesser] - 'free', but not open source<br />
<br />
===[[Faroese]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-fao/blob/master/apertium-fao.fao.dix Faroese Monodix]''<br />
<br />
;Resources<br />
* [http://giellatekno.uit.no/cgi/d-fao.eng.html U. Tromsø -- Faroese analyser ]<br />
* [https://github.com/apertium/apertium-fao-isl/blob/master/apertium-fao-isl.fao-isl.rlx Faroese Constraint Grammar]<br />
* [http://www.archive.org/details/frskanthologi00denmgoog Faroese-Danish dictionary from 1886]<br />
<br />
===[[Finnish]]===<br />
{{see-also|Omorfi}}<br />
;Resources<br />
<br />
* http://kaino.kotus.fi/sanat/nykysuomi/ &mdash; full form list for Finnish -- LGPL<br />
* [https://kitwiki.csc.fi/twiki/bin/view/KitWiki/OMorFiSFSTVersion#Installation Omorfi–Open Morphology for Finnish language]<br />
* [http://www.ling.helsinki.fi/kieliteknologia/tutkimus/hfst/ Helsinki Finite-State Transducer Technology (HFST)]<br />
<pre><br />
s = lemma<br />
hn = homonymy ref<br />
t = inflection info<br />
tn = inflection number (referring to table)<br />
av = ref to consonant gradation<br />
</pre><br />
<br />
===[[German and English]]===<br />
<br />
German-English bilingual dictionary (>216,000 entries), generated from linguistic data (GPL Version 2 or later) available for [http://www-user.tu-chemnitz.de/~fri/ding/ "Ding: A Dictionary LookUp program"] (version 1.5 2007-04-09) from Frank Richter, [http://tu-chemnitz.de Technische Universität Chemnitz]<br />
<br />
:''[https://github.com/apertium/apertium-eng-deu/blob/master/apertium-eng-deu.eng-deu.dix German-English Dictionary]''<br />
<br />
===[[Greek]]===<br />
:''Dictionary: [https://github.com/apertium/apertium-ell/blob/master/apertium-ell.ell.dix Greek Monodix]''<br />
:''Greek-English Dictionary: [https://github.com/apertium/apertium-ell-eng/blob/master/apertium-ell-eng.eng.dix Greek-English Dictionary]''<br />
<br />
;Resources<br />
<br />
* Greek <-> Ukrainian, Russian, Polish Grammar & Dictionary: http://ellinika.gnu.org.ua/<br />
<br />
===[[Hebrew]]===<br />
<br />
;Resources<br />
<br />
* http://www.mila.cs.technion.ac.il/english/resources/lexicons/ lexicons for Hebrew, in weird XLS format -- GPL<br />
* http://www.mila.cs.technion.ac.il/english/resources/software_downloads/index.html Hebrew Morphological Analyzer (for Hebrew undotted text) -- GPL, but download link behind a password<br />
* http://www.cs.technion.ac.il/~barhaim/MorphTagger/ HMM-based part-of-speech tagger for Hebrew -- GPL<br />
* http://www.cs.technion.ac.il/~erelsgl/bxi/hmntx/teud.html Probabilistic Morphological Analyzer for Hebrew undotted text -- license unknown<br />
* http://hspell.ivrix.org.il/ The hspell Hebrew spell-checker has a mode for analyzing morphological data -- GPL<br />
* http://www.code972.com/blog/hebmorph/ HebMorph is the analyser powering hspell's capabilities -- GPL<br />
<br />
===[[Hindi]]===<br />
{{see-also|Hindi}}<br />
<br />
;Resources<br />
<br />
* POS tagged English-Hindi wordlist: http://indlinux.sourceforge.net/downloads/files/hindidict.txt.bz2<br />
<br />
* https://github.com/unhammer/apertium-en-hi/blob/master/apertium-en-hi.en.dix <br />
* https://github.com/apertium/apertium-hin/blob/master/apertium-hin.hin.dix <br />
* https://github.com/apertium/apertium-urd-hin/blob/master/dev/en-hi-ur.list<br />
* https://github.com/apertium/apertium-urd-hin/blob/master/apertium-urd-hin.urd-hin.dix<br />
<br />
<br />
<br />
===[[Iranian Persian]]===<br />
:''Dictionary: [http://apertium.svn.sourceforge.net/svnroot/apertium/incubator/apertium-tg-fa.fa.dix apertium-tg-fa.fa.dix]''<br />
<br />
;Resources<br />
<br />
* [http://books.google.com/books?vid=OCLC20216670&id=Ru1ncSqiRXkC&printsec=titlepage&hl=de#PPA24,M1 Grammar of Persian]<br />
<br />
===[[Ingush]]===<br />
<br />
; Resources<br />
<br />
* [http://www.linguistics.berkeley.edu/~ingush/database.html Lexical database] (non-free)<br />
* [http://books.google.com/books?id=J7wqVHeRWdwC&pg=PA5&lpg=PA5&dq=ingush+father&source=bl&ots=N8TDZudzGZ&sig=JO9X_Y9gio7dUhZWeyZX7j17iPw&hl=ca&ei=vfq4TM6CH86OjAfO94XaDg&sa=X&oi=book_result&ct=result&resnum=3&ved=0CB8Q6AEwAg#v=onepage&q=ingush%20father&f=false Ingush-English dict] (non-free)<br />
<br />
===[[Latvian]]===<br />
;Resources<br />
* https://github.com/PeterisP/morphology GPL full-form dictionary (https://github.com/PeterisP/morphology/blob/master/src/main/resources/Lexicon.xml)<br />
<br />
;See also<br />
* [[Latvian and Russian]]<br />
<br />
===[[Lithuanian]]===<br />
:''Dictionary: [http://apertium.svn.sourceforge.net/svnroot/apertium/incubator/apertium-en-lt.lt.dix apertium-en-lt.lt.dix]''<br />
<br />
;Resources<br />
<br />
===[[Nogai]]===<br />
<br />
; Resources<br />
<br />
* [http://ksirov.ru/%D1%8F%D0%B7%D1%8B%D0%BA%D0%B8/%D0%BD%D0%BE%D0%B3%D0%B0%D0%B9%D1%81%D0%BA%D0%B8%D0%B9 Grammar Sketch and Russian-Nogai dictionary]<br />
<br />
===[[Ossetian]]===<br />
:''Dictionary: [http://apertium.svn.sourceforge.net/svnroot/apertium/incubator/apertium-os-fa.os.dix apertium-os-fa.os.dix]''<br />
<br />
;Resources<br />
<br />
* [http://www.azargoshnasp.net/languages/ossetian/grammersketchossetian.pdf Ossetian: Grammatical Sketch] &mdash; quite nice and comprehensive.<br />
* [http://www.ossetic-studies.org/ Ossetic National Corpus]<br />
<br />
===[[Piemontese]]===<br />
:''Dictionary: [http://apertium.svn.sourceforge.net/svnroot/apertium/incubator/apertium-it-pms.pms.dix apertium-it-pms.pms.dix]''<br />
;Resources<br />
<br />
* http://members.fortunecity.it/dotorcarlo/vocen1.html Piemontese--English -- public domain<br />
* http://digilander.libero.it/dotor43/indexit.html -- Piemontese grammar incl. 17k word Piemontese--Italian dictionary (POS tagged and partly annotated for inflection). site suggests "© These pages can be freely used for all purposes, but not for political reasons, and not against the laws (no matter what is the country)."<br />
<br />
===[[Portuguese]]===<br />
<br />
Although Apertium has a stable es-pt pair, the Brazilian Portuguese dictionary built at NILC (Universidade de São Paulo) for Unitex has much better coverage and could perhaps be used to improve it.<br />
<br />
;Resources<br />
<br />
* [http://www.nilc.icmc.usp.br/nilc/projects/unitex-pb/web/dicionarios.html Recursos Lexicais Português do Brasil]<br />
<br />
We believe this resource is under an LGPL license.<br />
<br />
===[[Punjabi]]===<br />
<br />
; Resources<br />
<br />
* [http://www.lama.univ-savoie.fr/~humayoun/punjabi/index.html Punjabi lexicon]<br />
<br />
===[[Quechua]]===<br />
<br />
;Resources<br />
<br />
* http://www.runasimipi.org/<br />
* AVENUE Quechua-Spanish system. (ask [[User:Francis Tyers|Francis Tyers]])<br />
<br />
===[[Russian]]===<br />
<br />
:''Dictionary: [http://apertium.svn.sourceforge.net/svnroot/apertium/incubator/apertium-pl-ru.ru.dix.xml monodix]''<br />
:''Bidix: [http://apertium.svn.sourceforge.net/svnroot/apertium/incubator/apertium-pl-ru.pl-ru.dix.xml Polish-Russian]''<br />
:''Bidix: [http://apertium.svn.sourceforge.net/svnroot/apertium/incubator/apertium-en-ru.en-ru.dix.xml English-Russian]''<br />
<br />
;Resources<br />
<br />
* http://www.alphadictionary.com/rusgrammar/<br />
* http://www.seelrc.org:8080/grammar/pdf/stand_alone_russian.pdf<br />
* [http://www.cic.ipn.mx/~sidorov/rmorph/index.html Russian analyser] - non-free, Windows only<br />
* [http://citeseer.ist.psu.edu/cache/papers/cs2/433/http:zSzzSzwww.ling.ohio-state.eduzSz~hanazSzbibliozSzHanaFeldmanBrew2004-RusMorphLite.pdf/hana04resourcelight.pdf Using Czech resources for the morphological analysis of Russian]<br />
* [http://sourceforge.net/projects/pere/ Pere] - free translator, including Russian<->Ukrainian<->English dictionaries. Built from alignments, low quality.<br />
* [http://www.revdanica.com/xdxf/tmp/Muzafarov/inXDXF/rus2taj.xdxf Russian--Tajik phrase dictionary, 41k entries].<br />
* [http://www.lugattj.com/news.php?tid=1&ln=en Another Tajik--Russian dictionary]<br />
<br />
===[[Sanskrit]] '''संस्कृतम्'''===<br />
:''Dictionary: [https://apertium.svn.sourceforge.net/svnroot/apertium/incubator/apertium-sa-XX apertium-sa-XX]''<br />
<br />
;Resources<br />
* [http://www.sanskrit-lexicon.uni-koeln.de/ Sanskrit Lexicon at Uni-Koeln]<br />
* [http://www.sanskrit-lexicon.uni-koeln.de/aequery/index.html Apte's En-Sa] dictionary<br />
* [http://www.sanskrit-lexicon.uni-koeln.de/download.html Material available for download].<br />
<br />
===[[Slovakian]]===<br />
:''Dictionary: [http://apertium.svn.sourceforge.net/svnroot/apertium/incubator/apertium-pl-sk.sk.dix apertium-pl-sk.sk.dix]''<br />
<br />
;Resources<br />
<br />
* http://old.bohemica.com/slovak/slovakgrammar.pdf (Slovakian, with some English)<br />
* http://pl.wiktionary.org/wiki/Aneks:J%C4%99zyk_s%C5%82owacki_-_tabele_koniugacji (In Polish)<br />
* http://www.angelfire.com/sk3/quality/Slovak_declension.html<br />
* http://www.juls.savba.sk/msj/<br />
<br />
===[[Thai]]===<br />
* https://github.com/veer66/Yaitron Yaitron English-Thai and Thai-English XML dictionary, license seems standard 4-clause<br />
<br />
===[[Urdu]]===<br />
:''Dictionary: [http://apertium.svn.sourceforge.net/svnroot/apertium/incubator/apertium-hi-ur.ur.dix apertium-hi-ur.ur.dix]''<br />
<br />
;Resources<br />
* http://www.lama.univ-savoie.fr/~humayoun/UrduMorph/ &mdash; GPL analyser of Urdu<br />
* http://www.crulp.org/software/langproc/E2UMachineTranslationSystem.htm -- Urdu--English MT system<br />
<br />
<br />
[[Category:Development]]<br />
[[Category:Repository]]<br />
[[Category:Documentation in English]]</div>
Dharjunior