Difference between revisions of "Hindi and Bengali"

=Hindi and Bengali for GSoC '21=
This is a language pair translating between [[Hindi]] and [[Bengali]]. The project developed the pair in both directions, i.e. ben-hin and hin-ben. The work involved building two dictionaries, the Bengali monolingual dictionary and the Bengali-Hindi bilingual dictionary, starting from an existing open-source project in which very little work had been done. Although this was not anticipated, several errors were found in the Hindi paradigms, so the Hindi monolingual dictionary was modified as well. The Bengali dictionary was also restructured to match the Hindi dictionary.

The work was spread over 11 weeks. Work on the Bengali monolingual and Bengali-Hindi bilingual dictionaries began with the closed categories; further words were then added in order of frequency. Work on the Hindi dictionary began after the 6th week: several paradigms were corrected (tags were added, removed and reordered) and several paradigms were marked as deprecated.

==Current Status==
* The monolingual dictionary currently contains 7078 words, excluding proper names, and the bilingual dictionary contains 1718 words, excluding proper names.
* Current coverage of the Hin-Ben translator is ~67.8% and of the Ben-Hin translator ~49.7%.
* Coverage of the Bengali monolingual dictionary is ~72.0%.
* [https://wiki.apertium.org/wiki/User:Gourab337/GSoC2021-Workplan-Control Workplan for GSoC '21 Ben-Hin]

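The page does not say how the coverage figures above were measured. One common reading is naive coverage: the share of tokens for which the analyser returns at least one analysis. The sketch below assumes the analyser output is in Apertium stream format, where each lexical unit looks like <code>^surface/analysis$</code> and an unknown word is marked with <code>*</code>. The function name, the sample sentence and its tags are made up for illustration, and escaped characters in the stream are not handled.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# (assumption: no escaped '/', '^' or '$' inside the unit)
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def naive_coverage(analysed_text: str) -> float:
    """Return the fraction of lexical units with at least one known analysis."""
    units = LEXICAL_UNIT.findall(analysed_text)
    if not units:
        return 0.0
    # Unknown words are emitted as ^word/*word$, so the analysis starts with '*'.
    known = sum(1 for _surface, analyses in units if not analyses.startswith("*"))
    return known / len(units)


# Hypothetical analyser output: two known words and two unknown words.
sample = "^আমি/আমি<prn>$ ^xyz/*xyz$ ^বই/বই<n>$ ^abc/*abc$"
print(f"{naive_coverage(sample):.1%}")
```

Counting lexical units rather than whitespace-separated words keeps the measure tied to what the analyser actually tokenised.
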
==Goals==

The translator is currently very basic. Its coverage needs to be increased so that it handles more of the vocabulary of both languages. More transfer rules are also needed to cover all the [https://wiki.apertium.org/wiki/Hindi_and_Bengali/Pending-Tests Pending Tests] and produce more accurate translations.

==Done==
* <s>Closed categories (n, adj, vblex, vbser, adv, prn, post, cnjcoo, cnjsub, cnjadv, det, num, ord).</s>
* <s>Most frequently used nouns, postpositions (post), adjectives (adj), adverbs (adv) and determiners (det) added.</s>
* <s>Hin > Ben transfer rules for nouns, verb tenses and adjectives added.</s>
* <s>Testing scripts and test corpus added.</s>

==Todo list==
* Increase the coverage of the translator by adding more nouns, adjectives and verbs from the list of most frequently used words in the corpus. [https://wiki.apertium.org/wiki/Building_dictionaries Reference]
* Add transfer rules to fix pronoun # errors (obj -> obl, nom -> nom, erg conversion).
* Write transfer rules for the [https://wiki.apertium.org/wiki/Hindi_and_Bengali/Pending-Tests Pending Tests] (Ben > Hin and Hin > Ben).
* Add more rules to the pending tests.
* Remove the prox and dist tags from the bidix and replace them with suitable paradigms for det.prox and det.dist (ইটা / ওটা).
* Work on lexical selection.
* Add morphological disambiguation of Hindi sentences for Hindi-Bengali translation.

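The first item in the list above relies on a frequency list of corpus words that are not yet in the dictionary. A minimal sketch of building such a list is given here, assuming a plain-text corpus tokenised on whitespace and a set of already-known surface forms. The function name, the tiny corpus and the known-word set are hypothetical; in practice the unknown words would more likely be taken from the analyser output than from a hand-made set.

```python
from collections import Counter
import string

# ASCII punctuation plus the danda and double danda used in Bengali and Hindi text.
PUNCTUATION = string.punctuation + "।॥"


def missing_by_frequency(corpus_text, known_words, top_n=10):
    """Return the top_n most frequent tokens that are not in known_words."""
    # Split on whitespace and strip surrounding punctuation; a regex on \w would
    # break Bengali and Devanagari words at their combining vowel signs.
    tokens = (tok.strip(PUNCTUATION) for tok in corpus_text.split())
    counts = Counter(tok for tok in tokens if tok and tok not in known_words)
    return counts.most_common(top_n)


# Hypothetical corpus and dictionary contents, for illustration only.
corpus = "বই ভালো। বই নতুন। কলম নতুন।"
known = {"ভালো"}
print(missing_by_frequency(corpus, known, top_n=2))
```

Words near the top of the result are the ones whose addition should raise coverage the most.
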
==Apertium Git Repositories==
* [https://github.com/apertium/apertium-ben-hin apertium-ben-hin]
* [https://github.com/apertium/apertium-hin apertium-hin]
* [https://github.com/apertium/apertium-ben apertium-ben]

==External Resources==
===General===
* [https://github.com/banglakit/awesome-bangla A Useful Collection of Resources (Important)]
* [https://en.wikipedia.org/wiki/Bengali_grammar Bengali Grammar]
* [http://www.cal.org/heritage/pdfs/voices-hindi-language.pdf Introduction to Hindi]
* [https://www.researchgate.net/publication/313030658_Learning_of_Hindi_Phonology_as_a_Foreigner Learning of Hindi Phonology as a Foreigner]
* [https://benjamins.com/catalog/loall.12 Hindi by Yamuna Kachru]

===Dictionaries===
* http://hindi-english.org/
* http://e-mahashabdkosh.rb-aai.in/
* https://shabdkosh.raftaar.in/Hindi-English-Dictionary
* http://www.aamboli.com/

===Corpora===
* [http://corpora.uni-leipzig.de/en?corpusId=hin_news_2011 Hindi News Corpus]
* [https://arxiv.org/abs/1710.02855 The IIT Bombay English-Hindi Parallel Corpus]
* [https://catalog.ldc.upenn.edu/LDC2008L02 Hindi WordNet]

==See also==
* [https://wiki.apertium.org/wiki/Hindi_and_Bengali/Pending-Tests Pending Tests for Apertium-ben-hin]
* [[Bengali]]
* [[Hindi]]

[[Category:Hindi and Bengali]]

Latest revision as of 05:45, 25 August 2021
