Apertium-Bhojpuri-Hindi, GSoC '21
Project - Adopt an unreleased language pair, Hindi-Bhojpuri
The project involved developing the Bhojpuri-Hindi language pair in both directions, i.e. bho-hin and hin-bho. This meant building two dictionaries from scratch: the Bhojpuri monolingual dictionary and the Bhojpuri-Hindi bilingual dictionary. Several errors were also found in the Hindi paradigms, which had not been anticipated, so the Hindi monolingual dictionary was modified as well.
The work has been done in these three repositories -
- https://github.com/apertium/apertium-bho
- https://github.com/apertium/apertium-bho-hin
- https://github.com/apertium/apertium-hin
The work was spread over 11 weeks. Work on the Bhojpuri and Bhojpuri-Hindi dictionaries began with the closed categories, and words were then added in order of frequency. Work on the Hindi dictionary began after the 6th week, and several paradigms were corrected (missing tags were added, others were removed or reordered, and several paradigms were marked as deprecated).
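Adding words in order of frequency can be sketched as follows. This is only a minimal illustration, not the procedure actually used in the project: the function name `frequency_list`, the whitespace tokenization, and the sample input are all assumptions made for the example.

```python
from __future__ import annotations

from collections import Counter


def frequency_list(corpus_text: str, known_words: set[str]) -> list[tuple[str, int]]:
    """Return (word, count) pairs for words not yet in the dictionary,
    most frequent first. Tokenization is a plain whitespace split, which
    is an assumption: punctuation stays attached to neighbouring words."""
    counts = Counter(corpus_text.split())
    # most_common() sorts by count in descending order.
    return [(word, n) for word, n in counts.most_common() if word not in known_words]


# Hypothetical toy input; a real run would read a Bhojpuri corpus file.
todo = frequency_list("a b a c a b", known_words={"a"})
print(todo)
```

The words at the top of such a list are the ones whose addition raises coverage the most, which is why closed categories and high-frequency words are usually handled first.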
The Bhojpuri and Bhojpuri-Hindi dictionaries currently contain nearly 800 words each. The coverage is 58%. The categories that have been covered are -
Closed categories -
- Adpositions (postpositions only; there are no prepositions in Bhojpuri or Hindi)
- Conjunctions (cnjcoo, cnjsub, cnjadv)
- Determiners (indefinite, demonstrative, possessive, quantitative)
- Question words (what, who, when, where, why, how, how much/many, etc.)
- Pronouns
- Verb 'to be'
Open categories -
- Nouns
- Adjectives
- Adverbs
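Coverage is commonly taken to be the fraction of corpus tokens for which the analyser returns at least one analysis. The report does not say how the 58% figure was measured, so the following is only a sketch of that naive calculation. It assumes analyser output in the Apertium stream format, where each lexical unit is written as `^surface/analysis$` and an unknown word has an analysis beginning with `*`. The function name `naive_coverage` and the sample analyses are made up for illustration, and escaped characters in the stream are not handled.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def naive_coverage(analysed_text: str) -> float:
    """Return the fraction of lexical units with at least one known analysis.
    Unknown words are those whose analysis starts with '*'."""
    units = LEXICAL_UNIT.findall(analysed_text)
    if not units:
        return 0.0
    known = sum(1 for _surface, analyses in units if not analyses.startswith("*"))
    return known / len(units)


# Illustrative input only; the tags are not taken from the actual dictionaries.
sample = "^घर/घर<n><m><sg><nom>$ ^xyz/*xyz$ ^में/में<post>$ ^abc/*abc$"
print(naive_coverage(sample))  # 2 known out of 4 units
```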
The work that's pending is -
- Correcting the Bhojpuri verbal paradigms (I have copied them from Hindi, but haven't corrected them yet).
- Working on lexical selection.
- Morphologically disambiguating Hindi sentences for Hindi-Bhojpuri translation.
- Adding more words to the dictionaries.
A few resources that were helpful for this project -