Difference between revisions of "User:Anakuznetsova/GSOC 2018 Guarani Spanish"


Revision as of 10:06, 1 August 2018

Adoption of Guarani-Spanish language pair in Apertium

GSoC Commits

All the GSoC commits for the project can be found here (post link).

Contacts

Anastasia Kuznetsova

E-mail: menina.indigena.17@gmail.com

GitHub: ana-kuznetsova

Timezone: UTC+3

Project description

The project to adopt the Guarani-Spanish language pair in Apertium aimed to create a machine translation system between Guarani and Spanish. Because Guarani is a low-resource language, a translation system for it is unlikely to be built by methods other than rule-based machine translation. As evidence of this, the only sources available to us were about 2,800 texts from Wikipedia dumps [1] and a Guarani-Spanish aligned Bible [2].

The project consisted of three main parts:

  • Morphological analyzer for Guarani
  • Guarani-Spanish bilingual dictionary (bidix)
  • Transfer rules

A detailed work plan for the project can be found here.

Morphological Analyzer

We had to develop the morphological analyzer almost from scratch. The most challenging task at the outset was finding properly organized word lists or Guarani dictionaries. By the end of the Community Bonding period, after two weeks of work, we could analyze only 30% of the words in the wiki corpora.
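
A coverage figure like the one above is usually obtained by running the analyzer over a corpus and counting the tokens that receive at least one analysis. The sketch below is only an illustration of that calculation, not the script used in the project. It assumes the analyzer output is in the Apertium stream format, where each token appears as ^surface/analysis$ and an unknown word is echoed back with a leading asterisk. The function name naive_coverage, the sample string, and the tags in it are hypothetical, and escaped characters in the stream are not handled.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# Assumption: no escaped '/', '^' or '$' inside the unit.
LEXICAL_UNIT = re.compile(r"\^([^/$^]*)/([^$]*)\$")


def naive_coverage(analyzer_output: str) -> float:
    """Return the share of tokens that received at least one analysis."""
    total = 0
    known = 0
    for _surface, analyses in LEXICAL_UNIT.findall(analyzer_output):
        total += 1
        # Unknown words are echoed with a leading asterisk, e.g. ^foo/*foo$
        if not analyses.startswith("*"):
            known += 1
    return known / total if total else 0.0


# Hypothetical analyzer output: one analyzed token, two unknown tokens.
# The tag <n> is illustrative only.
SAMPLE_OUTPUT = "^óga/óga<n>$ ^guasu/*guasu$ ^porã/*porã$"

if __name__ == "__main__":
    print(f"coverage: {naive_coverage(SAMPLE_OUTPUT):.1%}")
```

In practice the input string would be the analyzer's output for the whole corpus, so the result is a token-level coverage rather than a measure of how correct the analyses are.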