User:Eden/GSOC2020Proposal English-Swahili
My goal
I’m planning to work on the ‘English-Swahili’ language pair.
Building on last year's work on the eng-lin pair, I will improve in two main areas this year: daily communication with my mentors and having enough Swahili language data.
(TODO) Why am I interested in Apertium?
Apertium is at the intersection of computers and languages, which are two of my passions.
Who will benefit and why should it get sponsored
African languages are poorly represented in Apertium, and even commercially available options are usually quite lacking. Because Swahili, like most African languages, has little accessible digitized content, it is hard to build translators with machine learning or other data-driven NLP tools: the massive amounts of data they require do not exist for these languages. In such cases, a rule-based MT tool like Apertium becomes the most viable option.
(TODO) Swahili resources
Here is a list of open and public-domain resources (dictionaries, grammar books, texts, etc.) for Swahili:
(TODO) Coding challenge
All my work is in two main repos: apertium-swa and apertium-swa-eng
(TODO) Work plan
Community bonding period: - Swahili corpus from Wikipedia (done) - frequency list - work on transfer rules and CG
Week 1: - adding nouns (from the frequency list) to the swa transducer
Week 2: - adding pronouns and adjectives to the swa transducer
Week 3: - polishing the transducer to give better analyses
Week 4: - transfer rules for nouns and adjectives (both directions)
- Deliverable #1 ...
Week 5: - continue work on the bilingual dictionary
Week 6: - filling pronouns, adverbs, and others in the bidix
Week 7: - adding determiners and more adjectives in the bidix
Week 8: - continue work on transfer rules in .t2x and .t3x files
- Deliverable #2 ...
Week 9: - continue work on disambiguation (both directions)
Week 10: - work on transfer rules
Week 11: - continue work on transfer rules and testing
Week 12: - filling bidix with miscellaneous words
- Project completed ...
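The frequency list planned for the community bonding period could be produced with a short script along the following lines. This is only a minimal sketch: the function name `frequency_list`, the tokenization rule (runs of letters, with word-internal apostrophes kept so that forms like "ng'ombe" stay whole), and the sample text are all assumptions made for illustration, not part of the existing repos.

```python
import re
from collections import Counter

# Assumption: a token is a run of letters, optionally joined by apostrophes
# (so "ng'ombe" is kept as one word). Digits and punctuation are dropped.
TOKEN_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")


def frequency_list(text):
    """Return (word, count) pairs, most frequent first; ties sorted alphabetically."""
    counts = Counter(token.lower() for token in TOKEN_RE.findall(text))
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical sample; in practice the text would be read from the
    # extracted Wikipedia corpus file.
    sample = "Mtoto anasoma kitabu. Mtoto anapenda kitabu na ng'ombe."
    for word, count in frequency_list(sample):
        print(count, word)
```

Working down such a list from the top gives an order in which to add nouns, pronouns, and adjectives to the transducer and the bidix, so that the most common words are covered first.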
Skills and qualifications
Ongoing major: second-year Computer Science student with a minor in Math
Relevant technical skills: Python, C/C++, SQL (intermediate), Git (intermediate), Bash (intermediate), HTML5/CSS3 (advanced)
Languages: French (native), Lingala (native), English (fluent), Swahili (proficient), Tshiluba (proficient), Twi (elementary)
Non-Summer-of-Code plans
None.