User:Eden/GSOC2020Proposal English-Swahili

From Apertium
Revision as of 10:59, 28 March 2020 by Eden

My goal

I’m planning to work on the ‘English-Swahili’ language pair.
Based on last year's work on the eng-lin pair, there are two main areas I will improve on this year: communicating daily with my mentors and having enough Swahili language data.

(TODO) Why am I interested in Apertium?

Apertium is at the intersection of computers and languages, which are two of my passions.


Who will benefit and why it should be sponsored

African languages are poorly represented in Apertium, and even commercially available options are usually quite lacking. Because Swahili, like most African languages, has little accessible digitized content, it is hard to build translators with machine learning or other data-driven NLP tools: the massive amounts of data they need do not exist for these languages. In such cases, a rule-based MT tool like Apertium becomes the most viable option.

(TODO) Swahili resources

Here is a list of open and public domain resources (dictionaries, grammar books, texts, etc.) for Swahili:

(TODO) Coding challenge

All my work is in two main repos: apertium-swa and apertium-swa-eng

(TODO) Work plan

Community bonding period:
- Swahili corpus from Wikipedia (done)
- frequency list
- work on transfer rules and CG
Week 1: 
- adding nouns (from the frequency list) in the swa transducer
Week 2:
- adding pronouns and adjectives in the swa transducer 
Week 3:  
- polishing the transducer to give better analyses
Week 4:  
- transfer rules for nouns and adjectives (both directions)
  • Deliverable #1 ...
Week 5:  
- continue work on the bilingual dictionary
Week 6:  
- filling pronouns, adverbs, and others in the bidix
Week 7: 
- adding determiners and more adjectives in the bidix
Week 8: 
- continue work on transfer rules in the .t2x and .t3x files
  • Deliverable #2 ...
Week 9:
- continue work on disambiguation (both directions)
Week 10:
- work on transfer rules
Week 11:
- continue work on transfer rules and testing
Week 12:
- filling bidix with miscellaneous words 
  • Project completed ...
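
The frequency list from the community bonding period drives which words are added to the transducer first. As a rough illustration only, the sketch below counts word forms in a plain-text corpus. The function name `frequency_list`, the tokenizing pattern, and the sample sentences are assumptions made for this example; they are not taken from the apertium-swa repository or from any Apertium tool.

```python
import re
from collections import Counter

# Assumption: a token is a run of Unicode letters, optionally joined by
# apostrophes so that forms such as "ng'ombe" stay in one piece.
TOKEN_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")


def frequency_list(lines):
    """Return (word, count) pairs from an iterable of text lines,
    most frequent first; ties are broken alphabetically."""
    counts = Counter()
    for line in lines:
        # Lowercase so that "Mtoto" and "mtoto" are counted together.
        counts.update(token.lower() for token in TOKEN_RE.findall(line))
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Tiny in-memory sample standing in for the Wikipedia corpus.
sample = [
    "Mtoto anasoma kitabu.",
    "Mtoto anapenda kitabu na ng'ombe.",
]
for word, count in frequency_list(sample):
    print(f"{count}\t{word}")
```

For a real corpus, the same function can be passed an open file object instead of the sample list, since it only needs an iterable of lines. The resulting surface forms would still need to be lemmatized by hand or with the analyser before being added as dictionary entries.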

Skills and qualifications

Ongoing major: second-year Computer Science student with a minor in Math
Relevant technical skills: Python, C/C++, SQL (intermediate), Git (intermediate), Bash (intermediate), HTML5/CSS3 (advanced)
Languages: French (native), Lingala (native), English (fluent), Swahili (proficient), Tshiluba (proficient), Twi (elementary)

Non-Summer-of-Code plans

None.