Difference between revisions of "User:Eden"

From Apertium
 
(9 intermediate revisions by 3 users not shown)
Line 1: Line 1:
 
 
 
== Contact Information ==
 
 
Name: Eden-Grace Muamba<br />
 
Line 6: Line 4:
 
University: University of Alberta<br />
 
 
E-mail: nzambieden@gmail.com<br />
 
IRC: eden__<br />
+
IRC: petzin/eden__<br />
 
Timezone: UTC -7<br />
 
 
Github: https://github.com/thefreezer<br />
 
   
== My goal ==
+
== About me ==
  +
My main focus is on bringing Bantu languages to Apertium.
I’m planning to start the ‘English-Lingala’ language pair.
 
 
== Why am I interested in Apertium? ==
 
Apertium is at the intersection of computers and languages, which are two of my passions.
 
This will be my first ever contribution to an open source project. In the short time I have been on IRC and the mailing list, the Apertium community has made it a fun and enjoyable experience for me. I hope not only to develop an English-Lingala pair but also to become a long-time contributor to Apertium, mainly by creating new English/French-African language pairs.
 
 
== Who will benefit and why should it get sponsored ==
 
 
African languages are poorly represented in Apertium, and commercially available alternatives are usually quite lacking. Because Lingala, like most African languages, has little digitized content available, it is hard to build translators with machine learning or data-driven NLP tools, since the massive amounts of data they need do not exist for these languages. In such cases, a rule-based MT tool like Apertium becomes the most viable option.
 
 
Lingala is a Bantu language used mainly as a lingua franca in central Africa (mainly in the Democratic Republic of Congo and, to some extent, in Angola and the Republic of Congo), with over 70 million speakers (https://en.wikipedia.org/wiki/Lingua_franca). Developing an English-Lingala pair will, I believe, contribute positively to the technological and economic development of these underserved places. Hopefully this translator will serve many people and organizations, from Wikipedia contributors to casual users to other open source software that might need a Lingala translator.
 
 
== Coding challenge ==
 
1. Installed Apertium tools<br />
 
2. All my work is in my repo: https://github.com/thefreezer/GSOC-apertium-eng-lin <br />
 
I will add a couple more rules and macros.
 
 
== Work plan ==
 
(this page will frequently change as I get more familiar with Apertium)
 
* Community bonding period: reading more about transfer rules and creating a doc for Lingala rules
 
* Week 1: adding stems to transducer
 
* Week 2: work on pronouns and adding adjectives
 
* Week 3: filling nouns and adjectives in bilingual dictionary, regression testing
 
* Week 4: transfer rules for nouns and adjectives
 
 
* '''Deliverable #1''' Advanced Lingala transducer with basic bilingual dictionary
 
 
* Week 5: continue work on bilingual dictionary, filling verbs
 
* Week 6: filling pronouns, adverbs, and others
 
* Week 7: transfer rules for verbs, pronouns, determiners, adverbs, and others
 
* Week 8: work on disambiguation, lots of testing and improvement of bilingual dictionary (WER < 50%)
 
 
* '''Deliverable #2''' Advanced bilingual dictionary and transfer rules
 
   
* Week 9: continue work on disambiguation

* Week 10: work on transfer rules, testvoc. Goal is WER < 40% (is this achievable?)

* Week 11: continue work on transfer rules and testing, Wikipedia translations

* Week 12: detailed analysis of work completed (wiki), evaluation of results and documentation

* '''Project completed''' Goal is to have a WER < 35%

== Skills and qualifications ==

Ongoing major: first-year Computer Science student with a minor in Statistics<br />

Relevant technical skills: Python (online data mining, inferential statistics, NumPy, pandas, Matplotlib), C++ (proficient), SQL (elementary), Git (proficient), Bash (proficient), HTML5/CSS3 (advanced)<br />

Work experience: as an intern, created static and dynamic websites<br />

Languages: French (native), English (native), Lingala (fluent), Swahili (proficient), Tshiluba (proficient), Twi (elementary)<br />

== Non-Summer-of-Code plans ==

Traveling to Ontario for 5 days from June 29, but that will not affect my work. I’m committed to putting in at least 40 hours a week for the duration of the project.

+
== Things I work on ==
+
=== Transducers ===
+
{|class="wikitable sortable"
+
|-
+
! project !! language !! stems !! coverage
+
|-
+
! [https://github.com/thefreezer/apertium-swa apertium-swa]
+
| Swahili ||align="right"| ||align="center"|
+
|-
+
! [https://github.com/apertium/apertium-lin apertium-lin]
+
| Lingala ||align="right"| ||align="center"|
+
|}
+
=== MT-pair ===
+
{|class="wikitable sortable"
+
|-
+
! project !! languages !! stems !! coverage
+
|-
+
! [https://github.com/apertium/apertium-eng-lin apertium-eng-lin]
+
| English-Lingala
+
|align="right"| ||
+
|-

! [https://github.com/thefreezer/apertium-eng-swa apertium-swa-eng]
+
| Swahili-English
+
|align="right"| ||
+
|-
+
! [apertium-swa-lin]
+
| Swahili-Lingala
+
|align="right"| ||
+
|}

Latest revision as of 05:02, 19 May 2020

Contact Information

Name: Eden-Grace Muamba
Location: Alberta, Canada
University: University of Alberta
E-mail: nzambieden@gmail.com
IRC: petzin/eden__
Timezone: UTC -7
Github: https://github.com/thefreezer

About me

My main focus is on bringing Bantu languages to Apertium.

Things I work on

Transducers

{|class="wikitable sortable"
|-
! project !! language !! stems !! coverage
|-
! apertium-swa
| Swahili || ||
|-
! apertium-lin
| Lingala || ||
|}

MT-pair

{|class="wikitable sortable"
|-
! project !! languages !! stems !! coverage
|-
! apertium-eng-lin
| English-Lingala || ||
|-
! apertium-swa-eng
| Swahili-English || ||
|-
! [apertium-swa-lin]
| Swahili-Lingala || ||
|}