<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://wiki.apertium.org/w/index.php?action=history&amp;feed=atom&amp;title=User%3AEden%2FGSOC2019_English-Lingala</id>
	<title>User:Eden/GSOC2019 English-Lingala - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://wiki.apertium.org/w/index.php?action=history&amp;feed=atom&amp;title=User%3AEden%2FGSOC2019_English-Lingala"/>
	<link rel="alternate" type="text/html" href="https://wiki.apertium.org/w/index.php?title=User:Eden/GSOC2019_English-Lingala&amp;action=history"/>
	<updated>2026-05-14T06:32:25Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.34.1</generator>
	<entry>
		<id>https://wiki.apertium.org/w/index.php?title=User:Eden/GSOC2019_English-Lingala&amp;diff=71654&amp;oldid=prev</id>
		<title>Eden: Created page with &quot;== My goal == I’m planning to start the ‘English-Lingala’ language pair.&lt;br/&gt; At first I was only planning to work in one direction(eng-lin), but following &#039;&#039;firespeaker...&quot;</title>
		<link rel="alternate" type="text/html" href="https://wiki.apertium.org/w/index.php?title=User:Eden/GSOC2019_English-Lingala&amp;diff=71654&amp;oldid=prev"/>
		<updated>2020-03-28T09:53:22Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;quot;== My goal == I’m planning to start the ‘English-Lingala’ language pair.&amp;lt;br/&amp;gt; At first I was only planning to work in one direction(eng-lin), but following &amp;#039;&amp;#039;firespeaker...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;== My goal ==&lt;br /&gt;
I’m planning to start the ‘English-Lingala’ language pair.&amp;lt;br/&amp;gt;&lt;br /&gt;
At first I planned to work in only one direction (eng-lin), but following a suggestion from &amp;#039;&amp;#039;firespeaker&amp;#039;&amp;#039;, I will also work in the other direction to make my mentor&amp;#039;s job easier.&lt;br /&gt;
&lt;br /&gt;
== Why am I interested in Apertium? ==&lt;br /&gt;
Apertium is at the intersection of computers and languages, which are two of my passions. &lt;br /&gt;
This will be my first ever contribution to an open source project. In the short time I have been on IRC and the mailing list, the Apertium community has made it a fun and enjoyable experience for me. I hope not only to develop an English-Lingala pair but also to become a long-time contributor to Apertium, mainly by creating new English/French-African language pairs.&lt;br /&gt;
&lt;br /&gt;
== Who will benefit and why should it get sponsored ==&lt;br /&gt;
&lt;br /&gt;
African languages are poorly represented in Apertium, and even commercially available options are usually quite lacking. Because Lingala, like most African languages, has little digitized content available, it is hard to build translators with machine learning or other data-driven NLP tools, since the massive amounts of data they need do not exist for these languages. In such cases, a rule-based MT tool like Apertium becomes the most viable option.&lt;br /&gt;
&lt;br /&gt;
Lingala is a Bantu language used mainly as a lingua franca in central Africa (chiefly in the Democratic Republic of Congo and, to some extent, in Angola and the Republic of Congo), with over 70 million speakers (https://en.wikipedia.org/wiki/Lingua_franca). Developing an English-Lingala pair will, I believe, contribute positively to the technological and economic development of these underserved places. Hopefully this translator will serve many people and organizations, from Wikipedia contributors and casual users to other open source software that might need a Lingala translator.&lt;br /&gt;
&lt;br /&gt;
== Lingala resources ==&lt;br /&gt;
Here is a list of &amp;#039;&amp;#039;open&amp;#039;&amp;#039; and &amp;#039;&amp;#039;public domain&amp;#039;&amp;#039; resources (dictionaries, grammar books, texts, etc.) for the Lingala language:&amp;lt;br/&amp;gt;&lt;br /&gt;
- [http://crubadan.org/languages/ln Crubadan text corpus] A text corpus sorted by word frequency&amp;lt;br/&amp;gt;&lt;br /&gt;
- The excellent [https://archive.org/details/suggestionsforgr00stap Grammar and dictionary of Bangala] &amp;lt;br/&amp;gt;&lt;br /&gt;
- [http://unicode.org/udhr/d/udhr_lin_tones.html Universal Declaration of Human Rights - Lingala (tones)] &amp;lt;br/&amp;gt;&lt;br /&gt;
- [https://archive.org/details/ERIC_ED294440/page/n189 Lingala. Livre du formatteur] Lingala teacher&amp;#039;s manual (I will have to confirm if this book is in the public domain)&amp;lt;br/&amp;gt;&lt;br /&gt;
- [https://archive.org/details/rosettaproject_lin_gen-1 Bible] and [https://archive.org/details/TranslationOfTheMeaningOfTheNobleQuranInTheLINGALABANTULanguageHQJUZZAMMA/page/n25 Quran] can be used as parallel texts.&amp;lt;br/&amp;gt;&lt;br /&gt;
- [https://archive.org/details/NotionsDeLingala/page/n29 Notions de Lingala] - Another dictionary plus common Lingala sentences &amp;lt;br/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Coding challenge ==&lt;br /&gt;
All my work is in my repo: https://github.com/thefreezer/GSOC-apertium-eng-lin &amp;lt;br /&amp;gt;&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Update 1: Apr/1/19&amp;#039;&amp;#039;&amp;#039;&amp;lt;br/&amp;gt;&lt;br /&gt;
1. Added ~95% of all words from this [https://sourceforge.net/p/apertium/svn/HEAD/tree/branches/xupaixkar/rasskaz/ story]. &amp;lt;br/&amp;gt;&lt;br /&gt;
2. In the &amp;#039;&amp;#039;493&amp;#039;&amp;#039;-word story, my final translation has &amp;#039;&amp;#039;74&amp;#039;&amp;#039; unknown words (*) and &amp;#039;&amp;#039;63&amp;#039;&amp;#039; words with the wrong final form (#). Most of them are verbs, adjectives and adverbs. The original story is [https://github.com/thefreezer/GSOC-apertium-eng-lin/blob/master/story_eng.txt here] and the final eng-lin output is [https://github.com/thefreezer/GSOC-apertium-eng-lin/blob/master/eng-lin-output.txt here].&amp;lt;br/&amp;gt;&lt;br /&gt;
3. Added 8 rules which give me correct translations for:&amp;lt;br/&amp;gt;&lt;br /&gt;
* prn/np vblex/vbhaver/vbser det n (e.g. I see a house) with correct present and past (saw) verb tenses&lt;br /&gt;
* prn/np vblex/vbhaver/vbser pr det adj n (e.g. Mary eats in the beautiful garden)&lt;br /&gt;
* and other rules for dealing with the infinitive form of a verb, and handling the [https://en.wikipedia.org/wiki/Pro-drop_language pro-drop] behavior of the language.&lt;br /&gt;
I will try to implement a rule for dealing with the future tense (e.g. I will play ...)&amp;lt;br/&amp;gt;&lt;br /&gt;
&amp;#039;&amp;#039;Note: a lot of these rules are inspired by the eng-fra pair&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
{{comment|It would be good to be able to evaluate WER, so a correct Lingala version of the story would be very useful —[[User:Firespeaker|Firespeaker]] ([[User talk:Firespeaker|talk]]) 03:50, 8 April 2019 (CEST).}}&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Update 2: Apr/7/19&amp;#039;&amp;#039;&amp;#039;&amp;lt;br/&amp;gt;&lt;br /&gt;
1. Added the full Lingala translation [https://github.com/thefreezer/GSOC-apertium-eng-lin/blob/master/story_lin.txt here].&amp;lt;br/&amp;gt;&lt;br /&gt;
2. lin-eng: &amp;#039;&amp;#039;75.27% WER&amp;#039;&amp;#039;&amp;lt;br/&amp;gt;&lt;br /&gt;
3. eng-lin: &amp;#039;&amp;#039;85.65% WER&amp;#039;&amp;#039; (I mostly focused on lin-eng, which explains why the error rate in this direction is higher)&amp;lt;br/&amp;gt;&lt;br /&gt;
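For reference, WER in its usual definition is the word-level edit distance between the system output and the reference translation, divided by the number of words in the reference. The sketch below illustrates that definition in Python. The function name word_error_rate and the sample hypothesis sentence are assumptions made for illustration, and this is not necessarily the tool that produced the figures above.&lt;br /&gt;

```python
def word_error_rate(reference, hypothesis):
    """Return the word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] is the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


reference = "I see a house"
hypothesis = "I saw house"  # made-up system output, for illustration only
print(f"{word_error_rate(reference, hypothesis):.2%}")  # prints 50.00%
```

In this toy case one substitution and one deletion against a four-word reference give 2 / 4. Real evaluations normally tokenize more carefully than a whitespace split.&lt;br /&gt;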
&lt;br /&gt;
== Work plan ==&lt;br /&gt;
 Community bonding period:&lt;br /&gt;
 - reading more about transfer rules and creating a doc for eng-lin and lin-eng rules&lt;br /&gt;
 - building a better frequency list of Lingala words&lt;br /&gt;
 - reading more about HFST&lt;br /&gt;
 - continuing work to achieve a WER &amp;lt; 50% in at least one direction&lt;br /&gt;
&lt;br /&gt;
 Week 1: &lt;br /&gt;
 - adding nouns (from the frequency list) to the lin transducer&lt;br /&gt;
 - adding verbs with correct tenses to the lin transducer&lt;br /&gt;
 - constraint grammar&lt;br /&gt;
&lt;br /&gt;
 Week 2:&lt;br /&gt;
 - adding pronouns and adjectives to the lin transducer&lt;br /&gt;
 - also adding adverbs, conjunctions, prepositions, etc.&lt;br /&gt;
 - constraint grammar for prn and adj&lt;br /&gt;
&lt;br /&gt;
 Week 3:  &lt;br /&gt;
 - polishing the transducer to give better analyses&lt;br /&gt;
 - filling nouns and adjectives in bilingual dictionary, &lt;br /&gt;
 - regression testing&lt;br /&gt;
&lt;br /&gt;
 Week 4:  &lt;br /&gt;
 - transfer rules for nouns and adjectives (both directions)&lt;br /&gt;
 - disambiguation rules&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Deliverable #1&amp;#039;&amp;#039;&amp;#039; Advanced Lingala transducer with basic bilingual dictionary&lt;br /&gt;
&lt;br /&gt;
 Week 5:  &lt;br /&gt;
 - continue work on bilingual dictionary,&lt;br /&gt;
 - main work will be on verbs&lt;br /&gt;
 - transfer rules for verbs in both directions&lt;br /&gt;
&lt;br /&gt;
 Week 6:  &lt;br /&gt;
 - filling pronouns, adverbs, and others in the bidix&lt;br /&gt;
 - work on compound Lingala words&lt;br /&gt;
 - transfer rules for pronouns, adverbs and compound nouns (both directions)&lt;br /&gt;
&lt;br /&gt;
 Week 7: &lt;br /&gt;
 - adding determiners and more adjectives in the bidix&lt;br /&gt;
 - WER &amp;lt; 35% on a 500-word story&lt;br /&gt;
 - add/polish rules for agreement between verbs and pronouns&lt;br /&gt;
&lt;br /&gt;
 Week 8: &lt;br /&gt;
 - continue work on transfer rules in .t2x and .t3x files&lt;br /&gt;
 - work on disambiguation (eng-lin, lin-eng)&lt;br /&gt;
 - lots of testing and improvement of bilingual dictionary&lt;br /&gt;
 - WER &amp;lt; 30% in both directions on a 1000-word story&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Deliverable #2&amp;#039;&amp;#039;&amp;#039; Advanced bilingual dictionary (~5,000 words) and transfer rules&lt;br /&gt;
&lt;br /&gt;
 Week 9:&lt;br /&gt;
 - continue work on disambiguation (both directions)&lt;br /&gt;
 - testvoc and improvements&lt;br /&gt;
 - filling bidix (common nouns)&lt;br /&gt;
&lt;br /&gt;
 Week 10:&lt;br /&gt;
 - work on transfer rules&lt;br /&gt;
 - goal is WER &amp;lt; 30% on a story longer than 1000 words (is this achievable?)&lt;br /&gt;
&lt;br /&gt;
 Week 11:&lt;br /&gt;
 - continue work on transfer rules and testing, &lt;br /&gt;
 - Wikipedia article translations&lt;br /&gt;
 - continue filling bidix&lt;br /&gt;
&lt;br /&gt;
 Week 12:&lt;br /&gt;
 - filling bidix with miscellaneous words &lt;br /&gt;
 - detailed analysis of work completed (wiki)&lt;br /&gt;
 - evaluation of results and documentation&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Project completed&amp;#039;&amp;#039;&amp;#039; WER ~30% (with ~7,000 words in bidix) in both directions on most texts&lt;br /&gt;
&lt;br /&gt;
== Skills and qualifications ==&lt;br /&gt;
Ongoing major: first-year Computer Science student with a minor in Statistics&amp;lt;br /&amp;gt;&lt;br /&gt;
Relevant technical skills: Python (online data mining, inferential statistics, NumPy, pandas, Matplotlib), C++ (elementary), SQL (intermediate), Git (intermediate), Bash (intermediate), HTML5/CSS3 (advanced)&amp;lt;br /&amp;gt;&lt;br /&gt;
Work experience: as an intern, created static and dynamic websites&amp;lt;br /&amp;gt;&lt;br /&gt;
Languages: French (native), Lingala (native), English (fluent), Swahili (proficient), Tshiluba (proficient), Twi (elementary)&amp;lt;br /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Non-Summer-of-Code plans ==&lt;br /&gt;
I will be traveling to Ontario for 5 days from June 29, but that will not affect my work. I have no other commitments, which will allow me to put in 40+ hours a week for the duration of the project.&lt;/div&gt;</summary>
		<author><name>Eden</name></author>
		
	</entry>
</feed>