User:Eden/GSoC progress
Community Bonding Period
- Find Swahili-Lingala resources
- Update the Lingala lexc transducer to lexd
- Write a new lexd transducer for Swahili
- Keep track of coverage for the lin and swa transducers
- Get familiar with apertium-recursive
- Set up the swa-lin pair using apertium-recursive
- Update the GSoC progress page
- James and Mary story + Wikipedia article in Swahili and Lingala
 
Goals
- By first evaluation: get a story about kids, or a similar text, to a WER/PER of around 20% (work on all stages of translation, focusing on the "lowest-hanging fruit" relevant to the text)
- By second evaluation: increase [trimmed] coverage to around 90% (work focused on lexicons, adding stems from frequency lists)
- By final evaluation: get a clean testvoc (work focused on transfer, making sure everything is handled one way or another)
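For reference, the two error rates named in the first goal can be sketched in a few lines of Python. This is a minimal illustration under common textbook definitions (WER as word-level edit distance divided by reference length, PER as the same idea with word order ignored), assuming whitespace tokenisation and a non-empty reference. It is not the implementation used by apertium-eval-translator, whose numbers may differ slightly, and the function names are made up for this sketch.

```python
from collections import Counter


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: edit distance over reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)


def per(reference, hypothesis):
    """Position-independent error rate: like WER, but ignoring word order."""
    ref, hyp = reference.split(), hypothesis.split()
    # Multiset intersection counts words shared by both sentences.
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)
```

With the reference "the cat sat on the mat" and the hypothesis "the cat on the mat sat", this sketch gives a WER of 2/6 (one deletion plus one insertion) and a PER of 0, since both sentences contain the same words.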
 
Status table
| Week № | Dates | Stems (lin) | Stems (lin-eng) | Naïve coverage (lin) | Naïve coverage (lin-eng) | WER, PER (lin→eng) | WER, PER (eng→lin) | Evaluation | Notes |
|---|---|---|---|---|---|---|---|---|---|
| 1 | June 1 - June 7 | | | | | | | | |
| 2 | June 8 - June 14 | | | | | | | | |
| 3 | June 15 - June 21 | | | | | | | | |
| 4 | June 22 - June 28 | | | | | | | | |
| 5 | June 29 - July 5 | | | | | | | | |
| 6 | July 6 - July 12 | | | | | | | | |
| 7 | July 13 - July 19 | | | | | | | | |
| 8 | July 20 - July 26 | | | | | | | | |
| 9 | July 27 - Aug 2 | | | | | | | | |
| 10 | Aug 3 - Aug 9 | | | | | | | | |
| 11 | Aug 10 - Aug 16 | | | | | | | | |
| 12 | Aug 17 - Aug 23 | | | | | | | | |
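The naïve coverage columns report the share of corpus tokens that receive at least one analysis. A rough sketch of that calculation follows, assuming the analyser output is in Apertium stream format, where each token appears as `^surface/analysis$` and unknown words get an analysis starting with `*`. The function name is hypothetical, and escaped `^`, `$` or `/` characters inside tokens are not handled.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
LEXICAL_UNIT = re.compile(r"\^(.*?)\$")


def naive_coverage(analysed_text):
    """Fraction of lexical units that have at least one analysis."""
    units = LEXICAL_UNIT.findall(analysed_text)
    if not units:
        return 0.0
    unknown = 0
    for unit in units:
        # Everything after the first "/" is the list of analyses.
        _surface, _sep, analyses = unit.partition("/")
        if analyses.startswith("*"):
            unknown += 1
    return (len(units) - unknown) / len(units)
```

For example, a made-up input such as `^mwana/mwana<n>$ ^ndako/*ndako$` (tags invented for illustration) contains one known and one unknown token, so the function returns 0.5.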
Notes
- To count stems in lexc, try:
grep -E ":\w+.*;" apertium-lin.lin.lexc | grep -v "[<>]" | wc -l
- To count stems in the bidix, try:
grep "<p" apertium-eng-lin.eng-lin.dix | wc -l
- To get WER and PER, use apertium-eval-translator-line.
- Coverage above is measured on the 2019-05-20 Wikipedia dump.
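If the counts need to be collected from a script rather than the shell, the two grep pipelines above can be mirrored in Python roughly as follows. This is only a sketch: the function names are hypothetical, and Python's `\w` is Unicode-aware while grep's depends on the locale, so counts could differ on lines with unusual characters.

```python
import re

# Same patterns as the grep commands in the notes above.
STEM_LINE = re.compile(r":\w+.*;")
TAG_CHARS = re.compile(r"[<>]")


def count_lexc_stems(lines):
    """Count lines that look like lexc stem entries and contain no < or >."""
    return sum(1 for line in lines
               if STEM_LINE.search(line) and not TAG_CHARS.search(line))


def count_bidix_entries(lines):
    """Count lines containing "<p", as grep "<p" | wc -l does."""
    return sum(1 for line in lines if "<p" in line)
```

Both functions accept any iterable of lines, so an open file object can be passed directly, for example `count_lexc_stems(open("apertium-lin.lin.lexc", encoding="utf-8"))`.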