User:Eden/GSoC progress
Latest revision as of 15:06, 27 June 2020
Community Bonding Period
- Find Swahili-Lingala resources
- Update the Lingala lexc transducer to lexd
- Write a new lexd transducer for Swahili
- Keep track of coverage for the lin and swa transducers
- Get familiar with apertium-recursive
- Set up the swa-lin pair using apertium-recursive
- Update the GSoC progress page
- James and Marry story + Wikipedia article in Swahili and Lingala.
 
Goals
- By the first evaluation: bring a story about kids, or a similar text, to a WER/PER of around 20% (working on all stages of translation, focusing on the "lowest-hanging fruit" relevant to the text)
- By the second evaluation: increase [trimmed] coverage to around 90% (work focused on lexicons, adding entries from frequency lists)
- By the final evaluation: get a clean testvoc (work focused on transfer, making sure every input is dealt with one way or another)
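The goals are stated as WER/PER targets, and the Notes point to apertium-eval-translator-line for measuring them. To make the two metrics concrete, here is a minimal shell/awk sketch for a single tokenised sentence pair; it is an illustration, not that script's implementation. WER is the word-level edit distance (substitutions + deletions + insertions) divided by the reference length. For PER the sketch assumes one common position-independent definition (reference words not matched anywhere in the hypothesis, plus any surplus hypothesis words, divided by the reference length), which may differ in detail from what apertium-eval-translator-line computes. The helper name `wer_per` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wer_per REF HYP prints "WER PER" as percentages
# for one tokenised reference/hypothesis sentence pair.
wer_per() {
  awk -v ref="$1" -v hyp="$2" 'BEGIN {
    n = split(ref, r, " ")   # reference words
    m = split(hyp, h, " ")   # hypothesis (MT output) words
    if (n == 0) { print "n/a"; exit }

    # WER: word-level Levenshtein distance d[i, j] between r[1..i] and h[1..j]
    for (i = 0; i <= n; i++) d[i, 0] = i
    for (j = 0; j <= m; j++) d[0, j] = j
    for (i = 1; i <= n; i++)
      for (j = 1; j <= m; j++) {
        cost = (r[i] == h[j]) ? 0 : 1
        best = d[i-1, j-1] + cost                         # match or substitution
        if (d[i-1, j] + 1 < best) best = d[i-1, j] + 1    # deletion
        if (d[i, j-1] + 1 < best) best = d[i, j-1] + 1    # insertion
        d[i, j] = best
      }

    # PER (assumed definition): match hypothesis words against the
    # reference as a bag of words, ignoring position
    for (i = 1; i <= n; i++) left[r[i]]++
    matches = 0
    for (j = 1; j <= m; j++)
      if (left[h[j]] > 0) { matches++; left[h[j]]-- }
    extra = (m > n) ? m - n : 0   # surplus hypothesis words also count as errors

    printf "%.2f %.2f\n", 100 * d[n, m] / n, 100 * (n - matches + extra) / n
  }'
}
```

For example, `wer_per "the cat sat on the mat" "the cat sat on mat"` prints `16.67 16.67`, while `wer_per "a b c" "c b a"` prints `66.67 0.00`: reordering costs WER but not PER. For a whole test text, errors and reference lengths would normally be summed over all sentences before dividing.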
 
Status table
| Week № | Dates | Stems: swa | Stems: lin | Stems: swa-lin | Naïve coverage: swa | Naïve coverage: lin | Naïve coverage: swa-lin | WER, PER: swa→lin | WER, PER: lin→swa | Evaluation | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 (community bonding) | May 4 - May 31 | 86 | 1,444 | 26 | | | | | | | |
| 1 | June 1 - June 7 | 86 | 1,444 | 26 | | | | | | | |
| 2 | June 8 - June 14 | 170 | 1,444 | 26 | | | | | | | The number of stems in the lin transducer comes from previous estimates; stems in the swa transducer were counted manually. |
| 3 | June 15 - June 21 | 303 | 1,444 | 26 | | | | | | | Work was mainly collecting and finding stems. |
| 4 | June 22 - June 28 | 6,667 | 1,716 | 1,436 | | 76.5% | | 94.40% | 107.95% | | Several duplicates in the swa transducer. |
| 5 | June 29 - July 5 | | | | | | | | | | |
| 6 | July 6 - July 12 | | | | | | | | | | |
| 7 | July 13 - July 19 | | | | | | | | | | |
| 8 | July 20 - July 26 | | | | | | | | | | |
| 9 | July 27 - Aug 2 | | | | | | | | | | |
| 10 | Aug 3 - Aug 9 | | | | | | | | | | |
| 11 | Aug 10 - Aug 16 | | | | | | | | | | |
| 12 | Aug 17 - Aug 23 | | | | | | | | | | |
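The table tracks naïve coverage: the share of corpus tokens that the analyser recognises at all, regardless of whether the analysis is correct. A minimal shell sketch of that calculation follows, assuming the corpus has already been run through the analyser so that each token is in Apertium stream format and unknown words carry the usual `*` mark (`^xyz/*xyz$`). The function name `naive_coverage` is hypothetical, and escaped `$` characters inside tokens are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: naive_coverage FILE prints the percentage of tokens in
# FILE (analyser output in Apertium stream format, e.g. ^kitabu/kitabu<n>$)
# that received at least one analysis. Unknown words look like ^xyz/*xyz$.
naive_coverage() {
  local total unknown
  # One output line per ^...$ token
  total=$(grep -oE '\^[^$]*\$' "$1" | wc -l)
  # Tokens containing "/*" are the unknown words
  unknown=$(grep -oE '\^[^$]*\$' "$1" | grep -c '/\*')
  awk -v t="$total" -v u="$unknown" 'BEGIN {
    if (t == 0) print "0.00"
    else printf "%.2f\n", 100 * (t - u) / t
  }'
}
```

Usage might look like `apertium -d . lin-morph < lin.wiki.txt > lin.analysed` followed by `naive_coverage lin.analysed`; the mode name `lin-morph` and the file names are assumptions, not taken from the pair.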
Work
- June 8 - June 14
  - Verb, noun and adjective morphotactics in the swa transducer
- June 15 - June 21
  - Add missing verb TAM (continuative, reciprocal, causative)
  - More subsections in 'Verb Morphotactics'
  - Add stems to the swa transducer
  - Start writing transfer rules
Notes
- To count stems in lexc, try:

      grep -E ":\w+.*;" apertium-lin.lin.lexc | grep -v "[<>]" | wc -l

- To count stems in the bidix, try:

      grep "<p" apertium-eng-lin.eng-lin.dix | wc -l

- To get WER and PER, use apertium-eval-translator-line.
- Coverage above is measured on the 2019-05-20 Wikipedia dump.
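The bidix command in the Notes counts lines containing `<p`, so a `<pardefs>` or `<pardef` line is counted too, and several entries written on one line count only once. A slightly stricter sketch counts `<e>` opening tags (one per match rather than per line) and starts counting only at the first `<section` line, so paradigm entries are skipped. It assumes the usual Apertium bidix layout with `<pardefs>` before the sections; the function name `count_bidix_entries` is hypothetical, and `<e` tags inside XML comments would still be counted.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count_bidix_entries FILE prints the number of <e>
# entries in the section(s) of an Apertium bidix.
count_bidix_entries() {
  # Print from the first <section line to the end, so <pardefs> entries are
  # skipped, then extract each "<e " or "<e>" opening tag (one per match).
  sed -n '/<section/,$p' "$1" | grep -oE '<e[ >]' | wc -l | tr -d ' '
}
```

For the swa-lin pair this would be run as something like `count_bidix_entries apertium-swa-lin.swa-lin.dix`, where the file name assumes the standard Apertium naming convention.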