Difference between revisions of "User:Eden/GSoC progress"
Firespeaker (→Status table: evaluation metrics)
Revision as of 16:50, 6 June 2020
Community Bonding Period
- Find Swahili-Lingala resources
- Update Lingala lexc transducer to lexd
- New lexd transducer for Swahili
- Keep track of coverage for Lin and Swa transducers
- Get familiar with apertium-recursive
- Set up swa-lin pair using apertium-recursive
- Update GSoC progress page
- James and Mary story + Wikipedia article in Swahili and Lingala
Goals
- By first evaluation: get the story about kids, or a similar text, to a WER/PER of around 20% (work with all stages of translation, focus on the "lowest-hanging fruit" relevant to the text)
- By second evaluation: increase [trimmed] coverage to around 90% (work focused on lexicons, adding from frequency lists)
- By final evaluation: get a clean testvoc (work focused on transfer, making sure everything is dealt with one way or another)
Status table
Week | | Stems | | | naïve coverage | | | WER,PER | | Progress | |
---|---|---|---|---|---|---|---|---|---|---|---|
№ | dates | swa | lin | swa-lin | swa | lin | swa-lin | swa→lin | lin→swa | Evaluation | Notes |
0 (community bonding) | May 4 - May 31 | | | | | | | | | | |
1 | June 1 - June 7 | | | | | | | | | | |
2 | June 8 - June 14 | | | | | | | | | | |
3 | June 15 - June 21 | | | | | | | | | | |
4 | June 22 - June 28 | | | | | | | | | | |
5 | June 29 - July 5 | | | | | | | | | | |
6 | July 6 - July 12 | | | | | | | | | | |
7 | July 13 - July 19 | | | | | | | | | | |
8 | July 20 - July 26 | | | | | | | | | | |
9 | July 27 - Aug 2 | | | | | | | | | | |
10 | Aug 3 - Aug 9 | | | | | | | | | | |
11 | Aug 10 - Aug 16 | | | | | | | | | | |
12 | Aug 17 - Aug 23 | | | | | | | | | | |
Notes
- To count stems in lexc, try:
grep -E ":\w+.*;" apertium-lin.lin.lexc | grep -v "[<>]" | wc -l
- To count stems in the bidix, try this:
grep "<p" apertium-eng-lin.eng-lin.dix | wc -l
- To get WER and PER, use apertium-eval-translator-line
- Coverage above is measured on the 2019-05-20 Wikipedia dump.
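- To get naïve coverage, one option is to count how many tokens in the analyser output receive at least one analysis. What follows is only a rough sketch, not necessarily how the numbers in the table were produced. It assumes the input is in Apertium stream format, with one ^surface/analysis$ unit per token and unknown tokens marked with a * (as in ^word/*word$). The function name naive_coverage, the file name corpus.txt and the mode name swa-morph are assumptions for illustration, and escaped \^ or \$ characters inside surface forms are not handled.

```shell
# naive_coverage: read Apertium stream-format analyses on stdin and print
# the percentage of tokens that received at least one analysis.
# Assumption: an unknown token looks like ^word/*word$.
naive_coverage() {
    # Put each ^...$ lexical unit on its own line; "|| true" keeps the
    # pipeline alive when the input contains no units at all.
    { grep -o '\^[^$]*\$' || true; } |
    awk '
        { total++ }              # every lexical unit is one token
        /\/\*/ { unknown++ }     # "/*" marks an unanalysed token
        END {
            if (total == 0) { print "0.00"; exit }
            printf "%.2f\n", 100 * (total - unknown) / total
        }'
}

# Hypothetical usage, run from the language directory
# (corpus.txt and the swa-morph mode name are assumptions):
#   apertium -d . swa-morph < corpus.txt | naive_coverage
```

This counts tokens, not types, so frequent unknown words lower the figure more than rare ones.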