Difference between revisions of "User:Eden/GSoC progress"
m (→Status table) |
|||
Line 1: | Line 1: | ||
== Community Bonding Period == |
|||
* Find Swahili-Lingala resources |
|||
* Update Lingala lexc transducer to lexd |
|||
* New lexd transducer for Swahili |
|||
* Keep track of coverage for Lin and Swa transducers |
|||
* Get familiar with apertium-recursive |
|||
* Set up <code>swa-lin</code> pair using apertium-recursive |
|||
* Update GSOC progress page |
|||
== Status table == |
== Status table == |
||
Line 19: | Line 28: | ||
!Evaluation |
!Evaluation |
||
!Notes |
!Notes |
||
|- |
|||
| 0 |
|||
| May 20 - May 26 |
|||
| 727 |
|||
| 139 |
|||
| 61.95% |
|||
| 40.86% |
|||
| 86.79%,80.87% |
|||
| 75.27%,63.98% |
|||
| |
|||
| |
|||
|- |
|- |
||
| 1 |
| 1 |
||
| |
| June 1 - June 7 |
||
| 904 |
|||
| 139 |
|||
| 62.57% |
|||
| 40.86% |
|||
| 86.79%,80.87% |
|||
| 75.27%,63.98% |
|||
| |
|||
| |
|||
|- |
|- |
||
| 2 |
| 2 |
||
| May |
| June 8 - June 14
||
| 1,154 |
|||
| 1,416 |
|||
| 63.17% |
|||
| 53.03% |
|||
| 87.02%,79.95% |
|||
| 74.46%,60.22% |
|||
| |
|||
|- |
|- |
||
| 3 |
| 3 |
||
| June |
| June 15 - June 21 |
||
| 1,172 |
|||
| 1,501 |
|||
| |
|||
| 61.60% |
|||
| 91.57%,79.04% |
|||
| 75.85%,62.90% |
|||
| |
|||
| WER for 'lin-eng' went up because of an incomplete rule for verbs that creates unnecessary pronouns. Main work next week will be on rules to dramatically improve WER and PER. |
|||
|- |
|- |
||
| 4 |
| 4 |
||
| June |
| June 22 - June 28 |
||
| 1,200 |
|||
| 1,540 |
|||
| 69.70% |
|||
| 62.70% |
|||
| 79.27%,64.24% |
|||
| 84.41%,72.58% |
|||
| |
|||
| |
|||
|- |
|- |
||
| 5 |
| 5 |
||
| June |
| June 29 - July 5 |
||
| 1,200 |
|||
| 1,556 |
|||
| 70.21% |
|||
| 61.90% |
|||
| 77.68%,67.88% |
|||
| 85.48%,73.92% |
|||
| |
|||
| |
|||
|- |
|- |
||
| 6 |
| 6 |
||
| July |
| July 6 - July 12 |
||
| |
|||
| |
|||
| |
|||
| |
|||
| |
|||
| |
|||
| |
|||
| |
|||
|- |
|- |
||
| 7 |
| 7 |
||
| July |
| July 13 - July 19 |
||
|1,236 |
|||
|1,577 |
|||
|69.35% |
|||
|60.47% |
|||
|60.59%,46.47% |
|||
|72.61%,58.68% |
|||
| |
|||
|Work was done on lexical selection and on rules for determiners. The current lexical selection works well with the text currently in use, which is a more rigid and literary Lingala. Further tests will be run on texts from the Wikipedia corpus to generalize the lexical rules. |
|||
|- |
|- |
||
| 8 |
| 8 |
||
| July |
| July 20 - July 26 |
||
|1,280 |
|||
|1,580 |
|||
|72.81% |
|||
|68.62% |
|||
|52.62%,42.82% |
|||
|59.04%,46.28% |
|||
| |
|||
| WER went down in both directions by approximately 2% after I added accents and the missing characters ɔ́, ɔ, ɛ́, ɛ. The next focus will be on negation and on finding a bigger corpus (>1,000 words). |
|||
|- |
|- |
||
| 9 |
| 9 |
||
| July |
| July 27 - Aug 2 |
||
| 1,320 |
|||
| 1,600 |
|||
| 73.24% |
|||
| 68.92% |
|||
| 50.02%,41.55% |
|||
| 52.81%,40.09% |
|||
| |
|||
|- |
|- |
||
| 10 |
| 10 |
||
| July |
| Aug 3 - Aug 9
||
| |
|||
| |
|||
| |
|||
| |
|||
| |
|||
| |
|||
| |
|||
| Work was mainly on lexical selection rules. The first half of the Bible translation (~1,100 words) is understandable. |
|||
|- |
|- |
||
| 11 |
| 11 |
||
| Aug |
| Aug 10 - Aug 16 |
||
| 1,341 |
|||
| 1,661 |
|||
| 75.35% |
|||
| 69.33% |
|||
| 48.97%,39.18% |
|||
| 53.99%,41.49% |
|||
| |
|||
| Lexical selection rules for 'na' and 'ya'. WER in eng-lin went up because I commented out some words in the bidix. |
|||
|- |
|- |
||
| 12 |
| 12 |
||
| Aug |
| Aug 17 - Aug 23 |
||
| 1,444 |
|||
| 1,700 |
|||
| 76.5% |
|||
| 71.10% |
|||
| 48.52%,37.81% |
|||
| 50.13%,38.13% |
|||
| |
|||
| Added missing morphology for determiners and adjectives. |
|||
|- |
|- |
||
|} |
|} |
Revision as of 04:45, 19 May 2020
Community Bonding Period
- Find Swahili-Lingala resources
- Update Lingala lexc transducer to lexd
- New lexd transducer for Swahili
- Keep track of coverage for Lin and Swa transducers
- Get familiar with apertium-recursive
- Set up swa-lin pair using apertium-recursive
- Update GSOC progress page
Status table
Week № | Week dates | Stems (lin) | Stems (lin-eng) | Naïve coverage (lin) | Naïve coverage (lin-eng) | WER, PER (lin→eng) | WER, PER (eng→lin) | Progress: Evaluation | Progress: Notes |
---|---|---|---|---|---|---|---|---|---|
1 | June 1 - June 7 | | | | | | | | |
2 | June 8 - June 14 | | | | | | | | |
3 | June 15 - June 21 | | | | | | | | |
4 | June 22 - June 28 | | | | | | | | |
5 | June 29 - July 5 | | | | | | | | |
6 | July 6 - July 12 | | | | | | | | |
7 | July 13 - July 19 | | | | | | | | |
8 | July 20 - July 26 | | | | | | | | |
9 | July 27 - Aug 2 | | | | | | | | |
10 | Aug 3 - Aug 9 | | | | | | | | |
11 | Aug 10 - Aug 16 | | | | | | | | |
12 | Aug 17 - Aug 23 | | | | | | | | |
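Naïve coverage, as tracked in the table, is commonly taken to be the share of tokens that receive at least one analysis. A minimal sketch of that calculation, assuming the analyser output has already been saved to a file in Apertium stream format, where an unknown token looks like ^word/*word$. The function name naive_coverage is hypothetical, escaped ^ and $ characters inside tokens are not handled, and this is not necessarily how the figures on this page were produced:

```shell
# naive_coverage FILE
# FILE: analyser output in Apertium stream format (assumed), e.g.
#   ^mwana/mwana<n><sg>$ ^xyz/*xyz$
# Prints the percentage of tokens with at least one analysis.
naive_coverage() {
    # Every ^...$ unit is one token.
    total=$(grep -o '\^[^$]*\$' "$1" | wc -l | tr -d ' ')
    # Unknown tokens carry an asterisk right after the slash.
    unknown=$(grep -o '\^[^$]*\$' "$1" | grep '/\*' | wc -l | tr -d ' ')
    awk -v t="$total" -v u="$unknown" 'BEGIN {
        if (t == 0) { print "0.00"; exit }
        printf "%.2f\n", 100 * (t - u) / t
    }'
}
```

Called as naive_coverage analysed.txt (file name hypothetical), it prints 75.00 for a file in which three out of four tokens are analysed.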
Notes
- To count stems in lexc, try:
  grep -E ":\w+.*;" apertium-lin.lin.lexc | grep -v "[<>]" | wc -l
- To count stems in the bidix, try:
  grep "<p" apertium-eng-lin.eng-lin.dix | wc -l
- To get WER and PER, use apertium-eval-translator-line.
- Coverage above is measured on the 2019-05-20 Wikipedia dump.
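The two counting commands above can be wrapped in small functions so that the same counts are taken every week. This is only a sketch: the function names count_lexc_stems and count_bidix_stems are hypothetical, and the patterns are the ones from the notes, so the bidix count also includes any other line containing "<p" (for example paradigm definitions).

```shell
# count_lexc_stems FILE
# Counts lexc lines of the form "...:stem ... ;" that contain no < or >,
# i.e. the same filter as the one-liner in the notes.
count_lexc_stems() {
    grep -E ":\w+.*;" "$1" | grep -v "[<>]" | wc -l | tr -d ' '
}

# count_bidix_stems FILE
# Counts bidix lines containing "<p", as in the notes.
count_bidix_stems() {
    grep "<p" "$1" | wc -l | tr -d ' '
}
```

For example, count_lexc_stems apertium-lin.lin.lexc and count_bidix_stems apertium-eng-lin.eng-lin.dix give the two Stems columns of the status table.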