Difference between revisions of "User:Eden/GSoC progress"


Revision as of 06:28, 19 August 2019

Status table

{|
|-
! colspan="2" | Week
! colspan="2" | Stems
! colspan="2" | Naïve coverage
! colspan="2" | WER, PER
! colspan="2" | Progress
|-
! !! dates !! lin !! lin-eng !! lin !! lin-eng !! lin→eng !! eng→lin !! Evaluation !! Notes
|-
| 0 || May 20 - May 26 || 727 || 139 || 61.95% || 40.86% || 86.79%, 80.87% || 75.27%, 63.98% || ||
|-
| 1 || May 27 - June 02 || 904 || 139 || 62.57% || 40.86% || 86.79%, 80.87% || 75.27%, 63.98% || ||
|-
| 2 || June 03 - June 09 || 1,154 || 1,416 || 63.17% || 53.03% || 87.02%, 79.95% || 74.46%, 60.22% || ||
|-
| 3 || June 10 - June 16 || 1,172 || 1,501 || colspan="2" | 61.60% || 91.57%, 79.04% || 75.85%, 62.90% || || WER for 'lin-eng' went up because of an incomplete verb rule that creates unnecessary pronouns. Main work next week will be on rules to dramatically improve WER and PER.
|-
| 4 || June 17 - June 23 || 1,200 || 1,540 || 69.70% || 62.70% || 79.27%, 64.24% || 84.41%, 72.58% || ||
|-
| 5 || June 24 - June 30 || 1,200 || 1,556 || 70.21% || 61.90% || 77.68%, 67.88% || 85.48%, 73.92% || ||
|-
| 6 || July 1 - July 7 || || || || || || || ||
|-
| 7 || July 8 - July 14 || 1,236 || 1,577 || 69.35% || 60.47% || 60.59%, 46.47% || 72.61%, 58.68% || || Work was done on lexical selection and on rules for determiners. The current lexical selection works well with the text currently in use, which is written in a more rigid, literary Lingala. Further tests will be run on texts from the Wikipedia corpus to generalize the lexical rules.
|-
| 8 || July 15 - July 21 || 1,280 || 1,580 || 72.81% || 68.62% || 52.62%, 42.82% || 59.04%, 46.28% || || WER went down in both directions by approximately 2% after I added accents and the missing letters ɔ́, ɔ, ɛ́ and ɛ. Next focus will be on negation and on finding a bigger corpus (>1,000 words).
|-
| 9 || July 22 - July 28 || 1,320 || 1,600 || 73.24% || 68.92% || 50.02%, 41.55% || 52.81%, 40.09% || ||
|-
| 10 || July 29 - Aug 04 || || || || || || || || Work was mainly on lexical selection rules. The first half of the Bible translation (~1,100 words) is understandable.
|-
| 11 || Aug 5 - Aug 11 || 1,341 || 1,661 || 75.35% || 69.33% || 48.97%, 39.18% || 53.99%, 41.49% || || Lexical selection rules for 'na' and 'ya'. WER in eng-lin went up because I commented out some words in the bidix.
|-
| 12 || Aug 12 - Aug 18 || 1,444 || 1,700 || 76.5% || 71.10% || 48.52%, 37.81% || 50.13%, 38.13% || || Added missing morphology for determiners and adjectives.
|}
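
The naïve coverage columns are usually understood as the share of corpus tokens that receive at least one analysis from the morphological analyser. A minimal sketch of that calculation follows, assuming the analyser output is already in Apertium stream format (`^surface/analysis$`, with unknown words marked as `^surface/*surface$`). The function name `naive_coverage` is hypothetical, and this is not necessarily the exact procedure used for the figures above.

```shell
# naive_coverage (hypothetical helper): reads Apertium stream-format
# analyser output on stdin and prints the percentage of tokens that
# received at least one analysis.
# Assumption: tokens contain no escaped '$' characters.
naive_coverage() {
  grep -o '\^[^$]*\$' |          # one ^...$ lexical unit per line
    awk '
      { total++ }                # every lexical unit counts as a token
      /\/\*/ { unknown++ }       # "/*" marks a word with no analysis
      END {
        if (total == 0) { print "no tokens"; exit 1 }
        printf "%.2f%%\n", 100 * (total - unknown) / total
      }'
}
```

For example, piping in four tokens of which one is unknown (the analyses here are made up for illustration):

```shell
printf '%s\n' '^mwana/mwana<n>$ ^na/na<pr>$ ^qwerty/*qwerty$ ^ndako/ndako<n>$' | naive_coverage
```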

Notes

  • To count stems in lexc, try:
 grep -E ":\w+.*;" apertium-lin.lin.lexc | grep -v "[<>]" | wc -l
  • To count stems in the bidix, try:
 grep "<p" apertium-eng-lin.eng-lin.dix | wc -l
  • To get WER and PER, use apertium-eval-translator-line.
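
For intuition about what the WER figure measures, here is a minimal sketch of the standard definition: the word-level edit distance between the machine translation and a reference translation, divided by the number of reference words. The function name `wer` is hypothetical, and apertium-eval-translator-line may tokenise or normalise text differently, so its numbers need not match this sketch exactly. PER, which ignores word order, is not covered here.

```shell
# wer REF HYP (hypothetical helper): prints the word error rate of HYP
# against REF as a percentage, using word-level Levenshtein distance.
wer() {
  awk -v ref="$1" -v hyp="$2" 'BEGIN {
    n = split(ref, r, " ")       # reference words
    m = split(hyp, h, " ")       # hypothesis words
    if (n == 0) { print "empty reference"; exit 1 }
    for (i = 0; i <= n; i++) d[i, 0] = i   # delete all reference words
    for (j = 0; j <= m; j++) d[0, j] = j   # insert all hypothesis words
    for (i = 1; i <= n; i++)
      for (j = 1; j <= m; j++) {
        cost = (r[i] == h[j]) ? 0 : 1
        best = d[i-1, j-1] + cost                          # match or substitution
        if (d[i-1, j] + 1 < best) best = d[i-1, j] + 1     # deletion
        if (d[i, j-1] + 1 < best) best = d[i, j-1] + 1     # insertion
        d[i, j] = best
      }
    printf "%.2f%%\n", 100 * d[n, m] / n
  }'
}
```

Usage, with one substituted and one missing word against a six-word reference:

```shell
wer "the cat sat on the mat" "the cat sit on mat"
```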