Difference between revisions of "User:Eden/GSoC progress"

From Apertium
Line 1: Line 1:
  +
== Community Bonding Period ==
  +
* Find Swahili-Lingala resources
  +
* Update Lingala lexc transducer to lexd
  +
* New lexd transducer for Swahili
  +
* Keep track of coverage for Lin and Swa transducers
  +
* Get familiar with apertium-recursive
  +
* Set up <code>swa-lin</code> pair using apertium-recursive
  +
* Update GSoC progress page
  +
 
== Status table ==
 
== Status table ==
   
Line 19: Line 28:
 
!Evaluation
 
!Evaluation
 
!Notes
 
!Notes
|-
 
| 0
 
| May 20 - May 26
 
| 727
 
| 139
 
| 61.95%
 
| 40.86%
 
| 86.79%,80.87%
 
| 75.27%,63.98%
 
|
 
|
 
 
|-
 
|-
 
| 1
 
| 1
| May 27 - June 02
+
| June 1 - June 7
| 904
 
| 139
 
| 62.57%
 
| 40.86%
 
| 86.79%,80.87%
 
| 75.27%,63.98%
 
|
 
|
 
 
|-
 
|-
 
| 2
 
| 2
| June 03 - June 09
+
| June 8 - June 14
| 1,154
 
| 1,416
 
| 63.17%
 
| 53.03%
 
| 87.02%,79.95%
 
| 74.46%,60.22%
 
|
 
 
|-
 
|-
 
| 3
 
| 3
| June 10 - June 16
+
| June 15 - June 21
| 1,172
 
| 1,501
 
|
 
| 61.60%
 
| 91.57%,79.04%
 
| 75.85%,62.90%
 
|
 
| WER for 'lin-eng' went up because of an incomplete rule for verbs that creates unnecessary pronouns. Main work next week will be on rules to dramatically improve WER and PER.
 
 
|-
 
|-
 
| 4
 
| 4
| June 17 - June 23
+
| June 22 - June 28
| 1,200
 
| 1,540
 
| 69.70%
 
| 62.70%
 
| 79.27%,64.24%
 
| 84.41%,72.58%
 
|
 
|
 
 
|-
 
|-
 
| 5
 
| 5
| June 24 - June 30
+
| June 29 - July 5
| 1,200
 
| 1,556
 
| 70.21%
 
| 61.90%
 
| 77.68%,67.88%
 
| 85.48%,73.92%
 
|
 
|
 
 
|-
 
|-
 
| 6
 
| 6
| July 1 - July 7
+
| July 6 - July 12
|
 
|
 
|
 
|
 
|
 
|
 
|
 
|
 
 
|-
 
|-
 
| 7
 
| 7
| July 8 - July 14
+
| July 13 - July 19
|1,236
 
|1,577
 
|69.35%
 
|60.47%
 
|60.59%,46.47%
 
|72.61%,58.68%
 
|
 
|Work was done on lexical selection and on rules for determiners. The current lexical selection works well on the text currently in use, which is written in a more rigid, literary Lingala. Further tests will be run on texts from the Wikipedia corpus to generalize the lexical rules.
 
 
|-
 
|-
 
| 8
 
| 8
| July 15 - July 21
+
| July 20 - July 26
|1,280
 
|1,580
 
|72.81%
 
|68.62%
 
|52.62%,42.82%
 
|59.04%,46.28%
 
|
 
| WER went down in both directions by approximately 2% after I added accents and the missing characters ɔ́, ɔ, ɛ́ and ɛ. The next focus will be on negation and on finding a bigger corpus (>1000 words).
 
 
|-
 
|-
 
| 9
 
| 9
| July 22 - July 28
+
| July 27 - Aug 2
| 1,320
 
| 1,600
 
| 73.24%
 
| 68.92%
 
| 50.02%,41.55%
 
| 52.81%,40.09%
 
|
 
 
|-
 
|-
 
| 10
 
| 10
| July 29 - Aug 04
+
| Aug 3 - Aug 9
|
 
|
 
|
 
|
 
|
 
|
 
|
 
| Work was mainly on lexical selection rules. The first half of the Bible translation (~1,100 words) is understandable.
 
 
|-
 
|-
 
| 11
 
| 11
| Aug 5 - Aug 11
+
| Aug 10 - Aug 16
| 1,341
 
| 1,661
 
| 75.35%
 
| 69.33%
 
| 48.97%,39.18%
 
| 53.99%,41.49%
 
|
 
| Lexical selection rules for 'na' and 'ya'. WER in eng-lin went up because I commented out some words in the bidix.
 
 
|-
 
|-
 
| 12
 
| 12
| Aug 12 - Aug 18
+
| Aug 17 - Aug 23
| 1,444
 
| 1,700
 
| 76.5%
 
| 71.10%
 
| 48.52%,37.81%
 
| 50.13%,38.13%
 
|
 
| Added missing morphology for determiners and adjectives.
 
 
|-
 
|-
 
|}
 
|}

Revision as of 04:45, 19 May 2020

Community Bonding Period

  • Find Swahili-Lingala resources
  • Update Lingala lexc transducer to lexd
  • New lexd transducer for Swahili
  • Keep track of coverage for Lin and Swa transducers
  • Get familiar with apertium-recursive
  • Set up swa-lin pair using apertium-recursive
  • Update GSoC progress page
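
Keeping track of coverage, as listed above, comes down to counting how many tokens the analyser recognises. The following is a minimal sketch, assuming analyser output in the Apertium stream format, where each token appears as ^surface/analysis$ and an unknown token is analysed as *surface. The function name naive_coverage and the sample output (including its tags) are hypothetical and made up for illustration; a real run would pipe corpus text through the compiled transducer instead of reading a hand-written file. It also assumes a grep that supports -o (GNU or BSD).

```shell
#!/bin/sh
# Sketch: naive coverage = share of tokens that receive at least one analysis.
# The function name and the sample data are hypothetical.

naive_coverage() {
    # $1: file with analyser output in Apertium stream format.
    # grep -o prints each lexical unit (^...$) on its own line.
    total=$(grep -o '\^[^$]*\$' "$1" | wc -l | tr -d ' ')
    # Unknown tokens carry an analysis that starts with "*".
    unknown=$(grep -o '\^[^$]*\$' "$1" | grep -c '/\*' || true)
    awk -v t="$total" -v u="$unknown" 'BEGIN {
        if (t == 0) { print "0.00"; exit }
        printf "%.2f\n", (t - u) * 100 / t
    }'
}

# Made-up analyser output: four tokens, one of them unknown.
tmpdir=$(mktemp -d)
cat > "$tmpdir/analysed.txt" <<'EOF'
^Mwana/mwana<n>$ ^azali/zal<v>$ ^na/na<pr>$ ^ndako/*ndako$
EOF

coverage=$(naive_coverage "$tmpdir/analysed.txt")
echo "naive coverage: $coverage%"
rm -rf "$tmpdir"
```

The sketch ignores escaped ^ and $ characters inside tokens, which is acceptable for a rough weekly figure but would need handling for exact counts.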

Status table

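{|
! colspan="2" | Week
! colspan="2" | Stems
! colspan="2" | Naïve coverage
! colspan="2" | WER, PER
! colspan="2" | Progress
|-
!
! Dates
! lin
! lin-eng
! lin
! lin-eng
! lin→eng
! eng→lin
! Evaluation
! Notes
|-
| 1 || June 1 - June 7 || || || || || || || ||
|-
| 2 || June 8 - June 14 || || || || || || || ||
|-
| 3 || June 15 - June 21 || || || || || || || ||
|-
| 4 || June 22 - June 28 || || || || || || || ||
|-
| 5 || June 29 - July 5 || || || || || || || ||
|-
| 6 || July 6 - July 12 || || || || || || || ||
|-
| 7 || July 13 - July 19 || || || || || || || ||
|-
| 8 || July 20 - July 26 || || || || || || || ||
|-
| 9 || July 27 - Aug 2 || || || || || || || ||
|-
| 10 || Aug 3 - Aug 9 || || || || || || || ||
|-
| 11 || Aug 10 - Aug 16 || || || || || || || ||
|-
| 12 || Aug 17 - Aug 23 || || || || || || || ||
|}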

Notes

  • To count stems in the lexc file, try:
 grep -E ":\w+.*;" apertium-lin.lin.lexc | grep -v "[<>]" | wc -l
  • To count stems in the bidix, try:
 grep "<p" apertium-eng-lin.eng-lin.dix | wc -l
  • To get WER and PER, use apertium-eval-translator-line.
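
The two counting commands above can be wrapped in small shell functions so that the weekly numbers are always produced the same way. This is a minimal sketch rather than part of the original workflow: the function names count_lexc_stems and count_bidix_entries and the sample files are hypothetical, while the grep patterns are the ones from the notes. It assumes GNU grep, because \w is a GNU extension.

```shell
#!/bin/sh
# Sketch: wrap the stem-counting commands from the notes in functions.
# Function names and sample data are hypothetical.

# Count lexc entries of the form "upper:lower Continuation ;",
# skipping lines that contain tag symbols (< or >).
count_lexc_stems() {
    grep -E ":\w+.*;" "$1" | grep -v "[<>]" | wc -l | tr -d ' '
}

# Count bidix entries by counting the lines that contain "<p".
count_bidix_entries() {
    grep "<p" "$1" | wc -l | tr -d ' '
}

# Tiny made-up sample files so the sketch runs without the real repositories.
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.lexc" <<'EOF'
LEXICON Nouns
mwana:mwana N-1 ;
moto:moto N-1 ;
%<n%>:0 # ;
EOF
cat > "$tmpdir/sample.dix" <<'EOF'
<section id="main" type="standard">
<e><p><l>child<s n="n"/></l><r>mwana<s n="n"/></r></p></e>
<e><p><l>fire<s n="n"/></l><r>moto<s n="n"/></r></p></e>
</section>
EOF

lexc_stems=$(count_lexc_stems "$tmpdir/sample.lexc")
bidix_entries=$(count_bidix_entries "$tmpdir/sample.dix")
echo "lexc stems: $lexc_stems"
echo "bidix entries: $bidix_entries"
rm -rf "$tmpdir"
```

Both counts are line-based, so an entry that spans several lines, or several entries on one line, would be miscounted. The figures are best read as approximations.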