Difference between revisions of "User:Eden/GSoC progress"

Revision as of 05:49, 18 July 2019

Week		Stems		naïve coverage		WER,PER		Progress
№	dates	lin	lin-eng	lin	lin-eng	lin→eng	eng→lin	Evaluation	Notes
0	May 20 - May 26	727	139	61.95%	40.86%	86.79%,80.87%	75.27%,63.98%
1	May 27 - June 02	904	139	62.57%	40.86%	86.79%,80.87%	75.27%,63.98%
2	June 03 - June 09	1,154	1,416	63.17%	53.03%	87.02%,79.95%	74.46%,60.22%
3	June 10 - June 16	1,172	1,501		61.60%	91.57%,79.04%	75.85%,62.90%		WER for 'lin-eng' went up because of an incomplete rule for verbs that creates unnecessary pronouns. Main work next week will be on rules to dramatically improve WER and PER.
4	June 17 - June 23	1,200	1,540	69.70%	62.70%	79.27%,64.24%	84.41%,72.58%
5	June 24 - June 30	1,200	1,556	70.21%	61.90%	77.68%,67.88%	85.48%,73.92%
6	July 1 - July 7
7	July 8 - July 14	1,236	1,577	69.35%	60.47%	60.59%,46.47%	72.61%,58.68%		Work was done on lexical selection and on rules for determiners. The current lexical selection works well with the text currently in use, which is written in a more rigid, literary Lingala. Further tests will be run on texts from the Wikipedia corpus to generalize the lexical rules.
8	July 15 - July 21					59.68%,47.84%	57.98%,45.21%		WER went down in both directions by approximately 2% after I added accents and the missing characters ɔ́, ɔ, ɛ́ and ɛ. The next focus will be on negation and on finding a bigger corpus (>1,000 words).

 grep -E ":\w+.*;" apertium-lin.lin.lexc | grep -v "[<>]" | wc -l

 grep "<p" apertium-eng-lin.eng-lin.dix  | wc -l
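
The two commands above look like the counts behind the "Stems" columns (lin and lin-eng respectively); that mapping is an assumption, since the page does not state it. As a minimal sketch, they can be wrapped in a small script so that both numbers are produced in one step each week. The function names and the use of file paths as arguments are hypothetical and not part of the Apertium tooling.

```shell
#!/bin/sh
# Hypothetical helpers wrapping the two grep commands from this page.
# The function names and argument handling are assumptions for illustration.

# Count lines in a lexc file that look like "lemma:stem ... ;" and
# contain no tag symbols ("<" or ">").
count_lexc_stems() {
    grep -E ":\w+.*;" "$1" | grep -v "[<>]" | wc -l | tr -d ' '
}

# Count lines in a bilingual .dix file that contain "<p".
count_dix_entries() {
    grep "<p" "$1" | wc -l | tr -d ' '
}

# Print both counts only when the script is given two file arguments.
if [ "$#" -eq 2 ]; then
    echo "lin stems:     $(count_lexc_stems "$1")"
    echo "lin-eng stems: $(count_dix_entries "$2")"
fi
```

Saved under a name such as `count_stems.sh` (hypothetical), it would be run as `sh count_stems.sh apertium-lin.lin.lexc apertium-eng-lin.eng-lin.dix`. Both counts are line-based, so an entry split across several lines, or several entries on one line, would be miscounted.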

@@ Line 35: / Line 35: @@
 | 904
 | 139
-|
+| 62.57%
 | 40.86%
 | 86.79%,80.87%
@@ Line 50: / Line 50: @@
 | 87.02%,79.95%
 | 74.46%,60.22%
+|
 |-
 | 3
 | June 10 - June 16
-| 1,169
+| 1,172
-| 1,489
+| 1,501
 |
-| 54.89%
+| 61.60%
+| 91.57%,79.04%
+| 75.85%,62.90%
+|
+| WER for 'lin-eng' went up because of an incomplete rule for verbs that creates unnecessary pronouns. Main work next week will be on rules to dramatically improve WER and PER.
+|-
+| 4
+| June 17 - June 23
+| 1,200
+| 1,540
+| 69.70%
+| 62.70%
+| 79.27%,64.24%
+| 84.41%,72.58%
+|
 |
+|-
+| 5
+| June 24 - June 30
+| 1,200
+| 1,556
+| 70.21%
+| 61.90%
+| 77.68%,67.88%
+| 85.48%,73.92%
+|
 |
+|-
+| 6
+| July 1 - July 7
+|
+|
+|
+|
+|
+|
 |
 |
 |-
+| 7
+| July 8 - July 14
+|1,236
+|1,577
+|69.35%
+|60.47%
+|60.59%,46.47%
+|72.61%,58.68%
+|
+|Work was done on lexical selection and on rules for determiners. The current lexical selection works well with the text currently in use, which is written in a more rigid, literary Lingala. Further tests will be run on texts from the Wikipedia corpus to generalize the lexical rules.
+|-
+| 8
+| July 15 - July 21
+|
+|
+|
+|
+|59.68%,47.84%
+|57.98%,45.21%
+|
+| WER went down in both directions by approximately 2% after I added accents and the missing characters ɔ́, ɔ, ɛ́ and ɛ. The next focus will be on negation and on finding a bigger corpus (>1,000 words).
 |}