Difference between revisions of "User:Eden/GSoC progress"

Revision as of 11:25, 22 July 2019

Week		Stems		naïve coverage		WER,PER		Progress
№	dates	lin	lin-eng	lin	lin-eng	lin→eng	eng→lin	Evaluation	Notes
0	May 20 - May 26	727	139	61.95%	40.86%	86.79%,80.87%	75.27%,63.98%
1	May 27 - June 02	904	139	62.57%	40.86%	86.79%,80.87%	75.27%,63.98%
2	June 03 - June 09	1,154	1,416	63.17%	53.03%	87.02%,79.95%	74.46%,60.22%
3	June 10 - June 16	1,172	1,501		61.60%	91.57%,79.04%	75.85%,62.90%		WER for 'lin-eng' went up because of an incomplete rule for verbs that creates unnecessary pronouns. Main work next week will be on rules to dramatically improve WER and PER.
4	June 17 - June 23	1,200	1,540	69.70%	62.70%	79.27%,64.24%	84.41%,72.58%
5	June 24 - June 30	1,200	1,556	70.21%	61.90%	77.68%,67.88%	85.48%,73.92%
6	July 1 - July 7
7	July 8 - July 14	1,236	1,577	69.35%	60.47%	60.59%,46.47%	72.61%,58.68%		Work was done on lexical selection and on rules for determiners. Lexical selection works well on the text currently in use, which is written in a more rigid, literary Lingala. Further tests will be run on texts from the Wikipedia corpus to generalize the lexical rules.
8	July 15 - July 21					52.62%,42.82%	59.04%,46.28%		WER went down in both directions by approximately 2% after I added accents and the missing characters ɔ́, ɔ, ɛ́ and ɛ. The next focus will be on negation and on finding a bigger corpus (>1000 words).

 grep -E ":\w+.*;" apertium-lin.lin.lexc | grep -v "[<>]" | wc -l

 grep "<p" apertium-eng-lin.eng-lin.dix  | wc -l
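
The WER,PER columns in the table report word error rate and position-independent error rate. As a rough illustration of what these two numbers measure, here is a minimal Python sketch. It assumes whitespace tokenization and the common textbook definitions (word-level edit distance divided by reference length for WER, bag-of-words mismatch divided by reference length for PER). It is not necessarily the exact procedure used to produce the figures above, and the function names are hypothetical.

```python
from collections import Counter


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)


def per(reference, hypothesis):
    """Position-independent error rate: like WER, but ignoring word order.

    Assumed definition: (max(|ref|, |hyp|) - shared words) / |ref|.
    """
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    shared = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - shared) / len(ref)
```

For the reference "the child sees the house" and the hypothesis "the child the house sees", this sketch gives a WER of 0.4 (one deletion plus one insertion over five reference words) and a PER of 0.0, since the two sentences contain the same words in a different order. That gap is why PER is always less than or equal to WER in the table.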

@@ Line 113: / Line 113: @@
 |
 |
-|59.68%,47.84%
+|52.62%,42.82%
-|57.98%,45.21%
+|59.04%,46.28%
 |
 | WER went down in both directions by approximately 2% after I added accents and the missing characters ɔ́, ɔ, ɛ́ and ɛ. The next focus will be on negation and on finding a bigger corpus (>1000 words).