Difference between revisions of "User:Eden/GSoC progress"

Revision as of 23:28, 14 August 2019

Status table

Week		Stems		naïve coverage		WER,PER		Progress
№	dates	lin	lin-eng	lin	lin-eng	lin→eng	eng→lin	Evaluation	Notes
0	May 20 - May 26	727	139	61.95%	40.86%	86.79%,80.87%	75.27%,63.98%
1	May 27 - June 02	904	139	62.57%	40.86%	86.79%,80.87%	75.27%,63.98%
2	June 03 - June 09	1,154	1,416	63.17%	53.03%	87.02%,79.95%	74.46%,60.22%
3	June 10 - June 16	1,172	1,501		61.60%	91.57%,79.04%	75.85%,62.90%		WER for 'lin-eng' went up because of an incomplete rule for verbs that creates unnecessary pronouns. Main work next week will be on rules to dramatically improve WER and PER.
4	June 17 - June 23	1,200	1,540	69.70%	62.70%	79.27%,64.24%	84.41%,72.58%
5	June 24 - June 30	1,200	1,556	70.21%	61.90%	77.68%,67.88%	85.48%,73.92%
6	July 1 - July 7
7	July 8 - July 14	1,236	1,577	69.35%	60.47%	60.59%,46.47%	72.61%,58.68%		Worked on lexical selection and rules for determiners. The current lexical selection rules work well on the test text in use, which is written in a more rigid, literary Lingala. Further tests will be run on texts from the Wikipedia corpus to generalize the lexical rules.
8	July 15 - July 21	1,280	1,580	72.81%	68.62%	52.62%,42.82%	59.04%,46.28%		WER went down in both directions by approximately 2% after I added accents and the missing letters ɔ́, ɔ, ɛ́ and ɛ. Next focus will be on negation and on finding a bigger corpus (>1000 words).
9	July 22 - July 28	1,320	1,600	73.24%	68.92%	50.02%,41.55%	52.81%,40.09%
10	July 29 - Aug 04								Work was mainly on lexical selection rules. The first half of the Bible translation (~1,100 words) is understandable.
11	Aug 5 - Aug 11	1,341	1,661	75.35%	69.33%	48.97%,39.18%	53.99%,41.49%		Lexical selection rules for 'na' and 'ya'. WER in eng-lin went up because I commented out some words in the bidix.
12	Aug 12 - Aug 18			77.06%					Added missing morphology for determiners and adjectives.

Notes

To count stems in lexc, try:

 grep -E ":\w+.*;" apertium-lin.lin.lexc | grep -v "[<>]" | wc -l

To count stems in the bidix, try this:

 grep "<p" apertium-eng-lin.eng-lin.dix  | wc -l

To get WER and PER, use apertium-eval-translator-line.
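
As a rough illustration of what the WER column measures, the sketch below computes a word-level edit distance between a reference sentence and a translated sentence and divides it by the reference length. This is the conventional definition only: `wer` is a hypothetical helper written for this note, not part of apertium-eval-translator-line, and the tool's own normalization and preprocessing may differ.

```shell
# wer REF HYP: word error rate of HYP against REF (hypothetical helper).
# Assumes the conventional definition: word-level edit distance / words in REF.
wer() {
  awk -v ref="$1" -v hyp="$2" 'BEGIN {
    n = split(ref, r, " ")          # reference words
    m = split(hyp, h, " ")          # hypothesis words
    if (n == 0) { print "empty reference"; exit 1 }
    # d[i, j] = edit distance between the first i reference words
    # and the first j hypothesis words
    for (i = 0; i <= n; i++) d[i, 0] = i
    for (j = 0; j <= m; j++) d[0, j] = j
    for (i = 1; i <= n; i++)
      for (j = 1; j <= m; j++) {
        cost = (r[i] == h[j]) ? 0 : 1
        best = d[i-1, j-1] + cost                          # match or substitution
        if (d[i-1, j] + 1 < best) best = d[i-1, j] + 1     # deletion
        if (d[i, j-1] + 1 < best) best = d[i, j-1] + 1     # insertion
        d[i, j] = best
      }
    printf "%.2f%%\n", 100 * d[n, m] / n
  }'
}
```

For example, `wer "the cat sat on the mat" "the cat sat mat"` counts two deleted words out of six reference words.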

Coverage above is measured on the 2019-05-20 Wikipedia dump.
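
Naïve coverage is usually taken to be the share of tokens that receive at least one analysis. A minimal sketch of that calculation follows, assuming the input on stdin is analyser output in Apertium stream format, where each token looks like `^form/analysis$` and an unknown token is marked as `^form/*form$`. The function name `naive_coverage` and the sample tags are hypothetical, and this is not necessarily how the figures in the table were produced.

```shell
# naive_coverage: read Apertium stream-format analyses on stdin and print
# the percentage of tokens that have at least one analysis.
# Assumption: unknown tokens carry the "/*" mark, e.g. ^xyz/*xyz$
naive_coverage() {
  grep -o '\^[^$]*\$' | awk '
    { total++ }               # one lexical unit per line after grep -o
    /\/\*/ { unknown++ }      # the unit contains the unknown-word mark
    END {
      if (total == 0) { print "no tokens"; exit 1 }
      printf "%.2f%%\n", 100 * (total - unknown) / total
    }'
}
```

The function reads from a pipe, so it can sit at the end of whatever command produces the analyses for the corpus.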

@@ Line 154: / Line 154: @@
 |
 |
-| 76.21%
+| 77.06%
 |
 |
