User:Aidana/Proposal/Working plan


Corpora

Downloads

Expanding vocabulary

Coverage targets

| Date | Target: 5925 corpus | Target: GCI corpus | Target: Wiki 12500 words | Target: Akorda | Achieved: 5925 corpus | Achieved: GCI corpus | Achieved: Wiki 12500 words | Achieved: Akorda | Target achieved | Stems | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 23-04-2016 | 85.70% | 93.32% | 85.74% | 83.21% | 86.85% | 93.85% | 85.74% | 83.21% | yes | 21613 | Initial value |
| 30-04-2016 | 86.00% | 93.80% | 86.80% | 83.50% | 88.22% | 94.56% | 88.14% | 86.48% | yes | 21923 | |
| 07-05-2016 | 86.50% | 94.30% | 87.50% | 84.00% | 89.36% | 94.86% | 89.30% | 88.12% | yes | 22242 | |
| 14-05-2016 | 87.00% | 95.00% | 87.70% | 84.50% | 89.84% | 95.08% | 90.71% | 90.54% | yes | 22708 | |
| 21-05-2016 | 87.50% | 96.00% | 88.00% | 85.00% | 89.87% | 96.65% | 90.79% | 90.82% | yes | 23232 | Official GSOC start date |
| 23-05-2016 | 87.70% | 96.50% | 88.00% | 85.00% | 89.872% | 96.65% | 90.794% | 90.836% | yes | 23238 | |
| 01-06-2016 | 88.00% | 97.00% | 88.30% | 85.50% | 90.11% | 97.24% | 90.83% | 90.85% | yes | 23308 | |
| 10-06-2016 | 88.30% | 97.50% | 88.50% | 85.70% | 90.26% | 98.95% | 91.00% | 92.77% | yes | 23497 | |
| 16-06-2016 | 88.50% | 98.00% | 88.70% | 86.00% | 90.37% | 98.95% | 91.12% | 92.94% | yes | 23593 | |
| 27-06-2016 | 89.00% | 98.50% | 89.00% | 86.50% | | | | | | | Midterm evaluation |
| 02-07-2016 | 89.30% | 99.00% | 89.30% | 86.80% | 90.46% | 98.95% | 91.12% | 92.98% | yes | 23633 | |
| 09-07-2016 | 89.70% | 99.40% | 89.70% | 87.00% | | | | | | | |
| 16-07-2016 | 90.00% | 99.40% | 90.00% | 87.30% | | | | | | | |
| 23-07-2016 | 90.50% | 99.50% | 90.30% | 87.70% | | | | | | | |
| 30-07-2016 | 90.70% | 99.60% | 90.70% | 88.00% | | | | | | | |
| 06-08-2016 | 91.00% | 99.70% | 91.00% | 88.50% | | | | | | | |
| 13-08-2016 | 91.50% | 99.80% | 91.50% | 89.00% | | | | | | | |
| 23-08-2016 | 92.00% | 99.90% | 92.00% | 90.00% | | | | | | | Final target |


Midterm evaluation

WER% Before:

Statistics about input files


Number of words in reference: 800
Number of words in test: 572
Number of unknown words (marked with a star) in test: 46
Percentage of unknown words: 8.04 %

Results when removing unknown-word marks (stars)


Edit distance: 751
Word error rate (WER): 93.88 %
Number of position-independent correct words: 110
Position-independent word error rate (PER): 86.25 %
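
As a rough cross-check, the percentages above can be recomputed from the raw counts. The sketch below is not taken from the evaluation tool itself. It assumes that the unknown-word rate is divided by the number of test words, and that WER and PER are both divided by the number of reference words. These assumptions do reproduce the reported figures. The function name `percentage` and the variable names are chosen here for illustration only.

```python
# Recompute the reported percentages from the raw counts.
# The formulas are assumptions that happen to match the figures above;
# they are not copied from the evaluation tool.

def percentage(part, whole):
    """Return part/whole as a percentage (hypothetical helper)."""
    return 100.0 * part / whole

ref_words = 800        # number of words in reference
test_words = 572       # number of words in test
unknown_words = 46     # unknown words (marked with a star) in test
edit_distance = 751    # word-level edit distance
correct_words = 110    # position-independent correct words

# Assumed: unknown-word rate is relative to the test word count.
unknown_pct = percentage(unknown_words, test_words)
# Assumed: WER is the edit distance relative to the reference word count.
wer = percentage(edit_distance, ref_words)
# Assumed: PER counts reference words without a position-independent match.
per = percentage(ref_words - correct_words, ref_words)

print(f"Percentage of unknown words: {unknown_pct:.2f} %")
print(f"Word error rate (WER): {wer:.2f} %")
print(f"Position-independent word error rate (PER): {per:.2f} %")
```

Running the same calculation on the counts from a later evaluation gives figures that can be compared directly against these "before" values.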