Polish and Russian/Work plan

From Apertium

Tasks

  • Add pronouns to bilingual dictionary
  • Add determiners to bilingual dictionary
  • Create frequency list of Polish

Weekly plan

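Week | Dates | Coverage | Testvoc | Eval. | Raw cov. pol (%) | Raw cov. rus (%) | Trimmed cov. pol→rus (%) | Trimmed cov. rus→pol (%) | WER pol→rus (%) | WER rus→pol (%) | Bidix | Err. pol→rus | Err. rus→pol | Achieved?
1 | 18 April–24 April | 60% | | | | | 40.3 | 53.9 | | | 21455 | 175678 | 114856 |
2 | 25 April–1 May | 60% | | | 76.9 | 89.5 | 59.2 | 63.8 | | | | 176620 | 114375 |
3 | 2 May–8 May | 60% | | | | | | | | | | | |
4 | 9 May–15 May | 60% | pr | 500 | 76.9 | 91.8 | 62.8 | 67.9 | 89.71 | 82.61 | 32575 | 378059 | 222370 |
5 | 16 May–22 May | 70% | | | 77.0 | 91.8 | 70.0 | 70.7 | | | 43075 | 391272 | 229155 |
6 | 23 May–29 May | 72.5% | | | 77.2 | 91.8 | 72.5 | 74.9 | | | 43557 | 396400 | 231118 |
7 | 30 May–5 June | 75% | | | 78.3 | 93.2 | 72.6 | 76.8 | | | 43774 | 422039 | 231194 |
8 | 6 June–12 June | 77.5% | prn, conj | | 79.5 | 93.2 | 74.5 | 77.9 | 80.46 | 77.87 | 44219 | 427142 | 231361 |
9 | 13 June–19 June | 80% | | | 81.2 | 93.2 | 80.0 | 83.3 | | | | | |
10 | 20 June–26 June | 82% | | | | | | | | | | | |
11 | 27 June–3 July | 84% | n | 500 | | | | | | | | | |
12 | 4 July–10 July | 86% | | | | | | | | | | | |
13 | 11 July–17 July | 88% | vblex | | | | | | | | | | |
14 | 18 July–24 July | 90% | | | | | | | | | | | |
15 | 25 July–31 July | 90% | adj | | | | | | | | | | |
16 | 1 August–7 August | 90% | | | | | | | | | | | |
17 | 8 August–14 August | 90% | | 2000 | | | | | | | | | |
18 | 15 August–21 August | 90% | | | | | | | | | | | |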

Calculating numbers

Errors (calculate in apertium-pol-rus)
$ sh dev/testvoc/generation.sh pol-rus | wc -l 
$ sh dev/testvoc/generation.sh rus-pol | wc -l
Bidix (calculate in apertium-pol-rus)
$ grep -c '<l' apertium-pol-rus.pol-rus.dix
Trimmed coverage (calculate in apertium-pol-rus)
$ apertium -d . pol-rus-morph < pol.crp.txt | sed 's/\$\W*\^/$\n^/g' > /tmp/pol.trim.coverage.txt
$ calc "$(grep -vc '\*' /tmp/pol.trim.coverage.txt) / $(wc -l < /tmp/pol.trim.coverage.txt)"
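
When coverage is below the weekly goal, it can help to see which unknown words occur most often in the analysis file. The following is only a sketch: it assumes the file has one lexical unit per line (as the sed step above produces) and that unknown words appear in the usual stream form `^word/*word$`. The function name `top_unknown` and its optional second argument are hypothetical, not part of Apertium.

```shell
# Hypothetical helper: list the most frequent unknown words in an analysis
# file with one lexical unit per line. Assumes unknown words look like
# ^word/*word$ (an asterisk right after the slash).
# Usage: top_unknown FILE [N]   (N defaults to 20)
top_unknown() {
    # 1. keep only lines that contain an asterisk (unknown words)
    # 2. strip everything up to "/*" and everything from "$" onwards
    # 3. count identical forms and sort by descending count
    grep '\*' "$1" |
        sed 's/^.*\/\*//; s/\$.*$//' |
        sort | uniq -c | sort -rn |
        head -n "${2:-20}"
}
```

For example, `top_unknown /tmp/pol.trim.coverage.txt 50` would print the 50 most frequent unknown forms, each preceded by its count.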
Raw coverage (calculate in apertium-pol, apertium-rus)
$ apertium -d . pol-morph < pol.crp.txt | sed 's/\$\W*\^/$\n^/g' > /tmp/pol.raw.coverage.txt
$ calc "$(grep -vc '\*' /tmp/pol.raw.coverage.txt) / $(wc -l < /tmp/pol.raw.coverage.txt)"

or:

$ apertium -d . pol-morph < pol.crp.txt | sed 's/\$\W*\^/$\n^/g' > /tmp/pol.raw.coverage.txt
$ COVERED=$(grep -vc '\*' /tmp/pol.raw.coverage.txt)
$ TOTAL=$(wc -l < /tmp/pol.raw.coverage.txt)
$ echo "$COVERED/$TOTAL" | bc -l
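
The commands above print coverage as a fraction, while the table records it as a percentage with one decimal. A minimal sketch of a helper that does the conversion in one step is given here. It assumes an analysis file with one lexical unit per line and that every line containing `*` is an unknown word, as in the commands above. The name `coverage_pct` is hypothetical.

```shell
# Hypothetical helper: print coverage as a percentage with one decimal.
# Assumes one lexical unit per line; lines containing "*" are unknown words.
# Usage: coverage_pct FILE
coverage_pct() {
    awk '
        /\*/ { unknown++ }                 # count unknown words
        END {
            if (NR == 0) { print "0.0"; exit }   # avoid dividing by zero
            printf "%.1f\n", 100 * (NR - unknown) / NR
        }
    ' "$1"
}
```

For example, `coverage_pct /tmp/pol.raw.coverage.txt` would print a figure that can be copied directly into the raw coverage column.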