Polish and Russian/Work plan

From Apertium

Latest revision as of 01:37, 1 August 2016

Tasks

  • Add pronouns to bilingual dictionary
  • Add determiners to bilingual dictionary
  • Create frequency list of Polish

Weekly plan

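{| class="wikitable"
! rowspan="2" | Week !! rowspan="2" | Dates !! rowspan="2" | Coverage !! rowspan="2" | Testvoc !! rowspan="2" | Eval. !! rowspan="2" | !! colspan="2" | Raw coverage (%) !! colspan="2" | Trimmed coverage (%) !! colspan="2" | WER (%) !! rowspan="2" | Bidix !! colspan="2" | Errors !! rowspan="2" | Achieved?
|-
! pol !! rus !! pol→rus !! rus→pol !! pol→rus !! rus→pol !! pol→rus !! rus→pol
|-
| 1 || 18 April–24 April || 60% || || || || || || 40.3 || 53.9 || || || 21,455 || 175,678 || 114,856 ||
|-
| 2 || 25 April–1 May || 60% || || || || 76.9 || 89.5 || 59.2 || 63.8 || || || || 176,620 || 114,375 ||
|-
| 3 || <s>2 May–8 May</s> || 60% || || || || || || || || || || || || ||
|-
| 4 || 9 May–15 May || 60% || pr || 500 || || 76.9 || 91.8 || 62.8 || 67.9 || 89.71 || 82.61 || 32,575 || 378,059 || 222,370 ||
|-
| 5 || 16 May–22 May || 70% || || || || 77.0 || 91.8 || 70.0 || 70.7 || || || 43,075 || 391,272 || 229,155 ||
|-
| 6 || '''23 May'''–29 May || 72.5% || || || || 77.2 || 91.8 || 72.5 || 74.9 || || || 43,557 || 396,400 || 231,118 ||
|-
| 7 || 30 May–5 June || 75% || || || || 78.3 || 93.2 || 72.6 || 76.8 || || || 43,774 || 422,039 || 231,194 ||
|-
| 8 || 6 June–12 June || 77.5% || prn, conj || || || 79.5 || 93.2 || 74.5 || 77.9 || 80.46 || 77.87 || 44,219 || 427,142 || 231,361 ||
|-
| 9 || 13 June–19 June || 80% || || || || 81.2 || 93.2 || 80.0 || 83.3 || || || 44,233 || 685,564 || 596,049 ||
|-
| 10 || 20 June–26 June || 82% || || || || 82.7 || 93.2 || 82.0 || 83.9 || || || 44,290 || 698,797 || 562,499 ||
|-
| 11 || 27 June–3 July || 84% || n || 500 || || || || 82.1 || || || || || || ||
|-
| 12 || 4 July–10 July || 86% || || || || 85.3 || 93.2 || 84.0 || 83.8 || || || 45,109 || 875,760 || 565,723 ||
|-
| 13 || 11 July–17 July || 88% || vblex || || || 87.3 || || 85.8 || || || || 48,473 || 1,392,919 || 606,133 ||
|-
| 14 || 18 July–24 July || 90% || || || || 87.6 || 93.8 || 85.9 || || || || 48,742 || 1,349,103 || ||
|-
| 15 || 25 July–31 July || 90% || adj || || || || 94.2 || 85.1 || || || || || 906,150 || ||
|-
| 16 || 1 August–7 August || 90% || || || || || || || || || || || || ||
|-
| 17 || 8 August–14 August || 90% || || 2,000 || || || || || || || || || || ||
|-
| 18 || 15 August–21 August || 90% || || || || || || || || || || || || ||
|}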

Calculating numbers

Errors (calculate in apertium-pol-rus)
$ sh dev/testvoc/generation.sh pol-rus | wc -l 
$ sh dev/testvoc/generation.sh rus-pol | wc -l
Bidix (calculate in apertium-pol-rus)
$ cat apertium-pol-rus.pol-rus.dix | grep '<l' | wc -l
Trimmed coverage (calculate in apertium-pol-rus)
$ cat pol.crp.txt | apertium -d . pol-rus-morph | sed 's/\$\W*\^/$\n^/g' > /tmp/pol.trim.coverage.txt
$ calc `cat /tmp/pol.trim.coverage.txt | grep -v '\*' | wc -l `/`cat /tmp/pol.trim.coverage.txt | wc -l`
Raw coverage (calculate in apertium-pol, apertium-rus)
$ cat pol.crp.txt | apertium -d . pol-morph | sed 's/\$\W*\^/$\n^/g' > /tmp/pol.raw.coverage.txt
$ calc `cat /tmp/pol.raw.coverage.txt | grep -v '\*' | wc -l `/`cat /tmp/pol.raw.coverage.txt | wc -l`

or:

$ cat pol.crp.txt | apertium -d . pol-morph | sed 's/\$\W*\^/$\n^/g' > /tmp/pol.raw.coverage.txt
$ COVERED=`cat /tmp/pol.raw.coverage.txt | grep -v '\*' | wc -l `
$ TOTAL=`cat /tmp/pol.raw.coverage.txt | wc -l`
$ echo $COVERED/$TOTAL | bc -l
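
Both coverage figures are the share of analysed units that carry no `*` (unknown-word) mark. As a minimal sketch, the same ratio can be computed in a single awk pass instead of with calc or bc. The function name `coverage_pct` and the sample analyses (including their tags) are hypothetical and only illustrate the idea. The input file is assumed to hold one lexical unit per line, as produced by the sed step above.

```shell
#!/bin/sh
# coverage_pct FILE: print the percentage of lines in FILE that contain no '*'.
# Assumption: FILE holds one lexical unit per line and unknown words are
# the ones marked with '*', as in the coverage files written to /tmp above.
coverage_pct() {
    awk '
        !/\*/ { covered++ }    # no asterisk on this line: the word was analysed
        END {
            if (NR == 0) { print "0.00"; exit }    # empty input: avoid dividing by zero
            printf "%.2f\n", 100 * covered / NR
        }
    ' "$1"
}

# Hypothetical sample: three analysed units and one unknown word.
sample=$(mktemp)
printf '%s\n' \
    '^dom/dom<n><mi><sg><nom>$' \
    '^i/i<cnjcoo>$' \
    '^xyzzy/*xyzzy$' \
    '^kot/kot<n><ma><sg><nom>$' > "$sample"
coverage_pct "$sample"    # prints 75.00
rm -f "$sample"
```

Called as `coverage_pct /tmp/pol.raw.coverage.txt` or `coverage_pct /tmp/pol.trim.coverage.txt`, this should give the raw or trimmed coverage as a percentage, in the same form as the weekly plan table.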