Difference between revisions of "Serbo-Croatian and Macedonian/Final report"

From Apertium
Line 7: Line 7:
; Dictionaries
; Dictionaries


* sh morphological analyser lexicon:
* sh morphological analyser lexicon: 7564 lemmata, 170787 surface forms (including ekavian/ijekavian)
* <code>apertium-sh-mk.sh-mk.dix</code> (unique: , total: )
* <code>apertium-sh-mk.sh-mk.dix</code> (unique: , total: )


Line 14: Line 14:
* (bs|hr|sr|sh) Wikipedia ( , std. dev.: )
* (bs|hr|sr|sh) Wikipedia ( , std. dev.: )
* (sr|hr) SETimes ( , std. dev.: )
* (sr|hr) SETimes ( , std. dev.: )

; Testvoc
{|class=wikitable
!
|}


; Rules
; Rules


; Error rate (only the results for <code>setimes.pilots.txt</code> are realistic so far, the remaining figures are based on preliminary post-editing only)
; Error rate


{|class=wikitable
{|class=wikitable
Line 24: Line 29:
| <code>setimes.pilots.txt</code> || 454 || 0.44% || 29.96% || 20.48% || - || -
| <code>setimes.pilots.txt</code> || 454 || 0.44% || 29.96% || 20.48% || - || -
|-
|-
| <code>setimes.tablice.txt</code> || - || - || - || - || - || -
| <code>setimes.tablice.txt</code> || 466 || 0.43% || 12.23% || 9.23% || - || -
|-
|-
| <code>setimes.klupa.txt</code> || - || - || - || - ||-||-
| <code>setimes.klupa.txt</code> || 477 || 18.12% || 14.68% || 12.37% ||-||-
|-
|-
| <code>setimes.povijest.txt</code> || - || - || - || - || - || -
| <code>setimes.povijest.txt</code> || 519 || 14.18% || 11.95% || 9.25% || - || -
|-
|-
| <code>wikipedia.kadinlar_askerler.sh.txt</code> || - || - || - || - || - || -
| <code>wikipedia.txt</code> || - || - || - || - || - || -
|-
|-
|}
|}

Revision as of 17:40, 25 August 2011

73.624631444 +/- 0.488418931215

Description

note: this is still just a sketch

Statistics

Dictionaries
  • sh morphological analyser lexicon: 7564 lemmata, 170787 surface forms (including ekavian/ijekavian)
  • apertium-sh-mk.sh-mk.dix (unique: , total: )
Coverage
  • (bs|hr|sr|sh) Wikipedia ( , std. dev.: )
  • (sr|hr) SETimes ( , std. dev.: )
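The coverage entries above are still empty, and the report does not say how they are to be computed. As a minimal sketch, assuming "coverage" means naive coverage (the share of tokens that receive at least one analysis) averaged over several corpus samples, the figures could be produced along these lines. The function name `coverage` and the sample strings are hypothetical; the only Apertium convention relied on is that the analyser marks unknown words with an asterisk, as in `^word/*word$`. Escaped characters in the stream format are ignored for brevity.

```python
import re
import statistics

# One lexical unit in the Apertium stream format: ^surface/analysis1/...$
LEXICAL_UNIT = re.compile(r"\^(.*?)\$")


def coverage(analysed: str) -> float:
    """Return the percentage of lexical units that have a known analysis."""
    units = LEXICAL_UNIT.findall(analysed)
    if not units:
        raise ValueError("no lexical units found in analyser output")
    # Unknown words are emitted as ^word/*word$, so "/*" flags them.
    known = sum(1 for unit in units if "/*" not in unit)
    return 100.0 * known / len(units)


# Hypothetical analyser output for two tiny samples (not real corpus data).
samples = [
    "^a/a<det>$ ^b/*b$",
    "^c/c<n>$ ^d/d<n>$",
]
scores = [coverage(sample) for sample in samples]
mean = statistics.mean(scores)
# Sample standard deviation; whether the report uses the sample or the
# population formula is not stated, so this choice is an assumption.
std_dev = statistics.stdev(scores)
print(f"{mean:.2f} (std. dev.: {std_dev:.2f})")
```

Splitting a corpus into several samples and reporting the mean with its standard deviation gives a rough sense of how stable the coverage figure is across texts.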
Testvoc


Rules
Error rate (only the results for setimes.pilots.txt are realistic so far, the remaining figures are based on preliminary post-editing only)
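{|class=wikitable
! File !! Num. Words !! % OOV !! WER (Sur) !! PER (Sur) !! WER (Lem) !! PER (Lem)
|-
| <code>setimes.pilots.txt</code> || 454 || 0.44% || 29.96% || 20.48% || - || -
|-
| <code>setimes.tablice.txt</code> || 466 || 0.43% || 12.23% || 9.23% || - || -
|-
| <code>setimes.klupa.txt</code> || 477 || 18.12% || 14.68% || 12.37% || - || -
|-
| <code>setimes.povijest.txt</code> || 519 || 14.18% || 11.95% || 9.25% || - || -
|-
| <code>wikipedia.txt</code> || - || - || - || - || - || -
|}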
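
The report does not say which tool produced the WER and PER columns above. As a rough sketch of what the two measures mean, assuming the usual definitions (WER is the word-level edit distance to the post-edited reference divided by the reference length; PER ignores word order and compares the two sentences as bags of words), they could be computed as follows. The function names and the example sentences are hypothetical and are not taken from the test files.

```python
from collections import Counter


def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # delete a reference word
                            curr[j - 1] + 1,      # insert a hypothesis word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def wer(ref: list[str], hyp: list[str]) -> float:
    """Word error rate relative to the reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def per(ref: list[str], hyp: list[str]) -> float:
    """Position-independent error rate: word order is ignored."""
    if not ref:
        raise ValueError("reference must not be empty")
    # Multiset intersection counts words shared by both sentences.
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)


# Hypothetical post-edited reference and raw MT output.
reference = "the pilots went on strike yesterday".split()
hypothesis = "the pilots yesterday went strike".split()
print(f"WER: {wer(reference, hypothesis):.2%}")
print(f"PER: {per(reference, hypothesis):.2%}")
```

Because PER does not penalise reordering, it is never higher than WER for the same sentence pair, which matches the pattern in the table.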

Future work