Serbo-Croatian and Macedonian/Final report

From Apertium
Jump to navigation Jump to search

73.624631444 +/- 0.488418931215

Description

Statistics

Dictionaries
  • sh morphological analyser lexicon: 7564 lemmata, 170787 surface forms (including ekavian/ijekavian)
  • apertium-sh-mk.sh-mk.dix (unique: 9985, total: 13032)
Coverage
  • (bs|hr|sr|sh) Wikipedia ( , std. dev.: )
  • (sr|hr) SETimes ( , std. dev.: )
Testvoc
POS Total Clean With @ With # Clean %
adj 2072699 2072699 0 0 100
vblex 246713 246713 0 0 100
vbmod 96020 96020 0 0 100
np 54190 54190 0 0 100
n 41477 41477 0 0 100
adv 7808 7808 0 0 100
prn 7662 7662 0 0 100
num 4284 4284 0 0 100
vbhaver 224 224 0 0 100
vbser 170 170 0 0 100
pr 77 77 0 0 100
abbr 33 33 0 0 100
cnjsub 30 30 0 0 100
cnjcoo 20 20 0 0 100
vaux 0 0 0 0 100
rel 0 0 0 0 100
preadv 0 0 0 0 100
ij 0 0 0 0 100
guio 0 0 0 0 100
det 0 0 0 0 100
cnjadv 0 0 0 0 100
cm 0 0 0 0 100
Rules

apertium-sh-mk.sh-mk.t1x: 51

apertium-sh-mk.sh-mk.t2x: 11

apertium-sh-mk.sh-mk.t3x: 1

Error rate (Realistic results for now only for setimes.pilots.txt, the rest is just preliminary postedited)
File Num. Words % OOV WER (Sur) PER (Sur) WER (Lem) PER (Lem)
setimes.pilots.txt 454 0.44% 29.96% 20.48% - -
setimes.tablice.txt 466 0.43% 12.23% 9.23% - -
setimes.klupa.txt 477 18.12% 14.68% 12.37% - -
setimes.povijest.txt 519 14.18% 11.95% 9.25% - -
wikipedia.txt - - - - - -

Future work