Talk:Scandinavian MT project

From Apertium
Revision as of 10:23, 8 March 2016 by Unhammer (talk | contribs) (complete rerun)

Coverage on Wikipedia dumps, per translation direction. "w/o cmp" is with decompounding turned off, i.e. without the -e switch to lt-proc; "regular" is with decompounding on.

E.g. for nno-dan (no -e here, so this is the "w/o cmp" run):

bzcat ~/corpora/nnclean2.txt.bz2 \
  | tr ' ' '\n' \
  | grep -m5113060 . \
  | apertium-deshtml \
  | lt-proc nno-dan.automorf.bin \
  | apertium-cleanstream -n \
  | awk 'BEGIN { OFS = FS = "\t" }
         /^\^/  { lu++ }   # every line starting with ^ is one lexical unit
         /\/\*/ { u++ }    # an analysis of the form /*word means unknown
         END    { print "unk", "known", "tot", "cov %"
                  print u, lu-u, lu, 100*(lu-u)/lu }'
Direction   w/o cmp   regular
nob-nno     90.9%     92.6%
nob-dan     89.8%     91.5%
nno-nob     89.2%     90.6%
nno-dan     87.4%     88.8%
dan-nob     85.1%     86.4%
swe-dan     80.4%     83.7%
dan-nno     82.5%     83.5%
dan-swe     80.6%     82.9%
nob-swe     74.9%     76.2%
nno-swe     73.5%     74.6%
swe-nob     69.2%     72.1%
swe-nno     69.1%     71.9%
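
The counting step at the end of the pipeline above can be tried without any Apertium tools installed. The following is only a sketch: the `coverage` function name and the four sample lines are made up for illustration (the tags are not taken from any real analyser), and the sample assumes the analysed stream has one lexical unit per line, which is the form the awk patterns in the command expect.

```shell
#!/bin/sh
# Hypothetical helper: the same counting logic as the awk step above,
# but printing only the numbers and guarding against an empty input.
coverage() {
  LC_ALL=C awk '
    /^\^/  { lu++ }   # a line starting with ^ is one lexical unit
    /\/\*/ { u++ }    # "/*" marks a unit with no analysis (unknown)
    END    { if (lu) printf "%d\t%d\t%d\t%.1f\n", u, lu-u, lu, 100*(lu-u)/lu }'
}

# Made-up sample stream: three known units and one unknown.
sample='^eg/eg<prn><pers>$
^les/lese<vblex><pres>$
^xyzzy/*xyzzy$
^bok/bok<n><f><sg><ind>$'

# Prints unk, known, tot and coverage, tab-separated: 1 3 4 75.0
printf '%s\n' "$sample" | coverage
```

Feeding it the output of the real pipeline up to and including apertium-cleanstream -n should give the same numbers as the original awk command, minus the header line.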