Difference between revisions of "Talk:Scandinavian MT project"

From Apertium

Revision as of 09:02, 8 March 2016

Coverage on Wikipedia dumps ("w/o cmp" means decompounding is turned off, i.e. lt-proc is run without the -e switch).

E.g.
<pre>
bzcat ~/corpora/nnclean2.txt.bz2 \
  |tr ' ' '\n' \
  |grep -m5113060 . \
  |apertium-deshtml \
  |lt-proc nno-dan.automorf.bin \
  |apertium-cleanstream -n \
  |awk 'BEGIN{OFS=FS="\t"} /^\^/{lu++} /\/\*/{u++} END{print "unk","known","tot","cov %";print u,lu-u,lu,100*(lu-u)/lu}'
</pre>
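{|class=wikitable
! Direction !! w/o cmp !! regular
|-
| nob-nno || ||
|-
| nno-nob || ||
|-
| dan-nob || ||
|-
| dan-nno || ||
|-
| nno-dan || ||
|-
| nob-dan || 89.7% || 91.4%
|-
| dan-swe || 80.6% || 83.0%
|-
| swe-dan || 83.8% ||
|-
| swe-nno || ||
|-
| swe-nob || ||
|-
| nno-swe || ||
|-
| nob-swe || ||
|}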
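
To make the counting step of the pipeline above easier to check without a corpus or a compiled analyser, here is a minimal sketch of the same awk logic wrapped in a shell function. The function name <code>coverage</code> and the four sample lines are made up for illustration; the input is assumed to be one lexical unit per line, as the pipeline above expects from <code>apertium-cleanstream -n</code>. Unlike the original one-liner, this version prints 0 instead of failing when the input is empty.

```shell
# Hypothetical helper (not part of Apertium): reads one lexical unit per
# line on stdin and prints unknown/known/total counts plus coverage in %.
coverage() {
  awk 'BEGIN { OFS = "\t" }
       /^\^/  { lu++ }   # a line starting with ^ is one lexical unit
       /\/\*/ { u++ }    # an analysis beginning with * marks an unknown word
       END {
         # guard against empty input so we never divide by zero
         cov = (lu > 0) ? 100 * (lu - u) / lu : 0
         print "unk", "known", "tot", "cov %"
         print u + 0, lu - u, lu + 0, cov
       }'
}

# Made-up sample: two analysed words and two unknown words.
# Prints the header line, then: 2  2  4  50 (tab-separated)
printf '%s\n' \
  '^hus/hus<n><nt><sg><ind>$' \
  '^xyzzy/*xyzzy$' \
  '^og/og<cnjcoo>$' \
  '^blarg/*blarg$' \
  | coverage
```

In the real pipeline the function would simply replace the final awk stage, e.g. <code>... |apertium-cleanstream -n |coverage</code>.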