Difference between revisions of "Talk:Scandinavian MT project"
Revision as of 09:02, 8 March 2016
Coverage on Wikipedia dumps ("w/o cmp" is with decompounding turned off, i.e. without the -e switch to lt-proc).
E.g., for nno-dan without decompounding:
<pre>
bzcat ~/corpora/nnclean2.txt.bz2 \
|tr ' ' '\n' \
|grep -m5113060 . \
|apertium-deshtml \
|lt-proc nno-dan.automorf.bin \
|apertium-cleanstream -n \
|awk 'BEGIN{OFS=FS="\t"} /^\^/{lu++} /\/\*/{u++} END{print "unk","known","tot","cov %";print u,lu-u,lu,100*(lu-u)/lu}'
</pre>
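The counting step at the end of the pipeline can be tried on its own, without a corpus or a compiled analyser. Below is a minimal sketch that wraps the same awk logic in a shell function; the name `coverage`, the empty-input guard and the sample lexical units are assumptions made for illustration, not part of the original command.

```shell
# Hypothetical helper: reads one lexical unit per line (as produced by
# apertium-cleanstream -n) and prints unknown, known, total and coverage %.
coverage() {
  awk 'BEGIN { OFS = FS = "\t" }
       /^\^/  { lu++ }   # every lexical unit starts with ^
       /\/\*/ { u++ }    # unknown words are analysed as /*word
       END {
         print "unk", "known", "tot", "cov %"
         # guard against empty input (not in the original one-liner)
         if (lu > 0) print u + 0, lu - u, lu, 100 * (lu - u) / lu
         else        print 0, 0, 0, "n/a"
       }'
}

# Made-up sample stream: three known units and one unknown one
printf '%s\n' \
  '^eit/ein<det><ind><nt><sg>$' \
  '^hus/hus<n><nt><sg><ind>$' \
  '^xyzzy/*xyzzy$' \
  '^./.<sent>$' \
| coverage

# With the real pipeline, the "regular" column would presumably come from
# adding the -e switch (same corpus and .bin file assumed):
#   ... | lt-proc -e nno-dan.automorf.bin | apertium-cleanstream -n | coverage
```

For the sample above, one of the four units is unknown, so the reported coverage is 75.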
| Direction | w/o cmp | regular |
|---|---|---|
| nob-nno | | |
| nno-nob | | |
| dan-nob | | |
| dan-nno | | |
| nno-dan | | |
| nob-dan | 89.7% | 91.4% |
| dan-swe | 80.6% | 83.0% |
| swe-dan | 83.8% | |
| swe-nno | | |
| swe-nob | | |
| nno-swe | | |
| nob-swe | | |
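For the two directions where both columns are filled in, the effect of decompounding can be read off as a difference in percentage points. A small sketch, with the figures copied from the table and a hypothetical helper name `gain`:

```shell
# Hypothetical helper: input is "direction  w/o-cmp  regular" per line,
# output is the direction and the gain in percentage points.
gain() {
  awk '{ printf "%s\t%+.1f\n", $1, $3 - $2 }'
}

# Rows taken from the table (only those with both values present)
printf '%s\n' 'nob-dan 89.7 91.4' 'dan-swe 80.6 83.0' | gain
```

This gives +1.7 points for nob-dan and +2.4 points for dan-swe.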