Difference between revisions of "Talk:Scandinavian MT project"

From Apertium
(complete rerun)

Revision as of 10:23, 8 March 2016

Coverage on Wikipedia dumps. "w/o cmp" is with decompounding turned off, i.e. without the -e switch to lt-proc; "regular" is with decompounding on.

E.g., for nno-dan without decompounding:

bzcat ~/corpora/nnclean2.txt.bz2 \
  |tr ' ' '\n' \
  |grep -m5113060 . \
  |apertium-deshtml \
  |lt-proc nno-dan.automorf.bin \
  |apertium-cleanstream -n \
  |awk 'BEGIN{OFS=FS="\t"} /^\^/{lu++} /\/\*/{u++} END{print "unk","known","tot","cov %";print u,lu-u,lu,100*(lu-u)/lu}'
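
The counting step at the end of that pipeline can be tried on its own, without the dump or the compiled analyser. The following is a minimal sketch, assuming input with one lexical unit per line, as apertium-cleanstream -n emits it. The function name `coverage`, the sample analyses and the labelled output format are made up for illustration and are not part of the original command.

```shell
#!/bin/sh
# coverage: read analysed lexical units, one per line, and report
# how many carry an analysis (hypothetical helper, for illustration only).
coverage() {
  awk '
    /^\^/  { lu++ }   # every lexical unit starts with ^
    /\/\*/ { u++ }    # unknown words come out of lt-proc as /*surface
    END {
      if (lu == 0) { print "no lexical units seen"; exit 1 }
      printf "unk=%d known=%d tot=%d cov=%.1f%%\n", u, lu - u, lu, 100 * (lu - u) / lu
    }'
}

# Made-up sample: three known units and one unknown.
printf '%s\n' \
  '^eit/ein<det><ind><nt><sg>$' \
  '^hus/hus<n><nt><sg><ind>$' \
  '^xyzzy/*xyzzy$' \
  '^./.<sent>$' \
  | coverage
```

On the sample this reports one unknown out of four units, i.e. 75.0% coverage. The figures in the table are the same ratio computed over the full token sample.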
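{| class="wikitable"
! Direction !! w/o cmp !! regular
|-
| nob-nno || 90.9% || 92.6%
|-
| nob-dan || 89.8% || 91.5%
|-
| nno-nob || 89.2% || 90.6%
|-
| nno-dan || 87.4% || 88.8%
|-
| dan-nob || 85.1% || 86.4%
|-
| swe-dan || 80.4% || 83.7%
|-
| dan-nno || 82.5% || 83.5%
|-
| dan-swe || 80.6% || 82.9%
|-
| nob-swe || 74.9% || 76.2%
|-
| nno-swe || 73.5% || 74.6%
|-
| swe-nob || 69.2% || 72.1%
|-
| swe-nno || 69.1% || 71.9%
|}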