A '''testvoc''' is literally a test of vocabulary. At the most basic level, it expands an {{sc|sl}} dictionary and runs every [[lexical form]] the analyser can produce through all the translation stages, checking that each possible input yields a sensible translation in the {{sc|tl}} with no <code>#</code> or <code>@</code> symbols. In Apertium output, <code>@</code> marks a word missing from the bilingual dictionary, and <code>#</code> marks a form the {{sc|tl}} generator could not produce.

The script below runs a testvoc for the Breton to French (br-fr) pair. It expects to be run from a subdirectory of the compiled pair, writes the input lexical units, the transfer output and the generated French to three files in <code>$TMPDIR</code>, and then prints the three side by side, one dictionary entry per line.
<pre>
# Directory for the intermediate files.
TMPDIR=/tmp

# Expand the Breton dictionary, turn each analysis it keeps into a stream lexical unit
# followed by a full stop, and run it through transfer and generation, saving the
# input, the transfer output and the generated French to three files.
lt-expand ../apertium-br-fr.br.dix | grep -v '<prn><enc>' | grep -e ':<:' -e '\w:\w' | sed 's/:<:/%/g' | sed 's/:/%/g' | cut -f2 -d'%' | sed 's/^/^/g' | sed 's/$/$ ^.<sent>$/g' | tee "$TMPDIR/tmp_testvoc1.txt" |\
apertium-pretransfer |\
apertium-transfer ../apertium-br-fr.br-fr.t1x ../br-fr.t1x.bin ../br-fr.autobil.bin |\
apertium-interchunk ../apertium-br-fr.br-fr.t2x ../br-fr.t2x.bin |\
apertium-postchunk ../apertium-br-fr.br-fr.t3x ../br-fr.t3x.bin | tee "$TMPDIR/tmp_testvoc2.txt" |\
lt-proc -d ../br-fr.autogen.bin > "$TMPDIR/tmp_testvoc3.txt"

# Print the three files side by side as: input ---------> transfer ---------> generation
paste -d _ "$TMPDIR/tmp_testvoc1.txt" "$TMPDIR/tmp_testvoc2.txt" "$TMPDIR/tmp_testvoc3.txt" | sed 's/\^.<sent>\$//g' | sed 's/_/ ---------> /g'
</pre>
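
The first stage of the script, which turns <code>lt-expand</code> output into input for the translation stages, is the least obvious part. As a minimal sketch, the same <code>grep</code>/<code>sed</code>/<code>cut</code> chain can be wrapped in a shell function and tried on a single line. The function name <code>expand_to_stream</code> is hypothetical, the <code>tee</code> to a temporary file is left out, the Breton entry is invented for illustration, and GNU grep is assumed because <code>\w</code> is not POSIX.

```shell
# Hypothetical helper: the first stage of the testvoc script as a function.
# Reads lt-expand output (surface:analysis) on stdin and prints one stream
# lexical unit per line, each followed by a sentence-final full stop.
expand_to_stream() {
  grep -v '<prn><enc>' |            # drop enclitic pronoun entries
    grep -e ':<:' -e '\w:\w' |      # keep plain pairs and pairs marked :<: (\w assumes GNU grep)
    sed 's/:<:/%/g' |               # replace the separator (:<: or :) with %
    sed 's/:/%/g' |
    cut -f2 -d'%' |                 # keep only the analysis
    sed 's/^/^/g' |                 # open the lexical unit with ^
    sed 's/$/$ ^.<sent>$/g'         # close it with $ and append a full stop
}

# Invented lt-expand line, for illustration only.
echo 'kazh:kazh<n><m><sg>' | expand_to_stream
# prints: ^kazh<n><m><sg>$ ^.<sent>$
```

The full stop appended to every entry gives each lexical unit its own sentence, and the final <code>sed 's/\^.<sent>\$//g'</code> in the script removes it again before printing.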
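
The final <code>paste</code>/<code>sed</code> step prints every entry, so on a large dictionary the failures are easy to miss. A minimal sketch of a follow-up filter keeps only the lines whose last column, the generated {{sc|tl}} text, still contains a <code>#</code> or <code>@</code>. The function name <code>testvoc_errors</code> is hypothetical and not an Apertium tool, and the sample lines are invented.

```shell
# Hypothetical helper (not part of Apertium): keep only the lines whose last
# column, the generated target-language text, contains a '#' or '@'.
# Input lines have the form: input ---------> transfer ---------> generation
testvoc_errors() {
  # A multi-character awk field separator is an extended regular expression;
  # ' ---------> ' has no special characters, so it matches literally.
  awk -F ' ---------> ' '$NF ~ /[#@]/'
}

# Invented sample lines, for illustration only.
printf '%s\n' \
  'a ---------> b ---------> ok' \
  'c ---------> @d ---------> @d' \
  'e ---------> f ---------> #f' | testvoc_errors
# prints the second and third lines
```

After defining the function in the same shell, it could be appended to the last line of the script, as in <code>... | sed 's/_/ ---------> /g' | testvoc_errors</code>. Looking only at the last column means a <code>@</code> that appears in the transfer output but not in the generated text is not reported.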

[[Category:Terminology]]
[[Category:Quality control]]
[[Category:Development]]
Revision as of 14:26, 7 December 2009