Multitrans
Latest revision as of 05:40, 22 August 2021
multitrans is a program found in apertium-lex-tools, used as a helper when training (see Learning rules from parallel and non-parallel corpora).
Modes
-b | --biltrans
This will output the source along with all target translations, like lt-proc -b.
Doing just
multitrans -b sl-tl.autobil.bin
is equivalent to doing lt-proc -b sl-tl.autobil.bin if the input consists of just correctly formatted lexical units (lt-proc -b fails on some misformattings that multitrans ignores).
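When post-processing -b output, it can help to split each lexical unit into its source reading and its target translations. The Python sketch below assumes the ^source/target1/target2$ layout used in the examples on this page and does not handle escaped slashes or text between units; split_biltrans_unit is a hypothetical helper name, not part of apertium-lex-tools.

```python
def split_biltrans_unit(unit):
    """Split '^src/tgt1/tgt2$' into (src, [tgt1, tgt2]).

    Assumption: `unit` is exactly one well-formed lexical unit and
    contains no escaped '/' characters.
    """
    if not (unit.startswith("^") and unit.endswith("$")):
        raise ValueError("not a lexical unit: %r" % unit)
    # Drop the ^ and $ delimiters, then split on the reading separator.
    source, *targets = unit[1:-1].split("/")
    return source, targets


if __name__ == "__main__":
    sample = "^obsternasig<adj><*>/obstinate<adj><*>/obdurate<adj><*>$"
    source, targets = split_biltrans_unit(sample)
    print(source)
    for target in targets:
        print("  ->", target)
```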
-p | --tagger-output
This will output the source side only, so used alone it behaves like cat, but combined with -t it trims the tags to what bidix has.
So if bidix has an entry for kake<n><f>, you'll get
$ echo '^kake<n><f><sg><def>$' | multitrans -p -t nno-nob.autobil.bin
^kake<n><f><*>$
-m | --multitrans
This will output one line per translation, each line holding a source/target pair, e.g.
$ echo '^obsternasig<adj><pst><sg><ind>$' | multitrans -m nor-eng.autobil.bin
.[][0 0].[] ^obsternasig<adj><pst><sg><ind>/obstinate<adj><pst><sg><ind>$
.[][0 1].[] ^obsternasig<adj><pst><sg><ind>/obdurate<adj><pst><sg><ind>$
.[][0 2].[] ^obsternasig<adj><pst><sg><ind>/stubborn<adj><pst><sg><ind>$
.[][0 3].[] ^obsternasig<adj><pst><sg><ind>/refractory<adj><pst><sg><ind>$
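A line of -m output can be picked apart with a regular expression. The sketch below is written against the example output above only: it assumes one lexical unit per line, and it returns the two integers from the bracketed prefix without interpreting them, since this page does not define what they mean. parse_multitrans_line is a hypothetical helper name.

```python
import re

# Layout copied from the example output: ".[][A B].[] ^source/target$".
# Assumption: exactly one lexical unit per line, no escaped '/'.
LINE_RE = re.compile(r"^\.\[\]\[(\d+) (\d+)\]\.\[\]\s+\^([^/]*)/([^/]*)\$$")


def parse_multitrans_line(line):
    """Return (first_int, second_int, source, target) for one -m output line."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError("unrecognised -m line: %r" % line)
    first, second, source, target = match.groups()
    return int(first), int(second), source, target


if __name__ == "__main__":
    sample = ".[][0 1].[] ^obsternasig<adj><pst><sg><ind>/obdurate<adj><pst><sg><ind>$"
    print(parse_multitrans_line(sample))
```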
Options
-t | --trim-lines
Trims off tags that don't appear in bidix, e.g. if bidix has an entry for kake<n><f>:
$ echo '^kake<n><f><sg><def>$' | multitrans -p -t nno-nob.autobil.bin
^kake<n><f><*>$
Can be used with -m or -b as well:
$ echo '^obsternasig<adj><pst><sg><ind>$' | multitrans -m -t nor-eng.autobil.bin
.[][0 0].[] ^obsternasig<adj><*>/obstinate<adj><*>$
.[][0 1].[] ^obsternasig<adj><*>/obdurate<adj><*>$
.[][0 2].[] ^obsternasig<adj><*>/stubborn<adj><*>$
.[][0 3].[] ^obsternasig<adj><*>/refractory<adj><*>$
$ echo '^obsternasig<adj><pst><sg><ind>$' | multitrans -b -t nor-eng.autobil.bin
^obsternasig<adj><*>/obstinate<adj><*>/obdurate<adj><*>/stubborn<adj><*>/refractory<adj><*>$
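The effect of trimming in the examples above can be imitated in a few lines of Python. This is only an illustration of the observed output, not how multitrans is implemented: it assumes the bidix entry's tags form a prefix of the reading's tags, and trim_tags is a hypothetical helper name.

```python
import re

TAG_RE = re.compile(r"<[^<>]+>")


def trim_tags(reading, bidix_entry):
    """Keep the tags that `bidix_entry` covers and replace the rest with <*>.

    Assumption: the entry's lemma and tags are a prefix of the reading's.
    """
    lemma = reading.split("<", 1)[0]
    tags = TAG_RE.findall(reading)
    entry_lemma = bidix_entry.split("<", 1)[0]
    entry_tags = TAG_RE.findall(bidix_entry)
    if lemma != entry_lemma or tags[:len(entry_tags)] != entry_tags:
        raise ValueError("reading does not start with the bidix entry")
    # Whether multitrans appends <*> when nothing is trimmed is not stated
    # on this page; this sketch leaves it out in that case.
    suffix = "<*>" if len(tags) > len(entry_tags) else ""
    return lemma + "".join(entry_tags) + suffix


if __name__ == "__main__":
    print(trim_tags("kake<n><f><sg><def>", "kake<n><f>"))
```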
-f | --filter-lines
Filters the sentences. When set, only sentences that contain ambiguous words, have fertility < 10000 (the number of sentence combinations that can be formed from the ambiguous words) and have coverage >= 90 (some filter on the number of ambiguous words) are output.
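One reading of the fertility figure is the product of the number of translations of each word in the sentence. The Python sketch below works under that assumption, which this page does not confirm, and leaves coverage out because its definition is not given here. The function names are hypothetical.

```python
from math import prod


def fertility(translations_per_word):
    """Number of sentence combinations, assumed to be the product of the
    per-word translation counts (an interpretation, not confirmed)."""
    return prod(translations_per_word)


def passes_fertility_filter(translations_per_word, limit=10000):
    """True if at least one word is ambiguous and fertility < limit.

    The coverage >= 90 condition is deliberately not modelled here.
    """
    has_ambiguous_word = any(n > 1 for n in translations_per_word)
    return has_ambiguous_word and fertility(translations_per_word) < limit


if __name__ == "__main__":
    counts = [1, 4, 2]  # e.g. one unambiguous word, then 4 and 2 translations
    print(fertility(counts), passes_fertility_filter(counts))
```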
-n | --number-lines
Numbers the lines. Doesn't seem to make a difference under the -m mode.
-z | --null-flush
See https://wiki.apertium.org/wiki/Null_flush