Difference between revisions of "Multitrans"

Latest revision as of 05:40, 22 August 2021

multitrans is a program in apertium-lex-tools, used as a helper when training lexical selection rules (see Learning rules from parallel and non-parallel corpora).

Modes

-b | --biltrans

This outputs the source along with all of its target translations, like lt-proc -b.

Running just

multitrans -b sl-tl.autobil.bin

is equivalent to running lt-proc -b sl-tl.autobil.bin, provided the input consists only of correctly formatted lexical units (lt-proc -b fails on some malformed input that multitrans ignores).
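To make the -b output format concrete, here is a minimal Python sketch that splits each lexical unit of the form ^source/target1/target2$ into its source and its list of targets. It is an illustration only: it assumes well-formed input with no backslash-escaped ^, $ or / characters (which real Apertium streams may contain), and the names parse_biltrans and split_unit are hypothetical.

```python
import re

# One lexical unit ^...$ ; assumes no backslash-escaped ^, $ or / inside it.
UNIT = re.compile(r"\^([^$]*)\$")


def split_unit(unit_body: str) -> tuple[str, list[str]]:
    """Split 'src/tgt1/tgt2' into ('src', ['tgt1', 'tgt2'])."""
    source, *targets = unit_body.split("/")
    return source, targets


def parse_biltrans(line: str) -> list[tuple[str, list[str]]]:
    """Return (source, targets) for every lexical unit in one line of -b output."""
    return [split_unit(m.group(1)) for m in UNIT.finditer(line)]


if __name__ == "__main__":
    line = ("^obsternasig<adj><*>/obstinate<adj><*>/obdurate<adj><*>"
            "/stubborn<adj><*>/refractory<adj><*>$")
    for source, targets in parse_biltrans(line):
        print(source, "->", targets)
```

Anything between units (spaces, punctuation outside ^...$) is simply skipped by the regular expression.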

-p | --tagger-output

This outputs only the source side, so on its own it behaves like cat; combined with -t, it trims the tags down to what the bidix has.

For example, if the bidix has an entry for kake<n><f>, you get:

$ echo '^kake<n><f><sg><def>$' |multitrans -p -t nno-nob.autobil.bin
^kake<n><f><*>$

-m | --multitrans

This outputs each alternative on its own line, pairing the source with one of its translations, e.g.

$ echo '^obsternasig<adj><pst><sg><ind>$' |multitrans -m nor-eng.autobil.bin
.[][0 0].[]     ^obsternasig<adj><pst><sg><ind>/obstinate<adj><pst><sg><ind>$
.[][0 1].[]     ^obsternasig<adj><pst><sg><ind>/obdurate<adj><pst><sg><ind>$
.[][0 2].[]     ^obsternasig<adj><pst><sg><ind>/stubborn<adj><pst><sg><ind>$
.[][0 3].[]     ^obsternasig<adj><pst><sg><ind>/refractory<adj><pst><sg><ind>$
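The -m output above can be modelled with a short Python sketch that emits one line per choice of target. For a single word this reproduces the four lines of the example. For a multi-word sentence, the sketch assumes the alternatives are all combinations (the Cartesian product) of the per-word translations, in line with the notion of fertility described under -f. The prefix is copied from the example as .[][S N].[], where S is assumed to be the sentence index and N the alternative index; the tab separator, the single space between units, and the function name multitrans_lines are likewise assumptions, not the apertium-lex-tools code.

```python
from itertools import product
from typing import Iterator


def multitrans_lines(sentence: list[tuple[str, list[str]]],
                     sent_id: int = 0) -> Iterator[str]:
    """Yield one line per combination of target choices (hypothetical model).

    sentence: one (source, [targets]) pair per lexical unit.
    """
    sources = [src for src, _ in sentence]
    choices = [tgts for _, tgts in sentence]
    # product() varies the last word fastest, so alternative 0 picks the
    # first target of every word.
    for n, combo in enumerate(product(*choices)):
        units = " ".join(f"^{s}/{t}$" for s, t in zip(sources, combo))
        yield f".[][{sent_id} {n}].[]\t{units}"


if __name__ == "__main__":
    word = ("obsternasig<adj><pst><sg><ind>",
            ["obstinate<adj><pst><sg><ind>", "obdurate<adj><pst><sg><ind>",
             "stubborn<adj><pst><sg><ind>", "refractory<adj><pst><sg><ind>"])
    for line in multitrans_lines([word]):
        print(line)
```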

Options

-t | --trim-lines

Trims off tags that don't appear in the bidix, replacing them with <*>. For example, if the bidix has an entry for kake<n><f>:

$ echo '^kake<n><f><sg><def>$' |multitrans -p -t nno-nob.autobil.bin
^kake<n><f><*>$

Can be used with -m or -b as well:

$ echo '^obsternasig<adj><pst><sg><ind>$' |multitrans -m -t nor-eng.autobil.bin
.[][0 0].[]     ^obsternasig<adj><*>/obstinate<adj><*>$
.[][0 1].[]     ^obsternasig<adj><*>/obdurate<adj><*>$
.[][0 2].[]     ^obsternasig<adj><*>/stubborn<adj><*>$
.[][0 3].[]     ^obsternasig<adj><*>/refractory<adj><*>$

$ echo '^obsternasig<adj><pst><sg><ind>$' |multitrans -b -t nor-eng.autobil.bin
^obsternasig<adj><*>/obstinate<adj><*>/obdurate<adj><*>/stubborn<adj><*>/refractory<adj><*>$
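The trimming done by -t can be sketched in Python as follows. This is an illustrative model, not the apertium-lex-tools implementation: it assumes the bidix is available as a plain set of lemma+tag strings (the real tool reads a compiled .autobil.bin transducer), picks the longest matching tag prefix, and replaces any remaining tags with a single <*>, as in the examples above. Leaving units with no bidix entry unchanged is also an assumption, and trim_tags and bidix_entries are hypothetical names.

```python
import re

TAG = re.compile(r"<[^>]*>")


def split_lu(lu: str) -> tuple[str, list[str]]:
    """Split 'kake<n><f><sg>' into ('kake', ['<n>', '<f>', '<sg>'])."""
    first = lu.find("<")
    if first == -1:
        return lu, []
    return lu[:first], TAG.findall(lu[first:])


def trim_tags(lu: str, bidix_entries: set[str]) -> str:
    """Keep the longest tag prefix found in the bidix; collapse the rest to <*>."""
    lemma, tags = split_lu(lu)
    # Try the longest prefix first so the most specific entry wins.
    for k in range(len(tags), -1, -1):
        candidate = lemma + "".join(tags[:k])
        if candidate in bidix_entries:
            return candidate + ("<*>" if k < len(tags) else "")
    return lu  # no matching entry: left untouched (assumption)


if __name__ == "__main__":
    print(trim_tags("kake<n><f><sg><def>", {"kake<n><f>"}))
```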

-f | --filter-lines

Filters the input sentences: only sentences that contain ambiguous words, have a fertility below 10000 (the number of sentence combinations that can be formed from the translations of the ambiguous words) and have a coverage of at least 90 (a further condition related to the number of ambiguous words) are output.
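Since fertility counts the sentences obtainable by choosing one translation per word, it is the product of the per-word translation counts. For instance, a sentence where one word has 4 translations, another has 2 and the rest have 1 has fertility 4 × 2 = 8. The Python sketch below models only the ambiguity and fertility conditions; the coverage condition is left out, and the function names are hypothetical.

```python
from math import prod

MAX_FERTILITY = 10000  # threshold quoted for -f


def fertility(translation_counts: list[int]) -> int:
    """Number of sentence variants: product of the translation counts per word."""
    return prod(translation_counts)


def passes_filter(translation_counts: list[int]) -> bool:
    """Ambiguity and fertility conditions of -f (coverage is not modelled)."""
    has_ambiguous_word = any(n > 1 for n in translation_counts)
    return has_ambiguous_word and fertility(translation_counts) < MAX_FERTILITY


if __name__ == "__main__":
    counts = [1, 4, 2, 1]  # translations per word in a hypothetical sentence
    print(fertility(counts), passes_filter(counts))
```

Note that fertility grows multiplicatively: four words with 10 translations each already reach 10000 and are filtered out.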

-n | --number-lines

Numbers the output lines. It doesn't seem to make a difference in -m mode.

-z | --null-flush

Enables null-flush mode: output is flushed whenever a null character is read. See https://wiki.apertium.org/wiki/Null_flush
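The null-flush convention can be modelled with a small Python loop: the input is a sequence of chunks each terminated by a NUL character (\0), and for each chunk the program writes its result followed by a NUL and flushes, so a caller can keep one process running and read each answer as soon as it is ready. This is a sketch of the convention, not the multitrans code; process_chunk stands in for whatever the program does to a chunk, and the handling of trailing data without a final NUL is an assumption.

```python
import io
from typing import Callable, TextIO


def null_flush_loop(inp: TextIO, out: TextIO,
                    process_chunk: Callable[[str], str]) -> None:
    """Answer each NUL-terminated chunk with its result + NUL, then flush."""
    buf: list[str] = []
    while True:
        ch = inp.read(1)
        if ch == "":  # end of input
            break
        if ch == "\0":
            out.write(process_chunk("".join(buf)))
            out.write("\0")
            out.flush()  # make the answer visible to the caller immediately
            buf.clear()
        else:
            buf.append(ch)
    if buf:  # trailing data without a final NUL (assumed: processed, no NUL added)
        out.write(process_chunk("".join(buf)))
        out.flush()


if __name__ == "__main__":
    src = io.StringIO("^a<n>$\0^b<v>$\0")
    dst = io.StringIO()
    null_flush_loop(src, dst, str.upper)
    print(repr(dst.getvalue()))
```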

Category: Lexical selection
