Difference between revisions of "Lt-trim"

From Apertium

Revision as of 08:27, 11 February 2014

lt-trim is the lttoolbox application responsible for trimming compiled dictionaries. The analyses (right-side when compiling lr) of analyser_binary are trimmed to the input side of bidix_binary (left-side when compiling lr, right-side when compiling rl), such that only analyses which would pass through `lt-proc -b bidix_binary' are kept.

Compound tags (`<compound-only-L>', `<compound-R>'), join elements (`<j/>' in XML, `+' in the stream) and the group element (`<g/>' in XML, `#' in the stream) should all be handled correctly. Even combinations of `+' followed by `#' in the monodix are handled.

One minor caveat: if you have the capitalised lemma "Foo" in the monodix but "foo" in the bidix, the analysis "^Foo<tag>$" would pass through the bidix when running `lt-proc -b', but will not make it through trimming. Make sure your lemmas have the same capitalisation in both dictionaries.
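
A rough way to spot such capitalisation mismatches before trimming is sketched below. It assumes you have already extracted one lemma per line from each dictionary into two plain-text files (the extraction step is not shown); the function name `case_mismatches` and the file names are made up for illustration and are not part of lttoolbox.

```shell
#!/bin/sh
# Hypothetical helper, not part of lttoolbox.
# $1: file with one monodix lemma per line
# $2: file with one bidix lemma per line
# Prints "monodix_lemma<TAB>bidix_lemma" for every monodix lemma that is
# missing from the bidix list as written, but present when case is ignored.
# Note: tolower() in some awk implementations only folds ASCII letters.
case_mismatches() {
    awk '
        # First file on the command line (the bidix list): remember each
        # lemma both exactly and lower-cased.
        NR == FNR { exact[$0] = 1; lower[tolower($0)] = $0; next }
        # Second file (the monodix list): report case-only differences.
        !($0 in exact) && (tolower($0) in lower) {
            print $0 "\t" lower[tolower($0)]
        }
    ' "$2" "$1"
}

# Small demonstration with invented lemmas.
tmp=$(mktemp -d)
printf '%s\n' Foo bar Baz > "$tmp/monodix-lemmas.txt"
printf '%s\n' foo bar Baz > "$tmp/bidix-lemmas.txt"
case_mismatches "$tmp/monodix-lemmas.txt" "$tmp/bidix-lemmas.txt"
rm -rf "$tmp"
```

With the sample lists above, only "Foo" is reported (paired with "foo"), since "bar" and "Baz" are spelled identically in both lists.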

You should not trim a generator unless you have a very simple translator pipeline, since the output of bidix seldom goes unchanged through transfer.

Usage

$ lt-trim analyser_binary bidix_binary trimmed_analyser_binary

E.g. to trim ca-en.automorf.bin using ca-en.autobil.bin:

$ lt-trim ca-en.automorf.bin ca-en.autobil.bin ca-en.automorf-trimmed.bin
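
To show where trimming might sit in a build, here is a minimal sketch that compiles both dictionaries with lt-comp in the lr direction and then trims the analyser. The function name `build_trimmed_analyser`, the .dix file names in the usage comment and the output naming scheme are assumptions for illustration; a real language pair will normally do this from its Makefile.

```shell
#!/bin/sh
# Hypothetical wrapper, not part of lttoolbox.
# $1: monolingual dictionary (.dix)
# $2: bilingual dictionary (.dix)
# $3: output prefix, e.g. "ca-en"
build_trimmed_analyser() {
    # Stop early if the lttoolbox tools are not on PATH.
    for tool in lt-comp lt-trim; do
        command -v "$tool" >/dev/null 2>&1 || {
            echo "missing tool: $tool" >&2
            return 1
        }
    done

    # Compile the analyser left-to-right: analyses end up on the right side.
    lt-comp lr "$1" "$3.automorf.bin" &&
    # Compile the bidix left-to-right: its input side is the left side.
    lt-comp lr "$2" "$3.autobil.bin" &&
    # Keep only the analyses that the bidix would accept.
    lt-trim "$3.automorf.bin" "$3.autobil.bin" "$3.automorf-trimmed.bin"
}

# Example call (file names are placeholders):
#   build_trimmed_analyser apertium-ca-en.ca.dix apertium-ca-en.ca-en.dix ca-en
```

The trimmed file is written under a separate name here so that the untrimmed analyser stays available for comparison.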

Implementation

See also