Difference between revisions of "Talk:Why we trim"

(idea)
 
Line 20:

# Interchunk should need no change, since it doesn't deal with the insides of the chunk.
# Postchunk might need a slight change to notice and output the original surface form.
# Generation might need a slight change to notice and output the original surface form iff there was no possible generation.

</pre>

Revision as of 11:40, 12 April 2013

One possible way to keep surface forms when splitting MWEs (multiword expressions): put the full surface form on the first part and an empty surface form on the remaining parts. For example, assuming "magasin" is missing from the bidix:

$ echo vannmagasin | lt-proc -we nb-nn.automorf.bin
^vannmagasin/vann<n><nt><sg><ind><cmp>+magasin<n><nt><sg><ind>/vann<n><nt><sg><ind><cmp>+magasin<n><nt><pl><ind>$
$ echo vannmagasin | lt-proc -we nb-nn.automorf.bin | apertium-tagger -gp nb-nn.prob
^vannmagasin/vann<n><nt><sg><ind><cmp>+magasin<n><nt><sg><ind>$

$ echo vannmagasin | lt-proc -we nb-nn.automorf.bin | apertium-tagger -gp nb-nn.prob | apertium-pretransfer
^vannmagasin/vann<n><nt><sg><ind><cmp>$ ^/magasin<n><nt><sg><ind>$
# Currently, apertium-pretransfer outputs ^magasin<n><nt><sg><ind>$ for the second part; we'd need it to emit an empty surface form (^/magasin<n><nt><sg><ind>$) here
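
The proposed splitting behaviour can be sketched in a few lines of Python. This is only an illustration of the idea, not how apertium-pretransfer is implemented: `split_compound` and the `LU` pattern are hypothetical names, the input is assumed to hold one analysis per lexical unit (as after tagging), and escaped characters and `#` multiword markers are ignored.

```python
import re

# Hypothetical pattern, not taken from apertium-pretransfer: one lexical unit
# of the form ^surface/analysis$ with a single analysis.
LU = re.compile(r"\^([^/$^]*)/([^/$^]*)\$")


def split_compound(stream: str) -> str:
    """Split '+'-joined analyses, keeping the surface form on the first part."""

    def repl(match: re.Match) -> str:
        surface, analysis = match.group(1), match.group(2)
        parts = analysis.split("+")
        # The first part carries the full surface form; the rest get an empty one.
        units = [f"^{surface}/{parts[0]}$"]
        units += [f"^/{part}$" for part in parts[1:]]
        return " ".join(units)

    return LU.sub(repl, stream)


tagged = "^vannmagasin/vann<n><nt><sg><ind><cmp>+magasin<n><nt><sg><ind>$"
print(split_compound(tagged))
```

Run on the tagger output above, the sketch prints the two lexical units shown for the proposed apertium-pretransfer step; a unit without `+` passes through unchanged.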

$ echo vannmagasin | lt-proc -we nb-nn.automorf.bin | apertium-tagger -gp nb-nn.prob | apertium-pretransfer | lt-proc -o nb-nn.autobil.bin
^*vannmagasin/vann<n><nt><sg><ind><cmp>/vatn<n><nt><sg><ind><cmp>$ ^/magasin<n><nt><sg><ind>/@magasin<n><nt><sg><ind>$
# Currently, lt-proc -o only _accepts_ surface forms; it doesn't output them (nor does it output @analysis correctly)
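
A rough Python sketch of the lookup step wished for here: the surface form is copied through untouched, and an analysis missing from the dictionary gets an `@`-prefixed copy of itself as its translation. Everything in it is an assumption made for illustration: `BIDIX` is a toy lemma-to-lemma table (a real bidix also maps tags), `biltrans` is a hypothetical name, and the leading `*` seen in the example output is not modelled.

```python
import re

# Toy bilingual dictionary (assumption): "magasin" is deliberately missing,
# as in the example above.
BIDIX = {"vann": "vatn"}

# Hypothetical pattern for one lexical unit: ^surface/analysis$
LU = re.compile(r"\^([^/$^]*)/([^/$^]*)\$")


def biltrans(stream: str, bidix: dict) -> str:
    """Append a target analysis to each unit, passing the surface form through."""

    def repl(match: re.Match) -> str:
        surface, analysis = match.group(1), match.group(2)
        # Separate the lemma from its tag string, e.g. "vann" and "<n><nt>...".
        lemma, tags = re.match(r"([^<]*)(.*)", analysis).groups()
        if lemma in bidix:
            target = bidix[lemma] + tags
        else:
            # Unknown to the dictionary: mark the untranslated analysis with "@".
            target = "@" + analysis
        return f"^{surface}/{analysis}/{target}$"

    return LU.sub(repl, stream)


split = "^vannmagasin/vann<n><nt><sg><ind><cmp>$ ^/magasin<n><nt><sg><ind>$"
print(biltrans(split, BIDIX))
```

The empty surface form on the second unit survives the lookup, so later stages can still tell that "magasin" belonged to the surface form carried by the first unit.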

$ echo vannmagasin | lt-proc -we nb-nn.automorf.bin | apertium-tagger -gp nb-nn.prob | apertium-pretransfer | lt-proc -o nb-nn.autobil.bin | apertium-transfer -o apertium-nn-nb.nb-nn.t1x nb-nn.t1x.bin 
^n_n<n><nt><sg><ind>{^vannmagasin/vatn<n><nt><sg><ind><cmp>$^/@magasin<n><nt><sg><ind>$}$
# apertium-transfer would need an -o option that passes the surface form through into the chunk

# Interchunk should need no change, since it doesn't deal with the insides of the chunk. 
# Postchunk might need a slight change to notice and output the original surface form.