Talk:Why we trim
One possible way to keep surface forms when splitting multiword expressions (MWEs): put the full surface form on the first part and leave the surface form empty on the rest. An example, assuming "magasin" is missing from the bidix:
$ echo vannmagasin | lt-proc -we nb-nn.automorf.bin
^vannmagasin/vann<n><nt><sg><ind><cmp>+magasin<n><nt><sg><ind>/vann<n><nt><sg><ind><cmp>+magasin<n><nt><pl><ind>$

$ echo vannmagasin | lt-proc -we nb-nn.automorf.bin | apertium-tagger -gp nb-nn.prob
^vannmagasin/vann<n><nt><sg><ind><cmp>+magasin<n><nt><sg><ind>$

$ echo vannmagasin | lt-proc -we nb-nn.automorf.bin | apertium-tagger -gp nb-nn.prob | apertium-pretransfer
^vannmagasin/vann<n><nt><sg><ind><cmp>$ ^/magasin<n><nt><sg><ind>$
# Currently, apertium-pretransfer outputs ^magasin<n><nt><sg><ind>$, we'd need it to ensure an empty surface form here

$ echo vannmagasin | lt-proc -we nb-nn.automorf.bin | apertium-tagger -gp nb-nn.prob | apertium-pretransfer | lt-proc -o nb-nn.autobil.bin
^*vannmagasin/vann<n><nt><sg><ind><cmp>/vatn<n><nt><sg><ind><cmp>$ ^/magasin<n><nt><sg><ind>/@magasin<n><nt><sg><ind>$
# Currently, lt-proc -o only _accepts_ surface forms, it doesn't output them (nor does it output @analysis correctly)

$ echo vannmagasin | lt-proc -we nb-nn.automorf.bin | apertium-tagger -gp nb-nn.prob | apertium-pretransfer | lt-proc -o nb-nn.autobil.bin | apertium-transfer -o apertium-nn-nb.nb-nn.t1x nb-nn.t1x.bin
^n_n<n><nt><sg><ind>{^vannmagasin/vatn<n><nt><sg><ind><cmp>$^/@magasin<n><nt><sg><ind>$}$
# apertium-transfer would need an -o option that is able to pass through the surface form
# Interchunk should need no change, since it doesn't deal with the insides of the chunk.
# Postchunk might need a slight change to notice and output the original surface form.
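The splitting rule proposed for the pretransfer step (full surface form on the first part, empty surface form on the rest) could be sketched roughly as follows. This is only an illustration in Python, not how apertium-pretransfer is implemented: the function name `split_keeping_surface` is hypothetical, and the sketch assumes each lexical unit carries a single, already disambiguated analysis, with no escaped characters and no `#` multiword queue.

```python
import re

# One lexical unit in stream format: ^surface/analysis$
# Assumption: exactly one analysis per unit and no escaped ^, / or $.
LEXICAL_UNIT = re.compile(r"\^([^/^$]*)/([^/^$]*)\$")


def split_keeping_surface(stream: str) -> str:
    """Split '+'-joined analyses; only the first part keeps the surface form."""

    def split_unit(match: re.Match) -> str:
        surface, analysis = match.group(1), match.group(2)
        parts = analysis.split("+")
        # The first part carries the full original surface form.
        units = [f"^{surface}/{parts[0]}$"]
        # Every following part gets an empty surface form.
        units += [f"^/{part}$" for part in parts[1:]]
        return " ".join(units)

    return LEXICAL_UNIT.sub(split_unit, stream)


if __name__ == "__main__":
    tagged = "^vannmagasin/vann<n><nt><sg><ind><cmp>+magasin<n><nt><sg><ind>$"
    print(split_keeping_surface(tagged))
```

Fed the tagger output from the example above, this yields the two units shown for the pretransfer step; a unit whose analysis contains no `+` passes through unchanged.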