Difference between revisions of "Faroese and English"
Revision as of 18:16, 19 November 2009
A test pair using Matxin for the transfer step.
Download from:
https://matxin.svn.sourceforge.net/svnroot/matxin/branches/matxin-fo-en
Needs VISL CG-3 from SVN and hacked Matxin:
https://matxin.svn.sourceforge.net/svnroot/matxin/branches/matxin
Good luck!
Notes for developers
The pipeline looks like this:
lt-proc -w fo-en.automorf.bin |\
cg-proc -w fo-en.dis.rlx.bin |\
apertium-tagger -pg fo-en.prob |\
cg-proc -wf2 fo-en.dep.rlx.bin |\
matxin-xfer-lex -s fo-en.en_sem.bin -c matxin-fo-en.fo-en.chunk_type.dat fo-en.autobil.bin |\
matxin-gen-intra matxin-fo-en.en.order_intrachunk.dat matxin-fo-en.en.changes_sint.dat |\
matxin-gen-inter matxin-fo-en.en.order_interchunk.dat |\
matxin-gen-morph fo-en.autogen.bin |\
matxin-reformat
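When debugging, it can help to run only the first few stages and inspect the intermediate output. The following is a minimal sketch, not part of the language pair: the helper name build_pipeline and the STAGES list are hypothetical, and the stage strings are simply copied from the pipeline above. It only builds the shell command string; it does not run anything.

```python
# Stages copied verbatim from the pipeline above. File names are assumed
# to be relative to the directory where the pair was compiled.
STAGES = [
    "lt-proc -w fo-en.automorf.bin",
    "cg-proc -w fo-en.dis.rlx.bin",
    "apertium-tagger -pg fo-en.prob",
    "cg-proc -wf2 fo-en.dep.rlx.bin",
    "matxin-xfer-lex -s fo-en.en_sem.bin -c matxin-fo-en.fo-en.chunk_type.dat fo-en.autobil.bin",
    "matxin-gen-intra matxin-fo-en.en.order_intrachunk.dat matxin-fo-en.en.changes_sint.dat",
    "matxin-gen-inter matxin-fo-en.en.order_interchunk.dat",
    "matxin-gen-morph fo-en.autogen.bin",
    "matxin-reformat",
]


def build_pipeline(stages, upto=None):
    """Join the first `upto` stages (all of them if None) with shell pipes."""
    return " | ".join(stages[:upto])


if __name__ == "__main__":
    # Print the command that stops after the dependency CG (stage 4),
    # so its output can be inspected before it reaches Matxin.
    print(build_pipeline(STAGES, 4))
```

The printed string can be pasted after an echo of the test sentence, for example to check which words received the CHUNK tag before transfer.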
Writing a dependency grammar CG for Matxin
End your file with something like this:
AFTER-SECTIONS
SUBSTITUTE (@SUBJ→) (@SUBJ→ CHUNK) TARGET (@SUBJ→);
SUBSTITUTE (@←SUBJ) (@←SUBJ CHUNK) TARGET (@←SUBJ);
SUBSTITUTE (@OBJ→) (@OBJ→ CHUNK) TARGET (@OBJ→);
SUBSTITUTE (@←OBJ) (@←OBJ CHUNK) TARGET (@←OBJ);
...
to define what you want to be a chunk.
Note that Matxin expects that no non-CHUNK node has a CHUNK child, so any word that might get a CHUNK daughter in the CG must also get the CHUNK label itself.
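The constraint above can be checked mechanically on a dependency tree. The sketch below is only an illustration under assumptions: the Node class and the chunk_violations function are hypothetical and do not correspond to any Matxin or CG-3 data structure, and the tags on the example words are made up for the sentence "fáa vit svar".

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical dependency node: a word form, its CG tags, its daughters."""
    form: str
    tags: set
    children: list = field(default_factory=list)


def chunk_violations(node):
    """Return (parent, child) form pairs where a non-CHUNK node has a CHUNK child."""
    found = []
    for child in node.children:
        if "CHUNK" in child.tags and "CHUNK" not in node.tags:
            found.append((node.form, child.form))
        # Recurse so that violations deeper in the tree are reported too.
        found.extend(chunk_violations(child))
    return found


if __name__ == "__main__":
    # Illustrative tree: the verb has two CHUNK daughters but no CHUNK tag.
    verb = Node("fáa", set(), [
        Node("vit", {"@←SUBJ", "CHUNK"}),
        Node("svar", {"@←OBJ", "CHUNK"}),
    ])
    print(chunk_violations(verb))
```

In this example the verb is reported twice, once per daughter. Adding CHUNK to the verb's tags makes the list empty, which is the situation the note above asks for.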
See also
pronoun list: http://omilia.uio.no/scanlex/toflur/fo.html
Problems
Node insertion
How do we insert a node? I.e., what's the Matxin equivalent of <out><lu>...</lu></out>? This will be necessary for, e.g., adding determiners in front of nouns (í dag fáa vit svar => today we get a reply).
Deformatter
For some reason, the deformatter always appends "space dot" to the end of its input:
$ echo "Hví ikki?" | matxin-destxt /tmp/foo
gives
Hví ikki? .
Interchunk movement
Documentation of Matxin#Basque interchunk ordering grammars says
x2+x1 -- The child chunk (x2) is put immediately before the parent chunk (x1)
However, is any behaviour defined for x1+x2? When I try "fáa vit" (where "fáa" is the parent) with the rule
true true =1 x1.x2
I get the expected "Get we". However, if I have x1+x2, I get "We get". So it seems that if you have a +, the order of the x-es is simply ignored. I tested this with the >1 position too ("fáa vit ikki"), with the same result.
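The observation above can be summarised as a toy model. This is not Matxin code: order_pair is a hypothetical function that only encodes what was seen in the tests described here, with x1 as the parent chunk and x2 as the child chunk. That "." follows the written order is an assumption based on the single x1.x2 test; x2.x1 was not tried.

```python
def order_pair(pattern, parent, child):
    """Toy model of the observed ordering for patterns such as 'x1.x2' or 'x2+x1'."""
    first, sep, second = pattern[:2], pattern[2], pattern[3:]
    chunks = {"x1": parent, "x2": child}
    if first not in chunks or second not in chunks:
        raise ValueError("unknown chunk name in pattern: " + pattern)
    if sep == "+":
        # Observed: the child ends up immediately before the parent,
        # whichever way round the x-es are written.
        return [child, parent]
    if sep == ".":
        # Assumption: '.' keeps the chunks in the order written.
        return [chunks[first], chunks[second]]
    raise ValueError("unknown separator in pattern: " + pattern)


if __name__ == "__main__":
    for pattern in ("x1.x2", "x1+x2", "x2+x1"):
        print(pattern, order_pair(pattern, "get", "we"))
```

Under this model x1+x2 and x2+x1 give the same result, which is exactly the behaviour being questioned.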