Difference between revisions of "Faroese and English"
Revision as of 16:57, 18 November 2009
A test pair using Matxin for the transfer step.
Download from:
https://matxin.svn.sourceforge.net/svnroot/matxin/branches/matxin-fo-en
Needs VISL CG-3 from SVN and hacked Matxin:
https://matxin.svn.sourceforge.net/svnroot/matxin/branches/matxin
Good luck!
Notes for developers
The pipeline looks like this:
lt-proc -w fo-en.automorf.bin |\
cg-proc -w fo-en.dis.rlx.bin |\
apertium-tagger -pg fo-en.prob |\
cg-proc -wf2 fo-en.dep.rlx.bin |\
matxin-xfer-lex -s fo-en.en_sem.bin -c matxin-fo-en.fo-en.chunk_type.dat fo-en.autobil.bin |\
matxin-gen-intra matxin-fo-en.en.order_intrachunk.dat matxin-fo-en.en.changes_sint.dat |\
matxin-gen-inter matxin-fo-en.en.order_interchunk.dat |\
matxin-gen-morph fo-en.autogen.bin |\
matxin-reformat
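When debugging, it can help to cut the pipeline off after a given stage and look at the intermediate output. The following is a minimal sketch of that idea, not part of the language pair itself. It assumes the tools are on the PATH and the data files named in the pipeline above are in the current directory; the names `STAGES`, `pipeline_command` and `translate` are made up for this example.

```python
import shlex
import subprocess

# Stages copied from the pipeline above, one argv list per command.
STAGES = [
    ["lt-proc", "-w", "fo-en.automorf.bin"],
    ["cg-proc", "-w", "fo-en.dis.rlx.bin"],
    ["apertium-tagger", "-pg", "fo-en.prob"],
    ["cg-proc", "-wf2", "fo-en.dep.rlx.bin"],
    ["matxin-xfer-lex", "-s", "fo-en.en_sem.bin",
     "-c", "matxin-fo-en.fo-en.chunk_type.dat", "fo-en.autobil.bin"],
    ["matxin-gen-intra", "matxin-fo-en.en.order_intrachunk.dat",
     "matxin-fo-en.en.changes_sint.dat"],
    ["matxin-gen-inter", "matxin-fo-en.en.order_interchunk.dat"],
    ["matxin-gen-morph", "fo-en.autogen.bin"],
    ["matxin-reformat"],
]


def pipeline_command(stages=STAGES, upto=None):
    """Return a shell pipeline string for the first `upto` stages (all if None)."""
    selected = stages[:upto]
    return " | ".join(shlex.join(argv) for argv in selected)


def translate(text, upto=None):
    """Feed `text` through the pipeline and return its standard output.

    Hypothetical helper: it needs the real binaries and data files to run.
    With a shell pipeline, check=True only sees the exit status of the
    last command.
    """
    result = subprocess.run(
        pipeline_command(upto=upto),
        shell=True, input=text, capture_output=True, text=True, check=True,
    )
    return result.stdout
```

For example, `translate("í dag fáa vit svar", upto=4)` would stop after the dependency CG, which is the stage whose output Matxin consumes.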
Writing a dependency grammar CG for Matxin
End your file with something like this:
AFTER-SECTIONS
SUBSTITUTE (@SUBJ→) (@SUBJ→ CHUNK) TARGET (@SUBJ→);
SUBSTITUTE (@←SUBJ) (@←SUBJ CHUNK) TARGET (@←SUBJ);
SUBSTITUTE (@OBJ→) (@OBJ→ CHUNK) TARGET (@OBJ→);
SUBSTITUTE (@←OBJ) (@←OBJ CHUNK) TARGET (@←OBJ);
...
to define what you want to be a chunk.
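Since every rule in that block has the same shape, the block could be generated from a list of labels rather than typed by hand. The sketch below only illustrates the pattern; `chunk_rules` and `after_sections` are hypothetical helper names, and the four labels are the ones shown above, not a complete list for the grammar.

```python
def chunk_rules(tags):
    """Yield one SUBSTITUTE rule per tag, adding CHUNK to readings that carry it."""
    for tag in tags:
        yield f"SUBSTITUTE ({tag}) ({tag} CHUNK) TARGET ({tag});"


def after_sections(tags):
    """Return the text of an AFTER-SECTIONS block marking `tags` as chunks."""
    return "\n".join(["AFTER-SECTIONS", *chunk_rules(tags)])


# The four labels shown above; a real grammar would list more.
print(after_sections(["@SUBJ→", "@←SUBJ", "@OBJ→", "@←OBJ"]))
```

The printed text can be pasted at the end of the CG file.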
Note that Matxin expects that no non-CHUNK node has a CHUNK child, so any word which might get a CHUNK daughter in the CG must itself get the CHUNK label.
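The constraint can be checked mechanically on a parsed sentence. The following is a rough sketch under an assumed data layout: a dictionary mapping each word id to a `(parent_id, is_chunk)` pair. That layout and the name `chunk_violations` are invented for illustration and are not a Matxin or CG-3 format.

```python
def chunk_violations(nodes):
    """Return (parent_id, child_id) pairs where a non-CHUNK node has a CHUNK child.

    `nodes` maps a word id to a (parent_id, is_chunk) pair; the root has
    parent_id None. This layout is hypothetical, not a Matxin format.
    """
    violations = []
    for word_id, (parent_id, is_chunk) in nodes.items():
        if parent_id is None or not is_chunk:
            continue  # the root and non-CHUNK words cannot break the rule
        parent_is_chunk = nodes[parent_id][1]
        if not parent_is_chunk:
            violations.append((parent_id, word_id))
    return violations


# Word 2 lacks CHUNK but has the CHUNK daughter 3, so Matxin would object.
bad_tree = {1: (None, True), 2: (1, False), 3: (2, True)}
# Giving word 2 the CHUNK label as well resolves it.
good_tree = {1: (None, True), 2: (1, True), 3: (2, True)}
```

Here `chunk_violations(bad_tree)` reports the pair `(2, 3)`, while `chunk_violations(good_tree)` returns an empty list.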
See also
Problems
- How do we insert a node, i.e. what is the Matxin equivalent of <out><lu>...</lu></out>? This will be necessary for, e.g., adding determiners in front of nouns (í dag fáa vit svar => today we get a reply).
Tests
- Faroese and English/Regression tests
- Faroese and English/Pending tests