Difference between revisions of "Faroese and English"

From Apertium
{{TOCD}}
A test pair using [[Matxin]] for the transfer step.

Download from:

<pre>
https://matxin.svn.sourceforge.net/svnroot/matxin/branches/matxin-fo-en
</pre>

Needs [[VISL CG-3]] from SVN and a hacked version of Matxin:

<pre>
https://matxin.svn.sourceforge.net/svnroot/matxin/branches/matxin
</pre>

Good luck!

==Notes for developers==
The pipeline looks like this:
<pre>
lt-proc -w fo-en.automorf.bin |\
cg-proc -w fo-en.dis.rlx.bin |\
apertium-tagger -pg fo-en.prob |\
cg-proc -wf2 fo-en.dep.rlx.bin |\
matxin-xfer-lex -s fo-en.en_sem.bin -c matxin-fo-en.fo-en.chunk_type.dat fo-en.autobil.bin |\
matxin-gen-intra matxin-fo-en.en.order_intrachunk.dat matxin-fo-en.en.changes_sint.dat |\
matxin-gen-inter matxin-fo-en.en.order_interchunk.dat |\
matxin-gen-morph fo-en.autogen.bin |\
matxin-reformat
</pre>
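If you run this often, the pipeline can be wrapped in a small shell function that reports missing tools up front instead of failing partway through. This is only a sketch, not part of the pair: the function names `require_tools` and `fo_en_translate` are hypothetical, and it assumes you call it from the directory that holds the compiled `.bin` and `.dat` files. The commands themselves are exactly the ones listed above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each command that is not on PATH and
# return non-zero if any of them is missing.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Hypothetical wrapper around the fo-en pipeline from this page.
# The body is a subshell ( ... ), so `set -o pipefail` does not leak into
# the caller; pipefail needs a shell that supports it, such as bash.
# Assumes the data files are in the current directory.
fo_en_translate() (
    set -o pipefail
    require_tools lt-proc cg-proc apertium-tagger matxin-xfer-lex \
        matxin-gen-intra matxin-gen-inter matxin-gen-morph matxin-reformat || exit 1
    lt-proc -w fo-en.automorf.bin |
    cg-proc -w fo-en.dis.rlx.bin |
    apertium-tagger -pg fo-en.prob |
    cg-proc -wf2 fo-en.dep.rlx.bin |
    matxin-xfer-lex -s fo-en.en_sem.bin -c matxin-fo-en.fo-en.chunk_type.dat fo-en.autobil.bin |
    matxin-gen-intra matxin-fo-en.en.order_intrachunk.dat matxin-fo-en.en.changes_sint.dat |
    matxin-gen-inter matxin-fo-en.en.order_interchunk.dat |
    matxin-gen-morph fo-en.autogen.bin |
    matxin-reformat
)
```

With `pipefail` set, the function's exit status is non-zero if any stage fails, not just the last one (`matxin-reformat`).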

===Writing a dependency grammar CG for Matxin===
End your file with something like this:
<pre>
AFTER-SECTIONS

SUBSTITUTE (@SUBJ→) (@SUBJ→ CHUNK) TARGET (@SUBJ→);
SUBSTITUTE (@←SUBJ) (@←SUBJ CHUNK) TARGET (@←SUBJ);
SUBSTITUTE (@OBJ→) (@OBJ→ CHUNK) TARGET (@OBJ→);
SUBSTITUTE (@←OBJ) (@←OBJ CHUNK) TARGET (@←OBJ);
...
</pre>
to define what you want to be a chunk.

Note that Matxin expects that no non-CHUNK node has a CHUNK child, so any word that might get a CHUNK daughter in the CG must also get the CHUNK label.
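To catch violations of this constraint before they reach Matxin, you could dump the dependency tree and check it. The sketch below does not parse real CG-3 output. It assumes a hypothetical simplified format with one node per line, `ID HEAD TAGS...`, where HEAD is 0 for the virtual root, and the function name `check_chunk_parents` is likewise hypothetical. It prints every non-CHUNK node that has a CHUNK child and exits with status 1 if it finds any.

```shell
# Hypothetical checker for Matxin's CHUNK constraint: no non-CHUNK node may
# have a CHUNK child. Input (assumed, simplified format), one node per line:
#   ID HEAD TAGS...
# HEAD is 0 for the virtual root; TAGS may include the word CHUNK.
# Prints one line per violation; exits 1 if any were found, else 0.
check_chunk_parents() {
    awk '
        {
            head[$1] = $2
            chunk[$1] = 0
            for (i = 3; i <= NF; i++)
                if ($i == "CHUNK") chunk[$1] = 1
        }
        END {
            bad = 0
            for (id in head) {
                p = head[id]
                # The virtual root 0 is not a word, so it is exempt.
                if (chunk[id] && p != "0" && !chunk[p]) {
                    print "node " p " has CHUNK child " id " but is not CHUNK"
                    bad = 1
                }
            }
            exit bad
        }
    '
}
```

For example, `printf '1 2 @SUBJ CHUNK\n2 0 verb\n' | check_chunk_parents` reports node 2, because the verb is not a CHUNK but its subject is. When several violations are present, the order of the reported lines is unspecified, since awk's `for (id in head)` loop has no fixed order.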

==See also==
* [[Hfst#Using|how to use the Faroese FST from giellatekno]]
* [http://omilia.uio.no/scanlex/toflur/fo.html pronoun list]
==Problems==
===Node insertion===
How do we insert a node, i.e. what is the Matxin equivalent of <code><out><lu>...</lu></out></code> in Apertium transfer rules? This will be necessary for, e.g., adding determiners in front of nouns (í dag fáa vit svar => today we get '''a''' reply).
===Deformatter===
For some reason, the deformatter always adds " ." (a space followed by a full stop) to the end of its output:

<pre>
$ echo "Hví ikki?" | matxin-destxt /tmp/foo
</pre>
gives
<pre>
Hví ikki? .
</pre>
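Until the cause is found, one possible stopgap is to strip the extra " ." in post-processing. This is untested against Matxin and assumes the " ." survives all the way to the final output; `strip_trailing_dot` is a hypothetical name. It only touches the last line, but it would also remove a " ." that was genuinely present at the end of the input.

```shell
# Hypothetical stopgap for the deformatter issue above: remove a single
# " ." from the end of the last line of stdin. Other lines are untouched.
# Caveat: a " ." that was really in the input is removed as well.
strip_trailing_dot() {
    sed '$s/ \.$//'
}
```

It would sit at the very end of the pipeline, e.g. `... | matxin-reformat | strip_trailing_dot`.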

==Tests==
* [[Faroese and English/Regression tests]]
* [[Faroese and English/Pending tests]]

[[Category:Faroese and English|*]]
[[Category:Matxin]]

Latest revision as of 13:29, 10 December 2010
