Difference between revisions of "Faroese and English"

Latest revision as of 13:29, 10 December 2010

Notes for developers

The pipeline looks like this:

 lt-proc -w fo-en.automorf.bin |\
 cg-proc -w fo-en.dis.rlx.bin |\
 apertium-tagger -pg fo-en.prob |\
 cg-proc -wf2 fo-en.dep.rlx.bin |\
 matxin-xfer-lex -s fo-en.en_sem.bin -c matxin-fo-en.fo-en.chunk_type.dat fo-en.autobil.bin |\
 matxin-gen-intra matxin-fo-en.en.order_intrachunk.dat matxin-fo-en.en.changes_sint.dat |\
 matxin-gen-inter matxin-fo-en.en.order_interchunk.dat |\
 matxin-gen-morph fo-en.autogen.bin |\
 matxin-reformat
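If you'd rather drive the pipeline from a script, here is a minimal Python sketch that rebuilds the same command from a list of stages and runs it through the shell. It assumes the binaries are on your PATH and the data files sit in the current directory. `STAGES`, `pipeline_command` and `translate` are just illustrative names, not part of Apertium or Matxin.

```python
import shlex
import subprocess

# The fo-en stages, copied from the pipeline above.
# Assumption: every binary is on PATH and every data file is in the
# current working directory.
STAGES = [
    ["lt-proc", "-w", "fo-en.automorf.bin"],
    ["cg-proc", "-w", "fo-en.dis.rlx.bin"],
    ["apertium-tagger", "-pg", "fo-en.prob"],
    ["cg-proc", "-wf2", "fo-en.dep.rlx.bin"],
    ["matxin-xfer-lex", "-s", "fo-en.en_sem.bin",
     "-c", "matxin-fo-en.fo-en.chunk_type.dat", "fo-en.autobil.bin"],
    ["matxin-gen-intra", "matxin-fo-en.en.order_intrachunk.dat",
     "matxin-fo-en.en.changes_sint.dat"],
    ["matxin-gen-inter", "matxin-fo-en.en.order_interchunk.dat"],
    ["matxin-gen-morph", "fo-en.autogen.bin"],
    ["matxin-reformat"],
]


def pipeline_command(stages=STAGES):
    """Join the stages into one shell pipeline, quoting each argument."""
    return " | ".join(shlex.join(argv) for argv in stages)


def translate(text, stages=STAGES):
    """Feed `text` through the pipeline and return its standard output.

    The shell only connects the stages. With check=True a non-zero exit
    status of the last stage raises subprocess.CalledProcessError.
    """
    result = subprocess.run(
        pipeline_command(stages),
        shell=True,
        input=text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

With the pair compiled, something like `translate("í dag fáa vit svar")` should return the English output. Keeping the stages as argument lists makes it easy to stop early while debugging, e.g. `translate(text, STAGES[:4])` to look at the dependency CG output.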

Writing a dependency grammar CG for Matxin

End your file with something like this:

 
 AFTER-SECTIONS
 SUBSTITUTE (@SUBJ→) (@SUBJ→ CHUNK) TARGET (@SUBJ→);
 SUBSTITUTE (@←SUBJ) (@←SUBJ CHUNK) TARGET (@←SUBJ);
 SUBSTITUTE (@OBJ→)  (@OBJ→ CHUNK)  TARGET (@OBJ→);
 SUBSTITUTE (@←OBJ)  (@←OBJ CHUNK)  TARGET (@←OBJ);
 ...

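to define what you want to be a chunk: each rule keeps the function tag (here subject or object) and adds the CHUNK tag to readings that carry it.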
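A toy Python model of what each of those SUBSTITUTE rules does to a single reading may help: the function tag is kept and CHUNK is added next to it. This is only an illustration, not how CG-3 works internally. It ignores contexts, ambiguity and rule order, covers only the four rules listed (not the "..."), and the example tags are made up.

```python
# Function tags that the four SUBSTITUTE rules above turn into chunks.
CHUNK_FUNCTIONS = {"@SUBJ→", "@←SUBJ", "@OBJ→", "@←OBJ"}


def add_chunk_label(tags):
    """Toy model of SUBSTITUTE (@X) (@X CHUNK) TARGET (@X);

    Returns a new tag list where CHUNK follows the first tag found in
    CHUNK_FUNCTIONS. A reading that already has CHUNK comes back
    unchanged, so applying the function twice is harmless.
    """
    tags = list(tags)
    if "CHUNK" in tags:
        return tags
    for i, tag in enumerate(tags):
        if tag in CHUNK_FUNCTIONS:
            return tags[:i + 1] + ["CHUNK"] + tags[i + 1:]
    # No chunk-forming function: leave the reading alone.
    return tags
```

For instance, a made-up reading `["svar", "n", "@←OBJ", "#5->3"]` becomes `["svar", "n", "@←OBJ", "CHUNK", "#5->3"]`, while a reading with some other function tag is left as it is.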

Note that Matxin expects that no non-CHUNK node has CHUNK children, so any word which might get a CHUNK daughter in the CG also has to get the CHUNK label.
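To find the words that need the extra label, you can treat the CG output as a tree and walk up from every CHUNK word: all of its ancestors must be CHUNKs too. A minimal Python sketch, assuming each word is identified by its position and its head is taken from CG-3's `#n->m` dependency tags (0 meaning the root). The analysis of the example sentence is made up for illustration, and the function names are hypothetical.

```python
def violations(parents, chunks):
    """Return sorted (word, head) pairs where a CHUNK word hangs under a non-CHUNK head.

    parents: dict mapping each word's position to its head's position (0 = root).
    chunks:  set of positions whose reading carries the CHUNK tag.
    """
    return sorted(
        (word, head)
        for word, head in parents.items()
        if word in chunks and head != 0 and head not in chunks
    )


def close_chunks(parents, chunks):
    """Return the smallest superset of `chunks` in which every head of a
    CHUNK word is itself a CHUNK word, i.e. chunks plus all their ancestors."""
    closed = set(chunks)
    for word in chunks:
        head = parents.get(word, 0)
        # Climb towards the root until we reach it or an already-labelled word.
        while head != 0 and head not in closed:
            closed.add(head)
            head = parents.get(head, 0)
    return closed


# Made-up analysis of "í dag fáa vit svar" (1=í 2=dag 3=fáa 4=vit 5=svar).
PARENTS = {1: 3, 2: 1, 3: 0, 4: 3, 5: 3}
```

If vit (4) and svar (5) get CHUNK from the subject and object rules and both hang under fáa (3), `violations(PARENTS, {4, 5})` reports `[(4, 3), (5, 3)]` and `close_chunks(PARENTS, {4, 5})` gives `{3, 4, 5}`: the verb needs the label as well. In the grammar itself that means adding CHUNK rules for such heads; the sketch only shows which words they are.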

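See also

* how to use the Faroese FST from giellatekno (see the Hfst page, section "Using")
* pronoun list: http://omilia.uio.no/scanlex/toflur/fo.html

Problems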

Node insertion

How do we insert a node, i.e. what is the Matxin equivalent of <out><lu>...</lu></out> in Apertium transfer rules? This will be necessary e.g. for adding determiners in front of nouns (í dag fáa vit svar => today we get a reply, where the English "a" has no counterpart in the Faroese).
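Until there is a proper answer, one way to experiment is to add the node yourself with a small filter over the XML passed between the matxin-* stages. The sketch below is a guess, not Matxin's mechanism: it assumes that format has CHUNK elements containing NODE elements with `lem` and `mi` attributes, and the `mi` values `n` and `det` are hypothetical. It also leaves out any other attributes (such as `ord`) that later stages may require, and a real rule would have to check at least that the noun is indefinite and singular.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment of the intermediate XML after lexical transfer.
# Element and attribute names (SENTENCE, CHUNK, NODE, lem, mi) are assumptions.
SAMPLE = """<SENTENCE ord="1">
  <CHUNK ord="2" type="sn">
    <NODE ord="5" lem="reply" mi="n"/>
  </CHUNK>
</SENTENCE>"""


def insert_indefinite_article(root, noun_mi="n"):
    """Give every noun NODE without a determiner a new dependent NODE "a".

    The new node becomes the noun's first child; where it ends up in the
    output would still be decided by the ordering stages downstream.
    """
    # Collect the nodes first so the tree is not modified mid-iteration.
    for noun in list(root.iter("NODE")):
        if noun.get("mi") != noun_mi:
            continue
        if any(child.get("mi") == "det" for child in noun):
            continue  # already has a determiner
        det = ET.Element("NODE", {"lem": "a", "mi": "det"})  # hypothetical tags
        noun.insert(0, det)
    return root
```

Running it twice on the same tree adds only one determiner, since the second pass finds the existing `det` child.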

Deformatter

For some reason, the deformatter always appends "space dot" (a space followed by a full stop) to the end of its output, even when the text already ends in punctuation. For example,

 $ echo "Hví ikki?" | matxin-destxt /tmp/foo

gives

 Hví ikki? .
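Until the cause is found, a stopgap is to drop that extra " ." from the final output rather than touching the deformatter. A minimal Python sketch, assuming (not verified) that the spurious dot survives translation unchanged as the last two characters before any trailing newlines; `strip_spurious_dot` is a hypothetical helper name.

```python
def strip_spurious_dot(text):
    """Remove one trailing " " + "." pair, keeping trailing newlines intact."""
    body = text.rstrip("\n")
    newlines = text[len(body):]
    if body.endswith(" ."):
        body = body[:-2]
    return body + newlines
```

It removes only one occurrence, so a sentence ending in a normal full stop keeps it. Text that legitimately ends in " ." would lose its final dot, which is the price of this workaround.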

Tests

* Faroese and English/Regression tests
* Faroese and English/Pending tests

@@ Line 1: / Line 1: @@
+{{TOCD}}
-a test pair using [[Matxin]] for the transfer step
+A test pair using [[Matxin]] for the transfer step.
 Download from:
@@ Line 7: / Line 8: @@
 </pre>
-Needs hacked Matxin:
+Needs [[VISL CG-3]] from SVN and hacked Matxin:
 <pre>
@@ Line 14: / Line 15: @@
 Good luck!
+==Notes for developers==
+The pipeline looks like this:
+<pre>
+ lt-proc -w fo-en.automorf.bin |\
+ cg-proc -w fo-en.dis.rlx.bin |\
+ apertium-tagger -pg fo-en.prob |\
+ cg-proc -wf2 fo-en.dep.rlx.bin |\
+ matxin-xfer-lex -s fo-en.en_sem.bin -c matxin-fo-en.fo-en.chunk_type.dat fo-en.autobil.bin |\
+ matxin-gen-intra matxin-fo-en.en.order_intrachunk.dat matxin-fo-en.en.changes_sint.dat |\
+ matxin-gen-inter matxin-fo-en.en.order_interchunk.dat |\
+ matxin-gen-morph fo-en.autogen.bin |\
+ matxin-reformat
+</pre>
+===Writing a dependency grammar CG for Matxin===
+End your file with something like this:
+<pre>
+AFTER-SECTIONS
+SUBSTITUTE (@SUBJ→)   (@SUBJ→ CHUNK) TARGET (@SUBJ→);
+SUBSTITUTE (@←SUBJ)  (@←SUBJ CHUNK) TARGET (@←SUBJ);
+SUBSTITUTE (@OBJ→)    (@OBJ→ CHUNK)  TARGET (@OBJ→);
+SUBSTITUTE (@←OBJ)   (@←OBJ CHUNK)  TARGET (@←OBJ);
+...
+</pre>
+to define what you want to be a chunk.
+Note that since Matxin expects that no non-CHUNK nodes may have CHUNK children, any word which might get a CHUNK daughter in the CG has to also get the CHUNK label.
+==See also==
+* [[Hfst#Using|how to use the Faroese FST from giellatekno]]
+* [http://omilia.uio.no/scanlex/toflur/fo.html pronoun list]
+==Problems==
+===Node insertion===
+How do we insert a node? Ie. what's the Matxin equivalent of <code><out><lu>...</lu></out></code>? This will be necessary for eg. adding determiners in front of nouns (í dag fáa vit svar => today we get '''a''' reply)
+===Deformatter===
+For some reason, the deformatter always adds "space dot" to the end of everything,
+ $ echo "Hví ikki?" | matxin-destxt /tmp/foo
+gives
+ Hví ikki? .
 ==Tests==
-[[Faroese and English/Regression tests]]
+* [[Faroese and English/Regression tests]]
-[[Faroese and English/Pending tests]]
+* [[Faroese and English/Pending tests]]
-[[Category:Language pairs]]
+[[Category:Faroese and English|*]]
+[[Category:Matxin]]