Difference between revisions of "Faroese and English"

Revision as of 18:16, 19 November 2009

Notes for developers

The pipeline looks like this:

 lt-proc -w fo-en.automorf.bin |\
 cg-proc -w fo-en.dis.rlx.bin |\
 apertium-tagger -pg fo-en.prob |\
 cg-proc -wf2 fo-en.dep.rlx.bin |\
 matxin-xfer-lex -s fo-en.en_sem.bin -c matxin-fo-en.fo-en.chunk_type.dat fo-en.autobil.bin |\
 matxin-gen-intra matxin-fo-en.en.order_intrachunk.dat matxin-fo-en.en.changes_sint.dat |\
 matxin-gen-inter matxin-fo-en.en.order_interchunk.dat |\
 matxin-gen-morph fo-en.autogen.bin |\
 matxin-reformat
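
Every stage above reads compiled data from disk, so a missing file may only show up as a failure partway through the pipe. A minimal sketch of a pre-flight check follows. It assumes all the files named in the pipeline sit in one directory; the function name check_fo_en_files is hypothetical and not part of Apertium or Matxin.

```shell
#!/bin/sh
# Hypothetical helper: report which data files used by the pipeline are missing.
# Assumption: all files live in the directory given as $1 (default: current dir).
check_fo_en_files() {
    dir="${1:-.}"
    missing=0
    for f in \
        fo-en.automorf.bin \
        fo-en.dis.rlx.bin \
        fo-en.prob \
        fo-en.dep.rlx.bin \
        fo-en.en_sem.bin \
        matxin-fo-en.fo-en.chunk_type.dat \
        fo-en.autobil.bin \
        matxin-fo-en.en.order_intrachunk.dat \
        matxin-fo-en.en.changes_sint.dat \
        matxin-fo-en.en.order_interchunk.dat \
        fo-en.autogen.bin
    do
        if [ ! -f "$dir/$f" ]; then
            # Name each missing file on stderr.
            echo "missing: $f" >&2
            missing=$((missing + 1))
        fi
    done
    # Succeed only if every file was found.
    [ "$missing" -eq 0 ]
}
```

Calling check_fo_en_files before starting the pipe (for example, check_fo_en_files . && lt-proc ...) lists any missing file on stderr and returns a non-zero status.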

Writing a dependency grammar CG for Matxin

End your file with something like this:

 
AFTER-SECTIONS 

SUBSTITUTE (@SUBJ→)   (@SUBJ→ CHUNK) TARGET (@SUBJ→);
SUBSTITUTE (@←SUBJ)  (@←SUBJ CHUNK) TARGET (@←SUBJ);
SUBSTITUTE (@OBJ→)    (@OBJ→ CHUNK)  TARGET (@OBJ→);
SUBSTITUTE (@←OBJ)   (@←OBJ CHUNK)  TARGET (@←OBJ);
...

to define what you want to be a chunk.

Note that Matxin expects only CHUNK nodes to have CHUNK children, so any word that might get a CHUNK daughter in the CG must itself also get the CHUNK label.

Problems

Node insertion

How do we insert a node? That is, what is the Matxin equivalent of Apertium's <out><lu>...</lu></out>? This will be needed, for example, to add determiners in front of nouns (í dag fáa vit svar => today we get a reply, where "a" has no counterpart in the Faroese source).

Deformatter

For some reason, the deformatter always appends " ." (a space followed by a dot) to its output. For example,

 $ echo "Hví ikki?" | matxin-destxt /tmp/foo

gives

 Hví ikki? .
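
Until the cause is found, the extra token could be stripped further down the pipe. A minimal sketch, assuming the spurious " ." always lands at the very end of a line and that no input legitimately ends in a space followed by a dot. The name strip_space_dot is hypothetical and not a Matxin tool. Whether later stages rely on the added dot has not been checked.

```shell
#!/bin/sh
# Hypothetical workaround: drop a trailing " ." (space + dot) from each line.
# Assumption: a line that really ends in " ." never occurs in the input.
strip_space_dot() {
    sed 's/ \.$//'
}
```

Lines that end in a dot with no space before it pass through unchanged, so ordinary sentence-final punctuation is kept.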

Interchunk movement

The "Basque interchunk ordering grammars" section of Documentation of Matxin says

 x2+x1 -- The child chunk (x2) is put immediately before the parent chunk (x1)

However, is any behaviour defined for x1+x2? When I try "fáa vit" (where "fáa" is the parent) with the rule

 true	true	=1	x1.x2

I get the expected "Get we". With x1+x2 instead, I get "We get". So it seems that with a +, the order of the x-es is simply ignored. I tested this with the >1 position too ("fáa vit ikki"), with the same result.

Tests

See also

pronoun list: http://omilia.uio.no/scanlex/toflur/fo.html
