Difference between revisions of "Recursive transfer"

Latest revision as of 10:50, 9 February 2015

Todo

~~Make the parser optionally output the original parse tree (SL syntax) and the target parse tree (TL syntax).~~
Attribute structures. These are defined in the usual .t1x format with def-attrs.
Make the parser robust: we should never get parse errors, though our trees may be mangled.

Process

The parser maintains two trees, which are built simultaneously:

The source tree is parser-internal.
The target tree is the "abstract syntax tree".

When a sentence terminal (S) is reached, the target tree is traversed and printed out.
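The two-tree idea can be illustrated with a minimal sketch. It assumes a hypothetical rule representation in which each rule carries a permutation giving the target-side order of its children. The names `Node`, `reduce_rule` and `to_brackets` are invented for illustration and are not taken from the transfer4 code.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def reduce_rule(lhs: str, order: List[int],
                kids: List[Tuple[Node, Node]]) -> Tuple[Node, Node]:
    """Apply one reduction and build both trees at the same time.

    kids  -- (source_node, target_node) pairs, in source order
    order -- hypothetical target-side permutation of the child indices
    """
    source = Node(lhs, [s for s, _ in kids])
    target = Node(lhs, [kids[i][1] for i in order])
    return source, target


def to_brackets(node: Node) -> str:
    """Print a tree in the bracketed notation used on this page."""
    if not node.children:
        return node.label
    inner = " ".join(to_brackets(c) for c in node.children)
    return "(" + node.label + " " + inner + ")"


# SP -> prep SN1 on the source side, reordered to SN1 prep on the target side.
prep = (Node("to"), Node("to"))
sn1 = (Node("Kazakhstan"), Node("Kazakhstan"))
src, tgt = reduce_rule("SP", [1, 0], [prep, sn1])
print(to_brackets(src))  # (SP to Kazakhstan)
print(to_brackets(tgt))  # (SP Kazakhstan to)
```

Because each reduction returns both nodes, the parser never needs a second pass to reorder: printing the target tree at the end is a plain traversal.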

Questions

What to do with a parse failure?
- Implicit glue rules.
  - How do we make sure that we never get a Syntax error (e.g. with really robust glue rules)?
- The glue rules would not compute anything; they would just allow for partial parses.
What about unknown words?
- They would be assigned some non-terminal UNK, which would be glued by the all-encompassing glue rule from above.
Ambiguous grammars: can they be disambiguated automatically?
- Learn shift/reduce using target-language information?
Converting right-recursive grammars to left-recursive ones.
How to apply macros in rules which have more than one non-terminal.
What on earth to do with blanks and formatting.
Do we try to find syntactic relations in the transfer, or do we pre-annotate (e.g. with CG) and then use the CG tags to constrain the parser?
Can/should we do unification in the grammar (e.g. to avoid rules like SN -> adj n matching when adj.G and n.G are not the same)?
- If a language uses CG, the rule SN -> @A→ @N would only match where CG mapped @A→ (and CG can do unification with less trouble, by not mapping @A→ where gender differs).
- However, if we are to propagate attributes up the tree as well, it makes sense to have unification too, so we can say NP[gen=X] -> D[gen=X] N[gen=X].
Should the transfer allow for more than one possible TL translation, so that 'lexical selection' can happen inside transfer as well as outside?
Can we learn transfer grammars from aligned treebanks?
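The unification question can be made concrete with a small sketch. It assumes a hypothetical encoding in which each right-hand-side symbol has a dict of attribute constraints, and an upper-case value such as `X` is a variable that must bind to the same value everywhere. The function `unify` is illustrative only and does not exist in Apertium.

```python
from typing import Dict, List, Optional


def unify(pattern: List[Dict[str, str]],
          children: List[Dict[str, str]]) -> Optional[Dict[str, str]]:
    """Return variable bindings if the children satisfy the pattern, else None.

    pattern  -- one constraint dict per right-hand-side symbol;
                upper-case values are variables, anything else is a literal
    children -- one dict of concrete attribute values per matched child
    """
    if len(pattern) != len(children):
        return None
    bindings: Dict[str, str] = {}
    for constraints, features in zip(pattern, children):
        for attr, wanted in constraints.items():
            actual = features.get(attr)
            if actual is None:
                return None
            if wanted.isupper():
                # Variable: the first use binds it, later uses must agree.
                if bindings.setdefault(wanted, actual) != actual:
                    return None
            elif wanted != actual:
                return None
    return bindings


# NP[gen=X] -> D[gen=X] N[gen=X]
rule_rhs = [{"gen": "X"}, {"gen": "X"}]
agree = unify(rule_rhs, [{"gen": "f"}, {"gen": "f"}])
clash = unify(rule_rhs, [{"gen": "m"}, {"gen": "f"}])
print(agree)  # {'X': 'f'}
print(clash)  # None
```

The returned bindings are what would be propagated up the tree: here the NP would receive `gen=f` from `agree["X"]`.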

Algorithms

CKY (bottom-up)
LALR(1) (bottom-up)
GLR (bottom-up)
Earley (top-down)
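As a point of comparison for the bottom-up options, here is a minimal CKY recognizer sketch. It assumes a grammar already in Chomsky normal form, and the toy grammar and category names are invented for illustration. It only answers whether a parse exists and builds no tree.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def cky_recognize(words: List[str],
                  lexical: Dict[str, Set[str]],
                  binary: Dict[Tuple[str, str], Set[str]],
                  start: str = "S") -> bool:
    """Return True if `words` can be derived from `start`.

    lexical -- word -> non-terminals that rewrite to it (A -> word)
    binary  -- (B, C) -> non-terminals A with a rule A -> B C
    """
    n = len(words)
    # chart[i, j] holds the non-terminals that span words[i:j].
    chart: Dict[Tuple[int, int], Set[str]] = defaultdict(set)
    for i, word in enumerate(words):
        chart[i, i + 1] = set(lexical.get(word, ()))
    for span in range(2, n + 1):
        for i in range(0, n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # every split point of the span
                for left in chart[i, k]:
                    for right in chart[k, j]:
                        chart[i, j] |= binary.get((left, right), set())
    return start in chart[0, n]


# Toy grammar: S -> PRN VP, VP -> V N
lexical = {"I": {"PRN"}, "know": {"V"}, "Kazakhstan": {"N"}}
binary = {("PRN", "VP"): {"S"}, ("V", "N"): {"VP"}}
print(cky_recognize("I know Kazakhstan".split(), lexical, binary))  # True
print(cky_recognize("know I Kazakhstan".split(), lexical, binary))  # False
```

CKY explores all split points and so handles ambiguous grammars directly, at cubic cost in sentence length. LALR(1) commits to a single shift/reduce decision per step, which is why the disambiguation question above matters for it.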

Usage

$ svn co https://svn.code.sf.net/p/apertium/svn/branches/transfer4
$ cd transfer4
$ cd eng-kaz
$ make

Files

eng-kaz.grammar: Transfer grammar file for English→Kazakh
eng-kaz.t1x: Categories (terminals) and attributes for English→Kazakh

Apply the transfer grammar

$ cat input/input.01.txt | ./eng-kaz.parser 
^you<prn><subj><p2><mf><sp>/сен<prn><pers><subj><p2><mf><sp>$ ^Kazakhstan<np><top><sg>/Қазақстан<np><top><nom>$ ^to<pr>/$ 
^go<vblex><past>/бар<v><iv><past>$ ^that<cnjsub>/$ ^I<prn><subj><p1><mf><sg>/Мен<prn><pers><subj><p1><mf><sg>$ 
^know<vblex><pres>/біл<v><tv><pres>$ ^.<sent>/.<sent>$

Print out the source tree

$ cat input/input.01.txt | ./eng-kaz.parser -s -p >/dev/null
(S (S1 (PRNS (subj_pron (^I<prn><subj><p1><mf><sg>/Мен<prn><pers><subj><p1><mf><sg>$))) 
(SV (V (pers_verb (^know<vblex><pres>/біл<v><tv><pres>$))))) (Ssub (cnjsub (^that<cnjsub>/$)) 
(S1 (PRNS (subj_pron (^you<prn><subj><p2><mf><sp>/сен<prn><pers><subj><p2><mf><sp>$))) 
(SV (V (pers_verb (^go<vblex><past>/бар<v><iv><past>$))) (SP (prep (^to<pr>/$)) 
(SN1 (SN (N (nom (^Kazakhstan<np><top><sg>/Қазақстан<np><top><nom>$))))))))) (X (sent (^.<sent>/.<sent>$))))

Print out the target tree

$ cat input/input.01.txt | ./eng-kaz.parser -p >/dev/null
(S (Ssub (S1 (PRNS (subj_pron (^you<prn><subj><p2><mf><sp>/сен<prn><pers><subj><p2><mf><sp>$))) 
(SV (SP (SN1 (SN (N (nom (^Kazakhstan<np><top><sg>/Қазақстан<np><top><nom>$))))) (prep (^to<pr>/$))) 
(V (pers_verb (^go<vblex><past>/бар<v><iv><past>$))))) (cnjsub (^that<cnjsub>/$))) 
(S1 (PRNS (subj_pron (^I<prn><subj><p1><mf><sg>/Мен<prn><pers><subj><p1><mf><sg>$))) 
(SV (V (pers_verb (^know<vblex><pres>/біл<v><tv><pres>$))))) (X (sent (^.<sent>/.<sent>$))))
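One way to check a reordering is to read the lexical units off both bracketed trees and compare their order. The sketch below does this for the SP subtree from the output above. It assumes that no lexical unit contains an unescaped `$`, and the helper name `leaves` is made up for this example.

```python
import re
from typing import List


def leaves(tree: str) -> List[str]:
    """Return the lexical units (^...$) of a bracketed tree, left to right."""
    return re.findall(r"\^[^$]*\$", tree)


# The SP subtree as printed in the source tree and in the target tree above.
source_sp = ("(SP (prep (^to<pr>/$)) (SN1 (SN (N (nom "
             "(^Kazakhstan<np><top><sg>/Қазақстан<np><top><nom>$))))))")
target_sp = ("(SP (SN1 (SN (N (nom "
             "(^Kazakhstan<np><top><sg>/Қазақстан<np><top><nom>$))))) "
             "(prep (^to<pr>/$)))")

for unit in leaves(source_sp):
    print("source:", unit)
for unit in leaves(target_sp):
    print("target:", unit)
```

In the source tree the preposition precedes the noun phrase, while in the target tree it follows it, which matches the order of the units in the transfer output.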

References

Prószéky & Tihanyi (2002) "MetaMorpho: A Pattern-Based Machine Translation System"
White (1985) "Characteristics of the METAL machine translation system at Production Stage" (§6)
Slocum (1982) "The LRC Machine translation system: An application of State-of-the-Art ..." (p.18)

Further reading

User:Mlforcada/Robust LR for Transfer
Muhua Zhu, Jingbo Zhu and Huizhen Wang (2013) "Improving shift-reduce constituency parsing with large-scale unlabeled data". Natural Language Engineering, October 2013, pp. 1-26
http://www.cs.cmu.edu/~./alavie/papers/thesis.pdf
http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-743.pdf

See also

https://svn.code.sf.net/p/apertium/svn/branches/transfer4

External links

CFG tool: http://smlweb.cpsc.ucalgary.ca/start.html
LOGON: Parse with the ERG: http://erg.delph-in.net/logon
