Difference between revisions of "Recursive transfer"
Revision as of 10:56, 11 December 2013
Deliverables
Deliverable 1
- A program that reads a grammar using bison, parses a sentence, and outputs the syntax tree as text, Graphviz, or a similar format.
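The output side of this deliverable can be sketched on its own, independently of bison. The following Python sketch assumes the parser hands back a tree as nested tuples of the form (label, child, ...) with plain strings as leaves; that representation and the function names to_text and to_dot are illustrative assumptions, not part of any existing tool.

```python
from itertools import count

def to_text(tree):
    """Bracketed text form of a tree given as (label, child, ...) tuples."""
    if isinstance(tree, str):
        return tree
    label, *children = tree
    return "(" + " ".join([label] + [to_text(c) for c in children]) + ")"

def to_dot(tree):
    """Graphviz DOT source for the same tree (labels are not escaped)."""
    ids = count()
    lines = ["digraph tree {"]

    def walk(node):
        me = next(ids)
        label = node if isinstance(node, str) else node[0]
        lines.append(f'  n{me} [label="{label}"];')
        if not isinstance(node, str):
            for child in node[1:]:
                child_id = walk(child)
                lines.append(f"  n{me} -> n{child_id};")
        return me

    walk(tree)
    lines.append("}")
    return "\n".join(lines)

# Hypothetical tree for a two-word noun phrase.
tree = ("SN", ("N3", ("N2", ("nom", "historia"))), ("art", "a"))
print(to_text(tree))
print(to_dot(tree))
```

The DOT text can be passed to Graphviz (for example, dot -Tpng) to render the tree.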
Deliverable 2
- A program that takes the output of lt-proc -b (biltrans) and applies a grammar, doing only reordering (and "insertion"/"deletion"), with no tag changes.
- The input would be ^sl/tl$ and the output would be ^tl$
- The grammar can be specified in a simple text-based CFG formalism, which is converted into bison and compiled.
- Input
^Hau<prn><dem><sg>/This<prn><dem><sg>$ ^irabazle<n>/winner<n><ND>$ ^bat<num><sg>/a<det><ind><sg>$ ^en<post>/of<pr>$ ^historia<n>/story<n><ND>$ ^a<det><art><sg>/the<det><def><sg>$ ^izan<vbsint><pri><NR_HU>/be<vbser><pri><NR_HU>$ ^.<sent>/.<sent>$
- Output
^This<prn><dem><sg>$ ^be<vbser><pri><NR_HU>$ ^the<det><def><sg>$ ^story<n><ND>$ ^of<pr>$ ^a<det><ind><sg>$ ^winner<n><ND>$ ^.<sent>$
- Grammar
S -> SN SV sent { $1 $2 $3 }
SV -> SN v { $2 $1 }
SN -> N3 art { $2 $1 } | N3 { $1 }
N3 -> SNGen N2 { $2 $1 } | N2 { $1 }
N2 -> nom { $1 } | prn { $1 }
SNGen -> SN genpost { $2 $1 }
sent -> "sent" { $1 }
v -> "vbser.*" { $1 } | "vblex.*" { $1 }
art -> "det.art.*" { $1 } | "num.sg" { $1 }
nom -> "n" { $1 }
prn -> "prn.*" { $1 }
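To make the intended behaviour concrete, here is a minimal Python sketch that applies the grammar above to the example input and produces the example output. It is not the bison-based design described in this deliverable: it swaps in a small memoized span parser so that the example stays self-contained. Several things are assumptions rather than part of the specification: the rule genpost -> "post" (genpost is used but not defined in the grammar), the reading of terminal patterns as regular expressions matched against the dot-joined tags of either the source or the target side, and all function names.

```python
import re

# Rules from the grammar above: lhs -> [(rhs symbols, output order, 1-based)].
RULES = {
    "S": [(["SN", "SV", "sent"], [1, 2, 3])],
    "SV": [(["SN", "v"], [2, 1])],
    "SN": [(["N3", "art"], [2, 1]), (["N3"], [1])],
    "N3": [(["SNGen", "N2"], [2, 1]), (["N2"], [1])],
    "N2": [(["nom"], [1]), (["prn"], [1])],
    "SNGen": [(["SN", "genpost"], [2, 1])],
}
# Terminal patterns, treated here as regexes over dot-joined tags.
TERMINALS = {
    "sent": ["sent"], "v": ["vbser.*", "vblex.*"],
    "art": ["det.art.*", "num.sg"], "nom": ["n"], "prn": ["prn.*"],
    "genpost": ["post"],  # ASSUMPTION: not defined in the grammar above
}

def read_biltrans(line):
    """Split '^sl/tl$ ^sl/tl$ ...' into a list of (sl, tl) pairs."""
    return [tuple(u.split("/", 1)) for u in re.findall(r"\^(.*?)\$", line)]

def tags(lu):
    """'be<vbser><pri>' -> 'vbser.pri'."""
    return ".".join(re.findall(r"<([^>]+)>", lu))

def matches(sym, unit):
    # ASSUMPTION: a pattern may match the tags of either side of the unit.
    return any(re.fullmatch(p, tags(side))
               for p in TERMINALS[sym] for side in unit)

def splits(rhs, i, j, units, memo):
    """Yield child outputs for each way rhs covers [i, j) with non-empty parts."""
    if len(rhs) == 1:
        out = parse(rhs[0], i, j, units, memo)
        if out is not None:
            yield [out]
        return
    for k in range(i + 1, j - len(rhs) + 2):
        head = parse(rhs[0], i, k, units, memo)
        if head is not None:
            for rest in splits(rhs[1:], k, j, units, memo):
                yield [head] + rest

def parse(sym, i, j, units, memo):
    """Return the reordered TL units for span [i, j) parsed as sym, or None."""
    key = (sym, i, j)
    if key not in memo:
        memo[key] = None
        if sym in TERMINALS:
            if j == i + 1 and matches(sym, units[i]):
                memo[key] = [units[i][1]]  # keep only the TL side
        else:
            for rhs, order in RULES[sym]:
                kids = next(splits(rhs, i, j, units, memo), None)
                if kids is not None:  # first parse wins; no disambiguation
                    memo[key] = [lu for n in order for lu in kids[n - 1]]
                    break
    return memo[key]

def reorder(line):
    units = read_biltrans(line)
    out = parse("S", 0, len(units), units, {})
    if out is None:
        return None  # parse failure; how to handle it is an open question
    return " ".join(f"^{lu}$" for lu in out)

SENTENCE = ("^Hau<prn><dem><sg>/This<prn><dem><sg>$ ^irabazle<n>/winner<n><ND>$ "
            "^bat<num><sg>/a<det><ind><sg>$ ^en<post>/of<pr>$ "
            "^historia<n>/story<n><ND>$ ^a<det><art><sg>/the<det><def><sg>$ "
            "^izan<vbsint><pri><NR_HU>/be<vbser><pri><NR_HU>$ ^.<sent>/.<sent>$")
print(reorder(SENTENCE))
```

The grammar is left-recursive (SN -> N3 -> SNGen -> SN), which a naive top-down parser cannot handle; the sketch terminates because every multi-symbol rule splits the span into strictly smaller non-empty parts and results are memoized per (symbol, span). A generated bison parser would handle the left recursion natively.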
Deliverable 3
- An XML format for the rules, based on the current format, taking into account transfer operations
Questions
- What to do with a parse failure?
  - Implicit glue rules
    - the glue rules would not compute anything, just allow for partial parses
- What about unknown words?
  - they would be some non-terminal UNK that would be glued by the all-encompassing glue rule from above.
- Ambiguous grammars: can they be disambiguated automatically?
  - Learn shift/reduce decisions using target-language information?
- Converting right-recursive to left-recursive grammars.
- How to apply macros in rules which have >1 non-terminal.
- What on earth to do with blanks / formatting...
- Do we try to find syntactic relations in the transfer, or do we pre-annotate (e.g. with CG) and then use the tags from CG to constrain the parser?
- Can/should we do unification in the grammar (e.g. to avoid rules like SN -> adj n matching when adj.G and n.G are not the same)?
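As one possible shape for the unification question above, the following Python sketch blocks SN -> adj n when adj.G and n.G disagree. It is only an illustration under stated assumptions: the gender tag set (m, f, nt, with mf treated as unspecified), the example lexical units, and the function names are all hypothetical, and nothing here is tied to bison.

```python
import re

GENDERS = {"m", "f", "nt"}  # ASSUMPTION: gender tags; "mf" counts as unspecified

def gender(lu):
    """Return the gender tag of a lexical unit, or None if it has none."""
    found = re.findall(r"<([^>]+)>", lu)
    return next((t for t in found if t in GENDERS), None)

def unify(a, b):
    """Unify two atomic values; None unifies with anything. Returns (ok, value)."""
    if a is None or a == b:
        return True, b
    if b is None:
        return True, a
    return False, None

def sn_adj_n(adj, n):
    """Apply SN -> adj n only if adj.G and n.G unify; return SN's features or None."""
    ok, g = unify(gender(adj), gender(n))
    return {"G": g} if ok else None

# Hypothetical lexical units for illustration.
print(sn_adj_n("roja<adj><f><sg>", "casa<n><f><sg>"))
print(sn_adj_n("rojo<adj><m><sg>", "casa<n><f><sg>"))
print(sn_adj_n("azul<adj><mf><sg>", "casa<n><f><sg>"))
```

Because the unified value is returned as a feature of SN, it could be passed up the tree and checked again by rules higher up, which is the main advantage over writing one rule per gender combination.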
Algorithms
References
- Prószéky & Tihanyi (2002) "MetaMorpho: A Pattern-Based Machine Translation System"
- White (1985) "Characteristics of the METAL machine translation system at Production Stage" (§6)
- Slocum (1982) "The LRC Machine translation system: An application of State-of-the-Art ..." (p.18)