Difference between revisions of "Recursive transfer"
Line 1: | Line 1:
  {{TOCD}}
−
− ==Deliverables==
−
− ===Deliverable 1===
−
− * A program which reads a grammar using bison, parses a sentence and outputs the syntax tree as text, or graphViz or something.
− ** See: https://svn.code.sf.net/p/apertium/svn/branches/transfer4/format-parse.py
−
− ===Deliverable 2===
−
− * Program which takes output of lt-proc -b (biltrans) and applies a grammar, doing only reordering (and "insertion"/"deletion"), no tag changes
− ** The input would be ^sl/tl$ and the output would be ^tl$
− ** The grammar can be specified using a simple text-based CFG grammar formalism, converted into bison and compiled.
−
− ;Input:
− <pre>
− ^Hau<prn><dem><sg>/This<prn><dem><sg>$
− ^irabazle<n>/winner<n><ND>$
− ^bat<num><sg>/a<det><ind><sg>$
− ^en<post>/of<pr>$
− ^historia<n>/story<n><ND>$
− ^a<det><art><sg>/the<det><def><sg>$
− ^izan<vbsint><pri><NR_HU>/be<vbser><pri><NR_HU>$
− ^.<sent>/.<sent>$
− </pre>
−
− ;Output:
− <pre>
− ^This<prn><dem><sg>$
− ^be<vbser><pri><NR_HU>$
− ^the<det><def><sg>$
− ^story<n><ND>$
− ^of<pr>$
− ^a<det><ind><sg>$
− ^winner<n><ND>$
− ^.<sent>$
− </pre>
−
− ;Grammar
−
− <pre>
− S -> SN SV sent { $1 $2 $3 }
− SV -> SN v { $2 $1 }
− SN -> N3 art { $2 $1 } | N3 { $1 }
− N3 -> SNGen N2 { $2 $1 } | N2 { $1 }
− N2 -> nom { $1 } | prn { $1 }
− SNGen -> SN genpost { $2 $1 }
− sent -> "sent" { $1 }
− v -> "vbser.*" { $1 } | "vblex.*" { $1 }
− art -> "det.art.*" { $1 } | "num.sg" { $1 }
− nom -> "n" { $1 }
− prn -> "prn.*" { $1 }
− </pre>
−
− ===Deliverable 3===
−
− * An XML format for the rules, based on the current format, taking into account transfer operations
  ==Todo==
Revision as of 16:33, 17 April 2014
Todo
- Make the parser optionally output both the original parse tree (SL syntax) and the target parse tree (TL syntax).
Process
The parser has two trees, which are built simultaneously:
- The source tree is parser-internal
- The target tree is the "abstract syntax tree".
When a sentence terminal (S) is reached, the target tree is traversed and printed out.
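The process above can be sketched in Python. This is a minimal illustration, not the bison-based implementation: it uses a memoised span parser instead of a shift/reduce parser. The rule table reuses the reordering grammar from the removed Deliverable 2 section. The `vbsint.*` pattern and the `genpost` category are assumptions added here so that the Basque example parses, and all function names are hypothetical.

```python
import re

# Non-terminal rules: LHS -> [(RHS symbols, target order as 1-based positions)].
GRAMMAR = {
    "S":     [(["SN", "SV", "sent"], [1, 2, 3])],
    "SV":    [(["SN", "v"], [2, 1])],
    "SN":    [(["N3", "art"], [2, 1]), (["N3"], [1])],
    "N3":    [(["SNGen", "N2"], [2, 1]), (["N2"], [1])],
    "N2":    [(["nom"], [1]), (["prn"], [1])],
    "SNGen": [(["SN", "genpost"], [2, 1])],
}
# Terminal categories, matched against the SL tags joined with ".".
# ASSUMPTION: "vbsint.*" and the "genpost" entry are not in the original grammar.
TERMINALS = {
    "sent": ["sent"], "v": ["vbser.*", "vblex.*", "vbsint.*"],
    "art": ["det.art.*", "num.sg"], "nom": ["n"], "prn": ["prn.*"],
    "genpost": ["post"],
}

def read_unit(unit):
    """Split one biltrans unit ^sl/tl$ into (SL tag string, sl, tl)."""
    sl, tl = unit.strip()[1:-1].split("/", 1)
    return ".".join(re.findall(r"<([^>]+)>", sl)), sl, tl

def match_seq(rhs, i, j, toks, memo):
    """Divide toks[i:j] among the RHS symbols; return child results or None."""
    if len(rhs) == 1:
        kid = parse(rhs[0], i, j, toks, memo)
        return None if kid is None else [kid]
    for k in range(i + 1, j - len(rhs) + 2):  # leave >= 1 token per later symbol
        head = parse(rhs[0], i, k, toks, memo)
        if head is None:
            continue
        rest = match_seq(rhs[1:], k, j, toks, memo)
        if rest is not None:
            return [head] + rest
    return None

def parse(sym, i, j, toks, memo):
    """Return (source_tree, target_tree) for sym over toks[i:j], or None."""
    key = (sym, i, j)
    if key in memo:
        return memo[key]
    memo[key] = None
    result = None
    if sym in TERMINALS:
        if j - i == 1 and any(re.fullmatch(p, toks[i][0]) for p in TERMINALS[sym]):
            result = ((sym, toks[i][1]), (sym, toks[i][2]))
    else:
        for rhs, order in GRAMMAR[sym]:
            kids = match_seq(rhs, i, j, toks, memo)
            if kids is not None:
                source = (sym, [kid[0] for kid in kids])                # SL order
                target = (sym, [kids[pos - 1][1] for pos in order])     # TL order
                result = (source, target)
                break
    memo[key] = result
    return result

def leaves(tree):
    """Traverse a tree left to right and collect its leaf strings."""
    _, body = tree
    if isinstance(body, str):
        return [body]
    return [leaf for child in body for leaf in leaves(child)]

def transfer(units):
    """Parse the units as S; on success return the reordered ^tl$ units."""
    toks = [read_unit(unit) for unit in units]
    result = parse("S", 0, len(toks), toks, {})
    if result is None:
        return None  # parse-fail; see the Questions section
    return ["^" + leaf + "$" for leaf in leaves(result[1])]
```

Under these assumptions, calling `transfer` on the eight units of the Deliverable 2 input returns the units in the order of the Deliverable 2 output, and `leaves(result[0])` would give the source-side order for the optional SL tree output mentioned in the Todo.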
Questions
- What to do with a parse-fail.
  - Implicit glue rules
    - How do we make sure that we never get "Syntax error" (e.g. really robust glue rules).
    - the glue rules would not compute anything, just allow for partial parses
- How about unknown words...
  - they would be some non-terminal UNK that would be glued by the all-encompassing glue rule from above.
- Ambiguous grammars: can they be automatically disambiguated?
- Learn shift/reduce using target-language information?
- Converting right-recursive to left-recursive grammars.
- How to apply macros in rules which have >1 non-terminal.
- What on earth to do with blanks / formatting...
- Do we try and find syntactic relations in the transfer, or do we pre-annotate (e.g. with CG) then use the tags from CG to constrain the parser?
- Can/should we do unification in the grammar (e.g. to avoid rules like SN -> adj n matching when adj.G and n.G are not the same)?
  - If a language uses CG, the rule SN -> @A→ @N would only match where CG mapped @A→ (and CG can do unification with less trouble, not mapping @A→ where gender differs)
  - However, if we are to propagate attributes up the tree as well, it makes sense to have unification too, so we can say NP[gen=X] -> D[gen=X] N[gen=X]
- Should the transfer allow for >1 possible TL translation, to allow 'lexical selection' inside transfer as well as outside?
- Can we learn transfer grammars from aligned treebanks?
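One way to read the unification question is sketched below in Python, using the NP[gen=X] -> D[gen=X] N[gen=X] rule. This is only an illustration under stated assumptions: feature structures are flat dictionaries, values written in upper case (such as `X`) are treated as variables, and the function names `unify` and `apply_rule` are hypothetical.

```python
def unify(bindings, pattern, feats):
    """Extend bindings so that pattern matches feats; return None on a clash."""
    new = dict(bindings)
    for attr, value in pattern.items():
        if attr not in feats:
            return None
        if value.isupper():
            # Variable: bind it on first use, otherwise require the same value.
            if new.setdefault(value, feats[attr]) != feats[attr]:
                return None
        elif value != feats[attr]:
            # Constant: must match exactly.
            return None
    return new

def apply_rule(lhs_pattern, rhs_patterns, children):
    """Return the features propagated to the parent, or None if unification fails."""
    bindings = {}
    for pattern, feats in zip(rhs_patterns, children):
        bindings = unify(bindings, pattern, feats)
        if bindings is None:
            return None
    # Substitute bound variables into the LHS pattern; constants pass through.
    return {attr: bindings.get(value, value) for attr, value in lhs_pattern.items()}

# NP[gen=X] -> D[gen=X] N[gen=X]
NP_LHS = {"gen": "X"}
NP_RHS = [{"gen": "X"}, {"gen": "X"}]
```

With this sketch, a determiner and noun that agree in gender produce an NP carrying that gender, while a mismatch makes the rule fail to apply, which is the behaviour the SN -> adj n example asks for.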
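The glue-rule idea from the parse-fail and unknown-word questions can also be sketched in Python. The sketch assumes the parser exposes a chart of every span it managed to parse, as a dictionary from `(start, end)` to a label. That chart format and the function name `glue` are hypothetical, and the greedy longest-span choice is just one possible policy.

```python
def glue(chart, n):
    """Cover tokens 0..n with parsed spans, left to right, never failing.

    chart maps (start, end) -> label for each non-empty span that parsed.
    Tokens not covered by any span become one-token UNK chunks, so the
    result is always a complete sequence of partial parses.
    """
    cover = []
    i = 0
    while i < n:
        # Longest parsed span that starts at position i, if there is one.
        end = max((e for (s, e) in chart if s == i and e > i), default=None)
        if end is None:
            cover.append(("UNK", i, i + 1))
            i += 1
        else:
            cover.append((chart[(i, end)], i, end))
            i = end
    return cover
```

The glue step computes nothing of its own: it only concatenates the partial parses, so each chunk would still be reordered by its own rules.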
Algorithms
References
- Prószéky & Tihanyi (2002) "MetaMorpho: A Pattern-Based Machine Translation System"
- White (1985) "Characteristics of the METAL machine translation system at Production Stage" (§6)
- Slocum (1982) "The LRC Machine translation system: An application of State-of-the-Art ..." (p.18)
Further reading
- Muhua Zhu, Jingbo Zhu & Huizhen Wang (2013) "Improving shift-reduce constituency parsing with large-scale unlabeled data". Natural Language Engineering, October 2013, pp. 1-26