Difference between revisions of "Recursive transfer"

Revision as of 18:13, 9 January 2014

Deliverables

Deliverable 1

A program that reads a grammar using bison, parses a sentence, and outputs the syntax tree as text, Graphviz, or a similar format.
- See: https://svn.code.sf.net/p/apertium/svn/branches/transfer4/format-parse.py
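As an illustration of the Graphviz output option, here is a minimal sketch that renders a parse tree as DOT text. The tree representation (a `(label, children)` tuple) and the function name `to_dot` are assumptions made for this example; they are not taken from format-parse.py.

```python
from itertools import count

def to_dot(tree):
    """Render a (label, [children]) tree as Graphviz DOT text.

    The tuple-based tree shape is an assumption for this sketch.
    """
    ids = count()
    lines = ["digraph parse {"]

    def walk(node):
        label, children = node
        me = next(ids)
        # Escape backslashes and double quotes so the label stays valid DOT.
        escaped = label.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'  n{me} [label="{escaped}"];')
        for child in children:
            child_id = walk(child)
            lines.append(f"  n{me} -> n{child_id};")
        return me

    walk(tree)
    lines.append("}")
    return "\n".join(lines)

if __name__ == "__main__":
    print(to_dot(("SN", [("N3", [("N2", [])]), ("art", [])])))
```

The resulting text can be piped to `dot -Tpng` to get an image of the tree.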

Deliverable 2

A program that takes the output of lt-proc -b (biltrans) and applies a grammar, doing only reordering (and "insertion"/"deletion"), with no tag changes.
- The input would be ^sl/tl$ and the output would be ^tl$
- The grammar can be specified using a simple text-based CFG grammar formalism, converted into bison and compiled.
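A rough sketch of the input/output side, assuming each lexical unit is written as `^sl/tl$` and that several translations, if present, are separated by further slashes. The names `read_biltrans` and `write_tl` are hypothetical, and escaped characters and blanks/formatting are deliberately ignored here.

```python
import re

# One lexical unit: everything between ^ and $.
# Simplification: escaped characters (e.g. \/ or \$) are not handled.
UNIT = re.compile(r"\^([^$]*)\$")

def read_biltrans(text):
    """Split biltrans-style text into (sl, [tl, ...]) pairs."""
    pairs = []
    for body in UNIT.findall(text):
        sl, *tls = body.split("/")
        pairs.append((sl, tls))
    return pairs

def write_tl(pairs):
    """Emit ^tl$ for each unit, taking the first translation if there are several."""
    return " ".join(f"^{tls[0]}$" for _, tls in pairs if tls)

if __name__ == "__main__":
    units = read_biltrans("^en<post>/of<pr>$ ^historia<n>/story<n><ND>$")
    print(write_tl(units))
```

Without a grammar this only strips the source side; the reordering itself has to happen between the two functions.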

Input

^Hau<prn><dem><sg>/This<prn><dem><sg>$ 
^irabazle<n>/winner<n><ND>$ 
^bat<num><sg>/a<det><ind><sg>$ 
^en<post>/of<pr>$ 
^historia<n>/story<n><ND>$ 
^a<det><art><sg>/the<det><def><sg>$ 
^izan<vbsint><pri><NR_HU>/be<vbser><pri><NR_HU>$
^.<sent>/.<sent>$

Output

^This<prn><dem><sg>$ 
^be<vbser><pri><NR_HU>$
^the<det><def><sg>$ 
^story<n><ND>$ 
^of<pr>$ 
^a<det><ind><sg>$ 
^winner<n><ND>$ 
^.<sent>$

Grammar

S -> SN SV sent { $1 $2 $3 }
SV -> SN v { $2 $1 }
SN -> N3 art { $2 $1 } | N3 { $1 } 
N3 -> SNGen N2 { $2 $1 } | N2 { $1 } 
N2 -> nom { $1 } | prn { $1 } 
SNGen -> SN genpost { $2 $1 }
sent -> "sent" { $1 } 
v -> "vbser.*" { $1 } | "vblex.*" { $1 } 
art -> "det.art.*" { $1 } | "num.sg" { $1 } 
nom -> "n" { $1 } 
prn -> "prn.*" { $1 }
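To check that the grammar above really yields the listed output, here is a small sketch that applies it to the example input. It is not the bison-based implementation: it swaps in a simple memoised span-splitting parser, written in Python. Assumptions made for this sketch: tag patterns are treated as regular expressions over the dot-joined tags, a pattern matches if it fits either the SL or the TL tags (needed because `vbser.*` only fits the TL side of izan while `det.art.*` only fits the SL side of a), and `genpost` is taken to match `post`, since the grammar above does not define it.

```python
import re
from functools import lru_cache

# LHS -> list of (RHS symbols, output order as 1-based $n indices)
RULES = {
    "S":     [(("SN", "SV", "sent"), (1, 2, 3))],
    "SV":    [(("SN", "v"), (2, 1))],
    "SN":    [(("N3", "art"), (2, 1)), (("N3",), (1,))],
    "N3":    [(("SNGen", "N2"), (2, 1)), (("N2",), (1,))],
    "N2":    [(("nom",), (1,)), (("prn",), (1,))],
    "SNGen": [(("SN", "genpost"), (2, 1))],
}
TERMINALS = {
    "sent": ["sent"],
    "v": ["vbser.*", "vblex.*"],
    "art": ["det.art.*", "num.sg"],
    "nom": ["n"],
    "prn": ["prn.*"],
    "genpost": ["post"],  # ASSUMPTION: not defined in the grammar above
}
LU = re.compile(r"\^([^/$]*)/([^$]*)\$")

def read_units(text):
    """Return (sl, tl) pairs from ^sl/tl$ text."""
    return LU.findall(text)

def tags(lu):
    """'be<vbser><pri>' -> 'vbser.pri'"""
    return ".".join(re.findall(r"<([^>]+)>", lu))

def parse(units):
    """Return the reordered TL units as a tuple, or None if no parse is found."""
    def is_terminal(sym, i):
        # ASSUMPTION: a pattern may match the SL or the TL tags.
        return any(re.fullmatch(p, tags(side))
                   for p in TERMINALS.get(sym, ()) for side in units[i])

    @lru_cache(maxsize=None)
    def derive(sym, i, j):
        """TL output for sym spanning units[i:j], or None."""
        if j - i == 1 and is_terminal(sym, i):
            return (units[i][1],)
        for rhs, order in RULES.get(sym, ()):
            parts = split(rhs, i, j)
            if parts is not None:
                return tuple(u for k in order for u in parts[k - 1])
        return None

    def split(rhs, i, j):
        """Divide units[i:j] among the symbols of rhs; first success wins."""
        if len(rhs) == 1:
            out = derive(rhs[0], i, j)
            return None if out is None else [out]
        for k in range(i + 1, j - len(rhs) + 2):
            head = derive(rhs[0], i, k)
            if head is not None:
                rest = split(rhs[1:], k, j)
                if rest is not None:
                    return [head] + rest
        return None

    return derive("S", 0, len(units))

if __name__ == "__main__":
    text = ("^Hau<prn><dem><sg>/This<prn><dem><sg>$ ^irabazle<n>/winner<n><ND>$ "
            "^bat<num><sg>/a<det><ind><sg>$ ^en<post>/of<pr>$ "
            "^historia<n>/story<n><ND>$ ^a<det><art><sg>/the<det><def><sg>$ "
            "^izan<vbsint><pri><NR_HU>/be<vbser><pri><NR_HU>$ ^.<sent>/.<sent>$")
    print(" ".join(f"^{u}$" for u in parse(read_units(text))))
```

Splitting by span keeps the sketch short and copes with the left recursion through SN, N3 and SNGen, because every binary rule hands a strictly shorter span to its children. It returns only the first parse it finds and says nothing about how a bison-generated parser would resolve conflicts.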

Deliverable 3

An XML format for the rules, based on the current format, taking into account transfer operations

Todo

Make the parser optionally output the original parse tree (SL syntax) and the target parse tree (TL syntax).

Process

The parser maintains two trees, which are built simultaneously:

- The source tree is parser-internal.
- The target tree is the "abstract syntax tree".

When the sentence symbol (S) is reduced, the target tree is traversed and printed out.
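One way to picture the two trees is a node that stores its children twice: once in the order they were parsed (source) and once in the order given by the rule action (target). This is only a sketch; the `Node` class and the `reduce_rule` and `emit` helpers are hypothetical names, not part of any existing implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    source: list = field(default_factory=list)  # children in SL (parse) order
    target: list = field(default_factory=list)  # children in TL (output) order
    tl: str = ""                                # TL lexical unit, leaves only

def leaf(label, tl):
    """A terminal: no children, carries its TL lexical unit."""
    return Node(label, tl=tl)

def reduce_rule(label, children, order):
    """Build both views at once; `order` holds the 1-based $n indices of the action."""
    return Node(label, source=list(children),
                target=[children[k - 1] for k in order])

def emit(node):
    """Traverse the target tree and collect TL lexical units."""
    if not node.source:          # leaf
        return [node.tl]
    return [u for child in node.target for u in emit(child)]

if __name__ == "__main__":
    sn = reduce_rule("SN", [leaf("N3", "story<n><ND>"),
                            leaf("art", "the<det><def><sg>")], (2, 1))
    sv = reduce_rule("SV", [sn, leaf("v", "be<vbser><pri><NR_HU>")], (2, 1))
    print(emit(sv))
```

An action that omits an index (a "deletion") simply leaves that child out of `target`, while it stays visible in `source`.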

Questions

What to do with a parse failure?
- Implicit glue rules
  - How do we make sure that we never get a syntax error (e.g. with really robust glue rules)?
- The glue rules would not compute anything, just allow for partial parses.
What about unknown words?
- They would be assigned to some non-terminal UNK that would be glued by the all-encompassing glue rule from above.
Ambiguous grammars: can they be automatically disambiguated?
- Learn shift/reduce decisions using target-language information?
Converting right-recursive to left-recursive grammars.
How to apply macros in rules which have >1 non-terminal.
What on earth to do with blanks / formatting...
Do we try and find syntactic relations in the transfer, or do we pre-annotate (e.g. with CG) then use the tags from CG to constrain the parser?
Can/should we do unification in the grammar (e.g. to avoid rules like SN -> adj n matching when adj.G and n.G are not the same)?
- If a language uses CG, the rule SN -> @A→ @N would only match where CG mapped @A→ (and CG can do unification with less trouble, not mapping @A→ where gender differs).
Should the transfer allow for >1 possible TL translation, to allow 'lexical selection' inside transfer as well as outside?
Can we learn transfer grammars from aligned treebanks?
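On the parse-failure question, one conceivable fallback is sketched below. Note that it is not an implicit glue rule inside the grammar: it is a separate post-hoc step that covers the sentence with the longest parsable spans, left to right, and passes any token that no rule covers through as a one-token chunk (playing the role of UNK). The callback `parse_span(i, j)` is hypothetical and stands for whatever the real parser can say about the units from i to j.

```python
def glue(n, parse_span):
    """Cover positions 0..n with the longest parsable spans, left to right.

    parse_span(i, j) is a hypothetical callback returning a result for
    units[i:j], or None if that span is not a constituent.
    Returns (i, j, result) triples; result None marks an unparsed token.
    """
    chunks, i = [], 0
    while i < n:
        for j in range(n, i, -1):        # try the longest span first
            result = parse_span(i, j)
            if result is not None:
                chunks.append((i, j, result))
                break
        else:                            # nothing covers token i: pass it through
            j = i + 1
            chunks.append((i, j, None))
        i = j
    return chunks

if __name__ == "__main__":
    # Pretend the parser only recognises two spans of a five-token sentence.
    known = {(0, 2): "SN", (3, 5): "SV"}
    print(glue(5, lambda i, j: known.get((i, j))))
```

Greedy longest-match never fails, which addresses the robustness worry, but it does not guarantee the fewest chunks, and no reordering happens across chunk boundaries.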

Algorithms

CKY (bottom-up)
LALR(1) (bottom-up)
GLR (bottom-up)
Earley (top-down)
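For comparison, CKY is probably the easiest of these to sketch. The recogniser below assumes a grammar in Chomsky normal form; the toy grammar in the usage example is hypothetical and only loosely modelled on the Deliverable 2 grammar, which is not in that form (it has unary and ternary rules).

```python
def cky_recognize(tokens, lexical, binary, start="S"):
    """Return True if `start` derives `tokens` under a CNF grammar.

    lexical: token -> set of symbols; binary: (left, right) -> set of symbols.
    """
    n = len(tokens)
    # chart[i][j] holds the symbols that derive tokens[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, tok in enumerate(tokens):
        chart[i][i + 1] = set(lexical.get(tok, ()))
    for width in range(2, n + 1):            # bottom-up: short spans first
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):        # every split point
                for left in chart[i][k]:
                    for right in chart[k][j]:
                        chart[i][j] |= binary.get((left, right), set())
    return start in chart[0][n]

if __name__ == "__main__":
    # Hypothetical CNF toy grammar over tag categories.
    lexical = {"prn": {"SN", "N3"}, "n": {"SN", "N3"},
               "art": {"ART"}, "v": {"V"}, "sent": {"SENT"}}
    binary = {("N3", "ART"): {"SN"}, ("SN", "V"): {"SV"},
              ("SN", "SV"): {"X"}, ("X", "SENT"): {"S"}}
    print(cky_recognize(["prn", "n", "art", "v", "sent"], lexical, binary))
```

Unlike LALR(1), the chart keeps every analysis of every span, so ambiguity is not a problem for recognition, at the cost of cubic time in sentence length.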

References

Prószéky & Tihanyi (2002) "MetaMorpho: A Pattern-Based Machine Translation System"
White (1985) "Characteristics of the METAL machine translation system at Production Stage" (§6)
Slocum (1982) "The LRC Machine translation system: An application of State-of-the-Art ..." (p.18)

External links
