Difference between revisions of "Recursive transfer"

Revision as of 13:05, 7 January 2014

Deliverables

Deliverable 1

A program that reads a grammar using bison, parses a sentence, and outputs the syntax tree as text, Graphviz, or a similar format.
- See: https://svn.code.sf.net/p/apertium/svn/branches/transfer4/format-parse.py
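As a rough illustration of the two output formats mentioned above, the following sketch prints a tree either as bracketed text or as Graphviz DOT source. The nested-tuple tree representation and the function names `to_text` and `to_dot` are assumptions made for this example; they are not taken from format-parse.py.

```python
from itertools import count


def to_text(tree):
    """Render a (label, child, ...) tuple tree as a bracketed string."""
    if isinstance(tree, str):  # a leaf is just a word
        return tree
    label, *children = tree
    return "(" + " ".join([label] + [to_text(c) for c in children]) + ")"


def to_dot(tree):
    """Render the same tree as Graphviz DOT source (labels are not escaped)."""
    ids = count()
    lines = ["digraph tree {"]

    def visit(node):
        node_id = f"n{next(ids)}"
        label = node if isinstance(node, str) else node[0]
        lines.append(f'  {node_id} [label="{label}"];')
        if not isinstance(node, str):
            for child in node[1:]:
                lines.append(f"  {node_id} -> {visit(child)};")
        return node_id

    visit(tree)
    lines.append("}")
    return "\n".join(lines)


# Hypothetical parse of "historia a" using the SN rule from the example grammar.
tree = ("SN", ("N3", ("N2", ("nom", "historia"))), ("art", "a"))
print(to_text(tree))
print(to_dot(tree))
```

The DOT text can be passed to the Graphviz `dot` tool to draw the tree.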

Deliverable 2

A program that takes the output of lt-proc -b (biltrans) and applies a grammar that only reorders (and "inserts"/"deletes") lexical units, without changing tags.
- The input would be ^sl/tl$ and the output would be ^tl$.
- The grammar can be specified in a simple text-based CFG formalism, which is converted into bison and compiled.
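To make the ^sl/tl$ input concrete, here is a minimal sketch that splits such a stream into source and target sides. It assumes exactly one translation per unit and no escaped characters, which the real stream format does not guarantee; `read_biltrans` and `format_tl` are hypothetical helper names.

```python
import re

# One lexical unit: ^source/target$ (assumes a single target and no escapes).
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")
TAG = re.compile(r"<([^>]+)>")


def split_side(side):
    """Split 'lemma<t1><t2>' into ('lemma', ['t1', 't2'])."""
    return side.split("<", 1)[0], TAG.findall(side)


def read_biltrans(stream):
    """Return a list of ((sl_lemma, sl_tags), (tl_lemma, tl_tags)) pairs."""
    return [(split_side(sl), split_side(tl))
            for sl, tl in LEXICAL_UNIT.findall(stream)]


def format_tl(unit):
    """Format the target side of one pair as ^tl$."""
    lemma, tags = unit[1]
    return "^" + lemma + "".join(f"<{t}>" for t in tags) + "$"


units = read_biltrans("^en<post>/of<pr>$ ^historia<n>/story<n><ND>$")
print([format_tl(u) for u in units])
```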

Input

^Hau<prn><dem><sg>/This<prn><dem><sg>$ 
^irabazle<n>/winner<n><ND>$ 
^bat<num><sg>/a<det><ind><sg>$ 
^en<post>/of<pr>$ 
^historia<n>/story<n><ND>$ 
^a<det><art><sg>/the<det><def><sg>$ 
^izan<vbsint><pri><NR_HU>/be<vbser><pri><NR_HU>$
^.<sent>/.<sent>$

Output

^This<prn><dem><sg>$ 
^be<vbser><pri><NR_HU>$
^the<det><def><sg>$ 
^story<n><ND>$ 
^of<pr>$ 
^a<det><ind><sg>$ 
^winner<n><ND>$ 
^.<sent>$

Grammar

S -> SN SV sent { $1 $2 $3 }
SV -> SN v { $2 $1 }
SN -> N3 art { $2 $1 } | N3 { $1 } 
N3 -> SNGen N2 { $2 $1 } | N2 { $1 } 
N2 -> nom { $1 } | prn { $1 } 
SNGen -> SN genpost { $2 $1 }
sent -> "sent" { $1 } 
v -> "vbser.*" { $1 } | "vblex.*" { $1 } 
art -> "det.art.*" { $1 } | "num.sg" { $1 } 
nom -> "n" { $1 } 
prn -> "prn.*" { $1 }
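The following sketch shows one way the grammar above could be applied to the example input. It is not the bison-based design described in this deliverable: it swaps in a small bottom-up chart parser so that the example stays self-contained. Two things are assumptions. First, the grammar as written has no rule for genpost, so a rule `genpost -> "post"` is added here. Second, the quoted patterns are treated as regular expressions and tried against both the source and the target tags, because the example patterns appear to mix the two sides. Only reordering and deletion are handled, and the first analysis found for each span is kept.

```python
import re
from itertools import combinations

GRAMMAR = """
S -> SN SV sent { $1 $2 $3 }
SV -> SN v { $2 $1 }
SN -> N3 art { $2 $1 } | N3 { $1 }
N3 -> SNGen N2 { $2 $1 } | N2 { $1 }
N2 -> nom { $1 } | prn { $1 }
SNGen -> SN genpost { $2 $1 }
sent -> "sent" { $1 }
v -> "vbser.*" { $1 } | "vblex.*" { $1 }
art -> "det.art.*" { $1 } | "num.sg" { $1 }
nom -> "n" { $1 }
prn -> "prn.*" { $1 }
"""
# ASSUMPTION: not part of the grammar on this page; added so genpost can match.
ASSUMED_RULE = 'genpost -> "post" { $1 }'

INPUT = ("^Hau<prn><dem><sg>/This<prn><dem><sg>$ ^irabazle<n>/winner<n><ND>$ "
         "^bat<num><sg>/a<det><ind><sg>$ ^en<post>/of<pr>$ "
         "^historia<n>/story<n><ND>$ ^a<det><art><sg>/the<det><def><sg>$ "
         "^izan<vbsint><pri><NR_HU>/be<vbser><pri><NR_HU>$ ^.<sent>/.<sent>$")


def load_rules(text):
    """Parse 'LHS -> rhs { $2 $1 } | ...' lines into (lhs, rhs, order) triples."""
    rules = []
    for line in text.strip().splitlines():
        lhs, alternatives = line.split("->", 1)
        for alt in alternatives.split("|"):
            rhs, action = re.fullmatch(r"\s*(.*?)\s*\{(.*)\}\s*", alt).groups()
            order = [int(n) - 1 for n in re.findall(r"\$(\d+)", action)]
            rules.append((lhs.strip(), rhs.split(), order))
    return rules


def read_units(stream):
    """Return (sl_tags, tl_tags, '^tl$') per unit, with tags joined by dots."""
    tags = lambda side: ".".join(re.findall(r"<([^>]+)>", side))
    return [(tags(sl), tags(tl), "^" + tl + "$")
            for sl, tl in re.findall(r"\^([^/$]*)/([^$]*)\$", stream)]


def transfer(units, rules, start="S"):
    """Fill a chart bottom-up; each entry holds the span's target-order output."""
    n = len(units)
    chart = {}  # (i, j) -> {symbol: list of '^tl$' strings}

    def match(symbol, a, b):
        if symbol.startswith('"'):  # quoted terminal pattern
            if b - a != 1:
                return None
            sl, tl, text = units[a]
            pattern = symbol.strip('"')
            hit = re.fullmatch(pattern, sl) or re.fullmatch(pattern, tl)
            return [text] if hit else None
        return chart.get((a, b), {}).get(symbol)

    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = chart.setdefault((i, j), {})
            changed = True
            while changed:  # repeat so unit rules such as N3 -> N2 can chain
                changed = False
                for lhs, rhs, order in rules:
                    if lhs in cell:
                        continue  # keep the first analysis only
                    for cuts in combinations(range(i + 1, j), len(rhs) - 1):
                        bounds = (i, *cuts, j)
                        parts = [match(sym, bounds[k], bounds[k + 1])
                                 for k, sym in enumerate(rhs)]
                        if all(p is not None for p in parts):
                            cell[lhs] = [w for k in order for w in parts[k]]
                            changed = True
                            break
    return chart.get((0, n), {}).get(start)


rules = load_rules(GRAMMAR + ASSUMED_RULE)
result = transfer(read_units(INPUT), rules)
print(" ".join(result))
```

With the assumed genpost rule, the units come out in the same order as the Output section above. A sentence that the grammar cannot cover makes `transfer` return `None`, which is the parse-failure case raised under Questions.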

Deliverable 3

An XML format for the rules, based on the current format and taking transfer operations into account.

Process

The parser maintains two trees, which are built simultaneously:

The source tree, which is parser-internal.
The target tree, which is the "abstract syntax tree".

When the sentence node (S) is reached, the target tree is traversed and printed out.
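A minimal sketch of this idea, assuming that each reduction creates one node holding its children twice: once in source order and once in the order given by the rule action. The `Node` class and the two traversal functions are hypothetical names for illustration, not the parser's actual data structures.

```python
class Node:
    """One reduction: children kept in source order and in target order."""

    def __init__(self, label, children=(), order=None, sl=None, tl=None):
        self.label = label
        self.sl, self.tl = sl, tl          # words, set on leaves only
        self.source = list(children)       # parser-internal source tree
        if order is None:
            order = range(len(self.source))
        # Target tree: same children, reordered as in an action like { $2 $1 }.
        self.target = [self.source[k] for k in order]


def source_yield(node):
    """Words of the source tree, left to right."""
    if not node.source:
        return [node.sl]
    return [w for child in node.source for w in source_yield(child)]


def target_yield(node):
    """Words of the target tree, left to right (what would be printed at S)."""
    if not node.target:
        return [node.tl]
    return [w for child in node.target for w in target_yield(child)]


# SN -> N3 art { $2 $1 } applied to "historia a" / "story the".
nom = Node("nom", sl="historia", tl="story")
art = Node("art", sl="a", tl="the")
n3 = Node("N3", [Node("N2", [nom])])
sn = Node("SN", [n3, art], order=[1, 0])

print(source_yield(sn))
print(target_yield(sn))
```

Both trees share the same nodes, so building them together costs only one extra list per reduction.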

Questions

What to do on a parse failure?
- Implicit glue rules
  - How do we make sure that we never get a syntax error (e.g. with really robust glue rules)?
- The glue rules would not compute anything; they would just allow for partial parses.
What about unknown words?
- They would be some non-terminal UNK that is glued by the all-encompassing glue rule described above.
Ambiguous grammars: can they be disambiguated automatically?
- Learn shift/reduce decisions using target-language information?
Converting right-recursive grammars to left-recursive ones.
How to apply macros in rules that have more than one non-terminal.
What to do with blanks and formatting.
Do we try to find syntactic relations in the transfer, or do we pre-annotate (e.g. with CG) and then use the CG tags to constrain the parser?
Can/should we do unification in the grammar (e.g. to avoid rules like SN -> adj n matching when adj.G and n.G are not the same)?
Should the transfer allow for more than one possible TL translation, so that 'lexical selection' can happen inside transfer as well as outside?
Can we learn transfer grammars from aligned treebanks?
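On the unification question above, the check that a rule such as SN -> adj n would need can be sketched as follows. The gender tag set and the treatment of `mf` as compatible with both `m` and `f` are assumptions for this example, as is the choice to let a missing gender tag agree with anything.

```python
# ASSUMPTION: gender tags used for illustration only.
GENDERS = {"m", "f", "mf", "nt"}


def gender(tags):
    """Return the first gender tag in a tag list, or None if there is none."""
    return next((t for t in tags if t in GENDERS), None)


def agree(adj_tags, n_tags):
    """True if the adjective and noun genders are compatible (adj.G vs n.G)."""
    g_adj, g_n = gender(adj_tags), gender(n_tags)
    if g_adj is None or g_n is None:
        return True                      # nothing to clash with
    if "mf" in (g_adj, g_n):
        return {g_adj, g_n} <= {"m", "f", "mf"}
    return g_adj == g_n


print(agree(["adj", "f", "sg"], ["n", "f", "sg"]))
print(agree(["adj", "m", "sg"], ["n", "f", "sg"]))
```

A rule would then fire only when `agree` holds for its adjective and noun children.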

Algorithms

CKY (bottom-up)
LALR(1) (bottom-up)
GLR (bottom-up)
Earley (top-down)

References

Prószéky & Tihanyi (2002) "MetaMorpho: A Pattern-Based Machine Translation System"
White (1985) "Characteristics of the METAL machine translation system at Production Stage" (§6)
Slocum (1982) "The LRC Machine translation system: An application of State-of-the-Art ..." (p.18)

External links

CFG tool
