Earley-based structural transfer for Apertium
Earley's algorithm for parsing context-free grammars works through the input left to right, as Apertium's transfer does, and could perhaps be used to perform more complex syntactic transformations. This could be useful for distant language pairs whose sentences contain embedded structures.
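To make the idea concrete, here is a minimal sketch of an Earley recognizer over part-of-speech tags. The toy grammar, the tag names (`det`, `n`, `v`, `pr`) and the function name `recognize` are assumptions made for illustration and are not part of Apertium. The sketch only answers whether a tag sequence is derivable, builds no trees, and assumes a grammar without empty rules.

```python
from collections import namedtuple

# An Earley item: a rule lhs -> rhs, a dot position inside rhs, and the
# input position (origin) where recognition of this rule started.
Item = namedtuple("Item", "lhs rhs dot origin")

# Hypothetical toy grammar; terminals are part-of-speech tags.
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("det", "n"), ("n",), ("NP", "PP")],
    "VP": [("v", "NP"), ("VP", "PP")],
    "PP": [("pr", "NP")],
}


def recognize(tokens, grammar=GRAMMAR, start="S"):
    """Return True if `tokens` can be derived from `start` (no empty rules)."""
    chart = [[] for _ in range(len(tokens) + 1)]

    def add(k, item):
        if item not in chart[k]:
            chart[k].append(item)

    for rhs in grammar[start]:
        add(0, Item(start, rhs, 0, 0))

    for k in range(len(tokens) + 1):
        i = 0
        while i < len(chart[k]):  # chart[k] may grow while it is processed
            item = chart[k][i]
            i += 1
            if item.dot < len(item.rhs):
                symbol = item.rhs[item.dot]
                if symbol in grammar:
                    # Predictor: expand the non-terminal after the dot.
                    for rhs in grammar[symbol]:
                        add(k, Item(symbol, rhs, 0, k))
                elif k < len(tokens) and tokens[k] == symbol:
                    # Scanner: the next token matches the expected terminal.
                    add(k + 1, item._replace(dot=item.dot + 1))
            else:
                # Completer: advance every item that was waiting for item.lhs.
                for parent in chart[item.origin]:
                    if (parent.dot < len(parent.rhs)
                            and parent.rhs[parent.dot] == item.lhs):
                        add(k, parent._replace(dot=parent.dot + 1))

    return any(
        it.lhs == start and it.dot == len(it.rhs) and it.origin == 0
        for it in chart[len(tokens)]
    )
```

With this toy grammar, `recognize(["det", "n", "v", "det", "n"])` accepts the sequence, while `recognize(["det", "n", "v"])` rejects it because the verb phrase is incomplete. The recursive rules `NP -> NP PP` and `VP -> VP PP` are the kind of embedded structure the paragraph above has in mind.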
Open questions
- Currently, Apertium uses text streams to communicate. I assume this would not be possible here.
- When would one call the bilingual dictionary? Apertium Level 2 calls it in the first stage.
- We should check whether this has been done before.
- The English → Urdu translation system linked here seems to use LFG and Earley-based parsing.
- If a sentence has more than one parse, there should be a way to select the most likely one.
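One possible answer to the last question is to attach a probability to each grammar rule and keep the parse whose rules have the highest product of probabilities. The following is a minimal sketch under that assumption. The tree representation, the probability values and the names `RULE_PROB`, `log_score` and `most_likely` are hypothetical; in practice the probabilities would have to be estimated from parsed data.

```python
import math

# Hypothetical rule probabilities; the values are made up for illustration.
RULE_PROB = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("det", "n")): 0.5,
    ("NP", ("n",)): 0.3,
    ("NP", ("NP", "PP")): 0.2,
    ("VP", ("v", "NP")): 0.7,
    ("VP", ("VP", "PP")): 0.3,
    ("PP", ("pr", "NP")): 1.0,
}


def log_score(tree, probs=RULE_PROB):
    """Sum the log-probabilities of all rules used in `tree`.

    A tree is a pair (label, children); each child is either a terminal
    tag (a string) or another tree.
    """
    lhs, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    total = math.log(probs[(lhs, rhs)])
    for child in children:
        if not isinstance(child, str):
            total += log_score(child, probs)
    return total


def most_likely(trees, probs=RULE_PROB):
    """Return the candidate parse with the highest score."""
    return max(trees, key=lambda t: log_score(t, probs))


# Two parses of the tag sequence "n v n pr det n":
# the prepositional phrase attaches either to the object NP or to the VP.
pp = ("PP", ["pr", ("NP", ["det", "n"])])
np_attach = ("S", [("NP", ["n"]),
                   ("VP", ["v", ("NP", [("NP", ["n"]), pp])])])
vp_attach = ("S", [("NP", ["n"]),
                   ("VP", [("VP", ["v", ("NP", ["n"])]), pp])])

best = most_likely([np_attach, vp_attach])
```

With the made-up values above, `best` is the parse that attaches the prepositional phrase to the verb phrase, because `VP -> VP PP` (0.3) is more probable than `NP -> NP PP` (0.2) and all other rules are shared. Working in log space avoids numerical underflow on long sentences.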
Existing parsers
- Main article: Parsers
Current free-software parsers which might be worth looking at:
- AGFL parser (GPL)
Further reading
- Koichi Takeda, "Pattern-Based Context-Free Grammars for Machine Translation" (private access)
- This paper proposes the use of "pattern-based" context-free grammars as a basis for building machine translation (MT) systems.
- Randall Sharp and Oliver Streiter, "Simplifying the Complexity of Machine Translation"
- J. Earley (1970), "An efficient context-free parsing algorithm", Communications of the Association for Computing Machinery, 13(2):94-102.