Difference between revisions of "Earley-based structural transfer for Apertium"

From Apertium
(14 intermediate revisions by 8 users not shown)
Line 1: Line 1:
Perhaps [http://en.wikipedia.org/wiki/Earley's_algorithm Earley's algorithm] for parsing context-free grammars (which, like Apertium, follows a left-to-right, longest-match philosophy) could be used to perform more complex syntactic transformations; this could be useful for distant language pairs containing embedded structures.
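To make the idea concrete, here is a minimal sketch of an Earley recognizer in Python. It only answers whether a tag sequence is covered by a grammar; it builds no parse tree and performs no transfer. The toy grammar, the category names (<code>det</code>, <code>n</code>, <code>v</code>, <code>rel</code>) and the function name <code>earley_recognize</code> are illustrative assumptions, not part of Apertium, and the sketch assumes a grammar without empty right-hand sides.

```python
from collections import namedtuple

# An Earley item: a dotted rule plus the input position where it started.
Item = namedtuple("Item", "lhs rhs dot origin")

def earley_recognize(grammar, start, tokens):
    """Return True if `tokens` can be derived from `start`.

    `grammar` maps each nonterminal to a list of right-hand sides (tuples).
    Any symbol that is not a key of `grammar` is treated as a terminal.
    Assumption: no rule has an empty right-hand side.
    """
    chart = [[] for _ in range(len(tokens) + 1)]

    def add(pos, item):
        if item not in chart[pos]:
            chart[pos].append(item)

    for rhs in grammar[start]:
        add(0, Item(start, rhs, 0, 0))

    for pos in range(len(tokens) + 1):
        i = 0
        while i < len(chart[pos]):  # chart[pos] may grow while we scan it
            item = chart[pos][i]
            i += 1
            if item.dot < len(item.rhs):
                symbol = item.rhs[item.dot]
                if symbol in grammar:
                    # Predictor: expand the expected nonterminal here.
                    for rhs in grammar[symbol]:
                        add(pos, Item(symbol, rhs, 0, pos))
                elif pos < len(tokens) and tokens[pos] == symbol:
                    # Scanner: the next token matches the expected terminal.
                    add(pos + 1, item._replace(dot=item.dot + 1))
            else:
                # Completer: advance every item that was waiting for item.lhs.
                for parent in list(chart[item.origin]):
                    if (parent.dot < len(parent.rhs)
                            and parent.rhs[parent.dot] == item.lhs):
                        add(pos, parent._replace(dot=parent.dot + 1))

    return any(it.lhs == start and it.dot == len(it.rhs) and it.origin == 0
               for it in chart[len(tokens)])

# Hypothetical toy grammar with an embedded relative clause.
TOY_GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("det", "n"), ("det", "n", "REL")],
    "REL": [("rel", "VP")],
    "VP": [("v",), ("v", "NP")],
}
```

With this grammar, a sequence such as <code>det n rel v det n v</code> (roughly "the dog that chased the cat sleeps") is accepted because the relative clause is nested inside the subject noun phrase, which is the kind of embedded structure that fixed-length patterns have trouble with.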


==Open questions==


* Currently, Apertium uses text streams to communicate. I assume this would not be possible here.
* <s>When would one call the bilingual dictionary? Apertium Level 2 calls it in the first stage.</s>
* We should check whether this has been done before.
::The English → Urdu translation system linked [[Specific resources per language#Urdu|here]] seems to use LFG and Earley-based parsing.
* In case there is more than one parse of a sentence, there should be a way to select the most likely.
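One possible way to approach the last question is to attach a probability to each grammar rule and prefer the parse whose rules have the highest combined probability. The following is only a sketch under that assumption: the rule probabilities are invented for illustration, each parse is represented simply as the list of rules it uses, and the names <code>RULE_PROB</code>, <code>log_score</code> and <code>most_likely</code> are hypothetical.

```python
import math

# Hypothetical rule probabilities; in practice they would have to be
# estimated from data or set by hand.
RULE_PROB = {
    ("VP", ("v", "NP", "PP")): 0.3,
    ("VP", ("v", "NP")): 0.5,
    ("NP", ("NP", "PP")): 0.2,
    ("NP", ("det", "n")): 0.6,
    ("PP", ("pr", "NP")): 1.0,
}

def log_score(rules_used):
    """Sum of log-probabilities of the rules used in one parse."""
    return sum(math.log(RULE_PROB[rule]) for rule in rules_used)

def most_likely(parses):
    """Return the parse (a list of rules) with the highest score."""
    return max(parses, key=log_score)

# Two readings of "v det n pr det n" (e.g. "saw the man with the telescope").
VERB_ATTACHMENT = [
    ("VP", ("v", "NP", "PP")),
    ("NP", ("det", "n")),
    ("PP", ("pr", "NP")),
    ("NP", ("det", "n")),
]
NOUN_ATTACHMENT = [
    ("VP", ("v", "NP")),
    ("NP", ("NP", "PP")),
    ("NP", ("det", "n")),
    ("PP", ("pr", "NP")),
    ("NP", ("det", "n")),
]
```

With these invented numbers, <code>most_likely([VERB_ATTACHMENT, NOUN_ATTACHMENT])</code> picks the verb-attachment reading. Other selection criteria would be equally conceivable; this only shows the shape of the problem.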


==Existing parsers==
{{main|Parsers}}
Current free-software parsers which might be worth looking at:

* [http://www.agfl.cs.ru.nl/ AGFL parser] (GPL)


==Further reading==


* Koichi Takeda [http://acl.ldc.upenn.edu/P/P96/P96-1020.pdf Pattern-Based Context-Free Grammars for Machine Translation] (private access)
:This paper proposes the use of "pattern-based" context-free grammars as a basis for building machine translation (MT) systems.
* [http://66.102.9.104/search?q=cache:GwOQsQtddJIJ:www.isi.edu/natural-language/mt/hlt-naacl-06-zhang.pdf+earley%27s+algorithm+machine+translation&hl=en&ct=clnk&cd=3&client=iceweasel-a http://www.isi.edu/natural-language/mt/hlt-naacl-06-zhang.pdf]
* [http://66.102.9.104/search?q=cache:PZewJi8kmc0J:www.slt.atr.jp/IWSLT2006/proceedings/EC_13_NTT.pdf+earley+algorithm+%22machine+translation%22&hl=en&ct=clnk&cd=5&client=iceweasel-a http://www.slt.atr.jp/IWSLT2006/proceedings/EC_13_NTT.pdf]
* Randall Sharp and Oliver Streiter [http://www.iai.uni-sb.de/docs/meta93.pdf Simplifying the Complexity of Machine Translation]
* J. Earley (1970) "[http://portal.acm.org/citation.cfm?doid=362007.362035 An efficient context-free parsing algorithm]", ''Communications of the Association for Computing Machinery'', 13(2):94–102.


[[Category:Development]]
[[Category:Documentation in English]]
[[Category:Transfer]]

Latest revision as of 21:21, 2 October 2013

Perhaps Earley's algorithm for parsing context-free grammars (which, like Apertium, follows a left-to-right, longest-match philosophy) could be used to perform more complex syntactic transformations; this could be useful for distant language pairs containing embedded structures.

Open questions

  • Currently, Apertium uses text streams to communicate. I assume this would not be possible here.
  • When would one call the bilingual dictionary? Apertium Level 2 calls it in the first stage.
  • We should check whether this has been done before.
The English → Urdu translation system linked here seems to use LFG and Earley-based parsing.
  • In case there is more than one parse of a sentence, there should be a way to select the most likely.

Existing parsers

Main article: Parsers

Current free-software parsers which might be worth looking at:

  • AGFL parser (GPL)

Further reading

  • Koichi Takeda, Pattern-Based Context-Free Grammars for Machine Translation (private access)
This paper proposes the use of "pattern-based" context-free grammars as a basis for building machine translation (MT) systems.