Difference between revisions of "Ideas for Google Summer of Code/superblank handling algorithm"

From Apertium
Revision as of 08:09, 11 February 2016

Here is an attempt to formalize an alternative to the current superblank handling strategies in Apertium, in order to solve the problem of Reordering superblanks. The strategy is basically equivalent to User:Tino Didriksen's approach, which in turn is related to the approach in Gramtrans and to the one used by Wikipedia Content Translation. Unlike User:Tino Didriksen's approach, the method here does not distinguish block tags from inline tags, and it avoids outputting closing tags that will be opened again, although this may need to be checked.

The method works in two phases, illustrated below with an example using XML-style tags:

  • Before any reordering of words takes place, tags are handled with a stack: tags between words are removed, and the contents of the tag stack are associated with each word.
  • After all reordering has occurred, tag stacks for consecutive words are compared to decide what to output.
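
The two phases can be sketched in Python as follows. This is only a minimal sketch under several assumptions that the text does not settle: the names `TOKEN`, `attach_stacks` and `serialize` are hypothetical, the input is well nested with no attributes and no self-closing tags such as `<br/>`, a closing tag simply pops the stack, and "comparing" two stacks means closing everything beyond their common prefix and then opening what is new. Where the space between words is placed relative to the tags is also an arbitrary choice made here.

```python
import re
from itertools import count

# Hypothetical tokenizer: an opening/closing tag without attributes, or a word.
TOKEN = re.compile(r"<(/?)(\w+)>|([^<>\s]+)")

def attach_stacks(text):
    """Phase 1: remove the tags and pair each word with a copy of the tag stack."""
    ids = count(1)
    stack = []   # entries are (tag name, unique ID)
    words = []
    for match in TOKEN.finditer(text):
        closing, name, word = match.groups()
        if word is not None:
            words.append((word, tuple(stack)))
        elif closing:
            stack.pop()  # assumption: input is well nested
        else:
            stack.append((name, next(ids)))
    return words

def serialize(words):
    """Phase 2: compare the stacks of consecutive words and emit only the difference."""
    out = []
    prev = ()
    for index, (word, stack) in enumerate(words):
        # Length of the shared prefix of the previous and current stacks.
        common = 0
        while common < min(len(prev), len(stack)) and prev[common] == stack[common]:
            common += 1
        # Close what the previous word had and this one lacks, innermost first.
        out.extend(f"</{name}>" for name, _ in reversed(prev[common:]))
        if index > 0:
            out.append(" ")
        # Open what is new for this word, outermost first.
        out.extend(f"<{name}>" for name, _ in stack[common:])
        out.append(word)
        prev = stack
    out.extend(f"</{name}>" for name, _ in reversed(prev))
    return "".join(out)

words = attach_stacks("<p><b><i>my sister</i> lives</b> <u>in Wales</u></p>")
# Simulate a reordering that moves "lives" in front of "my sister".
reordered = [words[2], words[0], words[1], words[3], words[4]]
print(serialize(reordered))
# prints: <p><b>lives <i>my sister</i></b> <u>in Wales</u></p>
```

Because stack entries carry the unique ID as well as the tag name, two different `<b>` elements are never mistaken for the same one, while words that shared an element before reordering still share it afterwards.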

Stealing (and slightly modifying) User:Tino Didriksen's example:

<p><b><i>my sister</i><br/>lives</b> <u>in Wales</u></p>

Before the reordering, the process would be, step by step:

Input   | Stack         | Output               | Comment
------- | ------------- | -------------------- | -------
(start) | 0             |                      | Number IDs in stacks may be pointers to where the actual tags are stored; the name of the tag is included for clarity
<p>     | 0 p 1         |                      | A new ID is generated whenever a new element is stacked
<b>     | 0 p 1 b 2     |                      |
<i>     | 0 p 1 b 2 i 3 |                      |
my      | 0 p 1 b 2 i 3 | 0 p 1 b 2 i 3 my     | The current stack is attached to the word as output
sister  | 0 p 1 b 2 i 3 | 0 p 1 b 2 i 3 sister |
</i>    | 0 p 1 b 2 i 3 | 0 p 1 b 2 i 3 sister |
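
The rows above up to "sister" can be reproduced with a short trace. This is a sketch rather than a reference implementation: `TOKEN`, `show` and `trace` are hypothetical names, tags are assumed to have no attributes, and popping the stack on a closing tag is an assumption, since the walkthrough is unfinished at that step. The leading `0` is taken to be the ID of the empty (root) stack, as in the "(start)" row.

```python
import re

# Hypothetical tokenizer: an opening/closing tag without attributes, or a word.
TOKEN = re.compile(r"<(/?)(\w+)>|([^<>\s]+)")

def show(stack):
    """Render a stack in the table's notation: root ID 0, then 'name ID' pairs."""
    return " ".join(["0"] + [f"{name} {tag_id}" for name, tag_id in stack])

def trace(text):
    """Return (input, stack, output) rows for each token, starting with '(start)'."""
    stack = []
    next_id = 1
    rows = [("(start)", show(stack), "")]
    for match in TOKEN.finditer(text):
        closing, name, word = match.groups()
        if word is not None:
            # The current stack is attached to the word as output.
            rows.append((word, show(stack), f"{show(stack)} {word}"))
            continue
        if closing:
            stack.pop()  # assumption: a closing tag removes the innermost element
        else:
            stack.append((name, next_id))  # a new ID for every stacked element
            next_id += 1
        rows.append((match.group(0), show(stack), ""))
    return rows

for row in trace("<p><b><i>my sister"):
    print(" | ".join(row))
```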


(to be completed)