User:Khannatanmai/Wordbound blanks

From Apertium

This page follows the development of word-bound blanks in the Apertium stream format.
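To make the idea concrete, here is a minimal Python sketch of reading a stream that contains word-bound blanks. It assumes, for illustration only, that a word-bound blank is written as `[[...]]` immediately before the `^...$` lexical unit it belongs to, with several bound entries separated by `;`, while ordinary superblanks keep the `[...]` form. The function name `parse_stream` is hypothetical, and backslash escapes are ignored.

```python
import re
from typing import List, Tuple

# Hypothetical grammar for this sketch (an assumption, not taken from the
# Apertium sources): an optional "[[...]]" word-bound blank glued to a
# "^...$" lexical unit, a "[...]" superblank, or plain inter-word text.
TOKEN = re.compile(
    r"(?:\[\[(?P<wb>[^\]]*)\]\])?\^(?P<lu>[^$]*)\$"
    r"|\[(?P<sb>[^\]]*)\]"
    r"|(?P<text>[^\[\^]+)"
)

Token = Tuple[str, str, List[str]]  # (kind, content, bound blank entries)


def parse_stream(stream: str) -> List[Token]:
    """Split a stream into lexical units, superblanks and plain blanks.

    Only lexical units carry bound entries: the ';'-separated contents of
    the "[[...]]" block directly in front of them. Escapes are ignored, and
    characters matching no alternative (e.g. a stray '^') are skipped.
    """
    tokens: List[Token] = []
    for m in TOKEN.finditer(stream):
        if m.group("lu") is not None:
            wb = m.group("wb")
            bound = [entry.strip() for entry in wb.split(";")] if wb else []
            tokens.append(("lu", m.group("lu"), bound))
        elif m.group("sb") is not None:
            tokens.append(("superblank", m.group("sb"), []))
        else:
            tokens.append(("blank", m.group("text"), []))
    return tokens
```

In this representation a later stage can move or merge a lexical unit together with its bound entries, while superblanks stay where they are.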

Features

Rationale

Formalism

Examples

Markup Handling

$ echo 'legal <b>persons</b>' | apertium en-es -f html
Personas <b>legales</b>
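In this first example the bold tag lands on the wrong word because it is not tied to "persons". The sketch below shows the basic idea of binding: every word records the stack of inline tags open around it, so reordering the words carries the markup along. It uses Python's standard `html.parser`; the names `WordBinder` and `bind_words` are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Tuple

Word = Tuple[str, Tuple[str, ...]]  # (word, raw start tags enclosing it)


class WordBinder(HTMLParser):
    """Attach to every word the stack of inline tags open around it."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.stack: List[Tuple[str, str]] = []  # (tag name, raw start tag)
        self.words: List[Word] = []

    def handle_starttag(self, tag, attrs):
        # Keep the raw text so attributes such as href survive unchanged.
        self.stack.append((tag, self.get_starttag_text()))

    def handle_endtag(self, tag):
        # Pop back to the most recent matching opener; ignore stray closers.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        bound = tuple(raw for _, raw in self.stack)
        self.words.extend((w, bound) for w in data.split())


def bind_words(html: str) -> List[Word]:
    binder = WordBinder()
    binder.feed(html)
    binder.close()
    return binder.words
```

Swapping the two entries of `bind_words('legal <b>persons</b>')`, as en-es does for adjective + noun, gives `[("persons", ("<b>",)), ("legal", ())]`, so the bold stays on the noun. A tag that opens inside a word (e.g. `wo<b>rd</b>`) would split that word in two here; that case is not handled.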

$ echo 'I <b>am</b> David' | apertium en-es -f html
Soy</b> David
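Here "I am" becomes the single word "Soy", and the closing tag survives while its opening tag is lost. One plausible policy (an assumption, not necessarily what Apertium does) is to give a merged target word the union of its source words' bound tags, and to emit each word inside a balanced set of tags. `merge_bound` and `wrap` are hypothetical helper names.

```python
from typing import List, Sequence, Tuple

Bound = Tuple[str, ...]  # raw start tags bound to one word, outermost first


def merge_bound(sources: Sequence[Bound]) -> Bound:
    """Union of the tag stacks of several source words, first-seen order."""
    merged: List[str] = []
    for stack in sources:
        for tag in stack:
            if tag not in merged:
                merged.append(tag)
    return tuple(merged)


def wrap(word: str, bound: Bound) -> str:
    """Emit a word inside its own balanced set of tags."""
    # '<a href="x">' -> 'a' -> '</a>'; closers go innermost first.
    closers = ["</" + t[1:].split()[0].rstrip(">") + ">" for t in reversed(bound)]
    return "".join(bound) + word + "".join(closers)
```

With "I" bound to nothing and "am" bound to `<b>`, `wrap("Soy", merge_bound([(), ("<b>",)]))` yields `<b>Soy</b>`, so the output line becomes `<b>Soy</b> David` and stays well-formed.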
Spanish: <p>Es <s>además</s> de Valencia.</p>
Catalan: <p>És <s>a més</s> de València.</p>
English: <p>The <b>big <i>red</i></b> dog</p>
Spanish: <p>El perro <b><i>rojo</i> grande</b></p>
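Reproducing this nesting takes more than wrapping each word separately, which would give `<b><i>rojo</i></b> <b>grande</b>`. A minimal sketch, assuming each word already carries its tag stack: walk the words left to right, close only the tags the next word is not inside, and open only the ones it adds. `emit` and `close_tag` are hypothetical names.

```python
from typing import List, Sequence, Tuple

Word = Tuple[str, Tuple[str, ...]]  # (surface form, open tags around it)


def close_tag(start_tag: str) -> str:
    """'<a href="x">' -> '</a>'"""
    name = start_tag[1:].split()[0].rstrip(">")
    return "</" + name + ">"


def emit(words: Sequence[Word]) -> str:
    """Re-serialise words, sharing tags between neighbours where possible."""
    out: List[str] = []
    open_tags: List[str] = []
    for i, (word, bound) in enumerate(words):
        # Length of the common prefix of the open stack and this word's stack.
        keep = 0
        while keep < min(len(open_tags), len(bound)) and open_tags[keep] == bound[keep]:
            keep += 1
        # Close what this word is not inside, before the separating space.
        for tag in reversed(open_tags[keep:]):
            out.append(close_tag(tag))
        del open_tags[keep:]
        if i > 0:
            out.append(" ")
        # Open the tags this word adds on top of the shared prefix.
        for tag in bound[keep:]:
            out.append(tag)
            open_tags.append(tag)
        out.append(word)
    for tag in reversed(open_tags):
        out.append(close_tag(tag))
    return "".join(out)
```

Because stacks are compared as prefixes, tags that reappear in a different nesting order are closed and reopened rather than shared, so the output stays balanced but is not always minimal. Tags are compared by their full text, which keeps apart adjacent links that differ only in `href`, such as the Concealment and Object links further down.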
<p>Bees <b>cannot</b> swim</p>
<p>Las Abejas <b>no pueden</b> nadar</p>
<a href="Conway">Conway</a> stated that young <a href="children">children</a>
<i>“understand <a href="Object_permanence">object permanence</a>.
<a href="Concealment">Concealed</a> <a href="Object">objects</a> feature in
their awareness.”</i><span typeof="mw:Extension/ref"><a href="#ref-5">[5]</a></span>
<b>(<a href="Nielsen">Nielsen</a> equivalence).</b>
<p><b><i>my sister</i><br/>lives</b> <u>in Wales</u></p>
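Here `<br/>` sits between two words inside `<b>` but wraps no text, so it cannot be bound to either word. A sketch, assuming HTML void elements (elements such as `br` and `img` that never have content) and other self-closing tags are treated as standalone blanks that keep their position in the token stream. `tokenize` is a hypothetical name, and the regex does not cope with `>` inside attribute values.

```python
import re
from typing import List, Tuple

# HTML void elements never wrap text, so here they become standalone blanks.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}
TAG = re.compile(r"<(/?)([a-zA-Z][\w-]*)[^>]*?(/?)>")

Token = Tuple[str, str, Tuple[str, ...]]  # (kind, text, enclosing inline tags)


def _add_words(text: str, stack: List[Tuple[str, str]], tokens: List[Token]) -> None:
    bound = tuple(raw for _, raw in stack)
    for w in text.split():
        tokens.append(("word", w, bound))


def tokenize(html: str) -> List[Token]:
    tokens: List[Token] = []
    stack: List[Tuple[str, str]] = []  # (tag name, raw start tag)
    pos = 0
    for m in TAG.finditer(html):
        _add_words(html[pos:m.start()], stack, tokens)
        closing, name, selfclose = m.group(1), m.group(2).lower(), m.group(3)
        if name in VOID or selfclose:
            # Standalone blank: keeps its slot, records its context only.
            tokens.append(("blank", m.group(0), tuple(raw for _, raw in stack)))
        elif closing:
            for i in range(len(stack) - 1, -1, -1):
                if stack[i][0] == name:
                    del stack[i:]
                    break
        else:
            stack.append((name, m.group(0)))
        pos = m.end()
    _add_words(html[pos:], stack, tokens)
    return tokens
```

On `<b><i>my sister</i><br/>lives</b>` this yields the words "my" and "sister" bound to `<b><i>`, a blank `<br/>` recorded inside `<b>`, and "lives" bound to `<b>`. How the blank is placed if "sister" and "lives" are reordered in translation is left open here.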
<a id="foobar" href="http://example.com">Foo <b>bar</b>.</a>

Ideal output for the last example (with the two words reordered in translation):
<a id="foobar" href="http://example.com"><b>Бар</b> фоо.</a>
The sister's dog


Tests

* https://github.com/unhammer/apertium/blob/blank-handling/tests/pretransfer/__init__.py

Previous Attempts

* https://wiki.apertium.org/wiki/User:SilentFlame/Progress
* https://github.com/junaidiiith/apertium/tree/blank-handling (GSoC 2016 project)
* https://github.com/unhammer/apertium/tree/blank-handling: an older, unfinished implementation of the changes required in apertium-transfer, with notes at https://github.com/unhammer/apertium/blob/blank-handling/blank_notes.org#consequences-of-this-type-of-blank-handling
* https://github.com/junaidiiith/apertium
* https://github.com/junaidiiith/Apertium_Code
* Make transfer output the non-inline blanks before the rule output
* Make transfer handle inline blanks, and ignore
* Work in progress for the two items above: https://github.com/unhammer/apertium/commit/b5c73fbe82544d83a98eb16b921c2fa224f6d40c

References

* https://www.mediawiki.org/wiki/Content_translation/Developers/Markup
* https://www.mediawiki.org/wiki/Content_translation/Product_Definition/LinearDoc
* https://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/superblank_handling_algorithm