User:Khannatanmai/Wordbound blanks

From Apertium

This page tracks the development of word-bound blanks in the Apertium stream format.

Features

Rationale

Formalism

Examples

Markup Handling

$ echo 'legal <b>persons</b>' | apertium en-es -f html
Personas <b>legales</b>

$ echo 'I <b>am</b> David' | apertium en-es -f html
Soy</b> David
Spanish: <p>Es <s>además</s> de Valencia.</p>
Catalan: <p>És <s>a més</s> de València.</p>
English: <p>The <b>big <i>red</i></b> dog</p>
Spanish: <p>El perro <b><i>rojo</i> grande</b></p>
<p>Bees <b>cannot</b> swim</p>
<p>Las Abejas <b>no pueden</b> nadar</p>
<a href="Conway">Conway</a> stated that young <a href="children">children</a>
<i>“understand <a href="Object_permanence">object permanence</a>.
<a href="Concealment">Concealed</a> <a href="Object">objects</a> feature in
their awareness.”</i><span typeof="mw:Extension/ref"><a href="#ref-5">[5]</a></span>
<b>(<a href="Nielsen">Nielsen</a> equivalence).</b>
<p><b><i>my sister</i><br/>lives</b> <u>in Wales</u></p>
<a id="foobar" href="http://example.com">Foo <b>bar</b>.</a>

Ideal Output:
<a id="foobar" href="http://example.com"><b>Бар</b> фоо.</a>
<b>The</b> <i>sister</i>'s <em>dog</em>

From [[1]]

source: '<p>A <b>Japanese</b> <i>BBC</i> article</p>',
target: '<p>Un artículo de <i>BBC</i> <b>japonés</b></p>',

source: '<div>A <b>modern</b> Britain.</div>',
target: '<div>Una Gran Bretaña <b>moderna</b>.</div>',

source: '<p>The <b>big <i>red</i></b> dog</p>',
target: '<p>El perro <b><i>rojo</i></b> <b>grande</b></p>',

source: '<p>He said "<i>I tile <a href="x">bathrooms</a>.</i>"</p>',
target: '<p>Diga que "<i>enladrillo</i> <i><a href="x">baños</a></i>."</p>',

source: '<p>The <b>big red</b> dog</p>',
target: '<p>El perro <b>rojo grande</b></p>',

source: '<p>The <b>big</b> <b>red</b> dog</p>',
target: '<p>El perro <b>rojo</b> <b>grande</b></p>',

source: '<p>The <a href="1">big</a> <a href="2">red</a> dog</p>',
target: '<p>El perro <a href="2">rojo</a> <a href="1">grande</a></p>',
		
source: '<p id="8"><span class="cx-segment" data-segmentid="9"><a class="cx-link" data-linkid="17" href="./The_New_York_Times" rel="mw:WikiLink" title="The New York Times">The New York Times</a>, which has an <b>executive editor</b> over the news pages and an <b>editorial page editor</b> over opinion pages.</span></p>',
target: '<p id="8"><span data-segmentid="9" class="cx-segment"><a title="The New York Times" rel="mw:WikiLink" href="./The_New_York_Times" data-linkid="17" class="cx-link">The New York Times</a>, el cual tiene un <b>editor ejecutivo</b> sobre las páginas noticiosas y un <b>editor de página del editorial</b> encima páginas de opinión.</span></p>',

source: '<p id="8"><style>b{color:red;}</style></p>',
target: '<p id="8"><style>b{color:red;}</style></p>',	
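
The reordering pairs above (for instance the 'big' / 'red' cases) suggest that inline tags travel with the individual words they wrap. A minimal sketch of that idea follows; it is not taken from any Apertium code. The `Word` class and `render` function are hypothetical names, and the translation and reordering are supplied by hand rather than computed. Each word is wrapped separately, so merging adjacent words that share a tag (as in the '<b>rojo grande</b>' pair) would be a further step not shown here.

```python
from dataclasses import dataclass, field


@dataclass
class Word:
    """Hypothetical container: one word plus the inline tags bound to it."""
    text: str
    tags: list = field(default_factory=list)  # opening tags, outermost first


def closing_tag(opening):
    # '<a href="1">' -> '</a>', '<b>' -> '</b>'
    name = opening[1:].split()[0].rstrip(">")
    return "</" + name + ">"


def render(words):
    """Wrap every word in its own tags and join the words with spaces."""
    out = []
    for w in words:
        opening = "".join(w.tags)
        closing = "".join(closing_tag(t) for t in reversed(w.tags))
        out.append(opening + w.text + closing)
    return " ".join(out)


# Source: The <a href="1">big</a> <a href="2">red</a> dog
# Target order and translations are given by hand (assumption for the sketch).
target = [
    Word("El"),
    Word("perro"),
    Word("rojo", ['<a href="2">']),    # tags carried over from "red"
    Word("grande", ['<a href="1">']),  # tags carried over from "big"
]
print(render(target))
```

With nested tags bound per word (`['<b>', '<i>']` on 'rojo' and `['<b>']` on 'grande'), the same function yields the split form shown in the 'big <i>red</i>' pair.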

Pretransfer Tests:

input: [[<i>]]^a<vblex><pres>+c<po># b$ ^a<vblex><pres>+c<po># b$
output:[[<i>]]^a# b<vblex><pres>$ [[<i>]]^c<po>$ ^a# b<vblex><pres>$ ^c<po>$
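
The pretransfer test above can be read as three steps: split a joined lexical unit at '+', move the multiword queue ('# b') in front of the first part's tags, and repeat the word-bound blank on every resulting unit. The following is a rough sketch of that reading only, not the apertium-pretransfer implementation. The function names are made up for illustration, and the sketch assumes that '#', '+', '^' and '$' never appear escaped or inside a lemma.

```python
import re

# An optional word-bound blank "[[...]]" followed by one lexical unit "^...$".
LU_PATTERN = re.compile(r"(\[\[.*?\]\])?\^(.*?)\$")


def split_lexical_unit(match):
    blank = match.group(1) or ""  # word-bound blank, if any
    body = match.group(2)         # e.g. 'a<vblex><pres>+c<po># b'

    # Separate the multiword queue ('# b') from the joined analyses.
    head, sep, rest = body.partition("#")
    queue = sep + rest

    parts = head.split("+")

    # Put the queue after the first lemma, before its tags.
    first = parts[0]
    tag_start = first.find("<")
    if tag_start == -1:
        tag_start = len(first)
    parts[0] = first[:tag_start] + queue + first[tag_start:]

    # Every resulting unit gets its own copy of the blank.
    return " ".join(f"{blank}^{part}$" for part in parts)


def pretransfer_sketch(stream):
    return LU_PATTERN.sub(split_lexical_unit, stream)


stream = "[[<i>]]^a<vblex><pres>+c<po># b$ ^a<vblex><pres>+c<po># b$"
print(pretransfer_sketch(stream))
```

Copying the blank onto each unit, rather than leaving it on the first one only, is what the expected output in the test shows for the '[[<i>]]' case.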

Tests

Previous Attempts

References