Difference between revisions of "User:Khannatanmai/Wordbound blanks"

From Apertium

Revision as of 08:21, 24 June 2020

This page follows the development of wordbound blanks in the Apertium stream format.

Features

Rationale

Formalism

Wordbound blanks are denoted by double square brackets and always appear immediately before a lexical unit (LU).

[[wordboundblank]]^LU<tags>$
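
As a rough illustration of the notation, the following Python sketch pulls each lexical unit out of a stream together with the wordbound blank that precedes it, if any. It is a simplified reading of the format, not code from Apertium: the regular expression assumes no escaped brackets, carets or dollar signs, and the function name `parse_stream` and the sample stream are made up for this example.

```python
import re

# Simplified pattern: an optional [[...]] blank directly followed by ^...$.
# Assumption: no escaped brackets, carets or dollars inside the blank or the LU.
LU_PATTERN = re.compile(r"(?:\[\[(?P<blank>.*?)\]\])?\^(?P<lu>.*?)\$")


def parse_stream(stream):
    """Return (wordbound_blank, lexical_unit) pairs; blank is None when absent."""
    return [(m.group("blank"), m.group("lu")) for m in LU_PATTERN.finditer(stream)]


# Hypothetical stream: the first LU carries a blank, the second does not.
pairs = parse_stream("[[<b>]]^big<adj>$ ^dog<n><sg>$")
print(pairs)
```

Keeping the blank attached to the LU in one pair is the point of the notation: whatever later stages do to the LU, the blank can travel with it.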

Examples

Markup Handling

$ echo 'legal <b>persons</b>' | apertium en-es -f html
Personas <b>legales</b>

$ echo 'I <b>am</b> David' | apertium en-es -f html
Soy</b> David
Spanish: <p>Es <s>además</s> de Valencia.</p>
Catalan: <p>És <s>a més</s> de València.</p>
English: <p>The <b>big <i>red</i></b> dog</p>
Spanish: <p>El perro <b><i>rojo</i> grande</b></p>
<p>Bees <b>cannot</b> swim</p>
<p>Las Abejas <b>no pueden</b> nadar</p>
<a href="Conway">Conway</a> stated that young <a href="children">children</a>
<i>“understand <a href="Object_permanence">object permanence</a>.
<a href="Concealment">Concealed</a> <a href="Object">objects</a> feature in
their awareness.”</i><span typeof="mw:Extension/ref"><a href="#ref-5">[5]</a></span>
<b>(<a href="Nielsen">Nielsen</a> equivalence).</b>
<p><b><i>my sister</i><br/>lives</b> <u>in Wales</u></p>
<a id="foobar" href="http://example.com">Foo <b>bar</b>.</a>

Ideal Output:
<a id="foobar" href="http://example.com"><b>Бар</b> фоо.</a>
<b>The</b> <i>sister</i>'s <em>dog</em>

From [[1]]

source: '<p>A <b>Japanese</b> <i>BBC</i> article</p>',
target: '<p>Un artículo de <i>BBC</i> <b>japonés</b></p>',

source: '<div>A <b>modern</b> Britain.</div>',
target: '<div>Una Gran Bretaña <b>moderna</b>.</div>',

source: '<p>The <b>big <i>red</i></b> dog</p>',
target: '<p>El perro <b><i>rojo</i></b> <b>grande</b></p>',

source: '<p>He said "<i>I tile <a href="x">bathrooms</a>.</i>"</p>',
target: '<p>Diga que "<i>enladrillo</i> <i><a href="x">baños</a></i>."</p>',

source: '<p>The <b>big red</b> dog</p>',
target: '<p>El perro <b>rojo grande</b></p>',

source: '<p>The <b>big</b> <b>red</b> dog</p>',
target: '<p>El perro <b>rojo</b> <b>grande</b></p>',

source: '<p>The <a href="1">big</a> <a href="2">red</a> dog</p>',
target: '<p>El perro <a href="2">rojo</a> <a href="1">grande</a></p>',
		
source: '<p id="8"><span class="cx-segment" data-segmentid="9"><a class="cx-link" data-linkid="17" href="./The_New_York_Times" rel="mw:WikiLink" title="The New York Times">The New York Times</a>, which has an <b>executive editor</b> over the news pages and an <b>editorial page editor</b> over opinion pages.</span></p>',
target: '<p id="8"><span data-segmentid="9" class="cx-segment"><a title="The New York Times" rel="mw:WikiLink" href="./The_New_York_Times" data-linkid="17" class="cx-link">The New York Times</a>, el cual tiene un <b>editor ejecutivo</b> sobre las páginas noticiosas y un <b>editor de página del editorial</b> encima páginas de opinión.</span></p>',

source: '<p id="8"><style>b{color:red;}</style></p>',
target: '<p id="8"><style>b{color:red;}</style></p>',	
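
The pairs above with reordered adjectives suggest why binding markup to words helps: if each word carries its own opening tags, the tags can be re-emitted around the word wherever it ends up. The Python sketch below illustrates that idea for the two-link example. It assumes, purely for illustration, that a wordbound blank holds the opening inline tags of its word; `closing_tags` and `render` are hypothetical helpers, not part of Apertium.

```python
import re


def closing_tags(blank):
    """Build closing tags for the opening tags in a blank, innermost first."""
    names = re.findall(r"<([A-Za-z][A-Za-z0-9]*)[^>]*>", blank)
    return "".join(f"</{name}>" for name in reversed(names))


def render(words):
    """words: list of (wordbound_blank, surface_form) pairs in target order."""
    return " ".join(
        f"{blank}{surface}{closing_tags(blank)}" for blank, surface in words
    )


# Target-side order after transfer has swapped "big" and "red".
target = [
    ("", "El"),
    ("", "perro"),
    ('<a href="2">', "rojo"),
    ('<a href="1">', "grande"),
]
print(render(target))
```

This sketch wraps every word separately, which matches the targets where adjacent tags are repeated per word; merging neighbouring identical tags back into one span would need an extra step.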

Pretransfer Tests:

input: [[<i>]]^a<vblex><pres>+c<po># b$ ^a<vblex><pres>+c<po># b$
output: [[<i>]]^a# b<vblex><pres>$ [[<i>]]^c<po>$ ^a# b<vblex><pres>$ ^c<po>$
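
The behaviour this test expects can be modelled in a few lines of Python: each lexical unit is split at `+`, the multiword queue (`# b`) moves between the lemma and the tags of the first part, and the wordbound blank is copied onto every resulting unit. This is a simplified model written for this page, not the apertium-pretransfer implementation; it ignores escaping, and the names `split_lu` and `pretransfer` are hypothetical.

```python
import re

# Optional [[...]] blank, then one ^...$ lexical unit (no escape handling).
LU_PATTERN = re.compile(r"(\[\[.*?\]\])?\^(.*?)\$")


def split_lu(match):
    """Split one LU at '+', copying its wordbound blank to every part."""
    blank = match.group(1) or ""
    body = match.group(2)
    # Separate the multiword queue ("# b") from the rest, if present.
    body, sep, queue = body.partition("#")
    queue = sep + queue
    parts = body.split("+")
    # Move the queue between the lemma and the tags of the first part.
    lemma, lt, tags = parts[0].partition("<")
    parts[0] = lemma + queue + lt + tags
    return " ".join(f"{blank}^{part}$" for part in parts)


def pretransfer(stream):
    """Apply split_lu to every LU; text between LUs is left untouched."""
    return LU_PATTERN.sub(split_lu, stream)


stream_in = "[[<i>]]^a<vblex><pres>+c<po># b$ ^a<vblex><pres>+c<po># b$"
print(pretransfer(stream_in))
```

Copying the blank to both halves keeps the `<i>` markup on every word that came from the original unit, while the second LU, which has no blank, is split without one.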

Tests

Previous Attempts

References