Difference between revisions of "User:Khannatanmai/Wordbound blanks"

From Apertium
(22 intermediate revisions by the same user not shown)
Line 3: Line 3:
 
= Features =
 
= Features =
   
== Transfer ([https://github.com/apertium/apertium/pull/90 Pull Request]) ==
+
== Transfer ([https://github.com/apertium/apertium/pull/90 Pull Request 1], [https://github.com/apertium/apertium/pull/94 Pull Request 2]) ==
=== Chunker ===
+
=== Chunker/Single-stage transfer ===
 
* Wordbound blanks are a part of transfer word as a new side: blank.
 
* Wordbound blanks are a part of transfer word as a new side: blank.
 
* Are ignored in pattern matching
 
* Are ignored in pattern matching
Line 13: Line 13:
 
* When MLUs are formed the blanks are merged.
 
* When MLUs are formed the blanks are merged.
 
* Tests added
 
* Tests added
  +
* If rule pattern has only one LU, the wordbound blank gets output with all output LUs of the rule
   
 
=== Interchunk ===
 
=== Interchunk ===
Line 25: Line 26:
 
* When MLUs are formed the blanks are merged.
 
* When MLUs are formed the blanks are merged.
 
* Tests added
 
* Tests added
  +
* If rule pattern chunk has only one LU, the wordbound blank gets output with all output LUs of the rule
   
 
== Pretransfer ([https://github.com/apertium/apertium/pull/93 Pull Request]) ==
 
== Pretransfer ([https://github.com/apertium/apertium/pull/93 Pull Request]) ==
Line 30: Line 32:
   
 
== Separable ([https://github.com/apertium/apertium-separable/pull/29 Pull Request]) ==
 
== Separable ([https://github.com/apertium/apertium-separable/pull/29 Pull Request]) ==
* Merge wordbound blanks and add to all LUs in rule output
+
* Merge wordbound blanks and add to all LUs in rule output.
  +
* Works for both autoseq and revautoseq.
   
== Analysis, Biltrans, Generation, Postgeneration ([https://github.com/apertium/lttoolbox/pull/101 Pull Request]) ==
+
== Analysis, Biltrans, Generation ([https://github.com/apertium/lttoolbox/pull/101 Pull Request]) ==
* Parsing wordbound blanks as normal blanks for analysis, generation, biltrans, postgeneration.
+
* Parsing wordbound blanks as normal blanks for analysis, generation, biltrans.
 
* Added a test for wordbound blank analysis.
 
* Added a test for wordbound blank analysis.
  +
  +
== Streamparser ([https://github.com/apertium/streamparser/pull/37 Pull Request]) ==
  +
* Wordbound blanks parsed as part of a lexical unit in the stream parser.
  +
* Can be accessed by class member: <code>LexicalUnit.wordbound_blank</code>.
  +
  +
== Postgeneration ([https://github.com/apertium/lttoolbox/pull/102 Pull Request]) ==
  +
* Wordbound blanks merge when words merge.
  +
* Wordbound blanks apply to all output words when a postgen rule outputs more words than it takes as input.
  +
* No regression for postgeneration without wordbound blanks.
  +
* Lots of tests added.
   
 
= Rationale =
 
= Rationale =
 
Wordbound blanks will store information about a lexical unit that can help us with several applications where we want to send information through the pipeline but this information can't be sent as tags because it would break the FST matching in the modules.
 
Wordbound blanks will store information about a lexical unit that can help us with several applications where we want to send information through the pipeline but this information can't be sent as tags because it would break the FST matching in the modules.
  +
  +
The information is stored with a lexical unit because lexical units are split, merged, deleted, and added throughout the pipeline, and the information should follow them: distributing over multiple output words, merging when words merge, and so on.
   
 
= Formalism =
 
= Formalism =
Line 44: Line 59:
 
<pre>[[wordboundblank]]^LU<tags>$</pre>
 
<pre>[[wordboundblank]]^LU<tags>$</pre>
   
  +
If there is no Lexical Unit in the stream (before the morph analyser and after the generator), then we have an end wblank as well.
= Examples =
 
== Markup Handling ==
 
   
  +
<pre>[[wordboundblank]]word[[/]] word2 word3 [[wordboundblank]]word4[[/]]</pre>
=== Working Examples ===
 
<pre>
 
Transfer Input:
 
^The<det><def><sp>/El<det><def><GD><ND>$ [[tbqum2bhp]]^big<adj><sint>/grande<adj><mf>$ [[t:b:qum2bhp; t:i:M0JZW3Q]]^red<adj>/rojo<adj>$ ^dog<n><sg>/perro<n><GD><sg>$[
 
]
 
   
Transfer Output:
 
^El<det><def><m><sg>$ ^perro<n><m><sg>$ [[t:b:qum2bhp; t:i:M0JZW3Q]]^rojo<adj><m><sg>$ [[tbqum2bhp]]^grande<adj><mf><sg>$[
 
]
 
</pre>
 
   
<pre>
 
Postchunk Input:
 
^Det_adj<SA>{^el<det><def>$ [[t:b:qum2bhp]]^grande# test<adj>$}$ ^inf<SV><vblex><pres><p3><ND>{[[t:i:M0JZW3Q]]^vivir<vblex><3>$}$ ^default<default>{[[t:b:qum2bhp; t:i:M0JZW3Q]]^rojo<adj>$}$ ^nom<SN><sg>{^perro<n><3>$}$ ^nom<SN><sg>{[[t:s:123456]]^test<n><3># abc$}$ ^have_enc_pp<SV><tx><tps><PD><ND>{[[t:x:1234ab]]^xyz<cnjadv>$ [[t:s:p2rthg]]^abc<vbhaver><ger>$ [[t:x:y265hk]]^uvwx<vblex><pp>$}$ ^have_enc_pp<SV><tx><tps><PD><ND>{[[t:x:1234ab; t:y:poposj]]^xyz<cnjadv>$ [[t:s:p2rthg; t:b:123456]]^abc<vbhaver><ger>$ [[t:x:y265hk]]^uvwx<vblex><pp>$}$[
 
]
 
   
  +
= Full Pipe Testing =
Postchunk Output:
 
^El<det><def>$ [[t:b:qum2bhp]]^grande# test<adj>$ [[t:i:M0JZW3Q]]^vivir<vblex><pres><p3><ND>$ [[t:b:qum2bhp; t:i:M0JZW3Q]]^rojo<adj>$ ^perro<n>$ [[t:s:123456]]^test<n># abc$ [[t:x:1234ab; t:s:p2rthg]]^xyz<cnjadv>+abc<vbhaver><ger>$ [[t:x:y265hk]]^uvwx<vblex><pp>$ [[t:x:1234ab; t:y:poposj; t:s:p2rthg; t:b:123456]]^xyz<cnjadv>+abc<vbhaver><ger>$ [[t:x:y265hk]]^uvwx<vblex><pp>$[
 
]
 
</pre>
 
   
  +
Current Translation Command: <code>apertium-deshtml < html_input-eng.in | apertium -f none -d $PREFIX/apertium-eng-spa eng-spa | apertium-retxt</code>
<pre>
 
   
  +
Wordbound blank with Transfuse Command: <code>tf-html-fragment $PREFIX/apertium-eng-spa/modes/eng-spa.mode < html_input-eng.in</code>
   
  +
== Spanish - Catalan ==
***********
 
  +
<pre>
 
  +
Source: <p>Es <s>además</s> de Valencia.</p>
lt-proc output:
 
  +
Current Translation: <p>És <s>a més de</s> València.</p>
^legal/legal<adj>$ ^persons/person<n><pl>$[]
 
  +
Ideal Translation: <p>Es <s>además</s> de Valencia.</p>
 
  +
After wordbound blanks: <p>És <s>*ademà s</s> de València.</p>
 
 
postchunk output:
 
^Persona<n><f><pl>$ ^legal<adj><mf><pl>$[]
 
 
 
 
***********
 
 
lt-proc output:
 
^legal/legal<adj>$ [[t:b:e4XkhY]]^persons/person<n><pl>$[]
 
 
 
 
postchunk output:
 
[[t:b:e4XkhY]]^persona<n><f><pl>$ ^legal<adj><mf><pl>$[]
 
 
 
 
***********
 
 
lt-proc output:
 
^I/I<num><mf><sg>/prpers<prn><subj><p1><mf><sg>$ ^am/be<vbser><pri><p1><sg>$ ^David/David<np><ant><m><sg>$^./.<sent>$[]
 
 
 
 
postchunk output:
 
^ser<vbser><pri><p1><sg>$ ^David<np><ant><m><sg>$ ^.<sent>$[]
 
 
 
 
***********
 
 
lt-proc output:
 
[[t:b:Steu7o1]]^I/I<num><mf><sg>/prpers<prn><subj><p1><mf><sg>$ [[t:b:Steu7o2]]^am/be<vbser><pri><p1><sg>$ ^David/David<np><ant><m><sg>$^./.<sent>$[]
 
 
 
 
postchunk output:
 
[[t:b:Steu7o2]]^Ser<vbser><pri><p1><sg>$ ^David<np><ant><m><sg>$ ^.<sent>$[]
 
 
 
 
***********
 
 
lt-proc output:
 
^I/I<num><mf><sg>/prpers<prn><subj><p1><mf><sg>$ [[t:b:Steu7o1]]^am/be<vbser><pri><p1><sg>$ [[t:b:Steu7o2]]^David/David<np><ant><m><sg>$^./.<sent>$[]
 
 
 
 
postchunk output:
 
[[t:b:Steu7o1]]^Ser<vbser><pri><p1><sg>$ [[t:b:Steu7o2]]^David<np><ant><m><sg>$ ^.<sent>$[]
 
 
 
 
***********
 
 
lt-proc output:
 
[[t:b:Steu7o1]]^I/I<num><mf><sg>/prpers<prn><subj><p1><mf><sg>$ ^am/be<vbser><pri><p1><sg>$ [[t:b:Steu7o2]]^David/David<np><ant><m><sg>$^./.<sent>$[]
 
 
 
 
postchunk output:
 
^Ser<vbser><pri><p1><sg>$ [[t:b:Steu7o2]]^David<np><ant><m><sg>$ ^.<sent>$[]
 
 
 
 
***********
 
 
lt-proc output:
 
^Bees/Bee<n><pl>$ ^cannot/can<vaux><pres>+not<adv>$ ^swim/swim<vblex><inf>/swim<vblex><pres>$^./.<sent>$[]
 
 
 
 
postchunk output:
 
^El<det><def><f><pl>$ ^abeja<n><f><pl>$ ^no<adv>$ ^poder<vbmod><pri><p3><pl>$ ^nadar<vblex><inf>$^.<sent>$[]
 
 
 
 
***********
 
 
lt-proc output:
 
^Bees/Bee<n><pl>$ [[t:i:NaFC2iv]]^cannot/can<vaux><pres>+not<adv>$ ^swim/swim<vblex><inf>/swim<vblex><pres>$^./.<sent>$[]
 
 
 
 
postchunk output:
 
^El<det><def><f><pl>$ ^abeja<n><f><pl>$ [[t:i:NaFC2iv]]^no<adv>$ [[t:i:NaFC2iv]]^poder<vbmod><pri><p3><pl>$ ^nadar<vblex><inf>$^.<sent>$[]
 
 
 
 
***********
 
 
lt-proc output:
 
^Conway/*Conway$ ^stated/state<vblex><past>/state<vblex><pp>$ ^that/that<cnjsub>/that<det><dem><sg>/that<prn><tn><mf><sg>/that<rel><an><mf><sp>$ ^young/young<adj><sint>$ ^children/child<n><pl>$ "^understand/understand<vblex><inf>/understand<vblex><pres>$ ^object/object<n><sg>/object<vblex><inf>/object<vblex><pres>$ ^permanence/permanence<n><sg>$^./.<sent>$ ^Concealed/Conceal<vblex><past>/Conceal<vblex><pp>$ ^objects/object<n><pl>/object<vblex><pri><p3><sg>$ ^feature/feature<n><sg>/feature<vblex><inf>/feature<vblex><pres>$ ^in/in<pr>$ ^their/their<det><pos><sp>$ ^awareness/awareness<n><sg>$^./.<sent>$" ^(/(<lpar>$^Nielsen/*Nielsen$ ^equivalence/equivalence<n><sg>$^)/)<rpar>$^./.<sent>$[]
 
 
 
 
postchunk output:
 
^*Conway$ ^Declarar<vblex><ifi><p3><sg>$ ^que<cnjsub>$ ^el<det><def><m><pl>$ ^niño<n><m><pl>$ ^joven<adj><mf><pl>$ "^entender<vblex><pri><p3><pl>$ ^permanencia<n><f><sg>$ ^de<pr>$ ^objeto<n><m><sg>$^.<sent>$ ^Encubrir<vblex><pp><m><sg>$ ^objetar<vblex><pri><p3><sg>$ ^característica<n><f><sg>$ ^en<pr>$ ^suyo<det><pos><mf><sg>$ ^concienciación<n><f><sg>$^.<sent>$" ^(<lpar>$^*Nielsen$ ^Equivalencia<n><f><sg>$^)<rpar>$^.<sent>$[]
 
 
 
 
***********
 
 
lt-proc output:
 
[[t:a:NaFC2iv]]^Conway/*Conway$ ^stated/state<vblex><past>/state<vblex><pp>$ ^that/that<cnjsub>/that<det><dem><sg>/that<prn><tn><mf><sg>/that<rel><an><mf><sp>$ ^young/young<adj><sint>$ ^children/child<n><pl>$ "[[t:i:M0JZW3Q]]^understand/understand<vblex><inf>/understand<vblex><pres>$ [[t:i:M0JZW3Q1; t:a:qN1fD2pi1]]^object/object<n><sg>/object<vblex><inf>/object<vblex><pres>$ [[t:i:M0JZW3Q2; t:a:qN1fD2pi2]]^permanence/permanence<n><sg>$[[t:i:M0JZW3Q3]]^./.<sent>$ [[t:i:M0JZW3Q4; t:a:ZVcC0MJ]]^Concealed/Conceal<vblex><past>/Conceal<vblex><pp>$ [[t:i:M0JZW3Q5; t:a:xDp3Y3y]]^objects/object<n><pl>/object<vblex><pri><p3><sg>$ [[t:i:M0JZW3Q6]]^feature/feature<n><sg>/feature<vblex><inf>/feature<vblex><pres>$ [[t:i:M0JZW3Q7]]^in/in<pr>$ [[t:i:M0JZW3Q8]]^their/their<det><pos><sp>$ [[t:i:M0JZW3Q9]]^awareness/awareness<n><sg>$[[t:i:M0JZW3Q10]]^./.<sent>$" ^(/(<lpar>$^Nielsen/*Nielsen$ ^equivalence/equivalence<n><sg>$^)/)<rpar>$^./.<sent>$[]
 
 
 
 
postchunk output:
 
[[t:a:NaFC2iv]]^*Conway$ ^Declarar<vblex><ifi><p3><sg>$ ^que<cnjsub>$ ^el<det><def><m><pl>$ ^niño<n><m><pl>$ ^joven<adj><mf><pl>$ "[[t:i:M0JZW3Q]]^entender<vblex><pri><p3><pl>$ [[t:i:M0JZW3Q2; t:a:qN1fD2pi2]]^permanencia<n><f><sg>$ ^de<pr>$ [[t:i:M0JZW3Q1; t:a:qN1fD2pi1]]^objeto<n><m><sg>$[[t:i:M0JZW3Q3]]^.<sent>$ [[t:i:M0JZW3Q4; t:a:ZVcC0MJ]]^Encubrir<vblex><pp><m><sg>$ [[t:i:M0JZW3Q5; t:a:xDp3Y3y]]^objetar<vblex><pri><p3><sg>$ [[t:i:M0JZW3Q6]]^característica<n><f><sg>$ [[t:i:M0JZW3Q7]]^en<pr>$ [[t:i:M0JZW3Q8]]^suyo<det><pos><mf><sg>$ [[t:i:M0JZW3Q9]]^concienciación<n><f><sg>$[[t:i:M0JZW3Q10]]^.<sent>$" ^(<lpar>$^*Nielsen$ ^Equivalencia<n><f><sg>$^)<rpar>$^.<sent>$[]
 
 
 
 
***********
 
 
lt-proc output:
 
^My/My<det><pos><sp>$ ^sister/sister<n><sg>$ ^lives/life<n><pl>/live<vblex><pri><p3><sg>$ ^in/in<pr>$ ^Wales/Wales<np><loc><sg>$^./.<sent>$[]
 
 
 
 
postchunk output:
 
^Mío<det><pos><mf><pl>$ ^vida<n><f><pl>$ ^de<pr>$ ^hermano<n><f><sg>$ ^en<pr>$ ^Gales<np><loc><m><sg>$^.<sent>$[]
 
 
 
 
***********
 
 
lt-proc output:
 
[[t:b:qum2bhp1]]^My/My<det><pos><sp>$ [[t:b:qum2bhp2; t:i:KPL7B551]]^sister/sister<n><sg>$ [[t:b:qum2bhp3; t:i:KPL7B552]]^lives/life<n><pl>/live<vblex><pri><p3><sg>$ [[t:u:WyW2HW1]]^in/in<pr>$ [[t:u:WyW2HW2]]^Wales/Wales<np><loc><sg>$^./.<sent>$[]
 
 
 
 
postchunk output:
 
[[t:b:qum2bhp1]]^Mío<det><pos><mf><pl>$ [[t:b:qum2bhp3; t:i:KPL7B552]]^vida<n><f><pl>$ ^de<pr>$ [[t:b:qum2bhp2; t:i:KPL7B551]]^hermano<n><f><sg>$ [[t:u:WyW2HW1]]^en<pr>$ [[t:u:WyW2HW2]]^Gales<np><loc><m><sg>$^.<sent>$[]
 
 
 
 
***********
 
 
lt-proc output:
 
^The/The<det><def><sp>$ ^sister/sister<n><sg>$ ^'s/'s<gen>/be<vbser><pri><p3><sg>$ ^dog/dog<n><sg>$^./.<sent>$[]
 
 
 
 
postchunk output:
 
^El<det><def><m><sg>$ ^perro<n><m><sg>$ ^de<pr>$ ^el<det><def><f><sg>$ ^hermano<n><f><sg>$^.<sent>$[]
 
 
 
 
***********
 
 
lt-proc output:
 
[[t:b:8gaY]]^The/The<det><def><sp>$ [[t:i:QypP0e]]^sister/sister<n><sg>$ ^'s/'s<gen>/be<vbser><pri><p3><sg>$ ^dog/dog<n><sg>$^./.<sent>$[]
 
 
 
 
postchunk output:
 
^El<det><def><m><sg>$ ^perro<n><m><sg>$ ^de<pr>$ [[t:b:8gaY]]^el<det><def><f><sg>$ [[t:i:QypP0e]]^hermano<n><f><sg>$^.<sent>$[]
 
 
 
 
***********
 
 
lt-proc output:
 
^A/A<det><ind><sg>$ ^Japanese/japanese<adj>/Japanese<n><sg>/Japanese<n><pl>$ ^BBC/BBC<n><acr><sg>$ ^article/article<n><sg>$^./.<sent>$[]
 
 
 
 
postchunk output:
 
^Uno<det><ind><f><sg>$ ^prenda<n><f><sg>$ ^de<pr>$ ^BBC<n><acr><f><sg>$ ^japonés<adj><f><sg>$^.<sent>$[]
 
 
 
 
***********
 
 
lt-proc output:
 
^A/A<det><ind><sg>$ [[t:b:qum2bhp]]^Japanese/japanese<adj>/Japanese<n><sg>/Japanese<n><pl>$ [[t:i:M0JZW3Q]]^BBC/BBC<n><acr><sg>$ ^article/article<n><sg>$^./.<sent>$[]
 
 
 
 
postchunk output:
 
^Uno<det><ind><f><sg>$ ^prenda<n><f><sg>$ ^de<pr>$ [[t:i:M0JZW3Q]]^BBC<n><acr><f><sg>$ [[t:b:qum2bhp]]^japonés<adj><f><sg>$^.<sent>$[]
 
 
 
 
***********
 
 
lt-proc output:
 
^A/A<det><ind><sg>$ ^modern/modern<adj>$ ^Britain/Britain<np><loc><sg>$^./.<sent>$[]
 
 
 
 
postchunk output:
 
^Uno<det><ind><f><sg>$ ^Gran Bretaña<np><loc><f><sg>$ ^moderno<adj><f><sg>$^.<sent>$[]
 
 
 
 
***********
 
 
lt-proc output:
 
^A/A<det><ind><sg>$ [[t:b:qum2bhp]]^modern/modern<adj>$ ^Britain/Britain<np><loc><sg>$^./.<sent>$[]
 
 
 
 
postchunk output:
 
^Uno<det><ind><f><sg>$ ^Gran Bretaña<np><loc><f><sg>$ [[t:b:qum2bhp]]^moderno<adj><f><sg>$^.<sent>$[]
 
 
 
 
***********
 
 
lt-proc output:
 
^The/The<det><def><sp>$ ^big/big<adj><sint>$ ^red/red<adj>/red<n><sg>$ ^dog/dog<n><sg>$^./.<sent>$[]
 
 
 
 
postchunk output:
 
^El<det><def><m><sg>$ ^perro<n><m><sg>$ ^rojo<adj><m><sg>$ ^grande<adj><mf><sg>$^.<sent>$[]
 
 
 
 
***********
 
 
lt-proc output:
 
^The/The<det><def><sp>$ [[t:b:qum2bhp1]]^big/big<adj><sint>$ [[t:b:qum2bhp2; t:i:M0JZW3Q]]^red/red<adj>/red<n><sg>$ ^dog/dog<n><sg>$^./.<sent>$[]
 
 
 
 
postchunk output:
 
^El<det><def><m><sg>$ ^perro<n><m><sg>$ [[t:b:qum2bhp2; t:i:M0JZW3Q]]^rojo<adj><m><sg>$ [[t:b:qum2bhp1]]^grande<adj><mf><sg>$^.<sent>$[]
 
 
 
 
***********
 
 
lt-proc output:
 
^He/Prpers<prn><subj><p3><m><sg>$ ^said/say<vblex><past>/say<vblex><pp>$ "^I/I<num><mf><sg>/prpers<prn><subj><p1><mf><sg>$ ^tile/tile<n><sg>/tile<vblex><inf>/tile<vblex><pres>$ ^bathrooms/bathroom<n><pl>$^./.<sent>$"[]
 
 
 
 
postchunk output:
 
^Decir<vblex><prs><p3><sg>$ "^I<num><mf><pl>$ ^baño<n><m><pl>$ ^de<pr>$ ^azulejo<n><m><sg>$^.<sent>$"[]
 
 
 
 
***********
 
 
lt-proc output:
 
^He/Prpers<prn><subj><p3><m><sg>$ ^said/say<vblex><past>/say<vblex><pp>$ "[[t:i:M0JZW3Q1]]^I/I<num><mf><sg>/prpers<prn><subj><p1><mf><sg>$ [[t:i:M0JZW3Q2]]^tile/tile<n><sg>/tile<vblex><inf>/tile<vblex><pres>$ [[t:i:M0JZW3Q3; t:a:NaFC2iv]]^bathrooms/bathroom<n><pl>$[[t:i:M0JZW3Q4]]^./.<sent>$"[]
 
 
 
 
postchunk output:
 
^Decir<vblex><prs><p3><sg>$ "[[t:i:M0JZW3Q1]]^I<num><mf><pl>$ [[t:i:M0JZW3Q3; t:a:NaFC2iv]]^baño<n><m><pl>$ ^de<pr>$ [[t:i:M0JZW3Q2]]^azulejo<n><m><sg>$[[t:i:M0JZW3Q4]]^.<sent>$"[]
 
 
 
 
***********
 
 
lt-proc output:
 
^The New York Times/The New York Times<np><al><sg>$^,/,<cm>$ ^which/which<det><itg><sp>/which<prn><itg><m><sp>/which<rel><an><mf><sp>$ ^has/have<vbhaver><pri><p3><sg>/have<vblex><pri><p3><sg>$ ^an/a<det><ind><sg>$ ^executive/executive<adj>/executive<n><sg>$ ^editor/editor<n><sg>$ ^over/over<adv>/over<pr>$ ^the/the<det><def><sp>$ ^news/news<adj>/news<n><sg>/news<n><pl>$ ^pages/page<n><pl>$ ^and/and<cnjcoo>$ ^an/a<det><ind><sg>$ ^editorial/editorial<n><sg>$ ^page/page<n><sg>$ ^editor/editor<n><sg>$ ^over/over<adv>/over<pr>$ ^opinion/opinion<n><sg>$ ^pages/page<n><pl>$^./.<sent>$^./.<sent>$[]
 
 
 
 
postchunk output:
 
^The New York Times<np><al><m><sg>$^,<cm>$ ^el cual<rel><nn><m><sg>$ ^tener<vblex><pri><p3><sg>$ ^uno<det><ind><m><sg>$ ^editor<n><m><sg>$ ^ejecutivo<adj><m><sg>$ ^sobre<pr>$ ^el<det><def><f><pl>$ ^página<n><f><pl>$ ^noticioso<adj><f><pl>$ ^y<cnjcoo>$ ^uno<det><ind><m><sg>$ ^editor<n><m><sg>$ ^de<pr>$ ^página<n><f><sg>$ ^de<pr>$ ^el<det><def><m><sg>$ ^editorial<n><m><sg>$ ^encima<adv>$ ^página<n><f><pl>$ ^de<pr>$ ^opinión<n><f><sg>$^.<sent>$^.<sent>$[]
 
 
 
 
***********
 
 
lt-proc output:
 
[[t:a:ETwYHMW]]^The New York Times/The New York Times<np><al><sg>$^,/,<cm>$ ^which/which<det><itg><sp>/which<prn><itg><m><sp>/which<rel><an><mf><sp>$ ^has/have<vbhaver><pri><p3><sg>/have<vblex><pri><p3><sg>$ ^an/a<det><ind><sg>$ [[t:b:QjxgZ1]]^executive/executive<adj>/executive<n><sg>$ [[t:b:QjxgZ2]]^editor/editor<n><sg>$ ^over/over<adv>/over<pr>$ ^the/the<det><def><sp>$ ^news/news<adj>/news<n><sg>/news<n><pl>$ ^pages/page<n><pl>$ ^and/and<cnjcoo>$ ^an/a<det><ind><sg>$ [[t:b:QjxgZ3]]^editorial/editorial<n><sg>$ [[t:b:QjxgZ4]]^page/page<n><sg>$ [[t:b:QjxgZ5]]^editor/editor<n><sg>$ ^over/over<adv>/over<pr>$ ^opinion/opinion<n><sg>$ ^pages/page<n><pl>$^./.<sent>$^./.<sent>$[]
 
 
 
 
postchunk output:
 
[[t:a:ETwYHMW]]^The New York Times<np><al><m><sg>$^,<cm>$ ^el cual<rel><nn><m><sg>$ ^tener<vblex><pri><p3><sg>$ ^uno<det><ind><m><sg>$ [[t:b:QjxgZ2]]^editor<n><m><sg>$ [[t:b:QjxgZ1]]^ejecutivo<adj><m><sg>$ ^sobre<pr>$ ^el<det><def><f><pl>$ ^página<n><f><pl>$ ^noticioso<adj><f><pl>$ ^y<cnjcoo>$ ^uno<det><ind><m><sg>$ [[t:b:QjxgZ5]]^editor<n><m><sg>$ ^de<pr>$ [[t:b:QjxgZ4]]^página<n><f><sg>$ ^de<pr>$ ^el<det><def><m><sg>$ [[t:b:QjxgZ3]]^editorial<n><m><sg>$ ^encima<adv>$ ^página<n><f><pl>$ ^de<pr>$ ^opinión<n><f><sg>$^.<sent>$^.<sent>$[]
 
 
</pre>
 
</pre>
   
=== Examples that should work ===
+
== English - Spanish ==
 
 
<pre>
 
<pre>
$ echo 'legal <b>persons</b>' | apertium en-es -f html
+
Source: legal <b>persons</b>
Personas <b>legales</b>
+
Current Translation: Personas jurídicas <b></b>
  +
Ideal Translation: <b>Personas</b> legales
  +
After wordbound blanks: <b>Personas</b>
   
  +
legales
Ideal:
 
<b>Personas</b> legales
 
   
$ echo 'I <b>am</b> David' | apertium en-es -f html
+
Source: I <b>am</b> David
Soy</b> David
+
Current Translation: <b>soy David</b>
  +
Ideal Translation: <b>Soy</b> David
  +
After wordbound blanks: <b>soy</b> David
   
  +
Source: <p>Bees <b>cannot</b> swim</p>
Ideal:
 
  +
Current Translation: <p>Las abejas <b>no pueden</b> nadar</p>
<b>Soy</b> David
 
  +
Ideal Translation: <p>Las Abejas <b>no pueden</b> nadar</p>
</pre>
 
  +
After wordbound blanks: <p>las abejas <b>no pueden</b> nadar</p>
   
  +
Source: <a href="Conway">Conway</a> stated that young <a href="children">children</a><i>“understand <a href="Object_permanence">object permanence</a>. <a href="Concealment">Concealed</a> <a href="Object">objects</a> feature in their awareness.”</i><span typeof="mw:Extension/ref"><a href="#ref-5">[5]</a></span><b>(<a href="Nielsen">Nielsen</a> equivalence).</b>
<pre>
 
  +
Current Translation: <a href="Conway">*Conway</a> Declaró que los niños <a href="children">jóvenes</a><i>“entienden <a href="Object_permanence">permanencia de objeto</a>. <a href="Concealment">Encubierto</a> <a href="Object">objeta</a> característica en su concienciación.”</i><span typeof="mw:Extension/ref"><a href="#ref-5">[5]</a></span><b>(<a href="Nielsen">*Nielsen</a> equivalencia).</b>
Spanish: <p>Es <s>además</s> de Valencia.</p>
 
  +
Ideal Translation:
Catalan: <p>És <s>a més</s> de València.</p>
 
  +
After wordbound blanks: <a href="Conway">*Conway</a> declaró que los <a href="children">niños</a> jóvenes“<i>entienden <a href="Object_permanence">permanencia</a></i> de <i> <a href="Object_permanence">objeto</a></i><i>. <a href="Concealment">Encubierto</a> </i> <i><a href="Object">objeta</a> </i> <i>característica en</i> <i>su concienciación</i><i>.</i>”<span typeof="mw:Extension/ref"><a href="#ref-5">\[</a></span><span typeof="mw:Extension/ref"><a href="#ref-5">5</a></span><span typeof="mw:Extension/ref"><a href="#ref-5">\]</a></span><b>(<a href="Nielsen">*Nielsen</a> </b>
</pre>
 
   
  +
<b>equivalencia)</b>
<pre>
 
<p>Bees <b>cannot</b> swim</p>
 
<p>Las Abejas <b>no pueden</b> nadar</p>
 
</pre>
 
   
<pre>
+
<b>.</b>
<a href="Conway">Conway</a> stated that young <a href="children">children</a>
 
<i>“understand <a href="Object_permanence">object permanence</a>.
 
<a href="Concealment">Concealed</a> <a href="Object">objects</a> feature in
 
their awareness.”</i><span typeof="mw:Extension/ref"><a href="#ref-5">[5]</a></span>
 
<b>(<a href="Nielsen">Nielsen</a> equivalence).</b>
 
</pre>
 
   
  +
Source: <p><b><i>my sister</i><br/>lives</b> <u>in Wales</u></p>
<pre>
 
<p><b><i>my sister</i><br/>lives</b> <u>in Wales</u></p>
+
Current Translation: <p><b><i>Mis vidas</i><br/>de hermana</b> <u>en Gales</u></p>
  +
Ideal Translation:
</pre>
 
  +
After wordbound blanks:
   
  +
Source: <b>The</b> <i>sister</i>'s <em>dog</em>
<pre>
 
  +
Current Translation: <b>El perro</i> de la <em></b> <i>hermana</em>
<a id="foobar" href="http://example.com">Foo <b>bar</b>.</a>
 
  +
Ideal Translation:
  +
After wordbound blanks: <p><b><i>Mis</i><br></b> <b>vidas</b> de <b><i>hermana</i><br></b> <u>en Gales</u></p>
   
  +
<em>el perro</em>
Ideal Output:
 
  +
</pre>
<a id="foobar" href="http://example.com"><b>Бар</b> фоо.</a>
 
</pre>
 
   
  +
From [[https://phabricator.wikimedia.org/diffusion/GCXS/browse/master/test/mt/Apertium.test.js|Wikimedia tests]]:
 
<pre>
 
<pre>
<b>The</b> <i>sister</i>'s <em>dog</em>
+
Source: <p>A <b>Japanese</b> <i>BBC</i> article</p>
  +
Current Translation: <p>Una <b>prenda</b> <i>de BBC</i> japonesa</p>
</pre>
 
  +
Ideal Translation:
  +
After wordbound blanks: <b>de L</b> <i>a hermana</i><p>Una prenda de <i>BBC</i> <b>japonesa</b> </p>
   
  +
Source: <div>A <b>modern</b> Britain.</div>
From [[https://phabricator.wikimedia.org/diffusion/GCXS/browse/master/test/mt/Apertium.test.js|wikimedia_tests]]
 
  +
Current Translation: <div>Una <b>Gran Bretaña</b> moderna.</div>
<pre>
 
  +
Ideal Translation: <div>Una Gran Bretaña <b>moderna</b>.</div>
source: '<p>A <b>Japanese</b> <i>BBC</i> article</p>',
 
target: '<p>Un artículo de <i>BBC</i> <b>japonés</b></p>',
+
After wordbound blanks: <div>Una Gran Bretaña <b>moderna</b> .</div>
   
source: '<div>A <b>modern</b> Britain.</div>',
+
Source: <p>The <b>big <i>red</i></b> dog</p>
  +
Current Translation: <p>El <b>perro <i>rojo</i></b> grande</p>
target: '<div>Una Gran Bretaña <b>moderna</b>.</div>',
 
  +
Ideal Translation: <p>El perro <b><i>rojo</i></b> <b>grande</b></p>
  +
After wordbound blanks: <p>El perro <b> <i>rojo</i></b> <b>grande</b> </p>
   
source: '<p>The <b>big <i>red</i></b> dog</p>',
+
Source: <p>He said "<i>I tile <a href="x">bathrooms</a>.</i>"</p>
target: '<p>El perro <b><i>rojo</i></b> <b>grande</b></p>',
+
Current Translation: <p> Diga "<i>#I baños <a href="x">de azulejo</a>.</i>"</p>
  +
Ideal Translation: <p>Diga que "<i>enladrillo</i> <i><a href="x">baños</a></i>."</p>
  +
After wordbound blanks: <p>diga "<i>#I <a href="x">baños</a></i> de <i>azulejo.</i>"</p>
   
source: '<p>He said "<i>I tile <a href="x">bathrooms</a>.</i>"</p>',
+
Source: <p>The <b>big red</b> dog</p>
  +
Current Translation: <p>El <b>perro rojo</b> grande</p>
target: '<p>Diga que "<i>enladrillo</i> <i><a href="x">baños</a></i>."</p>',
 
  +
Ideal Translation: <p>El perro <b>rojo grande</b></p>
  +
After wordbound blanks: <p>El perro <b>rojo grande</b> </p>
   
source: '<p>The <b>big red</b> dog</p>',
+
Source: <p>The <b>big</b> <b>red</b> dog</p>
target: '<p>El perro <b>rojo grande</b></p>',
+
Current Translation: <p>El <b>perro</b> <b>rojo</b> grande</p>
  +
Ideal Translation: <p>El perro <b>rojo</b> <b>grande</b></p>
  +
After wordbound blanks: <p>El perro <b>rojo</b> <b>grande</b> </p>
   
source: '<p>The <b>big</b> <b>red</b> dog</p>',
+
Source: <p>The <a href="1">big</a> <a href="2">red</a> dog</p>
target: '<p>El perro <b>rojo</b> <b>grande</b></p>',
+
Current Translation: <p>El <a href="1">perro</a> <a href="2">rojo</a> grande</p>
  +
Ideal Translation: <p>El perro <a href="2">rojo</a> <a href="1">grande</a></p>
  +
After wordbound blanks: <p>El perro <a href="2">rojo</a> <a href="1">grande</a> </p>
   
  +
Source: <p id="8"><span class="cx-segment" data-segmentid="9"><a class="cx-link" data-linkid="17" href="./The_New_York_Times" rel="mw:WikiLink" title="The New York Times">The New York Times</a>, which has an <b>executive editor</b> over the news pages and an <b>editorial page editor</b> over opinion pages.</span></p>
source: '<p>The <a href="1">big</a> <a href="2">red</a> dog</p>',
 
  +
Current Translation: <p id="8"><span class="cx-segment" data-segmentid="9"><a class="cx-link" data-linkid="17" href="./The_New_York_Times" rel="mw:WikiLink" title="The New York Times">The New York Times</a>, el cual tiene un <b>editor ejecutivo</b> sobre las páginas noticiosas y un <b>editor de página del editorial</b> encima páginas de opinión.</span></p>
target: '<p>El perro <a href="2">rojo</a> <a href="1">grande</a></p>',
 
  +
Ideal Translation: <p id="8"><span data-segmentid="9" class="cx-segment"><a title="The New York Times" rel="mw:WikiLink" href="./The_New_York_Times" data-linkid="17" class="cx-link">The New York Times</a>, el cual tiene un <b>editor ejecutivo</b> sobre las páginas noticiosas y un <b>editor de página del editorial</b> encima páginas de opinión.</span></p>
 
source: '<p id="8"><span class="cx-segment" data-segmentid="9"><a class="cx-link" data-linkid="17" href="./The_New_York_Times" rel="mw:WikiLink" title="The New York Times">The New York Times</a>, which has an <b>executive editor</b> over the news pages and an <b>editorial page editor</b> over opinion pages.</span></p>',
+
After wordbound blanks: <p id="8"><span class="cx-segment" data-segmentid="9"><a class="cx-link" data-linkid="17" href="./The_New_York_Times" rel="mw:WikiLink" title="The New York Times">The New York Times</a>, el cual tiene un <b>editor ejecutivo</b> sobre las páginas noticiosas y un <b>editor</b> de <b>página</b> del <b>editorial</b> encima páginas de opinión.</span></p>
 
target: '<p id="8"><span data-segmentid="9" class="cx-segment"><a title="The New York Times" rel="mw:WikiLink" href="./The_New_York_Times" data-linkid="17" class="cx-link">The New York Times</a>, el cual tiene un <b>editor ejecutivo</b> sobre las páginas noticiosas y un <b>editor de página del editorial</b> encima páginas de opinión.</span></p>',
 
 
# Tino says: There's no text. This would never even reach the pipe.
 
source: '<p id="8"><style>b{color:red;}</style></p>',
 
target: '<p id="8"><style>b{color:red;}</style></p>',
 
 
</pre>
 
</pre>
   
  +
= Previous Attempts =
Pretransfer Tests:
 
<pre>
 
input: [[<i>]]^a<vblex><pres>+c<po># b$ ^a<vblex><pres>+c<po># b$
 
output:[[<i>]]^a# b<vblex><pres>$ [[<i>]]^c<po>$ ^a# b<vblex><pres>$ ^c<po>$
 
</pre>
 
 
== Tests ==
 
 
<pre>
 
Input:
 
The [[t:b:qum2bhp]]big [[t:b:qum2bhp; t:i:M0JZW3Q]]red dog[]
 
 
Transfer Input:
 
^The<det><def><sp>/El<det><def><GD><ND>$ [[t:b:qum2bhp]]^big<adj><sint>/grande<adj><mf>$ [[t:b:qum2bhp; t:i:M0JZW3Q]]^red<adj>/rojo<adj>$ ^dog<n><sg>/perro<n><GD><sg>$
 
 
Transfer Output:
 
^Det_nom_adj_adj<SN><DET><GD><sg>{^el<det><def><3><4>$ [[t:b:qum2bhp]]^perro<n><3><4>$ [[t:b:qum2bhp; t:i:M0JZW3Q]]^rojo<adj><3><4>$ ^grande<adj><mf><4>$}$
 
</pre>
 
 
* https://github.com/unhammer/apertium/blob/blank-handling/tests/pretransfer/__init__.py
 
 
== Previous Attempts ==
 
   
 
* https://wiki.apertium.org/wiki/User:SilentFlame/Progress
 
* https://wiki.apertium.org/wiki/User:SilentFlame/Progress
Line 467: Line 170:
 
* https://github.com/junaidiiith/apertium
 
* https://github.com/junaidiiith/apertium
 
* https://github.com/junaidiiith/Apertium_Code
 
* https://github.com/junaidiiith/Apertium_Code
* Make transfer output the non-inline blanks before the rule output AND Make transfer handle inline-blanks, and ignore <b pos="N">:: work in progress for this and the above: https://github.com/unhammer/apertium/commit/b5c73fbe82544d83a98eb16b921c2fa224f6d40c
+
* https://github.com/unhammer/apertium/commit/b5c73fbe82544d83a98eb16b921c2fa224f6d40c
   
 
References
 
References

Revision as of 08:35, 26 July 2020

This page follows the development of wordbound blanks in the Apertium stream format.

Features

Transfer (Pull Request 1, Pull Request 2)

Chunker/Single-stage transfer

  • Wordbound blanks are a part of transfer word as a new side: blank.
  • Are ignored in pattern matching
  • Wordbound blanks are added just before the output LU from the LU that the lem/lemh is clipped from.
  • If the lem/lemh comes from a variable in the output, then the blank comes from the LU that the lemma comes from, found by tracing the variable's assignment in <let>.
  • No regression. Streams without wordbound blanks work as-is.
  • Normal blanks don't move around while wordbound blanks move around.
  • When MLUs are formed the blanks are merged.
  • Tests added
  • If rule pattern has only one LU, the wordbound blank gets output with all output LUs of the rule
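The behaviour in the list above can be sketched in a few lines of Python. This is an illustration only, not the code in the pull requests: the `Word` class, `apply_rule`, and the `(source, body)` output description are hypothetical names, and the sketch assumes each output LU names the matched position its lemma was clipped from (`None` for a word the rule inserts).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Word:
    body: str        # lemma plus tags, e.g. "big<adj><sint>"
    blank: str = ""  # wordbound blank content, without the [[ ]] delimiters


def apply_rule(matched: List[Word],
               output: List[Tuple[Optional[int], str]]) -> List[Word]:
    """Build the rule output; each output LU takes the blank of its source LU.

    `output` is a list of (source position in `matched`, target body).
    A source of None marks an LU that the rule inserts on its own.
    """
    result = []
    for source, body in output:
        if source is not None:
            blank = matched[source].blank   # blank follows the clipped lemma
        elif len(matched) == 1:
            blank = matched[0].blank        # single-LU pattern: blank goes everywhere
        else:
            blank = ""                      # inserted LU, no blank to inherit
        result.append(Word(body, blank))
    return result


def render(words: List[Word]) -> str:
    """Serialise words in stream format, with blanks right before their LU."""
    return " ".join(
        (f"[[{w.blank}]]" if w.blank else "") + f"^{w.body}$" for w in words
    )


matched = [
    Word("The<det><def><sp>"),
    Word("big<adj><sint>", "t:b:qum2bhp1"),
    Word("red<adj>", "t:b:qum2bhp2; t:i:M0JZW3Q"),
    Word("dog<n><sg>"),
]
reordered = apply_rule(matched, [
    (0, "el<det><def><m><sg>"),
    (3, "perro<n><m><sg>"),
    (2, "rojo<adj><m><sg>"),
    (1, "grande<adj><mf><sg>"),
])
print(render(reordered))
```

The blanks travel with `rojo` and `grande` even though the adjectives swap places, while ordinary blanks (here, the spaces) stay where they are.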

Interchunk

  • No change needed, as interchunk doesn't access LUs inside the chunk.

Postchunk

  • Wordbound blanks are ignored in pattern matching
  • Wordbound blanks are added just before the output LU from the LU that the lem/lemh/whole is clipped from.
  • If the lem/lemh comes from a variable in the output, then the blank comes from the LU that the lemma comes from, found by tracing the variable's assignment in <let>.
  • No regression. Streams without wordbound blanks work as-is.
  • Normal blanks don't move around while wordbound blanks move around.
  • When MLUs are formed the blanks are merged.
  • Tests added
  • If rule pattern chunk has only one LU, the wordbound blank gets output with all output LUs of the rule
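Merging blanks when an MLU is formed can be illustrated as follows. This is a rough sketch, not the postchunk implementation: `merge_blanks` and `join_lus` are hypothetical helpers, and the `; ` separator is an assumption inferred from the example streams recorded for this page (for instance `[[t:x:1234ab; t:s:p2rthg]]`).

```python
from typing import Iterable, List, Tuple


def merge_blanks(blanks: Iterable[str]) -> str:
    """Join the contents of several wordbound blanks, skipping empty ones."""
    return "; ".join(b for b in blanks if b)


def join_lus(lus: List[Tuple[str, str]]) -> str:
    """Join (blank, body) pairs into one multiword LU carrying the merged blank."""
    blank = merge_blanks(b for b, _ in lus)
    body = "+".join(body for _, body in lus)
    prefix = f"[[{blank}]]" if blank else ""
    return f"{prefix}^{body}$"


print(join_lus([
    ("t:x:1234ab", "xyz<cnjadv>"),
    ("t:s:p2rthg", "abc<vbhaver><ger>"),
]))
```

An LU without a blank contributes nothing to the merged blank, so joining two blank-free LUs yields a plain `^a+b$`.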

Pretransfer (Pull Request)

  • Wordbound blanks distribute across parts when compounds are split into individual LUs
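A minimal sketch of this distribution, assuming a simplified stream with no escaped characters and no `+` inside lemmas. `split_compound` is a hypothetical helper written for illustration, not the pretransfer code. It splits a compound LU on `+`, moves the invariable queue (the `# b` part) onto the first lemma, and copies the wordbound blank to every part.

```python
import re

# Optional [[blank]] followed by one ^...$ lexical unit.
LU_RE = re.compile(r"(?:\[\[(?P<blank>.*?)\]\])?\^(?P<body>.*?)\$")


def split_compound(lu: str) -> str:
    """Split a compound LU into its parts, repeating the blank on each part."""
    m = LU_RE.fullmatch(lu)
    if m is None:
        raise ValueError(f"not a single lexical unit: {lu!r}")
    blank = m.group("blank") or ""
    body, hash_sign, rest = m.group("body").partition("#")
    queue = hash_sign + rest            # "# b", or "" when there is no queue
    parts = body.split("+")
    if queue:
        # Attach the queue to the first lemma, in front of its tags.
        lemma, lt, tags = parts[0].partition("<")
        parts[0] = lemma + queue + lt + tags
    prefix = f"[[{blank}]]" if blank else ""
    return " ".join(f"{prefix}^{part}$" for part in parts)


print(split_compound("[[<i>]]^a<vblex><pres>+c<po># b$"))
```

The call above prints `[[<i>]]^a# b<vblex><pres>$ [[<i>]]^c<po>$`; the same LU without a blank splits into `^a# b<vblex><pres>$ ^c<po>$`.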

Separable (Pull Request)

  • Merge wordbound blanks and add to all LUs in rule output.
  • Works for both autoseq and revautoseq.

Analysis, Biltrans, Generation (Pull Request)

  • Parsing wordbound blanks as normal blanks for analysis, generation, biltrans.
  • Added a test for wordbound blank analysis.

Streamparser (Pull Request)

  • Wordbound blanks parsed as part of a lexical unit in the stream parser.
  • Can be accessed by class member: LexicalUnit.wordbound_blank.

Postgeneration (Pull Request)

  • Wordbound blanks merge when words merge.
  • Wordbound blanks apply to all output words when a postgen rule outputs more words than it takes as input.
  • No regression for postgeneration without wordbound blanks.
  • Lots of tests added.

Rationale

Wordbound blanks store information about a lexical unit. They help in applications where information has to travel through the pipeline but cannot be sent as tags, because tags would break the FST matching in the modules.

The information is stored with a lexical unit because lexical units are split, merged, deleted, and added throughout the pipeline, and the information should follow them: distributing over multiple output words, merging when words merge, and so on.

Formalism

Wordbound blanks will be denoted by double square brackets and will always appear right before a Lexical Unit.

[[wordboundblank]]^LU<tags>$

Where the stream has no Lexical Units (before the morphological analyser and after the generator), the wordbound blank is also closed by an end marker, [[/]]:

[[wordboundblank]]word[[/]] word2 word3 [[wordboundblank]]word4[[/]]
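Both notations can be read with two regular expressions. The following is a minimal sketch under simplifying assumptions: blanks contain no `]`, and escaped characters in the stream are not handled. The names `LU_FORM`, `TEXT_FORM`, `blanks_in_lu_stream`, and `blanks_in_text` are made up for this example and are not part of any Apertium tool.

```python
import re
from typing import List, Tuple

# [[blank]]^LU$ : an optional blank directly before a lexical unit.
LU_FORM = re.compile(r"(?:\[\[(?P<blank>[^\]]*)\]\])?\^(?P<lu>[^$]*)\$")

# [[blank]]text[[/]] : blank with an end marker, used where there are no LUs.
TEXT_FORM = re.compile(
    r"\[\[(?P<blank>[^\]/][^\]]*)\]\](?P<text>.*?)\[\[/\]\]"
)


def blanks_in_lu_stream(stream: str) -> List[Tuple[str, str]]:
    """Return (blank, LU) pairs; the blank is "" when the LU has none."""
    return [(m.group("blank") or "", m.group("lu"))
            for m in LU_FORM.finditer(stream)]


def blanks_in_text(stream: str) -> List[Tuple[str, str]]:
    """Return (blank, covered text) pairs for the [[...]]text[[/]] notation."""
    return [(m.group("blank"), m.group("text"))
            for m in TEXT_FORM.finditer(stream)]


print(blanks_in_lu_stream(
    "[[t:b:e4XkhY]]^persona<n><f><pl>$ ^legal<adj><mf><pl>$"))
print(blanks_in_text(
    "[[wordboundblank]]word[[/]] word2 word3 [[wordboundblank]]word4[[/]]"))
```

Words outside a `[[...]]`/`[[/]]` pair, such as `word2` and `word3`, are simply not reported by `blanks_in_text`.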


Full Pipe Testing

Current Translation Command: apertium-deshtml < html_input-eng.in | apertium -f none -d $PREFIX/apertium-eng-spa eng-spa | apertium-retxt

Wordbound blank with Transfuse Command: tf-html-fragment $PREFIX/apertium-eng-spa/modes/eng-spa.mode < html_input-eng.in

Spanish - Catalan

Source: <p>Es <s>además</s> de Valencia.</p>
Current Translation: <p>És <s>a més de</s>  València.</p>
Ideal Translation: <p>Es <s>además</s> de Valencia.</p>
After wordbound blanks: <p>És <s>*ademà s</s> de València.</p>

English - Spanish

Source: legal <b>persons</b>
Current Translation: Personas jurídicas <b></b>
Ideal Translation: <b>Personas</b> legales
After wordbound blanks:  <b>Personas</b>

legales

Source: I <b>am</b> David
Current Translation:  <b>soy David</b> 
Ideal Translation: <b>Soy</b> David
After wordbound blanks: <b>soy</b> David

Source: <p>Bees <b>cannot</b> swim</p>
Current Translation: <p>Las abejas <b>no pueden</b> nadar</p>
Ideal Translation: <p>Las Abejas <b>no pueden</b> nadar</p>
After wordbound blanks: <p>las abejas <b>no pueden</b> nadar</p>

Source: <a href="Conway">Conway</a> stated that young <a href="children">children</a><i>“understand <a href="Object_permanence">object permanence</a>. <a href="Concealment">Concealed</a> <a href="Object">objects</a> feature in their awareness.”</i><span typeof="mw:Extension/ref"><a href="#ref-5">[5]</a></span><b>(<a href="Nielsen">Nielsen</a> equivalence).</b>
Current Translation: <a href="Conway">*Conway</a> Declaró que los niños <a href="children">jóvenes</a><i>“entienden <a href="Object_permanence">permanencia de objeto</a>. <a href="Concealment">Encubierto</a> <a href="Object">objeta</a> característica en su concienciación.”</i><span typeof="mw:Extension/ref"><a href="#ref-5">[5]</a></span><b>(<a href="Nielsen">*Nielsen</a> equivalencia).</b>
Ideal Translation:
After wordbound blanks: <a href="Conway">*Conway</a> declaró que los <a href="children">niños</a> jóvenes“<i>entienden <a href="Object_permanence">permanencia</a></i> de <i> <a href="Object_permanence">objeto</a></i><i>. <a href="Concealment">Encubierto</a> </i> <i><a href="Object">objeta</a> </i> <i>característica en</i> <i>su concienciación</i><i>.</i>”<span typeof="mw:Extension/ref"><a href="#ref-5">\[</a></span><span typeof="mw:Extension/ref"><a href="#ref-5">5</a></span><span typeof="mw:Extension/ref"><a href="#ref-5">\]</a></span><b>(<a href="Nielsen">*Nielsen</a> </b>

<b>equivalencia)</b>

<b>.</b>

Source: <p><b><i>my sister</i><br/>lives</b> <u>in Wales</u></p>
Current Translation: <p><b><i>Mis vidas</i><br/>de hermana</b> <u>en Gales</u></p>
Ideal Translation:
After wordbound blanks: <p><b><i>Mis</i><br></b> <b>vidas</b> de <b><i>hermana</i><br></b> <u>en Gales</u></p>

Source: <b>The</b> <i>sister</i>'s <em>dog</em>
Current Translation: <b>El perro</i> de la <em></b> <i>hermana</em>
Ideal Translation:
After wordbound blanks: <em>el perro</em> <b>de L</b> <i>a hermana</i>

From [tests]:

Source: <p>A <b>Japanese</b> <i>BBC</i> article</p>
Current Translation: <p>Una <b>prenda</b> <i>de BBC</i> japonesa</p>
Ideal Translation:
After wordbound blanks: <p>Una prenda de <i>BBC</i> <b>japonesa</b> </p>

Source: <div>A <b>modern</b> Britain.</div>
Current Translation: <div>Una <b>Gran Bretaña</b> moderna.</div>
Ideal Translation: <div>Una Gran Bretaña <b>moderna</b>.</div>
After wordbound blanks: <div>Una Gran Bretaña <b>moderna</b> .</div>

Source: <p>The <b>big <i>red</i></b> dog</p>
Current Translation: <p>El <b>perro <i>rojo</i></b> grande</p>
Ideal Translation: <p>El perro <b><i>rojo</i></b> <b>grande</b></p>
After wordbound blanks: <p>El perro <b> <i>rojo</i></b> <b>grande</b> </p>

Source: <p>He said "<i>I tile <a href="x">bathrooms</a>.</i>"</p>
Current Translation: <p> Diga "<i>#I baños <a href="x">de azulejo</a>.</i>"</p>
Ideal Translation: <p>Diga que "<i>enladrillo</i> <i><a href="x">baños</a></i>."</p>
After wordbound blanks: <p>diga "<i>#I <a href="x">baños</a></i> de <i>azulejo.</i>"</p>

Source: <p>The <b>big red</b> dog</p>
Current Translation: <p>El <b>perro rojo</b> grande</p>
Ideal Translation: <p>El perro <b>rojo grande</b></p>
After wordbound blanks: <p>El perro <b>rojo grande</b> </p>

Source: <p>The <b>big</b> <b>red</b> dog</p>
Current Translation:  <p>El <b>perro</b> <b>rojo</b> grande</p>
Ideal Translation: <p>El perro <b>rojo</b> <b>grande</b></p>
After wordbound blanks: <p>El perro <b>rojo</b> <b>grande</b> </p>

Source: <p>The <a href="1">big</a> <a href="2">red</a> dog</p>
Current Translation: <p>El <a href="1">perro</a> <a href="2">rojo</a> grande</p>
Ideal Translation: <p>El perro <a href="2">rojo</a> <a href="1">grande</a></p>
After wordbound blanks: <p>El perro <a href="2">rojo</a> <a href="1">grande</a> </p>

Source: <p id="8"><span class="cx-segment" data-segmentid="9"><a class="cx-link" data-linkid="17" href="./The_New_York_Times" rel="mw:WikiLink" title="The New York Times">The New York Times</a>, which has an <b>executive editor</b> over the news pages and an <b>editorial page editor</b> over opinion pages.</span></p>
Current Translation: <p id="8"><span class="cx-segment" data-segmentid="9"><a class="cx-link" data-linkid="17" href="./The_New_York_Times" rel="mw:WikiLink" title="The New York Times">The New York Times</a>, el cual tiene un <b>editor ejecutivo</b> sobre las páginas noticiosas y un <b>editor de página del editorial</b> encima páginas de opinión.</span></p>
Ideal Translation: <p id="8"><span data-segmentid="9" class="cx-segment"><a title="The New York Times" rel="mw:WikiLink" href="./The_New_York_Times" data-linkid="17" class="cx-link">The New York Times</a>, el cual tiene un <b>editor ejecutivo</b> sobre las páginas noticiosas y un <b>editor de página del editorial</b> encima páginas de opinión.</span></p>
After wordbound blanks: <p id="8"><span class="cx-segment" data-segmentid="9"><a class="cx-link" data-linkid="17" href="./The_New_York_Times" rel="mw:WikiLink" title="The New York Times">The New York Times</a>, el cual tiene un <b>editor ejecutivo</b> sobre las páginas noticiosas y un <b>editor</b> de <b>página</b> del <b>editorial</b> encima páginas de opinión.</span></p>

Previous Attempts

References