Difference between revisions of "Reordering superblanks"
Revision as of 18:30, 25 May 2014
Currently there is a major problem with how formatting / superblanks interacts with word/chunk reordering in Apertium.
If the input is
<a id="foobar" href="http://example.com">Foo <b>bar</b>.</a>
and the words need to be reordered, we currently reorder ''only'' the words and leave the blanks untouched (we don't even look at them), since we don't want to break the HTML. The output therefore becomes
<a id="foobar" href="http://example.com">Бар <b>фоо</b>.</a>
but now the bold has shifted from the source word "bar" to the target word that corresponds to "foo" in the input.
Ideally, the output should be
<a id="foobar" href="http://example.com"><b>Бар</b> фоо.</a>
Every language pair that reorders words either does this or otherwise risks corrupting the formatting:
$ echo '<i>Perro</i> <b>blanco</b>' | apertium es-en -f html
<i>White</i> <b>dog</b>
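The difference between the current and the ideal behaviour can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not how Apertium is implemented: the stream is modelled as plain lists, the function names `reorder_words_only` and `reorder_with_tags` are hypothetical, and the words are taken to be already translated.

```python
# Toy model of the "Foo <b>bar</b>." example. All names here are hypothetical;
# Apertium's real stream format and transfer modules work differently.

A_OPEN = '<a id="foobar" href="http://example.com">'


def reorder_words_only(blanks, words, order):
    """Current behaviour: permute the words, keep every blank in its slot."""
    assert len(blanks) == len(words) + 1
    moved = [words[i] for i in order]
    out = [blanks[0]]
    for word, blank in zip(moved, blanks[1:]):
        out.append(word)
        out.append(blank)
    return "".join(out)


def reorder_with_tags(prefix, units, spaces, suffix, order):
    """Ideal behaviour: inline tags travel with the word they wrap.

    units  -- list of (open_tag, word, close_tag) triples
    spaces -- whitespace between consecutive output positions (not moved)
    """
    assert len(spaces) == len(units) - 1
    moved = [units[i] for i in order]
    out = [prefix]
    for pos, (open_tag, word, close_tag) in enumerate(moved):
        out.append(open_tag + word + close_tag)
        if pos < len(spaces):
            out.append(spaces[pos])
    out.append(suffix)
    return "".join(out)


# Target-language words in source order: Foo -> фоо, bar -> Бар, "." -> "."
order = [1, 0, 2]  # swap the first two words, keep the full stop last

current = reorder_words_only(
    [A_OPEN, " <b>", "</b>", "</a>"], ["фоо", "Бар", "."], order
)
ideal = reorder_with_tags(
    A_OPEN,
    [("", "фоо", ""), ("<b>", "Бар", "</b>"), ("", ".", "")],
    [" ", ""],
    "</a>",
    order,
)

print(current)  # bold ends up on the wrong word
print(ideal)    # bold stays on the translation of "bar"
```

The second function works only because the inline `<b>` tag is stored with the word it wraps, while the enclosing `<a>` tag and the whitespace stay in place. Deciding which tags should travel with a word and which should stay is the hard part, and this sketch does not address it.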