https://wiki.apertium.org/w/index.php?title=Somewhere&feed=atom&action=history
Somewhere - Revision history
2024-03-29T12:56:55Z
Revision history for this page on the wiki
MediaWiki 1.34.1
https://wiki.apertium.org/w/index.php?title=Somewhere&diff=67869&oldid=prev
Unhammer at 09:09, 8 November 2018
2018-11-08T09:09:04Z
<p></p>
<table class="diff diff-contentalign-left" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">Revision as of 09:09, 8 November 2018</td>
</tr><tr>
<td colspan="2" class="diff-lineno">Line 9:</td>
<td colspan="2" class="diff-lineno">Line 9:</td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>If we at least handle NUL's correctly in lt-proc and cg-proc,</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>If we at least handle NUL's correctly in lt-proc and cg-proc,</div></td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>you can <del class="diffchange diffchange-inline">turn</del> <del class="diffchange diffchange-inline">linebreak</del>'s <del class="diffchange diffchange-inline">into</del> <del class="diffchange diffchange-inline">NUL's</del> (first deleting any existing NUL's</div></td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>you can <ins class="diffchange diffchange-inline">append</ins> <ins class="diffchange diffchange-inline">NUL</ins>'s <ins class="diffchange diffchange-inline">after</ins> <ins class="diffchange diffchange-inline">linebreaks</ins> (first deleting any existing NUL's</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>in the corpus) and tag with the -z option to lt-/cg-proc:</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>in the corpus) and tag with the -z option to lt-/cg-proc:</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div> cat corpus.txt<del class="diffchange diffchange-inline"> </del> \</div></td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div> cat corpus.txt \</div></td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div> | tr -d '\0'<del class="diffchange diffchange-inline"> </del> \</div></td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div> | tr -d '\0' \</div></td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div> | <del class="diffchange diffchange-inline">tr</del> <del class="diffchange diffchange-inline">'\</del>n<del class="diffchange diffchange-inline">' '\0' </del> \</div></td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div> | <ins class="diffchange diffchange-inline">apertium-deshtml</ins> <ins class="diffchange diffchange-inline">-</ins>n \</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div> | sed 's/\[$/[][/; s/^]/]\x00/' \</div></td>
</tr>
<tr>
<td class="diff-marker"><a class="mw-diff-movedpara-left" title="Paragraph was moved. Click to jump to new location." href="#movedpara_7_0_rhs">⚫</a></td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><a name="movedpara_5_0_lhs"></a> | <del class="diffchange diffchange-inline">apertium-deshtml</del> -<del class="diffchange diffchange-inline">n </del> \</div></td>
<td colspan="2" class="diff-empty"> </td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div> | lt-proc -z -w '<del class="diffchange diffchange-inline">apertium-tat/</del>tat.automorf.bin'<del class="diffchange diffchange-inline"> </del> \</div></td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div> | lt-proc -z -w 'tat.automorf.bin' \</div></td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div> | cg-proc -z '<del class="diffchange diffchange-inline">apertium-tat/</del>tat.rlx.bin'<del class="diffchange diffchange-inline"> </del> \</div></td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div> | cg-proc -z<ins class="diffchange diffchange-inline"> -w -1</ins> 'tat.rlx.bin' \</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker"><a class="mw-diff-movedpara-right" title="Paragraph was moved. Click to jump to old location." href="#movedpara_5_0_lhs">⚫</a></td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><a name="movedpara_7_0_rhs"></a> | <ins class="diffchange diffchange-inline">tr</ins> -<ins class="diffchange diffchange-inline">d</ins> <ins class="diffchange diffchange-inline">'\0'</ins> \</div></td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div> | cg-proc -z -w -1 'apertium-tat/dev/mansur.bin' \</div></td>
<td colspan="2" class="diff-empty"> </td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div> | tr '\0' '\n' \</div></td>
<td colspan="2" class="diff-empty"> </td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div> | apertium-rehtml-noent</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div> | apertium-rehtml-noent</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>… finally deleting the NUL's. (Note that NUL's have to appear after the string "[][\n]" for tools like lt-proc to handle them correctly.)</div></td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>… finally turning NUL's back into newlines.</div></td>
<td colspan="2" class="diff-empty"> </td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>similarly for full pipelines</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>similarly for full pipelines</div></td>
</tr>
</table>
Unhammer
https://wiki.apertium.org/w/index.php?title=Somewhere&diff=67868&oldid=prev
Unhammer: Created page with "this should go somewhere on the wiki We should treat NUL as hard separators – if we don't, apertium-apy (and thus www.apertium.org) will risk sending output meant for perso..."
2018-11-06T21:04:16Z
<p>Created page with "this should go somewhere on the wiki We should treat NUL as hard separators – if we don't, apertium-apy (and thus www.apertium.org) will risk sending output meant for perso..."</p>
<p><b>New page</b></p><div>this should go somewhere on the wiki<br />
<br />
We should treat NUL as hard separators – if we don't,<br />
apertium-apy (and thus www.apertium.org) will risk sending output meant<br />
for person1 to person2. (I have an inkling there might still be bugs in<br />
apertium-transfer related to this.)<br />
<br />
This also means we should be able to treat NUL's as "record separators" when e.g. translating a corpus of individual sentences, where we don't want one sentence to affect the translation of the next sentence.<br />
<br />
If we at least handle NUL's correctly in lt-proc and cg-proc,<br />
you can turn linebreak's into NUL's (first deleting any existing NUL's<br />
in the corpus) and tag with the -z option to lt-/cg-proc:<br />
<br />
<br />
cat corpus.txt \<br />
| tr -d '\0' \<br />
| tr '\n' '\0' \<br />
| apertium-deshtml -n \<br />
| lt-proc -z -w 'apertium-tat/tat.automorf.bin' \<br />
| cg-proc -z 'apertium-tat/tat.rlx.bin' \<br />
| cg-proc -z -w -1 'apertium-tat/dev/mansur.bin' \<br />
| tr '\0' '\n' \<br />
| apertium-rehtml-noent<br />
<br />
<br />
… finally turning NUL's back into newlines.<br />
<br />
similarly for full pipelines<br />
<br />
(unfortunately, /usr/bin/apertium can't add the -z's for you, you'll have to grab the pipeline from modes/foo-bar.mode and insert -z's yourself)</div>
Unhammer