https://wiki.apertium.org/w/index.php?title=Talk:Why_we_trim&feed=atom&action=history
Talk:Why we trim - Revision history
2024-03-28T08:32:40Z
Revision history for this page on the wiki
MediaWiki 1.34.1
https://wiki.apertium.org/w/index.php?title=Talk:Why_we_trim&diff=72182&oldid=prev
Unhammer: /* Discussion on compounding, possible tagging solution */
2020-05-11T08:13:58Z
<p><span dir="auto"><span class="autocomment">Discussion on compounding, possible tagging solution</span></span></p>
<table class="diff diff-contentalign-left" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">Revision as of 08:13, 11 May 2020</td>
</tr><tr>
<td colspan="2" class="diff-lineno">Line 143:</td>
<td colspan="2" class="diff-lineno">Line 143:</td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div></pre></div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div></pre></div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>http://tinodidriksen.com/pisg/freenode/logs/%23apertium/2020-05-03.log</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>http://tinodidriksen.com/pisg/freenode/logs/%23apertium/2020-05-03.log</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>https://sourceforge.net/p/apertium/tickets/88/</div></td>
</tr>
</table>
Unhammer
https://wiki.apertium.org/w/index.php?title=Talk:Why_we_trim&diff=72151&oldid=prev
Unhammer: /* Possible method for keeping surface forms and avoiding trimming */
2020-05-03T16:00:56Z
<p><span dir="auto"><span class="autocomment">Possible method for keeping surface forms and avoiding trimming</span></span></p>
<table class="diff diff-contentalign-left" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">Revision as of 16:00, 3 May 2020</td>
</tr><tr>
<td colspan="2" class="diff-lineno">Line 75:</td>
<td colspan="2" class="diff-lineno">Line 75:</td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>килмәгәнмен: кил.1SG.NEG + мен, "I didn't arrive/show up", is both a compound and a morphological negative. So, if кил had no translation, outputting the lemma alone would severely alter the meaning.</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>килмәгәнмен: кил.1SG.NEG + мен, "I didn't arrive/show up", is both a compound and a morphological negative. So, if кил had no translation, outputting the lemma alone would severely alter the meaning.</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>==Discussion on compounding, possible tagging solution==</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><pre></div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[08:23:00] <Unhammer> khannatanmai, ls modes/@*</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[08:23:22] <Unhammer> the debug modes starting with @ in them refer to an untrimmed analyser</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[08:23:51] <Unhammer> you may have to provide the analyser yourself, but that's just copying it from the monolingual dir</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[08:26:46] <Unhammer> https://i.imgur.com/4XrftuJ.png</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[08:28:02] <Unhammer> https://i.imgur.com/gfMO0s2.png</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[08:31:45] <Unhammer> (As the examples show, compounds are a challenge. If both analyser and bidix has boazoguohtun and kommišuvdna, but only analyser has boazoguohtunkommišuvdna, then trimmed will translate it as parts and give an actual translation, whereas untrimmed will give @sourcelemma)</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[08:37:57] <khannatanmai> Unhammer yeah compounds is the only part I'm still not sure about. One way is to still let partially unknown compounds be trimmed</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[08:39:26] <khannatanmai> maybe the partially known part of a compound can be useful?</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[08:45:54] <khannatanmai> > (As the examples show, compounds are a challenge. If both analyser and bidix has boazoguohtun and kommišuvdna, but only analyser has boazoguohtunkommišuvdna, then trimmed will translate it as parts and give an actual translation, whereas untrimmed will give @sourcelemma)</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[08:45:54] <khannatanmai> In this case, there wont be any trimming right?</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[08:46:41] <khannatanmai> boazoguohtunkommišuvdna will be analysed as boazoguohtun+kommišuvdna, since it's in the monodix, and since both the parts are in the bidix, it wont be trimmed and will translate as parts</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[08:46:52] <khannatanmai> I don't see how trimming changes anything here</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[09:32:07] <Unhammer> khannatanmai, without trimming, boazoguohtunkommišuvdna will be analysed as ^boazoguohtunkommišuvdna<n>$</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[09:33:08] <Unhammer> Maybe there's a smart way to work around it, I haven't looked deeply into it, but it seems like a real challenge.</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[09:36:44] <khannatanmai> wouldnt the monodix identify it as a compound?</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[09:36:53] <Unhammer> no</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[09:37:30] <khannatanmai> I thought boazoguohtunkommišuvdna is analysed as boazoguohtun<xyz>+kommišuvdna<xyz></div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[09:37:51] <khannatanmai> (whatever the lemmas are)</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[09:38:06] <Unhammer> in the untrimmed, you get boazoguohtunkommišuvdna<n><sem_org><sg><nom></div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[09:38:20] <Unhammer> in the trimmed, you get boazoguohtun<n><cmp_sggen><cmp>+kommišuvdna<n><sem_org><sg><nom></div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[09:39:30] <khannatanmai> Alright but then: boazoguohtunkommišuvdna<n><sem_org><sg><nom> these sort of entries shouldnt be in the monodix anyway right? If the word is a compound why is there an entry analysing it as one LU</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[09:41:38] <khannatanmai> In the untrimmed dictionary we have both:</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[09:41:39] <khannatanmai> boazoguohtunkommišuvdna -> boazoguohtunkommišuvdna<n><sem_org><sg><nom></div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[09:41:39] <khannatanmai> boazoguohtunkommišuvdna -> boazoguohtun<n><cmp_sggen><cmp>+kommišuvdna<n><sem_org><sg><nom></div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[09:41:41] <khannatanmai> right?</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[09:42:01] <khannatanmai> trimming trims the first one since the whole compound wont be in the bidix</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[09:42:10] <Unhammer> Dynamic compound analyses are generally more unsafe, so only used if we can't find a full analysis</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[09:42:19] <Unhammer> That's why we include both.</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[09:43:04] <khannatanmai> ah ok I wasnt aware of that</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[09:44:43] <Unhammer> (e.g. on the form "nyrestaurert" (newly restored), I once saw the analysis "nyre+staur+ert" (kidney+stick+pea), until I added newyly-restored to the dixen (and turned off dynamic compounding for the rare word "staur", just in case))</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[09:47:28] <khannatanmai> whats dynamic compounding again?</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[09:51:54] <Unhammer> When a form doesn't get a regular analysis (using entries and paradigms etc), it would normally be output as *unknownform. But we can retry by splitting at all points of the string to see if it's analysable as two parts (where each part is analysable in the regular way, but the analyses must have certain special tags that say they're ok to be used in compounds)</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[09:52:38] <Unhammer> If two parts doesn't work, we try three, four</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[09:52:46] <khannatanmai> okay I didn't even know this existed :p</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[09:53:02] <khannatanmai> that's interesting</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[09:53:14] <khannatanmai> and now another problem for eliminating trimming</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[09:55:20] <Unhammer> (at least that's how lt-proc does it; in hfst that system is normally encoded in the fst with an arc from final to initial and a flag diacritic to restrict analyses, and a higher weight so they're down-prioritised, the effect is the same)</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[09:57:01] <khannatanmai> I was of the opinion that compounds are only analysed as such if the monodix says they're compounds. Didn't know that we try out possible compounds</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[09:58:22] <khannatanmai> My question is, dynamic compounding happens on unknown words, and if a compound is known, it splits them. What then is the use of boazoguohtunkommišuvdna<n><sem_org><sg><nom> ?</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[09:58:56] <khannatanmai> Because dynamic compounding could give an erroneous compound?</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[10:06:55] <khannatanmai> Alright I see the problem. Will think about it</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[10:55:44] <Unhammer> <+khannatanmai> Because dynamic compounding could give an erroneous compound?</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[10:55:47] <Unhammer> Yes</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[10:56:26] <Unhammer> But also, it might have a different translation than the compositional one</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[10:57:18] <khannatanmai> but even if the compositional one is wrong, you would've already trimmed out the non-compositional one right</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[10:57:38] <khannatanmai> Do you think there's a solution which involves keeping both of them and making a decision later in the pipeline</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[10:58:09] <Unhammer> Could be</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[10:58:43] <khannatanmai> I was trying hard to think of solutions which decouple the source from the target but I guess in cases like these we cant avoid a bidix dependency</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[11:00:44] <Unhammer> yeah, the immediate issue is that we would want bidix info in order to make the least bad translation</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[11:00:45] <Unhammer> but disambuation happens before bidix</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[11:03:10] <khannatanmai> yeah. Philosophically, disambiguation should handle these things and not the bidix, but even in the discussion with Francis we thought that using the bidix is far more practical than making the disambiguation better</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[11:03:40] <khannatanmai> Cause ideally, what we can translate and what we can't, shouldn't affect source analysis decisions</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[11:06:22] <Unhammer> Given some ambig analysis AB/A<tags1>+B<tags2>/AB<tags3>, if we pick AB<tags3> in cg, we might end up with an @AB (if AB is not in bidix) where A+B would have given an actual translation. But if AB *is* in bidix, and we pick A+B, we might get a worse translation than we could have (e.g. "fotballpøbel" turning into "football punk" instead of "hooligan") if we knew that AB was in bidix.</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[11:07:18] <Unhammer> Could it be a possibility to *tag* all analyses that are (not) in bidix?</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[11:07:40] <khannatanmai> Yeah Francis was talking about weighing analyses based on the bidix</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[11:07:46] <Unhammer> exactly</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[11:08:08] <khannatanmai> it's still a dependency, so philosophically similar to trimming, but I guess it's better than throwing the analyses away</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[11:09:16] <Unhammer> The existing trimming apparatus could probably be fairly easily used for that – currently, it slices off that part of the analyser, so you have fst's UNTRIMMED and TRIMMED, you could do UNTRIMMED - TRIMMED = NOTINBIDIX, and put a tag/weight on the last one and then union. Or something like that, many ways to do that.</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[11:11:00] <khannatanmai> yup that could work. in a few cases we can use the tags to effectively emulate trimming, and in other cases we can use the analyses from LUs that aren't in the bidix</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[11:13:52] <khannatanmai> simultaneously we implement the stream modification to have secondary tags and surface form, and overall we can keep the benefits of trimming and remove the disadvantages</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div></pre></div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>http://tinodidriksen.com/pisg/freenode/logs/%23apertium/2020-05-03.log</div></td>
</tr>
</table>
Unhammer
https://wiki.apertium.org/w/index.php?title=Talk:Why_we_trim&diff=40327&oldid=prev
Unhammer: /* Keeping the surface form until bidix */
2013-04-12T16:58:00Z
<p><span dir="auto"><span class="autocomment">Keeping the surface form until bidix</span></span></p>
<table class="diff diff-contentalign-left" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">Revision as of 16:58, 12 April 2013</td>
</tr><tr>
<td colspan="2" class="diff-lineno">Line 34:</td>
<td colspan="2" class="diff-lineno">Line 34:</td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Currently pretransfer does this to multiwords with both + and #:</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Currently pretransfer does this to multiwords with both + and #:</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div><pre></div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div><pre></div></td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>^sf/lm<v>+<del class="diffchange diffchange-inline">lm</del><n>#queue$ → ^sf/lm#queue<v>$ ^<del class="diffchange diffchange-inline">lm</del><n>$</div></td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>^sf/lm<v>+<ins class="diffchange diffchange-inline">lm1</ins><n>#queue$ → ^sf/lm#queue<v>$ ^<ins class="diffchange diffchange-inline">lm1</ins><n>$</div></td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>^sf/lm<v>#queue+<del class="diffchange diffchange-inline">lm</del><n>$ → ^sf/lm#queue<v>$ ^<del class="diffchange diffchange-inline">lm</del><n>$</div></td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>^sf/lm<v>#queue+<ins class="diffchange diffchange-inline">lm1</ins><n>$ → ^sf/lm#queue<v>$ ^<ins class="diffchange diffchange-inline">lm1</ins><n>$</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div></pre></div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div></pre></div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>We would want it to do this:</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>We would want it to do this:</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div><pre></div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div><pre></div></td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>^sf/lm<v>+<del class="diffchange diffchange-inline">lm</del><n>#queue$ → ^sf/lm#queue<v>+<del class="diffchange diffchange-inline">lm</del><n>$</div></td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>^sf/lm<v>+<ins class="diffchange diffchange-inline">lm1</ins><n>#queue$ → ^sf/lm#queue<v>+<ins class="diffchange diffchange-inline">lm1</ins><n>$</div></td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>^sf/lm<v>#queue+<del class="diffchange diffchange-inline">lm</del><n>$ → ^sf/lm#queue<v>+<del class="diffchange diffchange-inline">lm</del><n>$</div></td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>^sf/lm<v>#queue+<ins class="diffchange diffchange-inline">lm1</ins><n>$ → ^sf/lm#queue<v>+<ins class="diffchange diffchange-inline">lm1</ins><n>$</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div></pre></div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div></pre></div></td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>and then let lt-proc output ^sf/@sf$ if either lm#queue&lt;v&gt; or <del class="diffchange diffchange-inline">lm</del>&lt;n&gt; were unknown<del class="diffchange diffchange-inline">.</del></div></td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>and then let lt-proc output ^sf/@sf$ if either lm#queue&lt;v&gt; or <ins class="diffchange diffchange-inline">lm1</ins>&lt;n&gt; were unknown<ins class="diffchange diffchange-inline">, otherwise ^lm#queue&lt;v&gt;/lmf#queuef&lt;v&gt;$ ^lm1&lt;n&gt;/lm1f&lt;n&gt;$</ins></div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>===Keeping the surface form until generation===</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>===Keeping the surface form until generation===</div></td>
</tr>
</table>
Unhammer
https://wiki.apertium.org/w/index.php?title=Talk:Why_we_trim&diff=40326&oldid=prev
Unhammer: /* Keeping the surface form until bidix */
2013-04-12T16:55:55Z
<p><span dir="auto"><span class="autocomment">Keeping the surface form until bidix</span></span></p>
<table class="diff diff-contentalign-left" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">Revision as of 16:55, 12 April 2013</td>
</tr><tr>
<td colspan="2" class="diff-lineno">Line 42:</td>
<td colspan="2" class="diff-lineno">Line 42:</td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>^sf/lm<v>#queue+lm<n>$ → ^sf/lm#queue<v>+lm<n>$</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>^sf/lm<v>#queue+lm<n>$ → ^sf/lm#queue<v>+lm<n>$</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div></pre></div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div></pre></div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>and then let lt-proc output ^sf/@sf$ if either lm#queue&lt;v&gt; or lm&lt;n&gt; were unknown.</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>===Keeping the surface form until generation===</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>===Keeping the surface form until generation===</div></td>
</tr>
</table>
Unhammer
https://wiki.apertium.org/w/index.php?title=Talk:Why_we_trim&diff=40325&oldid=prev
Unhammer: /* Keeping the surface form until bidix */
2013-04-12T16:54:50Z
<p><span dir="auto"><span class="autocomment">Keeping the surface form until bidix</span></span></p>
<table class="diff diff-contentalign-left" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">Revision as of 16:54, 12 April 2013</td>
</tr><tr>
<td colspan="2" class="diff-lineno">Line 17:</td>
<td colspan="2" class="diff-lineno">Line 17:</td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>===Keeping the surface form until bidix===</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>===Keeping the surface form until bidix===</div></td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>We skip apertium-pretransfer, and do the job of pretransfer in lt-proc. Given a multiword, it has to output it with an @ if any one part of it is unknown, otherwise it outputs each analysis split like pretransfer would.</div></td>
<td colspan="2" class="diff-empty"> </td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div><pre></div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div><pre></div></td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>$ echo vannmagasin | lt-proc -we nb-nn.automorf.bin | apertium-tagger -gp nb-nn.prob | <del class="diffchange diffchange-inline">lt</del>-<del class="diffchange diffchange-inline">proc</del> -o<del class="diffchange diffchange-inline"> nb-nn.autobil.bin</del></div></td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>$ echo vannmagasin | lt-proc -we nb-nn.automorf.bin | apertium-tagger -gp nb-nn.prob | <ins class="diffchange diffchange-inline">apertium</ins>-<ins class="diffchange diffchange-inline">pretransfer</ins> -o</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>^vannmagasin/vann<n><nt><sg><ind><cmp>+magasin<n><nt><sg><ind>$</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div># apertium-pretransfer would currently output this as two units, needs a slight change</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>$ echo vannmagasin | lt-proc -we nb-nn.automorf.bin | apertium-tagger -gp nb-nn.prob | apertium-pretransfer -o | lt-proc -o nb-nn.autobil.bin</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>^vannmagasin/@vannmagasin$</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>^vannmagasin/@vannmagasin$</div></td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>$ echo vannkoker | lt-proc -we nb-nn.automorf.bin | apertium-tagger -gp nb-nn.prob | lt-proc -o nb-nn.autobil.bin</div></td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>$ echo vannkoker | lt-proc -we nb-nn.automorf.bin | apertium-tagger -gp nb-nn.prob<ins class="diffchange diffchange-inline"> | apertium-pretransfer -o</ins> | lt-proc -o nb-nn.autobil.bin</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>^vann<n><nt><sg><ind><cmp>/vatn<n><nt><sg><ind><cmp>$ ^koker<n><m><sg><ind>/kokar<n><m><sg><ind>$</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>^vann<n><nt><sg><ind><cmp>/vatn<n><nt><sg><ind><cmp>$ ^koker<n><m><sg><ind>/kokar<n><m><sg><ind>$</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div></pre></div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div></pre></div></td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"></td>
<td colspan="2" class="diff-empty"> </td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>(Perhaps a version of apertium-pretransfer should still run to deal with &lt;g&gt; / #, just not &lt;j&gt; / + mwe's.)</div></td>
<td colspan="2" class="diff-empty"> </td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>This would avoid the need to trim monodix to bidix, and we'd still have surface forms for words that are not in bidix, or even partially not in bidix. We'd also avoid outputting lemmas. </div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>This would avoid the need to trim monodix to bidix, and we'd still have surface forms for words that are not in bidix, or even partially not in bidix. We'd also avoid outputting lemmas. </div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Generation errors would still be a problem, but since that's a monodix it will in general be larger anyway.</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Generation errors would still be a problem, but since that's a monodix it will in general be larger anyway.</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>Currently pretransfer does this to multiwords with both + and #:</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><pre></div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>^sf/lm<v>+lm<n>#queue$ → ^sf/lm#queue<v>$ ^lm<n>$</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>^sf/lm<v>#queue+lm<n>$ → ^sf/lm#queue<v>$ ^lm<n>$</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div></pre></div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>We would want it to do this:</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><pre></div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>^sf/lm<v>+lm<n>#queue$ → ^sf/lm#queue<v>+lm<n>$</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>^sf/lm<v>#queue+lm<n>$ → ^sf/lm#queue<v>+lm<n>$</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div></pre></div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>===Keeping the surface form until generation===</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>===Keeping the surface form until generation===</div></td>
</tr>
</table>
Unhammer
https://wiki.apertium.org/w/index.php?title=Talk:Why_we_trim&diff=40324&oldid=prev
Unhammer: wops
2013-04-12T16:21:56Z
<p>wops</p>
<table class="diff diff-contentalign-left" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">Revision as of 16:21, 12 April 2013</td>
</tr><tr>
<td colspan="2" class="diff-lineno">Line 19:</td>
<td colspan="2" class="diff-lineno">Line 19:</td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>We skip apertium-pretransfer, and do the job of pretransfer in lt-proc. Given a multiword, it has to output it with an @ if any one part of it is unknown, otherwise it outputs each analysis split like pretransfer would.</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>We skip apertium-pretransfer, and do the job of pretransfer in lt-proc. Given a multiword, it has to output it with an @ if any one part of it is unknown, otherwise it outputs each analysis split like pretransfer would.</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div><pre></div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div><pre></div></td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>$ echo vannmagasin | lt-proc -we nb-nn.automorf.bin | apertium-tagger -gp nb-nn.prob<del class="diffchange diffchange-inline"> | apertium-pretransfer</del> | lt-proc -o nb-nn.autobil.bin</div></td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>$ echo vannmagasin | lt-proc -we nb-nn.automorf.bin | apertium-tagger -gp nb-nn.prob | lt-proc -o nb-nn.autobil.bin</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>^vannmagasin/@vannmagasin$</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>^vannmagasin/@vannmagasin$</div></td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>$ echo vannkoker | lt-proc -we nb-nn.automorf.bin | apertium-tagger -gp nb-nn.prob<del class="diffchange diffchange-inline"> | apertium-pretransfer</del> | lt-proc -o nb-nn.autobil.bin</div></td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>$ echo vannkoker | lt-proc -we nb-nn.automorf.bin | apertium-tagger -gp nb-nn.prob | lt-proc -o nb-nn.autobil.bin</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>^vann<n><nt><sg><ind><cmp>/vatn<n><nt><sg><ind><cmp>$ ^koker<n><m><sg><ind>/kokar<n><m><sg><ind>$</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>^vann<n><nt><sg><ind><cmp>/vatn<n><nt><sg><ind><cmp>$ ^koker<n><m><sg><ind>/kokar<n><m><sg><ind>$</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div></pre></div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div></pre></div></td>
</tr>
<tr>
<td colspan="2" class="diff-lineno">Line 30:</td>
<td colspan="2" class="diff-lineno">Line 30:</td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Generation errors would still be a problem, but since that's a monodix it will in general be larger anyway.</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Generation errors would still be a problem, but since that's a monodix it will in general be larger anyway.</div></td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"></td>
<td colspan="2" class="diff-empty"> </td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"></td>
<td colspan="2" class="diff-empty"> </td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>===Keeping the surface form until generation===</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>===Keeping the surface form until generation===</div></td>
</tr>
</table>
Unhammer
https://wiki.apertium.org/w/index.php?title=Talk:Why_we_trim&diff=40323&oldid=prev
Unhammer at 16:21, 12 April 2013
2013-04-12T16:21:08Z
<p></p>
<table class="diff diff-contentalign-left" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">Revision as of 16:21, 12 April 2013</td>
</tr><tr>
<td colspan="2" class="diff-lineno">Line 1:</td>
<td colspan="2" class="diff-lineno">Line 1:</td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>{{<del class="diffchange diffchange-inline">tocd</del>}}</div></td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>{{<ins class="diffchange diffchange-inline">TOCD</ins>}}</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
</table>
Unhammer
https://wiki.apertium.org/w/index.php?title=Talk:Why_we_trim&diff=40322&oldid=prev
Unhammer at 16:20, 12 April 2013
2013-04-12T16:20:14Z
<p></p>
<table class="diff diff-contentalign-left" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">Revision as of 16:20, 12 April 2013</td>
</tr><tr>
<td colspan="2" class="diff-lineno">Line 1:</td>
<td colspan="2" class="diff-lineno">Line 1:</td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>{{tocd}}</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>==Possible method for keeping surface forms and avoiding trimming==</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>==Possible method for keeping surface forms and avoiding trimming==</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>A possible way of dealing with keeping surface forms when splitting mwe's: put full surface on the first part, no surface on the rest. Example, assuming "magasin" is missing from bidix:</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>A possible way of dealing with keeping surface forms when splitting mwe's: put full surface on the first part, no surface on the rest. Example, assuming "magasin" is missing from bidix:</div></td>
</tr>
<tr>
<td colspan="2" class="diff-lineno">Line 27:</td>
<td colspan="2" class="diff-lineno">Line 30:</td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Generation errors would still be a problem, but since that's a monodix it will in general be larger anyway.</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Generation errors would still be a problem, but since that's a monodix it will in general be larger anyway.</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>===Keeping the surface form until generation===</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>===Keeping the surface form until generation===</div></td>
</tr>
</table>
Unhammer
https://wiki.apertium.org/w/index.php?title=Talk:Why_we_trim&diff=40321&oldid=prev
Unhammer at 16:19, 12 April 2013
2013-04-12T16:19:18Z
<p></p>
<table class="diff diff-contentalign-left" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">Revision as of 16:19, 12 April 2013</td>
</tr><tr>
<td colspan="2" class="diff-lineno">Line 1:</td>
<td colspan="2" class="diff-lineno">Line 1:</td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>==Possible method for keeping surface forms and avoiding trimming==</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>A possible way of dealing with keeping surface forms when splitting mwe's: put full surface on the first part, no surface on the rest. Example, assuming "magasin" is missing from bidix:</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>A possible way of dealing with keeping surface forms when splitting mwe's: put full surface on the first part, no surface on the rest. Example, assuming "magasin" is missing from bidix:</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div><pre></div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div><pre></div></td>
</tr>
<tr>
<td colspan="2" class="diff-lineno">Line 5:</td>
<td colspan="2" class="diff-lineno">Line 6:</td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>$ echo vannmagasin | lt-proc -we nb-nn.automorf.bin | apertium-tagger -gp nb-nn.prob</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>$ echo vannmagasin | lt-proc -we nb-nn.automorf.bin | apertium-tagger -gp nb-nn.prob</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>^vannmagasin/vann<n><nt><sg><ind><cmp>+magasin<n><nt><sg><ind>$</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>^vannmagasin/vann<n><nt><sg><ind><cmp>+magasin<n><nt><sg><ind>$</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div></pre></div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>From here on there are two possibilities: try to keep the surface form until bidix, or try to keep it all the way until generation. </div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>* If we keep it until bidix, we can output the surface form if it's not in bidix, and avoid trimming monodix to bidix, but we can still end up with generation errors where bidix is contains words not in generation.dix, but it's better than nothing. </div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>* If we could keep it until generation, we could output the original surface form even if none of the dictionaries were trimmed.</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>===Keeping the surface form until bidix===</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>We skip apertium-pretransfer, and do the job of pretransfer in lt-proc. Given a multiword, it has to output it with an @ if any one part of it is unknown, otherwise it outputs each analysis split like pretransfer would.</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><pre></div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>$ echo vannmagasin | lt-proc -we nb-nn.automorf.bin | apertium-tagger -gp nb-nn.prob | apertium-pretransfer | lt-proc -o nb-nn.autobil.bin</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>^vannmagasin/@vannmagasin$</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>$ echo vannkoker | lt-proc -we nb-nn.automorf.bin | apertium-tagger -gp nb-nn.prob | apertium-pretransfer | lt-proc -o nb-nn.autobil.bin</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>^vann<n><nt><sg><ind><cmp>/vatn<n><nt><sg><ind><cmp>$ ^koker<n><m><sg><ind>/kokar<n><m><sg><ind>$</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div></pre></div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>(Perhaps a version of apertium-pretransfer should still run to deal with &lt;g&gt; / #, just not &lt;j&gt; / + mwe's.)</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>This would avoid the need to trim monodix to bidix, and we'd still have surface forms for words that are not in bidix, or even partially not in bidix. We'd also avoid outputting lemmas. </div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>Generation errors would still be a problem, but since that's a monodix it will in general be larger anyway.</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>===Keeping the surface form until generation===</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>Say you do:</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><pre></div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>$ echo vannmagasin | lt-proc -we nb-nn.automorf.bin | apertium-tagger -gp nb-nn.prob | apertium-pretransfer</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>$ echo vannmagasin | lt-proc -we nb-nn.automorf.bin | apertium-tagger -gp nb-nn.prob | apertium-pretransfer</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>^vannmagasin/vann<n><nt><sg><ind><cmp>$ ^/magasin<n><nt><sg><ind>$</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>^vannmagasin/vann<n><nt><sg><ind><cmp>$ ^/magasin<n><nt><sg><ind>$</div></td>
</tr>
<tr>
<td colspan="2" class="diff-lineno">Line 31:</td>
<td colspan="2" class="diff-lineno">Line 56:</td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div></pre></div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div></pre></div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">Difficult</del> example from Kazakh:</div></td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">A difficult</ins> example from Kazakh:</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>килмәгәнмен: кил.1SG.NEG + мен, "I didn't arrive/show up", is both a compound and a morphological negative. So, if кил had no translation, outputting the lemma alone would severely alter the meaning.</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>килмәгәнмен: кил.1SG.NEG + мен, "I didn't arrive/show up", is both a compound and a morphological negative. So, if кил had no translation, outputting the lemma alone would severely alter the meaning.</div></td>
</tr>
</table>
Unhammer
https://wiki.apertium.org/w/index.php?title=Talk:Why_we_trim&diff=40312&oldid=prev
Unhammer at 12:58, 12 April 2013
2013-04-12T12:58:04Z
<p></p>
<table class="diff diff-contentalign-left" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">Revision as of 12:58, 12 April 2013</td>
</tr><tr>
<td colspan="2" class="diff-lineno">Line 33:</td>
<td colspan="2" class="diff-lineno">Line 33:</td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Difficult example from Kazakh:</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Difficult example from Kazakh:</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>килмәгәнмен: кил.1SG.NEG + мен, "I didn't show up", is both a compound and a morphological negative <del class="diffchange diffchange-inline">(so</del> outputting the lemma<del class="diffchange diffchange-inline"> "come"</del> alone <del class="diffchange diffchange-inline">severly</del> <del class="diffchange diffchange-inline">alters</del> the meaning<del class="diffchange diffchange-inline">)</del></div></td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>килмәгәнмен: кил.1SG.NEG + мен, "I didn't <ins class="diffchange diffchange-inline">arrive/</ins>show up", is both a compound and a morphological negative<ins class="diffchange diffchange-inline">.</ins> <ins class="diffchange diffchange-inline">So, if кил had no translation,</ins> outputting the lemma alone <ins class="diffchange diffchange-inline">would severely</ins> <ins class="diffchange diffchange-inline">alter</ins> the meaning<ins class="diffchange diffchange-inline">.</ins></div></td>
</tr>
</table>
Unhammer