https://wiki.apertium.org/w/index.php?title=Ideas_for_Google_Summer_of_Code/Automatic_diacritic_restoration&feed=atom&action=history
Ideas for Google Summer of Code/Automatic diacritic restoration - Revision history
2024-03-29T00:46:19Z
https://wiki.apertium.org/w/index.php?title=Ideas_for_Google_Summer_of_Code/Automatic_diacritic_restoration&diff=71357&oldid=prev
Popcorndude: categorize
2020-03-24T19:50:10Z
<table class="diff diff-contentalign-left" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">Revision as of 19:50, 24 March 2020</td>
</tr><tr>
<td colspan="2" class="diff-lineno">Line 10:</td>
<td colspan="2" class="diff-lineno">Line 10:</td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* D. Yarowsky (1994) "[http://citeseer.ist.psu.edu/rd/43728582%2C73251%2C1%2C0.25%2CDownload/http://citeseer.ist.psu.edu/cache/papers/cs/1083/http:zSzzSzwww.cs.jhu.eduzSz%7EyarowskyzSzpubszSzkluwerbook.pdf/yarowsky94comparison.pdf A Comparison of Corpus-Based Techniques for Restoring Accents in Spanish and French Text]". ''Proceedings of the 2nd Annual Workshop on Very Large Corpora'', pp. 19–32</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* D. Yarowsky (1994) "[http://citeseer.ist.psu.edu/rd/43728582%2C73251%2C1%2C0.25%2CDownload/http://citeseer.ist.psu.edu/cache/papers/cs/1083/http:zSzzSzwww.cs.jhu.eduzSz%7EyarowskyzSzpubszSzkluwerbook.pdf/yarowsky94comparison.pdf A Comparison of Corpus-Based Techniques for Restoring Accents in Spanish and French Text]". ''Proceedings of the 2nd Annual Workshop on Very Large Corpora'', pp. 19–32</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* K. Scannell (2010) "[http://borel.slu.edu/pub/lre.pdf Statistical Unicodification of African Languages]". Submitted for publication.</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* K. Scannell (2010) "[http://borel.slu.edu/pub/lre.pdf Statistical Unicodification of African Languages]". Submitted for publication.</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[[Category:Ideas_for_Google_Summer_of_Code]]</div></td>
</tr>
</table>
Popcorndude
https://wiki.apertium.org/w/index.php?title=Ideas_for_Google_Summer_of_Code/Automatic_diacritic_restoration&diff=24648&oldid=prev
Unhammer: plug accentuate.us =P
2011-03-28T13:24:36Z
<table class="diff diff-contentalign-left" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">Revision as of 13:24, 28 March 2011</td>
</tr><tr>
<td colspan="2" class="diff-lineno">Line 1:</td>
<td colspan="2" class="diff-lineno">Line 1:</td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>[[User:Kevin Scannell|Kevin&nbsp;Scannell]] has a Perl implementation of various statistical restoration algorithms called [http://sourceforge.net/projects/lingala/ charlifter], which has been trained for more than 100 languages using web-crawled data. Details are in his paper linked below. You can try the system [http://logipam.org/charlifter/index.php here]. </div></td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[[User:Kevin Scannell|Kevin&nbsp;Scannell]] has a Perl implementation of various statistical restoration algorithms called [http://sourceforge.net/projects/lingala/ charlifter], which has been trained for more than 100 languages using web-crawled data. Details are in his paper linked below. You can try the system [http://logipam.org/charlifter/index.php here]<ins class="diffchange diffchange-inline"> (or [http://accentuate</ins>.<ins class="diffchange diffchange-inline">us/</ins> <ins class="diffchange diffchange-inline">install</ins> <ins class="diffchange diffchange-inline">the Firefox extension here]).</ins></div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>A port of the algorithm to C++ should be easy. The more subtle issue is to optimize smoothing of the statistical models on a language-by-language basis. </div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>A port of the algorithm to C++ should be easy. The more subtle issue is to optimize smoothing of the statistical models on a language-by-language basis. </div></td>
</tr>
</table>
Unhammer
https://wiki.apertium.org/w/index.php?title=Ideas_for_Google_Summer_of_Code/Automatic_diacritic_restoration&diff=17308&oldid=prev
Kevin Scannell at 18:17, 13 March 2010
2010-03-13T18:17:21Z
<table class="diff diff-contentalign-left" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">Revision as of 18:17, 13 March 2010</td>
</tr><tr>
<td colspan="2" class="diff-lineno">Line 1:</td>
<td colspan="2" class="diff-lineno">Line 1:</td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[[User:Kevin Scannell|Kevin&nbsp;Scannell]] has a Perl implementation of various statistical restoration algorithms called [http://sourceforge.net/projects/lingala/ charlifter], which has been trained for more than 100 languages using web-crawled data. Details are in his paper linked below. You can try the system [http://logipam.org/charlifter/index.php here]. </div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>A port of the algorithm to C++ should be easy. The more subtle issue is to optimize smoothing of the statistical models on a language-by-language basis. </div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>;References</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>;References</div></td>
</tr>
</table>
Kevin Scannell
https://wiki.apertium.org/w/index.php?title=Ideas_for_Google_Summer_of_Code/Automatic_diacritic_restoration&diff=17307&oldid=prev
Kevin Scannell: New "read more" page for diacritic restoration
2010-03-13T18:10:39Z
<p><b>New page</b></p><div><br />
;References<br />
* M. Simard (1998) "[http://citeseer.ist.psu.edu/rd/0%2C79937%2C1%2C0.25%2CDownload/http://citeseer.ist.psu.edu/cache/papers/cs/7401/http:zSzzSzwww.iro.umontreal.cazSz%7EsimardmzSzpubzSzemnlp98.pdf/simard98automatic.pdf Automatic Insertion of Accents in French Texts]". ''Proceedings of EMNLP-3'', Granada, Spain.<br />
* R. F. Mihalcea (2002) "[http://www.cs.unt.edu/~rada/papers/mihalcea.cicling02.ps Diacritics Restoration: Learning from Letters versus Learning from Words]". ''Lecture Notes in Computer Science'' 2276/2002, pp. 96–113<br />
* G. De Pauw, P. W. Wagacha, G. M. de Schryver (2007) "[http://tshwanedje.com/publications/tsd2007.pdf Automatic diacritic restoration for resource-scarce languages]". ''Proceedings of Text, Speech and Dialogue, Tenth International Conference'', pp. 170–179<br />
* P. W. Wagacha, G. De Pauw, P. W. Githinji (2006) "[http://aflat.org/files/wagachaetallrec2k6_0.pdf A grapheme-based approach to accent restoration in Gĩkũyũ]". ''Proceedings of the Fifth International Conference on Language Resources and Evaluation''<br />
* D. Yarowsky (1994) "[http://citeseer.ist.psu.edu/rd/43728582%2C73251%2C1%2C0.25%2CDownload/http://citeseer.ist.psu.edu/cache/papers/cs/1083/http:zSzzSzwww.cs.jhu.eduzSz%7EyarowskyzSzpubszSzkluwerbook.pdf/yarowsky94comparison.pdf A Comparison of Corpus-Based Techniques for Restoring Accents in Spanish and French Text]". ''Proceedings of the 2nd Annual Workshop on Very Large Corpora'', pp. 19–32<br />
* K. Scannell (2010) "[http://borel.slu.edu/pub/lre.pdf Statistical Unicodification of African Languages]". Submitted for publication.</div>
Kevin Scannell