Aragonese and Catalan/Evaluation - Revision history<br />
Revision history for this page on the wiki (MediaWiki 1.34.1), feed updated 2024-03-28T10:37:45Z: https://wiki.apertium.org/w/index.php?title=Aragonese_and_Catalan/Evaluation&feed=atom&action=history<br />
Juanpabl at 08:55, 16 January 2016 (2016-01-16T08:55:44Z): https://wiki.apertium.org/w/index.php?title=Aragonese_and_Catalan/Evaluation&diff=56150&oldid=prev
<p></p>
<table class="diff diff-contentalign-left" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">Revision as of 08:55, 16 January 2016</td>
</tr><tr>
<td colspan="2" class="diff-lineno">Line 60:</td>
<td colspan="2" class="diff-lineno">Line 60:</td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Percentage of unknown words that were free rides: 32.69 %</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Percentage of unknown words that were free rides: 32.69 %</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>&lt;/pre&gt;</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>&lt;/pre&gt;</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[[Category:Aragonese and Catalan]]</div></td>
</tr>
</table>
Juanpabl, 2016-01-16T08:54:47Z (https://wiki.apertium.org/w/index.php?title=Aragonese_and_Catalan/Evaluation&diff=56149&oldid=prev)
<p>Created page with "== Version 0.1 (Beta) == === Naïve coverage === ==== arg-cat ==== &lt;pre&gt; $ cat corpus_narrative.txt | sh corpus-stat-arg-cat.sh Number of tokenised words in the corpus: 37844..."</p>
<p><b>New page</b></p><div>== Version 0.1 (Beta) ==<br />
<br />
=== Naïve coverage ===<br />
==== arg-cat ====<br />
<pre><br />
$ cat corpus_narrative.txt | sh corpus-stat-arg-cat.sh<br />
Number of tokenised words in the corpus: 378440<br />
Number of known words in the corpus: 337924<br />
Coverage: 89.3 %<br />
<br />
$ cat sentencelistanwiki.txt | sh corpus-stat-arg-cat.sh<br />
Number of tokenised words in the corpus: 2673751<br />
Number of known words in the corpus: 2344686<br />
Coverage: 87.7 %<br />
</pre><br />
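The coverage figure printed by the script appears to be the share of tokenised words that the analyser recognises. As a quick consistency check, the sketch below recomputes it from the two counts in each run. The helper name `coverage` is hypothetical and is not part of `corpus-stat-arg-cat.sh`.

```python
def coverage(known: int, total: int) -> float:
    """Return naive coverage as a percentage, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * known / total, 1)


# Counts copied from the two arg-cat runs above.
print(coverage(337924, 378440))    # corpus_narrative.txt -> 89.3
print(coverage(2344686, 2673751))  # sentencelistanwiki.txt -> 87.7
```

The same helper reproduces the cat-arg figures from their respective counts.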
==== cat-arg ====<br />
<pre><br />
$ cat ../apertium-es-ca/ca-tagger-data/ca.tagged.txt | sh corpus-stat-cat-arg.sh<br />
Number of tokenised words in the corpus: 24590<br />
Number of known words in the corpus: 22919<br />
Coverage: 93.2 %<br />
<br />
# Corpus: trunk/apertium-eo-ca/tekstaro/ca.crp.txt<br />
$ cat ca.crp.txt | sed 's/^ *[0123456789]*\.//g' | sh ./corpus-stat-cat-arg.sh<br />
Number of tokenised words in the corpus: 567608<br />
Number of known words in the corpus: 497165<br />
Coverage: 87.6 %<br />
</pre><br />
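The `sed` expression in the second cat-arg run removes what looks like a numeric prefix (optional spaces, digits, a full stop) from the start of each corpus line before the words are counted. A minimal Python sketch of the same substitution follows; the helper name and the sample lines are made up for illustration and do not come from `ca.crp.txt`.

```python
import re

# Same pattern as the sed expression: optional spaces, any digits, one dot.
PREFIX = re.compile(r"^ *[0-9]*\.")


def strip_line_number(line: str) -> str:
    """Remove a leading 'NNN.' prefix if present; otherwise return the line unchanged."""
    return PREFIX.sub("", line, count=1)


# Illustrative input only.
for sample in ["  12. Bon dia", "Sense prefix"]:
    print(repr(strip_line_number(sample)))
```

Because the digits are optional in the pattern, a line that starts with a bare full stop also loses that character.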
=== Translation Quality ===<br />
==== cat-arg ====<br />
<pre><br />
$ ../apertium-eval-translator/apertium-eval-translator.pl -test MT.txt -ref postedit.txt<br />
Test file: 'MT.txt'<br />
Reference file 'postedit.txt'<br />
<br />
Statistics about input files<br />
-------------------------------------------------------<br />
Number of words in reference: 1311<br />
Number of words in test: 1315<br />
Number of unknown words (marked with a star) in test: 156<br />
Percentage of unknown words: 11.86 %<br />
<br />
Results when removing unknown-word marks (stars)<br />
-------------------------------------------------------<br />
Edit distance: 203<br />
Word error rate (WER): 15.48 %<br />
Number of position-independent correct words: 1132<br />
Position-independent word error rate (PER): 13.96 %<br />
<br />
Results when unknown-word marks (stars) are not removed<br />
-------------------------------------------------------<br />
Edit distance: 254<br />
Word Error Rate (WER): 19.37 %<br />
Number of position-independent correct words: 1081<br />
Position-independent word error rate (PER): 17.85 %<br />
<br />
Statistics about the translation of unknown words<br />
-------------------------------------------------------<br />
Number of unknown words which were free rides: 51<br />
Percentage of unknown words that were free rides: 32.69 %<br />
</pre></div>
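The percentages in the evaluator output can be recomputed from the raw counts it prints. The sketch below assumes that WER is the edit distance divided by the number of reference words, and that PER is the number of test words minus the position-independent correct words, divided by the number of reference words. These formulas are inferred from the figures above, not taken from the `apertium-eval-translator.pl` source, and all names are hypothetical.

```python
# Counts copied from the evaluator output above.
REF_WORDS = 1311
TEST_WORDS = 1315
UNKNOWN_WORDS = 156
FREE_RIDES = 51


def pct(numerator: int, denominator: int) -> float:
    """Return a percentage rounded to two decimal places."""
    return round(100.0 * numerator / denominator, 2)


def wer(edit_distance: int) -> float:
    """Word error rate, assuming normalisation by reference length."""
    return pct(edit_distance, REF_WORDS)


def per(correct_words: int) -> float:
    """Position-independent error rate under the assumed formula."""
    return pct(TEST_WORDS - correct_words, REF_WORDS)


print("Unknown words:", pct(UNKNOWN_WORDS, TEST_WORDS))  # 11.86
print("WER, stars removed:", wer(203))                   # 15.48
print("PER, stars removed:", per(1132))                  # 13.96
print("WER, stars kept:", wer(254))                      # 19.37
print("PER, stars kept:", per(1081))                     # 17.85
print("Free rides:", pct(FREE_RIDES, UNKNOWN_WORDS))     # 32.69
```

Under these assumptions every reported percentage is reproduced. The gap between the two edit distances (254 - 203 = 51) also equals the reported number of free rides.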