<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://wiki.apertium.org/w/index.php?action=history&amp;feed=atom&amp;title=Evaluation</id>
	<title>Evaluation - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://wiki.apertium.org/w/index.php?action=history&amp;feed=atom&amp;title=Evaluation"/>
	<link rel="alternate" type="text/html" href="https://wiki.apertium.org/w/index.php?title=Evaluation&amp;action=history"/>
	<updated>2026-05-05T08:27:48Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.34.1</generator>
	<entry>
		<id>https://wiki.apertium.org/w/index.php?title=Evaluation&amp;diff=75673&amp;oldid=prev</id>
		<title>Unhammer at 10:23, 3 September 2024</title>
		<link rel="alternate" type="text/html" href="https://wiki.apertium.org/w/index.php?title=Evaluation&amp;diff=75673&amp;oldid=prev"/>
		<updated>2024-09-03T10:23:49Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;Revision as of 10:23, 3 September 2024&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 8:&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 8:&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;* how many word [[N-gram]]&#039;s are common to the MT output and one or more reference translations (see [http://en.wikipedia.org/wiki/BLEU Wikipedia on Bleu], [https://en.wikipedia.org/wiki/METEOR Meteor] or [http://en.wikipedia.org/wiki/NIST_%28metric%29 NIST]), here &#039;&#039;&#039;higher scores are better&#039;&#039;&#039; &lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;* how many word [[N-gram]]&#039;s are common to the MT output and one or more reference translations (see [http://en.wikipedia.org/wiki/BLEU Wikipedia on Bleu], [https://en.wikipedia.org/wiki/METEOR Meteor] or [http://en.wikipedia.org/wiki/NIST_%28metric%29 NIST]), here &#039;&#039;&#039;higher scores are better&#039;&#039;&#039; &lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;* how many character [[N-gram]]&#039;s are common to MT output and a post-edit (the Fuzzy Match score, an unordered comparison with the Sørensen–Dice coefficient).[http://amtaweb.org/wp-content/uploads/2015/10/MTSummitXV_ResearchTrack.pdf#page=138]&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;* how many character [[N-gram]]&#039;s are common to MT output and a post-edit (the Fuzzy Match score, an unordered comparison with the Sørensen–Dice coefficient).[http://amtaweb.org/wp-content/uploads/2015/10/MTSummitXV_ResearchTrack.pdf#page=138]&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;** or the [&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;http&lt;/del&gt;://&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;www.aclweb&lt;/del&gt;.org&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;/anthology/W&lt;/del&gt;/W15/&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;W15-30.pdf#page=412&lt;/del&gt; Character N-gram F-score] (code at https://github.com/Waino/chrF)&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;** or the [&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;https&lt;/ins&gt;://&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;aclanthology&lt;/ins&gt;.org/W15&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;-3049&lt;/ins&gt;/ Character N-gram F-score] (code at https://github.com/Waino/chrF)&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;* how well a user &#039;&#039;understands&#039;&#039; the message of the original text (this typically requires an experiment with real human subjects, see [[Assimilation Evaluation Toolkit]] which lets you make gap-filling tests).&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;* how well a user &#039;&#039;understands&#039;&#039; the message of the original text (this typically requires an experiment with real human subjects, see [[Assimilation Evaluation Toolkit]] which lets you make gap-filling tests).&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;* user-interviews to find the subjective experience of using the translator for their task (whether post-editing or gisting)&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;* user-interviews to find the subjective experience of using the translator for their task (whether post-editing or gisting)&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Unhammer</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.apertium.org/w/index.php?title=Evaluation&amp;diff=75672&amp;oldid=prev</id>
		<title>Unhammer: meteor</title>
		<link rel="alternate" type="text/html" href="https://wiki.apertium.org/w/index.php?title=Evaluation&amp;diff=75672&amp;oldid=prev"/>
		<updated>2024-09-03T10:17:42Z</updated>

		<summary type="html">&lt;p&gt;meteor&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;Revision as of 10:17, 3 September 2024&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 6:&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 6:&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;* how many words need to be changed before a text is publication-ready (Word-Error Rate, see [http://en.wikipedia.org/wiki/Word_error_rate Wikipedia on WER]), here &#039;&#039;&#039;lower scores are better&#039;&#039;&#039;&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;* how many words need to be changed before a text is publication-ready (Word-Error Rate, see [http://en.wikipedia.org/wiki/Word_error_rate Wikipedia on WER]), here &#039;&#039;&#039;lower scores are better&#039;&#039;&#039;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;* how many word [[N-gram]]&#039;s are common to the MT output and one or more reference translations (see [http://en.wikipedia.org/wiki/BLEU Wikipedia on Bleu] or [http://en.wikipedia.org/wiki/NIST_%28metric%29 NIST]), here &#039;&#039;&#039;higher scores are better&#039;&#039;&#039; &lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;* how many word [[N-gram]]&#039;s are common to the MT output and one or more reference translations (see [http://en.wikipedia.org/wiki/BLEU Wikipedia on Bleu&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;], [https://en.wikipedia.org/wiki/METEOR Meteor&lt;/ins&gt;] or [http://en.wikipedia.org/wiki/NIST_%28metric%29 NIST]), here &#039;&#039;&#039;higher scores are better&#039;&#039;&#039; &lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;* how many character [[N-gram]]&#039;s are common to MT output and a post-edit (the Fuzzy Match score, an unordered comparison with the Sørensen–Dice coefficient).[http://amtaweb.org/wp-content/uploads/2015/10/MTSummitXV_ResearchTrack.pdf#page=138]&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;* how many character [[N-gram]]&#039;s are common to MT output and a post-edit (the Fuzzy Match score, an unordered comparison with the Sørensen–Dice coefficient).[http://amtaweb.org/wp-content/uploads/2015/10/MTSummitXV_ResearchTrack.pdf#page=138]&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;** or the [http://www.aclweb.org/anthology/W/W15/W15-30.pdf#page=412 Character N-gram F-score] (code at https://github.com/Waino/chrF)&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;** or the [http://www.aclweb.org/anthology/W/W15/W15-30.pdf#page=412 Character N-gram F-score] (code at https://github.com/Waino/chrF)&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Unhammer</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.apertium.org/w/index.php?title=Evaluation&amp;diff=73785&amp;oldid=prev</id>
		<title>Unhammer at 09:13, 21 November 2021</title>
		<link rel="alternate" type="text/html" href="https://wiki.apertium.org/w/index.php?title=Evaluation&amp;diff=73785&amp;oldid=prev"/>
		<updated>2021-11-21T09:13:21Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;Revision as of 09:13, 21 November 2021&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 3:&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 3:&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;Most evaluations focus on numerical metrics like WER, which make the most sense when done on fresh post-edits. WER gives less information when run on pretranslated text (due to multiple possible translations, and MT shaping the translation output). WER is quite far from the task of &#039;&#039;gisting&#039;&#039;, where cloze-like tests may be a more useful metric, and interviews may give more useful information. Beware of overfitting and [https://www.nngroup.com/articles/campbells-law/ Campbell&#039;s law].&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;Most evaluations focus on numerical metrics like WER, which make the most sense when done on fresh post-edits. WER gives less information when run on pretranslated text (due to multiple possible translations, and MT shaping the translation output). WER is quite far from the task of &#039;&#039;gisting&#039;&#039;, where cloze-like tests may be a more useful metric, and interviews may give more useful information. Beware of overfitting and [https://www.nngroup.com/articles/campbells-law/ Campbell&#039;s law].&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;Common evaluation measures are:&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;Common evaluation measures&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt; and methods&lt;/ins&gt; are:&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;* how many words need to be changed before a text is publication-ready (Word-Error Rate, see [http://en.wikipedia.org/wiki/Word_error_rate Wikipedia on WER]), here &#039;&#039;&#039;lower scores are better&#039;&#039;&#039;&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;* how many words need to be changed before a text is publication-ready (Word-Error Rate, see [http://en.wikipedia.org/wiki/Word_error_rate Wikipedia on WER]), here &#039;&#039;&#039;lower scores are better&#039;&#039;&#039;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Unhammer</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.apertium.org/w/index.php?title=Evaluation&amp;diff=73784&amp;oldid=prev</id>
		<title>Unhammer at 09:13, 21 November 2021</title>
		<link rel="alternate" type="text/html" href="https://wiki.apertium.org/w/index.php?title=Evaluation&amp;diff=73784&amp;oldid=prev"/>
		<updated>2021-11-21T09:13:04Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;Revision as of 09:13, 21 November 2021&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 2:&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 2:&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;Most evaluations focus on numerical metrics like WER, which make the most sense when done on fresh post-edits. WER gives less information when run on pretranslated text (due to multiple possible translations, and MT shaping the translation output). WER is quite far from the task of &#039;&#039;gisting&#039;&#039;, where cloze-like tests may be a more useful metric, and interviews may give more useful information. Beware of overfitting and [https://www.nngroup.com/articles/campbells-law/ Campbell&#039;s law].&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;Most evaluations focus on numerical metrics like WER, which make the most sense when done on fresh post-edits. WER gives less information when run on pretranslated text (due to multiple possible translations, and MT shaping the translation output). WER is quite far from the task of &#039;&#039;gisting&#039;&#039;, where cloze-like tests may be a more useful metric, and interviews may give more useful information. Beware of overfitting and [https://www.nngroup.com/articles/campbells-law/ Campbell&#039;s law].&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;Common evaluation measures are:&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;* how many words need to be changed before a text is publication-ready (Word-Error Rate, see [http://en.wikipedia.org/wiki/Word_error_rate Wikipedia on WER]), here &#039;&#039;&#039;lower scores are better&#039;&#039;&#039;&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;* how many words need to be changed before a text is publication-ready (Word-Error Rate, see [http://en.wikipedia.org/wiki/Word_error_rate Wikipedia on WER]), here &#039;&#039;&#039;lower scores are better&#039;&#039;&#039;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 7:&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 9:&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;* how many character [[N-gram]]&#039;s are common to MT output and a post-edit (the Fuzzy Match score, an unordered comparison with the Sørensen–Dice coefficient).[http://amtaweb.org/wp-content/uploads/2015/10/MTSummitXV_ResearchTrack.pdf#page=138]&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;* how many character [[N-gram]]&#039;s are common to MT output and a post-edit (the Fuzzy Match score, an unordered comparison with the Sørensen–Dice coefficient).[http://amtaweb.org/wp-content/uploads/2015/10/MTSummitXV_ResearchTrack.pdf#page=138]&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;** or the [http://www.aclweb.org/anthology/W/W15/W15-30.pdf#page=412 Character N-gram F-score] (code at https://github.com/Waino/chrF)&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;** or the [http://www.aclweb.org/anthology/W/W15/W15-30.pdf#page=412 Character N-gram F-score] (code at https://github.com/Waino/chrF)&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;* how well a user &#039;&#039;understands&#039;&#039; the message of the original text (this typically requires an experiment with real human subjects, see [[Assimilation Evaluation Toolkit]]).&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;* how well a user &#039;&#039;understands&#039;&#039; the message of the original text (this typically requires an experiment with real human subjects, see [[Assimilation Evaluation Toolkit]]&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt; which lets you make gap-filling tests&lt;/ins&gt;).&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;* user-interviews to find the subjective experience of using the translator for their task (whether post-editing or gisting)&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Unhammer</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.apertium.org/w/index.php?title=Evaluation&amp;diff=73783&amp;oldid=prev</id>
		<title>Unhammer at 09:10, 21 November 2021</title>
		<link rel="alternate" type="text/html" href="https://wiki.apertium.org/w/index.php?title=Evaluation&amp;diff=73783&amp;oldid=prev"/>
		<updated>2021-11-21T09:10:10Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;Revision as of 09:10, 21 November 2021&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 1:&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 1:&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;&#039;&#039;&#039;Evaluation&#039;&#039;&#039; can give you some idea as to how well a language pair works in practice. There are many ways to evaluate, and the test chosen should depend on the [[Assimilation and dissemination|intended use of the language pair]]&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;:&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;&#039;&#039;&#039;Evaluation&#039;&#039;&#039; can give you some idea as to how well a language pair works in practice. There are many ways to evaluate, and the test chosen should depend on the [[Assimilation and dissemination|intended use of the language pair]]&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;Most evaluations focus on numerical metrics like WER, which make the most sense when done on fresh post-edits. WER gives less information when run on pretranslated text (due to multiple possible translations, and MT shaping the translation output). WER is quite far from the task of &#039;&#039;gisting&#039;&#039;, where cloze-like tests may be a more useful metric, and interviews may give more useful information. Beware of overfitting and [https://www.nngroup.com/articles/campbells-law/ Campbell&#039;s law].&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;* how many words need to be changed before a text is publication-ready (Word-Error Rate, see [http://en.wikipedia.org/wiki/Word_error_rate Wikipedia on WER]), here &#039;&#039;&#039;lower scores are better&#039;&#039;&#039;&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;* how many words need to be changed before a text is publication-ready (Word-Error Rate, see [http://en.wikipedia.org/wiki/Word_error_rate Wikipedia on WER]), here &#039;&#039;&#039;lower scores are better&#039;&#039;&#039;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 6:&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 8:&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;** or the [http://www.aclweb.org/anthology/W/W15/W15-30.pdf#page=412 Character N-gram F-score] (code at https://github.com/Waino/chrF)&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;** or the [http://www.aclweb.org/anthology/W/W15/W15-30.pdf#page=412 Character N-gram F-score] (code at https://github.com/Waino/chrF)&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;* how well a user &#039;&#039;understands&#039;&#039; the message of the original text (this typically requires an experiment with real human subjects, see [[Assimilation Evaluation Toolkit]]).&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;* how well a user &#039;&#039;understands&#039;&#039; the message of the original text (this typically requires an experiment with real human subjects, see [[Assimilation Evaluation Toolkit]]).&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Unhammer</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.apertium.org/w/index.php?title=Evaluation&amp;diff=73627&amp;oldid=prev</id>
		<title>Popcorndude: /* See also */</title>
		<link rel="alternate" type="text/html" href="https://wiki.apertium.org/w/index.php?title=Evaluation&amp;diff=73627&amp;oldid=prev"/>
		<updated>2021-07-23T20:32:33Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;See also&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;Revision as of 20:32, 23 July 2021&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 85:&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 85:&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;==See also==&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;==See also==&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;* [[Assimilation Evaluation Toolkit]] / [[Ideas for Google Summer of Code/Apertium assimilation evaluation toolkit]]&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;* [[Assimilation Evaluation Toolkit]] / [[Ideas for Google Summer of Code/Apertium assimilation evaluation toolkit]]&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;* [[&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;Regression testing&lt;/del&gt;]]&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;* [[&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;Apertium-regtest&lt;/ins&gt;]]&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;* [[Quality control]]&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;* [[Quality control]]&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;* [[Calculating coverage]]&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;* [[Calculating coverage]]&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Popcorndude</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.apertium.org/w/index.php?title=Evaluation&amp;diff=67922&amp;oldid=prev</id>
		<title>Unhammer: /* Using apertium-eval-translator for WER and PER */</title>
		<link rel="alternate" type="text/html" href="https://wiki.apertium.org/w/index.php?title=Evaluation&amp;diff=67922&amp;oldid=prev"/>
		<updated>2018-11-16T10:44:49Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Using apertium-eval-translator for WER and PER&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;Revision as of 10:44, 16 November 2018&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 15:&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 15:&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;==Using apertium-eval-translator for WER and PER==&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;==Using apertium-eval-translator for WER and PER==&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;[&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;http&lt;/del&gt;://&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;svn&lt;/del&gt;.&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;code.sf.net/p&lt;/del&gt;/apertium&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;/svn/trunk&lt;/del&gt;/apertium-eval-translator&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;/&lt;/del&gt; apertium-eval-translator] is a script written in &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;Perl. It&lt;/del&gt; calculates the word error rate (WER) and the position-independent word error rate (PER) between a translation performed by an Apertium-based MT system and its human-corrected translation at document level. Although it has been designed to evaluate Apertium-based systems, it can be easily adapted to evaluate other MT systems.&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;[&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;https&lt;/ins&gt;://&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;github&lt;/ins&gt;.&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;com&lt;/ins&gt;/apertium/apertium-eval-translator apertium-eval-translator&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;.pl&lt;/ins&gt;] is a script written in &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;that&lt;/ins&gt; calculates the word error rate (WER) and the position-independent word error rate (PER) between a translation performed by an Apertium-based MT system and its human-corrected translation at document level. Although it has been designed to evaluate Apertium-based systems, it can be easily adapted to evaluate other MT systems.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;To use it, first translate a text with apertium, save that into &amp;lt;code&amp;gt;MT.txt&amp;lt;/code&amp;gt;, then manually post-edit that so it looks understandable and grammatical (but trying to avoid major rewrites), save that into &amp;lt;code&amp;gt;postedit.txt&amp;lt;/code&amp;gt;. Then run &amp;lt;code&amp;gt;apertium-eval-translator -test MT.txt -ref postedit.txt&amp;lt;/code&amp;gt; and you&#039;ll see a bunch of numbers indicating how good the translation was, for post-editing.&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;To use it, first translate a text with apertium, save that into &amp;lt;code&amp;gt;MT.txt&amp;lt;/code&amp;gt;, then manually post-edit that so it looks understandable and grammatical (but trying to avoid major rewrites), save that into &amp;lt;code&amp;gt;postedit.txt&amp;lt;/code&amp;gt;. Then run &amp;lt;code&amp;gt;apertium-eval-translator -test MT.txt -ref postedit.txt&amp;lt;/code&amp;gt; and you&#039;ll see a bunch of numbers indicating how good the translation was, for post-editing.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;If your text is fairly long (&amp;gt;10k words), the full WER calculation is quite slow. You can speed it up with the -b/-beam option, which will make WER only take N words of context into account. But be sure to make N large enough, otherwise you may get artificially low/high WER. As an example, a 17k word text that took nearly an hour to get the full WER took a few seconds with -b 150 and gave the same result (19.69%), but with -b 5 it gave 73.54% and -b 15 it gave 7.85 %. So if you don&#039;t have time to do a full WER without -beam, there is a wrapper that will increase the beam context N until it seems to stabilise: &amp;lt;code&amp;gt;beam-eval-until-stable -t testfile -r reffile&amp;lt;/code&amp;gt;.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;===Detailed usage===&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;===Detailed usage===&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 28:&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 31:&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;      -ref|-r      Specify the file with the reference translation &lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;      -ref|-r      Specify the file with the reference translation &lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;      -beam|-b     Perform a beam search by looking only to the &amp;lt;n&amp;gt; previous &lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;      -beam|-b     Perform a beam search by looking only to the &amp;lt;n&amp;gt; previous &lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;                   and &amp;lt;n&amp;gt; posterior &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;neigboring&lt;/del&gt; words (optional parameter &lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;                   and &amp;lt;n&amp;gt; posterior &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;neighboring&lt;/ins&gt; words (optional parameter &lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;                   to make the evaluation much faster)&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;                   to make the evaluation much faster)&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;      -help|-h     Show this help message&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;      -help|-h     Show this help message&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Unhammer</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.apertium.org/w/index.php?title=Evaluation&amp;diff=67871&amp;oldid=prev</id>
		<title>Purplemoon at 16:32, 9 November 2018</title>
		<link rel="alternate" type="text/html" href="https://wiki.apertium.org/w/index.php?title=Evaluation&amp;diff=67871&amp;oldid=prev"/>
		<updated>2018-11-09T16:32:38Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;Revision as of 16:32, 9 November 2018&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 57:&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 57:&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;==Pair bootstrap resampling==&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;==Pair bootstrap resampling==&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;===Detailed usage===&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;&amp;lt;pre&amp;gt;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;bootstrap_resampling.pl -source srcfile -test testfile -ref&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;reffile -times &amp;lt;n&amp;gt; -eval /full/path/to/eval/script&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;Options:&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;  -source|-s   Specify the file with the source file&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;  -test|-t     Specify the file with the translations to evaluate &lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;  -ref|-r      Specify the file with the reference translations&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;  -times|-n    Specify how many times the resampling should be done&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;  -eval|-e     Specify the full path to the MT evaluation script&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;  -help|-h     Show this help message&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;  &lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;Note: Reference translation MUST have no unknown-word marks, even if&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;      they are free rides.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;&amp;lt;/pre&amp;gt;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Purplemoon</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.apertium.org/w/index.php?title=Evaluation&amp;diff=66283&amp;oldid=prev</id>
		<title>Xavivars: /* Using apertium-eval-translator for WER and PER */</title>
		<link rel="alternate" type="text/html" href="https://wiki.apertium.org/w/index.php?title=Evaluation&amp;diff=66283&amp;oldid=prev"/>
		<updated>2018-03-14T17:13:08Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Using apertium-eval-translator for WER and PER&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;Revision as of 17:13, 14 March 2018&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 15:&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 15:&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;==Using apertium-eval-translator for WER and PER==&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;==Using apertium-eval-translator for WER and PER==&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;[&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;https&lt;/del&gt;://&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;apertium.&lt;/del&gt;svn.&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;sourceforge&lt;/del&gt;.net/&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;svnroot&lt;/del&gt;/apertium/trunk/apertium-eval-translator apertium-eval-translator] is a script written in Perl. It calculates the word error rate (WER) and the position-independent word error rate (PER) between a translation performed by an Apertium-based MT system and its human-corrected translation at document level. Although it has been designed to evaluate Apertium-based systems, it can be easily adapted to evaluate other MT systems.&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;[&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;http&lt;/ins&gt;://svn.&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;code.sf&lt;/ins&gt;.net/&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;p&lt;/ins&gt;/apertium&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;/svn&lt;/ins&gt;/trunk/apertium-eval-translator&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;/&lt;/ins&gt; apertium-eval-translator] is a script written in Perl. It calculates the word error rate (WER) and the position-independent word error rate (PER) between a translation performed by an Apertium-based MT system and its human-corrected translation at document level. Although it has been designed to evaluate Apertium-based systems, it can be easily adapted to evaluate other MT systems.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;To use it, first translate a text with apertium, save that into &amp;lt;code&amp;gt;MT.txt&amp;lt;/code&amp;gt;, then manually post-edit that so it looks understandable and grammatical (but trying to avoid major rewrites), save that into &amp;lt;code&amp;gt;postedit.txt&amp;lt;/code&amp;gt;. Then run &amp;lt;code&amp;gt;apertium-eval-translator -test MT.txt -ref postedit.txt&amp;lt;/code&amp;gt; and you&#039;ll see a bunch of numbers indicating how good the translation was, for post-editing.&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;To use it, first translate a text with apertium, save that into &amp;lt;code&amp;gt;MT.txt&amp;lt;/code&amp;gt;, then manually post-edit that so it looks understandable and grammatical (but trying to avoid major rewrites), save that into &amp;lt;code&amp;gt;postedit.txt&amp;lt;/code&amp;gt;. Then run &amp;lt;code&amp;gt;apertium-eval-translator -test MT.txt -ref postedit.txt&amp;lt;/code&amp;gt; and you&#039;ll see a bunch of numbers indicating how good the translation was, for post-editing.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Xavivars</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.apertium.org/w/index.php?title=Evaluation&amp;diff=63293&amp;oldid=prev</id>
		<title>Unhammer at 11:45, 9 June 2017</title>
		<link rel="alternate" type="text/html" href="https://wiki.apertium.org/w/index.php?title=Evaluation&amp;diff=63293&amp;oldid=prev"/>
		<updated>2017-06-09T11:45:41Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;Revision as of 11:45, 9 June 2017&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 4:&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 4:&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;* how many word [[N-gram]]&#039;s are common to the MT output and one or more reference translations (see [http://en.wikipedia.org/wiki/BLEU Wikipedia on Bleu] or [http://en.wikipedia.org/wiki/NIST_%28metric%29 NIST]), here &#039;&#039;&#039;higher scores are better&#039;&#039;&#039; &lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;* how many word [[N-gram]]&#039;s are common to the MT output and one or more reference translations (see [http://en.wikipedia.org/wiki/BLEU Wikipedia on Bleu] or [http://en.wikipedia.org/wiki/NIST_%28metric%29 NIST]), here &#039;&#039;&#039;higher scores are better&#039;&#039;&#039; &lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;* how many character [[N-gram]]&#039;s are common to MT output and a post-edit (the Fuzzy Match score, an unordered comparison with the Sørensen–Dice coefficient).[http://amtaweb.org/wp-content/uploads/2015/10/MTSummitXV_ResearchTrack.pdf#page=138]&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;* how many character [[N-gram]]&#039;s are common to MT output and a post-edit (the Fuzzy Match score, an unordered comparison with the Sørensen–Dice coefficient).[http://amtaweb.org/wp-content/uploads/2015/10/MTSummitXV_ResearchTrack.pdf#page=138]&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;** or the [http://www.aclweb.org/anthology/W/W15/W15-30.pdf#page=412 Character N-gram F-score] (code at https://github.com/Waino/chrF)&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;* how well a user &#039;&#039;understands&#039;&#039; the message of the original text (this typically requires an experiment with real human subjects, see [[Assimilation Evaluation Toolkit]]).&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;* how well a user &#039;&#039;understands&#039;&#039; the message of the original text (this typically requires an experiment with real human subjects, see [[Assimilation Evaluation Toolkit]]).&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Unhammer</name></author>
		
	</entry>
</feed>