<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://wiki.apertium.org/w/index.php?action=history&amp;feed=atom&amp;title=Ideas_for_Google_Summer_of_Code%2Fautomatic-postediting</id>
	<title>Ideas for Google Summer of Code/automatic-postediting - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://wiki.apertium.org/w/index.php?action=history&amp;feed=atom&amp;title=Ideas_for_Google_Summer_of_Code%2Fautomatic-postediting"/>
	<link rel="alternate" type="text/html" href="https://wiki.apertium.org/w/index.php?title=Ideas_for_Google_Summer_of_Code/automatic-postediting&amp;action=history"/>
	<updated>2026-05-05T22:20:26Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.34.1</generator>
	<entry>
		<id>https://wiki.apertium.org/w/index.php?title=Ideas_for_Google_Summer_of_Code/automatic-postediting&amp;diff=71371&amp;oldid=prev</id>
		<title>Popcorndude: categorize</title>
		<link rel="alternate" type="text/html" href="https://wiki.apertium.org/w/index.php?title=Ideas_for_Google_Summer_of_Code/automatic-postediting&amp;diff=71371&amp;oldid=prev"/>
		<updated>2020-03-24T19:56:19Z</updated>

		<summary type="html">&lt;p&gt;categorize&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;Revision as of 19:56, 24 March 2020&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 36:&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 36:&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;* This uses code by [[User:Pankajksharma]] from his GSoC project.&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;* This uses code by [[User:Pankajksharma]] from his GSoC project.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;[[Category:Ideas_for_Google_Summer_of_Code]]&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Popcorndude</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.apertium.org/w/index.php?title=Ideas_for_Google_Summer_of_Code/automatic-postediting&amp;diff=65781&amp;oldid=prev</id>
		<title>Mlforcada: /* Coding challenge */</title>
		<link rel="alternate" type="text/html" href="https://wiki.apertium.org/w/index.php?title=Ideas_for_Google_Summer_of_Code/automatic-postediting&amp;diff=65781&amp;oldid=prev"/>
		<updated>2018-02-06T16:33:36Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Coding challenge&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;Revision as of 16:33, 6 February 2018&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 24:&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 24:&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;=== Coding challenge ===&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;=== Coding challenge ===&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;* Understand the problem and the code. &lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;a class=&quot;mw-diff-movedpara-left&quot; title=&quot;Paragraph was moved. Click to jump to new location.&quot; href=&quot;#movedpara_4_0_rhs&quot;&gt;&amp;#x26AB;&lt;/a&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;&lt;a name=&quot;movedpara_2_0_lhs&quot;&gt;&lt;/a&gt;Prepare source, Apertium MT-translated, and reference files (perhaps from an existing corpus) for the language pair of choice. Make a training set (large) and a test set (smaller)&lt;/div&gt;&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-added&quot;&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;a class=&quot;mw-diff-movedpara-right&quot; title=&quot;Paragraph was moved. Click to jump to old location.&quot; href=&quot;#movedpara_2_0_lhs&quot;&gt;&amp;#x26AB;&lt;/a&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;&lt;a name=&quot;movedpara_4_0_rhs&quot;&gt;&lt;/a&gt;&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;* &lt;/ins&gt;Prepare source, Apertium MT-translated, and reference files (perhaps from an existing corpus) for the language pair of choice. Make a training set (large) and a test set (smaller)&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;a class=&quot;mw-diff-movedpara-left&quot; title=&quot;Paragraph was moved. Click to jump to new location.&quot; href=&quot;#movedpara_7_0_rhs&quot;&gt;&amp;#x26AB;&lt;/a&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;&lt;a name=&quot;movedpara_5_0_lhs&quot;&gt;&lt;/a&gt;Make the code in [[https://github.com/mlforcada/naive-automatic-postediting]] work to extract triplets.&lt;/div&gt;&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-added&quot;&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;a class=&quot;mw-diff-movedpara-right&quot; title=&quot;Paragraph was moved. Click to jump to old location.&quot; href=&quot;#movedpara_5_0_lhs&quot;&gt;&amp;#x26AB;&lt;/a&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;&lt;a name=&quot;movedpara_7_0_rhs&quot;&gt;&lt;/a&gt;&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;* &lt;/ins&gt;Make the code in [[https://github.com/mlforcada/naive-automatic-postediting]] work to extract triplets.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;a class=&quot;mw-diff-movedpara-left&quot; title=&quot;Paragraph was moved. Click to jump to new location.&quot; href=&quot;#movedpara_9_1_rhs&quot;&gt;&amp;#x26AB;&lt;/a&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;&lt;a name=&quot;movedpara_8_0_lhs&quot;&gt;&lt;/a&gt;Apply these triplets to the test set, and see if there is improvement.&lt;/div&gt;&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-added&quot;&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;a class=&quot;mw-diff-movedpara-right&quot; title=&quot;Paragraph was moved. Click to jump to old location.&quot; href=&quot;#movedpara_8_0_lhs&quot;&gt;&amp;#x26AB;&lt;/a&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;&lt;a name=&quot;movedpara_9_1_rhs&quot;&gt;&lt;/a&gt;&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;* &lt;/ins&gt;Apply these triplets to the test set, and see if there is improvement.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;=== Notes ===&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;* This uses code by [[User:Pankajksharma]] from his GSoC project.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Mlforcada</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.apertium.org/w/index.php?title=Ideas_for_Google_Summer_of_Code/automatic-postediting&amp;diff=65778&amp;oldid=prev</id>
		<title>Mlforcada: Created page with &quot; == Improving language pairs by mining MediaWiki Content Translation postedits ==  Implement a toolkit that allows mining existing machine translation postediting data in [Med...&quot;</title>
		<link rel="alternate" type="text/html" href="https://wiki.apertium.org/w/index.php?title=Ideas_for_Google_Summer_of_Code/automatic-postediting&amp;diff=65778&amp;oldid=prev"/>
		<updated>2018-02-06T16:28:24Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;quot; == Improving language pairs by mining MediaWiki Content Translation postedits ==  Implement a toolkit that allows mining existing machine translation postediting data in [Med...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;&lt;br /&gt;
== Improving language pairs by mining MediaWiki Content Translation postedits ==&lt;br /&gt;
&lt;br /&gt;
Implement a toolkit that allows mining existing machine translation postediting data in [Mediawiki Content Translation https://www.mediawiki.org/wiki/Content_translation] to generate (as automatically as possible, and as complete as possible) monodix and bidix entries to improve the performance of an Apertium language pair. Data is available from Wikimedia content translation through an [API https://www.mediawiki.org/wiki/Content_translation/Published_translations#API] or in the form of [Dumps https://dumps.wikimedia.org/other/contenttranslation/] available in JSON and TMX format. This project is rather experimental and involves some research in addition to coding.&lt;br /&gt;
&lt;br /&gt;
=== First phase ===&lt;br /&gt;
&lt;br /&gt;
The first phase would produce a set of postediting operators  (s,MT(s),t) from three files: a source file (one sentence per line), a machine-translated file (one sentence per line) and a postedited file (one sentence per line). &lt;br /&gt;
&lt;br /&gt;
There is code in [[https://github.com/mlforcada/naive-automatic-postediting]] which is described there. The process is described in file rationale.md there.&lt;br /&gt;
&lt;br /&gt;
=== Second phase ===&lt;br /&gt;
&lt;br /&gt;
Study and implement ways to turn these triplets into information that can be inserted in that Apertium language pair. There are various points where it can be inserted:&lt;br /&gt;
&lt;br /&gt;
* *dictionaries*: the process may discover multi-word lexical units that need to be added to improve a literal translation&lt;br /&gt;
* *constraint grammar rules* the process may discover words that have been incorrectly disambiguated and may be turned into a constraint grammar rule&lt;br /&gt;
* *lexical selection rules*: the process may discover words that have been translated in the wrong sense, and they may be turned into lexical selection rules&lt;br /&gt;
* etc.&lt;br /&gt;
&lt;br /&gt;
The idea would be to identify safe postediting rules that can clearly be analysed as being examples of one of these cases and be turned into linguistic data to be inserted (more or less automatically) into the pair.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Coding challenge ===&lt;br /&gt;
&lt;br /&gt;
Prepare source, Apertium MT-translated, and reference files (perhaps from an existing corpus) for the language pair of choice. Make a training set (large) and a test set (smaller)&lt;br /&gt;
&lt;br /&gt;
Make the code in [[https://github.com/mlforcada/naive-automatic-postediting]] work to extract triplets.&lt;br /&gt;
&lt;br /&gt;
Apply these triplets to the test set, and see if there is improvement.&lt;/div&gt;</summary>
		<author><name>Mlforcada</name></author>
		
	</entry>
</feed>