<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://wiki.apertium.org/w/index.php?action=history&amp;feed=atom&amp;title=User%3ADeltamachine%2Fproposal2018</id>
	<title>User:Deltamachine/proposal2018 - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://wiki.apertium.org/w/index.php?action=history&amp;feed=atom&amp;title=User%3ADeltamachine%2Fproposal2018"/>
	<link rel="alternate" type="text/html" href="https://wiki.apertium.org/w/index.php?title=User:Deltamachine/proposal2018&amp;action=history"/>
	<updated>2026-04-24T22:52:01Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.34.1</generator>
	<entry>
		<id>https://wiki.apertium.org/w/index.php?title=User:Deltamachine/proposal2018&amp;diff=66554&amp;oldid=prev</id>
		<title>Deltamachine: /* Which of the published tasks are you interested in? What do you plan to do? */</title>
		<link rel="alternate" type="text/html" href="https://wiki.apertium.org/w/index.php?title=User:Deltamachine/proposal2018&amp;diff=66554&amp;oldid=prev"/>
		<updated>2018-03-26T09:19:18Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Which of the published tasks are you interested in? What do you plan to do?&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;Revision as of 09:19, 26 March 2018&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 57:&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 57:&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;==== Choosing a language pair(s) to experiment with and collecting/processing data. ====&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;==== Choosing a language pair(s) to experiment with and collecting/processing data. ====&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;&amp;lt;u&amp;gt;&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;Abour&lt;/del&gt; collecting and processing data&amp;lt;/u&amp;gt;&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;&amp;lt;u&amp;gt;&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;About&lt;/ins&gt; collecting and processing data&amp;lt;/u&amp;gt;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;There might be two approaches.&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;There might be two approaches.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;* Using Mediawiki JSON postediting data from https://dumps.wikimedia.org/other/contenttranslation/. &amp;lt;p&amp;gt;&#039;&#039;&#039;+&#039;&#039;&#039; the target (postedited) side is very close to the given machine translation because it is basically based on it.&amp;lt;/p&amp;gt; &amp;lt;p&amp;gt;&#039;&#039;&#039;-&#039;&#039;&#039; Mediawiki articles often contain very long and very specific sentences and these factors can affect the quality of extracted triplets, because the current method of extracting them is built on edit distance algorithm. &amp;lt;/p&amp;gt; &amp;lt;p&amp;gt;&#039;&#039;&#039;-&#039;&#039;&#039;&lt;del class=&quot;diffchange diffchange-inline&quot;&gt; also&lt;/del&gt; there will surely be words and sentences in other languages (like translations, links and so on, typical Wikipedia content) and it will make our data noisy.&amp;lt;/p&amp;gt;&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;* Using Mediawiki JSON postediting data from https://dumps.wikimedia.org/other/contenttranslation/. &amp;lt;p&amp;gt;&#039;&#039;&#039;+&#039;&#039;&#039; the target (postedited) side is very close to the given machine translation because it is basically based on it.&amp;lt;/p&amp;gt; &amp;lt;p&amp;gt;&#039;&#039;&#039;-&#039;&#039;&#039; Mediawiki articles often contain very long and very specific sentences and these factors can affect the quality of extracted triplets, because the current method of extracting them is built on edit distance algorithm. &amp;lt;/p&amp;gt; &amp;lt;p&amp;gt;&#039;&#039;&#039;-&#039;&#039;&#039; there will surely be words and sentences in other languages (like translations, links and so on, typical Wikipedia content) and it will make our data noisy&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;.&amp;lt;/p&amp;gt;&amp;lt;p&amp;gt;&#039;&#039;&#039;-&#039;&#039;&#039; sometimes posteditors change the contents of a paragraph to make the article better: they split original sentences, add new information, etc. But these cases could probably be filtered&lt;/ins&gt;.&amp;lt;/p&amp;gt;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;* Using parallel corpora and Apertium translation of the source side. &amp;lt;p&amp;gt;&#039;&#039;&#039;+&#039;&#039;&#039; in parallel corpora, especially in those, which are not Europarl, sentences are usually not very long and complicated and contain pretty common words and phrases (Tatoeba might be a good example). This is not a rule, but, in my opinion, parallel corpora are still less specific than Mediawiki articles.&amp;lt;/p&amp;gt; &amp;lt;p&amp;gt;&#039;&#039;&#039;+&#039;&#039;&#039; parallel corpora are more likely to contain less noise.&amp;lt;/p&amp;gt; &amp;lt;p&amp;gt;&#039;&#039;&#039;-&#039;&#039;&#039; the target side might be very different from Apertium-translated one, especially if we talk about long and complicated sentences.&amp;lt;/p&amp;gt;&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;* Using parallel corpora and Apertium translation of the source side. &amp;lt;p&amp;gt;&#039;&#039;&#039;+&#039;&#039;&#039; in parallel corpora, especially in those, which are not Europarl, sentences are usually not very long and complicated and contain pretty common words and phrases (Tatoeba might be a good example). This is not a rule, but, in my opinion, parallel corpora are still less specific than Mediawiki articles.&amp;lt;/p&amp;gt; &amp;lt;p&amp;gt;&#039;&#039;&#039;+&#039;&#039;&#039; parallel corpora are more likely to contain less noise.&amp;lt;/p&amp;gt; &amp;lt;p&amp;gt;&#039;&#039;&#039;-&#039;&#039;&#039; the target side might be very different from Apertium-translated one, especially if we talk about long and complicated sentences.&amp;lt;/p&amp;gt;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Deltamachine</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.apertium.org/w/index.php?title=User:Deltamachine/proposal2018&amp;diff=66469&amp;oldid=prev</id>
		<title>Deltamachine at 13:30, 24 March 2018</title>
		<link rel="alternate" type="text/html" href="https://wiki.apertium.org/w/index.php?title=User:Deltamachine/proposal2018&amp;diff=66469&amp;oldid=prev"/>
		<updated>2018-03-24T13:30:32Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;Revision as of 13:30, 24 March 2018&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 38:&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 38:&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;Also I have participated in Google Code-In 2017 as Apertium mentor and it was great too. So I am very interested in further contributing to Apertium.&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;Also I have participated in Google Code-In 2017 as Apertium mentor and it was great too. So I am very interested in further contributing to Apertium.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;This organisation works on things which are very interesting for me as a computational linguist: (rule-based) machine translation, minority languages, NLP and so on. I love that Apertium works with minority languages because it is very important for &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;saving&lt;/del&gt; &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;such&lt;/del&gt; languages.&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;This organisation works on things which are very interesting for me as a computational linguist: (rule-based) machine translation, minority languages, NLP and so on. I love that Apertium works with minority languages because it is very important for &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;support&lt;/ins&gt; &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;those&lt;/ins&gt; languages.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;Also Apertium community is very friendly and open to new members, people here are always ready to help you. It encourages me to work with these people.&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;Also Apertium community is very friendly and open to new members, people here are always ready to help you. It encourages me to work with these people.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 52:&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 52:&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;* MT(S): machine translation of S&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;* MT(S): machine translation of S&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;* PE(MT(S)): post-editing of the machine translation of S&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;* PE(MT(S)): post-editing of the machine translation of S&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;* O(s, mt, pe): set of extracted postediting operations where s in S, mt in MT(S), pe in PE(MT(S)) &lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;* O(s, mt, pe): set of extracted postediting operations where s in &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;substr(&lt;/ins&gt;S&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;)&lt;/ins&gt;, mt in &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;substr(&lt;/ins&gt;MT(S&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;)&lt;/ins&gt;), pe in &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;substr(&lt;/ins&gt;PE(MT(S&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;)&lt;/ins&gt;)) &lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;=== Work stages ===&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;=== Work stages ===&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 61:&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 61:&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;There might be two approaches.&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;There might be two approaches.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;* Using Mediawiki JSON postediting data from https://dumps.wikimedia.org/other/contenttranslation/. &amp;lt;p&amp;gt;&#039;&#039;&#039;+&#039;&#039;&#039; the target (postedited) side is very close to the given machine translation because it is basically based on it.&amp;lt;/p&amp;gt; &amp;lt;p&amp;gt;&#039;&#039;&#039;-&#039;&#039;&#039; Mediawiki articles often contain very long and very specific sentences and these factors can affect the quality of extracted triplets, because the current method of extracting them is built on edit distance algorithm. &amp;lt;/p&amp;gt; &amp;lt;p&amp;gt;&#039;&#039;&#039;-&#039;&#039;&#039; also there will surely be words and sentences in other languages (like translations, links and so on, typical Wikipedia content) and it will &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;noise&lt;/del&gt; our data.&amp;lt;/p&amp;gt;&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;* Using Mediawiki JSON postediting data from https://dumps.wikimedia.org/other/contenttranslation/. &amp;lt;p&amp;gt;&#039;&#039;&#039;+&#039;&#039;&#039; the target (postedited) side is very close to the given machine translation because it is basically based on it.&amp;lt;/p&amp;gt; &amp;lt;p&amp;gt;&#039;&#039;&#039;-&#039;&#039;&#039; Mediawiki articles often contain very long and very specific sentences and these factors can affect the quality of extracted triplets, because the current method of extracting them is built on edit distance algorithm. &amp;lt;/p&amp;gt; &amp;lt;p&amp;gt;&#039;&#039;&#039;-&#039;&#039;&#039; also there will surely be words and sentences in other languages (like translations, links and so on, typical Wikipedia content) and it will &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;make&lt;/ins&gt; our data&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt; noisy&lt;/ins&gt;.&amp;lt;/p&amp;gt;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;* Using parallel corpora and Apertium translation of the source side. &amp;lt;p&amp;gt;&#039;&#039;&#039;+&#039;&#039;&#039; in parallel corpora, especially in those, which are not Europarl, sentences are usually not very long and complicated and contain pretty common words and phrases (Tatoeba might be a good example). This is not a rule, but, in my opinion, parallel corpora are still less specific than Mediawiki articles.&amp;lt;/p&amp;gt; &amp;lt;p&amp;gt;&#039;&#039;&#039;+&#039;&#039;&#039; parallel corpora are more likely to contain less noise.&amp;lt;/p&amp;gt; &amp;lt;p&amp;gt;&#039;&#039;&#039;-&#039;&#039;&#039; the target side might be very different from Apertium-translated one, especially if we talk about long and complicated sentences.&amp;lt;/p&amp;gt;&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;* Using parallel corpora and Apertium translation of the source side. &amp;lt;p&amp;gt;&#039;&#039;&#039;+&#039;&#039;&#039; in parallel corpora, especially in those, which are not Europarl, sentences are usually not very long and complicated and contain pretty common words and phrases (Tatoeba might be a good example). This is not a rule, but, in my opinion, parallel corpora are still less specific than Mediawiki articles.&amp;lt;/p&amp;gt; &amp;lt;p&amp;gt;&#039;&#039;&#039;+&#039;&#039;&#039; parallel corpora are more likely to contain less noise.&amp;lt;/p&amp;gt; &amp;lt;p&amp;gt;&#039;&#039;&#039;-&#039;&#039;&#039; the target side might be very different from Apertium-translated one, especially if we talk about long and complicated sentences.&amp;lt;/p&amp;gt;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 67:&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 67:&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;The question of choosing an approach is pretty discussable. I think we might experiment with both approaches or even mix different types of data and see if there any difference.&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;The question of choosing an approach is pretty discussable. I think we might experiment with both approaches or even mix different types of data and see if there any difference.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;&amp;lt;u&amp;gt;About language &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;pair&lt;/del&gt;&amp;lt;/u&amp;gt;&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;&amp;lt;u&amp;gt;About language &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;pairs&lt;/ins&gt;&amp;lt;/u&amp;gt;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;I would like to work with languages I know or at least can more or less understand. Since I&#039;m a native Russian speaker, it seems to be a good idea to work with bel-rus and ukr-rus. Though these pairs are already released, there are still a lot of work to be done. Also those are closely related languages so there will be less problems with alignment.&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;I would like to work with languages I know or at least can more or less understand. Since I&#039;m a native Russian speaker, it seems to be a good idea to work with bel-rus and ukr-rus. Though these pairs are already released, there are still a lot of work to be done. Also those are closely related languages so there will be less problems with alignment.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 81:&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 81:&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;* Russian -&amp;gt; Ukranian Mediawiki corpus of Apertium translated and postedited data = 60 sentences.&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;* Russian -&amp;gt; Ukranian Mediawiki corpus of Apertium translated and postedited data = 60 sentences.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;* Tatoeba parallel corpus = about 6500 sentences&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;* Tatoeba parallel corpus = about 6500 sentences&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;* OpenSubtitles2016 parallel corpus = about 400000 sentences (might contain &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;loose&lt;/del&gt; translations)&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;* OpenSubtitles2016 parallel corpus = about 400000 sentences (might contain &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;free&lt;/ins&gt; translations)&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;* OpenSubtitles2018 parallel corpus = about 600000 sentences (might contain &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;loose&lt;/del&gt; translations)&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;* OpenSubtitles2018 parallel corpus = about 600000 sentences (might contain &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;free&lt;/ins&gt; translations)&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;* A few specific parallel corpora like KDE and GNOME&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;* A few specific parallel corpora like KDE and GNOME&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 135:&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 135:&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;==== Searching for extracted postediting operations that improve the quality of translation ====&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;==== Searching for extracted postediting operations that improve the quality of translation ====&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;The next step is to process a big &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;train&lt;/del&gt; set using different parameters, extract potential postediting operations, apply them to the test set and find those that improve the quality of original Apertium translation on a regular basis.&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;The next step is to process a big &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;training&lt;/ins&gt; set using different parameters, extract potential postediting operations, apply them to the test set and find those that improve the quality of original Apertium translation on a regular basis.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;A language model score might be a criteria of quality improvement.  A safety of a postediting operation might be determined by statistical methods.&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;A language model score might be a criteria of quality improvement.  A safety of a postediting operation might be determined by statistical methods.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 238:&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 238:&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;&amp;lt;ul&amp;gt;&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;&amp;lt;ul&amp;gt;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;&amp;lt;li&amp;gt;&#039;&#039;parse_ct_json.py:&#039;&#039; A script that parses Mediawiki JSON file and splits the whole corpus on &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;train&lt;/del&gt; and test sets of a given size.&amp;lt;/li&amp;gt;&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;&amp;lt;li&amp;gt;&#039;&#039;parse_ct_json.py:&#039;&#039; A script that parses Mediawiki JSON file and splits the whole corpus on &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;training&lt;/ins&gt; and test sets of a given size.&amp;lt;/li&amp;gt;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;&amp;lt;li&amp;gt;&#039;&#039;estimate_changes.py:&#039;&#039; A script that takes a file generated by &#039;&#039;apply_postedits.py&#039;&#039; and scores sentences which were processed with postediting rules on a language model.&amp;lt;/li&amp;gt;&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;&amp;lt;li&amp;gt;&#039;&#039;estimate_changes.py:&#039;&#039; A script that takes a file generated by &#039;&#039;apply_postedits.py&#039;&#039; and scores sentences which were processed with postediting rules on a language model.&amp;lt;/li&amp;gt;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;&amp;lt;/ul&amp;gt;&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;&amp;lt;/ul&amp;gt;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Deltamachine</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.apertium.org/w/index.php?title=User:Deltamachine/proposal2018&amp;diff=66468&amp;oldid=prev</id>
		<title>Deltamachine: /* Contact information */</title>
		<link rel="alternate" type="text/html" href="https://wiki.apertium.org/w/index.php?title=User:Deltamachine/proposal2018&amp;diff=66468&amp;oldid=prev"/>
		<updated>2018-03-24T13:23:00Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Contact information&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;Revision as of 13:23, 24 March 2018&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 1:&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 1:&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;== Contact information ==&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;== Contact information ==&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;&amp;lt;p&amp;gt;&#039;&#039;&#039;Name:&#039;&#039;&#039; Anna Kondrateva&amp;lt;/p&amp;gt;&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;&amp;lt;p&amp;gt;&#039;&#039;&#039;Name:&#039;&#039;&#039; Anna Kondrateva&amp;lt;/p&amp;gt;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;&amp;lt;p&amp;gt;&#039;&#039;&#039;Location:&#039;&#039;&#039; Moscow, Russia&amp;lt;/p&amp;gt;&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;&amp;lt;p&amp;gt;&#039;&#039;&#039;Location:&#039;&#039;&#039; Moscow&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;/Yekaterinburg&lt;/ins&gt;, Russia&amp;lt;/p&amp;gt;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;&amp;lt;p&amp;gt;&#039;&#039;&#039;E-mail:&#039;&#039;&#039; an-an-kondratjeva@yandex.ru&amp;lt;/p&amp;gt;&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;&amp;lt;p&amp;gt;&#039;&#039;&#039;E-mail:&#039;&#039;&#039; an-an-kondratjeva@yandex.ru&amp;lt;/p&amp;gt;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;&amp;lt;p&amp;gt;&#039;&#039;&#039;Phone number:&#039;&#039;&#039; +79250374221&amp;lt;/p&amp;gt;&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;&amp;lt;p&amp;gt;&#039;&#039;&#039;Phone number:&#039;&#039;&#039; +79250374221&amp;lt;/p&amp;gt;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;&amp;lt;p&amp;gt;&#039;&#039;&#039;IRC:&#039;&#039;&#039; deltamachine&amp;lt;/p&amp;gt;&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;&amp;lt;p&amp;gt;&#039;&#039;&#039;IRC:&#039;&#039;&#039; deltamachine&amp;lt;/p&amp;gt;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;&amp;lt;p&amp;gt;&#039;&#039;&#039;&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;SourceForge&lt;/del&gt;:&#039;&#039;&#039; deltamachine&amp;lt;/p&amp;gt;&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;&amp;lt;p&amp;gt;&#039;&#039;&#039;&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;Github&lt;/ins&gt;:&#039;&#039;&#039; deltamachine&amp;lt;/p&amp;gt;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;&amp;lt;p&amp;gt;&#039;&#039;&#039;Timezone:&#039;&#039;&#039; UTC+3&amp;lt;/p&amp;gt;&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;&amp;lt;p&amp;gt;&#039;&#039;&#039;Timezone:&#039;&#039;&#039; UTC+3&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt; (Moscow) / UTC+5 (Yekaterinburg)&lt;/ins&gt;&amp;lt;/p&amp;gt;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;== Skills and experience ==&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;== Skills and experience ==&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Deltamachine</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.apertium.org/w/index.php?title=User:Deltamachine/proposal2018&amp;diff=66460&amp;oldid=prev</id>
		<title>Deltamachine: /* Choosing a language pair(s) to experiment with and collecting/processing data. */</title>
		<link rel="alternate" type="text/html" href="https://wiki.apertium.org/w/index.php?title=User:Deltamachine/proposal2018&amp;diff=66460&amp;oldid=prev"/>
		<updated>2018-03-23T17:43:50Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Choosing a language pair(s) to experiment with and collecting/processing data.&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;Revision as of 17:43, 23 March 2018&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 57:&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 57:&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;==== Choosing a language pair(s) to experiment with and collecting/processing data. ====&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;==== Choosing a language pair(s) to experiment with and collecting/processing data. ====&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-added&quot;&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;a class=&quot;mw-diff-movedpara-left&quot; title=&quot;Paragraph was moved. Click to jump to new location.&quot; href=&quot;#movedpara_3_1_rhs&quot;&gt;&amp;#x26AB;&lt;/a&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;&lt;a name=&quot;movedpara_1_1_lhs&quot;&gt;&lt;/a&gt;&amp;lt;u&amp;gt;About language pair&amp;lt;/u&amp;gt;&lt;/div&gt;&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-added&quot;&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-added&quot;&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;a class=&quot;mw-diff-movedpara-left&quot; title=&quot;Paragraph was moved. Click to jump to new location.&quot; href=&quot;#movedpara_3_3_rhs&quot;&gt;&amp;#x26AB;&lt;/a&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;&lt;a name=&quot;movedpara_1_3_lhs&quot;&gt;&lt;/a&gt;I would like to work with languages I know or at least can more or less understand. Since I&#039;m a native Russian speaker, it seems to be a good idea to work with bel-rus and ukr-rus. Though these pairs are already released, there are still a lot of work to be done. Also those are closely related languages so there will be less problems with alignment.&lt;/div&gt;&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-added&quot;&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-added&quot;&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;a class=&quot;mw-diff-movedpara-left&quot; title=&quot;Paragraph was moved. Click to jump to new location.&quot; href=&quot;#movedpara_3_19_rhs&quot;&gt;&amp;#x26AB;&lt;/a&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;&lt;a name=&quot;movedpara_1_5_lhs&quot;&gt;&lt;/a&gt;But the methods I&#039;m going to develop are not going to be tied to a language pair.&lt;/div&gt;&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-added&quot;&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-added&quot;&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;&amp;lt;u&amp;gt;Abour collecting and processing data&amp;lt;/u&amp;gt;&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;&amp;lt;u&amp;gt;Abour collecting and processing data&amp;lt;/u&amp;gt;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 73:&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 66:&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;The question of choosing an approach is pretty discussable. I think we might experiment with both approaches or even mix different types of data and see if there any difference.&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;The question of choosing an approach is pretty discussable. I think we might experiment with both approaches or even mix different types of data and see if there any difference.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;a class=&quot;mw-diff-movedpara-right&quot; title=&quot;Paragraph was moved. Click to jump to old location.&quot; href=&quot;#movedpara_1_1_lhs&quot;&gt;&amp;#x26AB;&lt;/a&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;&lt;a name=&quot;movedpara_3_1_rhs&quot;&gt;&lt;/a&gt;&amp;lt;u&amp;gt;About language pair&amp;lt;/u&amp;gt;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;a class=&quot;mw-diff-movedpara-right&quot; title=&quot;Paragraph was moved. Click to jump to old location.&quot; href=&quot;#movedpara_1_3_lhs&quot;&gt;&amp;#x26AB;&lt;/a&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;&lt;a name=&quot;movedpara_3_3_rhs&quot;&gt;&lt;/a&gt;I would like to work with languages I know or at least can more or less understand. Since I&#039;m a native Russian speaker, it seems to be a good idea to work with bel-rus and ukr-rus. Though these pairs are already released, there are still a lot of work to be done. Also those are closely related languages so there will be less problems with alignment.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;The problem with working with bel-rus and ukr-rus is the comparably small amount of postedited data and more or less suitable parallel corpora. I&#039;m still looking for data, but current situation looks like this:&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;&#039;&#039;&#039; bel-rus &#039;&#039;&#039;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;* Russian -&amp;gt; Belarusian Mediawiki corpus of Apertium translated and postedited data = 1895 sentences.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;* Tatoeba parallel corpus = about 1800 sentences&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;* A few specific parallel corpora like KDE and GNOME&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;&#039;&#039;&#039; ukr-rus&#039;&#039;&#039;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;* Russian -&amp;gt; Ukranian Mediawiki corpus of Apertium translated and postedited data = 60 sentences.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;* Tatoeba parallel corpus = about 6500 sentences&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;* OpenSubtitles2016 parallel corpus = about 400000 sentences (might contain loose translations)&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;* OpenSubtitles2018 parallel corpus = about 600000 sentences (might contain loose translations)&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;* A few specific parallel corpora like KDE and GNOME&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;a class=&quot;mw-diff-movedpara-right&quot; title=&quot;Paragraph was moved. Click to jump to old location.&quot; href=&quot;#movedpara_1_5_lhs&quot;&gt;&amp;#x26AB;&lt;/a&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;&lt;a name=&quot;movedpara_3_19_rhs&quot;&gt;&lt;/a&gt;But the methods I&#039;m going to develop are not going to be tied to a language pair&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;. We might choose another language pair or start our experiments with a small amount of data. The question is discussable&lt;/ins&gt;.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;==== Improving of existing methods ====&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;==== Improving of existing methods ====&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Deltamachine</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.apertium.org/w/index.php?title=User:Deltamachine/proposal2018&amp;diff=66447&amp;oldid=prev</id>
		<title>Deltamachine at 10:19, 23 March 2018</title>
		<link rel="alternate" type="text/html" href="https://wiki.apertium.org/w/index.php?title=User:Deltamachine/proposal2018&amp;diff=66447&amp;oldid=prev"/>
		<updated>2018-03-23T10:19:33Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;Revision as of 10:19, 23 March 2018&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 232:&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 232:&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;Also I have refactored and documented the old code from https://github.com/mlforcada/naive-automatic-postediting/blob/master/learn_postedits.py. A new version is stored in a given folder in &#039;&#039;cleaned_learn_postedits.py&#039;&#039;&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;Also I have refactored and documented the old code from https://github.com/mlforcada/naive-automatic-postediting/blob/master/learn_postedits.py. A new version is stored in a given folder in &#039;&#039;cleaned_learn_postedits.py&#039;&#039;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;[[Category:GSoC 2018 student proposals|Deltamachine]]&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Deltamachine</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.apertium.org/w/index.php?title=User:Deltamachine/proposal2018&amp;diff=66441&amp;oldid=prev</id>
		<title>Deltamachine: /* Coding challenge */</title>
		<link rel="alternate" type="text/html" href="https://wiki.apertium.org/w/index.php?title=User:Deltamachine/proposal2018&amp;diff=66441&amp;oldid=prev"/>
		<updated>2018-03-23T09:22:03Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Coding challenge&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;Revision as of 09:22, 23 March 2018&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 228:&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 228:&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;&amp;lt;li&amp;gt;&#039;&#039;estimate_changes.py:&#039;&#039; A script that takes a file generated by &#039;&#039;apply_postedits.py&#039;&#039; and scores sentences which were processed with postediting rules on a language model.&amp;lt;/li&amp;gt;&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;&amp;lt;li&amp;gt;&#039;&#039;estimate_changes.py:&#039;&#039; A script that takes a file generated by &#039;&#039;apply_postedits.py&#039;&#039; and scores sentences which were processed with postediting rules on a language model.&amp;lt;/li&amp;gt;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;&amp;lt;/ul&amp;gt;&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;&amp;lt;/ul&amp;gt;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;In addition, I have added cache function to learn_postedits.py (https://github.com/deltamachine/naive-automatic-postediting/blob/master/lib/explain2_cache.py):&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;Also I have refactored and documented the old code from https://github.com/mlforcada/naive-automatic-postediting/blob/master/learn_postedits.py. A new version is stored in a given folder in &#039;&#039;cleaned_learn_postedits.py&#039;&#039;&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;Also I have refactored and documented the old code from https://github.com/mlforcada/naive-automatic-postediting/blob/master/learn_postedits.py. A new version is stored in a given folder in &#039;&#039;cleaned_learn_postedits.py&#039;&#039;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Deltamachine</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.apertium.org/w/index.php?title=User:Deltamachine/proposal2018&amp;diff=66440&amp;oldid=prev</id>
		<title>Deltamachine: /* Improving of existing methods */</title>
		<link rel="alternate" type="text/html" href="https://wiki.apertium.org/w/index.php?title=User:Deltamachine/proposal2018&amp;diff=66440&amp;oldid=prev"/>
		<updated>2018-03-23T09:20:27Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Improving of existing methods&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;Revision as of 09:20, 23 March 2018&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 80:&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 80:&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;1. The main problem is a pretty low speed of learn_postedits.py and apply_postedits.py. Repeating Apertium calls and big size of maximum source/target segment length make the process pretty slow. It makes extracting triplets from a big corpus very hard and using a small corpus is just useless. &lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;1. The main problem is a pretty low speed of learn_postedits.py and apply_postedits.py. Repeating Apertium calls and big size of maximum source/target segment length make the process pretty slow. It makes extracting triplets from a big corpus very hard and using a small corpus is just useless. &lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;I have &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;implemented&lt;/del&gt; cache function &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;for&lt;/del&gt; learn_postedits.py (https://github.com/deltamachine/naive-automatic-postediting/blob/master/lib/explain2_cache.py): now instead of calling Apertium every time the program needs to translate/analyze any subsegment, it firstly checks, is this subsegment already stored in a database. If yes, it takes it from here, if no, it calls Apertium and then adds new information to the database.&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;I have &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;added&lt;/ins&gt; cache function &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;to&lt;/ins&gt; learn_postedits.py (https://github.com/deltamachine/naive-automatic-postediting/blob/master/lib/explain2_cache.py): now instead of calling Apertium every time the program needs to translate/analyze any subsegment, it firstly checks, is this subsegment already stored in a database. If yes, it takes it from here, if no, it calls Apertium and then adds new information to the database.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;Results of running two versions of learn_postedits.py on 100 sentences corpus with different parameters:&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;Results of running two versions of learn_postedits.py on 100 sentences corpus with different parameters:&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 118:&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 118:&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;2. There might be problems with current alignment method because currently postediting operations look pretty strange even if they were extracted with high fuzzy match threshold. Alignment implementation should be checked carefully.&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;2. There might be problems with current alignment method because currently postediting operations look pretty strange even if they were extracted with high fuzzy match threshold. Alignment implementation should be checked carefully.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;3. Some old code refactoring needs to be done.&lt;del class=&quot;diffchange diffchange-inline&quot;&gt; &lt;/del&gt;&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;3. Some old code refactoring needs to be done.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;==== Searching for extracted postediting operations that improve the quality of translation ====&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;==== Searching for extracted postediting operations that improve the quality of translation ====&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Deltamachine</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.apertium.org/w/index.php?title=User:Deltamachine/proposal2018&amp;diff=66439&amp;oldid=prev</id>
		<title>Deltamachine: /* Reasons why Google and Apertium should sponsor it */</title>
		<link rel="alternate" type="text/html" href="https://wiki.apertium.org/w/index.php?title=User:Deltamachine/proposal2018&amp;diff=66439&amp;oldid=prev"/>
		<updated>2018-03-23T09:19:34Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Reasons why Google and Apertium should sponsor it&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;Revision as of 09:19, 23 March 2018&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 150:&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 150:&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;== Reasons why Google and Apertium should sponsor it ==&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;== Reasons why Google and Apertium should sponsor it ==&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;This toolbox might become a great way of improving language pairs by filling gaps in dictionaries and reducing the amount of human work at the same time. Even the released Apertium pairs are not perfect and sometimes do mistakes &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;which&lt;/del&gt; can be easily fixed.&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;This toolbox might become a great way of improving language pairs by filling gaps in dictionaries and reducing the amount of human work at the same time. Even the released Apertium pairs are not perfect and sometimes do mistakes &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;that&lt;/ins&gt; can be easily fixed.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;For example, Apertium translates Belarusian sentence &#039;&#039;&quot;Нехта тут размаўляе па-руску?&quot;&#039;&#039; (&quot;Does somebody here speak Russian?&quot;) in Russian as &#039;&#039;&quot;Кто-то здесь размаўляе по-русски?&quot;&#039;&#039; when a correct translation would be &#039;&#039;&quot;Кто-то здесь говорит по-русски?&quot;&#039;&#039;. The problem with this example is obvious: Apertium doesn&#039;t know the word &quot;размаўлять&quot;. But it can be easily fixed with the methods described above (sections 5.2.3, 5.2.4).&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;For example, Apertium translates Belarusian sentence &#039;&#039;&quot;Нехта тут размаўляе па-руску?&quot;&#039;&#039; (&quot;Does somebody here speak Russian?&quot;) in Russian as &#039;&#039;&quot;Кто-то здесь размаўляе по-русски?&quot;&#039;&#039; when a correct translation would be &#039;&#039;&quot;Кто-то здесь говорит по-русски?&quot;&#039;&#039;. The problem with this example is obvious: Apertium doesn&#039;t know the word &quot;размаўлять&quot;. But it can be easily fixed with the methods described above (sections 5.2.3, 5.2.4).&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Deltamachine</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.apertium.org/w/index.php?title=User:Deltamachine/proposal2018&amp;diff=66437&amp;oldid=prev</id>
		<title>Deltamachine: /* Work stages */</title>
		<link rel="alternate" type="text/html" href="https://wiki.apertium.org/w/index.php?title=User:Deltamachine/proposal2018&amp;diff=66437&amp;oldid=prev"/>
		<updated>2018-03-23T09:16:52Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Work stages&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;Revision as of 09:16, 23 March 2018&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 60:&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 60:&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;&amp;lt;u&amp;gt;About language pair&amp;lt;/u&amp;gt;&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;&amp;lt;u&amp;gt;About language pair&amp;lt;/u&amp;gt;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;I would like to work with languages I know or at least can more or less understand. Since &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;I“m&lt;/del&gt; a native Russian speaker, it seems to be a good idea to work with bel-rus and ukr-rus. Though these pairs are already released, there are still a lot of work to be done. &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;But&lt;/del&gt; &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;the&lt;/del&gt; &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;methods&lt;/del&gt; &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;I“m&lt;/del&gt; &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;going&lt;/del&gt; &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;to&lt;/del&gt; &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;develop&lt;/del&gt; &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;won“t&lt;/del&gt; be &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;tied&lt;/del&gt; &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;to&lt;/del&gt; &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;a language&lt;/del&gt; &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;pair&lt;/del&gt;.&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;I would like to work with languages I know or at least can more or less understand. Since &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;I&#039;m&lt;/ins&gt; a native Russian speaker, it seems to be a good idea to work with bel-rus and ukr-rus. Though these pairs are already released, there are still a lot of work to be done. &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;Also&lt;/ins&gt; &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;those&lt;/ins&gt; &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;are&lt;/ins&gt; &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;closely&lt;/ins&gt; &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;related&lt;/ins&gt; &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;languages&lt;/ins&gt; &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;so&lt;/ins&gt; &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;there will&lt;/ins&gt; be &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;less&lt;/ins&gt; &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;problems&lt;/ins&gt; &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;with&lt;/ins&gt; &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;alignment&lt;/ins&gt;.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;But the methods I&#039;m going to develop are not going to be tied to a language pair.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;&amp;lt;u&amp;gt;Abour collecting and processing data&amp;lt;/u&amp;gt;&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;&amp;lt;u&amp;gt;Abour collecting and processing data&amp;lt;/u&amp;gt;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 112:&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 114:&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;* Caching function for apply_postedits.py&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;* Caching function for apply_postedits.py&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;* &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;Search&lt;/del&gt; for other ways &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;we&lt;/del&gt; &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;can improve&lt;/del&gt; the speed.&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;* &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;Searching&lt;/ins&gt; for other ways &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;of&lt;/ins&gt; &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;improving&lt;/ins&gt; the speed.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;2. There might be problems with current alignment method because currently postediting operations look pretty strange even if they were extracted with high fuzzy match threshold. Alignment implementation should be checked carefully.&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;2. There might be problems with current alignment method because currently postediting operations look pretty strange even if they were extracted with high fuzzy match threshold. Alignment implementation should be checked carefully.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 118:&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 120:&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;3. Some old code refactoring needs to be done. &lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;3. Some old code refactoring needs to be done. &lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;==== &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;Search&lt;/del&gt; &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;of&lt;/del&gt; extracted postediting operations that improve the quality of translation ====&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;==== &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;Searching&lt;/ins&gt; &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;for&lt;/ins&gt; extracted postediting operations that improve the quality of translation ====&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;The next step is to process a big train set using different parameters, extract potential postediting operations, apply them to the test set and find those that improve the quality of original Apertium translation on a regular basis.&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;The next step is to process a big train set using different parameters, extract potential postediting operations, apply them to the test set and find those that improve the quality of original Apertium translation on a regular basis.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;A language model score might be a criteria of&lt;del class=&quot;diffchange diffchange-inline&quot;&gt; a&lt;/del&gt; quality improvement.  A safety of a postediting operation might be determined by statistical methods.&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;A language model score might be a criteria of quality improvement.  A safety of a postediting operation might be determined by statistical methods.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;==== Classifying of successful postediting operations ====&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;==== Classifying of successful postediting operations ====&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 139:&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 141:&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;For example, to identify potential bidix entries, we might choose a set of triplets from O such that s = mt for all (s, pe) from O.&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;For example, to identify potential bidix entries, we might choose a set of triplets from O such that s = mt for all (s, pe) from O.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;But it won&#039;t be enough for identifying potential lexical selection rules: we should carefully look at the context to find those.&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;But it&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt; obviously&lt;/ins&gt; won&#039;t be enough for identifying potential lexical selection rules: we should carefully look at the context to find those.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;==== Creating tools for inserting useful information into a language pair ====&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;==== Creating tools for inserting useful information into a language pair ====&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;The last step is to create tools for inserting useful information into a language pair. It might be some scripts which automatically create monodix/bidix entries or write rules based on a given data and its type and insert it in dictionaries.&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;The last step is to create tools for inserting useful information into a language pair. It might be some scripts which automatically create monodix/bidix entries or write rules based on a given data and its type and insert it in dictionaries&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;. Also it might be a new module in a pipeline. The final decision will depend on the results of the previous stage&lt;/ins&gt;.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;== Reasons why Google and Apertium should sponsor it ==&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;== Reasons why Google and Apertium should sponsor it ==&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Deltamachine</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.apertium.org/w/index.php?title=User:Deltamachine/proposal2018&amp;diff=66436&amp;oldid=prev</id>
		<title>Deltamachine: /* Why is it that you are interested in Apertium? */</title>
		<link rel="alternate" type="text/html" href="https://wiki.apertium.org/w/index.php?title=User:Deltamachine/proposal2018&amp;diff=66436&amp;oldid=prev"/>
		<updated>2018-03-23T09:06:58Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Why is it that you are interested in Apertium?&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;Revision as of 09:06, 23 March 2018&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 35:&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 35:&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;== Why is it that you are interested in Apertium? ==&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;== Why is it that you are interested in Apertium? ==&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;I have participated in &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;GSoC&lt;/del&gt; 2017 with Apertium and it was a great experience. I have successfully finished my project, learned a lot of new things and had a lot of fun. My programming and NLP skills became much better and I want to develop them more.&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;I have participated in &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;Google Summer of Code&lt;/ins&gt; 2017 with Apertium and it was a great experience. I have successfully finished my project, learned a lot of new things and had a lot of fun. My programming and NLP skills became much better and I want to develop them more.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;Also I have participated in &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;GCI&lt;/del&gt; 2017 as &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;a&lt;/del&gt; mentor&lt;del class=&quot;diffchange diffchange-inline&quot;&gt; for Apertium&lt;/del&gt; and it was great too. So I am very interested in further contributing to Apertium.&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;Also I have participated in &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;Google Code-In&lt;/ins&gt; 2017 as &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;Apertium&lt;/ins&gt; mentor and it was great too. So I am very interested in further contributing to Apertium.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;This organisation works on things which are very interesting for me as a computational linguist: (rule-based) machine translation, minority languages, NLP and so on. I love that Apertium works with minority languages because it is very important &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;and&lt;/del&gt; &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;is&lt;/del&gt; &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;not&lt;/del&gt; &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;a mainstream at the same time&lt;/del&gt;.&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;This organisation works on things which are very interesting for me as a computational linguist: (rule-based) machine translation, minority languages, NLP and so on. I love that Apertium works with minority languages because it is very important &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;for&lt;/ins&gt; &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;saving&lt;/ins&gt; &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;such&lt;/ins&gt; &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;languages&lt;/ins&gt;.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;Also Apertium community is very friendly and open to new members, people here are always ready to help you. It encourages me to work with these people.&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;Also Apertium community is very friendly and open to new members, people here are always ready to help you. It encourages me to work with these people.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Deltamachine</name></author>
		
	</entry>
</feed>