<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://wiki.apertium.org/w/index.php?action=history&amp;feed=atom&amp;title=User%3AIrene%2Fproposal</id>
	<title>User:Irene/proposal - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://wiki.apertium.org/w/index.php?action=history&amp;feed=atom&amp;title=User%3AIrene%2Fproposal"/>
	<link rel="alternate" type="text/html" href="https://wiki.apertium.org/w/index.php?title=User:Irene/proposal&amp;action=history"/>
	<updated>2026-05-09T08:02:26Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.34.1</generator>
	<entry>
		<id>https://wiki.apertium.org/w/index.php?title=User:Irene/proposal&amp;diff=62739&amp;oldid=prev</id>
		<title>Irene: work plan suggestions</title>
		<link rel="alternate" type="text/html" href="https://wiki.apertium.org/w/index.php?title=User:Irene/proposal&amp;diff=62739&amp;oldid=prev"/>
		<updated>2017-04-03T20:53:09Z</updated>

		<summary type="html">&lt;p&gt;work plan suggestions&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;Revision as of 20:53, 3 April 2017&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 26:&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 26:&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;Discontiguous multiwords are multi-word expressions that are separated by something in the middle. In the set of sentences above, &#039;&#039;take out&#039;&#039; is a multiword verb. When it is separated by the noun phrase &#039;&#039;the rubbish&#039;&#039;, it becomes a discontinuous multiword.&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;Discontiguous multiwords are multi-word expressions that are separated by something in the middle. In the set of sentences above, &#039;&#039;take out&#039;&#039; is a multiword verb. When it is separated by the noun phrase &#039;&#039;the rubbish&#039;&#039;, it becomes a discontinuous multiword.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;Apertium currently doesn’t offer support for discontinuous multiwords, and this is a source of many unfortunate translation errors. &#039;&#039;Take out&#039;&#039; is a multiword in English, but its Spanish translation &#039;&#039;sacar&#039;&#039; is not. Apertium can seamlessly translate (1) into (3) from English to Spanish: in (1), the whole phrasal verb &#039;&#039;take out&#039;&#039; is together, so Apertium can easily recognise and translate it as one unit. &#039;&#039;Take out&#039;&#039; correctly becomes &#039;&#039;saco&#039;&#039;, its first-person conjugation in Spanish. However, Apertium imperfectly translates (2) into (4) from English to Spanish: in (2), the phrasal verb &#039;&#039;take out&#039;&#039; is separated by the NP &#039;&#039;the rubbish&#039;&#039;, so Apertium doesn’t recognise it as a unit and incorrectly translates it as two separate words. &#039;&#039;Take&#039;&#039; becomes &#039;&#039;tomo&#039;&#039; and &#039;&#039;out&#039;&#039; becomes &#039;&#039;fuera&#039;&#039;, independently, which is not what we want; &#039;&#039;tomar fuera&#039;&#039; cannot be used interchangeably&lt;del class=&quot;diffchange diffchange-inline&quot;&gt; with &#039;&#039;sacar&#039;&#039;&lt;/del&gt;. This demonstrates that discontiguous multiwords produce significant wrinkles in the translation process.&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;Apertium currently doesn’t offer support for discontinuous multiwords, and this is a source of many unfortunate translation errors. &#039;&#039;Take out&#039;&#039; is a multiword in English, but its Spanish translation &#039;&#039;sacar&#039;&#039; is not. Apertium can seamlessly translate (1) into (3) from English to Spanish: in (1), the whole phrasal verb &#039;&#039;take out&#039;&#039; is together, so Apertium can easily recognise and translate it as one unit. &#039;&#039;Take out&#039;&#039; correctly becomes &#039;&#039;saco&#039;&#039;, its first-person conjugation in Spanish. However, Apertium imperfectly translates (2) into (4) from English to Spanish: in (2), the phrasal verb &#039;&#039;take out&#039;&#039; is separated by the NP &#039;&#039;the rubbish&#039;&#039;, so Apertium doesn’t recognise it as a unit and incorrectly translates it as two separate words. &#039;&#039;Take&#039;&#039; becomes &#039;&#039;tomo&#039;&#039; and &#039;&#039;out&#039;&#039; becomes &#039;&#039;fuera&#039;&#039;, independently, which is not what we want; &#039;&#039;tomar fuera&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&#039;&#039; and &#039;&#039;sacar&lt;/ins&gt;&#039;&#039; cannot be used interchangeably. This demonstrates that discontiguous multiwords produce significant wrinkles in the translation process.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;My plan is to eliminate such errors by improving the multiwords processor into being able to recognise when sentences contain discontiguous multiwords, and then reorder the sentence structure so that the whole verb phrase is placed together before bilingual dictionary lookup occurs. For the set of sentences above, the processor should be able to recognise the discontinuous &#039;&#039;take___out&#039;&#039; in (2) and rearrange the sentence to look like the &#039;&#039;take out___&#039;&#039; in (1).&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;My plan is to eliminate such errors by improving the multiwords processor into being able to recognise when sentences contain discontiguous multiwords, and then reorder the sentence structure so that the whole verb phrase is placed together before bilingual dictionary lookup occurs. For the set of sentences above, the processor should be able to recognise the discontinuous &#039;&#039;take___out&#039;&#039; in (2) and rearrange the sentence to look like the &#039;&#039;take out___&#039;&#039; in (1).&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 38:&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 38:&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;&amp;lt;/pre&amp;gt;&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;&amp;lt;/pre&amp;gt;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;As noted in the [http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/Discontiguous_multiwords#Tasks wiki page for this project], this involves (1) creating a typology of discontinuous multiword expressions in some Germanic, Celtic, Romance, Turkic, and Uralic languages; (2) creating a module for recognising and reordering discontiguous multiword expressions; and (3) providing support for discontiguous multiwords in some existing language pairs.&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;As noted in the [http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/Discontiguous_multiwords#Tasks wiki page for this project], this involves (1) creating a typology of discontinuous multiword expressions in some Germanic, Celtic, Romance, Turkic, and Uralic languages; (2) creating a module for recognising and reordering discontiguous multiword expressions; and (3) providing support for discontiguous multiwords in some existing language pairs&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;. See work plan for details&lt;/ins&gt;.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;== How and who will it benefit in society? ==&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;== How and who will it benefit in society? ==&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 51:&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 51:&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;&#039;&#039;&#039;Community bonding period&#039;&#039;&#039; &amp;lt;br /&amp;gt;&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;&#039;&#039;&#039;Community bonding period&#039;&#039;&#039; &amp;lt;br /&amp;gt;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;*Understand dix formats&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;*Understand the current multiwords processor module&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;*Understand the current multiwords processor module&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;*Understand the features of lt-toolbox&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;*Understand the features of lt-toolbox&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;*Devise a typology format, create a typology for types of English phrasal verbs / discontiguous multiword expressions&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;*Devise a typology format, create a typology for types of English phrasal verbs / discontiguous multiword expressions&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;*Devise a method for coding typologies into Apertium files&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;&#039;&#039;&#039;Part I: preparing data&#039;&#039;&#039; &amp;lt;br /&amp;gt;&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;&#039;&#039;&#039;Part I: preparing data&#039;&#039;&#039; &amp;lt;br /&amp;gt;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;Create a typology of different types of discontinuous multiword expressions in some Germanic, Celtic, Romance, Turkic, and Uralic languages. This is necessary for getting an idea of how to build a module in part II. I estimate that it would take 2-4 days to complete full investigations of multiword expressions in each language, depending on how familiar I am with the language. I chose the following languages for their significance in Apertium’s database and for my access to and familiarity with them.&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;Create a typology of different types of discontinuous multiword expressions in some Germanic, Celtic, Romance, Turkic, and Uralic languages. This is necessary for getting an idea of how to build a module in part II&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;. The typologies should include an analysis of expressions that cannot be created by the multiwords processor. Each week&#039;s typologies should be coded into Apertium files by the end of the weekend&lt;/ins&gt;. I estimate that it would take 2-4 days to complete full investigations of multiword expressions in each language, depending on how familiar I am with the language. I chose the following languages for their significance in Apertium’s database and for my access to and familiarity with them.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;*Week 1 (5/22): Romance- Spanish, Portuguese&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;*Week 1 (5/22): Romance- Spanish, Portuguese&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;*Week 2 (5/29): Romance- Italian, Romanian&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;*Week 2 (5/29): Romance- Italian, Romanian&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 64:&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 66:&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;&#039;&#039;&#039;Part II: building the module&#039;&#039;&#039;&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;&#039;&#039;&#039;Part II: building the module&#039;&#039;&#039;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;The module should respect discontiguous multiwords that may remain discontiguous in both languages. When translating a discontiguous multiword from language xxx to language yyy, if the multiword is well-formed in yyy when discontiguous, the output translation should leave it discontiguous. Otherwise, the sentence should be reordered and the multiword translated as a single unit.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;*Week 5 (6/19): devise a module for recognising multiword expressions in each of the languages that I created typologies for, write unit tests to make sure it is functioning correctly&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;*Week 5 (6/19): devise a module for recognising multiword expressions in each of the languages that I created typologies for, write unit tests to make sure it is functioning correctly&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;**Devise a compatible format for integrating the expressions into monodix: figure out how to annotate the words with respect to the current multiwords processor&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;*Week 6 (6/26): (cont.)&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;*Week 6 (6/26): (cont.)&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;*Week 7 (7/3): write a script to have the module reorder sentences to unify discontiguous multiwords, write unit tests to make sure it is functioning correctly&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;*Week 7 (7/3): write a script to have the module reorder sentences to unify discontiguous multiwords, write unit tests to make sure it is functioning correctly&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;**Devise a method for integrating the script with Apertium&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;*Week 8 (7/10): (cont.)&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;*Week 8 (7/10): (cont.)&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;&#039;&#039;Deliverable #2:&#039;&#039; functioning discontiguous multiword processor, not yet integrated into Apertium&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;&#039;&#039;Deliverable #2:&#039;&#039; functioning discontiguous multiword processor, not yet integrated into Apertium&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;&#039;&#039;&#039;Part III: integrating the module into Apertium&#039;&#039;&#039;&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;&#039;&#039;&#039;Part III: integrating the module into Apertium&#039;&#039;&#039;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;*Week 9 (7/17): &lt;del class=&quot;diffchange diffchange-inline&quot;&gt; &lt;/del&gt;insert the module between Apertium-pretransfer and lt-proc-b, testing&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;*Week 9 (7/17): &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&quot;&lt;/ins&gt;insert the module between Apertium-pretransfer and lt-proc-b, testing&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&quot;, is what the wiki says&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;*Week 10 (7/24): (cont.)&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;*Week 10 (7/24): (cont.)&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;*Week 11 (7/31): include support for discontiguous multiwords in specific pairs&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;*Week 11 (7/31): include support for discontiguous multiwords in specific pairs&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;*Week 12 (8/7): (cont.)&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;*Week 12 (8/7): (cont.)&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;&#039;&#039;Project completed:&#039;&#039; typologies and &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;fully-integrated&lt;/del&gt; module for processing discontiguous multiwords&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;&#039;&#039;Project completed:&#039;&#039;&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt; fully-integrated&lt;/ins&gt; typologies and module for processing discontiguous multiwords&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;*Week 13 (8/14): testing&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;*Week 13 (8/14): testing&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Irene</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.apertium.org/w/index.php?title=User:Irene/proposal&amp;diff=62570&amp;oldid=prev</id>
		<title>Irene: added category</title>
		<link rel="alternate" type="text/html" href="https://wiki.apertium.org/w/index.php?title=User:Irene/proposal&amp;diff=62570&amp;oldid=prev"/>
		<updated>2017-04-02T17:15:03Z</updated>

		<summary type="html">&lt;p&gt;added category&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;Revision as of 17:15, 2 April 2017&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 90:&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 90:&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;If my project is accepted, then my plan is to complete GSoC and take some light elective course somewhere, either online or at a community college.&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;If my project is accepted, then my plan is to complete GSoC and take some light elective course somewhere, either online or at a community college.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;[[Category:GSoC_2017_Student_Proposals]]&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Irene</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.apertium.org/w/index.php?title=User:Irene/proposal&amp;diff=62569&amp;oldid=prev</id>
		<title>Irene at 17:12, 2 April 2017</title>
		<link rel="alternate" type="text/html" href="https://wiki.apertium.org/w/index.php?title=User:Irene/proposal&amp;diff=62569&amp;oldid=prev"/>
		<updated>2017-04-02T17:12:19Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;a href=&quot;//wiki.apertium.org/w/index.php?title=User:Irene/proposal&amp;amp;diff=62569&amp;amp;oldid=62265&quot;&gt;Show changes&lt;/a&gt;</summary>
		<author><name>Irene</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.apertium.org/w/index.php?title=User:Irene/proposal&amp;diff=62265&amp;oldid=prev</id>
		<title>Irene: /* Which of the published tasks are you interested in? What do you plan to do? */  italics formatting</title>
		<link rel="alternate" type="text/html" href="https://wiki.apertium.org/w/index.php?title=User:Irene/proposal&amp;diff=62265&amp;oldid=prev"/>
		<updated>2017-03-31T07:11:32Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Which of the published tasks are you interested in? What do you plan to do?: &lt;/span&gt;  italics formatting&lt;/span&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;Revision as of 07:11, 31 March 2017&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 24:&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 24:&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;:# &amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Tomo la basura fuera.&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;:# &amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Tomo la basura fuera.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;Discontiguous multiwords are multi-word expressions that are separated by something in the middle. In the set of sentences above, &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;“take&lt;/del&gt; &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;out”&lt;/del&gt; is a multiword verb. When it is separated by the noun phrase &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;“the&lt;/del&gt; rubbish,&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;”&lt;/del&gt; it becomes a discontinuous multiword.&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;Discontiguous multiwords are multi-word expressions that are separated by something in the middle. In the set of sentences above, &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&#039;&#039;take&lt;/ins&gt; &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;out&#039;&#039;&lt;/ins&gt; is a multiword verb. When it is separated by the noun phrase &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&#039;&#039;the&lt;/ins&gt; rubbish&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&#039;&#039;&lt;/ins&gt;, it becomes a discontinuous multiword.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;Apertium currently doesn’t offer support for discontinuous multiwords, and this is a source of many unfortunate translation errors. Apertium can seamlessly translate (1) into (3) from English to Spanish: in (1), the whole phrasal verb take out is together, so Apertium can easily recognize and translate it as one unit. Take out correctly becomes saco, its first-person conjugation in Spanish. However, Apertium imperfectly translates (2) into (4) from English to Spanish: in (2), the phrasal verb take out is separated by the NP the rubbish, so Apertium doesn’t recognize it as a unit and incorrectly translates it as two separate words. Take becomes tomo and out becomes fuera, independently, which is not what we want. This demonstrates that discontiguous multiwords produce significant wrinkles in the translation process.&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;Apertium currently doesn’t offer support for discontinuous multiwords, and this is a source of many unfortunate translation errors. Apertium can seamlessly translate (1) into (3) from English to Spanish: in (1), the whole phrasal verb &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&#039;&#039;&lt;/ins&gt;take out&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&#039;&#039;&lt;/ins&gt; is together, so Apertium can easily recognize and translate it as one unit. &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&#039;&#039;&lt;/ins&gt;Take out&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&#039;&#039;&lt;/ins&gt; correctly becomes &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&#039;&#039;&lt;/ins&gt;saco&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&#039;&#039;&lt;/ins&gt;, its first-person conjugation in Spanish. However, Apertium imperfectly translates (2) into (4) from English to Spanish: in (2), the phrasal verb &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&#039;&#039;&lt;/ins&gt;take out&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&#039;&#039;&lt;/ins&gt; is separated by the NP &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&#039;&#039;&lt;/ins&gt;the rubbish&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&#039;&#039;&lt;/ins&gt;, so Apertium doesn’t recognize it as a unit and incorrectly translates it as two separate words. 
&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&#039;&#039;&lt;/ins&gt;Take&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&#039;&#039;&lt;/ins&gt; becomes &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&#039;&#039;&lt;/ins&gt;tomo&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&#039;&#039;&lt;/ins&gt; and &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&#039;&#039;&lt;/ins&gt;out&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&#039;&#039;&lt;/ins&gt; becomes &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&#039;&#039;&lt;/ins&gt;fuera&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&#039;&#039;&lt;/ins&gt;, independently, which is not what we want. This demonstrates that discontiguous multiwords produce significant wrinkles in the translation process.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;My plan is to extend the multiword processor so that it can recognize when a sentence contains a discontiguous multiword and then reorder the sentence so that the whole verb phrase is placed together before bilingual dictionary lookup occurs. As noted in the wiki page for this project, this involves (1) creating a typology of discontinuous multiword expressions in some Germanic, Celtic, Romance, Turkic, and Uralic languages; (2) creating a module for recognising and reordering discontiguous multiword expressions; and (3) supporting discontiguous multiwords specifically for the English-Spanish pair.&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;My plan is to extend the multiword processor so that it can recognize when a sentence contains a discontiguous multiword and then reorder the sentence so that the whole verb phrase is placed together before bilingual dictionary lookup occurs. As noted in the wiki page for this project, this involves (1) creating a typology of discontinuous multiword expressions in some Germanic, Celtic, Romance, Turkic, and Uralic languages; (2) creating a module for recognising and reordering discontiguous multiword expressions; and (3) supporting discontiguous multiwords specifically for the English-Spanish pair.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt; &lt;/div&gt;&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-added&quot;&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;== How and who will it benefit in society? ==&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;== How and who will it benefit in society? ==&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Irene</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.apertium.org/w/index.php?title=User:Irene/proposal&amp;diff=62264&amp;oldid=prev</id>
		<title>Irene: Created page with &quot; == Contact Info ==  &#039;&#039;&#039;Name:&#039;&#039;&#039; Irene Tang &lt;br /&gt; &#039;&#039;&#039;E-mail:&#039;&#039;&#039; itang1@swarthmore.edu &lt;br /&gt; &#039;&#039;&#039;IRC nick:&#039;&#039;&#039; irene_ &lt;br /&gt; &#039;&#039;&#039;Location:&#039;&#039;&#039; Pennsylvania, USA / California, USA...&quot;</title>
		<link rel="alternate" type="text/html" href="https://wiki.apertium.org/w/index.php?title=User:Irene/proposal&amp;diff=62264&amp;oldid=prev"/>
		<updated>2017-03-31T07:06:53Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;quot; == Contact Info ==  &amp;#039;&amp;#039;&amp;#039;Name:&amp;#039;&amp;#039;&amp;#039; Irene Tang &amp;lt;br /&amp;gt; &amp;#039;&amp;#039;&amp;#039;E-mail:&amp;#039;&amp;#039;&amp;#039; itang1@swarthmore.edu &amp;lt;br /&amp;gt; &amp;#039;&amp;#039;&amp;#039;IRC nick:&amp;#039;&amp;#039;&amp;#039; irene_ &amp;lt;br /&amp;gt; &amp;#039;&amp;#039;&amp;#039;Location:&amp;#039;&amp;#039;&amp;#039; Pennsylvania, USA / California, USA...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;&lt;br /&gt;
== Contact Info ==&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Name:&amp;#039;&amp;#039;&amp;#039; Irene Tang &amp;lt;br /&amp;gt;&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;E-mail:&amp;#039;&amp;#039;&amp;#039; itang1@swarthmore.edu &amp;lt;br /&amp;gt;&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;IRC nick:&amp;#039;&amp;#039;&amp;#039; irene_ &amp;lt;br /&amp;gt;&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Location:&amp;#039;&amp;#039;&amp;#039; Pennsylvania, USA / California, USA &amp;lt;br /&amp;gt;&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Time zone:&amp;#039;&amp;#039;&amp;#039; UTC -05:00 / UTC-08:00 &amp;lt;br /&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
== Why are you interested in machine translation? / Why are you interested in Apertium? ==&lt;br /&gt;
&lt;br /&gt;
I became interested in machine translation earlier this school year, when I was introduced to an organisation that works to translate the Bible for people interested in reading it, in particular people who speak minority languages in which the text is not currently available. The representative mentioned that the translation process would be much easier and faster if they had a computer program that could produce a first-pass translation for linguists to reference, rather than starting from scratch by hand. This is a cause that I care about, and I’m sure there are many other groups and individuals who would appreciate machine translation as a handy supplement to their endeavors. I figured I could use my background in computer science and linguistics to help build machine translation tools for the public to use.&lt;br /&gt;
&lt;br /&gt;
I am applying to Apertium because I believe in its success. Apertium is currently one of the more successful translation endeavors, and while it lacks the data and traffic available to Google Translate, it stands out from corporate undertakings by being open-source and by catering to uncommon, lesser-resourced languages. From my interactions on IRC I’ve also noticed an active community of dedicated linguists and programmers, and I’ve read about how much Apertium has accomplished since its birth in 2004. I’m excited about Apertium’s mission.&lt;br /&gt;
 &lt;br /&gt;
== Which of the published tasks are you interested in? What do you plan to do? ==&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Discontiguous Multiwords&amp;#039;&amp;#039;&amp;#039; &amp;lt;br /&amp;gt;&lt;br /&gt;
For an overview of Apertium’s discontiguous multiwords problem, consider the following set of sentences:&lt;br /&gt;
&lt;br /&gt;
:# I take out the rubbish.&lt;br /&gt;
:# I take the rubbish out.&lt;br /&gt;
:# Saco la basura.&lt;br /&gt;
:# &amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;Tomo la basura fuera.&lt;br /&gt;
&lt;br /&gt;
Discontiguous multiwords are multi-word expressions that are separated by something in the middle. In the set of sentences above, “take out” is a multiword verb. When it is separated by the noun phrase “the rubbish,” it becomes a discontinuous multiword.&lt;br /&gt;
&lt;br /&gt;
Apertium currently doesn’t offer support for discontinuous multiwords, and this is a source of many unfortunate translation errors. Apertium can seamlessly translate (1) into (3) from English to Spanish: in (1), the whole phrasal verb take out is together, so Apertium can easily recognize and translate it as one unit. Take out correctly becomes saco, its first-person conjugation in Spanish. However, Apertium imperfectly translates (2) into (4) from English to Spanish: in (2), the phrasal verb take out is separated by the NP the rubbish, so Apertium doesn’t recognize it as a unit and incorrectly translates it as two separate words. Take becomes tomo and out becomes fuera, independently, which is not what we want. This demonstrates that discontiguous multiwords produce significant wrinkles in the translation process.&lt;br /&gt;
&lt;br /&gt;
My plan is to extend the multiword processor so that it can recognize when a sentence contains a discontiguous multiword and then reorder the sentence so that the whole verb phrase is placed together before bilingual dictionary lookup occurs. As noted in the wiki page for this project, this involves (1) creating a typology of discontinuous multiword expressions in some Germanic, Celtic, Romance, Turkic, and Uralic languages; (2) creating a module for recognising and reordering discontiguous multiword expressions; and (3) supporting discontiguous multiwords specifically for the English-Spanish pair.&lt;br /&gt;
 &lt;br /&gt;
== How and who will it benefit in society? ==&lt;br /&gt;
&lt;br /&gt;
Discontiguous multiwords are common in Germanic, Celtic, Romance, Turkic, and Uralic languages. These groups make up the majority of Apertium’s language database. All Apertium users of these five language groups stand to benefit from this project.&lt;br /&gt;
&lt;br /&gt;
== Why should Google and Apertium sponsor it? ==&lt;br /&gt;
&lt;br /&gt;
This issue is rather large, but the solution is within reach and offers generous rewards. Discontinuous multiwords are quite common in everyday speech (in the languages where they appear), so fixing the problem will substantially improve translation quality across the board. The discontiguous multiwords problem should be addressed as soon as possible, yet this project has been sitting in the GSoC ideas tank on the wiki since 2010.&lt;br /&gt;
&lt;br /&gt;
== Work Plan ==&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Community bonding period&amp;#039;&amp;#039;&amp;#039; – (begin typology) &amp;lt;br /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Part I: preparing data&amp;#039;&amp;#039;&amp;#039; – create a typology of different types of discontinuous multiword expressions in Germanic, Celtic, Romance, Turkic, and Uralic languages. This will inform the design of the module in Part II. I estimate that it would take 2-5 days to investigate multiword expressions in each language, depending on how familiar I am with the language. I chose the languages below for their significance in Apertium’s database and for how accessible they are to me. I’m more familiar with the Romance languages than the others. Creating typologies for roughly 10-12 languages would take at least a full month, so I plan to start on Part I during the community bonding period.&lt;br /&gt;
&lt;br /&gt;
*Week 1 (5/22): Germanic- English, Swedish | Celtic- Welsh&lt;br /&gt;
*Week 2 (5/29): Romance- Portuguese, Spanish, French&lt;br /&gt;
*Week 3 (6/5):  Romance- Italian, Romanian&lt;br /&gt;
*Week 4 (6/12): Turkic-   | Uralic- Finnish&lt;br /&gt;
&amp;#039;&amp;#039;Deliverable #1:&amp;#039;&amp;#039; typologies of discontiguous multiword expressions for 10-12 languages currently supported by Apertium, with at least one from each of the five language categories.&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Part II: building the module&amp;#039;&amp;#039;&amp;#039; – create a module/script for recognising and reordering discontiguous multiword expressions&lt;br /&gt;
&lt;br /&gt;
*Week 5 (6/19):&lt;br /&gt;
*Week 6 (6/26):&lt;br /&gt;
*Week 7 (7/3):&lt;br /&gt;
*Week 8 (7/10):&lt;br /&gt;
&amp;#039;&amp;#039;Deliverable #2:&amp;#039;&amp;#039; functioning discontiguous multiword processor, not yet integrated into Apertium&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Part III: integrating the module&amp;#039;&amp;#039;&amp;#039; – integrate the module into Apertium (insert between apertium-pretransfer and lt-proc -b)&lt;br /&gt;
&lt;br /&gt;
*Week 9 (7/17):&lt;br /&gt;
*Week 10 (7/24):&lt;br /&gt;
*Week 11 (7/31): include support for discontiguous multiwords in specific pairs&lt;br /&gt;
*Week 12 (8/7): include support for discontiguous multiwords in specific pairs&lt;br /&gt;
&amp;#039;&amp;#039;Project completed:&amp;#039;&amp;#039; typologies and fully-integrated module for processing discontiguous multiwords&lt;br /&gt;
&lt;br /&gt;
*Week 13 (8/14): testing&lt;br /&gt;
*Week 14 (8/21): pencils down&lt;br /&gt;
 &lt;br /&gt;
== List your skills and give evidence of your qualifications. ==&lt;br /&gt;
&lt;br /&gt;
I’m a second-year Computer Science major and Linguistics minor at Swarthmore College (United States). English is my native language, and I studied Spanish for four years in high school.&lt;br /&gt;
*Relevant coursework: Data Structures/Algorithms, Computer Systems, Algorithm Analysis, Artificial Intelligence/Machine Learning, Syntax&lt;br /&gt;
*Technical skills: Python, C++, C, Java&lt;br /&gt;
*Coding challenges: https://github.com/irene-tang/discontiguous-multiwords (information is in the README)&lt;br /&gt;
 &lt;br /&gt;
== List any non-Summer of Code plans you have for the summer. ==&lt;br /&gt;
&lt;br /&gt;
If my project is accepted, my plan is to complete GSoC and take a light elective course, either online or at a community college.&lt;/div&gt;</summary>
		<author><name>Irene</name></author>
		
	</entry>
</feed>