https://wiki.apertium.org/w/index.php?title=User:Khannatanmai/GSoC2020Proposal_Trimming&feed=atom&action=history
User:Khannatanmai/GSoC2020Proposal Trimming - Revision history
2024-03-29T11:36:22Z
Revision history for this page on the wiki
MediaWiki 1.34.1
https://wiki.apertium.org/w/index.php?title=User:Khannatanmai/GSoC2020Proposal_Trimming&diff=71957&oldid=prev
Khannatanmai: /* Non-Summer-Of-Code Plans */
2020-03-31T16:55:10Z
<p><span dir="auto"><span class="autocomment">Non-Summer-Of-Code Plans</span></span></p>
<table class="diff diff-contentalign-left" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">Revision as of 16:55, 31 March 2020</td>
</tr><tr>
<td colspan="2" class="diff-lineno">Line 331:</td>
<td colspan="2" class="diff-lineno">Line 331:</td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>I have no plans apart from GSoC in the summer and can devote 30-40 hours a week for this project.</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>I have no plans apart from GSoC in the summer and can devote 30-40 hours a week for this project.</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[[Category:GSoC_2020_student_proposals]]</div></td>
</tr>
</table>
Khannatanmai
https://wiki.apertium.org/w/index.php?title=User:Khannatanmai/GSoC2020Proposal_Trimming&diff=71955&oldid=prev
Khannatanmai: /* Skills and Qualifications */
2020-03-31T16:41:38Z
<p><span dir="auto"><span class="autocomment">Skills and Qualifications</span></span></p>
<table class="diff diff-contentalign-left" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">Revision as of 16:41, 31 March 2020</td>
</tr><tr>
<td colspan="2" class="diff-lineno">Line 320:</td>
<td colspan="2" class="diff-lineno">Line 320:</td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>I've been interested in linguistics from the very beginning and due to the rigorous programming courses, I'm also adept at several programming languages like Python, C++, XML, Bash Scripting, etc. I'm skilled in writing Finite State Transducers, Algorithms. Data Structures, and Machine Learning Algorithms as well.</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>I've been interested in linguistics from the very beginning and due to the rigorous programming courses, I'm also adept at several programming languages like Python, C++, XML, Bash Scripting, etc. I'm skilled in writing Finite State Transducers, Algorithms. Data Structures, and Machine Learning Algorithms as well.</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>I also have a lot of experience studying and manually analysing data which. is an essential skill for doing problem solving in Linguistics</div></td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>I also have a lot of experience studying and manually analysing data which. is an essential skill for doing problem solving in Linguistics<ins class="diffchange diffchange-inline">.</ins></div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>I've worked with Apertium as part of GSoC 2019 and built the Anaphora resolution module, and hence I'm familiar with the codebase, writing parsers for the stream format, and the community which will help me to dive right in the project and make a significant contribution right from the start. I have worked in several other projects, such as a tool that predicts commas and sentence boundaries in ASR output using pitch, building a Translation Memory, Detecting Homographic Puns, POS Taggers, Grammar and Spell Checkers, Named Entity Recognisers, Building Chatbots, etc. all of which required a working understanding of Natural Language Processing. Most of these projects were done offline in my research lab and aren't available on GitHub because of the privacy settings but can be provided if needed.</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>I've worked with Apertium as part of GSoC 2019 and built the Anaphora resolution module, and hence I'm familiar with the codebase, writing parsers for the stream format, and the community which will help me to dive right in the project and make a significant contribution right from the start. I have worked in several other projects, such as a tool that predicts commas and sentence boundaries in ASR output using pitch, building a Translation Memory, Detecting Homographic Puns, POS Taggers, Grammar and Spell Checkers, Named Entity Recognisers, Building Chatbots, etc. all of which required a working understanding of Natural Language Processing. Most of these projects were done offline in my research lab and aren't available on GitHub because of the privacy settings but can be provided if needed.</div></td>
</tr>
</table>
Khannatanmai
https://wiki.apertium.org/w/index.php?title=User:Khannatanmai/GSoC2020Proposal_Trimming&diff=71954&oldid=prev
Khannatanmai: /* Which of the published tasks am I interested in? What do I plan to do? */
2020-03-31T16:40:10Z
<p><span dir="auto"><span class="autocomment">Which of the published tasks am I interested in? What do I plan to do?</span></span></p>
<table class="diff diff-contentalign-left" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">Revision as of 16:40, 31 March 2020</td>
</tr><tr>
<td colspan="2" class="diff-lineno">Line 45:</td>
<td colspan="2" class="diff-lineno">Line 45:</td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>= Project Proposal =</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>= Project Proposal =</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Which of the published tasks am I interested in? What do I plan to do? ==</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Which of the published tasks am I interested in? What do I plan to do? ==</div></td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>The task I'm interested in is '''Eliminating Dictionary Trimming'''([[Ideas_for_Google_Summer_of_Code/Eliminate_trimming]]). Dictionary trimming is the process of removing those words and their analyses from monolingual language models (FSTs compiled from monodixes) which don't have an entry in the bidix, to avoid a lot of untranslated lemmas (with an @ if debugging) in the output, which lead to issues with comprehension and post-editing the output. <del class="diffchange diffchange-inline">The</del> <del class="diffchange diffchange-inline">section</del> <del class="diffchange diffchange-inline">#WhyWeTrim</del> explains the rationale behind trimming further.</div></td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>The task I'm interested in is '''Eliminating Dictionary Trimming'''([[Ideas_for_Google_Summer_of_Code/Eliminate_trimming]]). Dictionary trimming is the process of removing those words and their analyses from monolingual language models (FSTs compiled from monodixes) which don't have an entry in the bidix, to avoid a lot of untranslated lemmas (with an @ if debugging) in the output, which lead to issues with comprehension and post-editing the output. <ins class="diffchange diffchange-inline">[[Why</ins> <ins class="diffchange diffchange-inline">we</ins> <ins class="diffchange diffchange-inline">trim]]</ins> explains the rationale behind trimming further.</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>However, by trimming the dictionary, you throw away valuable analyses of words in the source language, which, if preserved, can be used as context for lexical selection and analysis of the input. Also, several transfer rules don't match as the word is shown as unknown. There are several other project ideas that would become viable if we don't trim away the analyses of monodix words that aren't in the bidix, such as a morph-guessing module that can output a source language lemma with target language morphology. Eliminating trimming would also help as if we learn morphological data for a language from an external source, if we continue to trim, it is unusable until all those words are in the bidix. As a general principle, it is better to try and not discard useful information, to pass it in the pipeline and then work on it based on the task.</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>However, by trimming the dictionary, you throw away valuable analyses of words in the source language, which, if preserved, can be used as context for lexical selection and analysis of the input. Also, several transfer rules don't match as the word is shown as unknown. There are several other project ideas that would become viable if we don't trim away the analyses of monodix words that aren't in the bidix, such as a morph-guessing module that can output a source language lemma with target language morphology. Eliminating trimming would also help as if we learn morphological data for a language from an external source, if we continue to trim, it is unusable until all those words are in the bidix. As a general principle, it is better to try and not discard useful information, to pass it in the pipeline and then work on it based on the task.</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>As part of this project, I plan to eliminate monodix trimming and propose a solution such that we don't lose the benefits of trimming<del class="diffchange diffchange-inline">. In the next section I will be describing a proposed solution as well as how I plan to work around everything in [[Why_we_trim]]</del>.</div></td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>As part of this project, I plan to eliminate monodix trimming and propose a solution such that we don't lose the benefits of trimming.</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Proposed Workaround ==</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Proposed Workaround ==</div></td>
</tr>
</table>
Khannatanmai
https://wiki.apertium.org/w/index.php?title=User:Khannatanmai/GSoC2020Proposal_Trimming&diff=71953&oldid=prev
Khannatanmai: /* About Me */
2020-03-31T16:38:38Z
<p><span dir="auto"><span class="autocomment">About Me</span></span></p>
<table class="diff diff-contentalign-left" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">Revision as of 16:38, 31 March 2020</td>
</tr><tr>
<td colspan="2" class="diff-lineno">Line 27:</td>
<td colspan="2" class="diff-lineno">Line 27:</td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>'''What I want to get out of GSoC'''</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>'''What I want to get out of GSoC'''</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>I've worked with Apertium as part of GSoC 2019 and I am familiar with the codebase and the community so I can do a lot more for Apertium during this GSoC and <del class="diffchange diffchange-inline">I</del> <del class="diffchange diffchange-inline">want</del> <del class="diffchange diffchange-inline">to</del> be able to benefit the developers and users of the MT platform.</div></td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>I've worked with Apertium as part of GSoC 2019 and I am familiar with the codebase and the community so I can do a lot more for Apertium during this GSoC and <ins class="diffchange diffchange-inline">with</ins> <ins class="diffchange diffchange-inline">this</ins> <ins class="diffchange diffchange-inline">project I'll</ins> be able to benefit the developers and users of the MT platform<ins class="diffchange diffchange-inline"> and develop a deep knowledge about the pipeline</ins>.</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Computational Linguistics is my passion, and I would love to work with similarly passionate people at Apertium, to develop tools that people actually benefit from. This would be an invaluable experience that classes just can't match.</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Computational Linguistics is my passion, and I would love to work with similarly passionate people at Apertium, to develop tools that people actually benefit from. This would be an invaluable experience that classes just can't match.</div></td>
</tr>
</table>
Khannatanmai
https://wiki.apertium.org/w/index.php?title=User:Khannatanmai/GSoC2020Proposal_Trimming&diff=71952&oldid=prev
Khannatanmai: /* About Me */
2020-03-31T16:37:03Z
<p><span dir="auto"><span class="autocomment">About Me</span></span></p>
<table class="diff diff-contentalign-left" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">Revision as of 16:37, 31 March 2020</td>
</tr><tr>
<td colspan="2" class="diff-lineno">Line 27:</td>
<td colspan="2" class="diff-lineno">Line 27:</td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>'''What I want to get out of GSoC'''</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>'''What I want to get out of GSoC'''</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>I've worked with Apertium as part of GSoC 2019 and I am familiar with the codebase and the community so I can do a lot more for Apertium during this GSoC and I want to be able to benefit the developers and users of the MT platform.</div></td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>I’ve enjoyed using Apertium in various personal and academic projects and it’s amazing to me that I get an opportunity to work with them. </div></td>
<td colspan="2" class="diff-empty"> </td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Computational Linguistics is my passion, and I would love to work with similarly passionate people at Apertium, to develop tools that people actually benefit from. This would be an invaluable experience that classes just can't match.</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Computational Linguistics is my passion, and I would love to work with similarly passionate people at Apertium, to develop tools that people actually benefit from. This would be an invaluable experience that classes just can't match.</div></td>
</tr>
</table>
Khannatanmai
https://wiki.apertium.org/w/index.php?title=User:Khannatanmai/GSoC2020Proposal_Trimming&diff=71949&oldid=prev
Khannatanmai: /* Work Plan */
2020-03-31T16:30:38Z
<p><span dir="auto"><span class="autocomment">Work Plan</span></span></p>
<table class="diff diff-contentalign-left" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">Revision as of 16:30, 31 March 2020</td>
</tr><tr>
<td colspan="2" class="diff-lineno">Line 252:</td>
<td colspan="2" class="diff-lineno">Line 252:</td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Analyse the code of the parsers of the modules</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Analyse the code of the parsers of the modules</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>'''Phase 1<del class="diffchange diffchange-inline">: Week 1-4</del>'''<del class="diffchange diffchange-inline"> </del>(June 1 - July 3)</div></td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>'''Phase 1'''(June 1 - July 3)</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>'''Week 1'''</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Write tests for each module to parse the new stream format</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Write tests for each module to parse the new stream format</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>'''Week 2,3,4'''</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Modify the code for the parsers one by one in branches</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Modify the code for the parsers one by one in branches</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Document the computational burden due to a longer stream</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Document the computational burden due to a longer stream</div></td>
</tr>
<tr>
<td colspan="2" class="diff-lineno">Line 261:</td>
<td colspan="2" class="diff-lineno">Line 264:</td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>'''Evaluation 1: June 29'''</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>'''Evaluation 1: June 29'''</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>'''Phase 2<del class="diffchange diffchange-inline">: Week 5-8</del>''' (July 3 - July 27)</div></td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>'''Phase 2''' (July 3 - July 27)</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>'''Week 5'''</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Write tests for the tokeniser to add the surface form in the stream as well as the lemma and analysis</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Write tests for the tokeniser to add the surface form in the stream as well as the lemma and analysis</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Write tests for the bidix lookup/generator to read surface form and output it if word missing in bidix</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Write tests for the bidix lookup/generator to read surface form and output it if word missing in bidix</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>'''Week 6'''</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Modify tokeniser code</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Modify tokeniser code</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Modify bidix look up</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Modify bidix look up</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>'''Week 7'''</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Modify generator</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Modify generator</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Write tests for checking if trimming not needed</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Write tests for checking if trimming not needed</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>'''Week 8'''</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Add surface form in the stream as secondary information</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Add surface form in the stream as secondary information</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Document elimination of trimming</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Document elimination of trimming</div></td>
</tr>
<tr>
<td colspan="2" class="diff-lineno">Line 275:</td>
<td colspan="2" class="diff-lineno">Line 283:</td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>'''Evaluation 2: July 27'''</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>'''Evaluation 2: July 27'''</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>'''Phase 3<del class="diffchange diffchange-inline">: Week 9-12</del>''' (July 31 - August 24)</div></td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>'''Phase 3''' (July 31 - August 24)</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>'''Week 9'''</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Test several language pairs that use trimming and run experiments and tests to ensure no regression of benefits</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Test several language pairs that use trimming and run experiments and tests to ensure no regression of benefits</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Test the new stream format with legacy data files and old language pairs</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Test the new stream format with legacy data files and old language pairs</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>'''Week 10'''</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Document and prove without doubt that this is fully backwards compatible</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Document and prove without doubt that this is fully backwards compatible</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Try solving markup reordering problem by putting markups as secondary info</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Try solving markup reordering problem by putting markups as secondary info</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>'''Week 11,12'''</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Make a naive morph guessing module as described above (stretch goal)</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Make a naive morph guessing module as described above (stretch goal)</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Finish writing a thorough documentation and tests</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Finish writing a thorough documentation and tests</div></td>
</tr>
</table>
Khannatanmai
https://wiki.apertium.org/w/index.php?title=User:Khannatanmai/GSoC2020Proposal_Trimming&diff=71852&oldid=prev
Khannatanmai: /* A description of how and who it will benefit in society */
2020-03-30T18:01:21Z
<p><span dir="auto"><span class="autocomment">A description of how and who it will benefit in society</span></span></p>
<table class="diff diff-contentalign-left" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">Revision as of 18:01, 30 March 2020</td>
</tr><tr>
<td colspan="2" class="diff-lineno">Line 291:</td>
<td colspan="2" class="diff-lineno">Line 291:</td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>= A description of how and who it will benefit in society =</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>= A description of how and who it will benefit in society =</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>It will definitely benefit most users of Apertium and hopefully will attract more people to the tool. By discarding the morphological analysis of the word too early we prevent modules like lexical selection and transfer from using it, which would really benefit from this information. Secondly, as I've described earlier, there are several projects and ideas that would be possible once trimming is eliminated, because now the developer will have the option to use this morphological information anywhere in the pipeline. The morph guessing project will make the output of translation from Apertium much <del class="diffchange diffchange-inline">for</del> comprehensive and would help with both gisting translation and post-editing, and hence help all kinds of users of Apertium.</div></td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>It will definitely benefit most users of Apertium and hopefully will attract more people to the tool. By discarding the morphological analysis of the word too early we prevent modules like lexical selection and transfer from using it, which would really benefit from this information. Secondly, as I've described earlier, there are several projects and ideas that would be possible once trimming is eliminated, because now the developer will have the option to use this morphological information anywhere in the pipeline. The morph guessing project will make the output of translation from Apertium much <ins class="diffchange diffchange-inline">more</ins> comprehensive and would help with both gisting translation and post-editing, and hence help all kinds of users of Apertium.</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>By modifying the stream to include arbitrary information, we also open the doors for several future applications, which can now add and use any info in the pipeline. This would be useful for markup reordering, semantic information, capitalisation, theta roles, sentiment tags, etc.</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>By modifying the stream to include arbitrary information, we also open the doors for several future applications, which can now add and use any info in the pipeline. This would be useful for markup reordering, semantic information, capitalisation, theta roles, sentiment tags, etc.</div></td>
</tr>
</table>
Khannatanmai
https://wiki.apertium.org/w/index.php?title=User:Khannatanmai/GSoC2020Proposal_Trimming&diff=71851&oldid=prev
Khannatanmai: /* Skills and Qualifications */
2020-03-30T18:00:36Z
<p><span dir="auto"><span class="autocomment">Skills and Qualifications</span></span></p>
<table class="diff diff-contentalign-left" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">Revision as of 18:00, 30 March 2020</td>
</tr><tr>
<td colspan="2" class="diff-lineno">Line 308:</td>
<td colspan="2" class="diff-lineno">Line 308:</td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>I've been interested in linguistics from the very beginning and due to the rigorous programming courses, I'm also adept at several programming languages like Python, C++, XML, Bash Scripting, etc. I'm skilled in writing Finite State Transducers, Algorithms. Data Structures, and Machine Learning Algorithms as well.</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>I've been interested in linguistics from the very beginning and due to the rigorous programming courses, I'm also adept at several programming languages like Python, C++, XML, Bash Scripting, etc. I'm skilled in writing Finite State Transducers, Algorithms. Data Structures, and Machine Learning Algorithms as well.</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>I also have a lot of experience studying <del class="diffchange diffchange-inline">data</del> <del class="diffchange diffchange-inline">which</del> <del class="diffchange diffchange-inline">I</del> <del class="diffchange diffchange-inline">feel</del> is essential <del class="diffchange diffchange-inline">in</del> <del class="diffchange diffchange-inline">solving</del> <del class="diffchange diffchange-inline">any</del> problem<del class="diffchange diffchange-inline">.</del></div></td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>I also have a lot of experience studying <ins class="diffchange diffchange-inline">and</ins> <ins class="diffchange diffchange-inline">manually</ins> <ins class="diffchange diffchange-inline">analysing</ins> <ins class="diffchange diffchange-inline">data which.</ins> is<ins class="diffchange diffchange-inline"> an</ins> essential <ins class="diffchange diffchange-inline">skill</ins> <ins class="diffchange diffchange-inline">for</ins> <ins class="diffchange diffchange-inline">doing</ins> problem<ins class="diffchange diffchange-inline"> solving in Linguistics</ins></div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>I've worked with Apertium as part of GSoC 2019 and built the Anaphora resolution module, and hence I'm familiar with the codebase, writing parsers for the stream format, and the community which will help me to dive right in the project and make a significant contribution right from the start. I have worked in several other projects, such as a tool that predicts commas and sentence boundaries in ASR output using pitch, building a Translation Memory, Detecting Homographic Puns, POS Taggers, Grammar and Spell Checkers, Named Entity Recognisers, Building Chatbots, etc. all of which required a working understanding of Natural Language Processing. Most of these projects were done offline in my research lab and aren't available on GitHub because of the privacy settings but can be provided if needed.</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>I've worked with Apertium as part of GSoC 2019 and built the Anaphora resolution module, and hence I'm familiar with the codebase, writing parsers for the stream format, and the community which will help me to dive right in the project and make a significant contribution right from the start. I have worked in several other projects, such as a tool that predicts commas and sentence boundaries in ASR output using pitch, building a Translation Memory, Detecting Homographic Puns, POS Taggers, Grammar and Spell Checkers, Named Entity Recognisers, Building Chatbots, etc. all of which required a working understanding of Natural Language Processing. Most of these projects were done offline in my research lab and aren't available on GitHub because of the privacy settings but can be provided if needed.</div></td>
</tr>
</table>
Khannatanmai
https://wiki.apertium.org/w/index.php?title=User:Khannatanmai/GSoC2020Proposal_Trimming&diff=71812&oldid=prev
Khannatanmai: /* Work Plan (TODO) */
2020-03-30T14:05:10Z
<p><span dir="auto"><span class="autocomment">Work Plan (TODO)</span></span></p>
<table class="diff diff-contentalign-left" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">Revision as of 14:05, 30 March 2020</td>
</tr><tr>
<td colspan="2" class="diff-lineno">Line 238:</td>
<td colspan="2" class="diff-lineno">Line 238:</td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>'''Now for the trimming, In the fourth statement we do a compose-intersect between the automorf fst (compiled monodix) and the autobil prefixes that we prepared above (modified bidix).''' Then, using <code>hfst-fst2fst</code> it converts the fst to an optimized-lookup (weighted) implementation and the final trimmed file is named <code>$(PREFIX1).automorf.hfst</code>. The unmodified automorf is put in the same optimised-lookup implementation and named <code>$(PREFIX1).automorf-untrimmed.hfst</code>. Note that <code>PREFIX1=$(LANG1)-$(LANG2)</code>.</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>'''Now for the trimming, In the fourth statement we do a compose-intersect between the automorf fst (compiled monodix) and the autobil prefixes that we prepared above (modified bidix).''' Then, using <code>hfst-fst2fst</code> it converts the fst to an optimized-lookup (weighted) implementation and the final trimmed file is named <code>$(PREFIX1).automorf.hfst</code>. The unmodified automorf is put in the same optimised-lookup implementation and named <code>$(PREFIX1).automorf-untrimmed.hfst</code>. Note that <code>PREFIX1=$(LANG1)-$(LANG2)</code>.</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>= Work Plan<del class="diffchange diffchange-inline"> (TODO)</del> =</div></td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>= Work Plan =</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>'''Application Review Period''' (<del class="diffchange diffchange-inline">April</del> <del class="diffchange diffchange-inline">1</del> - May <del class="diffchange diffchange-inline">4</del>)</div></td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>'''Application Review Period''' (<ins class="diffchange diffchange-inline">March</ins> <ins class="diffchange diffchange-inline">31</ins> - May <ins class="diffchange diffchange-inline">3</ins>)</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Document modification to Apertium stream format</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Document modification to Apertium stream format</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Proof of Concept for the new format</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Proof of Concept for the new format</div></td>
</tr>
<tr>
<td colspan="2" class="diff-lineno">Line 252:</td>
<td colspan="2" class="diff-lineno">Line 252:</td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Analyse the code of the parsers of the modules</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Analyse the code of the parsers of the modules</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>'''Phase 1: Week 1-4''' (June 1 - )</div></td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>'''Phase 1: Week 1-4''' (June 1 - <ins class="diffchange diffchange-inline">July 3</ins>)</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Write tests for each module to parse the new stream format</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Write tests for each module to parse the new stream format</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Modify the code for the parsers one by one in branches</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Modify the code for the parsers one by one in branches</div></td>
</tr>
<tr>
<td colspan="2" class="diff-lineno">Line 259:</td>
<td colspan="2" class="diff-lineno">Line 259:</td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Deliverable #1: Stream Modified such that the new format passes through each part of the pipeline ==</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Deliverable #1: Stream Modified such that the new format passes through each part of the pipeline ==</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>'''Evaluation 1: June '''</div></td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>'''Evaluation 1: June <ins class="diffchange diffchange-inline">29</ins>'''</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>'''Phase 2: Week 5-8''' (<del class="diffchange diffchange-inline">June</del> )</div></td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>'''Phase 2: Week 5-8''' (<ins class="diffchange diffchange-inline">July</ins> <ins class="diffchange diffchange-inline">3 - July 27</ins>)</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Write tests for the tokeniser to add the surface form in the stream as well as the lemma and analysis</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Write tests for the tokeniser to add the surface form in the stream as well as the lemma and analysis</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Write tests for the bidix lookup/generator to read surface form and output it if word missing in bidix</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Write tests for the bidix lookup/generator to read surface form and output it if word missing in bidix</div></td>
</tr>
<tr>
<td colspan="2" class="diff-lineno">Line 273:</td>
<td colspan="2" class="diff-lineno">Line 273:</td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Deliverable #2: Trimming eliminated without regression of benefits ==</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Deliverable #2: Trimming eliminated without regression of benefits ==</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>'''Evaluation 2: July '''</div></td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>'''Evaluation 2: July <ins class="diffchange diffchange-inline">27</ins>'''</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>'''Phase 3: Week 9-12''' (July )</div></td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>'''Phase 3: Week 9-12''' (July <ins class="diffchange diffchange-inline">31 - August 24</ins>)</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Test several language pairs that use trimming and run experiments and tests to ensure no regression of benefits</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Test several language pairs that use trimming and run experiments and tests to ensure no regression of benefits</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Test the new stream format with legacy data files and old language pairs</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Test the new stream format with legacy data files and old language pairs</div></td>
</tr>
<tr>
<td colspan="2" class="diff-lineno">Line 283:</td>
<td colspan="2" class="diff-lineno">Line 283:</td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Finish writing a thorough documentation and tests</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Finish writing a thorough documentation and tests</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>'''Final Evaluations: August 24-'''</div></td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>'''Final Evaluations: August 24<ins class="diffchange diffchange-inline"> </ins>-<ins class="diffchange diffchange-inline"> August 31</ins>'''</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Project Completed ==</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Project Completed ==</div></td>
</tr>
</table>
Khannatanmai
https://wiki.apertium.org/w/index.php?title=User:Khannatanmai/GSoC2020Proposal_Trimming&diff=71811&oldid=prev
Khannatanmai: /* Work Plan (TODO) */
2020-03-30T14:00:25Z
<p><span dir="auto"><span class="autocomment">Work Plan (TODO)</span></span></p>
<table class="diff diff-contentalign-left" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">Revision as of 14:00, 30 March 2020</td>
</tr><tr>
<td colspan="2" class="diff-lineno">Line 241:</td>
<td colspan="2" class="diff-lineno">Line 241:</td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>'''Application Review Period''' (April 1 - May 4)</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>'''Application Review Period''' (April 1 - May 4)</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>* Document modification to Apertium stream format</div></td>
</tr>
<tr>
<td class="diff-marker"><a class="mw-diff-movedpara-left" title="Paragraph was moved. Click to jump to new location." href="#movedpara_9_0_rhs">⚫</a></td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><a name="movedpara_2_0_lhs"></a>* Write tests to prepare for test driven development</div></td>
<td colspan="2" class="diff-empty"> </td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>* Proof of Concept for the new format</div></td>
</tr>
<tr>
<td class="diff-marker"><a class="mw-diff-movedpara-left" title="Paragraph was moved. Click to jump to new location." href="#movedpara_11_0_rhs">⚫</a></td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><a name="movedpara_4_0_lhs"></a>* Run some experiments with the new stream format<del class="diffchange diffchange-inline"> - document how much change is needed in which parsers</del></div></td>
<td colspan="2" class="diff-empty"> </td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>* Document all the proposed benefits with including secondary information</div></td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"></td>
<td colspan="2" class="diff-empty"> </td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>* Document how much change is needed in which parsers and what the change is</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>* Document the change needed in tokeniser, bidix lookup, and generation to include surface form</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>'''Community Bonding Period''' (May 4 - June 1)</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>'''Community Bonding Period''' (May 4 - June 1)</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker"><a class="mw-diff-movedpara-right" title="Paragraph was moved. Click to jump to old location." href="#movedpara_2_0_lhs">⚫</a></td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><a name="movedpara_9_0_rhs"></a>* Write tests to prepare for test driven development</div></td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>* </div></td>
<td colspan="2" class="diff-empty"> </td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker"><a class="mw-diff-movedpara-right" title="Paragraph was moved. Click to jump to old location." href="#movedpara_4_0_lhs">⚫</a></td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><a name="movedpara_11_0_rhs"></a>* Run some experiments with the new stream format</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>* Analyse the code of the parsers of the modules</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>'''Week 1-4''' (June 1 - )</div></td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>'''<ins class="diffchange diffchange-inline">Phase 1: </ins>Week 1-4''' (June 1 - )</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>* Write tests for each module to parse the new stream format</div></td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>* </div></td>
<td colspan="2" class="diff-empty"> </td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>* Modify the code for the parsers one by one in branches</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>* Document the computational burden due to a longer stream</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Deliverable #1: Stream Modified such that the new format passes through each part of the pipeline ==</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Deliverable #1: Stream Modified such that the new format passes through each part of the pipeline ==</div></td>
</tr>
<tr>
<td colspan="2" class="diff-lineno">Line 255:</td>
<td colspan="2" class="diff-lineno">Line 261:</td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>'''Evaluation 1: June '''</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>'''Evaluation 1: June '''</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>'''Week 5-8''' (June )</div></td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>'''<ins class="diffchange diffchange-inline">Phase 2: </ins>Week 5-8''' (June )</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>* Write tests for the tokeniser to add the surface form in the stream as well as the lemma and analysis</div></td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>* </div></td>
<td colspan="2" class="diff-empty"> </td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>* Write tests for the bidix lookup/generator to read surface form and output it if word missing in bidix</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>* Modify tokeniser code</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>* Modify bidix look up</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>* Modify generator</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>* Write tests for checking if trimming not needed</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>* Add surface form in the stream as secondary information</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>* Document elimination of trimming</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Deliverable #2: Trimming eliminated without regression of benefits ==</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Deliverable #2: Trimming eliminated without regression of benefits ==</div></td>
</tr>
<tr>
<td colspan="2" class="diff-lineno">Line 262:</td>
<td colspan="2" class="diff-lineno">Line 275:</td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>'''Evaluation 2: July '''</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>'''Evaluation 2: July '''</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>'''Week 9-12''' (July )</div></td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>'''<ins class="diffchange diffchange-inline">Phase 3: </ins>Week 9-12''' (July )</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>* Test several language pairs that use trimming and run experiments and tests to ensure no regression of benefits</div></td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"></td>
<td colspan="2" class="diff-empty"> </td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>* Test the new stream format with legacy data files and old language pairs</div></td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>* </div></td>
<td colspan="2" class="diff-empty"> </td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>* Document and prove without doubt that this is fully backwards compatible</div></td>
</tr>
<tr>
<td class="diff-marker">−</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"></td>
<td colspan="2" class="diff-empty"> </td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>* Try solving markup reordering problem by putting markups as secondary info</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>* Make a naive morph guessing module as described above (stretch goal)</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty"> </td>
<td class="diff-marker">+</td>
<td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>* Finish writing a thorough documentation and tests</div></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td>
</tr>
<tr>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>'''Final Evaluations: August 24-'''</div></td>
<td class="diff-marker"> </td>
<td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>'''Final Evaluations: August 24-'''</div></td>
</tr>
</table>
Khannatanmai