<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://wiki.apertium.org/w/index.php?action=history&amp;feed=atom&amp;title=Weighted_transfer_rules_at_GSoC_2016</id>
	<title>Weighted transfer rules at GSoC 2016 - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://wiki.apertium.org/w/index.php?action=history&amp;feed=atom&amp;title=Weighted_transfer_rules_at_GSoC_2016"/>
	<link rel="alternate" type="text/html" href="https://wiki.apertium.org/w/index.php?title=Weighted_transfer_rules_at_GSoC_2016&amp;action=history"/>
	<updated>2026-05-05T10:54:59Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.34.1</generator>
	<entry>
		<id>https://wiki.apertium.org/w/index.php?title=Weighted_transfer_rules_at_GSoC_2016&amp;diff=59861&amp;oldid=prev</id>
		<title>Nikita Medyankin at 13:37, 23 August 2016</title>
		<link rel="alternate" type="text/html" href="https://wiki.apertium.org/w/index.php?title=Weighted_transfer_rules_at_GSoC_2016&amp;diff=59861&amp;oldid=prev"/>
		<updated>2016-08-23T13:37:01Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;Revision as of 13:37, 23 August 2016&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 2:&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 2:&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;== The Idea ==&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;== The Idea ==&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;The idea is to allow Apertium transfer rules to be ambiguous, i.e., allow a set of rules to match the same general input pattern&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;.&lt;/del&gt; &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;To&lt;/del&gt; &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;decide&lt;/del&gt; &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;which&lt;/del&gt; &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;rule&lt;/del&gt; &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;applies,&lt;/del&gt; &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;transfer&lt;/del&gt; &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;module&lt;/del&gt; &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;would&lt;/del&gt; &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;use&lt;/del&gt; &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;a&lt;/del&gt; &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;set&lt;/del&gt; &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;of&lt;/del&gt; &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;predefined&lt;/del&gt; &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;or&lt;/del&gt; &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;pretrained&lt;/del&gt; &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;—&lt;/del&gt; &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;more&lt;/del&gt; &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;specific&lt;/del&gt; &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;—&lt;/del&gt; &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;weighted&lt;/del&gt; &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;patterns&lt;/del&gt; &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;provided&lt;/del&gt; &lt;del class=&quot;diffchange 
diffchange-inline&quot;&gt;for&lt;/del&gt; &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;each&lt;/del&gt; &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;group&lt;/del&gt; &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;of&lt;/del&gt; &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;ambiguous&lt;/del&gt; &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;rules&lt;/del&gt;. &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;This way, if a specific pattern matches, the rule with the highest weight for that pattern is applied.&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;The idea is to allow Apertium transfer rules to be ambiguous, i.e., allow a set of rules to match the same general input pattern&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;,&lt;/ins&gt; &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;as&lt;/ins&gt; &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;opposed&lt;/ins&gt; &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;to&lt;/ins&gt; &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;the&lt;/ins&gt; &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;present&lt;/ins&gt; &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;situation&lt;/ins&gt; &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;when&lt;/ins&gt; &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;the&lt;/ins&gt; &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;first&lt;/ins&gt; &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;rule&lt;/ins&gt; &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;in&lt;/ins&gt; &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;xml&lt;/ins&gt; &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;transfer&lt;/ins&gt; &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;file&lt;/ins&gt; &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;takes&lt;/ins&gt; &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;exclusive&lt;/ins&gt; &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;precedence&lt;/ins&gt; &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;and&lt;/ins&gt; &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;blocks&lt;/ins&gt; &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;out&lt;/ins&gt; &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;all&lt;/ins&gt; &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;its&lt;/ins&gt; &lt;ins class=&quot;diffchange 
diffchange-inline&quot;&gt;ambiguous&lt;/ins&gt; &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;peers&lt;/ins&gt; &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;during&lt;/ins&gt; &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;transfer&lt;/ins&gt; &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;precompilation&lt;/ins&gt; &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;stage&lt;/ins&gt;. &lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;To decide which rule applies, transfer module would use a set of predefined or pretrained — more specific — weighted patterns provided for each group of ambiguous rules. This way, if a specific pattern matches, the rule with the highest weight for that pattern is applied.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;a class=&quot;mw-diff-movedpara-left&quot; title=&quot;Paragraph was moved. Click to jump to new location.&quot; href=&quot;#movedpara_5_1_rhs&quot;&gt;&amp;#x26AB;&lt;/a&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;&lt;a name=&quot;movedpara_4_0_lhs&quot;&gt;&lt;/a&gt;The first rule in transfer file that matches the general pattern is still considered the default one and is applied if no weighted patterns matched. This way, transfer weights file can be seen as specifying lexicalized or partially lexicalized exceptions from the default rule.&lt;/div&gt;&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-added&quot;&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;a class=&quot;mw-diff-movedpara-right&quot; title=&quot;Paragraph was moved. Click to jump to old location.&quot; href=&quot;#movedpara_4_0_lhs&quot;&gt;&amp;#x26AB;&lt;/a&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;&lt;a name=&quot;movedpara_5_1_rhs&quot;&gt;&lt;/a&gt;The first rule in&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt; xml&lt;/ins&gt; transfer file that matches the general pattern is still considered the default one and is applied if no weighted patterns matched. This way, transfer weights file can be seen as specifying lexicalized or partially lexicalized exceptions from the default rule.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;== Example language pair ==&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;== Example language pair ==&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 10:&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 12:&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;The difference between this and the trunk version is that [https://svn.code.sf.net/p/apertium/svn/branches/weighted-transfer/apertium-en-es/apertium-en-es.en-es.t1x t1x transfer file] has three additional rules which are ambiguous counterparts to the three rules that define interaction of adjacent adjective and noun. In all three original rules, noun and adjective are swapped on output as is usual for Spanish. In additional rules, they are not, as sometimes happens too and is known to be dependent on lexical patterns involved.&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;The difference between this and the trunk version is that [https://svn.code.sf.net/p/apertium/svn/branches/weighted-transfer/apertium-en-es/apertium-en-es.en-es.t1x t1x transfer file] has three additional rules which are ambiguous counterparts to the three rules that define interaction of adjacent adjective and noun. In all three original rules, noun and adjective are swapped on output as is usual for Spanish. In additional rules, they are not, as sometimes happens too and is known to be dependent on lexical patterns involved.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-added&quot;&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;a class=&quot;mw-diff-movedpara-left&quot; title=&quot;Paragraph was moved. Click to jump to new location.&quot; href=&quot;#movedpara_12_1_rhs&quot;&gt;&amp;#x26AB;&lt;/a&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;&lt;a name=&quot;movedpara_7_1_lhs&quot;&gt;&lt;/a&gt;== Weights file format ==&lt;/div&gt;&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-added&quot;&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;https://svn.code.sf.net/p/apertium/svn/branches/weighted-transfer/apertium/apertium/transfer-weights.dtd&lt;/div&gt;&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-added&quot;&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;tbd: some explanation&lt;/div&gt;&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-added&quot;&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;== Transfer module ==&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;== Transfer module ==&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;The code can be located at https://svn.code.sf.net/p/apertium/svn/branches/weighted-transfer/apertium/&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;The code can be located&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt; in the weighted-transfer branch, namely&lt;/ins&gt; at https://svn.code.sf.net/p/apertium/svn/branches/weighted-transfer/apertium/&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;The changes were made to [https://svn.code.sf.net/p/apertium/svn/branches/weighted-transfer/apertium/apertium/transfer.cc transfer.cc] and [https://svn.code.sf.net/p/apertium/svn/branches/weighted-transfer/apertium/apertium/transfer.h transfer.h] files to incorporate reading of the weights file and utilising the weights in order to choose which rule applies for ambiguous input. Wrapper [https://svn.code.sf.net/p/apertium/svn/branches/weighted-transfer/apertium/apertium/apertium-transfer.cc apertium-transfer.cc] was also modified to recognize input weights file name provided by -w option.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;Since transfer.cc and transfer.h originally came with little to no comments, comments were also added to the crucial parts of transfer in addition to commenting the code directly dealing with transfer weights.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;This version of the transfer module understands fully lexicalized patterns (i.e., when only items with a lemma and a full set of tags are allowed in the pattern) as well as partially delexicalized patterns (i.e., with some tokens missing lemmas while retaining the full set of tags). However, it does not support any wildcards in tags, only full tag patterns.&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;This version of the transfer module understands fully lexicalized patterns (i.e., when only items with a lemma and a full set of tags are allowed in the pattern) as well as partially delexicalized patterns (i.e., with some tokens missing lemmas while retaining the full set of tags). However, it does not support any wildcards in tags, only full tag patterns.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;a class=&quot;mw-diff-movedpara-right&quot; title=&quot;Paragraph was moved. Click to jump to old location.&quot; href=&quot;#movedpara_7_1_lhs&quot;&gt;&amp;#x26AB;&lt;/a&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;&lt;a name=&quot;movedpara_12_1_rhs&quot;&gt;&lt;/a&gt;== Weights file format ==&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;DTD specification was developed for weights file format and [https://svn.code.sf.net/p/apertium/svn/branches/weighted-transfer/apertium/apertium/transfer-weights.dtd was added] to the branch. Below is a small example of weights file. All mutually ambiguous rules are listed as subelements of the same &#039;rule-group&#039; element. Each rule copies its &#039;comment&#039; and &#039;id&#039; attributes from the xml transfer file. The use of &#039;id&#039; attribute was added specifically for the purpose of matching the rule from weights file with the same rule from transfer file. It is optional, unique, and must be added only to the ambiguous rules. Each rule in weights file also has &#039;md5&#039; attribute, which is added during weights learning and is an md5 sum of original rule text with whitespace removed. It is added in order to be able to check if the weights file actually corresponds to the transfer file during the language pair installation.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;Each rule, in turn, has any number of &#039;pattern&#039; subelements, each with a &#039;weight&#039; attribute, which specify lexical patterns for the rule. In the example given below, the same pattern is listed under both rules; the second rule in the group is preferred for it, since its weight is ~0.95 as opposed to ~0.05 for the first.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;&amp;lt;source lang=&quot;xml&quot;&amp;gt;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;&amp;lt;?xml version=&#039;1.0&#039; encoding=&#039;UTF-8&#039;?&amp;gt;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;&amp;lt;transfer-weights&amp;gt;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;  &amp;lt;rule-group&amp;gt;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;    &amp;lt;rule comment=&quot;REGLA: DET ADJ NOM&quot; id=&quot;det-adj-nom&quot; md5=&quot;897a67e4ffadec9b7fd515ce0a8d453b&quot;&amp;gt;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;      &amp;lt;pattern weight=&quot;0.05124922803710481&quot;&amp;gt;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;        &amp;lt;pattern-item lemma=&quot;this&quot; tags=&quot;det.dem.sg&quot;/&amp;gt;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;        &amp;lt;pattern-item lemma=&quot;new&quot; tags=&quot;adj.sint&quot;/&amp;gt;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;        &amp;lt;pattern-item lemma=&quot;software&quot; tags=&quot;n.sg&quot;/&amp;gt;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;      &amp;lt;/pattern&amp;gt;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;    &amp;lt;/rule&amp;gt;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;    &amp;lt;rule comment=&quot;REGLA: DET ADJ NOM no-swap-version&quot; id=&quot;det-adj-nom-ns&quot; md5=&quot;13f1c5ed0615ae8f9d3142aed7a3855f&quot;&amp;gt;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;      &amp;lt;pattern weight=&quot;0.9487507719628953&quot;&amp;gt;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;        &amp;lt;pattern-item lemma=&quot;this&quot; tags=&quot;det.dem.sg&quot;/&amp;gt;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;        &amp;lt;pattern-item lemma=&quot;new&quot; tags=&quot;adj.sint&quot;/&amp;gt;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;        &amp;lt;pattern-item lemma=&quot;software&quot; tags=&quot;n.sg&quot;/&amp;gt;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;      &amp;lt;/pattern&amp;gt;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;    &amp;lt;/rule&amp;gt;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;  &amp;lt;/rule-group&amp;gt;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;&amp;lt;/transfer-weights&amp;gt;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;&amp;lt;/source&amp;gt;&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;DTD for [https://svn.code.sf.net/p/apertium/svn/branches/weighted-transfer/apertium/apertium/transfer.dtd transfer rules] was modified in order to add &#039;id&#039; property to the &#039;rule&#039; element, used in the corresponding weights file to identify the rules.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;== Weights learning script ==&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;== Weights learning script ==&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;A python3 script was made to enable learning rule &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;weigths&lt;/del&gt; from a corpus. Its source code is located at https://svn.code.sf.net/p/apertium/svn/branches/weighted-transfer/apertium-weights-learner/ It works in two modes, monolingual and parallel.&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;A python3 script was made to enable learning rule &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;weights&lt;/ins&gt; from a corpus. Its source code is located at https://svn.code.sf.net/p/apertium/svn/branches/weighted-transfer/apertium-weights-learner/ It works in two modes, monolingual and parallel.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;In monolingual mode, it requires a t1x file with ambiguous rules, a corpus of source language and a pretrained language model of target language. The target language model may be trained on a target language corpus that does not have to be related to the source language corpus in any way. A number of simple helper scripts are located in [https://svn.code.sf.net/p/apertium/svn/branches/weighted-transfer/apertium-weights-learner/ tools] folder to help to prepare language model as well as instructions in the [https://svn.code.sf.net/p/apertium/svn/branches/weighted-transfer/apertium-weights-learner/README.md README] file. The workflow of the script in monolingual mode is as follows:&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;In monolingual mode, it requires a t1x file with ambiguous rules, a corpus of source language and a pretrained language model of target language. The target language model may be trained on a target language corpus that does not have to be related to the source language corpus in any way. A number of simple helper scripts are located in [https://svn.code.sf.net/p/apertium/svn/branches/weighted-transfer/apertium-weights-learner/ tools] folder to help to prepare language model as well as instructions in the [https://svn.code.sf.net/p/apertium/svn/branches/weighted-transfer/apertium-weights-learner/README.md README] file. The workflow of the script in monolingual mode is as follows:&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 30:&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 62:&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;# Score all variants of all sentences against the language model and normalize the scores for the variants of each sentence obtained for the same segment. Store them as scores for the corresponding ambiguous chunk patterns.&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;# Score all variants of all sentences against the language model and normalize the scores for the variants of each sentence obtained for the same segment. Store them as scores for the corresponding ambiguous chunk patterns.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;# Sum up the scores for each ambiguous chunk pattern and make weights xml.&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;# Sum up the scores for each ambiguous chunk pattern and make weights xml.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;# Prune the &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;weigths&lt;/del&gt; xml.&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;# Prune the &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;weights&lt;/ins&gt; xml.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;In parallel mode, weights learning script requires a parallel corpus stored in two text files which match line by line. The workflow of the script in parallel mode is as follows:&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;In parallel mode, weights learning script requires a parallel corpus stored in two text files which match line by line. The workflow of the script in parallel mode is as follows:&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 37:&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 69:&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;# If there are any ambiguous chunks in the coverage, translate them and look them up in the corresponding target language sentence. If the translation is found, score the chunk pattern with 1.&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;# If there are any ambiguous chunks in the coverage, translate them and look them up in the corresponding target language sentence. If the translation is found, score the chunk pattern with 1.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;# Sum up the scores for each ambiguous chunk pattern and make weights xml.&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;# Sum up the scores for each ambiguous chunk pattern and make weights xml.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;# Prune the &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;weigths&lt;/del&gt; xml.&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;# Prune the &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;weights&lt;/ins&gt; xml.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;More information can be found in the [https://svn.code.sf.net/p/apertium/svn/branches/weighted-transfer/apertium-weights-learner/README.md README] file.&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;More information can be found in the [https://svn.code.sf.net/p/apertium/svn/branches/weighted-transfer/apertium-weights-learner/README.md README] file.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 44:&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 76:&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;== Evaluation ==&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;== Evaluation ==&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;A simple script for evaluating the resulting weights file was written and is located at https://svn.code.sf.net/p/apertium/svn/branches/weighted-transfer/apertium-weights-learner/testing/&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-deletedline diff-side-deleted&quot;&gt;&lt;div&gt;tbd&lt;/div&gt;&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-added&quot;&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;== To be done ==&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;The following issues should be addressed in further work:&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;* Add an option to learn the weights for partially delexicalized patterns in weights learning script.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;* Extensively test the impact of the weighted rules on overall quality and speed of translation using large corpora for training and evaluation.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;* Add md5 sum verification.&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Nikita Medyankin</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.apertium.org/w/index.php?title=Weighted_transfer_rules_at_GSoC_2016&amp;diff=59765&amp;oldid=prev</id>
		<title>Nikita Medyankin: Started a page for GSoC 2016 final submission.</title>
		<link rel="alternate" type="text/html" href="https://wiki.apertium.org/w/index.php?title=Weighted_transfer_rules_at_GSoC_2016&amp;diff=59765&amp;oldid=prev"/>
		<updated>2016-08-22T23:31:50Z</updated>

		<summary type="html">&lt;p&gt;Started a page for GSoC 2016 final submission.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;This page serves as a final submission page for [[Ideas_for_Google_Summer_of_Code/Weighted_transfer_rules|Weighted transfer rules project]] conducted by [[User:Nikita Medyankin|Nikita Medyankin]] at Google Summer of Code 2016.&lt;br /&gt;
&lt;br /&gt;
== The Idea ==&lt;br /&gt;
The idea is to allow Apertium transfer rules to be ambiguous, i.e., to allow a set of rules to match the same general input pattern. To decide which rule applies, the transfer module would use a set of more specific weighted patterns, either predefined or pretrained, provided for each group of ambiguous rules. If a specific pattern matches, the rule with the highest weight for that pattern is applied.&lt;br /&gt;
&lt;br /&gt;
The first rule in the transfer file that matches the general pattern is still considered the default and is applied if no weighted pattern matches. The transfer weights file can thus be seen as specifying lexicalized or partially lexicalized exceptions to the default rule.&lt;br /&gt;
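&lt;br /&gt;
As a rough illustration of the selection logic described above, the following Python sketch picks a rule for one group of ambiguous rules. The function name, the rule ids, the token notation and the dictionary layout are all hypothetical; this is only a sketch of the idea, not the way the transfer module is implemented.&lt;br /&gt;

```python
# Hypothetical sketch: names and data layout are assumptions,
# not part of the Apertium code base.

def select_rule(rule_group, weighted_patterns, chunk):
    """Pick a rule id for one input chunk.

    rule_group: rule ids in transfer-file order; the first is the default.
    weighted_patterns: maps a specific pattern (tuple of tokens) to a
        dict of {rule id: weight}.
    chunk: the specific input pattern, as a tuple of tokens.
    """
    weights = weighted_patterns.get(chunk)
    if not weights:
        # No weighted pattern matched: apply the default rule.
        return rule_group[0]
    # Only rules from this ambiguous group are candidates.
    candidates = [rule for rule in rule_group if rule in weights]
    if not candidates:
        return rule_group[0]
    # The rule with the highest weight for the matched pattern wins.
    return max(candidates, key=lambda rule: weights[rule])


group = ["adj-noun-swap", "adj-noun-keep"]
patterns = {
    ("new/adj", "software/n"): {"adj-noun-swap": 0.2, "adj-noun-keep": 0.8},
}
print(select_rule(group, patterns, ("new/adj", "software/n")))
print(select_rule(group, patterns, ("red/adj", "car/n")))
```

With these made-up weights, the first call returns the non-default rule because a weighted pattern matches, while the second call falls back to the default rule.&lt;br /&gt;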
&lt;br /&gt;
== Example language pair ==&lt;br /&gt;
An example language pair was set up for testing and evaluation. The code can be found at https://svn.code.sf.net/p/apertium/svn/branches/weighted-transfer/apertium-en-es/&lt;br /&gt;
&lt;br /&gt;
The difference between this and the trunk version is that the [https://svn.code.sf.net/p/apertium/svn/branches/weighted-transfer/apertium-en-es/apertium-en-es.en-es.t1x t1x transfer file] has three additional rules, which are ambiguous counterparts to the three rules that handle an adjacent adjective and noun. In all three original rules, the noun and adjective are swapped on output, as is usual for Spanish. In the additional rules they are not swapped, which also occurs in Spanish and is known to depend on the lexical patterns involved.&lt;br /&gt;
&lt;br /&gt;
== Weights file format ==&lt;br /&gt;
https://svn.code.sf.net/p/apertium/svn/branches/weighted-transfer/apertium/apertium/transfer-weights.dtd&lt;br /&gt;
tbd: some explanation&lt;br /&gt;
&lt;br /&gt;
== Transfer module ==&lt;br /&gt;
The code is located at https://svn.code.sf.net/p/apertium/svn/branches/weighted-transfer/apertium/&lt;br /&gt;
This version of the transfer module understands fully lexicalized patterns (i.e., patterns in which every item has a lemma and a full set of tags) as well as partially delexicalized patterns (i.e., patterns in which some tokens lack lemmas while retaining the full set of tags). However, it does not support wildcards in tags, only full tag patterns.&lt;br /&gt;
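&lt;br /&gt;
The difference between the two kinds of patterns can be sketched in Python as follows. The representation of tokens and pattern items as (lemma, tags) pairs, with None standing for a missing lemma, is an assumption made for this illustration only and does not reflect the actual data structures of the transfer module.&lt;br /&gt;

```python
# Hypothetical sketch: the (lemma, tags) representation is an assumption.

def item_matches(pattern_item, token):
    """Match one pattern item against one token.

    Tags must be identical (no wildcards). A pattern lemma of None
    marks a delexicalized item that accepts any lemma.
    """
    pattern_lemma, pattern_tags = pattern_item
    lemma, tags = token
    if pattern_tags != tags:
        return False
    return pattern_lemma is None or pattern_lemma == lemma


def pattern_matches(pattern, tokens):
    """Match a whole pattern against a token sequence of the same length."""
    if len(pattern) != len(tokens):
        return False
    return all(item_matches(item, token) for item, token in zip(pattern, tokens))


tokens = [("new", ("adj",)), ("software", ("n", "sg"))]
fully_lexicalized = [("new", ("adj",)), ("software", ("n", "sg"))]
partially_delexicalized = [("new", ("adj",)), (None, ("n", "sg"))]

print(pattern_matches(fully_lexicalized, tokens))
print(pattern_matches(partially_delexicalized, tokens))
```

Here both patterns match the example tokens, but only the partially delexicalized one would also match a different singular noun after the same adjective.&lt;br /&gt;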
&lt;br /&gt;
== Weights learning script ==&lt;br /&gt;
A Python 3 script was written to learn rule weights from a corpus. Its source code is located at https://svn.code.sf.net/p/apertium/svn/branches/weighted-transfer/apertium-weights-learner/ It works in two modes: monolingual and parallel.&lt;br /&gt;
&lt;br /&gt;
In monolingual mode, it requires a t1x file with ambiguous rules, a source language corpus, and a pretrained target language model. The language model may be trained on a target language corpus that is unrelated to the source language corpus. A number of simple helper scripts in the [https://svn.code.sf.net/p/apertium/svn/branches/weighted-transfer/apertium-weights-learner/ tools] folder help to prepare the language model, and instructions are given in the [https://svn.code.sf.net/p/apertium/svn/branches/weighted-transfer/apertium-weights-learner/README.md README] file. The workflow of the script in monolingual mode is as follows:&lt;br /&gt;
# Tag source language corpus.&lt;br /&gt;
# For each sentence, calculate its LRLM coverage by the transfer rules.&lt;br /&gt;
# If there are any ambiguous chunks in the coverage, segment the sentence into parts containing one ambiguous chunk each.&lt;br /&gt;
## For each sentence segment, translate it in the default way.&lt;br /&gt;
## For each sentence segment, translate it in all possible ways and concatenate each variant with the default translation of the other segments. Store the results.&lt;br /&gt;
# Score all variants of all sentences against the language model and normalize the scores for the variants of each sentence obtained for the same segment. Store them as scores for the corresponding ambiguous chunk patterns.&lt;br /&gt;
# Sum up the scores for each ambiguous chunk pattern and generate the weights XML.&lt;br /&gt;
# Prune the weights XML.&lt;br /&gt;
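&lt;br /&gt;
The scoring and summing steps of the monolingual workflow can be sketched in Python as below. This is a minimal illustration under stated assumptions: the language model scores are given as ready-made non-negative numbers, normalization is taken to mean dividing by the sum of the scores of the competing variants, and all names are hypothetical. The actual script may normalize differently; see the README for details.&lt;br /&gt;

```python
from collections import defaultdict

# Hypothetical sketch: function names, data layout and the
# divide-by-sum normalization are assumptions.

def normalize(scores):
    """Scale non-negative scores so that they sum to 1."""
    total = sum(scores.values())
    if total == 0:
        return {rule: 0.0 for rule in scores}
    return {rule: score / total for rule, score in scores.items()}


def accumulate_weights(segments):
    """Sum normalized per-segment scores for each pattern and rule.

    segments: iterable of (pattern, {rule id: language model score})
        pairs, one per ambiguous segment found in the corpus.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for pattern, variant_scores in segments:
        for rule, share in normalize(variant_scores).items():
            totals[pattern][rule] += share
    return {pattern: dict(rules) for pattern, rules in totals.items()}


segments = [
    ("pattern-a", {"rule-1": 3.0, "rule-2": 1.0}),
    ("pattern-a", {"rule-1": 1.0, "rule-2": 1.0}),
    ("pattern-b", {"rule-1": 0.0, "rule-2": 2.0}),
]
weights = accumulate_weights(segments)
print(weights)
```

Normalizing within each segment before summing keeps one sentence with unusually large language model scores from dominating the weight of a pattern.&lt;br /&gt;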
&lt;br /&gt;
In parallel mode, the weights learning script requires a parallel corpus stored in two text files that match line by line. The workflow of the script in parallel mode is as follows:&lt;br /&gt;
# Tag source language corpus.&lt;br /&gt;
# For each sentence, calculate its LRLM coverage by the transfer rules.&lt;br /&gt;
# If there are any ambiguous chunks in the coverage, translate them and look them up in the corresponding target language sentence. If the translation is found, score the chunk pattern with 1.&lt;br /&gt;
# Sum up the scores for each ambiguous chunk pattern and generate the weights XML.&lt;br /&gt;
# Prune the weights XML.&lt;br /&gt;
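&lt;br /&gt;
The lookup and scoring steps of the parallel workflow can be sketched in Python as follows. The chunk translations are supplied as plain strings rather than produced by Apertium, the lookup is assumed to be a simple substring test, and all names and example data are hypothetical.&lt;br /&gt;

```python
from collections import Counter

# Hypothetical sketch: names, data layout and the substring lookup
# are assumptions made for illustration.

def score_parallel(examples):
    """Count how often each rule's output occurs in the reference sentence.

    examples: iterable of
        (pattern, {rule id: translated chunk}, target language sentence).
    Each hit adds a score of 1 to the (pattern, rule id) pair.
    """
    counts = Counter()
    for pattern, translations, target_sentence in examples:
        for rule, translated in translations.items():
            if translated in target_sentence:
                counts[(pattern, rule)] += 1
    return counts


examples = [
    ("white/adj house/n",
     {"swap": "casa blanca", "keep": "blanca casa"},
     "la casa blanca es grande"),
    ("white/adj house/n",
     {"swap": "casa blanca", "keep": "blanca casa"},
     "vive en una casa blanca"),
    ("great/adj man/n",
     {"swap": "hombre grande", "keep": "gran hombre"},
     "es un gran hombre"),
]
counts = score_parallel(examples)
print(counts)
```

The resulting counts play the role of the summed scores from which the weights file is generated.&lt;br /&gt;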
&lt;br /&gt;
More information can be found in the [https://svn.code.sf.net/p/apertium/svn/branches/weighted-transfer/apertium-weights-learner/README.md README] file.&lt;br /&gt;
&lt;br /&gt;
For now, the weights learning script can only learn fully lexicalized patterns, i.e., patterns in which every item has a lemma and a full set of tags. However, partially delexicalized patterns (i.e., with some tokens lacking lemmas while still retaining the full set of tags) can be added manually to the resulting weights file.&lt;br /&gt;
&lt;br /&gt;
== Evaluation ==&lt;br /&gt;
tbd&lt;/div&gt;</summary>
		<author><name>Nikita Medyankin</name></author>
		
	</entry>
</feed>