Ideas for Google Summer of Code/Weighted transfer rules

Example

Transfer rules:

ID	Rule	Input	Output
1	$x$ de $y$ → $x$ $y$	memoria de traducción	translation memory
2	$x$ de $y$ → $y$ 's $x$	la hermana de mi novia	my girlfriend's sister
3	$x$ de $y$ → $x$ of $y$	el estado de la cuestión	the state of the art

Take a big corpus.
For each sentence:
- Apply the transfer rules.
- For each possible combination of transfer rules:
  - Translate the sentence and score the translation with a language model.
  - Each sentence gets a total count of 1. This count is shared between the transfer rules according to the normalised language model scores.
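
The steps above can be sketched roughly as follows. This is only an illustration: `translate` and `lm_score` are hypothetical callables standing in for the transfer module and the language model, and `n_sites` (the number of places in the sentence where the rules conflict) is assumed to be known in advance.

```python
from itertools import product

RULE_IDS = (1, 2, 3)  # the three "de" rules from the table above


def rule_combinations(n_sites, rule_ids=RULE_IDS):
    """Return every assignment of one rule to each ambiguous site."""
    return product(rule_ids, repeat=n_sites)


def score_combinations(sentence, n_sites, translate, lm_score):
    """Translate the sentence once per rule combination and score each output.

    translate(sentence, combo) and lm_score(text) are hypothetical
    callables supplied by the caller; they are not part of any real API.
    """
    results = []
    for combo in rule_combinations(n_sites):
        output = translate(sentence, combo)
        results.append((combo, output, lm_score(output)))
    return results
```

With three rules and two ambiguous sites this gives 3 × 3 = 9 candidate translations, so the number of combinations grows exponentially with the number of conflicting sites.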

Rules	Translation	LM score	Share of count (%)
	La canciller se reúne hoy con el presidente de EE UU para limar asperezas y preparar la cumbre del miércoles con Putin.
1 1	The chancellor gathers today with [the U.S. president] for mend fences and prepare [the Wednesday summit] with Putin.	-74.55	0.39
2 1	The chancellor gathers today with [the U.S.'s president] for mend fences and prepare [the Wednesday summit] with Putin.	-69.51	60.71
3 1	The chancellor gathers today with [the president of the U.S.] for mend fences and prepare [the Wednesday summit] with Putin.	-74.47	0.43
1 2	The chancellor gathers today with [the U.S. president] for mend fences and prepare [the Wednesday's summit] with Putin.	-75.02	0.25
2 2	The chancellor gathers today with [the U.S.'s president] for mend fences and prepare [the Wednesday's summit] with Putin.	-69.98	37.94
3 2	The chancellor gathers today with [the president of the U.S.] for mend fences and prepare [the Wednesday's summit] with Putin.	-74.94	0.27
1 3	The chancellor gathers today with [the U.S. president] for mend fences and prepare [the summit of the Wednesday] with Putin.	-82.88	0.0
2 3	The chancellor gathers today with [the U.S.'s president] for mend fences and prepare [the summit of the Wednesday] with Putin.	-77.84	0.01
3 3	The chancellor gathers today with [the president of the U.S.] for mend fences and prepare [the summit of the Wednesday] with Putin.	-82.80	0.0
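
The last column of the table can be reproduced from the scores, assuming they are natural-log probabilities and that the shares are obtained by exponentiating and normalising over the nine combinations (a softmax). The sketch below makes that assumption explicit. The per-rule summation in `counts_per_rule` is one possible reading of "shared between the transfer rules", not something the page specifies.

```python
import math

# Language model scores copied from the table above, keyed by
# (rule used for the first "de", rule used for the second "de").
# Assumption: the scores are natural-log probabilities.
LM_SCORES = {
    (1, 1): -74.55, (2, 1): -69.51, (3, 1): -74.47,
    (1, 2): -75.02, (2, 2): -69.98, (3, 2): -74.94,
    (1, 3): -82.88, (2, 3): -77.84, (3, 3): -82.80,
}


def fractional_counts(scores):
    """Softmax over log scores, so the sentence contributes a total count of 1."""
    top = max(scores.values())  # subtract the maximum for numerical stability
    weights = {combo: math.exp(s - top) for combo, s in scores.items()}
    total = sum(weights.values())
    return {combo: w / total for combo, w in weights.items()}


def counts_per_rule(fractions, site):
    """Sum the fractional counts of all combinations using each rule at one site."""
    per_rule = {}
    for combo, frac in fractions.items():
        per_rule[combo[site]] = per_rule.get(combo[site], 0.0) + frac
    return per_rule


fractions = fractional_counts(LM_SCORES)
for combo, frac in fractions.items():
    print(combo, f"{100 * frac:.2f}")
```

Under this reading, rule 2 collects almost all of the count at the first site, because the two highest-scoring translations both use it there.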

You can then feed the fractional counts to some supervised machine learning program to get appropriate weights.

How to calculate the paths?
- Either with optimal coverage, or by taking just the LRLM (left-to-right longest match) path and only calculating paths for rules that conflict.
For lexicalised weights:
- What is the function assigning a cost to each lexical combination of N1 and N2?
Could we score one rule at a time, by keeping the rest fixed?

ID	Rule	Input	Output	Frequency
1	$x$ de $y$ → $x$ $y$	memoria de traducción	translation memory	90
2	$x$ de $y$ → $y$ 's $x$	memoria de traducción	translation's memory	0
3	$x$ de $y$ → $x$ of $y$	memoria de traducción	memory of translation	0
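
One simple way to turn such frequencies into lexicalised weights is to use relative frequencies per lexical pair. The sketch below is only an assumption about how the table might be used. The function name and the uniform fallback for unseen pairs are illustrative choices, not part of the proposal.

```python
# Frequencies copied from the table above for "memoria de traducción".
FREQUENCIES = {1: 90, 2: 0, 3: 0}


def relative_weights(frequencies):
    """Turn per-rule frequencies for one lexical pair into weights summing to 1."""
    total = sum(frequencies.values())
    if total == 0:
        # No evidence for this lexical pair: fall back to uniform weights.
        return {rule: 1.0 / len(frequencies) for rule in frequencies}
    return {rule: freq / total for rule, freq in frequencies.items()}


print(relative_weights(FREQUENCIES))  # {1: 1.0, 2: 0.0, 3: 0.0}
```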