Ideas for Google Summer of Code/Weighted transfer rules

Example

Transfer rules:

ID	Rule	Input	Output
1	$x$ de $y$ → $x$ $y$	memoria de traducción	translation memory
2	$x$ de $y$ → $y$ 's $x$	la hermana de mi novia	my girlfriend's sister
3	$x$ de $y$ → $x$ of $y$	el estado de la cuestión	the state of the art

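Take a big corpus.
For each sentence:
- Apply the transfer rules
- For each possible combination of transfer rules:
  - Translate the sentence and score the translation with a language model
- Each sentence gets a total count of 1. This count is shared between the rule combinations, and so between the transfer rules they use, according to the language model scores.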

Rules	Translation	LM score	Fractional count (%)
	La canciller se reúne hoy con el presidente de EE UU para limar asperezas y preparar la cumbre del miércoles con Putin.
1 1	The chancellor gathers today with [the U.S. president] for mend fences and prepare [the Wednesday summit] with Putin.	-74.55	0.39
2 1	The chancellor gathers today with [the U.S.'s president] for mend fences and prepare [the Wednesday summit] with Putin.	-69.51	60.71
3 1	The chancellor gathers today with [the president of the U.S.] for mend fences and prepare [the Wednesday summit] with Putin.	-74.47	0.43
1 2	The chancellor gathers today with [the U.S. president] for mend fences and prepare [the Wednesday's summit] with Putin.	-75.02	0.25
2 2	The chancellor gathers today with [the U.S.'s president] for mend fences and prepare [the Wednesday's summit] with Putin.	-69.98	37.94
3 2	The chancellor gathers today with [the president of the U.S.] for mend fences and prepare [the Wednesday's summit] with Putin.	-74.94	0.27
1 3	The chancellor gathers today with [the U.S. president] for mend fences and prepare [the summit of the Wednesday] with Putin.	-82.88	0.0
2 3	The chancellor gathers today with [the U.S.'s president] for mend fences and prepare [the summit of the Wednesday] with Putin.	-77.84	0.01
3 3	The chancellor gathers today with [the president of the U.S.] for mend fences and prepare [the summit of the Wednesday] with Putin.	-82.80	0.0
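
The percentages in the last column of the table above are consistent with a softmax over the language model scores, treated as natural-log probabilities. Below is a minimal Python sketch under that assumption. The names `fractional_counts` and `rule_counts` are illustrative and do not come from any existing tool; the scores are copied from the table.

```python
import math
from collections import defaultdict

# ((rule for the 1st "de" phrase, rule for the 2nd), LM score) from the table.
scored = [
    ((1, 1), -74.55), ((2, 1), -69.51), ((3, 1), -74.47),
    ((1, 2), -75.02), ((2, 2), -69.98), ((3, 2), -74.94),
    ((1, 3), -82.88), ((2, 3), -77.84), ((3, 3), -82.80),
]

def fractional_counts(scored):
    """Split a count of 1 across rule combinations with a softmax.

    Assumption: the scores are natural-log probabilities.
    """
    best = max(score for _, score in scored)
    # Subtract the best score before exponentiating to avoid underflow.
    weights = [math.exp(score - best) for _, score in scored]
    total = sum(weights)
    return [(combo, w / total) for (combo, _), w in zip(scored, weights)]

def rule_counts(fractions):
    """Sum the shares per (position in the sentence, rule id)."""
    counts = defaultdict(float)
    for combo, share in fractions:
        for position, rule in enumerate(combo):
            counts[(position, rule)] += share
    return dict(counts)

fractions = fractional_counts(scored)
for combo, share in fractions:
    print(combo, f"{100 * share:.2f}")
for key, count in sorted(rule_counts(fractions).items()):
    print(key, f"{count:.4f}")
```

Under this reading, rule 2 receives roughly 0.99 of the count at the first "de" phrase, while at the second phrase rule 1 receives roughly 0.62 and rule 2 roughly 0.38.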

You can then feed the fractional counts to some supervised machine learning program to get appropriate weights.

How to calculate the paths?
- With optimal coverage, or by taking just the left-to-right longest match (LRLM) and calculating paths only for rules which conflict?
For lexicalised weights:
- What is the function assigning a cost to each lexical combination of N1 and N2?
Could we score one rule at a time, keeping the others fixed?

Write a program (in Python or C++) that reads the patterns from the XML transfer format and applies them to an input stream, printing out all the possible coverages under left-to-right longest match (so a "det" rule and a "noun" rule won't match "det noun" input if there are "det noun" rules).
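
As a rough illustration of the left-to-right longest match case, the Python sketch below works on a plain list of tags rather than the real XML format. The rule ids, the `RULES` list and the function `lrlm_coverages` are hypothetical, and passing unmatched tokens through one at a time is an assumption of the sketch. Several rules may share the longest pattern, which is exactly where weights would be needed, so each such segment contributes alternatives.

```python
from itertools import product

# Hypothetical rules: (rule id, pattern as a tuple of tags).
RULES = [
    ("det", ("det",)),
    ("noun", ("noun",)),
    ("det-noun-a", ("det", "noun")),
    ("det-noun-b", ("det", "noun")),
]

def lrlm_coverages(tags, rules):
    """Yield every coverage allowed by left-to-right longest match.

    Each coverage is a list of (rule id, number of tokens covered).
    """
    segments = []  # per segment: the alternative rules with the longest match
    i = 0
    while i < len(tags):
        matches = [(rid, len(pat)) for rid, pat in rules
                   if tuple(tags[i:i + len(pat)]) == pat]
        if matches:
            longest = max(length for _, length in matches)
            options = [(rid, n) for rid, n in matches if n == longest]
        else:
            # Assumption: a token no rule matches is passed through alone.
            options = [(None, 1)]
        segments.append(options)
        i += options[0][1]
    for choice in product(*segments):
        yield list(choice)

for coverage in lrlm_coverages(["det", "noun", "noun"], RULES):
    print(coverage)
```

Here the "det" rule never fires on "det noun" input, because the two longer "det noun" rules win the match; only the choice between those two remains open.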

Write a program (in Python or C++) that reads the patterns from the XML transfer format and applies them to an input stream, printing out all the possible coverages, including alternatives where a combination of shorter rules covers the same input as a longer rule (so a "det" rule and a "noun" rule will be included in the combinations even if there are "det noun" rules).
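
The exhaustive variant can be sketched as a recursive enumeration of every way to tile the input with rule matches. As before, this is only an illustration on a list of tags: the rule ids, `RULES` and `all_coverages` are hypothetical names, the XML format is not parsed, and unmatched tokens are assumed to pass through one at a time.

```python
# Hypothetical rules: (rule id, pattern as a tuple of tags).
RULES = [
    ("det", ("det",)),
    ("noun", ("noun",)),
    ("det-noun", ("det", "noun")),
]

def all_coverages(tags, rules, start=0):
    """Yield every way to cover tags[start:] with rule matches.

    Each coverage is a list of (rule id, number of tokens covered).
    """
    if start == len(tags):
        yield []
        return
    matches = [(rid, len(pat)) for rid, pat in rules
               if tuple(tags[start:start + len(pat)]) == pat]
    if not matches:
        # Assumption: a token no rule matches is passed through alone.
        matches = [(None, 1)]
    for rid, length in matches:
        for rest in all_coverages(tags, rules, start + length):
            yield [(rid, length)] + rest

for coverage in all_coverages(["det", "noun"], RULES):
    print(coverage)
```

For "det noun" this yields both the "det" plus "noun" coverage and the single "det noun" coverage. The number of coverages can grow exponentially with sentence length, so a real implementation would probably need pruning or a lattice representation.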

ID	Rule	Input	Output	Frequency
1	$x$ de $y$ → $x$ $y$	memoria de traducción	translation memory	90
2	$x$ de $y$ → $y$ 's $x$	memoria de traducción	translation's memory	0
3	$x$ de $y$ → $x$ of $y$	memoria de traducción	memory of translation	0