# Ideas for Google Summer of Code/Command-line translation memory fuzzy-match repair

## Fuzzy matching in translation memories

Imagine that the new source sentence to be translated is:

s’ = “Connect the printer to the computer”

And that the translation memory contains a fuzzy match (score 83%), a source sentence s together with its translation t:

s = “Connect the scanner to the computer” t = “Connecteu l’escàner a l’ordinador”
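The 83% score is consistent with a common word-level definition of the fuzzy-match score, 1 - ED(s’, s) / max(|s’|, |s|), where ED is the word-level edit distance and |·| counts words: one of six words differs, so the score is 1 - 1/6 ≈ 83%. The sketch below assumes that definition; real translation-memory tools may tokenize, normalize or weight differently.

```python
def edit_distance(a: list[str], b: list[str]) -> int:
    """Word-level Levenshtein distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        cur = [i]
        for j, wb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete wa
                           cur[j - 1] + 1,             # insert wb
                           prev[j - 1] + (wa != wb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def fuzzy_match_score(new: str, old: str) -> float:
    """Assumed score: 1 - edit distance / length of the longer sentence (in words)."""
    a, b = new.lower().split(), old.lower().split()
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


s_new = "Connect the printer to the computer"  # s’
s_old = "Connect the scanner to the computer"  # s
print(f"{fuzzy_match_score(s_new, s_old):.0%}")  # 83%
```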

What would t’, the translation of s’, be?

## Obtain t’ by “patching” or “repairing” t

One way to repair t would be to:

• determine what changed from s to s’
• decide which parts of t correspond to changed parts in s
• translate what changed from s to s’
• change the corresponding parts in t to obtain one or more approximate t’

## Determine what changed from s to s’

First, we align s’ with s:

s’ = “Connect the printer to the computer” s = “Connect the scanner to the computer”

and find what changed: “scanner” in s has been replaced by “printer” in s’.
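One possible way to do this alignment (an assumption; the idea does not prescribe an aligner) is Python's standard `difflib.SequenceMatcher` run over word lists: `get_opcodes()` returns `(tag, i1, i2, j1, j2)` tuples marking which word ranges are equal and which differ. The sketch converts the non-equal ranges to 1-based, inclusive word positions.

```python
from difflib import SequenceMatcher

s_old = "Connect the scanner to the computer".split()  # s
s_new = "Connect the printer to the computer".split()  # s’


def changed_regions(old: list[str], new: list[str]) -> list[tuple[int, int, int, int]]:
    """Return (old_start, old_end, new_start, new_end) for each non-equal region,
    as 1-based inclusive word positions (an empty side has end = start - 1)."""
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    regions = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # "replace", "delete" or "insert"
            regions.append((i1 + 1, i2, j1 + 1, j2))
    return regions


print(changed_regions(s_old, s_new))  # [(3, 3, 3, 3)]: “scanner” → “printer”
```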

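Then we cut “clippings” around the change, that is, segments of s that contain the change plus some unchanged context, each paired with the corresponding segment of s’:

• “the scanner” → “the printer”
• “the scanner to” → “the printer to”
• “scanner to” → “printer to”
• “connect the scanner” → “connect the printer”
• “scanner to the computer” → “printer to the computer”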
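A minimal sketch of clipping, assuming the change found above (word 3, “scanner” → “printer”): enumerate every span of s that covers the change plus `left` words of left context and `right` words of right context, keep those with at least one context word and at most `max_len` words (a hypothetical limit), and cut the s’ clipping with the same context. This yields a superset of the clippings listed above.

```python
s_old = "Connect the scanner to the computer".split()  # s
s_new = "Connect the printer to the computer".split()  # s’
region = (3, 3, 3, 3)  # changed words in s and s’ (1-based, inclusive)


def clippings(old, new, region, max_len=4):
    """Yield ((start, end), s_clipping, s’_clipping) for each clipping of s that
    covers the changed region with `left` words of left and `right` words of
    right context; at least one context word, at most max_len words in s."""
    old_start, old_end, new_start, new_end = region
    for left in range(old_start):                     # 0 .. words before the change
        for right in range(len(old) - old_end + 1):   # 0 .. words after the change
            start, end = old_start - left, old_end + right
            if left + right == 0 or end - start + 1 > max_len:
                continue
            s_clip = " ".join(old[start - 1:end])
            s_new_clip = " ".join(new[new_start - 1 - left:new_end + right])
            yield (start, end), s_clip, s_new_clip


for span, s_clip, s_new_clip in clippings(s_old, s_new, region):
    print(span, f"“{s_clip}” → “{s_new_clip}”")
```

Restricting `max_len` or requiring context on particular sides would trim this list towards the five clippings above.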

## Decide which parts of t correspond to changed parts in s

s = “Connect the scanner to the computer” t = “Connecteu l’escàner a l’ordinador”

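Translate the s clippings and look for their translations in t. The numbers in brackets are the first and last word positions (counting from 1) of each clipping in s and of its translation in t; in t, the elided article l’ counts as a separate word.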

(The translations given here were obtained with Google Translate.)

• “the scanner” [2,3] → “l’escàner” [2,3]
• “the scanner to” [2,4] → “l’escàner a” [2,4]
• “scanner to” [3,4] → “escàner a” [3,4]
• “connect the scanner” [1,3] → “connecteu l’escàner” [1,3]
• “scanner to the computer” [3,6] → “escàner a l’ordinador” [3,6]

All of them are found in t. (This may not always be the case.)
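Finding a translated clipping in t can be sketched as a case-insensitive search for a token sequence. The tokenizer is an assumption chosen to reproduce the bracketed positions: it splits on whitespace and after an apostrophe, so that the elided article l’ is a separate token.

```python
import re


def tokenize(text: str) -> list[str]:
    """Assumed tokenizer: split on whitespace and after an apostrophe, so that
    an elided article stays a separate token: "l’escàner" -> ["l’", "escàner"]."""
    return re.findall(r"[^\s’']+[’']?|[’']", text)


def locate(phrase: str, target: str):
    """Return the 1-based, inclusive (start, end) token positions of the first
    case-insensitive occurrence of phrase in target, or None if it is absent."""
    needle = [tok.lower() for tok in tokenize(phrase)]
    hay = [tok.lower() for tok in tokenize(target)]
    if not needle:
        return None
    for i in range(len(hay) - len(needle) + 1):
        if hay[i:i + len(needle)] == needle:
            return (i + 1, i + len(needle))
    return None


t = "Connecteu l’escàner a l’ordinador"
for clip in ["l’escàner", "l’escàner a", "escàner a",
             "connecteu l’escàner", "escàner a l’ordinador"]:
    print(f"“{clip}”", locate(clip, t))
```

With this tokenizer, the printed positions agree with the ones listed above.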

## Translate what changed from s to s’

Translate the s’ clippings, which differ from the corresponding s clippings:

• “the printer” → “la impressora”
• “the printer to” → “la impressora a”
• “printer to” → “impressora per”
• “connect the printer” → “connecteu la impressora”
• “printer to the computer” → “impressora a l’ordinador”

Pair the translation of each s clipping with the translation of the corresponding s’ clipping to build “repair operators”, each of which replaces a span of t with new text:

• (a) “l’escàner” [2,3] → “la impressora”
• (b) “l’escàner a” [2,4] → “la impressora a”
• (c) “escàner a” [3,4] → “impressora per”
• (d) “connecteu l’escàner” [1,3] → “connecteu la impressora”
• (e) “escàner a l’ordinador” [3,6] → “impressora a l’ordinador”

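The unchanged words that an operator carries along with the change (for example “a” in “l’escàner a”, or “connecteu” in “connecteu l’escàner”) are its overlap, i.e. context shared by t and t’. Overlap is desirable.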
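Building the operators amounts to joining three tables: the translation of each s clipping, where that translation was found in t, and the translation of the matching s’ clipping. In this sketch a dictionary copied from the lists above stands in for the machine translation system, and the names `RepairOperator`, `mt` and `positions_in_t` are hypothetical.

```python
from typing import NamedTuple


class RepairOperator(NamedTuple):
    start: int  # first word of the span of t to replace (1-based)
    end: int    # last word of that span (inclusive)
    old: str    # translation of the s clipping, as found in t
    new: str    # translation of the corresponding s’ clipping


# (s clipping, s’ clipping) pairs from the clipping step.
clipping_pairs = [
    ("the scanner", "the printer"),
    ("the scanner to", "the printer to"),
    ("scanner to", "printer to"),
    ("connect the scanner", "connect the printer"),
    ("scanner to the computer", "printer to the computer"),
]

# Stand-in for a machine translation system: the translations listed above.
mt = {
    "the scanner": "l’escàner", "the printer": "la impressora",
    "the scanner to": "l’escàner a", "the printer to": "la impressora a",
    "scanner to": "escàner a", "printer to": "impressora per",
    "connect the scanner": "connecteu l’escàner",
    "connect the printer": "connecteu la impressora",
    "scanner to the computer": "escàner a l’ordinador",
    "printer to the computer": "impressora a l’ordinador",
}

# Where each s-clipping translation was found in t (positions listed above).
positions_in_t = {
    "l’escàner": (2, 3), "l’escàner a": (2, 4), "escàner a": (3, 4),
    "connecteu l’escàner": (1, 3), "escàner a l’ordinador": (3, 6),
}


def build_operators(pairs, translate, positions):
    """One operator per clipping pair whose s-side translation was found in t."""
    operators = []
    for s_clip, s_new_clip in pairs:
        t_clip = translate[s_clip]
        span = positions.get(t_clip)
        if span is None:  # translation not found in t: no operator for this pair
            continue
        operators.append(RepairOperator(span[0], span[1], t_clip, translate[s_new_clip]))
    return operators


for op in build_operators(clipping_pairs, mt, positions_in_t):
    print(op)
```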

## Change the corresponding parts in t to obtain one or more approximate t’

t = “Connecteu l’escàner a l’ordinador”

• t’(a) = “Connecteu la impressora a l’ordinador”
• t’(b) = “Connecteu la impressora a l’ordinador”
• t’(c) = “Connecteu l’impressora per l’ordinador”
• t’(d) = “Connecteu la impressora a l’ordinador”
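• t’(e) = “Connecteu l’impressora a l’ordinador”

Repairs (a), (b) and (d) give a correct translation. Repairs (c) and (e) keep the elided article l’ from t, which is wrong before “impressora” (the correct form is “la impressora”), and (c) also brings in the wrong preposition “per”.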
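Applying an operator replaces words start..end of t with the words of its replacement. This sketch assumes an apostrophe-aware tokenizer that treats l’ as a separate word (consistent with the positions used above) and a hypothetical detokenizer that reattaches l’ to the next word and capitalizes the first letter; it reproduces t’(a) to t’(e).

```python
import re


def tokenize(text: str) -> list[str]:
    """Split on whitespace and after an apostrophe: "l’escàner" -> ["l’", "escàner"]."""
    return re.findall(r"[^\s’']+[’']?|[’']", text)


def detokenize(tokens: list[str]) -> str:
    """Join tokens with spaces, except after an elided article such as l’,
    and capitalize the first letter of the sentence."""
    out = ""
    for tok in tokens:
        if out and not out.endswith(("’", "'")):
            out += " "
        out += tok
    return out[:1].upper() + out[1:]


def apply_operator(t: str, start: int, end: int, replacement: str) -> str:
    """Replace words start..end (1-based, inclusive) of t with replacement."""
    tokens = tokenize(t)
    return detokenize(tokens[:start - 1] + tokenize(replacement) + tokens[end:])


t = "Connecteu l’escàner a l’ordinador"
operators = {
    "a": (2, 3, "la impressora"),
    "b": (2, 4, "la impressora a"),
    "c": (3, 4, "impressora per"),
    "d": (1, 3, "connecteu la impressora"),
    "e": (3, 6, "impressora a l’ordinador"),
}
for name, (start, end, repl) in operators.items():
    print(f"t’({name}) = {apply_operator(t, start, end, repl)}")
```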

## Which are the best repairs?

The best repairs would probably come from:

• the longest possible repair operators (even longer than in the example above)
• those with the most overlap (context)
• those with overlap on both sides of the change
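These criteria could be combined into a sort key. In this sketch each operator is described by the span of its s clipping (here equal to its span in t), context is counted as the unchanged words on each side of the change (word 3), and operators are ranked first by having context on both sides, then by total context, then by length. The order of the criteria is an assumption, not something the idea prescribes.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str   # label of the operator, as in t’(a) ... t’(e)
    start: int  # first word of the s clipping (1-based)
    end: int    # last word of the s clipping (inclusive)


def sort_key(c: Candidate, change_start: int, change_end: int):
    left = change_start - c.start   # unchanged words before the change
    right = c.end - change_end      # unchanged words after the change
    both_sides = left > 0 and right > 0
    return (both_sides, left + right, c.end - c.start + 1)


candidates = [
    Candidate("a", 2, 3), Candidate("b", 2, 4), Candidate("c", 3, 4),
    Candidate("d", 1, 3), Candidate("e", 3, 6),
]
# Highest key first; Python's sort is stable, so ties keep their input order.
ranked = sorted(candidates, key=lambda c: sort_key(c, 3, 3), reverse=True)
print([c.name for c in ranked])  # ['b', 'e', 'd', 'a', 'c']
```

Operator (b), the only one with context on both sides, ranks first; a practical ranking would need more evidence, since (e) ranks second here although t’(e) is wrong.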

## When should repairs be used?

Only for high fuzzy-match scores. The minimum score could be a parameter passed when calling the tool.
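For a command-line tool, that threshold could be an option. A minimal sketch with Python's `argparse`; the option name `--min-score` and its default of 0.8 are hypothetical, not part of an existing tool.

```python
import argparse


def parse_args(argv=None):
    """Parse the (hypothetical) options of a fuzzy-match repair command."""
    parser = argparse.ArgumentParser(description="Fuzzy-match repair (sketch)")
    parser.add_argument("--min-score", type=float, default=0.8,
                        help="only repair matches with at least this score (0 to 1)")
    args = parser.parse_args(argv)
    if not 0.0 <= args.min_score <= 1.0:
        parser.error("--min-score must be between 0 and 1")
    return args


def should_repair(score: float, min_score: float) -> bool:
    """Repair only when the fuzzy-match score reaches the threshold."""
    return score >= min_score


args = parse_args(["--min-score", "0.75"])
print(should_repair(5 / 6, args.min_score))  # True: 83% >= 75%
```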