Bilingual dictionary discovery

This page describes a way of discovering new bilingual or multilingual dictionaries.
We already have apertium-dixtools for crossing dictionaries, but what happens if you want to make a pair where no direct crossings are available, or you want to improve the accuracy of a crossing, or you want to maximise the number of correspondences you can get?
We can try using multiple input dictionaries.
Let's say you want to make a Chuvash–Tatar dictionary, and you have:
 Chuvash–Russian
 Chuvash–Turkish
 Turkish–Russian
 Turkish–Tatar
 Russian–Tatar
You could make a graph out of these dictionaries where each node is a word in a language and each arc is a language pair, for example: http://i.imgur.com/SFOsRMv.png
You could then cluster the words using some "strongly-connected subgraph"^{[1]} algorithm, and then assume that the sets of words within a strongly-connected subgraph are translations of each other. That means you could get кил–йорт without having any direct correspondence.
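As a minimal sketch of the clustering step (assuming each dictionary entry is read as a directed arc from source word to target word; the words and arcs below are invented for illustration), strongly-connected components can be found with Kosaraju's two-pass algorithm:

```python
from collections import defaultdict

def strongly_connected_components(arcs):
    """Kosaraju's algorithm: one DFS to get a finish order, then a DFS on
    the reversed graph in reverse finish order; each tree is one component."""
    graph, rev = defaultdict(list), defaultdict(list)
    nodes = set()
    for u, v in arcs:
        graph[u].append(v)
        rev[v].append(u)
        nodes.update((u, v))

    order, seen = [], set()

    def dfs_order(u):
        seen.add(u)
        for v in graph[u]:
            if v not in seen:
                dfs_order(v)
        order.append(u)

    for u in nodes:
        if u not in seen:
            dfs_order(u)

    component = {}

    def dfs_assign(u, root):
        component[u] = root
        for v in rev[u]:
            if v not in component:
                dfs_assign(v, root)

    for u in reversed(order):
        if u not in component:
            dfs_assign(u, u)

    groups = defaultdict(set)
    for u, root in component.items():
        groups[root].add(u)
    return list(groups.values())

# Invented toy arcs: кил (chv), ev (tur), дом (rus), йорт (tat).
arcs = [
    ("кил", "дом"), ("кил", "ev"),    # chv-rus, chv-tur
    ("ev", "дом"), ("дом", "ev"),     # tur-rus, rus-tur
    ("ev", "йорт"), ("дом", "йорт"),  # tur-tat, rus-tat
    ("йорт", "дом"),                  # tat-rus
]
print(strongly_connected_components(arcs))
# ev, дом and йорт cluster together; кил stays alone until some arc
# back into it exists.
```

Re-running with a hypothesised ("йорт", "кил") arc merges all four words into one component of size 4.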
vs crossdics
кил and йорт are not strongly connected to each other, but hypothesising an arc between them would make them a strongly-connected subgraph along with ev and дом. The size of the strongly-connected subgraph (here: 4) could be an indicator of the strength of the association, but strongly-connected subgraphs might be too hard a requirement.
You could still get кил–йорт through crossdics. If crossdics gives the subgraphs of size 3 (where one arc is hypothesised), then the intersection of two runs of crossdics (chv-rus-tat and chv-tur-tat) doesn't necessarily give the subgraphs of size 4 – that would require the rus-tur connection as well, while the crossdics intersection doesn't require it.
Two things we wouldn't get from crossdics:
 the fact that even with one arc missing, the association is still stronger than the simple chv-rus-tat crossdics (due to the extra route via tur),
 the possibility of adding translations where both crossings would have lacunae, but double-crossing shows a translation:
 possibly a bad translation, but if there are no shorter paths for either word, it might be worth it
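The comparison with crossdics-style pivot crossing can be sketched as plain set operations (the tiny dictionaries below are invented, and `cross` merely stands in for what crossdics does through one pivot language):

```python
def cross(d1, d2):
    """Cross two bilingual dictionaries through their shared pivot:
    (a, pivot) in d1 plus (pivot, c) in d2 yields the hypothesised pair (a, c)."""
    return {(a, c) for a, b in d1 for b2, c in d2 if b == b2}

# Invented toy entries.
chv_rus = {("кил", "дом")}
rus_tat = {("дом", "йорт")}
chv_tur = {("кил", "ev")}
tur_tat = {("ev", "йорт")}

via_rus = cross(chv_rus, rus_tat)  # chv-rus-tat crossing
via_tur = cross(chv_tur, tur_tat)  # chv-tur-tat crossing
both = via_rus & via_tur           # intersection of the two runs
# both contains ("кил", "йорт"): supported by two routes, but nothing here
# ever checks the rus-tur arc that a size-4 strongly-connected subgraph needs.
```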
Restrictions on subgraphs:
 Only one word per input language
 Prune words with only a single output arc.
 Only accept words where there is a cycle(?)
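A minimal sketch of how the first two restrictions might be checked (the (language, word) cluster representation is an assumption, not a fixed design; the cycle restriction amounts to keeping only words inside a strongly-connected subgraph):

```python
from collections import Counter, defaultdict

def one_word_per_language(cluster):
    """Reject clusters containing two words from the same input language.
    cluster: iterable of (language, word) pairs."""
    counts = Counter(lang for lang, _ in cluster)
    return all(n == 1 for n in counts.values())

def prune_single_arc_words(arcs):
    """Drop the arcs of words that have only a single outgoing arc."""
    out_degree = defaultdict(int)
    for u, _ in arcs:
        out_degree[u] += 1
    return [(u, v) for u, v in arcs if out_degree[u] > 1]

print(one_word_per_language([("chv", "кил"), ("tat", "йорт")]))  # accepted
print(one_word_per_language([("rus", "дом"), ("rus", "изба")]))  # rejected
```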
Some ideas:
 Weighting
 Outgoing arcs get 1/number of arcs?
 Using more monolingual data, e.g. each word gets an SL concordance/context vector.
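The 1/number-of-arcs weighting could be sketched like this (scoring a path by the product of its arc weights is one possible interpretation, not something the page specifies):

```python
from collections import defaultdict

def arc_weights(arcs):
    """Give each outgoing arc of a word the weight 1/out-degree, so a word
    with many candidate translations spreads its evidence thinly."""
    out = defaultdict(list)
    for u, v in arcs:
        out[u].append(v)
    return {(u, v): 1.0 / len(vs) for u, vs in out.items() for v in vs}

arcs = [("кил", "дом"), ("кил", "ev"), ("ev", "йорт")]
w = arc_weights(arcs)
print(w[("кил", "дом")])  # 0.5: кил has two outgoing arcs
print(w[("ev", "йорт")])  # 1.0: ev has only one

# One assumed way to score a hypothesised pair: multiply weights along a path.
path_score = w[("кил", "ev")] * w[("ev", "йорт")]  # 0.5
```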
Notes
Further reading
 Bilingual Dictionary Induction as an Optimisation Problem
 Compiling a Massive, Multilingual Dictionary via Probabilistic Inference
 Terminology-driven Augmentation of Bilingual Terminologies