Bilingual dictionary discovery

vs crossdics

The words кил (chv) and йорт (tat) are not strongly connected[1] to each other, but hypothesising an arc between them would make them part of a strongly connected subgraph together with ev (tur) and дом (rus). The size of the strongly connected subgraph (here: 4) could serve as an indicator of the strength of the association, but requiring strong connectivity might be too strict.
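To make the strong-connectivity test concrete, here is a minimal sketch. The arcs are hypothetical (the original figure is not reproduced here), chosen so that кил and йорт start out in different strongly connected components and end up in one component of size 4 once a tat→chv arc is hypothesised. The "lang:word" node naming and all helper names are assumptions of this sketch. Components are computed by plain mutual reachability, which is quadratic but easy to check on small graphs; Tarjan's or Kosaraju's algorithm would be the usual choice for whole dictionaries.

```python
from typing import Dict, List, Set

# Directed translation graph: "lang:word" -> words it translates to.
Graph = Dict[str, Set[str]]

# Hypothetical arcs, for illustration only.
GRAPH: Graph = {
    "chv:кил": {"rus:дом", "tur:ev"},
    "rus:дом": {"chv:кил", "tur:ev", "tat:йорт"},
    "tur:ev": {"chv:кил", "rus:дом", "tat:йорт"},
    "tat:йорт": set(),  # no arcs out of tat in this toy graph
}


def reachable(graph: Graph, start: str) -> Set[str]:
    """Nodes reachable from start by following arcs (start included)."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, set()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def strongly_connected_components(graph: Graph) -> List[Set[str]]:
    """Group nodes that can reach each other in both directions."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    reach = {n: reachable(graph, n) for n in nodes}
    components: List[Set[str]] = []
    assigned: Set[str] = set()
    for n in sorted(nodes):
        if n not in assigned:
            component = {m for m in reach[n] if n in reach[m]}
            assigned |= component
            components.append(component)
    return components


def component_of(graph: Graph, node: str) -> Set[str]:
    """The strongly connected component containing node."""
    for component in strongly_connected_components(graph):
        if node in component:
            return component
    return {node}


def with_arc(graph: Graph, src: str, trg: str) -> Graph:
    """Copy of graph with one hypothesised arc src -> trg added."""
    new = {k: set(v) for k, v in graph.items()}
    new.setdefault(src, set()).add(trg)
    return new


before = component_of(GRAPH, "chv:кил")
after = component_of(with_arc(GRAPH, "tat:йорт", "chv:кил"), "chv:кил")
print(len(before), "tat:йорт" in before)  # 3 False
print(len(after), "tat:йорт" in after)    # 4 True
```

Here `len(after)` plays the role of the subgraph size mentioned above.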

You could still get кил--йорт through crossdics. If crossdics gives the subgraphs of size 3 (where one arc is hypothesised), then intersecting the results of two crossdics runs (chv-rus-tat and chv-tur-tat) does not necessarily give the subgraphs of size 4: those would also require the rus-tur connection, which the crossdics intersection does not.
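The difference can be sketched by modelling crossdics, very loosely, as the composition of two bilingual dictionaries given as sets of (source, target) pairs. The real crossdics tool does considerably more than this, so treat `cross` and `four_node_support` as hypothetical helpers, and the toy entries as invented. Both runs propose кил--йорт, so the pair survives the intersection, yet with an empty rus-tur dictionary there is no size-4 subgraph behind it.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]


def cross(src_piv: Set[Pair], piv_trg: Set[Pair]) -> Set[Pair]:
    """Compose two dictionaries: (s, p) and (p, t) give (s, t)."""
    return {(s, t) for (s, p) in src_piv for (p2, t) in piv_trg if p == p2}


def four_node_support(pair: Pair, chv_rus: Set[Pair], rus_tat: Set[Pair],
                      chv_tur: Set[Pair], tur_tat: Set[Pair],
                      rus_tur: Set[Pair]) -> bool:
    """True if some rus pivot and some tur pivot for the pair are also
    linked to each other in the rus-tur dictionary."""
    s, t = pair
    rus_pivots = {r for (a, r) in chv_rus if a == s and (r, t) in rus_tat}
    tur_pivots = {u for (a, u) in chv_tur if a == s and (u, t) in tur_tat}
    return any((r, u) in rus_tur for r in rus_pivots for u in tur_pivots)


# Hypothetical toy dictionaries.
chv_rus = {("кил", "дом")}
rus_tat = {("дом", "йорт")}
chv_tur = {("кил", "ev")}
tur_tat = {("ev", "йорт")}
rus_tur: Set[Pair] = set()  # no rus-tur link

both_runs = cross(chv_rus, rus_tat) & cross(chv_tur, tur_tat)
print(both_runs)  # {('кил', 'йорт')}
print(four_node_support(("кил", "йорт"), chv_rus, rus_tat,
                        chv_tur, tur_tat, rus_tur))  # False
```

Adding `("дом", "ev")` to `rus_tur` makes `four_node_support` return `True`, which is exactly the extra condition the size-4 subgraph imposes.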

Two things we wouldn't get from crossdics:

- the fact that кил--йорт, even with one arc missing, is still a stronger candidate than the simple chv-rus-tat crossdics result (due to the extra route via tur);
- the possibility of adding translations where both single crossings would have gaps, but double-crossing (a path through both pivot languages, e.g. chv-rus-tur-tat) shows a translation:
  - possibly a bad translation, but if there are no shorter paths for either word, it might be worth it.
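A minimal sketch of the second point, again treating dictionaries as sets of (source, target) pairs. It only follows the route chv→rus→tur→tat (the mirror route via tur→rus would be handled the same way), and it reads "no shorter paths for either word" as: neither the chv word nor the tat word appears in any single-crossing result. Both readings, the helper names and the placeholder words c1, r1, u1, t1, ... are assumptions for illustration.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]


def cross(src_piv: Set[Pair], piv_trg: Set[Pair]) -> Set[Pair]:
    """Compose two dictionaries: (s, p) and (p, t) give (s, t)."""
    return {(s, t) for (s, p) in src_piv for (p2, t) in piv_trg if p == p2}


def double_cross_only(chv_rus: Set[Pair], rus_tat: Set[Pair],
                      chv_tur: Set[Pair], tur_tat: Set[Pair],
                      rus_tur: Set[Pair]) -> Set[Pair]:
    """chv-tat pairs found via chv->rus->tur->tat, restricted to words
    that get no translation from either single crossing."""
    single = cross(chv_rus, rus_tat) | cross(chv_tur, tur_tat)
    double = cross(cross(chv_rus, rus_tur), tur_tat)
    covered_src = {s for (s, _) in single}
    covered_trg = {t for (_, t) in single}
    return {(s, t) for (s, t) in double
            if s not in covered_src and t not in covered_trg}


# Hypothetical toy data: c1 has gaps on both single crossings,
# c2 already has a shorter chv-rus-tat translation (c2 -> t2).
chv_rus = {("c1", "r1"), ("c2", "r2")}
rus_tat = {("r2", "t2")}
chv_tur: Set[Pair] = set()
tur_tat = {("u1", "t1")}
rus_tur = {("r1", "u1"), ("r2", "u1")}

print(double_cross_only(chv_rus, rus_tat, chv_tur, tur_tat, rus_tur))
# {('c1', 't1')}
```

The pair (c2, t1) is also reachable by double-crossing, but it is dropped because c2 already has a shorter path.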

Restrictions on sub-graphs

- Only one word per input language.
- Prune words with only a single outgoing arc.
- Only accept words that are part of a cycle (?).
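The three restrictions could be checked roughly as follows. This is a sketch under literal readings that the text does not pin down: "a single output arc" is taken as exactly one outgoing arc, pruning is a single pass (repeating it would change the degrees of the remaining words), and "a cycle" means a directed cycle through the word. The "lang:word" node naming, the helper names and the toy graph are hypothetical.

```python
from typing import Dict, Iterable, Set

# Directed translation graph: "lang:word" -> words it translates to.
Graph = Dict[str, Set[str]]


def language(node: str) -> str:
    """Language code of a "lang:word" node."""
    return node.split(":", 1)[0]


def one_word_per_language(nodes: Iterable[str]) -> bool:
    """Restriction 1: no two words from the same language."""
    langs = [language(n) for n in nodes]
    return len(langs) == len(set(langs))


def prune_single_arc(graph: Graph) -> Graph:
    """Restriction 2 (one pass): drop words with exactly one outgoing arc,
    together with the arcs pointing at them."""
    dropped = {n for n, targets in graph.items() if len(targets) == 1}
    return {n: targets - dropped
            for n, targets in graph.items() if n not in dropped}


def on_cycle(graph: Graph, node: str) -> bool:
    """Restriction 3: can node reach itself again by following arcs?"""
    stack = list(graph.get(node, set()))
    seen: Set[str] = set()
    while stack:
        current = stack.pop()
        if current == node:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(graph.get(current, set()))
    return False


# Hypothetical toy graph.
graph: Graph = {
    "chv:a": {"rus:b", "tur:c"},
    "rus:b": {"tat:d"},
    "tur:c": {"tat:d", "chv:a"},
    "tat:d": {"chv:a", "rus:b"},
}

print(one_word_per_language(graph))     # True
print(sorted(prune_single_arc(graph)))  # ['chv:a', 'tat:d', 'tur:c']
print(on_cycle(graph, "chv:a"))         # True
```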

Some ideas

- Weighting
  - Outgoing arcs get a weight of 1/(number of outgoing arcs)?
- Using more monolingual data, e.g. each word gets a source-language (SL) concordance/context vector.
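One way the 1/(number of outgoing arcs) weighting idea might be used, sketched under assumptions that are not in the text: a path's weight is the product of its arc weights, and a candidate translation scores the sum over all simple paths from the source word to the target word. Under that scoring, a second route through another pivot adds to the score, which is one way to express why a candidate with an extra route via tur is stronger. The helper names and toy graphs are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Directed translation graph: "lang:word" -> words it translates to.
Graph = Dict[str, Set[str]]


def arc_weights(graph: Graph) -> Dict[Tuple[str, str], float]:
    """Each outgoing arc of a word gets 1 / (number of outgoing arcs)."""
    return {(u, v): 1.0 / len(targets)
            for u, targets in graph.items() for v in targets}


def simple_paths(graph: Graph, src: str, trg: str) -> List[List[str]]:
    """All paths from src to trg that visit no word twice."""
    paths: List[List[str]] = []

    def walk(path: List[str]) -> None:
        node = path[-1]
        if node == trg:
            paths.append(path)
            return
        for nxt in graph.get(node, set()):
            if nxt not in path:
                walk(path + [nxt])

    walk([src])
    return paths


def score(graph: Graph, src: str, trg: str) -> float:
    """Sum over simple paths of the product of arc weights (assumed)."""
    weights = arc_weights(graph)
    total = 0.0
    for path in simple_paths(graph, src, trg):
        w = 1.0
        for u, v in zip(path, path[1:]):
            w *= weights[(u, v)]
        total += w
    return total


# Hypothetical toy graphs: one route via rus only, or routes via rus and tur.
one_route: Graph = {"chv:a": {"rus:b", "tur:c"}, "rus:b": {"tat:d"}, "tur:c": set()}
two_routes: Graph = {"chv:a": {"rus:b", "tur:c"}, "rus:b": {"tat:d"}, "tur:c": {"tat:d"}}

print(score(one_route, "chv:a", "tat:d"))   # 0.5
print(score(two_routes, "chv:a", "tat:d"))  # 1.0
```

Enumerating simple paths is exponential in the worst case, so for real dictionaries one would cap the path length, which also fits the preference for short paths expressed above.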

Notes

[1] http://en.wikipedia.org/wiki/Strongly_connected_components

