Difference between revisions of "Bilingual dictionary discovery"

Latest revision as of 00:02, 22 March 2018

vs crossdics

кил--йорт are not strongly connected to each other, but hypothesising an arc between them would make them part of a strongly connected subgraph together with ev and дом. The size of that strongly connected subgraph (here: 4) could be an indicator of the strength of the association, although requiring strong connectivity might be too strict.
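The idea of hypothesising an arc and measuring the resulting strongly connected subgraph can be sketched as follows. This is only a minimal illustration: the arc directions in `ARCS` are invented for the example (the page does not say which way the dictionary entries point), and the function names are hypothetical, not part of any existing tool.

```python
from collections import defaultdict

# Hypothetical directed arcs, assumed for illustration only.
ARCS = {
    "кил": {"дом", "ev"},
    "дом": {"йорт"},
    "ev": {"йорт"},
}

def reachable(graph, start):
    """Return every node reachable from start by following arcs."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def reverse(graph):
    """Return a copy of the graph with every arc flipped."""
    rev = defaultdict(set)
    for src, targets in graph.items():
        for dst in targets:
            rev[dst].add(src)
    return rev

def scc_of(graph, node):
    """Strongly connected component of node: forward and backward reachability intersected."""
    return reachable(graph, node) & reachable(reverse(graph), node)

def scc_size_with_hypothesis(graph, src, dst):
    """Size of the component joining src and dst once the arc src -> dst is assumed, else 0."""
    trial = {n: set(ts) for n, ts in graph.items()}  # leave the input untouched
    trial.setdefault(src, set()).add(dst)
    component = scc_of(trial, src)
    return len(component) if dst in component else 0

without_arc = scc_of(ARCS, "кил")                        # only кил itself
with_arc = scc_size_with_hypothesis(ARCS, "йорт", "кил")  # 4
```

With these assumed arcs, кил is alone in its component until the йорт to кил arc is hypothesised, after which all four words form one strongly connected subgraph.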

You could still get кил--йорт through crossdics. If crossdics gives the subgraphs of size 3 (where one arc is hypothesized), then the intersection of runs of crossdics (chv-rus-tat and chv-tur-tat) doesn't necessarily give the subgraphs of size 4 – that would require the rus-tur connection as well, while the crossdics intersection doesn't require that.
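A toy model of the comparison above, assuming each bilingual dictionary is just a set of word pairs. The `cross` function is a hypothetical stand-in for what a crossing step does, not the actual crossdics implementation, and the four dictionaries contain only the example words.

```python
def cross(left, right):
    """Cross (a, pivot) pairs with (pivot, c) pairs and return the hypothesised (a, c) pairs."""
    by_pivot = {}
    for pivot, c in right:
        by_pivot.setdefault(pivot, set()).add(c)
    return {(a, c) for a, pivot in left for c in by_pivot.get(pivot, ())}

# Assumed single-entry dictionaries, for illustration only.
CHV_RUS = {("кил", "дом")}
RUS_TAT = {("дом", "йорт")}
CHV_TUR = {("кил", "ev")}
TUR_TAT = {("ev", "йорт")}

via_rus = cross(CHV_RUS, RUS_TAT)  # chv-rus-tat run
via_tur = cross(CHV_TUR, TUR_TAT)  # chv-tur-tat run
in_both = via_rus & via_tur        # pairs proposed by both runs
```

Nothing in `in_both` checks whether дом and ev are linked in a rus-tur dictionary, which is why the intersection of the two runs is a weaker condition than a size-4 subgraph.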

Two things we wouldn't get from crossdics:

the fact that, even with one arc missing, кил--йорт is still a stronger candidate than under the simple chv-rus-tat crossdics (due to the extra route via tur),
the possibility of adding translations where both crossings would have lacunae, but doublecrossing shows a translation:
possibly a bad translation, but if there are no shorter paths for either word, it might be worth it

Restrictions on sub-graphs:

Only one word per input language
Prune words with only a single outgoing arc
Only accept words where there is a cycle(?)
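The three restrictions above could be checked roughly as below. This is a sketch under assumptions: nodes are `(language, word)` tuples, the graph is a dict from node to the set of nodes it points at, pruning is done in a single pass, and all names and the example arcs are hypothetical.

```python
def one_word_per_language(nodes):
    """True if no language contributes more than one word to the subgraph."""
    langs = [lang for lang, _word in nodes]
    return len(langs) == len(set(langs))

def prune_single_arc(graph):
    """Drop, in one pass, every node that has exactly one outgoing arc."""
    doomed = {n for n, targets in graph.items() if len(targets) == 1}
    return {
        n: {t for t in targets if t not in doomed}
        for n, targets in graph.items()
        if n not in doomed
    }

def on_cycle(graph, node):
    """True if following at least one arc from node can lead back to node."""
    stack = list(graph.get(node, ()))
    seen = set()
    while stack:
        cur = stack.pop()
        if cur == node:
            return True
        if cur not in seen:
            seen.add(cur)
            stack.extend(graph.get(cur, ()))
    return False

# Assumed example arcs, including a hypothesised tat -> chv arc.
G = {
    ("chv", "кил"): {("rus", "дом"), ("tur", "ev")},
    ("rus", "дом"): {("tat", "йорт")},
    ("tur", "ev"): {("tat", "йорт")},
    ("tat", "йорт"): {("chv", "кил")},
}

ok_languages = one_word_per_language(G)  # iterating a dict yields its nodes
кил_on_cycle = on_cycle(G, ("chv", "кил"))
```

Whether pruning should be repeated until nothing changes, and whether a two-word back-and-forth counts as a cycle, are open choices that the list does not settle.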

Some ideas:

Weighting
- Outgoing arcs get 1/number of arcs?
Using more monolingual data, e.g. each word gets an SL concordance/context vector.
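The weighting idea could look like the sketch below, where each outgoing arc of a word gets the weight 1/(number of outgoing arcs of that word). Scoring a path by multiplying its arc weights is an additional assumption made here for illustration; the list above only proposes the per-arc weight. The arcs and function names are hypothetical.

```python
def arc_weights(graph):
    """Give each outgoing arc of a word the weight 1 / (number of its outgoing arcs)."""
    return {
        (src, dst): 1 / len(targets)
        for src, targets in graph.items()
        for dst in targets
    }

def path_weight(weights, path):
    """Multiply arc weights along a path (an assumed way of combining them)."""
    total = 1.0
    for src, dst in zip(path, path[1:]):
        total *= weights[(src, dst)]
    return total

# Assumed example arcs, for illustration only.
ARCS = {
    "кил": {"дом", "ev"},
    "дом": {"йорт"},
    "ev": {"йорт"},
}

weights = arc_weights(ARCS)
score_via_rus = path_weight(weights, ["кил", "дом", "йорт"])
```

Under this scheme a word with many translations spreads its weight thinly, so paths through highly ambiguous pivots score lower.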

Notes

http://en.wikipedia.org/wiki/Strongly_connected_components

Related pages

@@ Line 53: / Line 53: @@
 * [http://turing.cs.washington.edu/papers/mausam-acl-ijcnlp-09.pdf Compiling a Massive, Multilingual Dictionary via Probabilistic Inference]
 * [http://www.mt-archive.info/10/MTS-2013-Sato.pdf Terminology-driven Augmentation of Bilingual Terminologies]
+* https://github.com/IlnarSelimcan/projectt/blob/master/bidixes2multidix.py might be helpful when getting started
+==Related pages==
+* [[Ideas for Google Summer of Code/Improved bilingual dictionary induction]]
+* [[Building dictionaries#Generating bilingual dictionary entries]]
 [[Category:Development]]
