Difference between revisions of "Bilingual dictionary discovery"
Revision as of 21:52, 10 July 2014
This page describes a way of discovering new bilingual or multilingual dictionaries.
We already have apertium-dixtools for crossing dictionaries, but what happens if you want to make a pair for which no direct crossing is available, or you want to improve the accuracy of a crossing?
We can try using multiple input dictionaries.
Let's say you want to make a Chuvash--Tatar dictionary, and you have:
- Chuvash--Russian
- Chuvash--Turkish
- Turkish--Russian
- Turkish--Tatar
- Russian--Tatar
You could make a graph out of these dictionaries where each node is a word in a language, and each arc joins two words that one of the dictionaries lists as translations, so each arc belongs to one language pair. For example: http://i.imgur.com/SFOsRMv.png
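A minimal sketch of building such a graph, assuming each dictionary has already been read into a list of word pairs. The entries below are illustrative toy data and are not taken from any real Apertium dictionary; the name `build_graph`, the `DICTIONARIES` layout and the three-letter language codes are likewise assumptions made for this example. Entries are treated as valid in both directions.

```python
from collections import defaultdict

# Hypothetical toy entries keyed by language pair. Real input would be
# extracted from the bilingual dictionaries listed above.
DICTIONARIES = {
    ("chv", "rus"): [("кил", "дом")],
    ("chv", "tur"): [("кил", "ev")],
    ("tur", "rus"): [("ev", "дом")],
    ("tur", "tat"): [("ev", "йорт")],
    ("rus", "tat"): [("дом", "йорт")],
}


def build_graph(dictionaries):
    """Return an adjacency map from (language, word) to its neighbours."""
    graph = defaultdict(set)
    for (left_lang, right_lang), entries in dictionaries.items():
        for left_word, right_word in entries:
            left = (left_lang, left_word)
            right = (right_lang, right_word)
            # Assumption: every entry can be used in both directions.
            graph[left].add(right)
            graph[right].add(left)
    return graph


graph = build_graph(DICTIONARIES)
```

Tagging each node with its language keeps identically spelled words in different languages apart. In this toy graph the Chuvash and Tatar words are not neighbours, which is the situation the page is concerned with.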
You could then cluster the words using a "strongly-connected subgraph" algorithm (see http://en.wikipedia.org/wiki/Strongly_connected_components) and assume that the words within each strongly-connected subgraph are translations of each other. That way you could get кил--йорт without having any direct correspondence between the two words.
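The clustering step could be sketched as follows. The page does not name a particular algorithm; this example uses Kosaraju's two-pass algorithm as one possible choice. The arcs are hypothetical toy data, with one deliberately one-way arc to show that a word reachable in only one direction stays outside the cluster. The names `strongly_connected_components` and `candidate_pairs` are made up for this sketch.

```python
from collections import defaultdict

# Hypothetical directed arcs between (language, word) nodes.
ARCS = [
    (("chv", "кил"), ("rus", "дом")),
    (("rus", "дом"), ("chv", "кил")),
    (("chv", "кил"), ("tur", "ev")),
    (("tur", "ev"), ("chv", "кил")),
    (("tur", "ev"), ("tat", "йорт")),
    (("tat", "йорт"), ("tur", "ev")),
    (("rus", "дом"), ("tat", "йорт")),
    (("tat", "йорт"), ("rus", "дом")),
    # One-way arc: not enough to pull this word into the cluster.
    (("rus", "дом"), ("tur", "bina")),
]


def strongly_connected_components(arcs):
    """Kosaraju's algorithm; returns a list of sets of nodes."""
    forward = defaultdict(list)
    backward = defaultdict(list)
    nodes = []
    known = set()
    for source, target in arcs:
        forward[source].append(target)
        backward[target].append(source)
        for node in (source, target):
            if node not in known:
                known.add(node)
                nodes.append(node)

    # First pass: record nodes in order of depth-first completion.
    visited = set()
    finish_order = []
    for start in nodes:
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(forward[start]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(forward[nxt])))
                    break
            else:
                stack.pop()
                finish_order.append(node)

    # Second pass: walk the reversed graph, latest finisher first.
    assigned = set()
    components = []
    for start in reversed(finish_order):
        if start in assigned:
            continue
        assigned.add(start)
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            component.add(node)
            for prev in backward[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        components.append(component)
    return components


def candidate_pairs(components, source_lang, target_lang):
    """Pair up words of the two languages that share a component."""
    pairs = set()
    for component in components:
        sources = [w for lang, w in component if lang == source_lang]
        targets = [w for lang, w in component if lang == target_lang]
        pairs.update((s, t) for s in sources for t in targets)
    return pairs


components = strongly_connected_components(ARCS)
chv_tat = candidate_pairs(components, "chv", "tat")
```

On this toy input the only Chuvash--Tatar candidate is кил--йорт. Note that if every dictionary entry is treated as bidirectional, strongly-connected components are the same as ordinary connected components, so the distinction only matters when some entries are direction-restricted.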