Apertium has moved from SourceForge to GitHub.
If you have any questions, please come and talk to us on
#apertium on irc.freenode.net or contact the GitHub migration team.
Talk:Turkic-Turkic translator
Lexicon trimming
<spectre> also, some 'corner cases' for the lexicon scraper
<spectre> when you have stems in a continuation lexicon, e.g. demonstratives in Kazakh, personal pronouns in other places
<spectre> they will all get included, disregarding the bilingual dictionary
<spectre> another example: when one language has a word that the other doesn't have (e.g. a case that turns into a postposition, but the postposition doesn't have an equivalent in the other language, e.g. it is inserted by transfer)
<spectre> (same goes for "particles")
<spectre> for verbs (this one can be fixed by having a correspondence between tags and continuation lexica), e.g. when you have a verb which is both tv and iv but only one of them is in the bidix
<spectre> another example: when you have an entry like foo<adj>:bar<n><attr> in the bilingual dictionary, but no entry for foo<adj><subst>:bar<n> (or foo<adj><subst>:baz<n>), then there will be errors
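The last corner case lends itself to a mechanical check. A minimal sketch, assuming the bidix has already been flattened into strings in the same notation used above (`lemma<tag>...:lemma<tag>...`); real bidix files are XML, so that flattening step and the function names `parse` and `missing_subst` are hypothetical:

```python
import re

# Hypothetical flattened bidix notation: "lemma<tag>...:lemma<tag>..."
ENTRY = re.compile(r"^([^<:]+)((?:<[^>]+>)*):([^<:]+)((?:<[^>]+>)*)$")
TAG = re.compile(r"<([^>]+)>")


def parse(entry):
    """Split 'foo<adj>:bar<n><attr>' into (('foo', ('adj',)), ('bar', ('n', 'attr')))."""
    m = ENTRY.match(entry)
    if m is None:
        raise ValueError(f"unparseable entry: {entry!r}")
    l_lemma, l_tags, r_lemma, r_tags = m.groups()
    return (l_lemma, tuple(TAG.findall(l_tags))), (r_lemma, tuple(TAG.findall(r_tags)))


def missing_subst(entries):
    """Adjectives with a foo<adj>:bar<n><attr> entry but no foo<adj><subst>:... entry."""
    attr, subst = set(), set()
    for entry in entries:
        (lemma, l_tags), (_, r_tags) = parse(entry)
        if l_tags == ("adj",) and r_tags[-1:] == ("attr",):
            attr.add(lemma)
        elif l_tags == ("adj", "subst"):
            subst.add(lemma)
    return sorted(attr - subst)


bidix = [
    "foo<adj>:bar<n><attr>",    # attributive only: would cause errors
    "qux<adj>:quux<n><attr>",
    "qux<adj><subst>:quux<n>",  # substantivised use is covered
]
print(missing_subst(bidix))  # prints: ['foo']
```

The same pattern (collect one set per tag signature, then take the difference) would also cover the tv/iv case, given a tag-to-continuation-lexicon mapping.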
Testvoc
We probably need to work out a way to run the testvoc in a reasonable amount of time. Here are some suggestions:
- Create sub-lexicons that each run just one category through the testvoc process.
- Treat clitics like 'mi', 'i' etc. separately, not as attached. In Turkish, this would reduce the size of the categories that can take these clitics by a factor of _at least_ 6.
- Because of how the lexicons are laid out, we could try some kind of continuation-based testvoc.
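The gain from the clitic suggestion comes from turning a product into a sum. A back-of-the-envelope sketch, with made-up counts rather than figures measured on the Turkish transducer: if each of F clitic-less forms in a category can also be followed by any of K clitic sequences, expanding with clitics attached produces F × (K + 1) strings, whereas testing the clitics separately needs roughly F + K. The function name `expansion_sizes` is hypothetical.

```python
def expansion_sizes(num_forms, num_clitic_seqs):
    """Return (attached, separate): strings a testvoc would expand for one category.

    num_forms       -- forms of the category without clitics (hypothetical count)
    num_clitic_seqs -- non-empty clitic sequences each form can take (hypothetical)
    """
    attached = num_forms * (num_clitic_seqs + 1)  # every form, bare and with each sequence
    separate = num_forms + num_clitic_seqs        # forms and clitics expanded once each
    return attached, separate


attached, separate = expansion_sizes(num_forms=2000, num_clitic_seqs=5)
print(attached, separate, round(attached / separate, 2))  # prints: 12000 2005 5.99
```

With K = 5 the attached expansion is already close to six times larger, and in general the ratio approaches K + 1 as F grows.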
Idea:
- Read in the lexc file and, starting from each stem and following its continuation lexicons, make a list of the combinations of continuation lexicons, e.g.
V-TV V-FIN-COMMON V-NONFIN ...
- Make a hash relating each combination of continuation lexicons to a list of stems
- Do this for both lexc files
- Match up the combinations via the bilingual dictionary, e.g.
"foo" N -- "bar" N; "baz" N-NOPOS -- "barm" N
- Then, for each matched pair of continuation-lexicon combinations, list the stem pairs that the bilingual dictionary links between them.
- Select n stem pairs at random from each list and expand them.
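The steps above can be sketched roughly as follows. This is a minimal sketch under several assumptions: it understands only a small subset of lexc (`LEXICON` headers, one `upper:lower Continuation ;` entry per line, `!` comments); it treats the upper side of entries in caller-named stem lexicons (here a hypothetical `Verbs` lexicon) as the lemma; it represents the bilingual dictionary as a plain list of (source lemma, target lemma) pairs instead of parsing a .dix file; and it stops before the expansion step. A "combination" is taken to be the continuation lexicons reachable from a stem in breadth-first order, which matches the `V-TV V-FIN-COMMON V-NONFIN ...` example. All function names are hypothetical.

```python
import random
from collections import defaultdict, deque


def parse_lexc(text):
    """Parse a simplified lexc subset into {lexicon: [(upper, continuation), ...]}.

    Lines before the first LEXICON (e.g. Multichar_Symbols) are skipped.
    """
    lexicons, current = {}, None
    for raw in text.splitlines():
        line = raw.split("!", 1)[0].strip()  # drop '!' comments
        if line.startswith("LEXICON "):
            current = line.split(None, 1)[1]
            lexicons.setdefault(current, [])
        elif current is not None and line.endswith(";"):
            parts = line[:-1].split()
            if parts:  # 'upper:lower Cont ;', 'form Cont ;' or bare 'Cont ;'
                upper = parts[0].split(":", 1)[0] if len(parts) > 1 else ""
                lexicons[current].append((upper, parts[-1]))
    return lexicons


def signature(lexicons, start):
    """Continuation lexicons reachable from `start`, in breadth-first order."""
    order, queue = [start], deque([start])
    while queue:
        for _, cont in lexicons.get(queue.popleft(), []):
            if cont != "#" and cont not in order:
                order.append(cont)
                queue.append(cont)
    return tuple(order)


def group_stems(lexicons, stem_lexicons):
    """Hash each combination of continuation lexicons to the stems that use it."""
    groups = defaultdict(list)
    for lex in stem_lexicons:
        for upper, cont in lexicons.get(lex, []):
            if upper:
                groups[signature(lexicons, cont)].append(upper)
    return groups


def match_via_bidix(src_groups, tgt_groups, bidix):
    """Group bidix (src, tgt) lemma pairs by (src combination, tgt combination)."""
    def lookup(groups):
        sigs = defaultdict(list)
        for sig, stems in groups.items():
            for stem in stems:
                if sig not in sigs[stem]:
                    sigs[stem].append(sig)
        return sigs

    src_sigs, tgt_sigs = lookup(src_groups), lookup(tgt_groups)
    matched = defaultdict(list)
    for src, tgt in bidix:
        for s in src_sigs.get(src, []):
            for t in tgt_sigs.get(tgt, []):
                matched[(s, t)].append((src, tgt))
    return matched


def sample_pairs(matched, n, seed=0):
    """Pick up to n pairs at random from each matched combination, for expansion."""
    rng = random.Random(seed)
    return {key: rng.sample(pairs, min(n, len(pairs))) for key, pairs in matched.items()}


SRC_LEXC = """
LEXICON V-TV
V-FIN-COMMON ;
V-NONFIN ;
LEXICON V-FIN-COMMON
%<v%>:0 # ;
LEXICON V-NONFIN
%<inf%>:0 # ;
LEXICON Verbs
foo:foo V-TV ;
baz:baz V-TV ;
"""
TGT_LEXC = """
LEXICON V-TV
V-FIN-COMMON ;
LEXICON V-FIN-COMMON
%<v%>:0 # ;
LEXICON Verbs
bar:bar V-TV ;
barm:barm V-TV ;
"""

src = group_stems(parse_lexc(SRC_LEXC), ["Verbs"])
tgt = group_stems(parse_lexc(TGT_LEXC), ["Verbs"])
matched = match_via_bidix(src, tgt, [("foo", "bar"), ("baz", "barm")])
for (s, t), pairs in sample_pairs(matched, n=1).items():
    print(" ".join(s), "<->", " ".join(t), pairs)
```

Seeding the random generator keeps the sample the same between runs, which makes it easier to compare testvoc output before and after a change to the dictionaries.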