Difference between revisions of "Talk:Turkic-Turkic translator"

Revision as of 12:19, 23 April 2012

Lexicon trimming

<spectre> also, some 'corner cases' for the lexicon scraper
<spectre> when you have stems in a continuation lexicon, e.g. demonstratives in kazakh, personal pronouns in other places
<spectre> they will all get included disregarding the bilingual dictionary
<spectre> another example: when one language has a word that another doesn't have (e.g. a case that turns into a postposition, 
          but the postposition doesn't have an equivalent in the other language, e.g. is inserted by transfer)
<spectre> (same goes for "particles")
<spectre> for verbs (this one can be fixed by having a correspondence between tags/continuation lexica) 
          e.g. when you have a verb which is both tv/iv but only one in the bidix
<spectre> another example: when you have an entry like foo<adj>:bar<n><attr> in the bilingual dictionary, but no entry 
          for foo<adj><subst>:bar<n> (or foo<adj><subst>:baz<n>) then there will be errors
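
The last corner case can be illustrated with a small sketch. It is a toy model, not Apertium's actual lookup code: it assumes the bilingual dictionary matches the longest left-hand side whose tags are a prefix of the analysis and copies any leftover tags onto the right-hand side. The function names (split_lu, translate, untranslatable) and the sample entries are hypothetical.

```python
import re

TAG = re.compile(r"<([^<>]+)>")


def split_lu(lu):
    """Split 'foo<adj><subst>' into ('foo', ['adj', 'subst'])."""
    lemma = lu.split("<", 1)[0]
    return lemma, TAG.findall(lu)


def translate(analysis, bidix):
    """Toy bidix lookup (assumed behaviour, see lead-in).

    bidix is a list of (left, right) strings. The entry with the longest
    tag prefix wins; tags not consumed by the match are appended to the
    right-hand side unchanged.
    """
    lemma, tags = split_lu(analysis)
    best = None
    for left, right in bidix:
        l_lemma, l_tags = split_lu(left)
        if l_lemma == lemma and tags[:len(l_tags)] == l_tags:
            if best is None or len(l_tags) > best[0]:
                best = (len(l_tags), right)
    if best is None:
        return None
    consumed, right = best
    return right + "".join("<%s>" % t for t in tags[consumed:])


def untranslatable(source_analyses, bidix, target_analyses):
    """Return (analysis, output) pairs the target side cannot generate."""
    bad = []
    for analysis in source_analyses:
        out = translate(analysis, bidix)
        if out is None or out not in target_analyses:
            bad.append((analysis, out))
    return bad


# Hypothetical data mirroring the foo/bar example above.
SOURCE = ["foo<adj>", "foo<adj><subst>"]
TARGET = {"bar<n><attr>", "bar<n>"}
BIDIX = [("foo<adj>", "bar<n><attr>")]

print(untranslatable(SOURCE, BIDIX, TARGET))
```

With only the foo<adj>:bar<n><attr> entry, the <subst> tag is carried over and the target has nothing to generate from it; adding a foo<adj><subst>:bar<n> entry makes the list come back empty.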

Testvoc

We probably need to work out a way to run the testvoc in a reasonable amount of time. Here are some suggestions:

  • Create sub-lexicons, each of which runs just one category through the testvoc process.
  • Treat clitics such as 'mi', 'i' etc. as separate tokens rather than as attached to the preceding word. In Turkish this would reduce the size of the categories that can take these clitics by a factor of at least 6.
  • Because of how the lexicons are laid out, we could try some kind of continuation-based testvoc.
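
A rough sketch of how the second and third suggestions could be reasoned about, assuming the lexicons can be modelled as a table of continuation classes. Everything here is hypothetical: the lexicon names, the tags, the tiny stem list and the helper functions are made up for illustration, and the resulting counts are toy numbers, not measurements from any Turkic lexicon.

```python
from functools import lru_cache

# Toy continuation lexicons: name -> list of (material, next lexicon or None).
LEXICONS = {
    "N-STEMS": [("ev", "NUM"), ("kitap", "NUM")],
    "NUM": [("<n><sg>", "CASE"), ("<n><pl>", "CASE")],
    "CASE": [("<nom>", "CLIT"), ("<acc>", "CLIT"), ("<dat>", "CLIT")],
    "CLIT": [("", None), ("+mi<qst>", None)],
}


def count_paths(lexicons, start):
    """Count the forms reachable from a lexicon by following continuations."""
    @lru_cache(maxsize=None)
    def count(name):
        if name is None:  # end of the word
            return 1
        return sum(count(nxt) for _, nxt in lexicons[name])
    return count(start)


def detach(lexicons, clitic_lexicon):
    """Copy the lexicons with the clitic class reduced to the empty path,
    i.e. clitics are tested as separate tokens instead of attached ones."""
    copy = dict(lexicons)
    copy[clitic_lexicon] = [("", None)]
    return copy


def representatives(lexicons, stem_lexicon):
    """Pick the first stem seen for each continuation class, so each class
    is expanded once rather than once per stem."""
    reps = {}
    for stem, nxt in lexicons[stem_lexicon]:
        reps.setdefault(nxt, stem)
    return reps


print(count_paths(LEXICONS, "N-STEMS"))                   # attached clitics
print(count_paths(detach(LEXICONS, "CLIT"), "N-STEMS"))   # clitics detached
print(representatives(LEXICONS, "N-STEMS"))
```

In this toy table, detaching the single clitic halves the number of forms; the saving grows with the number of clitic variants a category can take. Testing one representative stem per continuation class only covers the paradigm, so stem-specific bidix gaps would still need a separate check.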