Difference between revisions of "Talk:Turkic-Turkic translator"

Revision as of 12:19, 23 April 2012

Lexicon trimming

<spectre> also, some 'corner cases' for the lexicon scraper
<spectre> when you have stems in a continuation lexicon, e.g. demonstratives in kazakh, personal pronouns in other places
<spectre> they will all get included disregarding the bilingual dictionary
<spectre> another example: when one language has a word that another doesn't have (e.g. a case that turns into a postposition, 
          but the postposition doesn't have an equivalent in the other language, e.g. it is inserted by transfer)
<spectre> (same goes for "particles")
<spectre> for verbs (this one can be fixed by having a correspondence between tags/continuation lexica) 
          e.g. when you have a verb which is both tv/iv but only one in the bidix
<spectre> another example: when you have an entry like foo<adj>:bar<n><attr> in the bilingual dictionary, but no entry 
          for foo<adj><subst>:bar<n> (or foo<adj><subst>:baz<n>) then there will be errors
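
The tv/iv case above could be caught with a check along these lines. This is only a rough sketch: it assumes the monolingual analyses and the left sides of the bidix have already been expanded into plain strings, and `split_analysis`, `missing_in_bidix` and the foo/bar data are made up for illustration, not part of any existing tool.

```python
import re

TAG = re.compile(r"<([^>]+)>")

def split_analysis(analysis):
    """Split 'foo<v><tv>' into ('foo', ('v', 'tv'))."""
    lemma = analysis.split("<", 1)[0]
    return lemma, tuple(TAG.findall(analysis))

def missing_in_bidix(mono_analyses, bidix_left_sides):
    """Return monolingual analyses not covered by any bidix entry.

    A bidix entry covers an analysis when the lemmas are equal and the
    bidix tags are a prefix of the analysis tags (an assumption about
    how matching works, kept deliberately simple here).
    """
    bidix = [split_analysis(entry) for entry in bidix_left_sides]
    missing = []
    for analysis in mono_analyses:
        lemma, tags = split_analysis(analysis)
        covered = any(
            b_lemma == lemma and tags[:len(b_tags)] == b_tags
            for b_lemma, b_tags in bidix
        )
        if not covered:
            missing.append(analysis)
    return missing

# Hypothetical data: 'foo' is tv and iv in the monodix, only tv in the bidix.
mono = ["foo<v><tv>", "foo<v><iv>", "bar<v><iv>"]
bidix = ["foo<v><tv>", "bar<v>"]
print(missing_in_bidix(mono, bidix))
```

A scraper that only compares lemmas would keep both readings of foo; comparing lemma plus tags flags the iv reading as untranslatable.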

Testvoc

We probably need to work out a way to run the testvoc in a reasonable amount of time. Here are some suggestions:

Create sub-lexicons, each of which runs just one category through the testvoquing process.
Treat clitics like 'mi', 'i' etc. separately, and not as attached. In Turkish this would shrink the categories that can take these clitics by at least a factor of 6.
Because of how the lexicons are laid out, we could try some kind of continuation-based testvoc.
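
A rough sketch of the first and third suggestions, assuming the expanded analyses are available as plain strings and that each stem is known together with the name of its continuation lexicon. The function names and the continuation names (`N1`, `V-IV`) are hypothetical, and "one representative stem per continuation lexicon" is only one possible reading of continuation-based testvoc.

```python
from collections import defaultdict
import re

FIRST_TAG = re.compile(r"<([^>]+)>")

def split_by_category(analyses):
    """Group analyses such as 'kitap<n><pl>' by their first tag,
    so each group can be run through testvoc on its own."""
    groups = defaultdict(list)
    for analysis in analyses:
        match = FIRST_TAG.search(analysis)
        key = match.group(1) if match else "untagged"
        groups[key].append(analysis)
    return dict(groups)

def one_stem_per_continuation(stems):
    """Keep the first stem seen for each continuation lexicon.

    stems is an iterable of (stem, continuation_name) pairs. The idea
    is to test each continuation once instead of once per stem.
    """
    representatives = {}
    for stem, continuation in stems:
        representatives.setdefault(continuation, stem)
    return representatives

print(split_by_category(["kitap<n><pl>", "gel<v><iv><past>", "ev<n><sg>"]))
print(one_stem_per_continuation([("kitap", "N1"), ("ev", "N1"), ("gel", "V-IV")]))
```

Testing one stem per continuation would miss stem-specific problems (for example a stem missing from the bidix), so it would complement a per-stem bidix check rather than replace it.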

