Difference between revisions of "User:Ilnar.salimzyan/Coverage"


Latest revision as of 12:26, 16 February 2015

While working on a machine translator for closely related languages, we spend a lot of time adding new stems to the dictionary. Below are some ideas that I think would speed up that process and that I want to test on the Tatar-Bashkir pair.

  • Given: a raw text in language X and a prototype morphological analyser for that language
  • run the text through the analyser
  • for known words, disambiguate or correct the output of the morphological analyser, leaving only the one analysis you'd expect in that context
  • learn a function which assigns the lemma+tag(s) label to unknown words
  • assign the lemma+tag(s) label to unknown words using that function
  • manually correct the output of the function, skipping over the words already corrected at step 3
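
The "learn a function" and "assign the label" steps could be sketched as below. This is only one possible approach, not necessarily the one intended here: a simple suffix-based guesser that learns from the disambiguated known words which word endings map to which lemma and tags. The names `learn`, `guess`, `edit_rule` and `MAX_SUFFIX` are made up for this illustration, and the training triples are toy examples; in practice they would come from the manually checked analyser output.

```python
from collections import Counter, defaultdict

MAX_SUFFIX = 4  # assumption: longest word ending considered, in characters


def edit_rule(surface, lemma):
    """Return (strip, add): drop `strip` final characters of the
    surface form, then append `add`, to obtain the lemma."""
    i = 0
    while i < min(len(surface), len(lemma)) and surface[i] == lemma[i]:
        i += 1
    return len(surface) - i, lemma[i:]


def learn(examples):
    """Learn, for every word ending seen in the disambiguated data,
    the most frequent (strip, add, tags) rule.

    `examples` is an iterable of (surface, lemma, tags) triples."""
    table = defaultdict(Counter)
    for surface, lemma, tags in examples:
        strip, add = edit_rule(surface, lemma)
        for n in range(1, min(MAX_SUFFIX, len(surface)) + 1):
            table[surface[-n:]][(strip, add, tags)] += 1
    return {suffix: counts.most_common(1)[0][0]
            for suffix, counts in table.items()}


def guess(model, word):
    """Guess (lemma, tags) for an unknown word from its longest
    known ending; return None if no ending matches."""
    for n in range(min(MAX_SUFFIX, len(word)), 0, -1):
        rule = model.get(word[-n:])
        if rule is None:
            continue
        strip, add, tags = rule
        if strip <= len(word):
            return word[:len(word) - strip] + add, tags
    return None


# Toy training data standing in for the words disambiguated by hand.
known = [
    ("китаплар", "китап", "<n><pl><nom>"),
    ("балалар", "бала", "<n><pl><nom>"),
]
model = learn(known)
print(guess(model, "атлар"))  # ('ат', '<n><pl><nom>')
```

Words for which `guess` returns `None` are left unlabelled, so every guess still goes through the final manual correction pass.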