User:Ilnar.salimzyan/Coverage

While working on a machine translator for closely related languages, we spend a lot of time adding new stems to the dictionary. Below are some ideas that I think could speed up that process, and which I want to test on the Tatar-Bashkir pair.

1. Given: a raw text and a prototype morphological analyser for language X.
2. Run the text through the analyser.
3. For known words, disambiguate or correct the analyser's output, leaving only the one analysis you would expect in that context.
4. Learn a function which assigns a lemma+tag(s) label to unknown words.
5. Assign lemma+tag(s) labels to the unknown words using that function.
6. Manually correct the output of the function, skipping over the words already corrected at step 3.
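
The learn-and-assign part of this procedure could be prototyped roughly as follows. This is only a minimal sketch under several assumptions: the "function" is taken to be a simple longest-suffix guesser (the page does not say which learning method is intended), the disambiguated words are assumed to be available as (surface, lemma, tags) tuples, and the names `edit_rule`, `learn_guesser` and `guess` are hypothetical. The sample data is toy English, not real Tatar or Bashkir analyser output.

```python
from collections import Counter, defaultdict


def edit_rule(surface, lemma):
    """Return (strip, add): the ending to remove from surface and the ending to append to get lemma."""
    i = 0
    while i < min(len(surface), len(lemma)) and surface[i] == lemma[i]:
        i += 1
    return surface[i:], lemma[i:]


def learn_guesser(disambiguated, max_suffix=4):
    """Count which (strip, add, tags) label goes with each word-final suffix.

    `disambiguated` is an iterable of (surface, lemma, tags) tuples, i.e. the
    known words after manual disambiguation (assumed input format).
    """
    table = defaultdict(Counter)
    for surface, lemma, tags in disambiguated:
        strip, add = edit_rule(surface, lemma)
        for n in range(1, min(max_suffix, len(surface)) + 1):
            table[surface[-n:]][(strip, add, tuple(tags))] += 1
    return table


def guess(table, surface, max_suffix=4):
    """Label an unknown word using the longest suffix seen in training, or return None."""
    for n in range(min(max_suffix, len(surface)), 0, -1):
        labels = table.get(surface[-n:])
        if not labels:
            continue
        (strip, add, tags), _count = labels.most_common(1)[0]
        if surface.endswith(strip):
            lemma = surface[:len(surface) - len(strip)] + add
            return lemma, tags
    return None


# Toy training data standing in for the disambiguated known words.
known = [
    ("walked", "walk", ("vblex", "past")),
    ("talked", "talk", ("vblex", "past")),
    ("cats", "cat", ("n", "pl")),
    ("dogs", "dog", ("n", "pl")),
]
table = learn_guesser(known)

# Unknown words get a proposed label, which is then corrected by hand.
for word in ["jumped", "birds", "xyz"]:
    print(word, guess(table, word))
```

Words for which `guess` returns `None` would still have to be analysed entirely by hand, and every proposed label is only a suggestion for the manual correction pass.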
