Speeding up monodix creation
Revision as of 09:27, 11 April 2008
This page outlines some ideas for increasing the speed at which monolingual dictionaries (analysers) can be created.
Extract
Tag transfer
Try this at some point:
<spectie> you have an aligned corpus
<spectie> polish--czech, czech--slovak, danish--swedish
<spectie> and you have an analyser for polish, czech or danish
<spectie> you want to make an analyser for swedish
<spectie> you make templates from the paradigms in the danish analyser
<spectie> tag the danish of the corpus
<spectie> that you have
<spectie> align it with the swedish side
<spectie> then read off the alignments, taking the surface forms from the right side and the tags from the left side

Another variation of this, without parallel corpora, might be to use Extract and then use a bilingual dictionary (or even just a wordlist) and a comparable corpus to disambiguate the possibilities. For example, you have a surface form in language X which can be either a Noun or a Verb. You look up the surface form in a dictionary X--Y (you have an analyser + tagger for Y). You then disambiguate the right analysis for X based on the analysis in Y.
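The tag-transfer recipe (tag the Danish side, align it with the Swedish side, then take surface forms from the Swedish side and tags from the Danish side) can be sketched in Python. This is only an illustration under stated assumptions: the Danish side is assumed to be already analysed and tagged into (surface, tags) pairs, word alignments are assumed to come in the Pharaoh-style `i-j` format that aligners such as fast_align print, and the tag strings in the example are illustrative, not copied from any real Apertium dictionary. The helper names `parse_alignment`, `transfer_tags` and `collect` are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# A tagged source-side (e.g. Danish) token: (surface form, tag string).
TaggedToken = Tuple[str, str]


def parse_alignment(line: str) -> List[Tuple[int, int]]:
    """Parse a Pharaoh-style alignment line such as '0-0 1-2' into index pairs."""
    pairs = []
    for item in line.split():
        src, tgt = item.split("-")
        pairs.append((int(src), int(tgt)))
    return pairs


def transfer_tags(source: List[TaggedToken], target: List[str],
                  alignment: List[Tuple[int, int]]) -> List[Tuple[str, str]]:
    """Read off the alignments: target surface form paired with source-side tags.

    Only 1:1 links are kept; one-to-many and many-to-one links are skipped
    because their tags are less likely to carry over to a single target word.
    """
    src_links = Counter(i for i, _ in alignment)
    tgt_links = Counter(j for _, j in alignment)
    out = []
    for i, j in alignment:
        if src_links[i] == 1 and tgt_links[j] == 1:
            out.append((target[j], source[i][1]))
    return out


def collect(corpus: Iterable[Tuple[List[TaggedToken], List[str], str]]) -> Counter:
    """Count (target surface form, tags) candidates over a whole aligned corpus."""
    counts: Counter = Counter()
    for source, target, align_line in corpus:
        counts.update(transfer_tags(source, target, parse_alignment(align_line)))
    return counts


if __name__ == "__main__":
    danish = [("huset", "<n><nt><sg><def>"), ("er", "<vbser><pres>"),
              ("stort", "<adj><nt><sg><ind>")]
    swedish = ["huset", "är", "stort"]
    for (form, tags), n in collect([(danish, swedish, "0-0 1-1 2-2")]).items():
        print(form, tags, n)
```

Grouping the resulting counts by Swedish surface form would give candidate tag sets per word, which could then be matched against templates made from the Danish paradigms; how that matching is done is left open here.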
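The variation without parallel corpora (disambiguating an ambiguous surface form in X by looking at how its translations in Y are analysed) could look roughly like the following Python sketch. It assumes that the part-of-speech labels are shared between X and Y (or have already been mapped), and that `y_pos_counts` was built beforehand by running the Y analyser and tagger over a comparable corpus and counting how often each Y word received each part of speech. The function name `disambiguate` and all example data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional


def disambiguate(candidates: List[str], translations: List[str],
                 y_pos_counts: Dict[str, Counter]) -> Optional[str]:
    """Pick the candidate POS for an X surface form that its Y translations support most.

    candidates   -- possible POS tags for the X surface form, e.g. ["n", "vblex"]
    translations -- Y words found for the X form in the X--Y dictionary or wordlist
    y_pos_counts -- per Y word, how often the Y tagger assigned each POS in a corpus
    Returns None when there is no evidence or the evidence is tied.
    """
    support: Counter = Counter()
    for word in translations:
        for pos, n in y_pos_counts.get(word, Counter()).items():
            if pos in candidates:
                support[pos] += n
    if not support:
        return None
    ranked = support.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tied evidence: leave the form for manual checking
    return ranked[0][0]


if __name__ == "__main__":
    counts = {"house": Counter({"n": 40, "vblex": 2})}
    print(disambiguate(["n", "vblex"], ["house"], counts))
```

Returning None instead of guessing on ties or missing evidence is a design choice: such forms can be set aside for a human to check rather than being added to the monodix with a wrong analysis.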