Automated extraction of lexical resources
(Thanks to spectie and jimregan for the input)
Some ideas for (semi-)automatically extracting lexical resources from corpora.
Things we want to extract:
- Morphological analysers
- Constraint rules (sensible ones)
- Bilingual dictionaries
- Transfer rules
== Morphological resource extraction ==
First, I should state that our main aim is to extract information about the open categories, not the closed ones. While it would be interesting to try to learn everything from scratch, doing so would probably be counter-productive, if it is possible at all.
So we leave things like prepositions, pronouns and irregular (very frequent) verbs such as "to be" to be constructed manually, which should be doable. Our focus is instead on the less frequent, but regular and much more numerous, verbs, nouns, adjectives, etc.
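As a rough illustration of this split, the sketch below counts the word forms in a text while skipping everything already covered by a hand-built list. The list `CLOSED_CLASS`, the function `open_class_candidates` and the sample sentence are all made up for the example; a real closed-class list would come from the manually written part of the analyser, and the tokenisation here is deliberately naive.

```python
import re
from collections import Counter

# Hypothetical hand-built list of closed-class and irregular forms.
# Illustrative only: a real list would be much longer and language-specific.
CLOSED_CLASS = {"the", "a", "and", "of", "in", "to", "he", "she", "it", "is", "was", "be"}


def open_class_candidates(text, closed=CLOSED_CLASS):
    """Count the word forms in `text` that are not in the manual list."""
    # Naive tokenisation: lowercase, keep runs of letters only.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(t for t in tokens if t not in closed)


sample = "The dog walked in the park. She walks the dogs and he walked to the park."
counts = open_class_candidates(sample)

# The remaining forms are the ones we would try to analyse automatically.
for form, freq in counts.most_common():
    print(form, freq)
```

The surviving forms (here things like "walked", "walks", "dog", "dogs") are the kind of regular, open-class material that the extraction is meant to target.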