Automated extraction of lexical resources

From Apertium
Revision as of 23:41, 31 March 2009 by Jorjao81

(Thanks to spectie and jimregan for the input.)

Some ideas for (semi-)automatically extracting lexical resources from corpora.

Things we want to extract:

  1. Morphological analysers
  2. Constraint rules (sensible ones)
  3. Bilingual dictionaries
  4. Transfer rules


== Morphological resource extraction ==

First, I should state that our main aim will be to extract information about the open categories, not the closed ones. While it would be interesting to try to learn everything from scratch, doing so would probably be counter-productive, if it is possible at all.

So we leave closed-class items such as prepositions, pronouns and irregular (very frequent) verbs like ''to be'' to be constructed manually, which should be doable since there are relatively few of them. Our focus will instead be on verbs, nouns, adjectives, etc. that are individually less frequent but regular and far more numerous.
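Because the open-class words we care about are regular, one way to approach them is to check which inflectional paradigm a candidate stem's attested corpus forms fit best. The sketch below illustrates that idea under explicit assumptions. `CLOSED_CLASS`, `PARADIGMS` and `guess_paradigms` are hypothetical names. The paradigms are toy English suffix lists, not Apertium's dictionary format, and this is not an existing Apertium tool. The closed-class list stands in for the hand-built part described above, and everything else is treated as an open-class candidate.

```python
from collections import defaultdict

# Hypothetical hand-built closed-class list (the part that is built manually).
CLOSED_CLASS = {"the", "a", "an", "of", "in", "to", "and", "is", "was", "be", "it", "she"}

# Hypothetical toy paradigms: each is the tuple of suffixes a regular stem
# takes ("" is the bare form). Real paradigms would be far richer.
PARADIGMS = {
    "regular_verb": ("", "s", "ed", "ing"),
    "regular_noun": ("", "s"),
}


def guess_paradigms(tokens, paradigms=PARADIGMS, closed=CLOSED_CLASS, min_forms=2):
    """Map each candidate stem to its best-supported paradigm.

    A (paradigm, stem) pair is kept only if at least `min_forms` of the
    paradigm's surface forms occur in the corpus. Among the kept pairs for a
    stem, the winner has the most attested forms, then the highest ratio of
    attested forms to paradigm size.
    """
    # Open-class candidates: alphabetic tokens, lowercased, minus closed class.
    vocab = {t.lower() for t in tokens if t.isalpha()} - closed

    # attested[(paradigm, stem)] = set of suffixes actually seen in the corpus
    attested = defaultdict(set)
    for name, suffixes in paradigms.items():
        for word in vocab:
            for suf in suffixes:
                if word.endswith(suf) and len(word) > len(suf):
                    stem = word[: len(word) - len(suf)]
                    attested[(name, stem)].add(suf)

    best = {}
    for (name, stem), sufs in attested.items():
        if len(sufs) < min_forms:
            continue
        score = (len(sufs), len(sufs) / len(paradigms[name]))
        if stem not in best or score > best[stem][1]:
            best[stem] = (name, score)
    return {stem: name for stem, (name, _) in best.items()}


tokens = "She walked to the park and walks there . Walking is fun . The cat sat ; cats sleep".split()
print(sorted(guess_paradigms(tokens).items()))
# prints [('cat', 'regular_noun'), ('walk', 'regular_verb')]
```

In this toy corpus, ''walk'' is assigned the verb paradigm because three of its four forms occur. ''cat'' matches two forms under both paradigms, but the noun paradigm wins because it covers all of its forms. Plain suffix stripping ignores spelling changes (e.g. ''stopped'') and overgenerates on words that happen to end in a suffix, so any real attempt would need richer paradigms and human review of the candidates.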