User:Mlforcada/Sandbox/basque
How to improve Apertium-eu-es 0.3
These are some notes on how to improve apertium-eu-es 0.3 so that it performs better for assimilation purposes and is easier for future developers to maintain.
Lexical coverage
Lexical coverage may be improved in different ways:
Regular vocabulary
- Collecting large corpora of Basque news text and searching them for unknown words (as was done for version 0.3).
- Using possible new vocabulary from the new version of Matxin.
- Using existing vocabulary (esp. multiword lexical units or MWLUs) in the current dictionaries of apertium-eu-es, especially by tagging and activating untagged MWLUs.
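The corpus search for unknown words can be sketched roughly as follows. This is a minimal sketch, assuming the corpus has already been run through the analyser and that unknown words appear in the usual Apertium stream format as `^form/*form$`. The function names `count_unknown_words` and `top_unknown` are hypothetical, and escaped characters in the stream are ignored for simplicity.

```python
import re
from collections import Counter

# A lexical unit whose analysis is the unknown-word mark, e.g. ^blogari/*blogari$
# (simplification: escaped characters inside the stream are not handled)
UNKNOWN_LU = re.compile(r"\^([^/$^]+)/\*[^$]*\$")


def count_unknown_words(analysed_text):
    """Count unknown surface forms (lowercased) in analyser output."""
    counts = Counter()
    for match in UNKNOWN_LU.finditer(analysed_text):
        counts[match.group(1).lower()] += 1
    return counts


def top_unknown(analysed_text, n=10):
    """Return the n most frequent unknown forms with their counts."""
    return count_unknown_words(analysed_text).most_common(n)


if __name__ == "__main__":
    # Hypothetical analyser output, for illustration only
    sample = "^Gaur/gaur<adv>$ ^blogari/*blogari$ ^Blogari/*Blogari$ ^txio/*txio$"
    for form, freq in top_unknown(sample):
        print(freq, form)
```

Sorting by frequency puts the forms that would give the largest coverage gain at the top of the list, so they can be added to the dictionaries first.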
Proper names
- Including large lists of proper names (a place-name gazetteer, person names, etc.).
- Using some kind of guesser for proper names so that they do not all have to be included in the dictionary.
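One very simple form such a guesser could take is a capitalisation heuristic. The sketch below is only an illustration, not a description of any existing Apertium module: it assumes that a word is a proper-name candidate if it is capitalised, does not start a sentence, and is not in the dictionary. The function `guess_proper_names` and the `known_words` set (standing in for the dictionary) are hypothetical.

```python
import re

# Words, or single punctuation characters
TOKEN = re.compile(r"\w+|[^\w\s]")
SENTENCE_END = {".", "!", "?"}


def guess_proper_names(text, known_words):
    """Return capitalised, non-sentence-initial tokens not in known_words.

    known_words is assumed to be a set of lowercased dictionary forms.
    """
    guesses = []
    sentence_start = True
    for tok in TOKEN.findall(text):
        if tok in SENTENCE_END:
            sentence_start = True
            continue
        if not tok[0].isalnum():
            # Other punctuation (commas, quotes) does not affect the state
            continue
        if tok[0].isupper() and not sentence_start and tok.lower() not in known_words:
            guesses.append(tok)
        sentence_start = False
    return guesses


if __name__ == "__main__":
    known = {"atzo", "joan", "zen", "gaur", "dago"}
    print(guess_proper_names("Atzo Mikel Donostiara joan zen. Gaur Bilbon dago.", known))
```

The guesses are returned as inflected surface forms (for example "Donostiara"), so a real guesser would still need to separate the name from its case ending before transfer. Sentence-initial names are missed by this heuristic.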
Structural transfer
Verb chunks
We need paradigms for the potential ("ezan") and other verb structures. Perhaps the information in Matxin can be used for this and for other analytical verb forms.