Difference between revisions of "User:Mlforcada/Sandbox/basque"
Revision as of 09:03, 19 November 2008
How to improve Apertium-eu-es 0.3
These are some notes on how to improve apertium-eu-es 0.3 so that it performs better for assimilation purposes and is easier for future developers to maintain.
Lexical coverage
Lexical coverage may be improved in different ways:
Regular vocabulary
- Collect large corpora of Basque news text and search them for unknown words (as was done for version 0.3).
- Use possible new vocabulary from the new version of Matxin.
- Use existing vocabulary (especially multiword lexical units, or MWLUs) from the current dictionaries of apertium-eu-es, in particular by tagging and activating untagged MWLUs.
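The corpus-based search for unknown words could be sketched roughly as follows. This is only an illustration, assuming the corpus has already been run through the morphological analyser and is available in the Apertium stream format, where an unknown word appears as `^word/*word$`. The function name, the sample sentence and the tags in it are made up for the example, and escaped characters in the stream are not handled.

```python
import re
from collections import Counter

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# An unknown word is emitted by the analyser as ^surface/*surface$
# (simplification: escaped characters inside a unit are not handled).
UNIT = re.compile(r"\^([^/$^]+)/([^$]*)\$")


def unknown_word_counts(analysed_text):
    """Count unknown surface forms (lower-cased) in analysed text."""
    counts = Counter()
    for surface, analyses in UNIT.findall(analysed_text):
        if analyses.startswith("*"):  # the asterisk marks an unknown word
            counts[surface.lower()] += 1
    return counts


# Hypothetical sample; the tags are illustrative placeholders only.
sample = (
    "^Etxea/etxe<n><sg><abs>$ ^zuria/*zuria$ ^da/izan<vbsint>$ "
    "^Zuria/*Zuria$ ^blogari/*blogari$"
)

if __name__ == "__main__":
    # Most frequent unknown words first: the best candidates for the dictionary.
    for word, freq in unknown_word_counts(sample).most_common():
        print(freq, word)
```

Sorting by frequency means the words that would raise coverage most are reviewed first.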
Proper names
- Include massive lists of proper names (a gazetteer of place names, person names, etc.).
- Use some kind of guesser for proper names so that we don't have to include them in the dictionary.
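One very simple form such a guesser could take is sketched below. It is a heuristic assumed for illustration, not the design chosen for apertium-eu-es: an unknown word (marked `*` in Apertium stream format) is treated as a proper-name candidate when it is capitalised and does not start a sentence. The function name, the sample text and its tags are hypothetical.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analyses$
# (simplification: escaped characters inside a unit are not handled).
UNIT = re.compile(r"\^([^/$^]+)/([^$]*)\$")
SENTENCE_END = {".", "!", "?"}


def guess_proper_names(analysed_text):
    """Return unknown, capitalised, non-sentence-initial surface forms."""
    candidates = []
    sentence_start = True
    for surface, analyses in UNIT.findall(analysed_text):
        unknown = analyses.startswith("*")
        # Sentence-initial capitals are ambiguous, so they are skipped.
        if unknown and surface[:1].isupper() and not sentence_start:
            candidates.append(surface)
        sentence_start = surface in SENTENCE_END
    return candidates


# Hypothetical sample; the tags are illustrative placeholders only.
sample = (
    "^Atzo/atzo<adv>$ ^Mikel/*Mikel$ ^Donostiara/*Donostiara$ "
    "^joan/joan<vblex>$ ^zen/izan<vbsint>$ ^./.<sent>$ "
    "^Blogaria/*Blogaria$ ^etorri/etorri<vblex>$ ^da/izan<vbsint>$ ^./.<sent>$"
)

if __name__ == "__main__":
    print(guess_proper_names(sample))
```

Because Basque proper names take case suffixes (as in "Donostiara"), a usable guesser would also have to separate the name from its inflection. This sketch only flags candidates.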
Structural transfer
Verb chunks
We need to have paradigms for the potential ("ezan") and other verb structures. Perhaps we can use information in Matxin for this and other analytical verb forms.