User:Francis Tyers/An MT system in one thousand steps
Revision as of 14:44, 30 October 2013
Research
- Amass resources (1 task)
  - Find a grammar of language X and of language Y
  - Find a bilingual dictionary X-Y
  - Find bilingual dictionaries X-Z and Y-Z
  - Find 1-3 large monolingual corpora of language X and language Y
  - Find a parallel corpus of language X and language Y
Morphological analysers (200 tasks)
For languages X and Y:
- Add closed categories
  - Add adpositions (1 task)
  - Add conjunctions (1 task)
  - Add determiners (1 task)
  - Add pronouns (1 task)
  - Add numerals (1 task)
    - At least 1-100, leaving out compositional numerals
- Create frequency lists from your corpora
  - Categorise words (15 tasks)
- Add open categories
  - Add nouns (26 tasks)
  - Add proper nouns (16 tasks)
  - Add adjectives (15 tasks)
  - Add adverbs (3 tasks)
  - Add verbs (20 tasks)
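The frequency-list step can be sketched in Python as follows. This is only an illustration under stated assumptions: the tokeniser (runs of letters, lowercased) is deliberately crude, `known_words` stands in for whatever is already in your analyser, and both function names are hypothetical.

```python
import re
from collections import Counter


def frequency_list(lines):
    """Count lowercased word tokens across an iterable of corpus lines."""
    counts = Counter()
    for line in lines:
        # Crude tokenisation (an assumption): runs of letters, including non-ASCII ones
        counts.update(w.lower() for w in re.findall(r"[^\W\d_]+", line))
    return counts


def uncategorised(counts, known_words, n=10):
    """Return the n most frequent words that are not yet in the lexicon."""
    todo = [(w, c) for w, c in counts.most_common() if w not in known_words]
    return todo[:n]


corpus = ["The cat sat on the mat.", "The dog saw the cat."]
counts = frequency_list(corpus)
print(uncategorised(counts, {"the", "on"}, n=2))
```

Working down such a list means the words you categorise first are the ones that cover the most running text.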
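Adding open categories in lttoolbox means assigning each stem to a paradigm (inflection pattern) instead of listing every form. The toy Python model below only illustrates that idea. It is not lttoolbox itself, which compiles XML dictionaries into finite-state transducers. The paradigm names, the entries, and the helper functions are all hypothetical. The analysis strings imitate the `lemma<tag><tag>` style of Apertium output.

```python
# Hypothetical paradigms: suffix added to the stem -> analysis tags
PARADIGMS = {
    "house__n": {"": "<n><sg>", "s": "<n><pl>"},
    "box__n": {"": "<n><sg>", "es": "<n><pl>"},
}

# Hypothetical lexicon entries: (lemma, stem, paradigm name)
LEXICON = [
    ("house", "house", "house__n"),
    ("cat", "cat", "house__n"),
    ("box", "box", "box__n"),
]


def build_analyser(lexicon, paradigms):
    """Expand every stem with its paradigm into a surface form -> analyses table."""
    table = {}
    for lemma, stem, par in lexicon:
        for suffix, tags in paradigms[par].items():
            table.setdefault(stem + suffix, []).append(lemma + tags)
    return table


def analyse(table, form):
    """Look up a surface form; unknown forms get a '*' prefix, as Apertium marks them."""
    return table.get(form, ["*" + form])


table = build_analyser(LEXICON, PARADIGMS)
print(analyse(table, "cats"))   # ['cat<n><pl>']
print(analyse(table, "dogs"))   # ['*dogs']
```

The design point is that one new paradigm line covers every stem that inflects the same way. This is why the open-category tasks go quickly once the main paradigms exist.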
Bilingual dictionary
- Morphologically analyse and word-align the parallel corpus
- Extract bilingual dictionary candidates
- Proofread the candidates and add them in order of frequency
- Find freely available dictionaries online
- Convert to lttoolbox format
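The candidate-extraction and conversion steps might look roughly like the Python sketch below. It assumes the analysis and alignment step has already produced one `(source lemma, source POS, target lemma, target POS)` tuple per aligned word pair. That input format, the `min_count` threshold, and the function names are hypothetical. The entry shape follows Apertium bilingual dictionaries, for example `<e><p><l>casa<s n="n"/></l><r>house<s n="n"/></r></p></e>`.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def candidates(aligned_pairs, min_count=2):
    """Count aligned (src_lemma, src_pos, tgt_lemma, tgt_pos) tuples, most frequent first."""
    counts = Counter(aligned_pairs)
    return [(pair, c) for pair, c in counts.most_common() if c >= min_count]


def bidix_entry(src_lemma, src_pos, tgt_lemma, tgt_pos):
    """Build one bilingual-dictionary entry <e><p><l>..</l><r>..</r></p></e> as a string."""
    e = ET.Element("e")
    p = ET.SubElement(e, "p")
    # <l> holds the source side and <r> the target side, each with a POS symbol <s n=".."/>
    for side, lemma, pos in (("l", src_lemma, src_pos), ("r", tgt_lemma, tgt_pos)):
        el = ET.SubElement(p, side)
        el.text = lemma
        ET.SubElement(el, "s", n=pos)
    return ET.tostring(e, encoding="unicode")


pairs = [("casa", "n", "house", "n")] * 3 + [("casa", "n", "home", "n")]
for (sl, sp, tl, tp), count in candidates(pairs):
    print(count, bidix_entry(sl, sp, tl, tp))
```

Sorting by count puts the most useful entries in front of the proofreader first. ElementTree writes empty tags as `<s n="n" />`, which is equivalent XML. Multiword lemmas, which lttoolbox writes with `<b/>` for spaces, are not handled here.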
Lexical selection
- POS-tag and word-align the parallel corpus
- Extract default translation rules
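One simple way to derive default translations is to pick, for each source lemma, the target lemma it is most often aligned with. The Python sketch below does this. It assumes `(source lemma, target lemma)` pairs have already been read off the tagged, word-aligned corpus; the function name is hypothetical. Turning the result into rules for a particular lexical-selection tool is not shown.

```python
from collections import Counter, defaultdict


def default_translations(aligned_pairs):
    """Map each source lemma to its most frequently aligned target lemma."""
    by_source = defaultdict(Counter)
    for src, tgt in aligned_pairs:
        by_source[src][tgt] += 1
    return {src: tgts.most_common(1)[0][0] for src, tgts in by_source.items()}


pairs = [("estació", "station")] * 3 + [("estació", "season")] * 2 + [("gat", "cat")]
print(default_translations(pairs))
```

Lemmas whose second-best translation is nearly as frequent as the default are good candidates for hand-written context rules later on.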
Disambiguation
- Make a list of the most frequent ambiguities
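A list of the most frequent ambiguities can be drawn from the analysed corpus roughly as follows. This Python sketch assumes each corpus token is available as a `(surface form, list of readings)` pair from the morphological analyser. That representation and the function name are hypothetical.

```python
from collections import Counter


def frequent_ambiguities(analysed_tokens, n=10):
    """Return the n most frequent ambiguous surface forms with their distinct readings."""
    counts = Counter()
    readings_of = {}
    for form, readings in analysed_tokens:
        distinct = sorted(set(readings))
        if len(distinct) > 1:  # more than one distinct analysis: ambiguous
            counts[form] += 1
            readings_of[form] = distinct
    return [(form, c, readings_of[form]) for form, c in counts.most_common(n)]


tokens = [
    ("the", ["the<det><def>"]),
    ("run", ["run<n><sg>", "run<vblex><inf>"]),
    ("run", ["run<n><sg>", "run<vblex><inf>"]),
    ("fly", ["fly<n><sg>", "fly<vblex><inf>"]),
]
for form, count, readings in frequent_ambiguities(tokens):
    print(count, form, readings)
```

Counting tokens rather than distinct forms ranks ambiguities by how much running text they affect, which is a reasonable order in which to tackle them.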
Transfer rules
- Write a contrastive grammar