Difference between revisions of "User:Francis Tyers/An MT system in one thousand steps"

From Apertium

Revision as of 13:24, 30 October 2013

Research

  • Amass resources (1 task)
    • Find a grammar of language X and of language Y
    • Find a bilingual dictionary X-Y
    • Find bilingual dictionaries X-Z and Y-Z
    • Find 1-3 large monolingual corpora of language X and language Y
    • Find a parallel corpus of language X and language Y

Morphological analysers

  • Add closed categories
    • Add adpositions (1 task)
    • Add conjunctions (1 task)
    • Add determiners (1 task)
    • Add pronouns (1 task)
    • Add numerals (1 task)
      • At least 1-100, leaving out compositional numerals
  • Create frequency lists from your corpora
  • Add open categories
    • Add nouns
    • Add proper nouns
    • Add adjectives
    • Add adverbs
    • Add verbs
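
The frequency-list step above can be sketched roughly as follows. This is only a minimal illustration: the regex tokeniser and the name `frequency_list` are assumptions made for the example, not part of any Apertium tool, and a real corpus would usually need language-specific tokenisation.

```python
import re
from collections import Counter


def frequency_list(lines):
    """Count lowercased word tokens over an iterable of text lines."""
    counts = Counter()
    for line in lines:
        # \w+ is a crude tokeniser (an assumption); swap in a better one
        # for languages where it splits words incorrectly.
        counts.update(re.findall(r"\w+", line.lower()))
    # Most frequent first; ties are broken alphabetically for stable output.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    sample = ["The cat saw the dog.", "The dog ran."]
    for word, count in frequency_list(sample):
        print(f"{count}\t{word}")
```

Working down such a list from the top is one way to decide which open-category words to add first.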

Bilingual dictionary

  • Morphologically analyse and word align parallel corpus
    • Extract bilingual dictionary candidates
    • Proofread and add candidates by frequency
  • Find freely available dictionaries online
    • Convert to lttoolbox format
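
As a rough sketch of the conversion step, the following turns tab-separated rows into lttoolbox bilingual dictionary entries. The three-column input layout (source lemma, target lemma, part-of-speech tag) and the helper names are assumptions for illustration; real downloaded dictionaries will need their own parsing. The output only covers the `<e>` entries, and any tag used would still have to be declared in the dictionary's `<sdefs>` section.

```python
from xml.sax.saxutils import escape, quoteattr


def lemma_xml(lemma):
    """Escape a lemma; lttoolbox writes spaces in multiwords as <b/>."""
    return "<b/>".join(escape(part) for part in lemma.split(" "))


def to_bidix_entry(source, target, pos):
    """Build one <e> entry with the same tag on both sides."""
    tag = f"<s n={quoteattr(pos)}/>"
    left = f"<l>{lemma_xml(source)}{tag}</l>"
    right = f"<r>{lemma_xml(target)}{tag}</r>"
    return f"<e><p>{left}{right}</p></e>"


def convert(lines):
    """Convert 'source<TAB>target<TAB>pos' rows (assumed layout) to entries."""
    entries = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) != 3:
            continue  # skip malformed rows rather than guessing
        entries.append(to_bidix_entry(*fields))
    return entries


if __name__ == "__main__":
    for entry in convert(["house\tcasa\tn", "ice cream\thelado\tn"]):
        print(entry)
```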

Lexical selection

  • POS tag and word align parallel corpus
    • Extract default translation rules
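
One simple way to pick default translations, assuming the word-aligned corpus has already been reduced to (source word, target word) pairs, is to take the most frequently aligned target for each source word. The function name and the input shape below are hypothetical, and the result is a plain mapping rather than a rule file in any particular format.

```python
from collections import Counter, defaultdict


def default_translations(aligned_pairs):
    """Map each source item to its most frequently aligned target."""
    counts = defaultdict(Counter)
    for source, target in aligned_pairs:
        counts[source][target] += 1

    defaults = {}
    for source, targets in counts.items():
        # Highest count wins; ties are broken alphabetically so the
        # result does not depend on input order.
        best_target, _ = min(targets.items(), key=lambda kv: (-kv[1], kv[0]))
        defaults[source] = best_target
    return defaults


if __name__ == "__main__":
    pairs = [
        ("bank<n>", "banco<n>"),
        ("bank<n>", "orilla<n>"),
        ("bank<n>", "banco<n>"),
    ]
    print(default_translations(pairs))
```

The less frequent alternatives left over in `counts` are the natural candidates for context-dependent rules later on.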


Disambiguation

  • Make a list of most frequent ambiguities
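
A list of frequent ambiguities could be gathered along these lines, assuming the corpus has been run through a morphological analyser that emits the Apertium stream format (`^surface/analysis1/analysis2$`). The sketch ignores escaped characters inside lexical units, and the function name is made up for the example.

```python
import re
from collections import Counter

# One lexical unit: everything between ^ and $ (escapes are not handled).
LEXICAL_UNIT = re.compile(r"\^([^$]*)\$")


def ambiguity_counts(analysed_text):
    """Count surface forms that received more than one analysis."""
    counts = Counter()
    for unit in LEXICAL_UNIT.findall(analysed_text):
        surface, *analyses = unit.split("/")
        if len(analyses) > 1:
            counts[(surface.lower(), tuple(analyses))] += 1
    # Most frequent ambiguities first.
    return counts.most_common()


if __name__ == "__main__":
    text = (
        "^the/the<det><def><sp>$ "
        "^saw/saw<n><sg>/see<vblex><past>$ "
        "^Saw/saw<n><sg>/see<vblex><past>$ "
        "^xyz/*xyz$"
    )
    for (surface, analyses), count in ambiguity_counts(text):
        print(count, surface, " | ".join(analyses))
```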

Transfer rules

  • Write a contrastive grammar

Evaluation