User:Francis Tyers/An MT system in one thousand steps

From Apertium

Research

  • Find a grammar of language X and of language Y
  • Find a bilingual dictionary X-Y
  • Find bilingual dictionaries X-Z and Y-Z
  • Find 1-3 large monolingual corpora of language X and language Y
  • Find a parallel corpus of language X and language Y

Morphological analysers

  • Add closed categories
    • Add adpositions (1 task)
    • Add conjunctions (1 task)
    • Add determiners (1 task)
    • Add pronouns (1 task)
    • Add numerals (1 task)
      • At least 1-100, leaving out compositional numerals
  • Create frequency lists from your corpora
  • Add open categories
    • Add nouns
    • Add proper nouns
    • Add adjectives
    • Add adverbs
    • Add verbs
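
A frequency list shows which words are worth adding to the analyser first. The following Python sketch is one way to build it. It assumes the corpus is plain text and uses naive tokenisation (lowercased runs of word characters). `frequency_list` is a hypothetical helper and not part of Apertium. In practice you would probably rank only the words the analyser does not yet recognise.

```python
import re
from collections import Counter
from typing import Iterable


def frequency_list(lines: Iterable[str]) -> list[tuple[str, int]]:
    """Count word forms in a corpus, most frequent first.

    Assumption: naive tokenisation into lowercased runs of word
    characters; a real pipeline would use a language-aware tokeniser.
    """
    counts: Counter = Counter()
    for line in lines:
        counts.update(re.findall(r"\w+", line.lower()))
    # Sort by descending frequency, then alphabetically for stable output
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    corpus = [
        "The cat sat on the mat.",
        "The dog sat on the cat.",
    ]
    for word, count in frequency_list(corpus):
        print(f"{count}\t{word}")
```

Working down such a list means that each word added to the analyser improves coverage as much as possible.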

Bilingual dictionary

  • Morphologically analyse and word-align the parallel corpus
    • Extract bilingual dictionary candidates
    • Proofread the candidates and add them in order of frequency
  • Find freely available dictionaries online
    • Convert to lttoolbox format
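
The candidate extraction and conversion steps can be sketched in Python. In lttoolbox, a bilingual dictionary stores each translation pair as an `<e><p><l>…</l><r>…</r></p></e>` entry, and the tags are written as `<s n="…"/>`. The input format used here is a hypothetical one: one (source lemma, target lemma, tag) triple per aligned word pair, with the same tag assumed on both sides. The helper names are also illustrative. The sketch does not handle multiword lemmas or tag changes between languages.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Iterable

# Hypothetical input format: one (source lemma, target lemma, tag) triple per
# aligned word pair. Both sides are assumed to share the same tag.
Triple = tuple[str, str, str]


def extract_candidates(pairs: Iterable[Triple], min_count: int = 2) -> list[tuple[Triple, int]]:
    """Count aligned pairs, drop rare ones, and sort most frequent first
    so that proofreading can proceed in order of frequency."""
    counts = Counter(pairs)
    kept = [(pair, n) for pair, n in counts.items() if n >= min_count]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


def to_bidix_entry(src: str, trg: str, tag: str) -> ET.Element:
    """Build one lttoolbox bilingual entry:
    <e><p><l>src<s n="tag"/></l><r>trg<s n="tag"/></r></p></e>
    Multiword lemmas are not handled."""
    entry = ET.Element("e")
    pair = ET.SubElement(entry, "p")
    for side, lemma in (("l", src), ("r", trg)):
        elem = ET.SubElement(pair, side)
        elem.text = lemma
        ET.SubElement(elem, "s", n=tag)
    return entry


if __name__ == "__main__":
    aligned = [
        ("casa", "house", "n"),
        ("casa", "house", "n"),
        ("casa", "home", "n"),
        ("gran", "big", "adj"),
        ("gran", "big", "adj"),
    ]
    for (src, trg, tag), n in extract_candidates(aligned):
        print(n, ET.tostring(to_bidix_entry(src, trg, tag), encoding="unicode"))
```

The `min_count` threshold is an arbitrary cut-off that filters out one-off alignment errors. Rarer pairs can still be reviewed by hand later.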

Lexical selection

  • POS-tag and word-align the parallel corpus
    • Extract default translation rules
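
A simple way to choose a default translation is to take the target lemma most often aligned to each source lemma and part of speech. The Python sketch below assumes the tagged, aligned corpus has already been reduced to (source lemma, POS tag, target lemma) triples. That input format and the helper name are hypothetical. Writing the results out as rules for Apertium's lexical selection module is not shown.

```python
from collections import Counter, defaultdict
from typing import Iterable


def default_translations(
    triples: Iterable[tuple[str, str, str]],
) -> dict[tuple[str, str], tuple[str, float]]:
    """Map each (source lemma, POS tag) to its most frequently aligned
    target lemma, together with the share of alignments it accounts for.

    Ties are broken alphabetically so the output is deterministic.
    """
    by_source: defaultdict = defaultdict(Counter)
    for src, tag, trg in triples:
        by_source[(src, tag)][trg] += 1

    defaults = {}
    for key, counts in by_source.items():
        # Highest count first; alphabetical order among equal counts
        trg, n = min(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        defaults[key] = (trg, n / sum(counts.values()))
    return defaults


if __name__ == "__main__":
    aligned = [
        ("estació", "n", "station"),
        ("estació", "n", "station"),
        ("estació", "n", "station"),
        ("estació", "n", "season"),
    ]
    for (src, tag), (trg, share) in default_translations(aligned).items():
        print(f"{src}<{tag}> -> {trg} ({share:.0%})")
```

A low share signals a word whose translation depends on context. Such words are better candidates for explicit lexical selection rules than for a single default.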


Disambiguation

  • Make a list of the most frequent ambiguities
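
Apertium's analyser output writes each lexical unit as `^surface/analysis1/analysis2$`, so a word is ambiguous when it has more than one analysis. The Python sketch below counts ambiguous surface forms together with their sets of analyses, listing the most frequent first. It assumes the stream contains no escaped `^`, `$` or `/` characters and ignores other stream features. `frequent_ambiguities` is a hypothetical name.

```python
import re
from collections import Counter
from typing import Iterable

# One lexical unit of analyser output, e.g.
#   ^houses/house<n><pl>/house<vblex><pres><p3><sg>$
# Assumption: the stream contains no escaped ^, $ or / characters.
LU_RE = re.compile(r"\^([^$]*)\$")

Ambiguity = tuple[str, tuple[str, ...]]


def frequent_ambiguities(lines: Iterable[str]) -> list[tuple[Ambiguity, int]]:
    """Count surface forms that received more than one analysis,
    most frequent first."""
    counts: Counter = Counter()
    for line in lines:
        for unit in LU_RE.findall(line):
            surface, *analyses = unit.split("/")
            if len(analyses) > 1:
                counts[(surface.lower(), tuple(sorted(analyses)))] += 1
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    analysed = [
        "^The/the<det><def><sp>$ ^houses/house<n><pl>/house<vblex><pres><p3><sg>$^./.<sent>$",
        "^She/prpers<prn><subj><p3><f><sg>$ ^houses/house<n><pl>/house<vblex><pres><p3><sg>$ ^them/prpers<prn><obj><p3><mf><pl>$^./.<sent>$",
    ]
    for (surface, analyses), n in frequent_ambiguities(analysed):
        print(n, surface, " | ".join(analyses))
```

Grouping by surface form keeps the list easy to read. Grouping by the tags alone (ambiguity classes such as noun vs. verb) would instead show which general patterns the disambiguator has to handle.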

Transfer rules

  • Write a contrastive grammar

Evaluation