Difference between revisions of "User:Francis Tyers/An MT system in one thousand steps"

From Apertium

Revision as of 13:23, 30 October 2013

Research

  • Find a grammar of language X and of language Y
  • Find a bilingual dictionary X-Y
  • Find bilingual dictionaries X-Z and Y-Z
  • Find 1-3 large monolingual corpora of language X and language Y
  • Find a parallel corpus of language X and language Y

Morphological analysers

  • Add closed categories
    • Add adpositions (1 task)
    • Add conjunctions (1 task)
    • Add determiners (1 task)
    • Add pronouns (1 task)
    • Add numerals (1 task)
      • At least 1-100 leaving out compositional numerals
  • Create frequency lists from your corpora
  • Add open categories
    • Add nouns
    • Add proper nouns
    • Add adjectives
    • Add adverbs
    • Add verbs
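
The frequency-list step can be sketched as follows. This is a minimal illustration, not a prescribed tool: the tokeniser regex and the function name `frequency_list` are assumptions, and lower-casing every token is a simplification that will merge proper nouns with common words.

```python
import re
from collections import Counter

# Hypothetical tokeniser: runs of letters (any script), optionally joined by
# apostrophes or hyphens. Adjust it to the orthography of your language.
TOKEN_RE = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*")


def frequency_list(lines):
    """Count lower-cased tokens over an iterable of text lines."""
    counts = Counter()
    for line in lines:
        counts.update(tok.lower() for tok in TOKEN_RE.findall(line))
    # Most frequent first; ties are broken alphabetically for stable output.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy input standing in for a monolingual corpus.
sample = ["The cat saw the dog.", "The dog didn't see the cat's owner."]
for word, freq in frequency_list(sample)[:3]:
    print(freq, word)
```

Working down such a list from the top is one way to decide which nouns, adjectives, adverbs and verbs to add first. For a real corpus, pass an open file object instead of the `sample` list.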

Bilingual dictionary

  • Morphologically analyse and word align parallel corpus
    • Extract bilingual dictionary candidates
    • Proofread and add candidates by frequency
  • Find freely available dictionaries online
    • Convert to lttoolbox format
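
Converting a word list to lttoolbox format might look like the sketch below. It assumes a hypothetical tab-separated input of source lemma, target lemma and one part-of-speech tag per line; real dictionaries usually need more tags on each side (for example gender on the target noun), and every tag used must also be declared in the dictionary's symbol definitions.

```python
from xml.sax.saxutils import escape


def lemma_xml(lemma):
    """Escape a lemma; lttoolbox writes spaces inside multiwords as <b/>."""
    return "<b/>".join(escape(part) for part in lemma.split(" "))


def bidix_entry(left, right, tags):
    """Build one bilingual entry with the same tags on both sides."""
    syms = "".join('<s n="%s"/>' % escape(t, {'"': "&quot;"}) for t in tags)
    return "<e><p><l>%s%s</l><r>%s%s</r></p></e>" % (
        lemma_xml(left), syms, lemma_xml(right), syms)


def convert(lines):
    """Yield entries for well-formed 'source<TAB>target<TAB>tag' lines."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3 or not all(fields):
            continue  # skip malformed lines rather than guessing
        left, right, tag = fields
        yield bidix_entry(left, right, [tag])


# Toy input standing in for a downloaded dictionary.
for entry in convert(["house\tcasa\tn", "as well\ttambién\tadv"]):
    print(entry)
```

The entries still have to be placed inside the section of a bilingual dictionary file and proofread before use.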

Lexical selection

  • POS tag and word align parallel corpus
    • Extract default translation rules
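
One simple reading of "default translation" is the target word most often aligned to a given source word. The sketch below assumes the tagging and word alignment have already been done and that the result is available as a hypothetical list of (source, target) pairs; it does not produce rules in any particular rule format.

```python
from collections import Counter, defaultdict


def default_translations(aligned_pairs):
    """Map each source word to its most frequently aligned target word."""
    table = defaultdict(Counter)
    for src, trg in aligned_pairs:
        table[src][trg] += 1
    # most_common(1) returns the top candidate; on a tie, the candidate
    # seen first in the input wins.
    return {src: cands.most_common(1)[0][0] for src, cands in table.items()}


# Toy alignment output: lemma plus part-of-speech tag on each side.
pairs = [
    ("bank<n>", "banco<n>"),
    ("bank<n>", "orilla<n>"),
    ("bank<n>", "banco<n>"),
]
print(default_translations(pairs))
```

Source words whose runner-up translation is nearly as frequent as the default are natural candidates for context-dependent rules later on.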


Disambiguation

  • Make a list of most frequent ambiguities
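
A list of frequent ambiguities can be drawn from analyser output, as in the sketch below. It assumes text in the Apertium stream format, `^surface/analysis/analysis$`, and counts only ambiguity between parts of speech, taking the first tag of each analysis as the part of speech. Escaped characters in the stream are not handled, and the function name is hypothetical.

```python
import re
from collections import Counter

LU_RE = re.compile(r"\^([^$]*)\$")        # one lexical unit: ^...$
FIRST_TAG_RE = re.compile(r"<([^>]+)>")   # first tag of an analysis


def ambiguity_classes(analysed_text):
    """Count part-of-speech ambiguity classes, most frequent first."""
    counts = Counter()
    for unit in LU_RE.findall(analysed_text):
        surface, *analyses = unit.split("/")
        pos = set()
        for analysis in analyses:
            match = FIRST_TAG_RE.search(analysis)
            if match:
                pos.add(match.group(1))
        # Only units with more than one distinct part of speech count here.
        if len(pos) > 1:
            counts["|".join(sorted(pos))] += 1
    return counts.most_common()


# Toy analyser output.
sample = ("^the/the<det><def><sp>$ "
          "^books/book<n><pl>/book<vblex><pri><p3><sg>$ "
          "^run/run<n><sg>/run<vblex><inf>/run<vblex><pres>$")
print(ambiguity_classes(sample))
```

The classes at the top of the list are the ones where disambiguation rules are likely to pay off first.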

Transfer rules

  • Write a contrastive grammar

Evaluation