Difference between revisions of "Icelandic and English"

Revision as of 00:59, 8 February 2009

Pending tasks

  • Tag a corpus with IceTagger and train the apertium-tagger
  • Post-edit automatically-generated bilingual dictionaries
  • Use IceParser to parse a corpus and extract the most frequent chunk patterns (sequences of coarse POS tags within a chunk/phrase) and the most frequent phrase patterns (sequences of chunks/phrases).
  • Merge the analysed corpus (IceMorphy full-form list) with the Apertium dictionary; this will require matching partial information to paradigms, perhaps using extract.
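
The pattern-extraction task above could be sketched roughly as follows. This is only an illustration, not a description of IceParser's actual output format: it assumes the parsed corpus has already been converted into a list of sentences, each a list of (chunk label, coarse POS tags) pairs. The function name count_patterns, the chunk labels and the tag names are all hypothetical.

```python
from collections import Counter


def count_patterns(sentences):
    """Count chunk-internal and sentence-level patterns.

    `sentences` is an assumed, pre-converted structure: a list of
    sentences, each a list of (chunk_label, [coarse_pos_tags]) pairs.
    It is NOT the raw IceParser output format.
    """
    chunk_patterns = Counter()   # (label, POS tag sequence) -> frequency
    phrase_patterns = Counter()  # sequence of chunk labels -> frequency
    for sentence in sentences:
        # Phrase pattern: the sequence of chunk labels in the sentence.
        phrase_patterns[tuple(label for label, _ in sentence)] += 1
        # Chunk pattern: the coarse POS tags inside each chunk.
        for label, tags in sentence:
            chunk_patterns[(label, tuple(tags))] += 1
    return chunk_patterns, phrase_patterns


# Toy corpus with made-up labels and tags, for illustration only.
corpus = [
    [("NP", ["det", "n"]), ("VP", ["vblex"]), ("NP", ["adj", "n"])],
    [("NP", ["det", "n"]), ("VP", ["vblex"])],
    [("NP", ["n"]), ("VP", ["vblex"]), ("NP", ["det", "n"])],
]

chunk_patterns, phrase_patterns = count_patterns(corpus)

# List the patterns from most to least frequent.
for pattern, freq in phrase_patterns.most_common():
    print(freq, " ".join(pattern))
for (label, tags), freq in chunk_patterns.most_common():
    print(freq, label, " ".join(tags))
```

The most frequent entries in each counter would then be the candidates for writing chunking and transfer rules by hand.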

Resources

Corpora

  • MediaWiki l10n, KDE4, OpenSubtitles, etc., from OPUS (~60k sentences)

Bilingual dictionaries

See also