Difference between revisions of "Icelandic and English"
Revision as of 00:59, 8 February 2009
Pending tasks
- Tag a corpus with IceTagger and train the apertium-tagger
- Post-edit automatically-generated bilingual dictionaries
- Use IceParser to parse a corpus and extract the most frequent chunk patterns (each chunk as a list of coarse POS tags) and the most frequent phrase patterns (each phrase as a sequence of chunks).
- Merge analysed corpus (IceMorphy full-form list) with Apertium dictionary — will require matching partial information to paradigms... perhaps use extract ?
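The pattern-extraction task above could be sketched roughly as follows. This is only an illustration: it assumes the IceParser output has already been converted into a list of sentences, each a list of (chunk label, coarse POS tags) pairs. That intermediate format, the function name count_patterns and the sample data are all hypothetical and are not part of IceParser or Apertium.

```python
from collections import Counter

def count_patterns(sentences):
    """Count chunk-internal tag patterns and sentence-level chunk patterns.

    `sentences` is a hypothetical pre-parsed format: a list of sentences,
    each a list of (chunk_label, [coarse POS tags]) pairs.
    """
    chunk_patterns = Counter()   # (label, tag sequence) -> frequency
    phrase_patterns = Counter()  # sequence of chunk labels -> frequency
    for sentence in sentences:
        for label, tags in sentence:
            chunk_patterns[(label, tuple(tags))] += 1
        phrase_patterns[tuple(label for label, _ in sentence)] += 1
    return chunk_patterns, phrase_patterns

# Made-up sample data, for illustration only.
sample = [
    [("NP", ["det", "n"]), ("VP", ["v"]), ("NP", ["adj", "n"])],
    [("NP", ["det", "n"]), ("VP", ["v"])],
    [("NP", ["n"]), ("VP", ["v"]), ("NP", ["det", "n"])],
]

chunk_patterns, phrase_patterns = count_patterns(sample)
for pattern, freq in chunk_patterns.most_common(3):
    print(freq, pattern)
for pattern, freq in phrase_patterns.most_common(3):
    print(freq, pattern)
```

The most frequent entries from each counter would then be candidates for hand-written chunking and transfer rules.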
Resources
Corpora
- Mediawiki l10n, KDE4, OpenSubtitles, etc. — from OPUS (~60k sentences)
Bilingual dictionaries
- Wikipedia interwiki (~1,100 entries)
- Freelang (~1,000 entries)
- Wiktionary (en) (~3,200 entries)
- An Icelandic-English Dictionary (Old Icelandic, 1876 — Public Domain)
- And here
- Wordbank at ismal.hi.is (licence unknown)