Difference between revisions of "Hindi and Urdu"
Revision as of 16:08, 30 March 2010
Some pending tasks:
- Convert M. Humayoun's Urdu Morphology → lttoolbox (probably using full form list and speling tools)
- Create bilingual dictionary for all words in the Urdu morphology (some can be extracted from Wiktionary, see the dev/ directory in the incubator module)
- Make sure tagsets are consistent between Humayoun, IIIT and Apertium (see List of symbols)
- Train part-of-speech taggers for both Urdu and Hindi.
- Finish conversion of the IIIT Hindi analyser (see Hindi). Verbs still need to be converted, and other categories checked.
- Write transfer rules, if any are needed
- Retrain part-of-speech taggers with target-language tagger training.
- Run quality controls (see Quality control)
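
The tagset-consistency task in the list above could be approached with a small script along these lines. This is only a minimal sketch: the source tag names for the Humayoun and IIIT tagsets are placeholders invented for illustration (this page does not list them), and the function names are hypothetical. Only the target symbols (n, vblex, m, f, sg, pl) are ordinary Apertium symbols.

```python
# Illustrative mappings only: the source-side tag names are placeholders,
# not the real Humayoun or IIIT tag inventories.
HUMAYOUN_TO_APERTIUM = {"Noun": "n", "Masc": "m", "Fem": "f", "Sg": "sg", "Pl": "pl"}
IIIT_TO_APERTIUM = {"n": "n", "m": "m", "f": "f", "sg": "sg", "pl": "pl", "v": "vblex"}


def unmapped_tags(tags, mapping):
    """Return the source tags that have no Apertium equivalent, sorted."""
    return sorted(set(tags) - set(mapping))


def inconsistent_targets(mapping_a, mapping_b):
    """Return Apertium symbols produced by one mapping but not the other."""
    return sorted(set(mapping_a.values()) ^ set(mapping_b.values()))


if __name__ == "__main__":
    # Tags seen in a (hypothetical) analyser output that still need a mapping.
    print(unmapped_tags(["Noun", "Obl", "Sg"], HUMAYOUN_TO_APERTIUM))
    # Symbols that only one side can currently produce.
    print(inconsistent_targets(HUMAYOUN_TO_APERTIUM, IIIT_TO_APERTIUM))
```

Running a check like this over the tags actually emitted by each analyser would give a list of symbols to reconcile against List of symbols before the bilingual dictionary and taggers are built.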