Hindi and Urdu

Revision as of 16:08, 30 March 2010

Some pending tasks:

Convert M. Humayoun's Urdu Morphology → lttoolbox (probably using a full-form list and the speling tools)
Create bilingual dictionary for all words in the Urdu morphology (some can be extracted from Wiktionary, see the dev/ directory in the incubator module)
Make sure tagsets are consistent between Humayoun, IIIT and Apertium (see List of symbols)
Train part-of-speech taggers for both Urdu and Hindi.
Finish conversion of the IIIT Hindi analyser (see Hindi; verbs still need to be converted, and other categories checked.)
Write transfer rules, if any are needed
Retrain part-of-speech taggers with target-language tagger training.
Run quality controls (see Quality control)
