Difference between revisions of "Hindi and Urdu"

From Apertium

Revision as of 16:08, 30 March 2010

Some pending tasks:

  • Convert M. Humayoun's Urdu Morphology → lttoolbox (probably using a full-form list and the speling tools)
  • Create bilingual dictionary for all words in the Urdu morphology (some can be extracted from Wiktionary, see the dev/ directory in the incubator module)
  • Make sure tagsets are consistent between Humayoun, IIIT and Apertium (see List of symbols)
  • Train part-of-speech taggers for both Urdu and Hindi.
  • Finish conversion of the IIIT Hindi analyser (see Hindi). Verbs still need to be converted, and the other categories need to be checked.
  • Write transfer rules, if any are needed
  • Retrain part-of-speech taggers with target-language tagger training.
  • Run quality controls (see Quality control)
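
The tagset consistency task above could be approached with a small script along these lines. This is only a sketch: the function name `tagset_mismatches` and the tag inventories shown are hypothetical placeholders, not the actual tags used by Humayoun's morphology, the IIIT analyser, or Apertium. In practice the sets would be extracted from the respective resources.

```python
def tagset_mismatches(tagsets):
    """For each source, list the tags it lacks relative to the union of all sources."""
    all_tags = set().union(*tagsets.values())
    return {name: sorted(all_tags - tags) for name, tags in tagsets.items()}


# Hypothetical tag inventories, for illustration only.
tagsets = {
    "humayoun": {"n", "vblex", "adj", "m", "f"},
    "iiit": {"n", "vblex", "adj", "m", "f", "adv"},
    "apertium": {"n", "vblex", "adj", "m", "f", "adv", "pr"},
}

missing = tagset_mismatches(tagsets)
for source, tags in missing.items():
    print(source, "is missing:", ", ".join(tags) if tags else "(nothing)")
```

An empty list for every source would mean all three inventories use the same symbols. Anything else points at tags that need to be mapped or added.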

See also

External links