Talk:Hindi and Urdu
Revision as of 00:49, 26 March 2011 by Francis Tyers
Message #1
Hi,

you need to convert the IIIT morphological analyser to be compatible with Apertium. Thus:

1) The encoding needs to be changed from WX -> UTF-8
2) The tagset needs to be standardised along Apertium lines

These two tasks are non-negotiable and should be completed as part of your application.

You can find the incomplete language pair in SVN:

https://apertium.svn.sourceforge.net/svnroot/apertium/nursery/apertium-ur-hi

The analyser is partially converted (by me) here:

https://apertium.svn.sourceforge.net/svnroot/apertium/nursery/apertium-ur-hi/apertium-ur-hi.hi.dix

Analyse Urdu:

$ echo "عامل کی بیٹی" | lt-proc ur-hi.automorf.bin
^عامل/عامل<np><ant><m><sg><nom>$ ^کی/کا<post><f><sg><nom>$ ^بیٹی/بیٹی<n><f><sg><nom>/بیٹی<n><f><sg><obl>/بیٹی<n><f><sg><voc>$

Analyse Hindi:

$ echo "आमिल की बेटी" | lt-proc hi-ur.automorf.bin
^आमिल/आमिल<np><ant><m><sg><nom>$ ^की/का<post><f><sg><nom>/का<post><f><sg><obl>/का<post><f><pl><nom>/का<post><f><pl><obl>$ ^बेटी/बेटी<n><f><sg><nom>/बेटी<n><f><sg><obl>$

Tag Urdu:

$ echo "عامل کی بیٹی" | lt-proc ur-hi.automorf.bin | apertium-tagger -g ur-hi.prob
^عامل<np><ant><m><sg><nom>$ ^کا<post><f><sg><nom>$ ^بیٹی<n><f><sg><nom>$

Transfer Urdu->Hindi:

$ echo "عامل کی بیٹی" | lt-proc ur-hi.automorf.bin | apertium-tagger -g ur-hi.prob | apertium-transfer apertium-ur-hi.ur-hi.t1x ur-hi.t1x.bin ur-hi.autobil.bin
^आमिल<np><ant><m><sg><nom>$ ^का<post><f><sg><nom>$ ^बेटी<n><f><sg><nom>$

Generate Hindi:

$ echo "عامل کی بیٹی" | lt-proc ur-hi.automorf.bin | apertium-tagger -g ur-hi.prob | apertium-transfer apertium-ur-hi.ur-hi.t1x ur-hi.t1x.bin ur-hi.autobil.bin | lt-proc -g ur-hi.autogen.bin
आमिल की बेटी

Best regards,

Fran
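
A note on reading the analyser output in the message: each lexical unit has the form ^surface/analysis/analysis...$, and each analysis is a lemma followed by tags in angle brackets. When checking a converted tagset, it can help to list how many analyses each surface form receives. The following is only a minimal sketch for that purpose. The names parse_stream, UNIT and TAG are hypothetical (they are not part of Apertium), and it assumes no escaped characters and no unknown-word markers appear in the input.

```python
import re

# One lexical unit looks like: ^surface/analysis1/analysis2$
# Assumption: no backslash-escaped '/', '^' or '$' inside a unit.
UNIT = re.compile(r"\^([^$]*)\$")
TAG = re.compile(r"<([^>]+)>")


def parse_stream(text):
    """Return a list of (surface, [(lemma, [tags]), ...]) pairs."""
    units = []
    for body in UNIT.findall(text):
        # The first field is the surface form; the rest are analyses.
        surface, *analyses = body.split("/")
        parsed = []
        for analysis in analyses:
            lemma = analysis.split("<", 1)[0]
            parsed.append((lemma, TAG.findall(analysis)))
        units.append((surface, parsed))
    return units


# The Hindi analysis quoted in the message.
sample = (
    "^आमिल/आमिल<np><ant><m><sg><nom>$ "
    "^की/का<post><f><sg><nom>/का<post><f><sg><obl>"
    "/का<post><f><pl><nom>/का<post><f><pl><obl>$ "
    "^बेटी/बेटी<n><f><sg><nom>/बेटी<n><f><sg><obl>$"
)

for surface, analyses in parse_stream(sample):
    print(surface, len(analyses))
# prints:
# आमिल 1
# की 4
# बेटी 2
```

Comparing these counts between the Urdu and Hindi analysers is one quick way to spot paradigms that differ, such as the single analysis for کی against the four for की in the examples.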