Talk:Hindi and Urdu

From Apertium
Revision as of 00:49, 26 March 2011 by Francis Tyers

Message #1

Hi, 

you need to convert the IIIT morphological analyser to be compatible with Apertium. Thus:

1) The encoding needs to be changed from WX notation to UTF-8
2) The tagset needs to be standardised along Apertium lines

These two tasks are non-negotiable and should be completed as part of your application.
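
As a rough illustration of the first task, the Python sketch below converts WX-notation Hindi into UTF-8 Devanagari. It is not taken from the IIIT analyser or from Apertium: the function name wx_to_devanagari is hypothetical, and the table is a partial WX mapping that leaves out the nukta, digits and other marks, so it would need extending and checking against the analyser's real data.

```python
# Partial WX -> Devanagari table (an assumption-laden sketch, not the IIIT tool).
VOWELS = {  # WX letter: (independent vowel, dependent vowel sign)
    "a": ("अ", ""), "A": ("आ", "ा"), "i": ("इ", "ि"), "I": ("ई", "ी"),
    "u": ("उ", "ु"), "U": ("ऊ", "ू"), "e": ("ए", "े"), "E": ("ऐ", "ै"),
    "o": ("ओ", "ो"), "O": ("औ", "ौ"), "q": ("ऋ", "ृ"),
}
CONSONANTS = {
    "k": "क", "K": "ख", "g": "ग", "G": "घ", "f": "ङ",
    "c": "च", "C": "छ", "j": "ज", "J": "झ", "F": "ञ",
    "t": "ट", "T": "ठ", "d": "ड", "D": "ढ", "N": "ण",
    "w": "त", "W": "थ", "x": "द", "X": "ध", "n": "न",
    "p": "प", "P": "फ", "b": "ब", "B": "भ", "m": "म",
    "y": "य", "r": "र", "l": "ल", "v": "व",
    "S": "श", "R": "ष", "s": "स", "h": "ह",
}
SIGNS = {"M": "ं", "H": "ः", "z": "ँ"}  # anusvara, visarga, chandrabindu
VIRAMA = "्"


def wx_to_devanagari(text):
    """Convert a WX string to Devanagari, character by character."""
    out = []
    pending = False  # True while the last consonant has no vowel yet
    for ch in text:
        if ch in CONSONANTS:
            if pending:
                out.append(VIRAMA)  # consonant cluster: suppress inherent vowel
            out.append(CONSONANTS[ch])
            pending = True
        elif ch in VOWELS:
            independent, sign = VOWELS[ch]
            # After a consonant use the dependent sign ("a" adds nothing).
            out.append(sign if pending else independent)
            pending = False
        else:
            if pending:
                out.append(VIRAMA)  # bare consonant before a non-letter
            out.append(SIGNS.get(ch, ch))  # unknown characters pass through
            pending = False
    if pending:
        out.append(VIRAMA)
    return "".join(out)


print(wx_to_devanagari("Amila kI betI"))
```

WX writes the inherent vowel explicitly, so "Amila" becomes आमिल and a consonant with no following vowel receives a virama. Because characters outside the table pass through unchanged, any gap in the mapping shows up as stray ASCII in the output.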

You can find the incomplete language pair in SVN:

https://apertium.svn.sourceforge.net/svnroot/apertium/nursery/apertium-ur-hi

The analyser is partially converted (by me) here:

https://apertium.svn.sourceforge.net/svnroot/apertium/nursery/apertium-ur-hi/apertium-ur-hi.hi.dix
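
For the tagset half of the conversion, a renaming pass could be sketched in Python as below. The source tag names on the left of TAG_MAP are hypothetical placeholders, not the real IIIT tagset; only the target tags on the right are taken from the analyses shown in this message. The function name standardise_tags is also an assumption.

```python
import re

# Hypothetical source tags -> Apertium-style tags used in this language pair.
TAG_MAP = {
    "noun": "n",
    "propn": "np",
    "anthroponym": "ant",
    "psp": "post",
    "masc": "m",
    "fem": "f",
    "sing": "sg",
    "plur": "pl",
    "direct": "nom",
    "oblique": "obl",
    "vocative": "voc",
}

TAG_RE = re.compile(r"<([^<>]+)>")


def standardise_tags(analysis, tag_map=TAG_MAP):
    """Rename every <tag> in an analysis string.

    Tags that are already in the target set are left alone; anything
    else raises KeyError so that unmapped tags are not silently kept.
    """
    targets = set(tag_map.values())

    def rename(match):
        tag = match.group(1)
        if tag in targets:
            return match.group(0)
        if tag not in tag_map:
            raise KeyError(f"no Apertium equivalent listed for <{tag}>")
        return f"<{tag_map[tag]}>"

    return TAG_RE.sub(rename, analysis)


print(standardise_tags("बेटी<noun><fem><sing><oblique>"))
```

Failing loudly on an unknown tag is a deliberate choice in this sketch: it produces a list of tags that still need a decision instead of a dictionary with mixed conventions.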

Analyse Urdu:

$ echo "عامل کی بیٹی" | lt-proc ur-hi.automorf.bin 
^عامل/عامل<np><ant><m><sg><nom>$ ^کی/کا<post><f><sg><nom>$ ^بیٹی/بیٹی<n><f><sg><nom>/بیٹی<n><f><sg><obl>/بیٹی<n><f><sg><voc>$

Analyse Hindi:

$ echo "आमिल की बेटी" | lt-proc hi-ur.automorf.bin
^आमिल/आमिल<np><ant><m><sg><nom>$ ^की/का<post><f><sg><nom>/का<post><f><sg><obl>/का<post><f><pl><nom>/का<post><f><pl><obl>$ ^बेटी/बेटी<n><f><sg><nom>/बेटी<n><f><sg><obl>$
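
The analyser output above is in the Apertium stream format: each lexical unit sits between ^ and $, and the surface form and its analyses are separated by /. A small Python sketch for inspecting such output follows; parse_units is a hypothetical helper name, and the sketch ignores escaped characters and superblanks, which the full format allows.

```python
import re

UNIT_RE = re.compile(r"\^(.*?)\$")   # one lexical unit: ^ ... $
TAG_RE = re.compile(r"<([^<>]+)>")   # one tag: <...>


def parse_units(stream):
    """Return [(surface, [(lemma, [tag, ...]), ...]), ...] for lt-proc output."""
    units = []
    for body in UNIT_RE.findall(stream):
        surface, *analyses = body.split("/")
        parsed = []
        for analysis in analyses:
            lemma = analysis.split("<", 1)[0]  # text before the first tag
            parsed.append((lemma, TAG_RE.findall(analysis)))
        units.append((surface, parsed))
    return units


sample = "^की/का<post><f><sg><nom>/का<post><f><sg><obl>$ ^बेटी/बेटी<n><f><sg><nom>$"
for surface, analyses in parse_units(sample):
    print(surface, len(analyses))
```

Counting analyses per unit this way gives a quick view of how much ambiguity the tagger has to resolve in the next step.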

Tag Urdu:

$ echo "عامل کی بیٹی" | lt-proc ur-hi.automorf.bin | apertium-tagger -g ur-hi.prob 
^عامل<np><ant><m><sg><nom>$ ^کا<post><f><sg><nom>$ ^بیٹی<n><f><sg><nom>$

Transfer Urdu->Hindi:

$ echo "عامل کی بیٹی" | lt-proc ur-hi.automorf.bin | apertium-tagger -g ur-hi.prob  | apertium-transfer apertium-ur-hi.ur-hi.t1x ur-hi.t1x.bin ur-hi.autobil.bin
^आमिल<np><ant><m><sg><nom>$ ^का<post><f><sg><nom>$ ^बेटी<n><f><sg><nom>$

Generate Hindi:

$ echo "عامل کی بیٹی" | lt-proc ur-hi.automorf.bin | apertium-tagger -g ur-hi.prob  | apertium-transfer apertium-ur-hi.ur-hi.t1x ur-hi.t1x.bin ur-hi.autobil.bin | lt-proc -g ur-hi.autogen.bin
आमिल की बेटी
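
The four stages above can also be chained from a script, which makes it easier to see the output of each step while converting the analyser. The Python sketch below is only an illustration: run_stages and UR_HI_STAGES are hypothetical names, the command lines are copied from this message, and it assumes the Apertium binaries are on the PATH and the compiled files are in the working directory.

```python
import subprocess

# Command lines copied from the steps above (Urdu -> Hindi).
UR_HI_STAGES = [
    ["lt-proc", "ur-hi.automorf.bin"],
    ["apertium-tagger", "-g", "ur-hi.prob"],
    ["apertium-transfer", "apertium-ur-hi.ur-hi.t1x",
     "ur-hi.t1x.bin", "ur-hi.autobil.bin"],
    ["lt-proc", "-g", "ur-hi.autogen.bin"],
]


def run_stages(text, stages, cwd=None):
    """Feed text through each command in turn; return every stage's output."""
    data = text.encode("utf-8")
    outputs = []
    for argv in stages:
        done = subprocess.run(
            argv, input=data, stdout=subprocess.PIPE, cwd=cwd, check=True
        )
        data = done.stdout  # this stage's output is the next stage's input
        outputs.append(data.decode("utf-8"))
    return outputs


if __name__ == "__main__":
    import sys

    # Example: python pipeline.py "<Urdu text>"; echo would add the newline.
    for argv, out in zip(UR_HI_STAGES, run_stages(sys.argv[1] + "\n", UR_HI_STAGES)):
        print(argv[0], "->", out.strip())
```

With check=True a failing stage raises CalledProcessError instead of passing empty output down the chain, which helps locate the stage that broke.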

Best regards,

Fran