2009-04-17 : Testing and fixing of the tagger component (Viterbi) is complete. The Trigram tagger is now fully functional: it can be trained in a supervised or unsupervised manner, and it can do PoS tagging using the generated .prob file. The next step is to measure the accuracy of the tagger and find hidden bugs.
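For readers unfamiliar with Viterbi decoding over a second-order (trigram) HMM, here is a minimal Python sketch. It is not the Apertium code: the function name `viterbi_trigram`, the dictionary layout of `trans` and `emit`, the `<s>` start symbol and the toy probabilities are all assumptions made for illustration. The real tagger reads its parameters from the .prob file and works with ambiguity classes.

```python
import math


def viterbi_trigram(words, tags, trans, emit, start="<s>"):
    """Most likely tag sequence under a second-order HMM (illustrative only).

    trans[(u, v, w)] = P(w | u, v)   -- hypothetical table layout
    emit[(w, word)]  = P(word | w)   -- hypothetical table layout
    Missing entries are treated as probability zero.
    """
    def logp(table, key):
        p = table.get(key, 0.0)
        return math.log(p) if p > 0 else float("-inf")

    if not words:
        return []

    # best[(u, v)]: best log-probability of a path whose last two tags are u, v
    best = {(start, start): 0.0}
    backptrs = []
    for word in words:
        new_best, back = {}, {}
        for (u, v), score in best.items():
            for w in tags:
                s = score + logp(trans, (u, v, w)) + logp(emit, (w, word))
                if s == float("-inf"):
                    continue  # impossible extension, skip it
                if (v, w) not in new_best or s > new_best[(v, w)]:
                    new_best[(v, w)] = s
                    back[(v, w)] = u  # remember the tag two positions back
        if not new_best:
            raise ValueError("no tag sequence has non-zero probability")
        best = new_best
        backptrs.append(back)

    # Backtrack from the best final pair of tags.
    u, v = max(best, key=best.get)
    result = [v]
    for i in range(len(words) - 1, 0, -1):
        prev_u = backptrs[i][(u, v)]
        result.append(u)
        u, v = prev_u, u
    result.reverse()
    return result


if __name__ == "__main__":
    # Toy model, invented for this example.
    tags = ["N", "V"]
    trans = {
        ("<s>", "<s>", "N"): 0.7, ("<s>", "<s>", "V"): 0.3,
        ("<s>", "N", "N"): 0.2, ("<s>", "N", "V"): 0.8,
        ("<s>", "V", "N"): 0.6, ("<s>", "V", "V"): 0.4,
    }
    emit = {
        ("N", "fish"): 0.8, ("V", "fish"): 0.2,
        ("N", "swim"): 0.1, ("V", "swim"): 0.9,
    }
    print(viterbi_trigram(["fish", "swim"], tags, trans, emit))  # ['N', 'V']
```

The state at each position is the pair of the last two tags, so each transition is conditioned on two preceding tags rather than one. Log-probabilities are used to avoid underflow on long sentences.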
2009-04-14 : Here's a link to a report on TL-based tagger training for 2nd-order HMMs. It just lists the equations relevant to the Trigram tagger. The first two equations deal with frequency counts; the rest deal with calculating the a priori likelihood of disambiguation paths.
Here's a slightly modified report on the prototype trigram tagger. I have updated section 7 (added subsection 7.2), which deals with the segmentation of text in the Baum-Welch algorithm. Please have a look.
2009-04-14 :The project report on the prototype trigram tagger can be found here: http://mnnit-lug.googlegroups.com/web/zaid_sheikh_prototype_trigram_tagger_project_report.pdf?gda=Pn_XBWsAAABn7UpRrGHIm2dacpnGDws1ICxZH_Oc90syCGVQbARulTvgRi4BVro-PW_KaOsFV6hpnuDZxId8HR9zYfEtASdSXJYkGebxovffTfckk-udLrQ7cWdlCSplJENpUd9BotY6gzhYSadRTH0PskjdO-jk&gsc=N1OhxAsAAAAZ8uI1w712Sb6qVn9uDhic
2009-04-11 : Both supervised and unsupervised training are now working in the Trigram Tagger prototype. However, I have not been able to test the Viterbi algorithm yet because of some complications in reading the .prob file. (Incidentally, supervised training for es-ca did not work out of the box: there were errors related to multiwords, and I had to change the tagged text to make it work.)
2009-04-10 : Committed code for the prototype Apertium Trigram Tagger to svn. The code compiles but probably doesn't work yet. I also uploaded the trigram tagger version of the prob2txt tool to attt.
2009-04-04 : I integrated the implementations of the Baum-Welch and supervised training methods from apertium-tagger-training-tools into the Apertium tagger (mainly the SmoothUtils class for parameter smoothing). The code is in svn.
2009-04-02 : I have submitted my GSoC 2009 proposal (Trigram Tagger) to the GSoC site.
irc nick: disismt
Zaid Md. Abdul Wahab Sheikh
Computer Science and Engineering
B.Tech 3rd year
NIT Allahabad (MNNIT)