User:David Nemeskey/GSOC progress 2013
Revision as of 09:08, 29 May 2013 by David Nemeskey (talk | contribs)
Hungarian CG grammar
Write a simple CG grammar for Hungarian, somewhere around 50-150 rules.
- Read Pasi Tapanainen's The Constraint Grammar Parser CG-2.
- Read the contents of cg_material.zip.
- Study the CG grammar of an Apertium language.
- Write a Hungarian grammar that covers the sentences in this sample file.
- The tags will be based on those in KR-code[1]. See the next task.
Hunmorph converter
Write a converter from ocamorph's output to Apertium's format.
- Again, use the sentences in this sample file as reference.
- A C-based converter would certainly be possible, but I opted for a foma-based (xfst -- lexc?) implementation so that this task also serves as practice.
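To make the goal of the conversion concrete, here is a minimal Python sketch of the mapping the converter has to perform. It is not the foma implementation planned above. It assumes that an analysis arrives as `lemma/KR-CODE` (for example `ház/NOUN<PLUR><CAS<ACC>>`) and that the target is an Apertium stream unit of the form `^surface/lemma<tag>...$`. The `TAG_MAP` table and the function name `kr_to_apertium` are hypothetical and only cover a handful of tags; derivational KR codes are not handled.

```python
import re

# Hypothetical, deliberately tiny KR -> Apertium tag table (illustration only).
# KR names that are missing here (e.g. the CAS wrapper) are simply dropped.
TAG_MAP = {
    "NOUN": "n",
    "VERB": "vblex",
    "ADJ": "adj",
    "PLUR": "pl",
    "ACC": "acc",
}


def kr_to_apertium(surface, analysis):
    """Convert one assumed 'lemma/KR-CODE' analysis to '^surface/lemma<tags>$'."""
    # Split at the first slash: everything before it is taken as the lemma.
    lemma, _, kr = analysis.partition("/")
    # Flatten the nested KR brackets by collecting the upper-case tag names
    # in the order in which they appear.
    names = re.findall(r"[A-Z][A-Z0-9_]*", kr)
    tags = [TAG_MAP[name] for name in names if name in TAG_MAP]
    return "^{}/{}{}$".format(surface, lemma, "".join("<%s>" % t for t in tags))


if __name__ == "__main__":
    print(kr_to_apertium("házakat", "ház/NOUN<PLUR><CAS<ACC>>"))
```

A table-driven mapping like this is easy to check against the sample sentences, and the same table could later be turned into replace rules for the foma version.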
ATT -> lttoolbox compiler
Write an ATT FST format reader for lttoolbox. This is useful practice for moving from foma to lttoolbox. Since lttoolbox lacks some of the functionality needed, the compiler will most likely stay in foma, but lttoolbox might work as the runtime component.
- ATT format
- "<spectie> the ATT->lttoolbox thing should be a simple as : beer = t.insertSingleTransduction(alphabet(L'e',L'e'), beer);"
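As a rough illustration of the reading side, the following Python sketch parses the ATT text format into plain data structures. It assumes the usual tab-separated layout: a transition line has the fields source state, target state, input symbol, output symbol and an optional weight, while a final-state line has a state number and an optional weight. The names `Arc` and `parse_att` are made up for this sketch; the real reader would instead feed each transition to lttoolbox, along the lines of the `insertSingleTransduction` call quoted above.

```python
from collections import namedtuple

# One transition of the transducer (hypothetical helper type for this sketch).
Arc = namedtuple("Arc", "src dst isym osym weight")


def parse_att(text):
    """Parse ATT-format text into (list of arcs, {final state: weight})."""
    arcs = []
    finals = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) >= 4:
            # Transition: src, dst, input symbol, output symbol[, weight]
            weight = float(fields[4]) if len(fields) > 4 else 0.0
            arcs.append(Arc(int(fields[0]), int(fields[1]),
                            fields[2], fields[3], weight))
        else:
            # Final state: state[, weight]
            weight = float(fields[1]) if len(fields) > 1 else 0.0
            finals[int(fields[0])] = weight
    return arcs, finals


if __name__ == "__main__":
    sample = "0\t1\tb\tb\n1\t2\te\te\n2\t3\te\te\n3\t4\tr\tr\n4\n"
    arcs, finals = parse_att(sample)
    for arc in arcs:
        print(arc)
    print("final states:", finals)
```

Symbols are passed through unchanged, because tools differ in how they spell epsilon and other special symbols; mapping them to lttoolbox's alphabet would be a separate step.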
References
- [1] András Kornai, Péter Rebrus, Péter Vajda, Péter Halácsy, András Rung, Viktor Trón. 2004. Általános célú morfológiai elemző kimeneti formalizmusa (The output formalism of a general-purpose morphological analyzer). In: Proceedings of the 2nd Hungarian Computational Linguistics Conference.