User:David Nemeskey/GSOC progress 2013
< User:David Nemeskey
Jump to navigation
Jump to search
Revision as of 08:33, 29 May 2013 by David Nemeskey (talk | contribs)
Contents
Tasks
XML format
Compiler
Miscellaneous / Extra
Hungarian CG grammar
Write a simple CG grammar for Hungarian, somewhere around 50-150 rules.
- Read Pasi Tapnainen's The Constraint Grammar Parser CG-2.
- Read the contents of cg_material.zip.
- Study the CG grammar of an Apertium language.
- Write a Hungarian grammar that covers the sentences in this sample file
- The tags will be based on those in KR-code[1]. See the next task.
Hunmorph converter
Write a converter from ocamorph's output to Apertium's format.
- Again, use the sentences in this sample file as reference.
- While a C-based converter would definitely be possible, I opted for a foma-based (xfst -- lexc?) implementation, so that this task also serves for practice.
ATT -> lttoolbox compiler
Write an ATT FST format reading for lttoolbox.
References
- ↑ András Kornai, Péter Rebrus, Péter Vajda, Péter Halácsy, András Rung, Viktor Trón. 2004. Általános célú morfológiai elemző kimeneti formalizmusa (The output formalism of a general-purpose morphological analyzer). In: Proceedings of the 2nd Hungarian Computational Linguistics Conference.