Difference between revisions of "User:David Nemeskey/GSOC progress 2013"

From Apertium
Line 15: Line 15:
 
* Study the CG grammar of an Apertium language.
 
* Write a Hungarian grammar that covers the sentences in [https://apertium.svn.sourceforge.net/svnroot/apertium/incubator/apertium-hun-eng/texts/rasskaz.hun.txt this sample file]
 
* The tags will be based on those in KR-code<ref>András Kornai, Péter Rebrus, Péter Vajda, Péter Halácsy, András Rung, Viktor Trón. 2004. Általános célú morfológiai elemző kimeneti formalizmusa (The output formalism of a general-purpose morphological analyzer). In: Proceedings of the 2nd Hungarian Computational Linguistics Conference
.</ref>. See the next task.
+
* The tags will be based on those in KR-code<ref>András Kornai, Péter Rebrus, Péter Vajda, Péter Halácsy, András Rung, Viktor Trón. 2004. Általános célú morfológiai elemző kimeneti formalizmusa (The output formalism of a general-purpose morphological analyzer). In: Proceedings of the 2nd Hungarian Computational Linguistics Conference.</ref>. See the [[#Hunmorph converter|next task]].
 
  +
==== Hunmorph converter ====
  +
  +
Write a converter from ocamorph's output to Apertium's format.
  +
  +
* Again, use the sentences in [https://apertium.svn.sourceforge.net/svnroot/apertium/incubator/apertium-hun-eng/texts/rasskaz.hun.txt this sample file] as reference.
  +
* While a C-based converter would certainly be possible, I opted for a foma-based implementation (xfst, possibly lexc), so that this task also serves as practice.
  +
  +
==== ATT -> lttoolbox compiler ====
  +
  +
Write an ATT FST format reader for lttoolbox.
  +
  +
== References ==
  +
  +
<references/>

Revision as of 08:33, 29 May 2013

Tasks

XML format

Compiler

Miscellaneous / Extra

Hungarian CG grammar

Write a simple CG grammar for Hungarian, somewhere around 50-150 rules.

  • Read Pasi Tapanainen's The Constraint Grammar Parser CG-2.
  • Read the contents of cg_material.zip.
  • Study the CG grammar of an Apertium language.
  • Write a Hungarian grammar that covers the sentences in this sample file.
  • The tags will be based on those in KR-code[1]. See the next task.

Hunmorph converter

Write a converter from ocamorph's output to Apertium's format.

  • Again, use the sentences in this sample file as reference.
  • While a C-based converter would certainly be possible, I opted for a foma-based implementation (xfst, possibly lexc), so that this task also serves as practice.
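
To make the conversion concrete, here is a minimal sketch in plain Python rather than foma, so it illustrates the mapping and not the planned implementation. It assumes that an ocamorph analysis looks like lemma/KR-code (for example ház/NOUN<PLUR><CAS<ACC>>) and that the target is the Apertium stream format ^surface/lemma<tag>...$. The tag table and the function names split_kr and to_apertium are hypothetical; the actual tag set is still to be decided.

```python
# Hypothetical KR-to-Apertium tag table; the real mapping is not fixed yet.
KR_TO_APERTIUM = {
    "NOUN": "n",
    "VERB": "vblex",
    "ADJ": "adj",
    "PLUR": "pl",
    "CAS<ACC>": "acc",
    "CAS<DAT>": "dat",
}


def split_kr(code):
    """Split 'NOUN<PLUR><CAS<ACC>>' into ['NOUN', 'PLUR', 'CAS<ACC>'].

    Only top-level angle brackets delimit features; nested ones
    (as in CAS<ACC>) stay inside the feature they belong to.
    """
    head, sep, rest = code.partition("<")
    parts = [head]
    if not sep:
        return parts
    rest = "<" + rest
    depth = 0
    start = 0
    for i, ch in enumerate(rest):
        if ch == "<":
            if depth == 0:
                start = i + 1
            depth += 1
        elif ch == ">":
            depth -= 1
            if depth == 0:
                parts.append(rest[start:i])
    return parts


def to_apertium(surface, analysis):
    """Turn one assumed ocamorph analysis into one Apertium lexical unit."""
    lemma, _, code = analysis.partition("/")
    # An unknown KR feature raises KeyError, so gaps in the table are visible.
    tags = "".join("<%s>" % KR_TO_APERTIUM[part] for part in split_kr(code))
    return "^%s/%s%s$" % (surface, lemma, tags)


if __name__ == "__main__":
    print(to_apertium("házakat", "ház/NOUN<PLUR><CAS<ACC>>"))
```

A foma-based version would express the same table as replace rules or a lexc lexicon; the Python form is used here only because it is easy to run and check against the sample sentences.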

ATT -> lttoolbox compiler

Write an ATT FST format reader for lttoolbox.
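
As a rough illustration of what such a reader has to parse, the sketch below reads the tab-separated AT&T text format in Python. It assumes the usual layout: a transition line has four or five fields (source state, target state, input symbol, output symbol, optional weight) and a final-state line has one or two fields (state, optional weight). The names Transition and read_att are hypothetical, and nothing here reflects lttoolbox's internal data structures.

```python
from collections import namedtuple

# One arc of the transducer; weight defaults to 0.0 when the file omits it.
Transition = namedtuple("Transition", "src dst isym osym weight")


def read_att(lines):
    """Parse AT&T-format lines into (transitions, finals).

    finals maps each final state number to its weight.
    """
    transitions = []
    finals = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip empty lines
        fields = line.split("\t")
        if len(fields) in (1, 2):
            # Final state, optionally followed by a weight.
            weight = float(fields[1]) if len(fields) == 2 else 0.0
            finals[int(fields[0])] = weight
        elif len(fields) in (4, 5):
            # Transition, optionally followed by a weight.
            weight = float(fields[4]) if len(fields) == 5 else 0.0
            transitions.append(
                Transition(int(fields[0]), int(fields[1]),
                           fields[2], fields[3], weight)
            )
        else:
            raise ValueError("unexpected number of fields: %r" % line)
    return transitions, finals


if __name__ == "__main__":
    sample = [
        "0\t1\tc\tc\n",
        "1\t2\ta\ta\n",
        "2\t3\tt\tt\n",
        "3\t4\t@0@\t<n>\n",  # @0@ is how some tools print epsilon
        "4\n",
    ]
    arcs, finals = read_att(sample)
    print(len(arcs), finals)
```

The sketch keeps symbols exactly as they appear in the file; a real reader would also need to decide how epsilon and multi-character symbols map onto lttoolbox's alphabet.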

References

  1. András Kornai, Péter Rebrus, Péter Vajda, Péter Halácsy, András Rung, Viktor Trón. 2004. Általános célú morfológiai elemző kimeneti formalizmusa (The output formalism of a general-purpose morphological analyzer). In: Proceedings of the 2nd Hungarian Computational Linguistics Conference.