Difference between revisions of "Ideas for Google Summer of Code/Plain-text formats for Apertium data"


Revision as of 10:34, 15 March 2013

Tasks

This task would involve:

  1. A preprocessor or compiler that avoids having to write structural transfer rules (i.e., .t1x, .t2x and .t3x files) in raw XML, which is very explicit and clear but clumsy and hard to write. Before Apertium, at interNOSTRUM.com we had a language for .t1x-style files called MorphTrans, which is described in the paper http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/download/3355/1843 . I (Mlforcada) believe this language is much easier to write; it should be upgraded and documented. The preprocessor would read .mt1, .mt2, and .mt3 files in MorphTrans-style format (with keywords in English) and generate the current XML. There would also be the opposite tool (much easier to write as an XSLT stylesheet) to generate MorphTrans-style code from current XML code. MorphTrans can of course be redesigned a bit and, in fact, it should be.
  2. The same for .dix files: two round-trip converters to and from the old interNOSTRUM-style format (http://www.sepln.org/revistaSEPLN/revista/25/25-Pag93.pdf), which is much easier to write.

Coding challenge

  • Write a parser which converts a .mode shell-script fragment into a modes.xml file.
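
As a starting point for the challenge, here is a minimal Python sketch, not a definitive implementation. It assumes that a .mode file holds a single pipeline of commands separated by "|", that data files appear as (usually quoted) arguments containing a "/", and that the target layout is modes > mode > pipeline > program > file with "name" attributes. The function name mode_to_xml, the heuristic for telling files from options, and the example paths are all hypothetical.

```python
import os
import shlex
import xml.etree.ElementTree as ET


def mode_to_xml(mode_name, mode_text):
    """Turn one .mode pipeline into a modes.xml string (simplified sketch)."""
    modes = ET.Element("modes")
    mode = ET.SubElement(modes, "mode", name=mode_name)
    pipeline = ET.SubElement(mode, "pipeline")

    # Assumption: "|" never occurs inside a quoted argument.
    for stage in mode_text.strip().split("|"):
        tokens = shlex.split(stage)  # honours shell-style quoting
        if not tokens:
            continue
        # First token is the program; drop any directory prefix.
        command = [os.path.basename(tokens[0])]
        files = []
        for token in tokens[1:]:
            if "/" in token:
                # Heuristic (assumption): arguments with a "/" are data files.
                files.append(os.path.basename(token))
            else:
                # Everything else (flags, $1, $2, ...) stays with the program.
                command.append(token)
        program = ET.SubElement(pipeline, "program", name=" ".join(command))
        for file_name in files:
            ET.SubElement(program, "file", name=file_name)

    if hasattr(ET, "indent"):  # pretty-printing is available from Python 3.9
        ET.indent(modes)
    return ET.tostring(modes, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical example pipeline; the paths are illustrative only.
    example = (
        "lt-proc '/usr/share/apertium/apertium-en-es/en-es.automorf.bin' | "
        "apertium-tagger -g $2 '/usr/share/apertium/apertium-en-es/en-es.prob'"
    )
    print(mode_to_xml("en-es", example))
```

The sketch loses the relative order of options and files within a command, so a real solution would need a more careful tokenizer and a check against actual .mode and modes.xml files.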

Frequently asked questions

See also