Difference between revisions of "Ideas for Google Summer of Code/Plain-text formats for Apertium data"

Revision as of 10:34, 15 March 2013

Tasks

This task would involve:

A preprocessor or compiler so that structural transfer rules (i.e., .t1x, .t2x and .t3x files) need not be written in raw XML, which is overt and clear but clumsy and hard to write. Before Apertium, at interNOSTRUM.com we had a language for .t1x-style files called MorphTrans, which is described in the paper http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/download/3355/1843 . I (Mlforcada) believe this language is much easier to write; it should be upgraded and documented. The preprocessor would read .mt1, .mt2 and .mt3 files in MorphTrans-style format (with keywords in English) and generate the current XML. There would also be the opposite tool (much easier to write as an XSLT stylesheet) to generate MorphTrans-style code from the current XML code. MorphTrans can of course be redesigned a bit and, in fact, it should be.
The same for .dix files: two round-trip converters, to and from the old interNOSTRUM-style format (http://www.sepln.org/revistaSEPLN/revista/25/25-Pag93.pdf), which is much easier to write.

Coding challenge

Write a parser which converts a .mode shell-script fragment into a modes.xml file.
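
As a starting point, the challenge could be approached roughly as in the Python sketch below. It is only an illustration under several assumptions that are not stated on this page: that a .mode fragment is a single shell pipeline, that no quoted argument contains a "|" character, that arguments starting with "-" or "$" are options kept in the program name, and that every other argument is a data file referenced by its base name. The function name mode_to_modes_xml and the install="yes" attribute are hypothetical choices made for the example, not a description of an existing Apertium tool.

```python
import posixpath
import shlex
import xml.etree.ElementTree as ET


def mode_to_modes_xml(mode_name, pipeline_text):
    """Turn one shell pipeline (the body of a .mode file) into modes.xml text.

    Assumptions (not taken from the Apertium sources): the pipeline has no
    "|" inside quoted arguments, options start with "-" or "$", and all
    other arguments are data files.
    """
    modes = ET.Element("modes")
    # install="yes" is an assumed default for this sketch.
    mode = ET.SubElement(modes, "mode", {"name": mode_name, "install": "yes"})
    pipeline = ET.SubElement(mode, "pipeline")

    for stage in pipeline_text.split("|"):
        tokens = shlex.split(stage)  # handles the quoting around file paths
        if not tokens:
            continue  # skip empty stages, e.g. a trailing "|"
        command = [tokens[0]]
        files = []
        for token in tokens[1:]:
            if token.startswith(("-", "$")):
                command.append(token)  # option or positional shell variable
            else:
                files.append(posixpath.basename(token))  # drop the install path
        program = ET.SubElement(pipeline, "program", {"name": " ".join(command)})
        for file_name in files:
            ET.SubElement(program, "file", {"name": file_name})

    return ET.tostring(modes, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical pipeline; the paths are made up for the example.
    example = (
        "lt-proc '/usr/share/apertium/apertium-es-ca/es-ca.automorf.bin' | "
        "apertium-tagger -g $2 '/usr/share/apertium/apertium-es-ca/es-ca.prob'"
    )
    print(mode_to_modes_xml("es-ca", example))
```

A real solution would also need to decide how to recover the mode name from the file name and how to treat shell constructs that this heuristic ignores.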

@@ Line 1: / Line 1: @@
 {{TOCD}}
 ==Tasks==
@@ Line 6: / Line 5: @@
 This task would involve:
-# A preprocessor or compiler to avoid having to write structural transfer (i.e., .t1x, .t2x and .t3x) rules in raw XML which is very overt and clear, but clumsy and hard to write. Before Apertium, in interNOSTRUM.com we had a language for .t1x-style files called MorphTrans, which is described in the paper http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/download/3355/1843 . I believe this language is much easier to write; it should be upgraded and documented. The preprocessor would read .mt1, .mt2, and .mt3 files in MorphTrans-style format (with keywords in English) and generate the current XML. There would also be the opposite tool (much easier to write as an XSLT stylesheet) to generate MorphTrans-style code from current XML code. Morphtrans can of course be redesigned a bit, and, in fact, it should.
+# A preprocessor or compiler to avoid having to write structural transfer (i.e., .t1x, .t2x and .t3x) rules in raw XML which is very overt and clear, but clumsy and hard to write. Before Apertium, in interNOSTRUM.com we had a language for .t1x-style files called MorphTrans, which is described in the paper http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/download/3355/1843 . I ([[User:Mlforcada|Mlforcada]]) believe this language is much easier to write; it should be upgraded and documented. The preprocessor would read .mt1, .mt2, and .mt3 files in MorphTrans-style format (with keywords in English) and generate the current XML. There would also be the opposite tool (much easier to write as an XSLT stylesheet) to generate MorphTrans-style code from current XML code. Morphtrans can of course be redesigned a bit, and, in fact, it should.
 # The same for .dix files. Two roundtrip converters to use the old interNOSTRUM-style format (http://www.sepln.org/revistaSEPLN/revista/25/25-Pag93.pdf), which is much easier to write.
 ==Coding challenge==
+* Write a parser which converts a <code>.mode</code> shell-script fragment into a <code>modes.xml</code> file.
 ==Frequently asked questions==
