Ideas for Google Summer of Code/Plain-text formats for Apertium data

Tasks

This task would involve:

Write a preprocessor or compiler so that structural transfer rules (i.e., .t1x, .t2x and .t3x files) no longer have to be written in raw XML, which is explicit and clear but clumsy and hard to write. Before Apertium, at interNOSTRUM.com we had a language for .t1x-style files called MorphTrans, which is described in the paper http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/download/3355/1843 . I (Mlforcada) believe this language is much easier to write; it should be upgraded and documented. The preprocessor would read .mt1, .mt2 and .mt3 files in a MorphTrans-style format (with keywords in English) and generate the current XML. There would also be the opposite tool (much easier to write, as an XSLT stylesheet) to generate MorphTrans-style code from the current XML. MorphTrans can, and in fact should, be redesigned a bit.
Do the same for .dix files: write two converters, one in each direction, so that dictionaries can round-trip between the current XML and the old interNOSTRUM-style format (http://www.sepln.org/revistaSEPLN/revista/25/25-Pag93.pdf), which is much easier to write.

Write a parser which converts a .mode shell-script fragment into a modes.xml file.
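
As a starting point for the .mode parser, here is a minimal sketch in Python. It assumes that a .mode file is a single shell pipeline, that the first word of each stage is the program, and that any later argument containing a "/" is a data file whose base name goes into a <file> element; these heuristics, and the function names split_pipeline and mode_to_xml, are illustrative assumptions and not part of Apertium. The <modes>/<mode>/<pipeline>/<program>/<file> layout follows the usual shape of modes.xml, but attributes such as install="yes" are left out.

```python
import os
import shlex
import xml.etree.ElementTree as ET


def split_pipeline(line):
    """Split a shell pipeline on '|' characters that sit outside quotes.

    Backslash escapes are not handled; this is only a sketch.
    """
    stages, current, quote = [], [], None
    for ch in line:
        if quote:
            # Inside a quoted string: only the matching quote ends it.
            if ch == quote:
                quote = None
            current.append(ch)
        elif ch in "'\"":
            quote = ch
            current.append(ch)
        elif ch == "|":
            stages.append("".join(current))
            current = []
        else:
            current.append(ch)
    stages.append("".join(current))
    return [s.strip() for s in stages if s.strip()]


def mode_to_xml(mode_name, line):
    """Build a <modes> element tree from one pipeline line."""
    modes = ET.Element("modes")
    mode = ET.SubElement(modes, "mode", name=mode_name)
    pipeline = ET.SubElement(mode, "pipeline")
    for stage in split_pipeline(line):
        tokens = shlex.split(stage)
        # Assumption: the first token is the program; later tokens
        # containing '/' are data files, the rest are options.
        words = [tokens[0]] + [t for t in tokens[1:] if "/" not in t]
        files = [t for t in tokens[1:] if "/" in t]
        program = ET.SubElement(pipeline, "program", name=" ".join(words))
        for path in files:
            ET.SubElement(program, "file", name=os.path.basename(path))
    return modes


if __name__ == "__main__":
    example = ("lt-proc '/usr/share/apertium/apertium-es-ca/es-ca.automorf.bin' | "
               "apertium-tagger -g $2 '/usr/share/apertium/apertium-es-ca/es-ca.prob'")
    print(ET.tostring(mode_to_xml("es-ca", example), encoding="unicode"))
```

A real converter would also have to cope with options that come after a file argument, since this sketch moves all options into the program name and so loses their original position.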

More or less related work:

The GCI task "Write a script to comment out entries in an XML file": the script accepts input in lt-expand format and comments out the XML entries which match it
dixtools:Enhance! which lets the user give a new word and an existing one to base it on
Emacs#dix-mode which turns lines like "a:b" into XML entries using the preceding <e> as a template (or just "a" to do an <i>)
Brendan Malloy's yaml format which converts into dix entries
Easy_dictionary_maintenance
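
To make the "a:b" idea from the dix-mode item concrete, here is a minimal sketch in Python of a plain-text-to-dix conversion. It only covers the two cases mentioned above: "a:b" becomes a <p> pair with <l> and <r>, and a bare "a" becomes an <i> element. The function name line_to_entry is hypothetical, and the sketch deliberately ignores everything a real entry usually needs (symbols in <s>, paradigm references, multiword blanks, and the template entry that dix-mode copies from), so it is an illustration of the idea and not how dix-mode itself works.

```python
import xml.etree.ElementTree as ET


def line_to_entry(line):
    """Turn 'left:right' or a bare word into a dix <e> entry string."""
    line = line.strip()
    entry = ET.Element("e")
    if ":" in line:
        # 'a:b' -> <e><p><l>a</l><r>b</r></p></e>
        left, right = line.split(":", 1)
        pair = ET.SubElement(entry, "p")
        ET.SubElement(pair, "l").text = left
        ET.SubElement(pair, "r").text = right
    else:
        # 'a' -> <e><i>a</i></e> (identical on both sides)
        ET.SubElement(entry, "i").text = line
    return ET.tostring(entry, encoding="unicode")


if __name__ == "__main__":
    for text in ["a:b", "a"]:
        print(line_to_entry(text))
```

Building the entries with an XML library, instead of pasting strings together, means that characters such as "&" or "<" in the plain-text side are escaped automatically.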