Ideas for Google Summer of Code/Plain-text formats for Apertium data

Revision as of 09:39, 13 March 2013 by Francis Tyers

This task would involve:

  1. A preprocessor or compiler so that structural transfer rules (i.e., .t1x, .t2x and .t3x files) no longer have to be written in raw XML, which is explicit and clear but clumsy and hard to write. Before Apertium, at interNOSTRUM.com we had a language for .t1x-style files called MorphTrans, which is described in the paper http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/download/3355/1843 . I believe this language is much easier to write; it should be upgraded and documented. The preprocessor would read .mt1, .mt2 and .mt3 files in a MorphTrans-style format (with keywords in English) and generate the current XML. There would also be a tool for the opposite direction, generating MorphTrans-style code from the current XML; this one is much easier to write, for example as an XSLT stylesheet. MorphTrans can of course be redesigned a bit, and in fact it should be.
  2. The same for .dix files: two converters, one in each direction, so that dictionaries can make a round trip between the .dix XML and the old interNOSTRUM-style format (http://www.sepln.org/revistaSEPLN/revista/25/25-Pag93.pdf), which is much easier to write.
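
To make the round-trip idea in item 2 concrete, here is a minimal sketch in Python. It does not implement the interNOSTRUM format from the paper above. It assumes a purely hypothetical one-line-per-entry syntax, surface:lemma<tag1><tag2>, and converts it to and from a simple .dix-style <e><p><l>…</l><r>…<s n="…"/></r></p></e> entry. The function names line_to_entry and entry_to_line are invented for illustration, and paradigms, multiword blanks and direction restrictions are all left out.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical plain-text syntax (an assumption, not the interNOSTRUM format):
#   surface:lemma<tag1><tag2>
TAG_RE = re.compile(r"<([^<>]+)>")


def line_to_entry(line):
    """Build a .dix-style <e> element from one plain-text line."""
    surface, _, analysis = line.strip().partition(":")
    lemma = TAG_RE.sub("", analysis)   # text left once the tags are removed
    tags = TAG_RE.findall(analysis)    # tag names, in order
    entry = ET.Element("e")
    pair = ET.SubElement(entry, "p")
    left = ET.SubElement(pair, "l")
    left.text = surface
    right = ET.SubElement(pair, "r")
    right.text = lemma
    for tag in tags:
        ET.SubElement(right, "s", {"n": tag})
    return entry


def entry_to_line(entry):
    """Inverse direction: turn a simple <e> element back into one line."""
    left = entry.find("p/l")
    right = entry.find("p/r")
    tags = "".join("<%s>" % s.get("n") for s in right.findall("s"))
    return "%s:%s%s" % (left.text or "", right.text or "", tags)


if __name__ == "__main__":
    entry = line_to_entry("casa:casa<n><f>")
    print(ET.tostring(entry, encoding="unicode"))
    print(entry_to_line(entry))
```

A real pair of converters would also have to cover everything else a .dix file can contain (paradigm definitions and references, sections, restrictions), and should be checked by converting existing dictionaries in both directions and comparing the result with the original.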