Ideas for Google Summer of Code/Plain-text formats for Apertium data

From Apertium


This task would involve:

  1. A preprocessor or compiler to avoid having to write structural transfer rules (i.e., .t1x, .t2x and .t3x files) in raw XML, which is very overt and clear but clumsy and hard to write. Before Apertium, we had a language for .t1x-style files called MorphTrans, which is described in a paper. I (Mlforcada) believe this language is much easier to write; it should be upgraded and documented. The preprocessor would read .mt1, .mt2 and .mt3 files in a MorphTrans-style format (with keywords in English) and generate the current XML. There would also be the opposite tool (much easier to write, as an XSLT stylesheet) to generate MorphTrans-style code from the current XML. MorphTrans can of course be redesigned a bit and, in fact, it should be.
  2. The same for .dix files: two round-trip converters to and from the old interNOSTRUM-style format, which is much easier to write.
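
As a rough illustration of what a round-trip converter for .dix entries might look like, here is a minimal sketch in Python. The one-entry-per-line syntax it reads ("left:right" for a pair, a bare word for an identity entry) is a hypothetical stand-in chosen for the example; it is not the interNOSTRUM format, and the function names `line_to_entry` and `entry_to_line` are made up here. Real .dix entries also carry symbols and paradigm references, which this sketch ignores.

```python
import xml.etree.ElementTree as ET


def line_to_entry(line):
    """Build a dix-style <e> element from one plain-text line.

    Hypothetical syntax: 'left:right' becomes <e><p><l>..</l><r>..</r></p></e>,
    and a bare 'word' becomes <e><i>word</i></e>.
    """
    entry = ET.Element("e")
    left, sep, right = line.strip().partition(":")
    if sep:
        pair = ET.SubElement(entry, "p")
        ET.SubElement(pair, "l").text = left
        ET.SubElement(pair, "r").text = right
    else:
        ET.SubElement(entry, "i").text = left
    return entry


def entry_to_line(entry):
    """Inverse of line_to_entry: turn an <e> element back into plain text."""
    identity = entry.find("i")
    if identity is not None:
        return identity.text or ""
    return "{}:{}".format(entry.findtext("p/l", ""), entry.findtext("p/r", ""))


if __name__ == "__main__":
    for text in ["house:casa", "Madrid"]:
        element = line_to_entry(text)
        # Print the XML form and check that converting back gives the input.
        print(ET.tostring(element, encoding="unicode"), entry_to_line(element) == text)
```

Keeping both directions in one small module makes it easy to test that converting a line to XML and back returns the original line, which is the property a round-trip converter needs.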

Coding challenge

  • Write a parser which converts a .mode shell-script fragment into a modes.xml file.
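
One possible starting point for the challenge, sketched in Python under several assumptions: the .mode fragment is a single shell pipeline whose stages are separated by "|"; any argument containing a "/" is a data file; and the target layout is modes/mode/pipeline/program/file with `name` attributes and `install="yes"` on the mode. The function name `mode_to_xml` and the sample paths are made up for the example, and splitting on "|" is a simplification that would break on a "|" inside a quoted argument.

```python
import posixpath
import shlex
import xml.etree.ElementTree as ET


def mode_to_xml(mode_name, fragment):
    """Convert one shell pipeline from a .mode file into a <modes> element."""
    modes = ET.Element("modes")
    mode = ET.SubElement(modes, "mode", {"name": mode_name, "install": "yes"})
    pipeline = ET.SubElement(mode, "pipeline")

    # Simplification: assumes no "|" occurs inside quoted arguments.
    for command in fragment.split("|"):
        words = shlex.split(command)
        if not words:
            continue
        # Drop any directory prefix such as $APERTIUM_PATH/ from the program.
        executable = posixpath.basename(words[0])
        # Assumption: arguments containing "/" are data files; the rest
        # (flags, $1, $2, ...) stay in the program name.
        files = [w for w in words[1:] if "/" in w]
        options = [w for w in words[1:] if "/" not in w]
        program = ET.SubElement(
            pipeline, "program", {"name": " ".join([executable] + options)}
        )
        for path in files:
            ET.SubElement(program, "file", {"name": posixpath.basename(path)})
    return modes


if __name__ == "__main__":
    sample = (
        "$APERTIUM_PATH/lt-proc /usr/share/apertium/en-es.automorf.bin | "
        "$APERTIUM_PATH/apertium-tagger -g $2 /usr/share/apertium/en-es.prob"
    )
    print(ET.tostring(mode_to_xml("en-es", sample), encoding="unicode"))
```

A complete solution would also need to handle several modes per file, quoting, and whatever attributes the real modes.xml schema requires; those are left out here.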

Frequently asked questions

  • none yet, ask us something! :)

See also

More or less related work:

  • The GCI task "Write a script to comment out entries in an XML file" which accepts input in lt-expand format, and comments out XML entries which match
  • dixtools:Enhance! which lets the user give a new word and an existing one to base it on
  • Emacs#dix-mode which turns lines like "a:b" into XML entries using the preceding <e> as a template (or just "a" to do an <i>)
  • Brendan Malloy's YAML format, which converts into dix entries
  • Easy_dictionary_maintenance