Ideas for Google Summer of Code/Apertium TIPP

From Apertium
Revision as of 19:49, 24 March 2020 by Popcorndude (talk | contribs) (categorize)

Writing a TIPP interface for Apertium

TIPP, the TMS Interoperability Protocol Package [1] (TMS stands for translation management system), specifies a container (package format) for interchanging information along a translation value chain. It is currently at version 1.5 and is being upgraded to 2.0. There are several container varieties for different tasks; one of them, Translate-Strict-Bitext, represents a bilingual translation job. In this variety, the 'request' TIPP contains an XLIFF:doc file ([2], a subset of XLIFF 1 [3]) with the document to be translated and its metadata. The corresponding 'response' TIPP would contain the result of applying Apertium MT to that document, taking into account the translation memory provided in the TIPP, if any (using Apertium's -m switch). Apertium should be given the capacity to manage TIPP packages: unpack the request package, parse it, process it, and repack it as a response. A demonstration web service would be a cool wrap-up of this project.
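The unpack, process, repack cycle could be sketched roughly as follows. This is only an illustration under stated assumptions: it treats the TIPP as a ZIP archive holding a manifest.xml plus an inner payload archive, and the payload name (PAYLOAD_NAME), the function names and the handle_xliff callback are all hypothetical and should be checked against the TIPP specification [1]. A real response package would also need an updated manifest, which this sketch copies through unchanged.

```python
import io
import zipfile

# Assumption: the payload is a nested ZIP with this name. Check the TIPP
# specification for the actual package layout before relying on it.
PAYLOAD_NAME = "resources.zip"


def _process_payload(payload_bytes, handle_xliff):
    """Copy the inner archive, passing every XLIFF file through handle_xliff."""
    out_buf = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(payload_bytes)) as inner, \
            zipfile.ZipFile(out_buf, "w", zipfile.ZIP_DEFLATED) as result:
        for name in inner.namelist():
            data = inner.read(name)
            if name.lower().endswith((".xlf", ".xliff")):
                data = handle_xliff(data)  # bytes in, bytes out
            result.writestr(name, data)
    return out_buf.getvalue()


def process_package(request_bytes, handle_xliff):
    """Unpack a request package, process its payload, and repack it.

    All members other than the payload (including the manifest) are copied
    as they are; turning the manifest into a proper response is left out.
    """
    out_buf = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(request_bytes)) as outer, \
            zipfile.ZipFile(out_buf, "w", zipfile.ZIP_DEFLATED) as response:
        for name in outer.namelist():
            data = outer.read(name)
            if name == PAYLOAD_NAME:
                data = _process_payload(data, handle_xliff)
            response.writestr(name, data)
    return out_buf.getvalue()
```

Here handle_xliff would be the place where the XLIFF content is handed to Apertium and the translated file is returned; any translation memory found in the payload could be passed on through Apertium's -m switch.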

Coding challenge

  • Write a simple, compliant XLIFF:doc (or XLIFF 1.2) file that validates against its specification and has at least four plain text segments (sentences) without any formatting. If needed, define the subset of XLIFF:doc (or XLIFF 1.2) that you will be supporting.
  • Write a script (or an XSLT file, etc.) that converts this file to one of the formats that Apertium already handles, or write a de-formatter for XLIFF:doc / XLIFF 1.2.
  • Write the corresponding script that converts the Apertium output back into a valid XLIFF:doc (or XLIFF 1.2) file.
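As a starting point for the challenge, the round trip could look something like the sketch below. It assumes plain XLIFF 1.2 (not XLIFF:doc), segments with no inline markup and no seg-source element, and it has not been validated against the XLIFF schema. The sample document, the function names and the translate callback are all made up for illustration; the stand-in translator used in the usage note simply upper-cases the text.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
ET.register_namespace("", XLIFF_NS)  # serialise with a default namespace

# Hypothetical four-segment sample, plain text only.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="sample.txt" source-language="en" target-language="es" datatype="plaintext">
    <body>
      <trans-unit id="1"><source>This is a test.</source></trans-unit>
      <trans-unit id="2"><source>The house is big.</source></trans-unit>
      <trans-unit id="3"><source>I have two cats.</source></trans-unit>
      <trans-unit id="4"><source>She reads a book.</source></trans-unit>
    </body>
  </file>
</xliff>
"""


def q(tag):
    """Return the namespace-qualified name of an XLIFF 1.2 element."""
    return f"{{{XLIFF_NS}}}{tag}"


def extract_sources(root):
    """Return (id, source text) pairs for every trans-unit."""
    pairs = []
    for unit in root.iter(q("trans-unit")):
        source = unit.find(q("source"))
        if source is not None:
            pairs.append((unit.get("id"), "".join(source.itertext())))
    return pairs


def insert_targets(root, translations):
    """Write each translation into a <target> placed right after <source>."""
    for unit in root.iter(q("trans-unit")):
        text = translations.get(unit.get("id"))
        source = unit.find(q("source"))
        if text is None or source is None:
            continue
        target = unit.find(q("target"))
        if target is None:
            target = ET.Element(q("target"))
            unit.insert(list(unit).index(source) + 1, target)
        target.text = text


def translate_xliff(xml_text, translate):
    """Translate every segment with translate(str) -> str and return new XML."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    translations = {uid: translate(text) for uid, text in extract_sources(root)}
    insert_targets(root, translations)
    return ET.tostring(root, encoding="unicode")
```

Calling translate_xliff(SAMPLE, str.upper) returns the same document with a target element added to each trans-unit. In a real solution the callback would send the text through Apertium instead, either segment by segment or in one batch using a de-formatter.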

Important note

A group of people including Mikel L. Forcada is currently working on a new version of TIPP, labeled TIPP 2.0. It will support XLIFF 2.x and will have a slightly different container format. TIPP 2.0 may be released in time for this project to support it in some way, which would give us a reference implementation.