Matxin 1.0 New Language Pair HOWTO
This page intends to give a step-by-step walk-through of how to create a new translator in the Matxin platform.
Prerequisites
- Main article: Matxin
This page does not give instructions on installing Matxin, but presumes that the following packages are correctly installed.
Overview
As mentioned in the lead, this page intends to give a step-by-step guide to creating a new language pair with Matxin from scratch. No programming knowledge is required, all that needs to be defined are some dictionaries and grammars. The Matxin platform is described in detail in Documentation of Matxin and on the Matxin homepage. This page will only focus on the creation of a new language pair, and will avoid theoretical and methodological issues.
The language pair for the tutorial will be Breton to English. This has been chosen as the two languages have fairy divergent word order (Breton is fairly free, allowing VSO, OVS and SVO, where English is fairly uniformly SVO) which can show some of the advantage which Matxin has over Apertium.
Getting started
Analysis
The analysis process in Matxin is done by Freeling, an free / open-source suite of language analysers. The analysis is done in four stages, requiring four (or more) separate files. The first is the morphological dictionary, which is basically a full-form list (e.g. Speling format) compiled into a BerkeleyDB format. There are then files for word-category disambiguation and for specifying chunking and dependency rules.