Matxin 1.0 New Language Pair HOWTO

This page intends to give a step-by-step walk-through of how to create a new translator in the Matxin platform.

Prerequisites

Main article: Matxin

This page does not give instructions on installing Matxin, but presumes that the following packages are correctly installed.

  • lttoolbox (from SVN)
  • Freeling (from SVN)
  • Matxin (from SVN)
  • a text editor (or a specialised XML editor if you prefer)

Overview

As mentioned in the lead, this page gives a step-by-step guide to creating a new language pair with Matxin from scratch. No programming knowledge is required; all that needs to be defined are some dictionaries and grammars. The Matxin platform is described in detail in Documentation of Matxin and on the Matxin homepage. This page focuses only on the creation of a new language pair, and avoids theoretical and methodological issues.

The language pair for the tutorial will be Breton to English. This pair has been chosen because the two languages have fairly divergent word order (Breton is fairly free, allowing VSO, OVS and SVO, whereas English is fairly uniformly SVO), which can show some of the advantages that Matxin has over Apertium.

Getting started

Analysis

The analysis process in Matxin is done by Freeling, a free/open-source suite of language analysers. The analysis is done in four stages, requiring four (or more) separate files. The first is the morphological dictionary, which is essentially a full-form list (e.g. in Speling format) compiled into BerkeleyDB format. The remaining files handle word-category disambiguation and specify the chunking and dependency rules.
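To make the idea of a full-form list concrete, the following is a minimal Python sketch that groups Speling-style entries by surface form. It assumes each input line has the layout "lemma; surface form; tags; part of speech". The function names, the sample tags, and the one-line-per-form output layout are illustrative assumptions, not the format Freeling necessarily expects, and the BerkeleyDB compilation step is not shown.

```python
from collections import defaultdict


def speling_to_fullform(lines):
    """Group Speling-style entries by surface form.

    Each line is assumed to look like 'lemma; form; tags; pos'.
    Returns a dict mapping each surface form to a list of (lemma, tag) pairs.
    """
    analyses = defaultdict(list)
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = [field.strip() for field in line.split(";")]
        if len(fields) != 4:
            raise ValueError(f"expected 4 fields, got {len(fields)}: {raw!r}")
        lemma, form, tags, pos = fields
        # Hypothetical tag scheme: part of speech, a dot, then the tags.
        analyses[form].append((lemma, f"{pos}.{tags}"))
    return dict(analyses)


def format_entries(analyses):
    """Render one line per surface form: 'form lemma1 tag1 lemma2 tag2 ...'."""
    entries = []
    for form in sorted(analyses):
        pairs = " ".join(f"{lemma} {tag}" for lemma, tag in analyses[form])
        entries.append(f"{form} {pairs}")
    return entries


# Illustrative Breton nouns; the tags are made up for this example.
sample = [
    "ti; ti; sg; n",
    "ti; tiez; pl; n",
    "kazh; kazh; sg; n",
    "kazh; kizhier; pl; n",
]

for entry in format_entries(speling_to_fullform(sample)):
    print(entry)
```

Keying the list by surface form rather than by lemma reflects what a full-form dictionary is for: the analyser looks up a word exactly as it appears in the text and gets back every lemma and tag it could correspond to.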

Morphological

Category disambiguation

Chunking

Dependency parsing