Difference between revisions of "Matxin 1.0 New Language Pair HOWTO"

From Apertium

Revision as of 21:30, 1 June 2009

This page intends to give a step-by-step walk-through of how to create a new translator in the Matxin platform.

Prerequisites

Main article: Matxin

This page does not give instructions on installing Matxin, but presumes that the following packages are correctly installed.

Overview

As mentioned in the lead, this page intends to give a step-by-step guide to creating a new language pair with Matxin from scratch. No programming knowledge is required; all that needs to be defined are some dictionaries and grammars. The Matxin platform is described in detail in Documentation of Matxin and on the Matxin homepage. This page focuses only on the creation of a new language pair and avoids theoretical and methodological issues.

The language pair for the tutorial will be Breton to English. This pair has been chosen because the two languages have fairly divergent word order (Breton is fairly free, allowing VSO, OVS and SVO, whereas English is fairly uniformly SVO), which can show some of the advantages that Matxin has over Apertium.

Analysis

The analysis process in Matxin is done by Freeling, a free/open-source suite of language analysers. The analysis is done in four stages, requiring four (or more) separate files. The first is the morphological dictionary, which is basically a full-form list (e.g. in Speling format) compiled into a BerkeleyDB format. There are then files for word-category disambiguation and for specifying chunking and dependency rules.
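
To make the idea of a full-form list concrete, the following is a minimal sketch of how such a list can be turned into a lookup table keyed by surface form. It is an illustration only: the four-field, semicolon-separated layout, the sample entries and the tag names are assumptions made for this example, not the exact Speling or Freeling formats, and a plain in-memory dictionary stands in for the compiled BerkeleyDB file.

```python
from collections import defaultdict

# Hypothetical full-form entries, one per line, in an assumed layout:
#   lemma; surface form; morphological tags; part of speech
# The entries and tag names are made up for illustration.
SAMPLE = """\
cat; cat; sg; n
cat; cats; pl; n
book; book; sg; n
book; book; inf; vblex
"""


def parse_full_forms(text):
    """Group analyses by surface form: {surface: [(lemma, tag), ...]}."""
    index = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = [field.strip() for field in line.split(";")]
        if len(fields) != 4:
            raise ValueError(f"expected 4 fields, got {len(fields)}: {raw!r}")
        lemma, surface, tags, pos = fields
        index[surface].append((lemma, f"{pos}.{tags}"))
    return dict(index)


def to_lookup_lines(index):
    """Render one line per surface form, listing every (lemma, tag) pair."""
    lines = []
    for surface in sorted(index):
        parts = [surface]
        for lemma, tag in index[surface]:
            parts.extend([lemma, tag])
        lines.append(" ".join(parts))
    return lines


if __name__ == "__main__":
    for line in to_lookup_lines(parse_full_forms(SAMPLE)):
        print(line)
```

Note that the ambiguous form "book" ends up with two analyses under a single key. That ambiguity is what the word-category disambiguation stage has to resolve.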

Morphological

Category disambiguation

Chunking

Dependency parsing