Difference between revisions of "Matxin New Language Pair HOWTO"

From Apertium
Jump to navigation Jump to search
Line 1: Line 1:
  +
{{TOCD}}
  +
 
This page describes the process of creating a new language pair with [[Matxin]], a dependency-based machine translation system.
 
This page describes the process of creating a new language pair with [[Matxin]], a dependency-based machine translation system.
   
 
==Analysis==
 
==Analysis==
   
  +
There are a number of ways analysis can be done in Matxin, the [[matxin-spa-eus|Spanish to Basque]] system uses [[FreeLing]], while the [[matxin-eng-eus|English to Basqu]] system uses a wrapper around the Stanford parser. In this tutorial we're going to be using [[Constraint Grammar]] to do dependency parsing of pre-disambiguated sentences. Writing a morphological analyser and morphological disambiguator is out of the scope of this HOWTO, but for more information, check out the following pages:
lttoolbox | hfst
 
  +
  +
* [[Starting a new language with lttoolbox]]
  +
* [[Starting a new language with HFST]]
  +
* [[Constraint Grammar]]
   
  +
So, let's assume that you've been through those tutorials and have a morphological analyser capable of analysing and disambiguating sentences in Turkish. You'll give it a sentence like "Dün benim için aldığın birayı içeceğim." and get some output like:
-> CG
 
   
  +
<pre>
  +
^Dün/dün<adv>$ ^benim/ben<prn><pers><p1><sg><gen>$ ^için/için<post>$ ^aldığın/al<v><tv><gpr_past><px2sg>$
  +
^birayı/bira<n><acc>$ ^içeceğim/iç<v><tv><fut><p1><sg>$^./.<sent>$
  +
</pre>
   
 
==Transfer==
 
==Transfer==

Revision as of 14:00, 12 May 2016

This page describes the process of creating a new language pair with Matxin, a dependency-based machine translation system.

Analysis

There are a number of ways analysis can be done in Matxin, the Spanish to Basque system uses FreeLing, while the English to Basqu system uses a wrapper around the Stanford parser. In this tutorial we're going to be using Constraint Grammar to do dependency parsing of pre-disambiguated sentences. Writing a morphological analyser and morphological disambiguator is out of the scope of this HOWTO, but for more information, check out the following pages:

So, let's assume that you've been through those tutorials and have a morphological analyser capable of analysing and disambiguating sentences in Turkish. You'll give it a sentence like "Dün benim için aldığın birayı içeceğim." and get some output like:

^Dün/dün<adv>$ ^benim/ben<prn><pers><p1><sg><gen>$ ^için/için<post>$ ^aldığın/al<v><tv><gpr_past><px2sg>$ 
^birayı/bira<n><acc>$ ^içeceğim/iç<v><tv><fut><p1><sg>$^./.<sent>$

Transfer

lttoolbox matxin

Generation

lttoolbox | hfst

See also