Difference between revisions of "Matxin New Language Pair HOWTO"

{{TOCD}}

This page describes the process of creating a new language pair with [[Matxin]], a dependency-based machine translation system.


==Analysis==


There are a number of ways to do analysis in Matxin: the [[matxin-spa-eus|Spanish to Basque]] system uses [[FreeLing]], while the [[matxin-eng-eus|English to Basque]] system uses a wrapper around the Stanford parser. In this tutorial we're going to use [[Constraint Grammar]] to do dependency parsing of pre-disambiguated sentences. Writing a morphological analyser and morphological disambiguator is outside the scope of this HOWTO, but for more information, check out the following pages:
lttoolbox | hfst

* [[Starting a new language with lttoolbox]]
* [[Starting a new language with HFST]]
* [[Constraint Grammar]]


So, let's assume that you've been through those tutorials and have a morphological analyser capable of analysing and disambiguating sentences in Turkish. You'll give it a sentence like "Dün benim için aldığın birayı içeceğim." and get some output like:
-> CG


<pre>
^Dün/dün<adv>$ ^benim/ben<prn><pers><p1><sg><gen>$ ^için/için<post>$ ^aldığın/al<v><tv><gpr_past><px2sg>$
^birayı/bira<n><acc>$ ^içeceğim/iç<v><tv><fut><p1><sg>$^./.<sent>$
</pre>
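
To check that the analyser output is in the shape you expect before moving on, it can help to split the stream into lexical units. The following is only a minimal sketch in Python, not part of Matxin or Apertium: the function name <code>parse_stream</code> is hypothetical, and it assumes fully disambiguated input (one analysis per unit) with no escaped characters and no multiword or compound analyses.

```python
import re

# One lexical unit looks like ^surface/lemma<tag1><tag2>...$
# Assumption: exactly one analysis per unit and no backslash-escaped characters.
LU_PATTERN = re.compile(r"\^([^/$]*)/([^<$]*)((?:<[^>]+>)*)\$")
TAG_PATTERN = re.compile(r"<([^>]+)>")


def parse_stream(text):
    """Return a list of (surface, lemma, tags) tuples found in the stream."""
    units = []
    for surface, lemma, tags in LU_PATTERN.findall(text):
        units.append((surface, lemma, TAG_PATTERN.findall(tags)))
    return units


# The example sentence from this page, joined into one string.
sentence = (
    "^Dün/dün<adv>$ ^benim/ben<prn><pers><p1><sg><gen>$ ^için/için<post>$ "
    "^aldığın/al<v><tv><gpr_past><px2sg>$ ^birayı/bira<n><acc>$ "
    "^içeceğim/iç<v><tv><fut><p1><sg>$^./.<sent>$"
)

for surface, lemma, tags in parse_stream(sentence):
    print(surface, lemma, tags)
```

Each tuple holds the surface form, the lemma and the list of tags, which is the information a dependency grammar would work from. A real pipeline would need to handle escaping and ambiguous units, which this sketch deliberately ignores.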


==Transfer==

Revision as of 14:00, 12 May 2016

This page describes the process of creating a new language pair with Matxin, a dependency-based machine translation system.

Analysis

There are a number of ways to do analysis in Matxin: the Spanish to Basque system uses FreeLing, while the English to Basque system uses a wrapper around the Stanford parser. In this tutorial we're going to use Constraint Grammar to do dependency parsing of pre-disambiguated sentences. Writing a morphological analyser and morphological disambiguator is outside the scope of this HOWTO, but for more information, check out the following pages:

So, let's assume that you've been through those tutorials and have a morphological analyser capable of analysing and disambiguating sentences in Turkish. You'll give it a sentence like "Dün benim için aldığın birayı içeceğim." and get some output like:

^Dün/dün<adv>$ ^benim/ben<prn><pers><p1><sg><gen>$ ^için/için<post>$ ^aldığın/al<v><tv><gpr_past><px2sg>$ 
^birayı/bira<n><acc>$ ^içeceğim/iç<v><tv><fut><p1><sg>$^./.<sent>$

Transfer

lttoolbox matxin

Generation

lttoolbox | hfst

See also