Difference between revisions of "Matxin New Language Pair HOWTO"
{{TOCD}}

This page describes the process of creating a new language pair with [[Matxin]], a dependency-based machine translation system.

==Analysis==

Analysis in Matxin can be done in a number of ways: the [[matxin-spa-eus|Spanish to Basque]] system uses [[FreeLing]], while the [[matxin-eng-eus|English to Basque]] system uses a wrapper around the Stanford parser. In this tutorial we're going to be using [[Constraint Grammar]] to do dependency parsing of pre-disambiguated sentences. Writing a morphological analyser and morphological disambiguator is outside the scope of this HOWTO, but for more information, check out the following pages:

lttoolbox | hfst

* [[Starting a new language with lttoolbox]]
* [[Starting a new language with HFST]]
* [[Constraint Grammar]]

So, let's assume that you've been through those tutorials and have a morphological analyser capable of analysing and disambiguating sentences in Turkish. You'll give it a sentence like "Dün benim için aldığın birayı içeceğim." and get output like:

-> CG

<pre>
^Dün/dün<adv>$ ^benim/ben<prn><pers><p1><sg><gen>$ ^için/için<post>$ ^aldığın/al<v><tv><gpr_past><px2sg>$
^birayı/bira<n><acc>$ ^içeceğim/iç<v><tv><fut><p1><sg>$^./.<sent>$
</pre>

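To make the shape of that output concrete, here is a minimal Python sketch that splits the disambiguated stream into (surface form, lemma, tags) triples. It is not part of Matxin or lttoolbox: the names <code>parse_units</code>, <code>UNIT_RE</code> and <code>TAG_RE</code> are made up for illustration, and the sketch assumes each lexical unit carries exactly one reading and contains no escaped characters.

```python
import re

# One lexical unit looks like ^surface/lemma<tag1><tag2>$ .
# Assumption: the input is fully disambiguated, so each unit has a single reading.
UNIT_RE = re.compile(r"\^([^/$]*)/([^<$]*)((?:<[^>]+>)*)\$")
TAG_RE = re.compile(r"<([^>]+)>")


def parse_units(stream):
    """Return a list of (surface, lemma, tags) tuples found in the stream text."""
    units = []
    for surface, lemma, tag_string in UNIT_RE.findall(stream):
        units.append((surface, lemma, TAG_RE.findall(tag_string)))
    return units


# The example sentence from this page, joined onto one line.
sentence = (
    "^Dün/dün<adv>$ ^benim/ben<prn><pers><p1><sg><gen>$ ^için/için<post>$ "
    "^aldığın/al<v><tv><gpr_past><px2sg>$ ^birayı/bira<n><acc>$ "
    "^içeceğim/iç<v><tv><fut><p1><sg>$^./.<sent>$"
)

for surface, lemma, tags in parse_units(sentence):
    print(surface, lemma, tags)
```

Each triple corresponds to one word of the sentence. A dependency parser such as the Constraint Grammar rules used in this tutorial works on the lemma and tags, not on the raw surface form.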
Revision as of 14:00, 12 May 2016
==Transfer==

lttoolbox matxin

==Generation==

lttoolbox | hfst