Apertium-tat
Revision as of 19:49, 27 August 2014
Apertium-tat is a morphological analyser/generator and CG tagger for Tatar, currently under development. It is intended to be compatible with transducers for other Turkic languages, so that text can be translated between them. It's used in the following language pairs:
Current State
- Number of stems: 55,702
- Disambiguation rules: 123
- Coverage: ~91%
corpus | words | coverage |
---|---|---|
quran | 165K | ~89.2% |
NewTestament | 137K | ~94.2% |
aytmatov | 5K | ~93.4% |
wp2013 | 128K | ~87.3% |
tatnews2005/11 | 4.6M | ~90.7% |
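One way to relate the per-corpus figures to the overall coverage figure is a word-weighted average. The sketch below is only an illustration: the page does not say how the ~91% figure was computed, and the corpus sizes are the rounded values from the table, so the result is approximate.

```python
# Corpus sizes (in words, rounded as in the table) and coverage percentages.
corpora = {
    "quran": (165_000, 89.2),
    "NewTestament": (137_000, 94.2),
    "aytmatov": (5_000, 93.4),
    "wp2013": (128_000, 87.3),
    "tatnews2005/11": (4_600_000, 90.7),
}


def weighted_coverage(stats):
    """Return the word-weighted mean coverage, in percent."""
    total_words = sum(words for words, _ in stats.values())
    covered = sum(words * pct for words, pct in stats.values())
    return covered / total_words


print(f"{weighted_coverage(corpora):.1f}%")  # prints 90.7%
```

Because tatnews2005/11 is by far the largest corpus, it dominates the weighted figure.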
Installation
apertium-tat is located in the languages module.
You will need HFST, lttoolbox, apertium and vislcg installed on your computer to be able to use it.
If you are using a Debian-based distro, the easiest way to get those dependencies is to install them with apt-get from Tino Didriksen's repository:
wget http://apertium.projectjj.com/apt/install-nightly.sh -O - | sudo bash
sudo apt-get -f install locales build-essential automake subversion pkg-config \
    gawk apertium lttoolbox libapertium3-3.3-dev liblttoolbox3-3.3-dev apertium-lex-tools \
    cg3 hfst libhfst36-dev
Then you can check out apertium-tat from our svn repository and compile it:
svn co http://svn.code.sf.net/p/apertium/svn/languages/apertium-tat/
cd apertium-tat
./autogen.sh
make
Otherwise, see the Installation page for instructions.
Usage
Morphological analysis:
$ echo "барабыз" | apertium -d . tat-morph
^барабыз/бар<v><iv><pres><p1><pl>$^./.<sent>$
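The analyser output above is in the Apertium stream format: each lexical unit is wrapped in ^...$, with the surface form and its analyses separated by slashes and tags in angle brackets. A minimal sketch of reading that output in Python could look like this. The function names are hypothetical, escaped characters are not handled, and the check for unknown words assumes the usual convention of a single reading starting with an asterisk.

```python
import re

# One lexical unit looks like ^surface/analysis1/analysis2$.
LEXICAL_UNIT = re.compile(r"\^(.*?)\$")
TAG = re.compile(r"<([^>]+)>")


def parse_stream(text):
    """Return a list of (surface, [(lemma, [tags]), ...]) tuples.

    Simplification: escaped slashes, carets and dollar signs are not handled.
    """
    units = []
    for match in LEXICAL_UNIT.finditer(text):
        surface, *analyses = match.group(1).split("/")
        readings = []
        for analysis in analyses:
            lemma = analysis.split("<", 1)[0]  # everything before the first tag
            readings.append((lemma, TAG.findall(analysis)))
        units.append((surface, readings))
    return units


def is_unknown(readings):
    """Assumed convention: an unknown word has one reading starting with '*'."""
    return len(readings) == 1 and readings[0][0].startswith("*")


sample = "^барабыз/бар<v><iv><pres><p1><pl>$^./.<sent>$"
for surface, readings in parse_stream(sample):
    print(surface, readings)
```

Counting the units for which is_unknown returns False, divided by the total number of units, gives a naive coverage estimate for a piece of text.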
apertium -d . tat-tagger gives disambiguated output, and apertium -d . tat-disam gives disambiguated output in CG format, showing which rules were applied:
$ echo "Без урманга таба барабыз." | apertium -d . tat-disam
"<Без>"
    "без" prn pers p1 pl nom
;   "без" prn pers p1 pl nom
;       "и" cop p3 pl REMOVE:236
;   "без" prn pers p1 pl nom
;       "и" cop p3 sg REMOVE:236
"<урманга>"
    "урман" n dat
"<таба>"
    "таба" post SELECT:543
;   "таба" n nom
;       "и" cop p3 pl REMOVE:236
;   "таба" n nom
;       "и" cop p3 sg REMOVE:236
;   "таба" n attr REMOVE:452
;   "тап" v tv prc_impf SELECT:543
;   "таба" n nom SELECT:543
;   "тап" v tv pres p3 sg SELECT:543
"<барабыз>"
    "бар" v iv pres p1 pl
"<..>"
    ".." sent
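When debugging the disambiguation rules, it can help to see which rules fire most often in a trace. The sketch below counts the trace tags in tat-disam output. It assumes only the SELECT and REMOVE tags seen in the trace above (other CG operations would need to be added to the pattern), and the sample is a subset of the readings for one cohort.

```python
import re
from collections import Counter

# Trace tags have the form OPERATION:NUMBER, e.g. SELECT:543 or REMOVE:236.
TRACE_TAG = re.compile(r"\b(SELECT|REMOVE):(\d+)\b")


def count_rules(trace):
    """Count how many reading lines carry each trace tag."""
    return Counter(f"{op}:{num}" for op, num in TRACE_TAG.findall(trace))


# A subset of the readings for the "<таба>" cohort, one reading per line.
sample = '''
"<таба>"
    "таба" post SELECT:543
;   "таба" n attr REMOVE:452
;   "тап" v tv prc_impf SELECT:543
;   "таба" n nom SELECT:543
;   "тап" v tv pres p3 sg SELECT:543
'''

for rule, hits in count_rules(sample).most_common():
    print(rule, hits)
```

Note that a SELECT tag appears both on the reading that was kept and on the readings discarded in its favour, so the counts reflect tagged readings rather than rule applications.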
If you want to know what exactly each mode does (in our case, modes are "tat-morph", "tat-tagger" and "tat-disam"), have a look at the modes.xml file or the modes/ directory.
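To inspect the modes programmatically, the modes.xml file can be read with any XML parser. The sketch below uses an inline sample rather than the real file: the element layout (mode, pipeline, program, file) follows the usual Apertium modes.xml structure, but the program names, options and file names shown here are assumptions for illustration and are not copied from the apertium-tat repository.

```python
import xml.etree.ElementTree as ET

# Illustrative modes.xml content. Program and file names are assumptions,
# not the actual contents of apertium-tat's modes.xml.
SAMPLE_MODES = """
<modes>
  <mode name="tat-morph" install="yes">
    <pipeline>
      <program name="hfst-proc -w">
        <file name="tat.automorf.hfst"/>
      </program>
    </pipeline>
  </mode>
  <mode name="tat-disam" install="no">
    <pipeline>
      <program name="hfst-proc -w">
        <file name="tat.automorf.hfst"/>
      </program>
      <program name="cg-conv -a"/>
      <program name="vislcg3 --trace --grammar">
        <file name="tat.rlx.bin"/>
      </program>
    </pipeline>
  </mode>
</modes>
"""


def describe_modes(xml_text):
    """Map each mode name to its pipeline as a list of command strings."""
    root = ET.fromstring(xml_text)
    modes = {}
    for mode in root.iter("mode"):
        steps = []
        for program in mode.iter("program"):
            files = [f.get("name") for f in program.iter("file")]
            steps.append(" ".join([program.get("name"), *files]))
        modes[mode.get("name")] = steps
    return modes


for name, steps in describe_modes(SAMPLE_MODES).items():
    print(name + ": " + " | ".join(steps))
```

To run it against the real file, replace the sample string with the contents of modes.xml from the apertium-tat directory.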
Components
- apertium-tat.tat.lexc : lexicon (stems + morphotactics)
- apertium-tat.tat.twol : morphophonological rules
- apertium-tat.tat.rlx : disambiguation rules