Difference between revisions of "Apertium-tat"

From Apertium
Line 1: Line 1:
 
{{TOCD}}
 
{{TOCD}}
  +
{{Infobox language
  +
|name=Tatar
  +
|states=Tatarstan
  +
|iso=tat
  +
|iso2=tt
  +
}}
   
 
'''Apertium-tat''' is a morphological analyser/generator and CG tagger for [[Tatar]], currently under development. It is intended to be compatible with transducers for other [[Turkic languages]] so that they can be translated between. It's used in the following language pairs:
 
'''Apertium-tat''' is a morphological analyser/generator and CG tagger for [[Tatar]], currently under development. It is intended to be compatible with transducers for other [[Turkic languages]] so that they can be translated between. It's used in the following language pairs:
  +
 
* [[Kazakh and Tatar]]
 
* [[Kazakh and Tatar]]
 
* [[Tatar and Bashkir]]
 
* [[Tatar and Bashkir]]
 
* [[Tatar and Russian]]
 
* [[Tatar and Russian]]
 
== Current State ==
 
{{LangStats | lang = tat | corpus1 = quran | corpus2 = NewTestament | corpus3 = aytmatov | corpus4 = wp2013 | corpus5 = tatnews2005/11}}
 
   
 
== Installation ==
 
== Installation ==
  +
 
'''apertium-tat''' is located in [https://svn.code.sf.net/p/apertium/svn/languages/apertium-tat/ languages] module.
 
'''apertium-tat''' is located in [https://svn.code.sf.net/p/apertium/svn/languages/apertium-tat/ languages] module.
   
 
You will need [[HFST]], lttoolbox, apertium and [[CG|vislcg]] installed on your computer to be able to use it.
 
You will need [[HFST]], lttoolbox, apertium and [[CG|vislcg]] installed on your computer to be able to use it.
   
If you are using a Debian-based distro, the easiest way to get those dependencies is to install them from [[User:Tino Didriksen]]'s [[Prerequisites for Debian|repository]].
+
The easiest way to get those dependencies is to install them from [[User:Tino Didriksen]]'s [[Prerequisites for Debian|repository]].
   
 
Then you can check out apertium-tat from our svn repository and compile it:
 
Then you can check out apertium-tat from our svn repository and compile it:
Line 26: Line 31:
   
 
Otherwise, see the [[Installation]] page for instructions.
 
Otherwise, see the [[Installation]] page for instructions.
  +
  +
=== For spell checking ===
  +
  +
If you're compiling the apertium-tat spell checker, you'll additionally need these dependencies:
  +
* [https://svn.code.sf.net/p/hfst/code/trunk/hfst-ospell hfst-ospell] (./configure --enable-zhfst)
  +
** libarchive-dev
  +
* corevoikko/libvoikko/src/tools/voikkospell (./configure --enable-hfst)
  +
  +
You'll want to configure apertium-tat with --enable-ospell and then after making it, copy tat.zhfst to ~/.voikko/3/tt.zhfst
  +
  +
Then you can do this:
  +
<pre>
  +
$ echo "татарча билмим" | sed 's/ /\n/' | voikkospell -d tt -s
  +
C: татарча
  +
W: билмим
  +
S: белмим
  +
S: биелмим
  +
S: бөлмим
  +
S: бирмим
  +
S: бүлмим
  +
</pre>
   
 
== Usage ==
 
== Usage ==

Revision as of 14:10, 13 August 2015

Tatar - татар теле
language transducer
Coverage: ~91%
Stems: 55,702
Vanilla stems: 54,773
Paradigms:
Location: apertium-tat (languages)
Families:
Areas:
Lang info Tatar

Apertium-tat is a morphological analyser/generator and CG tagger for Tatar, currently under development. It is intended to be compatible with transducers for other Turkic languages so that translation between them is possible. It is used in the following language pairs:

  • Kazakh and Tatar
  • Tatar and Bashkir
  • Tatar and Russian

Installation

apertium-tat is located in the languages module.

You will need HFST, lttoolbox, apertium and vislcg installed on your computer to be able to use it.

The easiest way to get those dependencies is to install them from User:Tino Didriksen's repository.

Then you can check out apertium-tat from our svn repository and compile it:

svn co http://svn.code.sf.net/p/apertium/svn/languages/apertium-tat/
cd apertium-tat
./autogen.sh
make

Otherwise, see the Installation page for instructions.

For spell checking

If you're compiling the apertium-tat spell checker, you'll additionally need these dependencies:

  • hfst-ospell (./configure --enable-zhfst)
    • libarchive-dev
  • corevoikko/libvoikko/src/tools/voikkospell (./configure --enable-hfst)

Configure apertium-tat with --enable-ospell, build it, and then copy tat.zhfst to ~/.voikko/3/tt.zhfst
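
The copy step can be wrapped in a small helper. This is only a sketch: the function name install_speller is hypothetical, and the default destination is the ~/.voikko/3 path given above, which may differ for other libvoikko setups.

```shell
# Copy a compiled speller archive to the directory voikkospell is pointed at.
# install_speller is a hypothetical helper name, not part of apertium-tat.
install_speller() {
    src=$1                                # e.g. tat.zhfst produced by make
    dest_dir=${2:-"$HOME/.voikko/3"}      # default path taken from the text above
    if [ ! -f "$src" ]; then
        echo "missing $src; was apertium-tat configured with --enable-ospell?" >&2
        return 1
    fi
    # The archive is renamed to tt.zhfst so that "voikkospell -d tt" finds it.
    mkdir -p "$dest_dir" && cp "$src" "$dest_dir/tt.zhfst"
}

# Usage, from the apertium-tat directory after building:
#   install_speller tat.zhfst
```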

Then you can do this:

$ echo "татарча билмим" | sed 's/ /\n/' | voikkospell -d tt -s
C: татарча
W: билмим
S: белмим
S: биелмим
S: бөлмим
S: бирмим
S: бүлмим
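
When checking more than a couple of words, it can help to condense this output to one line per misspelling. The following is a minimal sketch that assumes the C:/W:/S: line prefixes shown above; the function name summarise_spelling is hypothetical.

```shell
# Condense voikkospell -s output into "misspelled<TAB>suggestion, suggestion".
# Assumes the "C: ", "W: " and "S: " prefixes shown in the example above.
summarise_spelling() {
    awk '
        # A new misspelled word: flush the previous one, then start collecting.
        /^W: / { if (word != "") print word "\t" sugg; word = substr($0, 4); sugg = ""; next }
        # A suggestion for the current misspelled word.
        /^S: / { if (word != "") sugg = (sugg == "" ? "" : sugg ", ") substr($0, 4); next }
        # A correct word ends the current list of suggestions.
        /^C: / { if (word != "") print word "\t" sugg; word = ""; sugg = "" }
        END    { if (word != "") print word "\t" sugg }
    '
}

# Usage:
#   echo "татарча билмим" | sed 's/ /\n/' | voikkospell -d tt -s | summarise_spelling
```

Correctly spelled words are dropped, so for the example above only билмим and its five suggestions would remain.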

Usage

Morphological analysis:

$ echo "барабыз" | apertium -d . tat-morph 
^барабыз/бар<v><iv><pres><p1><pl>$^./.<sent>$
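
Each unit in this output has the form ^surface/analysis/...$. For a quick look at the analyses one per line, the units can be split with standard tools. This is a rough sketch: the function name split_lexical_units is hypothetical, grep -o is assumed to be available (GNU and BSD grep have it), and escaped ^, $ or / characters inside a unit are not handled.

```shell
# Print one "surface<TAB>analysis" line per analysis in Apertium stream output.
# Simplification: escaped ^, $ and / inside a lexical unit are not handled.
split_lexical_units() {
    grep -o '\^[^$]*\$' |                 # one ^...$ unit per line
        sed -e 's/^\^//' -e 's/\$$//' |   # strip the ^ and $ delimiters
        awk -F'/' '{ for (i = 2; i <= NF; i++) printf "%s\t%s\n", $1, $i }'
}

printf '%s\n' '^барабыз/бар<v><iv><pres><p1><pl>$^./.<sent>$' | split_lexical_units
```

For the line above this prints барабыз with its single analysis, then the full stop with its sent analysis, each pair separated by a tab.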

apertium -d . tat-tagger gives disambiguated output; apertium -d . tat-disam gives the disambiguated output in CG format, showing which rules were applied:

$ echo "Без урманга таба барабыз." | apertium -d . tat-disam 
"<Без>"
	"без" prn pers p1 pl nom
;	"без" prn pers p1 pl nom
;		"и" cop p3 pl REMOVE:236
;	"без" prn pers p1 pl nom
;		"и" cop p3 sg REMOVE:236
"<урманга>"
	"урман" n dat
"<таба>"
	"таба" post SELECT:543
;	"таба" n nom
;		"и" cop p3 pl REMOVE:236
;	"таба" n nom
;		"и" cop p3 sg REMOVE:236
;	"таба" n attr REMOVE:452
;	"тап" v tv prc_impf SELECT:543
;	"таба" n nom SELECT:543
;	"тап" v tv pres p3 sg SELECT:543
"<барабыз>"
	"бар" v iv pres p1 pl
"<..>"
	".." sent
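
In this output, lines starting with ";" are readings that a rule discarded, and the remaining tab-indented lines are the readings that survived. A minimal sketch for keeping only the survivors, assuming exactly the layout shown above (the function name surviving_readings is hypothetical):

```shell
# Keep only surviving readings from CG-format output and print them as
# "wordform<TAB>reading". Assumes the layout shown in the example above.
surviving_readings() {
    awk '
        /^"</  { word = $0; next }        # cohort line, e.g. "<таба>"
        /^;/   { next }                   # discarded reading or its sub-reading
        /^\t"/ { sub(/^\t/, ""); print word "\t" $0 }
    '
}

# Usage:
#   echo "Без урманга таба барабыз." | apertium -d . tat-disam | surviving_readings
```

The rule tags such as SELECT:543 are left in place, so it stays visible which rule chose a reading.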

If you want to know what exactly each mode does (in our case, modes are "tat-morph", "tat-tagger" and "tat-disam"), have a look at the modes.xml file or the modes/ directory.

Components

  • apertium-tat.tat.lexc : lexicon (stems + morphotactics)
  • apertium-tat.tat.twol : morphophonological rules
  • apertium-tat.tat.rlx : disambiguation rules