Difference between revisions of "Apertium-tat"
Latest revision as of 03:25, 22 January 2020
| Tatar - татар теле | |
|---|---|
| language transducer | |
| Coverage: | ~91% |
| Stems: | 55,702 |
| Vanilla stems: | 54,773 |
| Paradigms: | |
| Location: | apertium-tat (languages) |
| Families: | |
| Areas: | |
| Lang info | Tatar |
Apertium-tat is a morphological analyser/generator and CG tagger for Tatar, currently under development. It is intended to be compatible with transducers for other Turkic languages, so that translation between them is possible. It is used in the following language pairs:
- Kazakh and Tatar
- Tatar and Bashkir
- Tatar and Russian
Installation
apertium-tat is located in languages/apertium-tat.
You will need HFST, lttoolbox, apertium and vislcg installed on your computer to be able to use it.
The easiest way to get those dependencies is to install them from User:Tino Didriksen's repository (see Prerequisites for Debian or Prerequisites for RPM).
Then you can check out apertium-tat from our git repository and compile it:
    git clone https://github.com/apertium/apertium-tat.git
    cd apertium-tat
    ./autogen.sh
    make
Otherwise, see the Installation page for instructions.
For spell checking
If you're compiling the apertium-tat spell checker, you'll additionally need these dependencies:
- hfst-ospell (./configure --enable-zhfst)
  - libarchive-dev
- corevoikko/libvoikko/src/tools/voikkospell (./configure --enable-hfst)
Configure apertium-tat with --enable-ospell, run make, and then copy tat.zhfst to ~/.voikko/3/tt.zhfst.
You can then check spelling from the command line:
    $ echo "татарча билмим" | sed 's/ /\n/' | voikkospell -d tt -s
    C: татарча
    W: билмим
    S: белмим
    S: биелмим
    S: бөлмим
    S: бирмим
    S: бүлмим
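If the spell-checker output needs to be processed by a script, it can be grouped word by word. The following is only a minimal sketch in Python: it assumes the line prefixes seen in the output above (C: for a correct word, W: for a misspelled word, S: for a suggestion belonging to the preceding misspelled word), and the function name parse_voikkospell is a hypothetical helper, not part of voikkospell or apertium-tat.

```python
def parse_voikkospell(output):
    """Group `voikkospell -s` output into (word, is_correct, suggestions) tuples.

    Assumes the C:/W:/S: line prefixes shown in the example output.
    """
    results = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        kind, _, text = line.partition(": ")
        if kind == "C":
            results.append((text, True, []))
        elif kind == "W":
            results.append((text, False, []))
        elif kind == "S" and results:
            # A suggestion belongs to the most recently seen word.
            results[-1][2].append(text)
    return results


sample = """C: татарча
W: билмим
S: белмим
S: биелмим
S: бөлмим
S: бирмим
S: бүлмим"""

for word, ok, suggestions in parse_voikkospell(sample):
    print(word, "ok" if ok else "misspelled", suggestions)
```

In practice the text would come from the standard output of the voikkospell command rather than from a hard-coded string.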
Usage
Morphological analysis:
    $ echo "барабыз" | apertium -d . tat-morph
    ^барабыз/бар<v><iv><pres><p1><pl>$^./.<sent>$
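The analyser output above is in the Apertium stream format: each lexical unit is wrapped in ^...$, the surface form comes first, and each analysis follows after a slash as a lemma plus tags in angle brackets. As a rough illustration, such output could be split up in Python as sketched below. The function name parse_stream is hypothetical, and the sketch deliberately ignores escaped characters and lemmas that contain a slash.

```python
import re

LEXICAL_UNIT = re.compile(r"\^(.*?)\$")   # one ^...$ unit
TAG = re.compile(r"<([^<>]+)>")           # one <tag>


def parse_stream(text):
    """Return a list of (surface, [(lemma, [tags]), ...]) tuples.

    Simplified: no handling of escaped characters or of lemmas
    containing '/'. An unknown word (analysis starting with '*')
    simply ends up with an empty tag list.
    """
    units = []
    for body in LEXICAL_UNIT.findall(text):
        surface, *analyses = body.split("/")
        readings = []
        for analysis in analyses:
            lemma = analysis.split("<", 1)[0]
            readings.append((lemma, TAG.findall(analysis)))
        units.append((surface, readings))
    return units


sample = "^барабыз/бар<v><iv><pres><p1><pl>$^./.<sent>$"
for surface, readings in parse_stream(sample):
    print(surface, readings)
```

For the sample line this yields one reading for барабыз (lemma бар with the tags v, iv, pres, p1, pl) and one for the full stop (tag sent).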
apertium -d . tat-tagger will give disambiguated output, while apertium -d . tat-disam will give disambiguated output in CG format, showing which rules were applied:
    $ echo "Без урманга таба барабыз." | apertium -d . tat-disam
    "<Без>"
        "без" prn pers p1 pl nom
    ;   "без" prn pers p1 pl nom
    ;   "и" cop p3 pl REMOVE:236
    ;   "без" prn pers p1 pl nom
    ;   "и" cop p3 sg REMOVE:236
    "<урманга>"
        "урман" n dat
    "<таба>"
        "таба" post SELECT:543
    ;   "таба" n nom
    ;   "и" cop p3 pl REMOVE:236
    ;   "таба" n nom
    ;   "и" cop p3 sg REMOVE:236
    ;   "таба" n attr REMOVE:452
    ;   "тап" v tv prc_impf SELECT:543
    ;   "таба" n nom SELECT:543
    ;   "тап" v tv pres p3 sg SELECT:543
    "<барабыз>"
        "бар" v iv pres p1 pl
    "<..>"
        ".." sent
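When debugging disambiguation rules, it can help to list, for each word form, which readings survived and which rule numbers were involved. The sketch below is one possible way to do this in Python, based only on the shape of the output above: a cohort line looks like "<form>", a reading line starts with a quoted lemma, a leading semicolon marks a discarded reading, and a trailing token such as SELECT:543 or REMOVE:236 names the rule. The function name parse_cg is hypothetical, and lemmas containing spaces are not handled.

```python
import re

RULE = re.compile(r"^[A-Z]+:\d+$")  # trace tokens such as SELECT:543


def parse_cg(output):
    """Split CG trace output into cohorts with kept and removed readings.

    Simplified sketch: assumes lemmas contain no spaces.
    """
    cohorts = []
    for raw in output.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith('"<') and line.endswith('>"'):
            cohorts.append({"form": line[2:-2], "kept": [], "removed": []})
            continue
        if not cohorts:
            continue  # skip anything before the first cohort
        removed = line.startswith(";")
        if removed:
            line = line[1:].strip()
        parts = line.split()
        lemma = parts[0].strip('"')
        tags = [p for p in parts[1:] if not RULE.match(p)]
        rules = [p for p in parts[1:] if RULE.match(p)]
        cohorts[-1]["removed" if removed else "kept"].append((lemma, tags, rules))
    return cohorts


sample = '''
"<таба>"
    "таба" post SELECT:543
;   "таба" n attr REMOVE:452
"<барабыз>"
    "бар" v iv pres p1 pl
'''

for cohort in parse_cg(sample):
    print(cohort["form"], cohort["kept"], cohort["removed"])
```

The rule numbers in the trace can then be looked up in apertium-tat.tat.rlx to see which rule selected or removed a reading.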
To see exactly what each mode does (here the modes are "tat-morph", "tat-tagger" and "tat-disam"), have a look at the modes.xml file or the modes/ directory.
Components
- apertium-tat.tat.lexc : lexicon (stems + morphotactics)
- apertium-tat.tat.twol : morphophonological rules
- apertium-tat.tat.rlx : disambiguation rules
Paradigms
    = Nouns =
    LEXICON N1 ! Standard noun
    LEXICON N1-RUS ! Noun with Russian phonology (e.g. университет)
    LEXICON N-COMPOUND-PX ! Noun with obligatory possession (e.g. күз% яше)
    LEXICON N3 ! Singularia tantum (e.g. Аллаһ)

    = Proper nouns =
    LEXICON NP-TOP ! Toponym (placename)
    LEXICON NP-ANT-M ! Male first name
    LEXICON NP-ANT-F ! Female first name
    LEXICON NP-COG-M ! Male фамилия
    LEXICON NP-COG-MF ! Male + female фамилия
    LEXICON NP-COG-OB ! Normal фамилия in -ов / -ев
    LEXICON NPCOGFLEX ! Polish фамилия with -ска
    LEXICON NP-PAT-VICH ! Patronymics in -вич
    LEXICON NP-ORG ! Organisations
    LEXICON NP-AL ! Other proper names.

    = Verbs =
    LEXICON V-IV-IR ! Intransitive verbs with aorist in -Iр
    LEXICON V-TV-NOPASS-IR ! Transitive verb without a passive.
    LEXICON V-TV-PASS-IR ! Irregular passive stem of a transitive verb.
    LEXICON V-TV-IR ! Transitive verb with aorist in -Iр
    LEXICON V-IV-AR ! Intransitive verb with aorist in -Ар
    LEXICON V-TV-AR ! Transitive verb with aorist in -Aр

    = Adjectives =
    LEXICON A1 ! adjectives that can be both substantivized and adverbialized;
               ! all three readings (<adj>, <adj.subst> and <adj.advl>) have comparison levels.
               !# яхшы, тиз
    LEXICON A2 ! <adj> and <adj.subst> readings have comparison levels.
               !# иске
    LEXICON A3 ! adjectives without adverbial reading & so-called "predicatives" (бар, юк);
               ! no comparison levels at all.
               !# язгы, бар, юк
    LEXICON A4 ! "pure" adjectives - no adverbial and substantive readings, no comparison levels;

    = Adverbs =
    LEXICON ADV-LANG ! languages ('татарча')
    LEXICON ADV ! Normal adverb
    LEXICON ADV-WITH-KI ! Adverb that can also take -GI
    LEXICON ADV-WITH-KI-BIRE ! Adverb that can take -дәге
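To get a feel for how stems are distributed over these paradigms, the entries in the lexc file can be tallied by their continuation lexicon. The Python sketch below assumes the usual lexc entry shape, "upper:lower CONTINUATION ;" with "!" starting a comment. The sample entries are hypothetical, built from the example words given in the paradigm list, and the lexicon names Nouns and Adjectives are likewise assumed for illustration. The real layout of apertium-tat.tat.lexc may differ.

```python
from collections import Counter


def count_by_paradigm(lexc_text):
    """Count lexc entries per continuation lexicon.

    Assumes entries of the form 'upper:lower CONT ;' and that '!'
    always starts a comment (an escaped '%!' is not handled).
    """
    counts = Counter()
    for raw in lexc_text.splitlines():
        line = raw.split("!", 1)[0].strip()  # drop comments
        if line.startswith("LEXICON") or not line.endswith(";"):
            continue  # lexicon headers and non-entry lines
        fields = line[:-1].split()
        if len(fields) >= 2:
            counts[fields[-1]] += 1  # last field is the continuation lexicon
    return counts


# Hypothetical sample entries, for illustration only.
sample = """
LEXICON Nouns
университет:университет N1-RUS ;
күз% яше:күз% яше N-COMPOUND-PX ;
Аллаһ:Аллаһ N3 ;
LEXICON Adjectives
яхшы:яхшы A1 ;
тиз:тиз A1 ;
иске:иске A2 ; ! an entry followed by a comment
"""

for paradigm, n in sorted(count_by_paradigm(sample).items()):
    print(paradigm, n)
```

Run against the real lexc file, a count like this would give a per-paradigm breakdown of the stem total shown in the infobox, provided the stated assumptions about the file layout hold.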