English and Kazakh
Starting work on Apertium English to Kazakh
These notes are basically for Anel, Aizhan and Assem who have started to develop this language pair...
Installing what is needed
Operating System
Install a suitable GNU/Linux system such as Debian, Ubuntu, Mint...
Install build essentials, etc.
Open a terminal window and type:
sudo apt-get install subversion build-essential g++ pkg-config gawk libxml2 \
  libxml2-dev libxml2-utils xsltproc flex automake autoconf libtool libpcre3-dev \
  cmake libicu-dev libboost-dev libgoogle-perftools-dev bison libreadline-dev zlib1g-dev
Enter your password and wait until the packages have been downloaded and installed.
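As a quick sanity check after the installation, the following sketch reports which commands are still missing from your PATH. The function name missing_tools is made up for this example, and the list of commands is only an illustrative selection of what the packages above provide, not a complete inventory.

```shell
#!/bin/sh
# Print the name of every given command that cannot be found on PATH.
# (missing_tools is a hypothetical helper, not part of any package.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# An illustrative selection of commands installed by the packages above.
missing=$(missing_tools svn g++ make pkg-config gawk xsltproc flex bison cmake)
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing
else
    echo "All listed tools were found."
fi
```

If anything is reported as missing, rerun the apt-get command and look for errors in its output.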
If you don't already have a directory for sources, make one in your home directory and enter it:
cd ~
mkdir Source
cd Source
Install HFST
This language pair uses the Helsinki Finite State Toolkit (HFST) for Kazakh generation, so we need to install it and its dependencies. OpenFST is now included with HFST, so it no longer needs to be installed separately.
Install Foma
- Main article: Foma
svn checkout http://foma.googlecode.com/svn/trunk/foma/ foma
cd foma
make
sudo make install
cd ..
Install HFST
- Main article: HFST
svn co https://svn.code.sf.net/p/hfst/code/trunk/hfst3
cd hfst3/
./autogen.sh
scripts/generate-cc-files.sh # It's OK if this step fails
./configure --enable-lexc --with-foma --disable-tagger --enable-proc
make
sudo make install
sudo ldconfig
cd ..
Troubleshooting
With old autotools (possibly versions before 1.14), "make" may fail with:
make[5]: *** No rule to make target `xre_parse.hh', needed by `xre_lex.ll'. Stop.
If so, run
scripts/generate-cc-files.sh
and then run make again.
Install VISLCG3
- Main article: Apertium and Constraint Grammar
svn co http://beta.visl.sdu.dk/svn/visl/tools/vislcg3/trunk vislcg3
cd vislcg3
./cmake.sh
make -j3
sudo make install
cd ..
Download apertium, lttoolbox and eng-kaz data from SVN
- Main article: Minimal installation from SVN
cd ~/Source
svn co https://svn.code.sf.net/p/apertium/svn/trunk/lttoolbox
svn co https://svn.code.sf.net/p/apertium/svn/trunk/apertium
svn co https://svn.code.sf.net/p/apertium/svn/trunk/apertium-lex-tools
svn co https://svn.code.sf.net/p/apertium/svn/trunk/apertium-tools
svn co https://svn.code.sf.net/p/apertium/svn/incubator/apertium-kaz
svn co https://svn.code.sf.net/p/apertium/svn/incubator/apertium-eng-kaz
Compile and install lttoolbox
cd lttoolbox/
PKG_CONFIG_PATH=/usr/local/lib/pkgconfig ./autogen.sh
make
sudo make install
sudo ldconfig
Compile and install apertium
cd ..
cd apertium/
PKG_CONFIG_PATH=/usr/local/lib/pkgconfig ./autogen.sh
make
sudo make install
sudo ldconfig
Compile and install apertium-lex-tools
cd ..
cd apertium-lex-tools
PKG_CONFIG_PATH=/usr/local/lib/pkgconfig ./autogen.sh
make
sudo make install
sudo ldconfig
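The three build-and-install sequences for lttoolbox, apertium and apertium-lex-tools follow the same pattern, so they can also be run as one loop. This is only a sketch: the helper name build_and_install is invented here, and the checkouts are assumed to live in $HOME/Source as set up earlier. Unlike the manual commands, it sets PKG_CONFIG_PATH only for autogen.sh and stops at the first failure.

```shell
#!/bin/sh
# Build and install one checked-out module; fail if any step fails.
# (build_and_install is a hypothetical helper for this example.)
build_and_install() {
    dir=$1
    [ -d "$dir" ] || { echo "skipping $dir: not checked out" >&2; return 1; }
    (
        cd "$dir" &&
        PKG_CONFIG_PATH=/usr/local/lib/pkgconfig ./autogen.sh &&
        make &&
        sudo make install &&
        sudo ldconfig
    )
}

# Order matters: apertium needs lttoolbox, and apertium-lex-tools needs both.
for module in lttoolbox apertium apertium-lex-tools; do
    build_and_install "$HOME/Source/$module" || break
done
```

If the loop stops early, fix the reported error in that module and rerun; already installed modules are simply rebuilt.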
Install Kazakh language
cd ..
cd apertium-kaz
./autogen.sh
make
Install English--Kazakh language pair data from incubator
cd ..
cd apertium-eng-kaz/
PKG_CONFIG_PATH=/usr/local/lib/pkgconfig ./autogen.sh --with-lang2=$HOME/Source/apertium-kaz
make
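Once the pair has compiled, a short smoke test can show whether it runs. The sketch below rests on several assumptions: that the build leaves a modes/ directory with one .mode file per translation direction, that one of the modes is called eng-kaz, and that the pair was checked out under $HOME/Source. The helper list_modes is made up for this example. The apertium command and its -d option (run a mode from an uninstalled pair directory) are part of the apertium package installed above.

```shell
#!/bin/sh
# Print the names of the modes found in a compiled pair directory.
# (list_modes is a hypothetical helper; the modes/ layout is an assumption.)
list_modes() {
    for f in "$1"/modes/*.mode; do
        if [ -e "$f" ]; then
            basename "$f" .mode
        fi
    done
}

PAIR_DIR="$HOME/Source/apertium-eng-kaz"   # assumed checkout location
list_modes "$PAIR_DIR"

# Translate one sentence, but only if apertium and the pair are available.
# The mode name eng-kaz is an assumption; use a name printed by list_modes.
if command -v apertium >/dev/null 2>&1 && [ -d "$PAIR_DIR/modes" ]; then
    echo "This is a test." | apertium -d "$PAIR_DIR" eng-kaz
fi
```

Because the pair is still in the incubator, expect many untranslated words in the output; the point is only to see that the pipeline runs end to end.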
Troubleshooting
If you get:
lt-comp: error while loading shared libraries: liblttoolbox3-3.2.so.0: cannot open shared object file: No such file or directory
Then you should do:
sudo ldconfig
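To see whether the dynamic linker already knows about the library before or after running ldconfig, you can search its cache. A minimal sketch, assuming a glibc-based system where ldconfig -p prints the cache; the helper name lib_in_cache is invented for this example.

```shell
#!/bin/sh
# Succeed if the linker cache lists a library whose name contains the argument.
# (lib_in_cache is a hypothetical helper; ldconfig -p prints the cache on glibc.)
lib_in_cache() {
    ldconfig -p 2>/dev/null | grep -qF "$1"
}

if lib_in_cache liblttoolbox; then
    echo "liblttoolbox is in the linker cache"
else
    echo "liblttoolbox not found; run: sudo ldconfig"
fi
```

If the library is still missing after sudo ldconfig, check that lttoolbox was actually installed with sudo make install.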
Browse SVN
Here you can look at changes that have been made:
http://apertium.svn.sf.net/viewvc/apertium/incubator/apertium-eng-kaz/
Contact
IRC
Open up XChat (normally "Programs -> Internet -> XChat IRC") and type:
/server irc.freenode.net
/join #apertium
/join #hfst
To install xchat:
sudo apt-get install xchat
In Windows:
http://www.silverex.org/download/
Chat logs/archives: http://alpha.visl.sdu.dk/~tino/pisg/freenode/logs/
Mailing list
Email: apertium-turkic@lists.sourceforge.net
http://blog.gmane.org/gmane.science.linguistics.turkic.mt
Some information before generating data
Some of this information is outdated and needs work
Postpositions
Kazakh apparently has five kinds of postpositions, classified by the case of the noun phrase (NP) they follow. Some of those that follow the genitive may be interpreted as case-marked "nouns", for example:
бақшаның астында
garden-of bottom-in
garden.gen bottom.loc
"under the garden"
where аст is roughly the noun "bottom" (астында is literally "bottom-its-in"), just as "azpi" is a noun in Basque "ortu-a-ren azpi-an".
With nominative (or base form)
Check this list:
- арқылы through
- туралы about
- секілді similarly to
- жөнінде about
With genitive
- астынан from below
- астында below, under (bottom-its-in)
- жанынан from beside (side-its-from)
- жанында beside (side-its-in)
With dative
- қарай towards
- арналған intended for
With ablative
- кейін behind, after
With instrumental
- қатар beside
- бірге together with
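For scripts that need this classification, the lists above can be written as a simple lookup. This is only a sketch: the function name governed_case is invented for this example, and the entries copy the lists as they stand, which, as noted above, still need checking.

```shell
#!/bin/sh
# Print the case that a postposition governs, following the lists above.
# (governed_case is a hypothetical helper; the lists themselves are unverified.)
governed_case() {
    case "$1" in
        арқылы|туралы|секілді|жөнінде) echo nominative ;;
        астынан|астында|жанынан|жанында) echo genitive ;;
        қарай|арналған) echo dative ;;
        кейін) echo ablative ;;
        қатар|бірге) echo instrumental ;;
        *) echo unknown ;;
    esac
}

governed_case туралы   # prints: nominative
governed_case кейін    # prints: ablative
```

Keeping the data in one place like this makes it easy to correct an entry once the lists have been reviewed.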