English and Kazakh


Starting work on Apertium English to Kazakh

These notes are mainly for Anel, Aizhan and Assem, who have started developing this language pair.

Installing what is needed

Operating System

Install a suitable GNU/Linux system such as Debian, Ubuntu, Mint...

Install build essentials, etc.

Open a terminal window and type

sudo apt-get install subversion build-essential g++ pkg-config gawk libxml2 \
libxml2-dev libxml2-utils xsltproc flex automake autoconf libtool libpcre3-dev \
cmake libicu-dev libboost-dev libgoogle-perftools-dev bison libreadline-dev zlib1g-dev

Enter your password and wait until the packages have been downloaded and installed.
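
To confirm that the main build tools ended up on your PATH, a quick check like the following can help. It is only a sketch: missing_tools is a hypothetical helper, and the list of commands is an illustrative subset of what the packages above provide, not a complete test of the installation.

```shell
# Print each of the given commands that cannot be found on PATH.
# (missing_tools is a hypothetical helper, not part of Apertium.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Illustrative subset of the commands installed by the packages above.
missing_tools svn g++ make pkg-config gawk xsltproc xmllint \
    flex bison cmake automake autoconf libtoolize
```

If nothing is printed, all the listed commands were found.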

If you don't already have a directory for sources, make one in your home directory and enter it:

cd ~
mkdir Source
cd Source

Install HFST

This language pair uses the Helsinki Finite State Toolkit (HFST) for Kazakh generation, so we need to install it and its dependencies. OpenFST is now included with HFST, so there is no longer any need to install it separately.

Install Foma

Main article: Foma
svn checkout http://foma.googlecode.com/svn/trunk/foma/ foma 
cd foma
make
sudo make install
cd ..

Install HFST

Main article: HFST
svn co https://svn.code.sf.net/p/hfst/code/trunk/hfst3
cd hfst3/
./autogen.sh
scripts/generate-cc-files.sh # It's OK if this step fails
./configure --enable-lexc --with-foma --disable-tagger --enable-proc
make
sudo make install
sudo ldconfig
cd ..

Troubleshooting

When running "make" with old autotools (possibly versions before 1.14), you may see:

make[5]: *** No rule to make target `xre_parse.hh', needed by `xre_lex.ll'.  Stop.

If so, run scripts/generate-cc-files.sh and then run make again.
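
To see whether your installation falls in the suspected range, a version comparison along these lines may help. This is only a sketch under several assumptions: that the tentative "1.14" in the note above refers to automake, that GNU sort (with -V) is available, and that version_lt, a hypothetical helper, is good enough for plain dotted version numbers.

```shell
# Succeed if version $1 is strictly older than version $2.
# (version_lt is a hypothetical helper; it relies on GNU sort -V.)
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# The first line of `automake --version` ends with the version number.
if command -v automake >/dev/null 2>&1; then
    installed=$(automake --version | head -n 1 | awk '{print $NF}')
    if version_lt "$installed" 1.14; then
        echo "automake $installed is older than 1.14: the workaround may be needed"
    else
        echo "automake $installed is 1.14 or newer"
    fi
fi
```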

Install VISLCG3

Main article: Apertium and Constraint Grammar
svn co http://beta.visl.sdu.dk/svn/visl/tools/vislcg3/trunk vislcg3
cd vislcg3
./cmake.sh 
make -j3
sudo make install
cd ..

Download apertium, lttoolbox and eng-kaz data from SVN

Main article: Minimal installation from SVN
cd ~/Source
svn co https://svn.code.sf.net/p/apertium/svn/trunk/lttoolbox
svn co https://svn.code.sf.net/p/apertium/svn/trunk/apertium
svn co https://svn.code.sf.net/p/apertium/svn/trunk/apertium-lex-tools
svn co https://svn.code.sf.net/p/apertium/svn/trunk/apertium-tools
svn co https://svn.code.sf.net/p/apertium/svn/incubator/apertium-kaz
svn co https://svn.code.sf.net/p/apertium/svn/incubator/apertium-eng-kaz

Compile and install lttoolbox

cd lttoolbox/
PKG_CONFIG_PATH=/usr/local/lib/pkgconfig ./autogen.sh
make
sudo make install
sudo ldconfig

Compile and install apertium

cd ..
cd apertium/
PKG_CONFIG_PATH=/usr/local/lib/pkgconfig ./autogen.sh
make
sudo make install
sudo ldconfig

Compile and install apertium-lex-tools

cd ..
cd apertium-lex-tools
PKG_CONFIG_PATH=/usr/local/lib/pkgconfig ./autogen.sh
make
sudo make install
sudo ldconfig
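
The three build sequences above differ only in the directory name, so they can be wrapped in one helper. The following is a sketch, not an official script: run and build_and_install are hypothetical names, SOURCE_DIR and DRY_RUN are variables introduced here for illustration, and the source directory is assumed to be ~/Source as created earlier. As written it only prints the commands.

```shell
# With DRY_RUN=1, print each command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Configure, build and install one module, stopping at the first failure.
build_and_install() {
    module=$1
    src=${SOURCE_DIR:-$HOME/Source}   # assumed checkout location
    run cd "$src/$module" &&
    run env PKG_CONFIG_PATH=/usr/local/lib/pkgconfig ./autogen.sh &&
    run make &&
    run sudo make install &&
    run sudo ldconfig
}

# Show what would be executed, in the same order as the steps above.
DRY_RUN=1
for module in lttoolbox apertium apertium-lex-tools; do
    build_and_install "$module" || break
done
```

Setting DRY_RUN=0 before the loop would run the commands for real; the order matters because apertium depends on lttoolbox.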

Compile the Kazakh language data

cd ..
cd apertium-kaz
./autogen.sh
make

Install English-Kazakh language pair data from incubator

cd ..
cd apertium-eng-kaz/
PKG_CONFIG_PATH=/usr/local/lib/pkgconfig ./autogen.sh --with-lang2=$HOME/Source/apertium-kaz
make
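
Once make has finished, you can try a test translation from the pair directory. The sketch below rests on assumptions: try_pair is a hypothetical helper, the mode is assumed to be called eng-kaz, and the build is assumed to have generated a matching file under modes/. It relies on the standard "apertium -d DIRECTORY MODE" invocation.

```shell
# Translate one line of text with a locally compiled pair.
# (try_pair is a hypothetical helper; the mode name is an assumption.)
try_pair() {
    pair_dir=$1
    mode=$2
    text=$3
    if ! command -v apertium >/dev/null 2>&1; then
        echo "apertium is not on PATH" >&2
        return 1
    fi
    if [ ! -f "$pair_dir/modes/$mode.mode" ]; then
        echo "no mode file $pair_dir/modes/$mode.mode (has make been run?)" >&2
        return 1
    fi
    printf '%s\n' "$text" | apertium -d "$pair_dir" "$mode"
}

# Example call, to be run from inside apertium-eng-kaz:
# try_pair . eng-kaz "This is a test."
```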

Troubleshooting

If you get:

lt-comp: error while loading shared libraries: liblttoolbox3-3.2.so.0: cannot open shared object file: No such file or directory

Then refresh the shared library cache:

sudo ldconfig
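
To check whether the dynamic linker already knows about the library, you can search its cache. This is a minimal sketch: lib_in_cache is a hypothetical helper, and ldconfig is assumed to live at /sbin/ldconfig and to support -p, as it does with glibc.

```shell
# Succeed if the linker cache lists a library whose name contains $1.
# (lib_in_cache is a hypothetical helper.)
lib_in_cache() {
    /sbin/ldconfig -p 2>/dev/null | grep -q -- "$1"
}

if lib_in_cache liblttoolbox; then
    echo "liblttoolbox is in the linker cache"
else
    echo "liblttoolbox is not in the linker cache; try: sudo ldconfig"
fi
```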

Browse SVN

Here you can look at changes that have been made:

http://apertium.svn.sf.net/viewvc/apertium/incubator/apertium-eng-kaz/

Contact

IRC

Open up XChat (normally "Programs -> Internet -> XChat IRC") and type:

/server irc.freenode.net
/join #apertium
/join #hfst

To install xchat:

sudo apt-get install xchat 

On Windows, download it from:

http://www.silverex.org/download/

Chat logs/archives: http://alpha.visl.sdu.dk/~tino/pisg/freenode/logs/

Mailing list

Email: apertium-turkic@lists.sourceforge.net

http://blog.gmane.org/gmane.science.linguistics.turkic.mt

Some information before generating data

Some of this information is outdated and needs work.

Postpositions

Kazakh apparently has five kinds of postpositions, distinguished by the case of the noun phrase (NP) they follow. Some of those that follow the genitive may be interpreted as nouns bearing a case ending, such as

бақшаның астында

garden-of bottom-in

garden.gen bottom.loc

"under the garden"

where астын is roughly the noun "bottom", much as "azpi" is a noun in the Basque "ortu-a-ren azpi-an".

With nominative (or base form)

Check this list:

  • арқылы through
  • туралы about
  • секілді similarly to
  • жөнінде about

With genitive

  • астынан from below
  • астында below (bottom-its-in)
  • жанынан from beside (side-its-from)
  • жанында beside (side-its-in)

With dative

  • қарай towards
  • арналған intended for

With ablative

  • кейін behind, after

With instrumental

  • қатар beside
  • бірге together with