Difference between revisions of "Analysing Finnish text"

From Apertium
(Redirected page to Finnish)
 
#redirect[[Finnish]]
{{TOCD}}
 
 
==Installation==
 
 
First, create a working directory (for example, "source") in your home directory. The commands below assume you start in that directory.
 
 
===Install SFST===
 
 
<pre>
 
$ sudo apt-get install libreadline5-dev
 
$ wget ftp://ftp.ims.uni-stuttgart.de/pub/corpora/SFST/SFST-1.4.2.tar.gz

$ tar -xzvf SFST-1.4.2.tar.gz

$ cd SFST/src

$ make

$ sudo make install

$ sudo make libinstall

$ cd ../..
 
</pre>
 
 
===Install OpenFST===
 
 
<pre>
 
$ wget http://openfst.cs.nyu.edu/twiki/pub/FST/FstDownload/openfst-1.2.6.tar.gz
 
$ tar -xzvf openfst-1.2.6.tar.gz
 
$ cd openfst-1.2.6/
 
$ ./configure
 
$ make
 
$ sudo make install
 
$ cd ..
 
</pre>
 
 
===Install HFST===
 
<!--
 
<pre>
 
$ svn co https://hfst.svn.sourceforge.net/svnroot/hfst/trunk hfst
 
$ cd hfst/hfst3
 
$ sh autogen.sh
 
$ ./configure --without-foma
 
$ make
 
$ sudo make install
 
$ cd ..
 
</pre>
 
-->
 
 
<pre>
 
$ wget http://downloads.sourceforge.net/project/hfst/hfst/hfst-2.4.1.tar.gz
 
$ tar -xzvf hfst-2.4.1.tar.gz
 
$ cd hfst-2.4.1/
 
$ ./configure
 
$ make
 
$ sudo make install
 
$ cd ..
 
</pre>
 
 
===Install Omorfi===
 
 
<pre>
 
$ svn co http://svn.gna.org/svn/omorfi/trunk omorfi
 
$ cd omorfi
 
$ sh autogen.sh
 
$ ./configure
 
$ make
 
$ sudo make install
 
$ cd ..
 
</pre>
 
 
===Install VISLCG===
 
 
Some hints can be found here:
 
* [http://giellatekno.uit.no/doc/tools/docu-vislcg3.html Giellatekno's vislcg3 installation page]
 
* [http://giellatekno.uit.no/doc/tools/cg3-usage.html Giellatekno's vislcg3 usage page]
 
 
The main vislcg3 page is at [http://beta.visl.sdu.dk/cg3.html http://beta.visl.sdu.dk/cg3.html].
 
 
==Usage==
 
 
===Morphological analysis===
 
 
====Testing====
 
 
<pre>
 
$ cd omorfi/src
 
$ echo "auton" | hfst-optimized-lookup mor-omorfi.cg.hfst.ol
 
auton auto+N+Sg+Gen
 
$ echo "autojen" | hfst-optimized-lookup mor-omorfi.cg.hfst.ol
 
autojen auto+N+Pl+Gen
 
 
</pre>
 
 
===Morphological disambiguation===
 
 
The analysis chain is as follows:
 
 
# take text
 
# preprocess: change it to one token per line (with proper punctuation handling)
 
# morphological analysis
 
# change it from omorfi output to vislcg3 input
 
# run it through vislcg3
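
As a rough illustration of steps 2 and 4 in the chain above, the following sketch shows one way they might be done by hand in shell. The function names ''tokenise'' and ''to_cg'' are hypothetical and not part of HFST or vislcg3. The sketch assumes that the lookup tool prints one tab-separated "surface form, analysis" pair per line (optionally followed by further columns, and with blank lines between tokens), and that analyses have the ''lemma+Tag+Tag'' shape shown in the testing section. Punctuation handling is deliberately naive and unknown words are not treated specially.

```shell
#!/bin/sh
# Step 2 (naive sketch): put every word and punctuation mark on its own line.
# Assumption: whitespace-separated text, ASCII punctuation only.
tokenise() {
    sed -e 's/\([.,;:!?]\)/ \1 /g' | tr -s ' \t' '\n\n' | sed '/^$/d'
}

# Step 4 (sketch): turn "surface<TAB>lemma+Tag+Tag" lines into CG cohorts:
#   "<surface>"
#   <TAB>"lemma" Tag Tag
to_cg() {
    awk -F '\t' '
        NF < 2 { open = 0; next }            # blank line: the current token ends
        !open || $1 != prev {                # new token: print the cohort header
            printf "\"<%s>\"\n", $1
            prev = $1
            open = 1
        }
        {
            n = split($2, parts, "[+]")      # parts[1] is the lemma, the rest are tags
            printf "\t\"%s\"", parts[1]
            for (i = 2; i <= n; i++) printf " %s", parts[i]
            printf "\n"
        }'
}

# Possible use (text.txt and grammar.rlx are placeholder file names):
# tokenise < text.txt | hfst-optimized-lookup mor-omorfi.cg.hfst.ol | to_cg | vislcg3 -g grammar.rlx
```

Doing the conversion by hand like this is only relevant when the steps are run separately. When ''hfst-proc'' is used, it already produces input suitable for vislcg3.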
 
 
In the following Apertium-style example, steps 2, 3 and 4 above are combined into a single operation, ''hfst-proc''. In [http://giellatekno.uit.no/doc/tools/docu-sme-manual.html the Giellatekno environment] they are run separately. The command below is run with the ''--trace'' option, which appends to each reading the type of the rule that applied (MAP, REMOVE, etc.) and that rule's line number in the CG grammar file (here ''apertium-sme-fin.fin-sme.rlx''). For example, ''MAP:2859'' indicates that the @+FMAINV tag was added by a mapping rule on line 2859.
 
 
<pre>
 
$ echo "Alussa loi Jumala taivaan ja maan." | hfst-proc -C fin-sme.automorf.hfst | vislcg3 --trace -g apertium-sme-fin.fin-sme.rlx
 
VISL CG-3 Disambiguator version 0.9.7.6378
 
Codepage: default UTF-8, input UTF-8, output UTF-8, grammar UTF-8
 
Parsing grammar took 0.06 seconds.
 
Grammar has 6 sections, 0 templates, 1379 rules, 1312 sets, 748 c-tags, 1299 s-tags.
 
"<Alussa>"

	"alku" N Sg Ine

"<loi>"

	"luoda" V Act Ind Prt Sg3 @+FMAINV MAP:2859

"<Jumala>"

	"Jumala" N Prop Sg Nom @→N MAP:2694

	"jumala" N Sg Nom @→N MAP:2694

"<taivaan>"

	"taivas" N Sg Gen @→N MAP:2680

"<ja>"

	"ja" CC @CNP MAP:2651

;	"ja" Pcle REMOVE:820

"<maan>"

	"maa" N Sg Gen @←OBJ MAP:2785

"<.>"

	"." Punct CLB ADD:793
 
 
 
</pre>
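
When a trace is long, it can help to summarise which rules fired before opening the grammar file. The sketch below is one possible way to do that. The function name ''trace_rules'' is hypothetical, the list of rule types is limited to those seen in the output above plus SELECT, and it relies on ''grep -o'', which is widely available but not required by POSIX. It assumes trace markers have the ''TYPE:LINE'' form shown above.

```shell
#!/bin/sh
# Summarise vislcg3 --trace output read from stdin.
# Prints one line per distinct rule: "<grammar line> <rule type> <times applied>",
# sorted by grammar line number.
trace_rules() {
    grep -oE '(MAP|REMOVE|SELECT|ADD):[0-9]+' |
        sort | uniq -c |
        awk '{ split($2, r, ":"); print r[2], r[1], $1 }' |
        sort -n
}

# Possible use, appended to the command shown above:
# ... | vislcg3 --trace -g apertium-sme-fin.fin-sme.rlx | trace_rules
```

For the trace above, this would report the mapping rule on line 2694 as having applied twice, since it tagged both readings of ''Jumala''.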
 
 
[[Category:Documentation]]
 

Latest revision as of 19:33, 18 April 2017
