Difference between revisions of "Analysing Finnish text"
TommiPirinen (talk | contribs) (outdated) |
(Redirected page to Finnish) |
||
Line 1: | Line 1: | ||
+ | #redirect[[Finnish]] |
||
− | [[Analyser un texte finnois|En français]] |
||
− | |||
− | The information here is somewhat outdated; nowadays you can probably just follow the [[Installation]] guides to install [[Hfst]] and giella-fin. See also pages such as [[Finnish]] for more information. |
||
− | |||
− | {{TOCD}} |
||
− | |||
− | ==Installation== |
||
− | |||
− | First make a directory called something like "source" in your home directory. The commands below assume you start in that directory. |
||
− | |||
− | ===Install SFST=== |
||
− | |||
− | Note: you may have to uncomment the FPIC line in the Makefile to avoid a relocation error during ''sudo make libinstall''. |
||
− | |||
− | <pre> |
||
− | $ sudo apt-get install libreadline5-dev |
||
− | $ wget ftp://ftp.ims.uni-stuttgart.de/pub/corpora/SFST/SFST-1.4.6a.tar.gz |
||
− | $ tar -xzvf SFST-1.4.6a.tar.gz |
||
− | $ cd SFST/src |
||
− | $ make |
||
− | $ sudo make install |
||
− | $ sudo make libinstall |
||
− | $ cd .. |
||
− | </pre> |
||
− | |||
− | ===Install OpenFST=== |
||
− | |||
− | <pre> |
||
− | $ wget http://openfst.cs.nyu.edu/twiki/pub/FST/FstDownload/openfst-1.2.6.tar.gz |
||
− | $ tar -xzvf openfst-1.2.6.tar.gz |
||
− | $ cd openfst-1.2.6/ |
||
− | $ ./configure |
||
− | $ make |
||
− | $ sudo make install |
||
− | $ cd .. |
||
− | </pre> |
||
− | |||
− | ===Install HFST=== |
||
− | <!-- |
||
− | <pre> |
||
− | $ svn co https://hfst.svn.sourceforge.net/svnroot/hfst/trunk hfst |
||
− | $ cd hfst/hfst3 |
||
− | $ sh autogen.sh |
||
− | $ ./configure --without-foma |
||
− | $ make |
||
− | $ sudo make install |
||
− | $ cd .. |
||
− | </pre> |
||
− | --> |
||
− | |||
− | <pre> |
||
− | $ wget http://downloads.sourceforge.net/project/hfst/hfst/hfst-2.4.1.tar.gz |
||
− | $ tar -xzvf hfst-2.4.1.tar.gz |
||
− | $ cd hfst-2.4.1/ |
||
− | $ ./configure |
||
− | $ make |
||
− | $ sudo make install |
||
− | $ cd .. |
||
− | </pre> |
||
− | |||
− | ===Install Omorfi=== |
||
− | |||
− | <pre> |
||
− | $ svn co http://svn.gna.org/svn/omorfi/trunk omorfi |
||
− | $ cd omorfi |
||
− | $ sh autogen.sh |
||
− | $ ./configure |
||
− | $ make |
||
− | $ sudo make install |
||
− | $ cd .. |
||
− | </pre> |
||
− | |||
− | ===Install VISLCG=== |
||
− | |||
− | Some hints can be found here: |
||
− | * [http://giellatekno.uit.no/doc/tools/docu-vislcg3.html Giellatekno's vislcg3 installation page] |
||
− | * [http://giellatekno.uit.no/doc/tools/cg3-usage.html Giellatekno's vislcg3 usage page] |
||
− | |||
− | The main vislcg3 page is at [http://beta.visl.sdu.dk/cg3.html http://beta.visl.sdu.dk/cg3.html]. |
||
− | |||
− | ==Usage== |
||
− | |||
− | ===Morphological analysis=== |
||
− | |||
− | ====Testing==== |
||
− | |||
− | <pre> |
||
− | $ cd omorfi/src |
||
− | $ echo "auton" | hfst-optimized-lookup mor-omorfi.cg.hfst.ol |
||
− | auton auto+N+Sg+Gen |
||
− | $ echo "autojen" | hfst-optimized-lookup mor-omorfi.cg.hfst.ol |
||
− | autojen auto+N+Pl+Gen |
||
− | |||
− | </pre> |
||
− | |||
− | ===Morphological disambiguation=== |
||
− | |||
− | The analysis chain is as follows: |
||
− | |||
− | # take text |
||
− | # preprocess: change it to one token per line (with proper punctuation handling) |
||
− | # morphological analysis |
||
− | # change it from omorfi output to vislcg3 input |
||
− | # run it through vislcg3 |
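Step 4 of the chain above can be sketched in a few lines of awk. This is only an illustration, not the converter used by Apertium or Giellatekno: the function name <code>lookup_to_cg</code> is hypothetical, and the sketch assumes tab-separated lookup output of the form shown in the Testing section (surface form, tab, lemma and tags joined by "+"), with a blank line between tokens and no "+" inside lemmas.

```shell
# Hypothetical helper (not part of omorfi, HFST or vislcg3):
# convert "surface<TAB>lemma+Tag+Tag" lines into CG cohorts.
lookup_to_cg() {
  awk -F'\t' '
    # Assumption: a blank line separates the analyses of two tokens.
    NF < 2 { open = 0; next }
    # First analysis of a token: print the wordform line once.
    !open { printf "\"<%s>\"\n", $1; open = 1 }
    {
      # parts[1] is the lemma, the remaining parts are tags.
      n = split($2, parts, "+")
      line = "\t\"" parts[1] "\""
      for (i = 2; i <= n; i++) line = line " " parts[i]
      print line
    }
  '
}

# Example with the two analyses from the Testing section.
printf 'auton\tauto+N+Sg+Gen\n\nautojen\tauto+N+Pl+Gen\n' | lookup_to_cg
```

Each token becomes a <code>"&lt;wordform&gt;"</code> line followed by one tab-indented reading per analysis, which is the layout visible in the vislcg3 output on this page.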
||
− | |||
− | In the following Apertium-style example, steps 2, 3 and 4 above are combined into one operation, ''hfst-proc''. In [http://giellatekno.uit.no/doc/tools/docu-sme-manual.html the Giellatekno environment] they are run separately. The command below is run with the ''--trace'' option, which reports the rule type (MAP, REMOVE, etc.) and the line number in the CG file (here ''apertium-sme-fin.fin-sme.rlx''). For example, ''MAP:2859'' indicates that the @+FMAINV tag was added by a mapping rule on line 2859. Running the same command without the ''--trace'' option gives cleaner output, but with less information. |
||
− | |||
− | <pre> |
||
− | $ echo "Alussa loi Jumala taivaan ja maan." | hfst-proc -C fin-sme.automorf.hfst | vislcg3 --trace -g apertium-sme-fin.fin-sme.rlx |
||
− | VISL CG-3 Disambiguator version 0.9.7.6378 |
||
− | Codepage: default UTF-8, input UTF-8, output UTF-8, grammar UTF-8 |
||
− | Parsing grammar took 0.06 seconds. |
||
− | Grammar has 6 sections, 0 templates, 1379 rules, 1312 sets, 748 c-tags, 1299 s-tags. |
||
− | "<Alussa>" |
||
− | "alku" N Sg Ine |
||
− | "<loi>" |
||
− | "luoda" V Act Ind Prt Sg3 @+FMAINV MAP:2859 |
||
− | "<Jumala>" |
||
− | "Jumala" N Prop Sg Nom @→N MAP:2694 |
||
− | "jumala" N Sg Nom @→N MAP:2694 |
||
− | "<taivaan>" |
||
− | "taivas" N Sg Gen @→N MAP:2680 |
||
− | "<ja>" |
||
− | "ja" CC @CNP MAP:2651 |
||
− | ; "ja" Pcle REMOVE:820 |
||
− | "<maan>" |
||
− | "maa" N Sg Gen @←OBJ MAP:2785 |
||
− | "<.>" |
||
− | "." Punct CLB ADD:793 |
||
− | |||
− | |||
− | </pre> |
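If traced output has already been saved, it can be reduced to roughly the untraced form afterwards. The filter below is a minimal sketch under assumptions taken from the example above only: trace tags look like <code>MAP:2859</code> at the end of a reading, and readings discarded by a rule start with <code>;</code>. The name <code>strip_trace</code> is hypothetical; rerunning vislcg3 without ''--trace'' remains the more reliable route.

```shell
# Hypothetical filter: drop removed readings and trailing trace tags
# from vislcg3 --trace output of the shape shown above.
strip_trace() {
  # 1st expression: delete readings that a rule removed (lines starting with ";").
  # 2nd expression: strip one or more trailing TYPE:LINE tags such as " MAP:2859".
  sed -E -e '/^;/d' -e 's/( [A-Z]+:[0-9]+)+$//'
}

# Example using the cohort for "ja" from the traced output.
printf '"<ja>"\n\t"ja" CC @CNP MAP:2651\n;\t"ja" Pcle REMOVE:820\n' | strip_trace
```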
||
− | |||
− | |||
− | ===Morphological disambiguation within the Giellatekno framework=== |
||
− | |||
− | Note that the Finnish analysers are being improved (at an admittedly slow pace) in Giellatekno's $GTHOME/langs/fin branch: |
||
− | |||
− | echo "Alussa loi Jumala taivaan ja maan." |preprocess|ufin|lookup2cg|vislcg3 -g main/langs/fin/src/syntax/disambiguation.cg3 |
||
− | |||
− | <pre> |
||
− | "<Alussa>" |
||
− | "alku" N Sg Ine |
||
− | "<loi>" |
||
− | "luoda" V Act Ind Pst Sg3 |
||
− | "<Jumala>" |
||
− | "Jumala" N Prop Sg Nom @→N |
||
− | "<taivaan>" |
||
− | "taivas" N Sg Gen @→N |
||
− | "<ja>" |
||
− | "ja" CC @CNP |
||
− | "<maan>" |
||
− | "maa" N Sg Gen @←OBJ |
||
− | "<.>" |
||
− | "." Punct CLB |
||
− | </pre> |
||
− | |||
− | |||
− | [[Category:Documentation]] |
||
− | [[Category:Documentation in English]] |
||
− | [[Category:Finnish]] |
Latest revision as of 19:33, 18 April 2017