Difference between revisions of "Omorfi"

From Apertium
(Update omorfi stuff slightly)
(updated locations and usage)
Line 11: Line 11:
   
 
<pre>
 
<pre>
$ svn co http://svn.gna.org/svn/omorfi/trunk omorfi
+
$ git clone https://code.google.com/p/omorfi/
 
$ cd omorfi/
 
$ cd omorfi/
 
$ ./autogen.sh
 
$ ./autogen.sh
$ ./configure --prefix=${HOME}/local
+
$ ./configure --prefix=${HOME}/local --enable-multichar-format=apertium
 
</pre>
 
</pre>
   
Line 27: Line 27:
 
</pre>
 
</pre>
   
This will compile everything. If your machine has less than 2 GB of RAM, you might want to compile just the analyser:
+
This will compile everything.
   
  +
To prepare source code for a new Apertium language pair, use src/scripts/omor2apertium.sh
<pre>
 
$ cd src
 
$ make mor-omorfi.hfst
 
</pre>
 
 
This could take 10--30 minutes.
 
 
==Key==
 
 
{|class="wikitable"
 
! Feature !! Notes
 
|-
 
| <code>KTN</code> || Inflection class as given in the official dictionary (roughly the first 49 are for nouns, 52 through 72 for verbs)
 
|-
 
| <code>KAV</code> || Code (K) for consonant gradation (astevaihtelu (AV)). This is a letter from A to K or so.
 
|-
 
| <code>PCP</code> || Participle type; <code>VA</code> is the VA participle, i.e., the present participle
 
|}
 
 
cf. [http://home.gna.org/omorfi/omorfi/inflection.html Omorfi's documentation on inflection] for an up-to-date table
 
   
 
==Usage==
 
==Usage==
   
 
After compiling, you can test it with the <code>hfst-lookup</code> program.
 
After compiling, you can test it with the <code>hfst-lookup</code> program.
 
<pre>
 
 
$ echo "kaikki ihmiset syntyvät vapaina ja tasavertaisina arvoltaan ja oikeuksiltaan ." |\
 
sed 's/ /\n/g' | hfst-lookup src/mor-omorfi.hfst
 
 
kaikki [##]kaikki[POS=PRONOUN][NUM=SG][CASE=NOM][##]
 
 
ihmiset [##]ihminen[POS=NOUN][KTN=38][NUM=PL][CASE=NOM,ACC][##]
 
 
syntyvät [##]syntyä[POS=VERB][KTN=52][KAV=J][GEN=ACT][MOOD=INDV][TENSE=PRES][PRS=PL3][##]
 
syntyvät [##]syntyä[POS=VERB][KTN=52][KAV=J][GEN=ACT][PCP=VA][CMP=POS][NUM=PL][CASE=NOM,ACC][##]
 
 
vapaina [##]vapaa[POS=ADJECTIVE][KTN=17][CMP=POS][NUM=PL][CASE=ESS][##]
 
 
ja [##]ja[POS=PARTICLE][##]
 
ja [##]ja[POS=CONJUNCTION][##]
 
 
tasavertaisina [##]tasavertainen[POS=ADJECTIVE][KTN=38][CMP=POS][NUM=PL][CASE=ESS][##]
 
tasavertaisina [##]tasa[POS=NOUN][KTN=9][NUM=SG][CASE=NOM][#][?]vertainen[POS=ADJECTIVE][KTN=38][CMP=POS]
 
[NUM=PL][CASE=ESS][##]
 
 
arvoltaan [##]arvo[POS=NOUN][KTN=1][NUM=SG][CASE=ABL][POSS=SG3,PL3][##]
 
 
ja [##]ja[POS=PARTICLE][##]
 
ja [##]ja[POS=CONJUNCTION][##]
 
 
oikeuksiltaan [##]oikeus[POS=NOUN][KTN=40][NUM=PL][CASE=ABL][POSS=SG3,PL3][##]
 
 
. [##].[POS=PUNCTUATION][##]
 
 
</pre>
 
 
To get output approaching the Apertium format, use mor-omorfi.apertium.hfst instead:
 
 
<pre>
 
echo "kaikki ihmiset syntyvät vapaina ja tasavertaisina arvoltaan ja oikeuksiltaan." |\
 
sed 's/$/¶/g' | sed 's/\W/\n&\n/g' | grep -v '^ $' | hfst-lookup src/mor-omorfi.apertium.hfst
 
 
^kaikki/kaikki<Pron><Sg><Nom>$ ^ihmiset/ihminen<N><38><Pl><NOM,ACC>$
 
^syntyvät/syntyä<V><52><J><Act><Indv><Pres><PL3>/syntyä<V><52><J><Act><VA><Pos><Pl><NOM,ACC>$
 
^vapaina/vapaa<A><17><Pos><Pl><Ess>$ ^ja/ja<Part>/ja<Conj>$
 
^tasavertaisina/tasavertainen<A><38><Pos><Pl><Ess>/tasa<N><9><Sg><Nom>+vertainen<A><38><Pos><Pl><Ess>$
 
^arvoltaan/arvo<N><1><Sg><Abl><SG3,PL3>$ ^ja/ja<Part>/ja<Conj>$ ^oikeuksiltaan/oikeus<N><40><Pl><Abl><SG3,PL3>$
 
^./.<Punct>$
 
 
</pre>
 
   
 
==See also==
 
==See also==
Line 107: Line 41:
   
 
==External links==
 
==External links==
  +
* [https://kitwiki.csc.fi/twiki/bin/view/KitWiki/OmorfiHFSTVersion#Installation OMorFi: Installation]
 
* [https://gna.org/projects/omorfi/ Gna!: Omorfi]
+
* [http://code.google.com/p/omorfi Omorfi project site at google code]
 
* [http://langtech.jrc.it/FSMNLP2008/m/Koskenniemi_invited_talk.pdf Overview of the HFST project (pdf)], esp. in relation to other FST technology
 
* [http://langtech.jrc.it/FSMNLP2008/m/Koskenniemi_invited_talk.pdf Overview of the HFST project (pdf)], esp. in relation to other FST technology
   

Revision as of 02:01, 9 June 2013

Omorfi (Open Morphology of Finnish) is a computational morphology of Finnish written using HFST.

Requirements

You will need HFST installed; you can follow the instructions on the HFST page.

Download

The following commands download Omorfi and prepare the build.

$ git clone https://code.google.com/p/omorfi/
$ cd omorfi/
$ ./autogen.sh
$ ./configure --prefix=${HOME}/local --enable-multichar-format=apertium

If autogen.sh does not work, please report a bug; in the meantime, autoreconf -i should work just as well.

Compilation

You need at least 1.5 GB of RAM to compile Omorfi, or be willing to let your machine sit around thrashing for some hours.
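
Before starting the build, it may be worth checking how much memory the machine has. The following is only a minimal sketch: it assumes a Linux system where /proc/meminfo reports MemTotal in kB, and the helper name enough_ram and the REQUIRED_KB threshold are made up for illustration rather than taken from Omorfi.

```shell
#!/bin/sh
# Hypothetical helper (not part of Omorfi): succeeds if the given amount of
# RAM in kB reaches the roughly 1.5 GB mentioned above.
REQUIRED_KB=1572864   # 1.5 * 1024 * 1024 kB

enough_ram() {
    [ "$1" -ge "$REQUIRED_KB" ]
}

# Assumption: Linux, where /proc/meminfo has a "MemTotal: <n> kB" line.
if [ -r /proc/meminfo ]; then
    total_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
    if enough_ram "$total_kb"; then
        echo "RAM looks sufficient for compiling."
    else
        echo "Less than 1.5 GB of RAM; expect heavy swapping during make."
    fi
fi
```

On systems without /proc/meminfo the script prints nothing, and the check has to be done by other means.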

$ make

This will compile everything.

To prepare source code for a new Apertium language pair, use src/scripts/omor2apertium.sh.

Usage

After compiling, you can test it with the hfst-lookup program.
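
A quick test might look like the sketch below. hfst-lookup reads one token per line, so the sentence is split on spaces first. The analyser path src/mor-omorfi.hfst is an assumption taken from the earlier revision of this page and may differ in your checkout; the tokenize helper is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: split a sentence into one token per line,
# squeezing runs of spaces so that no empty tokens are produced.
tokenize() {
    printf '%s\n' "$1" | tr -s ' ' '\n'
}

# Assumption: location of the compiled analyser; adjust to your build.
ANALYSER=src/mor-omorfi.hfst

sentence="kaikki ihmiset syntyvät vapaina ."
if command -v hfst-lookup >/dev/null 2>&1 && [ -r "$ANALYSER" ]; then
    # Look up every token in the analyser.
    tokenize "$sentence" | hfst-lookup "$ANALYSER"
else
    # Without HFST or the analyser, just show the tokens that would be looked up.
    tokenize "$sentence"
fi
```

Punctuation has to be separated from the preceding word by a space for this simple splitting to treat it as its own token.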

See also

External links