Omorfi
Omorfi (Open Morphology of Finnish) is a computational morphology of Finnish written using HFST.
Requirements
You will need HFST installed; follow the instructions on the HFST page.
Download
The following commands will download Omorfi and prepare the build.
$ svn co http://svn.gna.org/svn/omorfi/trunk omorfi
$ cd omorfi/
$ ./autogen.sh
$ ./configure --prefix=${HOME}/local
If autogen.sh does not work, please report a bug; in the meantime, autoreconf -i should work just as well.
Compilation
You need at least 1.5 GB of RAM to compile Omorfi, or be willing to let your machine sit around thrashing for some hours.
$ make
This will compile everything. If your machine has less than 2 GB of RAM, you may prefer to compile only the analyser:
$ cd src
$ make mor-omorfi.hfst
This could take 10 to 30 minutes.
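The memory advice above could be turned into a small guard that runs before the build. The following is only a sketch: the function name `suggest_build` is made up for illustration, the 2 GB threshold is the one mentioned in the text, and reading `MemTotal` from `/proc/meminfo` assumes a Linux system.

```shell
#!/bin/sh
# Hypothetical helper (not part of Omorfi): given total RAM in kB,
# print the build command suggested by the text above.
suggest_build() {
    mem_kb=$1
    if [ "$mem_kb" -lt 2097152 ]; then   # 2 GB = 2 * 1024 * 1024 kB
        echo "cd src && make mor-omorfi.hfst"
    else
        echo "make"
    fi
}

# Assumption: Linux, where /proc/meminfo has a line "MemTotal: <n> kB".
if [ -r /proc/meminfo ]; then
    total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    if [ -n "$total_kb" ]; then
        echo "Suggested build command: $(suggest_build "$total_kb")"
    fi
fi
```

The script only prints a suggestion and does not start the build itself, so it is safe to run from any directory.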
Key
Feature | Notes |
---|---|
KTN | Inflection class as given in the official dictionary (the first 49 or so are for nouns, 52 through 72 for verbs) |
KAV | Code (K) for consonant gradation (astevaihtelu, AV). This is a letter from A to K or so. |
PCP | Participle type; the value VA marks the VA participle, i.e., the present participle |
Cf. Omorfi's documentation on inflection for an up-to-date table.
Usage
After compiling, you can test the analyser with the hfst-lookup program:
$ echo "kaikki ihmiset syntyvät vapaina ja tasavertaisina arvoltaan ja oikeuksiltaan ." |\
  sed 's/ /\n/g' | hfst-lookup src/mor-omorfi.hfst
kaikki [##]kaikki[POS=PRONOUN][NUM=SG][CASE=NOM][##]
ihmiset [##]ihminen[POS=NOUN][KTN=38][NUM=PL][CASE=NOM,ACC][##]
syntyvät [##]syntyä[POS=VERB][KTN=52][KAV=J][GEN=ACT][MOOD=INDV][TENSE=PRES][PRS=PL3][##]
syntyvät [##]syntyä[POS=VERB][KTN=52][KAV=J][GEN=ACT][PCP=VA][CMP=POS][NUM=PL][CASE=NOM,ACC][##]
vapaina [##]vapaa[POS=ADJECTIVE][KTN=17][CMP=POS][NUM=PL][CASE=ESS][##]
ja [##]ja[POS=PARTICLE][##]
ja [##]ja[POS=CONJUNCTION][##]
tasavertaisina [##]tasavertainen[POS=ADJECTIVE][KTN=38][CMP=POS][NUM=PL][CASE=ESS][##]
tasavertaisina [##]tasa[POS=NOUN][KTN=9][NUM=SG][CASE=NOM][#][?]vertainen[POS=ADJECTIVE][KTN=38][CMP=POS][NUM=PL][CASE=ESS][##]
arvoltaan [##]arvo[POS=NOUN][KTN=1][NUM=SG][CASE=ABL][POSS=SG3,PL3][##]
ja [##]ja[POS=PARTICLE][##]
ja [##]ja[POS=CONJUNCTION][##]
oikeuksiltaan [##]oikeus[POS=NOUN][KTN=40][NUM=PL][CASE=ABL][POSS=SG3,PL3][##]
. [##].[POS=PUNCTUATION][##]
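For quick inspection it can help to reduce each analysis to its lemma and part of speech. The sketch below assumes the analysis strings have exactly the shape shown above (a `[##]` marker, the lemma, then a `[POS=...]` tag); the function name `lemma_pos` is made up for illustration and is not part of Omorfi or HFST. For compounds it reports only the first part.

```shell
#!/bin/sh
# Hypothetical helper: read analysis lines on standard input and print
# "lemma POS" for each line that matches the assumed format.
# Anything before the first "[##]" (e.g. the surface form) is skipped.
lemma_pos() {
    sed -n 's/^[^[]*\[##\]\([^[]*\)\[POS=\([A-Z]*\)\].*/\1 \2/p'
}

# Sample input copied from the output above.
lemma_pos <<'EOF'
ihmiset [##]ihminen[POS=NOUN][KTN=38][NUM=PL][CASE=NOM,ACC][##]
vapaina [##]vapaa[POS=ADJECTIVE][KTN=17][CMP=POS][NUM=PL][CASE=ESS][##]
EOF
```

Lines that do not match the assumed format are silently dropped, because `sed -n` prints only successful substitutions.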
To get output approaching the Apertium format, use mor-omorfi.apertium.hfst instead:
$ echo "kaikki ihmiset syntyvät vapaina ja tasavertaisina arvoltaan ja oikeuksiltaan." |\
  sed 's/$/¶/g' | sed 's/\W/\n&\n/g' | grep -v '^ $' | hfst-lookup src/mor-omorfi.apertium.hfst
^kaikki/kaikki<Pron><Sg><Nom>$
^ihmiset/ihminen<N><38><Pl><NOM,ACC>$
^syntyvät/syntyä<V><52><J><Act><Indv><Pres><PL3>/syntyä<V><52><J><Act><VA><Pos><Pl><NOM,ACC>$
^vapaina/vapaa<A><17><Pos><Pl><Ess>$
^ja/ja<Part>/ja<Conj>$
^tasavertaisina/tasavertainen<A><38><Pos><Pl><Ess>/tasa<N><9><Sg><Nom>+vertainen<A><38><Pos><Pl><Ess>$
^arvoltaan/arvo<N><1><Sg><Abl><SG3,PL3>$
^ja/ja<Part>/ja<Conj>$
^oikeuksiltaan/oikeus<N><40><Pl><Abl><SG3,PL3>$
^./.<Punct>$
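Each unit in this output packs the surface form and all of its analyses between `^` and `$`, separated by `/`. As a rough illustration, the sketch below splits one unit per input line into its parts; the function name `split_unit` is made up here, and the sketch assumes that neither the surface form nor the analyses contain an escaped `/`.

```shell
#!/bin/sh
# Hypothetical helper: strip the leading "^" and trailing "$" from each
# unit, then print the surface form and every analysis on separate lines.
split_unit() {
    sed 's/^\^//; s/\$$//' | tr '/' '\n'
}

# Sample unit copied from the output above.
printf '%s\n' '^ja/ja<Part>/ja<Conj>$' | split_unit
```

The first line printed is the surface form and the remaining lines are the competing analyses, which makes ambiguous tokens such as ja easy to spot.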
External links
- OMorFi: Installation
- Gna!: Omorfi
- Overview of the HFST project (pdf), esp. in relation to other FST technology