Omorfi
Omorfi (Open Morphology of Finnish) is a computational morphology of Finnish written using HFST.
Requirements
You will need HFST installed; follow the instructions on the HFST page.
Download
The following commands download Omorfi and prepare the build.
$ svn co http://svn.gna.org/svn/omorfi/trunk omorfi
$ cd omorfi/
$ ./autogen.sh
$ ./configure --prefix=${HOME}/local
If autogen.sh does not work, please report a bug; in the meantime, autoreconf -i should work just as well.
Compilation
You need at least 1.5 GB of RAM to compile Omorfi, or be willing to let your machine sit thrashing for some hours.
$ make
This will compile everything. If your machine has less than 2 GB of RAM, you might want to compile only the analyser:
$ cd src
$ make mor-omorfi.hfst
This can take 10 to 30 minutes.
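The choice between a full build and an analyser-only build can be sketched as a small shell helper. This is only an illustration: the function name `choose_make_target` is hypothetical, and the 2 GB threshold is read here as 2 × 1024 × 1024 kB, which is an assumption about what the text above means.

```shell
# Hypothetical helper, not part of Omorfi.
# choose_make_target MEM_KB
#   Prints the build command suggested above for a machine
#   with MEM_KB kilobytes of RAM.
choose_make_target() {
    # Assumption: "2 GB" is taken as 2 * 1024 * 1024 kB.
    threshold_kb=$((2 * 1024 * 1024))
    if [ "$1" -lt "$threshold_kb" ]; then
        # Low memory: build only the analyser.
        echo "cd src && make mor-omorfi.hfst"
    else
        # Enough memory: build everything.
        echo "make"
    fi
}
```

On Linux, the argument could be obtained with `awk '/^MemTotal:/ {print $2}' /proc/meminfo`; other systems report total memory differently.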
Key
| Feature | Notes |
|---|---|
| KTN | Inflection class as given in the official dictionary (roughly the first 49 are for nouns, 52 through 72 for verbs) |
| KAV | Code (K) for consonant gradation (astevaihtelu, AV). This is a letter from A to K or so. |
| PCP | Participle type; PCP=VA marks the VA participle, i.e. the present participle |
See Omorfi's documentation on inflection for an up-to-date table.
Usage
After compiling, you can test it with the hfst-lookup program.
$ echo "kaikki ihmiset syntyvät vapaina ja tasavertaisina arvoltaan ja oikeuksiltaan ." |\
  sed 's/ /\n/g' | hfst-lookup src/mor-omorfi.hfst
kaikki [##]kaikki[POS=PRONOUN][NUM=SG][CASE=NOM][##]
ihmiset [##]ihminen[POS=NOUN][KTN=38][NUM=PL][CASE=NOM,ACC][##]
syntyvät [##]syntyä[POS=VERB][KTN=52][KAV=J][GEN=ACT][MOOD=INDV][TENSE=PRES][PRS=PL3][##]
syntyvät [##]syntyä[POS=VERB][KTN=52][KAV=J][GEN=ACT][PCP=VA][CMP=POS][NUM=PL][CASE=NOM,ACC][##]
vapaina [##]vapaa[POS=ADJECTIVE][KTN=17][CMP=POS][NUM=PL][CASE=ESS][##]
ja [##]ja[POS=PARTICLE][##]
ja [##]ja[POS=CONJUNCTION][##]
tasavertaisina [##]tasavertainen[POS=ADJECTIVE][KTN=38][CMP=POS][NUM=PL][CASE=ESS][##]
tasavertaisina [##]tasa[POS=NOUN][KTN=9][NUM=SG][CASE=NOM][#][?]vertainen[POS=ADJECTIVE][KTN=38][CMP=POS][NUM=PL][CASE=ESS][##]
arvoltaan [##]arvo[POS=NOUN][KTN=1][NUM=SG][CASE=ABL][POSS=SG3,PL3][##]
ja [##]ja[POS=PARTICLE][##]
ja [##]ja[POS=CONJUNCTION][##]
oikeuksiltaan [##]oikeus[POS=NOUN][KTN=40][NUM=PL][CASE=ABL][POSS=SG3,PL3][##]
. [##].[POS=PUNCTUATION][##]
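To make the bracketed analysis format easier to read, the following sketch pulls the lemma and the FEATURE=VALUE tags out of a single analysis string such as the ones in the output above. The function names `omorfi_lemma` and `omorfi_tags` are hypothetical helpers, not part of Omorfi or HFST, and the patterns assume the analysis looks exactly like the examples shown here.

```shell
# Hypothetical helpers for the analysis format shown above.

# omorfi_lemma ANALYSIS
#   Prints the text between the leading [##] and the first tag.
omorfi_lemma() {
    printf '%s\n' "$1" | sed 's/^\[##]\([^[]*\)\[.*$/\1/'
}

# omorfi_tags ANALYSIS
#   Prints each FEATURE=VALUE tag on its own line.
#   For compounds, the tags of all parts are printed in order.
omorfi_tags() {
    # Put every bracketed item on its own line, then keep only
    # items of the form FEATURE=VALUE] and drop the closing bracket.
    printf '%s\n' "$1" | tr '[' '\n' |
        sed -n 's/^\([A-Z][A-Z]*=[^]]*\)].*$/\1/p'
}
```

For example, `omorfi_tags '[##]ihminen[POS=NOUN][KTN=38][NUM=PL][CASE=NOM,ACC][##]'` prints `POS=NOUN`, `KTN=38`, `NUM=PL` and `CASE=NOM,ACC` on separate lines, and `omorfi_lemma` on the same string prints `ihminen`.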
To get output closer to the Apertium format, use mor-omorfi.apertium.hfst instead:
$ echo "kaikki ihmiset syntyvät vapaina ja tasavertaisina arvoltaan ja oikeuksiltaan." |\
  sed 's/$/¶/g' | sed 's/\W/\n&\n/g' | grep -v '^ $' | hfst-lookup src/mor-omorfi.apertium.hfst
^kaikki/kaikki<Pron><Sg><Nom>$
^ihmiset/ihminen<N><38><Pl><NOM,ACC>$
^syntyvät/syntyä<V><52><J><Act><Indv><Pres><PL3>/syntyä<V><52><J><Act><VA><Pos><Pl><NOM,ACC>$
^vapaina/vapaa<A><17><Pos><Pl><Ess>$
^ja/ja<Part>/ja<Conj>$
^tasavertaisina/tasavertainen<A><38><Pos><Pl><Ess>/tasa<N><9><Sg><Nom>+vertainen<A><38><Pos><Pl><Ess>$
^arvoltaan/arvo<N><1><Sg><Abl><SG3,PL3>$
^ja/ja<Part>/ja<Conj>$
^oikeuksiltaan/oikeus<N><40><Pl><Abl><SG3,PL3>$
^./.<Punct>$
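In the Apertium-style output, each unit has the shape `^surface/reading1/reading2$`. As a rough sketch, a unit can be split into its surface form and its readings as follows. The function name `split_lu` is hypothetical, and the sketch assumes the unit contains no escaped slashes.

```shell
# Hypothetical helper for the ^surface/reading/...$ units shown above.
# split_lu UNIT
#   Prints the surface form on the first line, then one reading per line.
#   Assumption: no escaped "/" occurs inside the unit.
split_lu() {
    # Strip the leading ^ and the trailing $, then break at each slash.
    printf '%s\n' "$1" | sed -e 's/^\^//' -e 's/\$$//' | tr '/' '\n'
}
```

For example, `split_lu '^ja/ja<Part>/ja<Conj>$'` prints `ja`, `ja<Part>` and `ja<Conj>` on three lines, which makes it easy to see that the form is ambiguous between two readings.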
External links
- OMorFi: Installation
- Gna!: Omorfi
- Overview of the HFST project (pdf), esp. in relation to other FST technology

