Omorfi
OMorFi (Open Morphology of Finnish) is a computational morphology of Finnish written using HFST.
Requirements
You will need HFST installed; follow the instructions on the HFST page.
Download
The following commands will download OMorFi and prepare the build. Replace the --prefix path with your own installation directory.
$ svn co http://svn.gna.org/svn/omorfi/trunk omorfi
$ cd omorfi/
$ autoreconf -i
$ ./configure --prefix=/home/fran/local
$ cd src/
Compilation
You need at least 1.5 GB of RAM to compile Omorfi, or be willing to let your machine sit around thrashing for some hours.
$ make
This will compile everything. If your machine has less than 2 GB of RAM, you might want to compile only the analyser:
$ make mor-omorfi.hwfst
This could take 10 to 30 minutes.
Key
| Feature | Notes |
|---|---|
| KTN | Inflection class (the first 40 or so are for nouns, the next 30 or so for verbs) |
| KAV | Code (K) for consonant gradation (Finnish astevaihtelu, AV). This is a letter from A to K or so. |
| PCP | Participle type. The value VA marks the VA participle, i.e., the present participle |
Usage
After compiling, you can test the analyser with the hfst-lookup program:
$ echo "kaikki ihmiset syntyvät vapaina ja tasavertaisina arvoltaan ja oikeuksiltaan ." |\
  sed 's/ /\n/g' | hfst-lookup src/mor-omorfi.hwfst
kaikki [##]kaikki[POS=PRONOUN][NUM=SG][CASE=NOM][##]
ihmiset [##]ihminen[POS=NOUN][KTN=38][NUM=PL][CASE=NOM,ACC][##]
syntyvät [##]syntyä[POS=VERB][KTN=52][KAV=J][GEN=ACT][MOOD=INDV][TENSE=PRES][PRS=PL3][##]
syntyvät [##]syntyä[POS=VERB][KTN=52][KAV=J][GEN=ACT][PCP=VA][CMP=POS][NUM=PL][CASE=NOM,ACC][##]
vapaina [##]vapaa[POS=ADJECTIVE][KTN=17][CMP=POS][NUM=PL][CASE=ESS][##]
ja [##]ja[POS=PARTICLE][##]
ja [##]ja[POS=CONJUNCTION][##]
tasavertaisina [##]tasavertainen[POS=ADJECTIVE][KTN=38][CMP=POS][NUM=PL][CASE=ESS][##]
tasavertaisina [##]tasa[POS=NOUN][KTN=9][NUM=SG][CASE=NOM][#][?]vertainen[POS=ADJECTIVE][KTN=38][CMP=POS][NUM=PL][CASE=ESS][##]
arvoltaan [##]arvo[POS=NOUN][KTN=1][NUM=SG][CASE=ABL][POSS=SG3,PL3][##]
ja [##]ja[POS=PARTICLE][##]
ja [##]ja[POS=CONJUNCTION][##]
oikeuksiltaan [##]oikeus[POS=NOUN][KTN=40][NUM=PL][CASE=ABL][POSS=SG3,PL3][##]
. [##].[POS=PUNCTUATION][##]
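Each analysis in the output above is a lemma followed by bracketed FEATURE=VALUE pairs, wrapped in [##] word-edge markers, with [#] separating the parts of a compound. The following is a minimal Python sketch of how such a string could be taken apart. The function name parse_analysis is hypothetical, and treating [?] as an ignorable marker is an assumption based only on the sample output, not on Omorfi's documentation.

```python
import re

# Matches one bracketed feature such as [POS=NOUN] or [CASE=NOM,ACC].
FEATURE_RE = re.compile(r"\[([A-Z]+)=([^\]]*)\]")


def parse_analysis(analysis):
    """Split one analysis string into a list of (lemma, features) parts.

    A simple word yields one part; a compound yields one part per member.
    """
    # Drop the [##] word-edge markers.
    body = analysis.replace("[##]", "")
    parts = []
    # [#] is taken to be the compound boundary, as in the tasa+vertainen reading.
    for chunk in body.split("[#]"):
        # Assumption: [?] carries no information needed here, so it is dropped.
        chunk = chunk.replace("[?]", "")
        first = chunk.find("[")
        lemma = chunk if first == -1 else chunk[:first]
        features = dict(FEATURE_RE.findall(chunk))
        parts.append((lemma, features))
    return parts


if __name__ == "__main__":
    for part in parse_analysis(
        "[##]ihminen[POS=NOUN][KTN=38][NUM=PL][CASE=NOM,ACC][##]"
    ):
        print(part)
```

Multi-valued features such as CASE=NOM,ACC are kept as a single string here; splitting them on the comma would be a straightforward extension if separate readings are wanted.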
To convert the output into something approaching the Apertium stream format:
$ echo "kaikki ihmiset syntyvät vapaina ja tasavertaisina arvoltaan ja oikeuksiltaan." |\
  sed 's/$/¶/g' | sed 's/\W/\n&\n/g' | grep -v '^ $' | hfst-lookup src/mor-omorfi.hwfst |\
  python omorfi-to-apertium.py
^kaikki/kaikki<PRONOUN><SG><NOM>$
^ihmiset/ihminen<NOUN><38><PL><NOM,ACC>$
^syntyvät/syntyä<VERB><52><J><ACT><INDV><PRES><PL3>/syntyä<VERB><52><J><ACT><VA><POS><PL><NOM,ACC>$
^vapaina/vapaa<ADJECTIVE><17><POS><PL><ESS>$
^ja/ja<PARTICLE>/ja<CONJUNCTION>$
^tasavertaisina/tasavertainen<ADJECTIVE><38><POS><PL><ESS>/tasa<NOUN><9><SG><NOM>+vertainen<ADJECTIVE><38><POS><PL><ESS>$
^arvoltaan/arvo<NOUN><1><SG><ABL><SG3,PL3>$
^ja/ja<PARTICLE>/ja<CONJUNCTION>$
^oikeuksiltaan/oikeus<NOUN><40><PL><ABL><SG3,PL3>$
^./.<PUNCTUATION>$
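Comparing the two outputs suggests the shape of the conversion: the feature names are dropped, each value becomes an angle-bracket tag, compound parts are joined with +, and alternative readings of one surface form are joined with / inside ^...$. The sketch below reproduces that mapping in Python. It is not the omorfi-to-apertium.py script itself; the function names are hypothetical and the rules are inferred only from the sample output.

```python
import re
from itertools import groupby

# Captures only the value of a bracketed feature, e.g. NOUN from [POS=NOUN].
VALUE_RE = re.compile(r"\[[A-Z]+=([^\]]*)\]")


def analysis_to_apertium(analysis):
    """Turn one Omorfi analysis string into lemma<TAG>... form."""
    # Assumption: [##] and [?] carry nothing that the target format needs.
    body = analysis.replace("[##]", "").replace("[?]", "")
    parts = []
    for chunk in body.split("[#]"):  # [#] separates compound members
        first = chunk.find("[")
        lemma = chunk if first == -1 else chunk[:first]
        tags = "".join("<%s>" % value for value in VALUE_RE.findall(chunk))
        parts.append(lemma + tags)
    return "+".join(parts)


def to_lexical_units(pairs):
    """Group (surface, analysis) pairs into ^surface/reading/reading$ units.

    Consecutive pairs with the same surface form are treated as readings of
    one token, so two adjacent identical tokens would be merged by this sketch.
    """
    units = []
    for surface, group in groupby(pairs, key=lambda pair: pair[0]):
        readings = "/".join(analysis_to_apertium(a) for _, a in group)
        units.append("^%s/%s$" % (surface, readings))
    return units


if __name__ == "__main__":
    sample = [
        ("ja", "[##]ja[POS=PARTICLE][##]"),
        ("ja", "[##]ja[POS=CONJUNCTION][##]"),
    ]
    print("\n".join(to_lexical_units(sample)))
```

A more careful version would use the token boundaries in the hfst-lookup output rather than grouping on the surface form alone.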
The omorfi-to-apertium.py script can be found here. It can also be run with the -c option to apply its in-file tag conversion table:
$ echo "kaikki ihmiset syntyvät vapaina ja tasavertaisina arvoltaan ja oikeuksiltaan." |\
  sed 's/$/¶/g' | sed 's/\W/\n&\n/g' | grep -v '^ $' | hfst-lookup src/mor-omorfi.hwfst |\
  python omorfi-to-apertium.py -c
^kaikki/kaikki<Pron><Sg><Nom>$
^ihmiset/ihminen<N><38><Pl><NOM,ACC>$
^syntyvät/syntyä<V><52><J><Act><Indv><Pres><PL3>/syntyä<V><52><J><Act><VA><Pos><Pl><NOM,ACC>$
^vapaina/vapaa<A><17><Pos><Pl><Ess>$
^ja/ja<Part>/ja<Conj>$
^tasavertaisina/tasavertainen<A><38><Pos><Pl><Ess>/tasa<N><9><Sg><Nom>+vertainen<A><38><Pos><Pl><Ess>$
^arvoltaan/arvo<N><1><Sg><Abl><SG3,PL3>$
^ja/ja<Part>/ja<Conj>$
^oikeuksiltaan/oikeus<N><40><Pl><Abl><SG3,PL3>$
^./.<Punct>$
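The -c output above looks like a simple table lookup: known tags are renamed, while tags with no entry (the inflection class numbers, J, VA, PL3 and the combined values NOM,ACC and SG3,PL3) pass through unchanged. A minimal Python sketch of that behaviour follows. The table holds only the pairs that can be read off the sample output, so it is an illustration rather than the script's actual conversion table, and convert_tags is a hypothetical name.

```python
import re

# Tag pairs inferred from the sample output only; the real table in
# omorfi-to-apertium.py may contain more entries or different ones.
TAG_TABLE = {
    "PRONOUN": "Pron",
    "NOUN": "N",
    "VERB": "V",
    "ADJECTIVE": "A",
    "PARTICLE": "Part",
    "CONJUNCTION": "Conj",
    "PUNCTUATION": "Punct",
    "SG": "Sg",
    "PL": "Pl",
    "NOM": "Nom",
    "ESS": "Ess",
    "ABL": "Abl",
    "ACT": "Act",
    "INDV": "Indv",
    "PRES": "Pres",
    "POS": "Pos",
}

# One angle-bracket tag, e.g. <NOUN> or <NOM,ACC>.
TAG_RE = re.compile(r"<([^<>]*)>")


def convert_tags(unit):
    """Rename every tag found in the table; leave unknown tags as they are."""
    return TAG_RE.sub(
        lambda m: "<%s>" % TAG_TABLE.get(m.group(1), m.group(1)), unit
    )


if __name__ == "__main__":
    print(convert_tags("^ihmiset/ihminen<NOUN><38><PL><NOM,ACC>$"))
```

Falling back to the original tag keeps the conversion lossless, which matches the sample, where values such as NOM,ACC survive untouched.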
External links
- OMorFi: Installation
- Gna!: Omorfi
- Overview of the HFST project (pdf), esp. in relation to other FST technology

