Difference between revisions of "Omorfi"

(Update omorfi stuff slightly)
Line 1: Line 1:
 
{{TOCD}}
 
{{TOCD}}
'''OMorFi''' (Open Morphology of Finnish) is a computational morphology of Finnish written using [[HFST]].
+
'''Omorfi''' (Open Morphology of Finnish) is a computational morphology of Finnish written using [[HFST]].
   
 
==Requirements==
 
==Requirements==
Line 13: Line 13:
 
$ svn co http://svn.gna.org/svn/omorfi/trunk omorfi
 
$ svn co http://svn.gna.org/svn/omorfi/trunk omorfi
 
$ cd omorfi/
 
$ cd omorfi/
  +
$ ./autogen.sh
$ autoreconf -i
 
$ ./configure --prefix=/home/fran/local
+
$ ./configure --prefix=${HOME}/local
$ cd src/
 
 
</pre>
 
</pre>
  +
  +
In case autogen.sh does not work, do report a bug (autoreconf -i should work just as well in the meantime).
   
 
==Compilation==
 
==Compilation==
Line 29: Line 30:
   
 
<pre>
 
<pre>
 
$ cd src
$ make mor-omorfi.hwfst
+
$ make mor-omorfi.hfst
 
</pre>
 
</pre>
   
Line 39: Line 41:
 
! Feature !! Notes
 
! Feature !! Notes
 
|-
 
|-
| <code>KTN</code> || Inflection class (the first 40 or so are for nouns, the next 30 or so for verbs)
+
| <code>KTN</code> || Inflection class as it is in official dictionary (the first 49 or so are for nouns, the 52 through 72 for verbs)
 
|-
 
|-
 
| <code>KAV</code> || Code (K) for consonant gradation (astevaihtelu (AV)). This is a letter from A to K or so.
 
| <code>KAV</code> || Code (K) for consonant gradation (astevaihtelu (AV)). This is a letter from A to K or so.
Line 45: Line 47:
 
| <code>PCP</code> || VA participle is the VA participle, i.e., the present participle
 
| <code>PCP</code> || VA participle is the VA participle, i.e., the present participle
 
|}
 
|}
  +
  +
cf. [[http://home.gna.org/omorfi/omorfi/inflection.html Omorfi's documentation on inflection for up-to-date table]]
   
 
==Usage==
 
==Usage==
Line 53: Line 57:
   
 
$ echo "kaikki ihmiset syntyvät vapaina ja tasavertaisina arvoltaan ja oikeuksiltaan ." |\
 
$ echo "kaikki ihmiset syntyvät vapaina ja tasavertaisina arvoltaan ja oikeuksiltaan ." |\
sed 's/ /\n/g' | hfst-lookup src/mor-omorfi.hwfst
+
sed 's/ /\n/g' | hfst-lookup src/mor-omorfi.hfst
   
 
kaikki [##]kaikki[POS=PRONOUN][NUM=SG][CASE=NOM][##]
 
kaikki [##]kaikki[POS=PRONOUN][NUM=SG][CASE=NOM][##]
Line 82: Line 86:
 
</pre>
 
</pre>
   
To get the output to something approaching Apertium:
+
To get the output to something approaching Apertium use mor-omorfi.apertium.hfst instead
   
 
<pre>
 
<pre>
 
echo "kaikki ihmiset syntyvät vapaina ja tasavertaisina arvoltaan ja oikeuksiltaan." |\
 
echo "kaikki ihmiset syntyvät vapaina ja tasavertaisina arvoltaan ja oikeuksiltaan." |\
sed 's/$/¶/g' | sed 's/\W/\n&\n/g' | grep -v '^ $' | hfst-lookup src/mor-omorfi.hwfst |\
+
sed 's/$/¶/g' | sed 's/\W/\n&\n/g' | grep -v '^ $' | hfst-lookup src/mor-omorfi.apertium.hfst
python omorfi-to-apertium.py
 
 
^kaikki/kaikki<PRONOUN><SG><NOM>$ ^ihmiset/ihminen<NOUN><38><PL><NOM,ACC>$
 
^syntyvät/syntyä<VERB><52><J><ACT><INDV><PRES><PL3>/syntyä<VERB><52><J><ACT><VA><POS><PL><NOM,ACC>$
 
^vapaina/vapaa<ADJECTIVE><17><POS><PL><ESS>$ ^ja/ja<PARTICLE>/ja<CONJUNCTION>$
 
^tasavertaisina/tasavertainen<ADJECTIVE><38><POS><PL><ESS>/tasa<NOUN><9><SG><NOM>+vertainen<ADJECTIVE><38><POS><PL><ESS>$
 
^arvoltaan/arvo<NOUN><1><SG><ABL><SG3,PL3>$ ^ja/ja<PARTICLE>/ja<CONJUNCTION>$
 
^oikeuksiltaan/oikeus<NOUN><40><PL><ABL><SG3,PL3>$ ^./.<PUNCTUATION>$
 
 
</pre>
 
 
The <code>omorfi-to-apertium.py</code> script can be found [https://apertium.svn.sourceforge.net/svnroot/apertium/incubator/tools/omorfi-to-apertium.py here] and can also be run with the <code>-c</code> option to use the in-file tag conversion table.
 
 
<pre>
 
$ echo "kaikki ihmiset syntyvät vapaina ja tasavertaisina arvoltaan ja oikeuksiltaan." |\
 
sed 's/$/¶/g' | sed 's/\W/\n&\n/g' | grep -v '^ $' | hfst-lookup src/mor-omorfi.hwfst |\
 
python omorfi-to-apertium.py -c
 
   
 
^kaikki/kaikki<Pron><Sg><Nom>$ ^ihmiset/ihminen<N><38><Pl><NOM,ACC>$
 
^kaikki/kaikki<Pron><Sg><Nom>$ ^ihmiset/ihminen<N><38><Pl><NOM,ACC>$
Line 120: Line 107:
   
 
==External links==
 
==External links==
* [https://kitwiki.csc.fi/twiki/bin/view/KitWiki/OMorFiSFSTVersion#Installation OMorFi: Installation]
+
* [https://kitwiki.csc.fi/twiki/bin/view/KitWiki/OmorfiHFSTVersion#Installation OMorFi: Installation]
 
* [https://gna.org/projects/omorfi/ Gna!: Omorfi]
 
* [https://gna.org/projects/omorfi/ Gna!: Omorfi]
 
* [http://langtech.jrc.it/FSMNLP2008/m/Koskenniemi_invited_talk.pdf Overview of the HFST project (pdf)], esp. in relation to other FST technology
 
* [http://langtech.jrc.it/FSMNLP2008/m/Koskenniemi_invited_talk.pdf Overview of the HFST project (pdf)], esp. in relation to other FST technology

Revision as of 07:13, 20 April 2010

Omorfi (Open Morphology of Finnish) is a computational morphology of Finnish written using HFST.

Requirements

You will need HFST installed; follow the instructions on the HFST page.

Download

The following commands download Omorfi and prepare the build.

$ svn co http://svn.gna.org/svn/omorfi/trunk omorfi
$ cd omorfi/
$ ./autogen.sh
$ ./configure --prefix=${HOME}/local

If autogen.sh does not work, please report a bug; in the meantime, autoreconf -i should work just as well.
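The fallback described above can be sketched as a small shell function. This is only an illustration: the function name `bootstrap` is hypothetical and not part of Omorfi, and it assumes you are in the top-level directory of the checkout.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Omorfi): try autogen.sh first and
# fall back to autoreconf -i if the script is missing or fails.
bootstrap() {
  if [ -x ./autogen.sh ] && ./autogen.sh; then
    echo "bootstrapped with autogen.sh"
  else
    autoreconf -i && echo "bootstrapped with autoreconf"
  fi
}

# Usage, from inside the omorfi/ checkout:
#   bootstrap && ./configure --prefix="${HOME}/local"
```

If the fallback branch is taken, the failure of autogen.sh is still worth reporting as a bug.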

Compilation

You need at least 1.5 GB of RAM to compile Omorfi, or be willing to let your machine sit around thrashing for some hours.

$ make

This will compile everything. If your machine has less than 2 GB of RAM, you might want to compile only the analyser:

$ cd src
$ make mor-omorfi.hfst

This could take 10 to 30 minutes.
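As a rough sketch of the choice above, the following picks a build command from the amount of installed RAM. The helper name `choose_build` is hypothetical, the 2 GB threshold is taken from the text, and reading `/proc/meminfo` assumes a Linux system; run the suggested command from the top-level omorfi/ directory.

```shell
#!/bin/sh
# Hypothetical helper: given total RAM in kB, print the build command to use.
choose_build() {
  mem_kb=$1
  if [ "$mem_kb" -ge $((2 * 1024 * 1024)) ]; then
    echo "make"                          # enough memory: build everything
  else
    echo "make -C src mor-omorfi.hfst"   # build only the analyser
  fi
}

# Linux-only assumption: MemTotal in /proc/meminfo is reported in kB.
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo 2>/dev/null || true)
echo "Suggested build command: $(choose_build "${mem_kb:-0}")"
```

`make -C src mor-omorfi.hfst` is equivalent to changing into src and running `make mor-omorfi.hfst` there.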

Key

Feature Notes
KTN Inflection class as given in the official dictionary (roughly the first 49 are for nouns, 52 through 72 for verbs)
KAV Code (K) for consonant gradation (astevaihtelu (AV)). This is a letter from A to K or so.
PCP Participle type; the value VA marks the VA participle, i.e., the present participle

See Omorfi's documentation on inflection for an up-to-date table.

Usage

After compiling, you can test it with the hfst-lookup program.


$ echo "kaikki ihmiset syntyvät vapaina ja tasavertaisina arvoltaan ja oikeuksiltaan ." |\
   sed 's/ /\n/g' | hfst-lookup src/mor-omorfi.hfst

kaikki	[##]kaikki[POS=PRONOUN][NUM=SG][CASE=NOM][##]

ihmiset	[##]ihminen[POS=NOUN][KTN=38][NUM=PL][CASE=NOM,ACC][##]

syntyvät	[##]syntyä[POS=VERB][KTN=52][KAV=J][GEN=ACT][MOOD=INDV][TENSE=PRES][PRS=PL3][##]
syntyvät	[##]syntyä[POS=VERB][KTN=52][KAV=J][GEN=ACT][PCP=VA][CMP=POS][NUM=PL][CASE=NOM,ACC][##]

vapaina	[##]vapaa[POS=ADJECTIVE][KTN=17][CMP=POS][NUM=PL][CASE=ESS][##]

ja	[##]ja[POS=PARTICLE][##]
ja	[##]ja[POS=CONJUNCTION][##]

tasavertaisina	[##]tasavertainen[POS=ADJECTIVE][KTN=38][CMP=POS][NUM=PL][CASE=ESS][##]
tasavertaisina	[##]tasa[POS=NOUN][KTN=9][NUM=SG][CASE=NOM][#][?]vertainen[POS=ADJECTIVE][KTN=38][CMP=POS][NUM=PL][CASE=ESS][##]

arvoltaan	[##]arvo[POS=NOUN][KTN=1][NUM=SG][CASE=ABL][POSS=SG3,PL3][##]

ja	[##]ja[POS=PARTICLE][##]
ja	[##]ja[POS=CONJUNCTION][##]

oikeuksiltaan	[##]oikeus[POS=NOUN][KTN=40][NUM=PL][CASE=ABL][POSS=SG3,PL3][##]

.	[##].[POS=PUNCTUATION][##]
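The lookup tool is fed one token per line, which is what the `sed 's/ /\n/g'` step in the command above produces (the `\n` in the replacement relies on GNU sed). A portable sketch of the same tokenisation step, using `tr` and a hypothetical function name `tokenise`, shows what is passed to the analyser:

```shell
#!/bin/sh
# Hypothetical helper: split space-separated text into one token per line.
# tr is used here as a portable stand-in for the GNU sed call.
tokenise() {
  tr ' ' '\n'
}

echo "kaikki ihmiset syntyvät vapaina ja tasavertaisina arvoltaan ja oikeuksiltaan ." | tokenise
```

Note that the full stop only becomes a separate token because the input has a space before it.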

To get output closer to the Apertium format, use mor-omorfi.apertium.hfst instead:

$ echo "kaikki ihmiset syntyvät vapaina ja tasavertaisina arvoltaan ja oikeuksiltaan." |\
  sed 's/$/¶/g' | sed 's/\W/\n&\n/g' | grep -v '^ $' | hfst-lookup src/mor-omorfi.apertium.hfst

^kaikki/kaikki<Pron><Sg><Nom>$ ^ihmiset/ihminen<N><38><Pl><NOM,ACC>$ 
^syntyvät/syntyä<V><52><J><Act><Indv><Pres><PL3>/syntyä<V><52><J><Act><VA><Pos><Pl><NOM,ACC>$ 
^vapaina/vapaa<A><17><Pos><Pl><Ess>$ ^ja/ja<Part>/ja<Conj>$ 
^tasavertaisina/tasavertainen<A><38><Pos><Pl><Ess>/tasa<N><9><Sg><Nom>+vertainen<A><38><Pos><Pl><Ess>$ 
^arvoltaan/arvo<N><1><Sg><Abl><SG3,PL3>$ ^ja/ja<Part>/ja<Conj>$ ^oikeuksiltaan/oikeus<N><40><Pl><Abl><SG3,PL3>$ 
^./.<Punct>$ 
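To make the relationship between the two output formats on this page concrete, here is a minimal awk sketch that rewrites bracketed analysis lines into Apertium stream units. It is not how mor-omorfi.apertium.hfst is built, and it does not reproduce any existing conversion script; the function name `to_apertium` is hypothetical. It keeps the original upper-case tag values (for example `<NOUN>` rather than `<N>`), since mapping tag names would need a conversion table that is not shown here.

```shell
#!/bin/sh
# Hypothetical helper: convert tab-separated "surface<TAB>analysis" lines
# into Apertium stream units of the form ^surface/reading1/reading2$.
to_apertium() {
  awk -F '\t' '
    # Print the unit collected so far, if any.
    function emit() { if (form != "") printf "^%s%s$\n", form, readings }
    # A blank line ends the analyses of the current surface form.
    NF < 2 { emit(); form = ""; readings = ""; next }
    {
      a = $2
      gsub(/\[##\]/, "", a)          # drop word-boundary markers
      gsub(/\[#\]\[\?\]/, "+", a)    # compound boundary becomes +
      gsub(/\[[A-Z]+=/, "<", a)      # [POS=NOUN] becomes <NOUN]
      gsub(/\]/, ">", a)             # <NOUN] becomes <NOUN>
      if ($1 != form) { emit(); form = $1; readings = "" }
      readings = readings "/" a
    }
    END { emit() }
  '
}

printf 'ihmiset\t[##]ihminen[POS=NOUN][KTN=38][NUM=PL][CASE=NOM,ACC][##]\n' | to_apertium
```

For the sample line this prints `^ihmiset/ihminen<NOUN><38><PL><NOM,ACC>$`. Multiple analyses of the same surface form are joined with `/`, mirroring the ambiguous entries for ja and syntyvät above.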

See also

External links