Difference between revisions of "Omorfi"

From Apertium
 
(4 intermediate revisions by 2 users not shown)
Line 1: Line 1:
 
{{TOCD}}
 
{{TOCD}}
'''OMorFi''' (Open Morphology of Finnish) is a computational morphology of Finnish written using [[HFST]].
+
'''Omorfi''' (Open Morphology of Finnish) is a computational morphology of Finnish written using [[HFST]].
   
 
==Requirements==
 
==Requirements==
Line 11: Line 11:
   
 
<pre>
 
<pre>
$ svn co http://svn.gna.org/svn/omorfi/trunk omorfi
+
$ git clone https://github.com/flammie/omorfi
 
$ cd omorfi/
 
$ cd omorfi/
  +
$ ./autogen.sh
$ autoreconf -i
 
$ ./configure --prefix=/home/fran/local
+
$ ./configure
$ cd src/
 
 
</pre>
 
</pre>
  +
  +
In case autogen.sh does not work, do report a bug (autoreconf -i should work just as well in the meantime).
   
 
==Compilation==
 
==Compilation==
Line 26: Line 27:
 
</pre>
 
</pre>
   
This will compile everything. If your machine has less than 2Gb RAM you might want to just compile the analyser:
+
This will compile everything.
   
  +
To prepare source code for new apertium language pair, use src/scripts/omor2apertium.sh... or just copy one from an existing pair, such as apertium-fin-eng.
<pre>
 
$ make mor-omorfi.hwfst
 
</pre>
 
 
This could take 10--30 minutes.
 
 
==Key==
 
 
{|
 
! Feature !! Notes
 
|-
 
| <code>KTN</code> || Inflection class (the first 40 or so are for nouns, the next 30 or so for verbs)
 
|-
 
| <code>KAV</code> || Code (K) for consonant gradation (astevaihtelu (AV)). This is a letter from A to K or so.
 
|-
 
| <code>PCP</code> || VA participle is the VA participle, i.e., the present participle
 
|}
 
   
 
==Usage==
 
==Usage==
   
 
After compiling, you can test it with the <code>hfst-lookup</code> program.
 
After compiling, you can test it with the <code>hfst-lookup</code> program.
 
<pre>
 
 
$ echo "kaikki ihmiset syntyvät vapaina ja tasavertaisina arvoltaan ja oikeuksiltaan ." |\
 
sed 's/ /\n/g' | hfst-lookup src/mor-omorfi.hwfst
 
 
kaikki [##]kaikki[POS=PRONOUN][NUM=SG][CASE=NOM][##]
 
 
ihmiset [##]ihminen[POS=NOUN][KTN=38][NUM=PL][CASE=NOM,ACC][##]
 
 
syntyvät [##]syntyä[POS=VERB][KTN=52][KAV=J][GEN=ACT][MOOD=INDV][TENSE=PRES][PRS=PL3][##]
 
syntyvät [##]syntyä[POS=VERB][KTN=52][KAV=J][GEN=ACT][PCP=VA][CMP=POS][NUM=PL][CASE=NOM,ACC][##]
 
 
vapaina [##]vapaa[POS=ADJECTIVE][KTN=17][CMP=POS][NUM=PL][CASE=ESS][##]
 
 
ja [##]ja[POS=PARTICLE][##]
 
ja [##]ja[POS=CONJUNCTION][##]
 
 
tasavertaisina [##]tasavertainen[POS=ADJECTIVE][KTN=38][CMP=POS][NUM=PL][CASE=ESS][##]
 
tasavertaisina [##]tasa[POS=NOUN][KTN=9][NUM=SG][CASE=NOM][#][?]vertainen[POS=ADJECTIVE][KTN=38][CMP=POS]
 
[NUM=PL][CASE=ESS][##]
 
 
arvoltaan [##]arvo[POS=NOUN][KTN=1][NUM=SG][CASE=ABL][POSS=SG3,PL3][##]
 
 
ja [##]ja[POS=PARTICLE][##]
 
ja [##]ja[POS=CONJUNCTION][##]
 
 
oikeuksiltaan [##]oikeus[POS=NOUN][KTN=40][NUM=PL][CASE=ABL][POSS=SG3,PL3][##]
 
 
. [##].[POS=PUNCTUATION][##]
 
 
</pre>
 
 
To get the output to something approaching Apertium:
 
 
<pre>
 
echo "kaikki ihmiset syntyvät vapaina ja tasavertaisina arvoltaan ja oikeuksiltaan." |\
 
sed 's/$/¶/g' | sed 's/\W/\n&\n/g' | grep -v '^ $' | hfst-lookup src/mor-omorfi.hwfst |\
 
python omorfi-to-apertium.py
 
 
^kaikki/kaikki<PRONOUN><SG><NOM>$ ^ihmiset/ihminen<NOUN><38><PL><NOM,ACC>$
 
^syntyvät/syntyä<VERB><52><J><ACT><INDV><PRES><PL3>/syntyä<VERB><52><J><ACT><VA><POS><PL><NOM,ACC>$
 
^vapaina/vapaa<ADJECTIVE><17><POS><PL><ESS>$ ^ja/ja<PARTICLE>/ja<CONJUNCTION>$
 
^tasavertaisina/tasavertainen<ADJECTIVE><38><POS><PL><ESS>/tasa<NOUN><9><SG><NOM>+vertainen<ADJECTIVE><38><POS><PL><ESS>$
 
^arvoltaan/arvo<NOUN><1><SG><ABL><SG3,PL3>$ ^ja/ja<PARTICLE>/ja<CONJUNCTION>$
 
^oikeuksiltaan/oikeus<NOUN><40><PL><ABL><SG3,PL3>$ ^./.<PUNCTUATION>$
 
 
</pre>
 
 
The <code>omorfi-to-apertium.py</code> script can be found [https://apertium.svn.sourceforge.net/svnroot/apertium/incubator/tools/omorfi-to-apertium.py here] and can also be run with the <code>-c</code> option to use the in-file tag conversion table.
 
 
<pre>
 
$ echo "kaikki ihmiset syntyvät vapaina ja tasavertaisina arvoltaan ja oikeuksiltaan." |\
 
sed 's/$/¶/g' | sed 's/\W/\n&\n/g' | grep -v '^ $' | hfst-lookup src/mor-omorfi.hwfst |\
 
python omorfi-to-apertium.py -c
 
 
^kaikki/kaikki<Pron><Sg><Nom>$ ^ihmiset/ihminen<N><38><Pl><NOM,ACC>$
 
^syntyvät/syntyä<V><52><J><Act><Indv><Pres><PL3>/syntyä<V><52><J><Act><VA><Pos><Pl><NOM,ACC>$
 
^vapaina/vapaa<A><17><Pos><Pl><Ess>$ ^ja/ja<Part>/ja<Conj>$
 
^tasavertaisina/tasavertainen<A><38><Pos><Pl><Ess>/tasa<N><9><Sg><Nom>+vertainen<A><38><Pos><Pl><Ess>$
 
^arvoltaan/arvo<N><1><Sg><Abl><SG3,PL3>$ ^ja/ja<Part>/ja<Conj>$ ^oikeuksiltaan/oikeus<N><40><Pl><Abl><SG3,PL3>$
 
^./.<Punct>$
 
 
</pre>
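
The tag rewriting visible in the first conversion example (without <code>-c</code>) can be sketched in a few lines of Python. This is not the <code>omorfi-to-apertium.py</code> script itself, only a minimal illustration inferred from the sample input and output on this page; the function names <code>analysis_to_apertium</code> and <code>lexical_unit</code> are hypothetical, and the handling of <code>[#]</code> as a compound boundary and of <code>[?]</code> as a marker to drop is an assumption based on the <code>tasavertaisina</code> example.

```python
import re

# Matches one bracketed item such as [POS=NOUN], [##] or [?].
BRACKET = re.compile(r"\[([^\]]*)\]")


def _rewrite(match):
    """Turn [KEY=VALUE] into <VALUE>; drop the [##] and [?] markers."""
    inner = match.group(1)
    if inner in ("##", "?"):
        return ""
    return "<" + inner.split("=", 1)[-1] + ">"


def analysis_to_apertium(analysis):
    """Convert one Omorfi analysis string to lemma<TAG>... form."""
    # Assumption: [#] separates the parts of a compound, which are
    # joined with '+' in the Apertium-style output.
    parts = analysis.split("[#]")
    return "+".join(BRACKET.sub(_rewrite, part) for part in parts)


def lexical_unit(surface, analyses):
    """Build ^surface/analysis1/analysis2$ from a surface form and its analyses."""
    converted = [analysis_to_apertium(a) for a in analyses]
    return "^" + "/".join([surface] + converted) + "$"
```

For example, <code>lexical_unit("ja", ["[##]ja[POS=PARTICLE][##]", "[##]ja[POS=CONJUNCTION][##]"])</code> gives <code>^ja/ja<PARTICLE>/ja<CONJUNCTION>$</code>. The sketch does not escape characters that are special in the Apertium stream format, and it does not apply the tag renaming done by <code>-c</code>.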
 
   
 
==See also==
 
==See also==
Line 120: Line 41:
   
 
==External links==
 
==External links==
  +
* [https://kitwiki.csc.fi/twiki/bin/view/KitWiki/OMorFiSFSTVersion#Installation OMorFi: Installation]
 
* [https://gna.org/projects/omorfi/ Gna!: Omorfi]
+
* [http://code.google.com/p/omorfi Omorfi project site at google code]
 
* [http://langtech.jrc.it/FSMNLP2008/m/Koskenniemi_invited_talk.pdf Overview of the HFST project (pdf)], esp. in relation to other FST technology
 
* [http://langtech.jrc.it/FSMNLP2008/m/Koskenniemi_invited_talk.pdf Overview of the HFST project (pdf)], esp. in relation to other FST technology
   

Latest revision as of 14:53, 2 June 2016

Omorfi (Open Morphology of Finnish) is a computational morphology of Finnish written using HFST.

Requirements

You will need HFST installed; follow the instructions on the HFST page.

Download

The following commands download Omorfi and prepare the build.

$ git clone https://github.com/flammie/omorfi
$ cd omorfi/
$ ./autogen.sh
$ ./configure

If autogen.sh does not work, please report a bug; in the meantime, autoreconf -i should work just as well.

Compilation

You need at least 1.5 GB of RAM to compile Omorfi; otherwise your machine may sit thrashing for some hours.

$ make

This will compile everything.

To prepare source code for a new Apertium language pair, use src/scripts/omor2apertium.sh... or just copy the source from an existing pair, such as apertium-fin-eng.

Usage

After compiling, you can test it with the hfst-lookup program.
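
A minimal sketch of such a test, driven from Python: it splits a sentence into one token per line and pipes the tokens into hfst-lookup, which reads them from standard input and takes the compiled transducer as its argument, as in the earlier revision of this page. The helper names <code>tokenize</code> and <code>lookup</code> are hypothetical, and the transducer path is left as a parameter because the file name produced by the current build is not stated here (the earlier revision used src/mor-omorfi.hwfst).

```python
import re
import subprocess


def tokenize(sentence):
    """Split a sentence into word and punctuation tokens."""
    # \w+ keeps letters such as ä and ö inside a word; every other
    # non-space character becomes a token of its own.
    return re.findall(r"\w+|[^\w\s]", sentence)


def lookup(sentence, transducer):
    """Pipe one token per line into hfst-lookup and return its raw output.

    `transducer` is the path to the compiled analyser; it is an
    assumption of this sketch that you know where the build put it.
    """
    tokens = "\n".join(tokenize(sentence)) + "\n"
    result = subprocess.run(
        ["hfst-lookup", transducer],
        input=tokens,
        capture_output=True,
        text=True,
        check=True,  # raise if hfst-lookup exits with an error
    )
    return result.stdout
```

Calling <code>print(lookup("kaikki ihmiset syntyvät vapaina .", path_to_analyser))</code> should print one or more analyses per token, provided hfst-lookup is on your PATH and <code>path_to_analyser</code> points at the compiled transducer.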

See also

External links