Freeling
Latest revision as of 02:20, 10 March 2018
Freeling is a suite of language processing tools, including a tokeniser, sentence splitter, morphological analyser, sense tagger, named-entity recogniser, chunker and dependency parser. It is used, for example, in the Spanish→Basque MT system Matxin. Much of what the suite does is also done by lttoolbox, constraint grammar and apertium, but in some cases Freeling's data or tools could be useful to apertium, and apertium's data could be useful to Freeling.
Tools
There are some scripts in apertium SVN (module apertium-tools/freeling) for converting between apertium formats and Freeling formats.
- dix-to-maco.py -- Convert between an lttoolbox expanded dictionary and a Freeling 'maco' format full-form list.
- tagger-to-freeling.py -- Convert the output of apertium-tagger to Freeling-style tagged output.
- freeling-to-tagger.py -- Convert Freeling tagged output to apertium tagged output.
The scripts require a file of correspondences between apertium tags and PAROLE-style tags. The following mappings exist:
- br-tags.parole.txt -- Mappings between apertium tags and PAROLE tags for Breton
- cy-tags.parole.txt -- Mappings between apertium tags and PAROLE tags for Welsh
- es-tags.parole.txt -- Mappings between apertium tags and PAROLE tags for Spanish
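The scripts themselves are not reproduced on this page. As a rough illustration of what the tagger conversion involves, here is a minimal Python sketch (not the actual tagger-to-freeling.py or freeling-to-tagger.py) that rewrites disambiguated apertium lexical units such as ^Bro/Bro<n><f><sg>$ as Freeling-style "form lemma TAG" rows, and back again. TAG_MAP is a hypothetical stand-in for a file like br-tags.parole.txt, whose format is not described here; its four entries are taken from the Breton example on this page. The "UNK" and "<unk>" fallbacks are also assumptions of the sketch.

```python
import re

# Hypothetical apertium-tags -> PAROLE-tag mapping, for illustration only.
# The real mapping files (e.g. br-tags.parole.txt) may use another format.
TAG_MAP = {
    "<n><f><sg>": "NCFSV0",
    "<adj><mf><sp>": "AQ0CN0",
    "<det><pos><mf><sp>": "DP0CN0",
    "<n><m><pl>": "NCMPV0",
}

# One disambiguated apertium lexical unit: ^surface/lemma<tag1><tag2>...$
LU_RE = re.compile(r"\^([^/$]+)/([^<$]+)((?:<[^>]+>)+)\$")


def apertium_to_freeling(line, tag_map=TAG_MAP):
    """Turn one line of apertium-tagger output into 'form lemma TAG' rows."""
    rows = []
    for surface, lemma, tags in LU_RE.findall(line):
        # "UNK" for unmapped tag sequences is an assumption of this sketch.
        rows.append(f"{surface} {lemma} {tag_map.get(tags, 'UNK')}")
    return rows


def freeling_to_apertium(rows, tag_map=TAG_MAP):
    """Turn 'form lemma TAG' rows back into apertium lexical units."""
    # Inverting the dict assumes the mapping is one-to-one.
    reverse = {parole: tags for tags, parole in tag_map.items()}
    units = []
    for row in rows:
        surface, lemma, parole = row.split()[:3]
        units.append(f"^{surface}/{lemma}{reverse.get(parole, '<unk>')}$")
    return " ".join(units)


if __name__ == "__main__":
    tagged = ("^Bro/Bro<n><f><sg>$ ^gozh/kozh<adj><mf><sp>$ "
              "^ma/ma<det><pos><mf><sp>$ ^zadoù/tad<n><m><pl>$")
    rows = apertium_to_freeling(tagged)
    print("\n".join(rows))
    print(freeling_to_apertium(rows))
```

The reverse direction only round-trips cleanly if the mapping is one-to-one; a real apertium-to-PAROLE mapping may well not be, in which case the reverse script needs extra information to choose the apertium tags.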
Examples
Say, for example, that we want to analyse and tag a text with apertium, and then convert the result to Freeling format in order to chunk it.
$ echo "Bro gozh ma zadoù" | lt-proc br-fr.automorf.bin  | cg-proc br-fr.rlx.bin  | apertium-tagger -p -g br-fr.prob  
^Bro/Bro<n><f><sg>$ ^gozh/kozh<adj><mf><sp>$ ^ma/ma<det><pos><mf><sp>$ ^zadoù/tad<n><m><pl>$
$ echo "Bro gozh ma zadoù" | lt-proc br-fr.automorf.bin  | cg-proc br-fr.rlx.bin  | apertium-tagger -p -g br-fr.prob  | \
  tagger-to-freeling.py parole-tags.txt 
Bro Bro NCFSV0
gozh kozh AQ0CN0
ma ma DP0CN0
zadoù tad NCMPV0
$ echo "Bro gozh ma zadoù" | lt-proc br-fr.automorf.bin  | cg-proc br-fr.rlx.bin  | apertium-tagger -p -g br-fr.prob  | \
  tagger-to-freeling.py parole-tags.txt | fl-chunker -f br.cfg
sn_[
  +grup-n_[
    +(Bro Bro NCFSV0 -)
     (gozh kozh AQ0CN0 -)
    ]
  ]
  det_[
    +(ma ma DP0CN0 -)
  ]
  grup-n_[
    +(zadoù tad NCMPV0 -)
  ]
]
Or perhaps we want to analyse and tag directly with a Freeling analyser generated from an lttoolbox dictionary:
$ echo "Bro gozh ma zadoù" | fl-morph -f br.cfg
Bro bro NCFSV0 0.992647   0.00735294
gozh kozh AQ0CN0 0.975   0.025
ma ma NCMSV0 0.934028 ma DP0CN0 0.0590278 ma CS 0.00347222   0.00347222
zadoù tad NCMPV0 0.992647   0.00735294
$ echo "Bro gozh ma zadoù" | fl-morph -f br.cfg | fl-tagger -f br.cfg
Bro bro NCFSV0 0.992647
gozh kozh AQ0CN0 0.975
ma ma NCMSV0 0.934028
zadoù tad NCMPV0 0.992647
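(Note: the tagging in this example is wrong because the HMM was poorly trained. For instance, "ma" is tagged as a noun (NCMSV0) instead of the determiner reading DP0CN0 that the apertium pipeline chose above.)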
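To make the fl-morph output above a little more concrete, here is a minimal Python sketch that splits a line into its candidate analyses and keeps the most probable one. It is a most-probable-analysis baseline, not the HMM that fl-tagger uses, and the field layout it assumes (word form, then lemma/tag/probability triples, then one trailing number it ignores) is inferred only from the four sample lines. The function names are hypothetical. On those lines it picks the same analyses that fl-tagger produced, including NCMSV0 for "ma".

```python
def parse_morph_line(line):
    """Split one fl-morph output line into (form, [(lemma, tag, prob), ...]).

    Assumed layout, inferred only from the sample output on this page:
    the word form, then lemma/tag/probability triples, then one trailing
    number that this sketch does not interpret.
    """
    fields = line.split()
    form, triples = fields[0], fields[1:-1]  # drop the trailing number
    if len(triples) == 0 or len(triples) % 3 != 0:
        raise ValueError(f"unexpected layout: {line!r}")
    analyses = [
        (triples[i], triples[i + 1], float(triples[i + 2]))
        for i in range(0, len(triples), 3)
    ]
    return form, analyses


def most_probable(line):
    """Baseline disambiguation: keep the highest-probability analysis."""
    form, analyses = parse_morph_line(line)
    lemma, tag, prob = max(analyses, key=lambda a: a[2])
    return form, lemma, tag, prob


if __name__ == "__main__":
    for line in [
        "Bro bro NCFSV0 0.992647 0.00735294",
        "gozh kozh AQ0CN0 0.975 0.025",
        "ma ma NCMSV0 0.934028 ma DP0CN0 0.0590278 ma CS 0.00347222 0.00347222",
        "zadoù tad NCMPV0 0.992647 0.00735294",
    ]:
        print(*most_probable(line))
```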
Todo
- Extraction of multiword units into the locucion.dat file.
Installing from SVN
See the FreeLing documentation (http://garraf.epsevg.upc.es/freeling/doc/userman/html/node13.html).
Mac OS X
Install the prerequisites with MacPorts (http://www.macports.org):
sudo port install boost pcre db46
For the configure stage on a Mac, I had to use:
env LDFLAGS=-L/opt/local/lib CPPFLAGS=-I/opt/local/include ./configure
for omlet and fries, and
env LDFLAGS="-L/opt/local/lib -L/opt/local/lib/db46" CPPFLAGS="-I/opt/local/include -I/opt/local/include/boost -I/opt/local/include/db46" ./configure
for freeling. Additionally, I had to do the following:
ln -s /opt/local/lib/libboost_filesystem-mt.a /opt/local/lib/libboost_filesystem.a
ln -s /opt/local/lib/libboost_filesystem-mt.dylib /opt/local/lib/libboost_filesystem.dylib
...since the -mt and non-mt libraries are apparently the same (see http://trac.macports.org/ticket/14365).
Also, to install the data, I had to change the lines in freeling/data/Makefile.am that looked like
asdata_DATA = es/*
into
asdata_DATA = es/*.*
since otherwise install complained about there being subdirectories:
install: ./as/dep: Inappropriate file type or format
make[2]: *** [install-asdataDATA] Error 71
make[1]: *** [install-am] Error 2
make: *** [install-recursive] Error 1
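If several lines in freeling/data/Makefile.am need the same change, the edit can be scripted. This is a minimal Python sketch, assuming every affected line has the form shown above (a *data_DATA variable set to dir/*); the function names are hypothetical. It rewrites only those lines, leaves lines that already end in *.* alone, and keeps a .orig backup before writing.

```python
import re
from pathlib import Path

# Lines like "asdata_DATA = es/*": a *data_DATA variable set to dir/*.
# [ \t]* (not \s*) keeps each match on a single line.
DATA_GLOB = re.compile(r"^(\w+data_DATA[ \t]*=[ \t]*\S+/)\*[ \t]*$", re.MULTILINE)


def restrict_to_files(text):
    """Rewrite 'xxdata_DATA = dir/*' as 'xxdata_DATA = dir/*.*'."""
    return DATA_GLOB.sub(r"\1*.*", text)


def patch_makefile(path="freeling/data/Makefile.am"):
    """Apply restrict_to_files in place, keeping a .orig backup."""
    p = Path(path)
    original = p.read_text()
    patched = restrict_to_files(original)
    if patched != original:
        p.with_name(p.name + ".orig").write_text(original)
        p.write_text(patched)
    return patched
```

Calling patch_makefile() from the directory that contains freeling/ applies the change; the backup makes it easy to diff or revert.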
Now you should get something like this:
$ echo es muy importante | FREELINGSHARE=/usr/local/share/FreeLing/ analyzer -f /usr/local/share/FreeLing/config/es.cfg
es es RG 0.181644
muy muy RG 1
importante importante AQ0CS0 1
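To call the analyzer from a Python script rather than from the shell, one option is a small wrapper like this minimal sketch. It assumes the same install prefix and es.cfg path as the command above, and that analyzer prints one "form lemma tag probability" line per token, as in the sample output; the function names are hypothetical, and other FreeLing versions may use different command names or options.

```python
import os
import subprocess

# Paths taken from the example above; adjust them for your installation.
FREELING_SHARE = "/usr/local/share/FreeLing/"
ES_CONFIG = FREELING_SHARE + "config/es.cfg"


def run_analyzer(text, config=ES_CONFIG, share=FREELING_SHARE):
    """Pipe text through the 'analyzer' command with FREELINGSHARE set."""
    env = dict(os.environ, FREELINGSHARE=share)
    result = subprocess.run(
        ["analyzer", "-f", config],
        input=text,
        capture_output=True,
        text=True,
        env=env,
        check=True,  # raise CalledProcessError if analyzer fails
    )
    return result.stdout


def parse_tagged(output):
    """Parse 'form lemma tag probability' lines into tuples.

    Lines with fewer than four fields (e.g. blank lines) are skipped.
    """
    rows = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 4:
            form, lemma, tag, prob = fields[:4]
            rows.append((form, lemma, tag, float(prob)))
    return rows
```

For the output shown above, parse_tagged(run_analyzer("es muy importante\n")) would give tuples such as ("muy", "muy", "RG", 1.0).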
External links
- FreeLing Home Page (http://garraf.epsevg.upc.es/freeling/)
- The Ubuntu NLP Repository (http://cl.aist-nara.ac.jp/~eric-n/ubuntu-nlp/) contains a packaged version for Debian-based distributions.

