Difference between revisions of "Freeling"

From Apertium

Revision as of 17:03, 26 December 2008

Freeling is a suite of language processing tools that includes a tokeniser, sentence splitter, morphological analyser, sense tagger, named-entity recogniser, chunker and dependency parser. Much of this is also done in lttoolbox and apertium, but in some cases data or tools from Freeling could be useful.

Tools

There are some scripts in apertium SVN (module apertium-tools/freeling) for converting between apertium formats and Freeling formats.

  • dix-to-maco.py -- Convert an lttoolbox expanded dictionary to a Freeling 'maco' format full-form list.
  • tagger-to-freeling.py -- Convert the output of apertium-tagger to Freeling-style tagged output.
  • freeling-to-tagger.py -- Convert Freeling tagged output to apertium tagged output.

The scripts require a file with correspondences between apertium tags and PAROLE-style tags. The following mappings exist:

  • br-tags.parole.txt -- Mappings between apertium tags and PAROLE tags for Breton
  • cy-tags.parole.txt -- Mappings between apertium tags and PAROLE tags for Welsh
  • es-tags.parole.txt -- Mappings between apertium tags and PAROLE tags for Spanish

Examples

Suppose, for example, that we want to analyse and tag a text with apertium, and then convert the result to Freeling format in order to chunk it.

$ echo "Bro gozh ma zadoù" | lt-proc br-fr.automorf.bin  | cg-proc br-fr.rlx.bin  | apertium-tagger -p -g br-fr.prob  

^Bro/Bro<n><f><sg>$ ^gozh/kozh<adj><mf><sp>$ ^ma/ma<det><pos><mf><sp>$ ^zadoù/tad<n><m><pl>$

$ echo "Bro gozh ma zadoù" | lt-proc br-fr.automorf.bin  | cg-proc br-fr.rlx.bin  | apertium-tagger -p -g br-fr.prob  | \
  tagger-to-freeling.py parole-tags.txt 

Bro Bro NCFSV0
gozh kozh AQ0CN0
ma ma DP0CN0
zadoù tad NCMPV0
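
The conversion step above can be illustrated with a minimal Python sketch. It is not the code of tagger-to-freeling.py: the function name tagger_to_freeling, the regular expression and the in-memory mapping are assumptions made for illustration, and the mapping contains only the four correspondences visible in this example. The real mapping files (such as br-tags.parole.txt) may use a different layout and cover many more tags.

```python
import re

# Hypothetical mapping, taken only from the Breton example above; the real
# br-tags.parole.txt may use a different file format and cover more tags.
APERTIUM_TO_PAROLE = {
    "n.f.sg": "NCFSV0",
    "adj.mf.sp": "AQ0CN0",
    "det.pos.mf.sp": "DP0CN0",
    "n.m.pl": "NCMPV0",
}

# One lexical unit as printed by apertium-tagger -p: ^surface/lemma<tag><tag>...$
LEXICAL_UNIT = re.compile(r"\^([^/^$]+)/([^<$]+)((?:<[^>]+>)+)\$")


def tagger_to_freeling(stream, mapping=APERTIUM_TO_PAROLE):
    """Return one 'form lemma TAG' line per lexical unit found in the stream."""
    lines = []
    for form, lemma, tags in LEXICAL_UNIT.findall(stream):
        # "<n><f><sg>" becomes the lookup key "n.f.sg"
        key = ".".join(re.findall(r"<([^>]+)>", tags))
        # Unknown tag sequences are kept visible rather than silently dropped.
        parole = mapping.get(key, "UNKNOWN")
        lines.append(f"{form} {lemma} {parole}")
    return lines


if __name__ == "__main__":
    tagged = ("^Bro/Bro<n><f><sg>$ ^gozh/kozh<adj><mf><sp>$ "
              "^ma/ma<det><pos><mf><sp>$ ^zadoù/tad<n><m><pl>$")
    print("\n".join(tagger_to_freeling(tagged)))
```

Run on the tagged sentence from the example, the sketch prints the same four lines as shown above. Marking unmapped tag sequences as UNKNOWN is a design choice of the sketch, intended to make gaps in a mapping easy to spot.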

$ echo "Bro gozh ma zadoù" | lt-proc br-fr.automorf.bin  | cg-proc br-fr.rlx.bin  | apertium-tagger -p -g br-fr.prob  | \
  tagger-to-freeling.py parole-tags.txt | fl-chunker -f br.cfg

sn_[
  +grup-n_[
    +(Bro Bro NCFSV0 -)
     (gozh kozh AQ0CN0 -)
    ]
  ]
  det_[
    +(ma ma DP0CN0 -)
  ]
  grup-n_[
    +(zadoù tad NCMPV0 -)
  ]
]

Or perhaps we want to analyse and tag directly with a Freeling analyser generated from an lttoolbox dictionary:

$ echo "Bro gozh ma zadoù" | ./fl-morph -f br.cfg
Bro bro NCFSV0 0.992647   0.00735294
gozh kozh AQ0CN0 0.975   0.025
ma ma NCMSV0 0.934028 ma DP0CN0 0.0590278 ma CS 0.00347222   0.00347222
zadoù tad NCMPV0 0.992647   0.00735294

$ echo "Bro gozh ma zadoù" | ./fl-morph -f br.cfg | ./fl-tagger -f br.cfg 
Bro bro NCFSV0 0.992647
gozh kozh AQ0CN0 0.975
ma ma NCMSV0 0.934028
zadoù tad NCMPV0 0.992647

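(Note: In this example, the tagging of "ma" is erroneous: it is tagged as a noun, NCMSV0, rather than as the determiner DP0CN0 produced by the apertium pipeline above. This results from a poorly trained HMM.)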
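
To make the ambiguity in the fl-morph output easier to see, the following Python sketch splits one output line into its candidate analyses. It assumes, based only on the lines shown above, that each line consists of the word form followed by lemma/tag/probability triples, and that a trailing lone number is probability mass with no lemma or tag attached. The function names parse_morph_line and most_probable are hypothetical. Picking the most probable analysis in isolation is only an illustration and is not how an HMM tagger works, since an HMM also takes the neighbouring tags into account.

```python
def parse_morph_line(line):
    """Split one fl-morph output line into (form, analyses, leftover)."""
    fields = line.split()
    form, rest = fields[0], fields[1:]
    analyses = []
    # Assumption: analyses are consecutive (lemma, tag, probability) triples.
    while len(rest) >= 3:
        lemma, tag, prob = rest[:3]
        analyses.append((lemma, tag, float(prob)))
        rest = rest[3:]
    # Assumption: a trailing lone number is mass with no lemma or tag attached.
    leftover = float(rest[0]) if len(rest) == 1 else None
    return form, analyses, leftover


def most_probable(analyses):
    """Return the analysis with the highest probability, ignoring context."""
    return max(analyses, key=lambda analysis: analysis[2])


if __name__ == "__main__":
    line = ("ma ma NCMSV0 0.934028 ma DP0CN0 0.0590278 "
            "ma CS 0.00347222   0.00347222")
    form, analyses, leftover = parse_morph_line(line)
    for lemma, tag, prob in analyses:
        print(form, lemma, tag, prob)
    print("most probable in isolation:", most_probable(analyses)[1])
```

For "ma", the sketch finds three candidate analyses, of which the noun reading NCMSV0 carries by far the largest probability (0.934028) and the determiner reading DP0CN0 only 0.0590278.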

External links