Perceptron tagger

Revision as of 13:26, 22 August 2016

The perceptron part-of-speech tagger implements part-of-speech tagging using the averaged, structured perceptron algorithm. Some details of the implementation are given in this presentation (https://github.com/frankier/perceptron-tagger-slides/raw/master/presentation.pdf). The implementation is based on the references listed on the final slide.

Step by step

The process is mostly the same as in Supervised tagger training, except that you need an MTX file (and optionally a TSX file) instead of only a TSX file.

1a. Get an MTX file. Copy an MTX file into your language directory and optionally modify it (or start from scratch). See MTX format.

1b. Get a tagged corpus.

2. Train the tagger like so:

    apertium-tagger [--skip-on-error] -xs [ITERATIONS] TAGGED_CORPUS UNTAGGED_CORPUS MTX_FILE

You can put this in a Makefile. Use --skip-on-error to discard sentences for which the tagged and untagged corpora do not match. 10 is a good value for ITERATIONS.
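
As a rough illustration of step 2, the training call could be wrapped in a small shell script. This is only a sketch: the file names (xyz.tagged, xyz.untagged, xyz.mtx) and the run and train_tagger helpers are hypothetical, and the argument order simply follows the command line shown above. By default the script only prints the command, so it can be checked before anything is run.

```shell
#!/bin/sh
# Sketch of step 2. All file names are hypothetical placeholders;
# replace them with the files in your language directory.
TAGGED_CORPUS="xyz.tagged"
UNTAGGED_CORPUS="xyz.untagged"
MTX_FILE="xyz.mtx"
ITERATIONS=10   # the value suggested in step 2
DRY_RUN=1       # 1 = only print the command, 0 = actually run it

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Training call, with arguments in the order given in step 2.
# --skip-on-error is optional; drop it to stop on mismatched sentences.
train_tagger() {
    run apertium-tagger --skip-on-error -xs "$ITERATIONS" \
        "$TAGGED_CORPUS" "$UNTAGGED_CORPUS" "$MTX_FILE"
}

train_tagger
```

Setting DRY_RUN=0 makes the same script execute the command instead of printing it; the same call can be copied into a Makefile rule.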

3. Run the tagger like so:

    apertium-tagger --tagger --perceptron model

You can put this in your modes.xml.
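
For trying the tagger outside of modes.xml, step 3 might be wrapped as follows. This is a sketch under assumptions: xyz.model is a hypothetical model file name, tagger_command is a hypothetical helper, and the script assumes the tagger reads morphologically analysed text on standard input, as it does inside a modes pipeline. If the tagger or the model file is missing, the script only prints the command it would have run.

```shell
#!/bin/sh
# Sketch of step 3. MODEL is a hypothetical file name.
MODEL="xyz.model"

# Build the tagging command from step 3 as a single string.
tagger_command() {
    echo "apertium-tagger --tagger --perceptron $MODEL"
}

# Run the tagger on standard input if it is installed and the model exists;
# otherwise only show the command that would be run.
if command -v apertium-tagger >/dev/null 2>&1 && [ -f "$MODEL" ]; then
    apertium-tagger --tagger --perceptron "$MODEL"
else
    echo "would run: $(tagger_command)"
fi
```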
