Difference between revisions of "Training perceptron tagger"
Revision as of 15:52, 5 January 2018
In this article, I will describe the pipeline for training the Perceptron Tagger (http://wiki.apertium.org/wiki/Perceptron_tagger).
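Convert a UD treebank dataset into Apertium format
First, convert the .conllu data into Apertium format using the UdTree2Apertium tool (https://github.com/alxmamaev/UdTree2Apertium).
Start by producing a raw Apertium file, that is, the treebank's word forms run through the morphological analyser. Example for English: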
cat en-ud-train.conllu | grep -e '^$' -e '^[0-9]' | cut -f2 | sed 's/$/¶/g' | apertium-destxt | lt-proc -w ~/source/apertium//languages/apertium-eng/eng.automorf.bin | apertium-retxt | sed 's/¶//g' > en-ud-train.apertium
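To make the first stages of that pipeline easier to follow, here is a minimal Python sketch of what the `grep | cut | sed` part does before the text reaches apertium-destxt. It is only an illustration: the function name `conllu_to_forms` is hypothetical and is not part of UdTree2Apertium or Apertium.

```python
def conllu_to_forms(lines):
    """Mimic: grep -e '^$' -e '^[0-9]' | cut -f2 | sed 's/$/¶/g'."""
    out = []
    for line in lines:
        line = line.rstrip("\n")
        # Keep blank lines (sentence boundaries) and lines starting with an
        # ASCII digit (token lines); comment lines such as "# text = ..." are dropped.
        if line == "" or line[0] in "0123456789":
            fields = line.split("\t")
            # Like cut -f2: take the second tab-separated column (FORM);
            # a line without any tab is passed through unchanged.
            form = fields[1] if len(fields) > 1 else line
            # Like sed 's/$/¶/g': mark the end of every line with a pilcrow.
            out.append(form + "¶")
    return out


if __name__ == "__main__":
    sample = [
        "# text = Hello world\n",
        "1\tHello\thello\tINTJ\n",
        "2\tworld\tworld\tNOUN\n",
        "\n",
    ]
    print("\n".join(conllu_to_forms(sample)))
```

The pilcrow marks the original line ends so that each analysed token stays on its own line; the final `sed 's/¶//g'` in the command removes the markers again.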
Then run the converter script on the result:
python3 converter.py tags/eng.csv en-ud-train.apertium en-ud-train.conllu eng.tagged