Difference between revisions of "UDPipe"
Revision as of 08:10, 23 March 2017
First things first
- Get the code!
git clone https://github.com/ufal/udpipe
cd udpipe/src
make
Now copy the udpipe/src/udpipe binary to a directory in your $PATH.
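If you are unsure how to do that, here is a minimal sketch. The function name install_to_path and the target directory are made up for illustration; any directory you own will do.

```shell
# Hypothetical helper: copy a binary into a directory and make sure that
# directory is on $PATH for the current shell session.
install_to_path() {
    src_binary=$1
    install_dir=$2
    mkdir -p "$install_dir" || return 1
    cp "$src_binary" "$install_dir/" || return 1
    chmod +x "$install_dir/$(basename "$src_binary")" || return 1
    # Prepend the directory only if it is not already listed in $PATH.
    case ":$PATH:" in
        *":$install_dir:"*) ;;
        *) PATH="$install_dir:$PATH"; export PATH ;;
    esac
}
```

For example, run install_to_path udpipe/src/udpipe "$HOME/bin" from the directory where you cloned the repository. The change to $PATH only lasts for the current shell; add the directory to your shell start-up file to make it permanent.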
- Get some data!
git clone https://github.com/UniversalDependencies/UD_Norwegian-Bokmaal
cd UD_Norwegian-Bokmaal
- Train a default model
With tokeniser and tagger:
cat no_bokmaal-ud-train.conllu | udpipe --train nob.udpipe
Without tokeniser and tagger:
cat no_bokmaal-ud-train.conllu | udpipe --tokenizer none --tagger none --train nob.udpipe
- Parse some input
With gold standard POS tags:
cat no_bokmaal-ud-dev.conllu | cut -f1-6 | sed 's/$/\t_\t_\t_\t_/g' | sed 's/^\t.*//g' | udpipe --parse nob.udpipe > output
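The cut and sed steps above keep the first six CoNLL-U columns (ID to FEATS) and replace the last four (HEAD, DEPREL, DEPS, MISC) with underscores, so that the parser cannot see the gold tree. Note that the one-liner also appends the four underscore columns to comment lines, and it relies on sed understanding \t. A rough awk equivalent that leaves comment and blank lines alone could look like this; the function name strip_parse_columns is made up for illustration.

```shell
# Hypothetical helper: blank out HEAD, DEPREL, DEPS and MISC (columns 7-10)
# of CoNLL-U input read from stdin. Comment lines (starting with "#") and
# blank sentence separators pass through unchanged.
strip_parse_columns() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/ || NF == 0 { print; next }
         { print $1, $2, $3, $4, $5, $6, "_", "_", "_", "_" }'
}
```

It would slot into the pipeline in place of the cut and sed commands: strip_parse_columns < no_bokmaal-ud-dev.conllu | udpipe --parse nob.udpipe > output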
Full pipeline:
echo "Det ligger en bok på bordet." | udpipe --tokenize --tag --parse nob.udpipe
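The output is in CoNLL-U format, ten tab-separated columns per token, which is a lot to take in at a glance. A small sketch for pulling out just the ID, form, UPOS tag, head and relation columns follows; the function name summarise_parse is hypothetical.

```shell
# Hypothetical helper: reduce CoNLL-U on stdin to the columns
# ID, FORM, UPOS, HEAD and DEPREL, dropping comment lines.
summarise_parse() {
    grep -v '^#' | cut -f 1,2,4,7,8
}
```

Append it to the full pipeline, for example: echo "Det ligger en bok på bordet." | udpipe --tokenize --tag --parse nob.udpipe | summarise_parse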
- Calculate accuracy
udpipe --accuracy --parse nob.udpipe no_bokmaal-ud-dev.conllu
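For parsing, the usual scores are the unlabelled and labelled attachment scores (UAS and LAS): the share of tokens with the correct head, and with both the correct head and the correct relation label. As far as I know these are the figures the command above reports. If you want to check them by hand against a parsed file, here is a rough sketch. The function name attachment_scores is made up, it assumes that both files have identical tokenisation, and it is not a description of how UDPipe computes its own numbers.

```shell
# Hypothetical helper: print UAS and LAS for a gold and a predicted
# CoNLL-U file (arguments 1 and 2) that share the same tokenisation.
attachment_scores() {
    awk -F '\t' '
        # Keep only ordinary token lines: ID must be a plain integer, which
        # skips comments, blank lines, ranges (1-2) and empty nodes (1.1).
        $1 !~ /^[0-9]+$/ { next }
        # First file: remember the gold head and relation of each token.
        NR == FNR { n++; head[n] = $7; rel[n] = $8; next }
        # Second file: compare each predicted token with the gold one.
        { m++; if ($7 == head[m]) { uas++; if ($8 == rel[m]) las++ } }
        END {
            # Refuse to score if the files do not line up token for token.
            if (n == 0 || n != m) exit 1
            printf "UAS: %.2f%% LAS: %.2f%%\n", 100 * uas / n, 100 * las / n
        }' "$1" "$2"
}
```

For example, attachment_scores no_bokmaal-ud-dev.conllu output compares the gold development set with the file written in the parsing step. The function exits with a non-zero status if the token counts differ.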