Difference between revisions of "UDPipe"
Revision as of 08:10, 23 March 2017
First things first
- Get the code!
git clone https://github.com/ufal/udpipe
cd udpipe/src
make
Now copy the udpipe/src/udpipe binary to a directory in your $PATH.
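One way to do the copy step is sketched below. The function name install_udpipe and the default target directory ~/bin are assumptions made for illustration; any directory already in your $PATH works just as well.

```shell
# Hypothetical helper: copy a built binary into a directory and make sure
# that directory is on PATH for the current shell session.
install_udpipe() {
  src=$1                      # path to the built binary, e.g. udpipe/src/udpipe
  dest=${2:-"$HOME/bin"}      # target directory (assumed default: ~/bin)
  mkdir -p "$dest" && cp "$src" "$dest/udpipe" && chmod +x "$dest/udpipe" || return 1
  # Prepend the directory to PATH only if it is not already listed.
  case ":$PATH:" in
    *":$dest:"*) ;;
    *) PATH="$dest:$PATH"; export PATH ;;
  esac
}
```

Calling install_udpipe udpipe/src/udpipe from the directory where you cloned the repository changes PATH for the current session only; add the export to your shell start-up file to make it permanent.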
- Get some data!
git clone https://github.com/UniversalDependencies/UD_Norwegian-Bokmaal
cd UD_Norwegian-Bokmaal
- Train a default model
With tokeniser and tagger:
cat no_bokmaal-ud-train.conllu | udpipe --train nob.udpipe
Without tokeniser and tagger:
cat no_bokmaal-ud-train.conllu | udpipe --tokenizer none --tagger none --train nob.udpipe
- Parse some input
With gold standard POS tags:
cat no_bokmaal-ud-dev.conllu | cut -f1-6 | sed 's/$/\t_\t_\t_\t_/g' | sed 's/^\t.*//g' | udpipe --parse nob.udpipe > output
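The cut and sed steps above keep the first six CoNLL-U columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS) and blank out the remaining four, so the parser sees the gold tags but not the gold heads and relations. The sketch below does the same job with awk instead of sed. The function name strip_parse_columns is made up for this example, and unlike the sed version it leaves comment lines untouched rather than appending underscores to them.

```shell
# Hypothetical awk equivalent of the cut/sed steps: keep columns 1-6 and
# replace HEAD, DEPREL, DEPS and MISC with "_".
strip_parse_columns() {
  awk -F'\t' 'BEGIN { OFS = "\t" }
    /^#/    { print; next }        # comment lines pass through unchanged
    NF == 0 { print ""; next }     # blank lines separate sentences
    { print $1, $2, $3, $4, $5, $6, "_", "_", "_", "_" }'
}
```

With this function defined, the gold-tag parse could be written as strip_parse_columns < no_bokmaal-ud-dev.conllu | udpipe --parse nob.udpipe > output.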
Full pipeline:
echo "Det ligger en bok på bordet." | udpipe --tokenize --tag --parse nob.udpipe
- Calculate accuracy
udpipe --accuracy --parse nob.udpipe no_bokmaal-ud-dev.conllu
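To make the parser scores easier to interpret, here is a rough, independent sketch of how unlabelled and labelled attachment scores (UAS and LAS) can be computed from two CoNLL-U files. It is not UDPipe's implementation. The function name attachment_scores is hypothetical, and the sketch assumes both files contain exactly the same tokens in the same order, which holds for output parsed from gold tokenisation but not for the full pipeline.

```shell
# Hypothetical helper: compare HEAD (column 7) and DEPREL (column 8) of a
# system file against a gold file, token by token.
attachment_scores() {
  # $1 = gold CoNLL-U file, $2 = system CoNLL-U file with identical tokens
  awk -F'\t' '
    # Keep only ordinary token lines (integer ID); this skips comments,
    # blank lines, multiword ranges such as 1-2 and empty nodes such as 1.1.
    $1 !~ /^[0-9]+$/ { next }
    # First file: remember the gold head and relation of every token.
    FNR == NR { gold_head[++n] = $7; gold_rel[n] = $8; next }
    # Second file: count correct heads (UAS) and correct head+relation (LAS).
    {
      ++m
      if ($7 == gold_head[m]) { ++uas; if ($8 == gold_rel[m]) ++las }
    }
    END {
      if (n == 0 || n != m) { print "token count mismatch" > "/dev/stderr"; exit 1 }
      printf "UAS: %.2f%%, LAS: %.2f%%\n", 100 * uas / n, 100 * las / n
    }
  ' "$1" "$2"
}
```

For example, attachment_scores no_bokmaal-ud-dev.conllu output compares the gold development set with the file written by the gold-tag parsing step. The figures may differ slightly from what udpipe --accuracy reports if it counts tokens differently.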