Difference between revisions of "UDPipe"

From Apertium

Revision as of 08:10, 23 March 2017

First things first

Get the code!
git clone https://github.com/ufal/udpipe
cd udpipe/src
make

Now copy the udpipe/src/udpipe binary executable to somewhere in your $PATH.
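The copy step could look roughly like the sketch below. The helper name `install_to_path` and the target directory `$HOME/.local/bin` are assumptions for illustration; any directory already on your `$PATH` works just as well.

```shell
# Hypothetical helper: copy an executable into a directory and put that
# directory on PATH for the current shell session.
install_to_path() {
    src=$1
    bindir=$2
    mkdir -p "$bindir" || return 1
    cp "$src" "$bindir/" || return 1
    case ":$PATH:" in
        *":$bindir:"*) ;;                        # directory is already on PATH
        *) PATH="$bindir:$PATH"; export PATH ;;  # otherwise prepend it
    esac
}

# Example call, run from udpipe/src after make (target directory is an assumption):
# install_to_path ./udpipe "$HOME/.local/bin"
```

The PATH change only lasts for the current shell; add the directory to your shell profile if you want it to persist.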

Get some data!
git clone https://github.com/UniversalDependencies/UD_Norwegian-Bokmaal
cd UD_Norwegian-Bokmaal

Train a default model

With tokeniser and tagger:

cat no_bokmaal-ud-train.conllu | udpipe --train nob.udpipe

Without tokeniser and tagger:

cat no_bokmaal-ud-train.conllu | udpipe --tokenizer none --tagger none --train nob.udpipe

Parse some input

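With gold-standard POS tags (columns 7-10 of the dev set are blanked first, so the parser has to predict the dependency annotation itself):

cat no_bokmaal-ud-dev.conllu | cut -f1-6 | sed 's/$/\t_\t_\t_\t_/g' | sed 's/^\t.*//g' | udpipe --parse nob.udpipe > output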
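The cut/sed pipeline above keeps CoNLL-U columns 1-6 and replaces columns 7-10 with underscores; with GNU sed it also appends the four underscore columns to comment lines. A sketch of an alternative that leaves comment lines and blank sentence separators untouched is given below. The function name `strip_deps` is hypothetical and not part of UDPipe.

```shell
# Hypothetical helper: blank out CoNLL-U columns 7-10 (HEAD, DEPREL, DEPS, MISC),
# passing comment lines and blank sentence separators through unchanged.
strip_deps() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/ || NF == 0 { print; next }
         { print $1, $2, $3, $4, $5, $6, "_", "_", "_", "_" }' "$@"
}
```

Assuming `udpipe` is on your `$PATH`, it could be used as `strip_deps no_bokmaal-ud-dev.conllu | udpipe --parse nob.udpipe > output`.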

Full pipeline:

echo "Det ligger en bok på bordet." | udpipe --tokenize --tag --parse nob.udpipe

Calculate accuracy
udpipe --accuracy --parse nob.udpipe no_bokmaal-ud-dev.conllu
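If you have already written parser output to a file (such as the `output` file produced with gold-standard POS tags), attachment scores can also be computed by hand as a sanity check. The sketch below is not UDPipe's own evaluation and may differ from what `--accuracy` reports; the function name `attachment_scores` is hypothetical, and it assumes both files contain the same tokens in the same order.

```shell
# Hypothetical helper: print unlabelled (UAS) and labelled (LAS) attachment
# scores for a parsed CoNLL-U file against a gold CoNLL-U file.
# Assumption: both files contain the same tokens in the same order.
attachment_scores() {
    gold=$1
    pred=$2
    awk -F '\t' '
        FNR == 1 { file++; n = 0 }          # a new input file starts: reset token counter
        $1 !~ /^[0-9]+$/ { next }           # skip comments, blanks, ranges (1-2), empty nodes (1.1)
        file == 1 { n++; head[n] = $7; rel[n] = $8; next }   # remember gold HEAD and DEPREL
        {
            n++; total++
            if (head[n] == $7) { uas++; if (rel[n] == $8) las++ }
        }
        END {
            if (total == 0) { print "no tokens"; exit 1 }
            printf "UAS: %.2f%%  LAS: %.2f%%\n", 100 * uas / total, 100 * las / total
        }' "$gold" "$pred"
}
```

For example: `attachment_scores no_bokmaal-ud-dev.conllu output`.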

Parameters