Difference between revisions of "UDPipe"

From Apertium
Line 1: Line 1:
  +
train:
 
  +
==First things first==
  +
  +
;Get the code!
 
<pre>
 
<pre>
 
git clone https://github.com/ufal/udpipe
 
git clone https://github.com/ufal/udpipe
 
cd udpipe/src
 
cd udpipe/src
 
make
 
make
  +
</pre>
  +
  +
Now copy the <code>udpipe/src/udpipe</code> binary executable to somewhere in your <code>$PATH</code>.
  +
  +
;Get some data!
  +
<pre>
 
git clone https://github.com/UniversalDependencies/UD_Norwegian-Bokmaal
 
git clone https://github.com/UniversalDependencies/UD_Norwegian-Bokmaal
 
cd UD_Norwegian-Bokmaal
 
cd UD_Norwegian-Bokmaal
cat no_bokmaal-ud-train.conllu |../udpipe --train nob.udpipe
 
 
</pre>
 
</pre>
   
  +
;Train a default model
test:
 
  +
  +
With tokeniser and tagger:
 
<pre>
 
<pre>
cat no_bokmaal-ud-test.conllu |cut -f1-6 | sed 's/$/\t_\t_\t_\t_/g' | sed 's/^\t.*//g'|../udpipe --parse nob.udpipe > output
+
cat no_bokmaal-ud-train.conllu | udpipe --train nob.udpipe
 
</pre>
 
</pre>
   
test (with tokeniser):
+
Without tokeniser and tagger:
  +
<pre>
 
cat no_bokmaal-ud-train.conllu | udpipe --tokenizer none --tagger none --train nob.udpipe
  +
</pre>
   
  +
; Parse some input
  +
  +
With gold standard POS tags:
 
<pre>
 
<pre>
echo "Det ligger en bok bordet." | ../udpipe --tokenize --tag --parse nob.udpipe
+
cat no_bokmaal-ud-dev.conllu |cut -f1-6 | sed 's/$/\t_\t_\t_\t_/g' | sed 's/^\t.*//g'|../udpipe --parse nob.udpipe > output
 
</pre>
 
</pre>
   
  +
Full pipeline:
accuracy
 
  +
<pre>
  +
echo "Det ligger en bok på bordet." | ../udpipe --tokenize --tag --parse nob.udpipe
  +
</pre>
  +
  +
; Calculate accuracy
   
 
<pre>
 
<pre>
../udpipe --accuracy --parse nob.udpipe no_bokmaal-ud-dev.conllu
+
udpipe --accuracy --parse nob.udpipe no_bokmaal-ud-dev.conllu
 
</pre>
 
</pre>
  +
  +
==Parameters==
  +
  +
[[Category:Tools|*]]

Revision as of 08:10, 23 March 2017

First things first

Get the code!
git clone https://github.com/ufal/udpipe
cd udpipe/src
make

Now copy the udpipe/src/udpipe binary executable to somewhere in your $PATH.
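One way to do the copy, as a minimal sketch. It assumes that `~/.local/bin` is (or will be added to) your `$PATH`; that directory and the helper name `install_udpipe` are illustrative choices, not part of UDPipe.

```shell
#!/bin/sh
# install_udpipe SRC [DEST_DIR]: copy a built binary into a directory on $PATH.
# DEST_DIR defaults to ~/.local/bin (an assumption; any directory in $PATH works).
install_udpipe() {
    src=$1
    dest_dir=${2:-"$HOME/.local/bin"}
    # Refuse to continue if the build did not produce the binary.
    [ -f "$src" ] || { echo "no such file: $src" >&2; return 1; }
    mkdir -p "$dest_dir" &&
    cp "$src" "$dest_dir/udpipe" &&
    chmod +x "$dest_dir/udpipe"
}

# Usage, from the directory where udpipe was cloned:
#   install_udpipe udpipe/src/udpipe
#   command -v udpipe   # prints a path once the directory is in $PATH
```

If `command -v udpipe` prints nothing afterwards, the destination directory is probably not in `$PATH` yet.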

Get some data!
git clone https://github.com/UniversalDependencies/UD_Norwegian-Bokmaal
cd UD_Norwegian-Bokmaal

Train a default model

With tokeniser and tagger (the default; trains the tokeniser, tagger and parser):

cat no_bokmaal-ud-train.conllu | udpipe --train nob.udpipe

Without tokeniser and tagger (parser only):

cat no_bokmaal-ud-train.conllu | udpipe --tokenizer none --tagger none --train nob.udpipe

Parse some input

With gold-standard POS tags (columns 1-6 of the dev set are kept and the remaining columns are blanked before parsing):

cat no_bokmaal-ud-dev.conllu | cut -f1-6 | sed 's/$/\t_\t_\t_\t_/g' | sed 's/^\t.*//g' | udpipe --parse nob.udpipe > output
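To see what the `cut`/`sed` filters do before the data reaches the parser, here is a sketch that runs them on a two-token fragment. The function name `strip_parse_columns` and the sample sentence are hand-written for illustration and are not taken from the treebank. The `\t` escape in the `sed` expressions assumes GNU sed.

```shell
#!/bin/sh
# Keep CoNLL-U columns 1-6 (ID, FORM, LEMMA, UPOS, XPOS, FEATS) and
# replace columns 7-10 (HEAD, DEPREL, DEPS, MISC) with "_".
strip_parse_columns() {
    cut -f1-6 |
    sed 's/$/\t_\t_\t_\t_/' |   # append four "_" columns to every line
    sed 's/^\t.*//'             # blank separator lines became "\t_\t_..."; empty them again
}

# Hand-written, TAB-separated sample for illustration only.
printf '1\tBoken\tbok\tNOUN\t_\tDefinite=Def\t2\tnsubj\t_\t_\n2\tligger\tligge\tVERB\t_\t_\t0\troot\t_\t_\n\n' |
    strip_parse_columns
```

Both token lines come out with `_` in the last four columns, and the blank line that ends the sentence stays blank. Comment lines starting with `#` contain no TAB, so `cut` passes them through whole and the first `sed` appends the four columns to them as well.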

Full pipeline:

echo "Det ligger en bok på bordet." | udpipe --tokenize --tag --parse nob.udpipe
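The parser writes CoNLL-U, which is hard to read at a glance. The following sketch prints one line per dependency arc instead. The helper name `show_arcs` is hypothetical, and the script only assumes the standard CoNLL-U column order (FORM in column 2, HEAD in column 7, DEPREL in column 8).

```shell
#!/bin/sh
# show_arcs: read CoNLL-U on stdin, print "DEPENDENT --DEPREL--> HEAD" per token.
show_arcs() {
    awk -F '\t' '
        # Print the arcs of the sentence collected so far, then reset.
        function emit(   i) {
            for (i = 1; i <= n; i++)
                printf "%s --%s--> %s\n", form[id[i]], rel[i], (head[i] == 0 ? "ROOT" : form[head[i]])
            n = 0
            split("", form)
        }
        /^#/    { next }             # skip comment lines
        NF == 0 { emit(); next }     # a blank line ends a sentence
        $1 ~ /^[0-9]+$/ {            # ordinary tokens only (no 1-2 ranges, no 1.1 empty nodes)
            n++
            id[n] = $1; form[$1] = $2; head[n] = $7; rel[n] = $8
        }
        END { emit() }               # handle input without a final blank line
    '
}

# Usage:
#   echo "Det ligger en bok på bordet." | udpipe --tokenize --tag --parse nob.udpipe | show_arcs
```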

Calculate accuracy
udpipe --accuracy --parse nob.udpipe no_bokmaal-ud-dev.conllu
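As an independent cross-check, attachment scores can also be computed directly from a gold file and a parsed file such as the `output` file written earlier. This is a sketch under stated assumptions, not UDPipe's own evaluation code: the helper name `uas_las` is hypothetical, both files must have identical tokenisation, and every ordinary token (punctuation included) is counted, so the numbers may not match UDPipe's report exactly.

```shell
#!/bin/sh
# uas_las GOLD PRED: compare HEAD (column 7) and DEPREL (column 8) token by token.
# UAS = share of tokens with the correct head;
# LAS = share of tokens with the correct head and the correct relation label.
uas_las() {
    awk -F '\t' '
        $1 !~ /^[0-9]+$/ { next }    # skip comments, blank lines, ranges and empty nodes
        NR == FNR { g++; ghead[g] = $7; grel[g] = $8; next }    # first file: gold
        { p++; if ($7 == ghead[p]) { uas++; if ($8 == grel[p]) las++ } }
        END {
            if (g == 0 || g != p) {
                print "token counts differ or are zero" > "/dev/stderr"
                exit 1
            }
            printf "UAS: %.2f%%, LAS: %.2f%%\n", 100 * uas / g, 100 * las / g
        }
    ' "$1" "$2"
}

# Usage, with the file produced by the gold-standard POS tag run:
#   uas_las no_bokmaal-ud-dev.conllu output
```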

Parameters