Difference between revisions of "UDPipe"

From Apertium

Revision as of 08:11, 23 March 2017

First things first

Get the code!
git clone https://github.com/ufal/udpipe
cd udpipe/src
make

Now copy the udpipe/src/udpipe binary to a directory in your $PATH.

Get some data!
git clone https://github.com/UniversalDependencies/UD_Norwegian-Bokmaal
cd UD_Norwegian-Bokmaal

Train a default model

With tokeniser and tagger:

cat no_bokmaal-ud-train.conllu | udpipe --train nob.udpipe

Without tokeniser and tagger:

cat no_bokmaal-ud-train.conllu | udpipe --tokenizer none --tagger none --train nob.udpipe

Parse some input

With gold standard POS tags:

cat no_bokmaal-ud-dev.conllu | cut -f1-6 | sed 's/$/\t_\t_\t_\t_/g' | sed 's/^\t.*//g' | udpipe --parse nob.udpipe > output
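
The cut/sed steps above keep the first six CoNLL-U columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS) and blank out the last four (HEAD, DEPREL, DEPS, MISC), so the parser sees gold tags but no gold tree. A rough awk equivalent is sketched below. The function name strip_parse is made up for this example, and it assumes tab-separated, 10-column CoNLL-U input. Unlike the sed version, it leaves comment lines untouched.

```shell
# strip_parse: hypothetical helper, not part of UDPipe.
# Reads CoNLL-U on stdin, keeps columns 1-6 and replaces columns 7-10 with "_".
# Comment lines (starting with "#") and blank sentence separators pass through as-is.
strip_parse() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^#/ || NF == 0 { print; next }
       { print $1, $2, $3, $4, $5, $6, "_", "_", "_", "_" }'
}

# Possible usage, assuming udpipe is in your $PATH:
#   strip_parse < no_bokmaal-ud-dev.conllu | udpipe --parse nob.udpipe > output
```

This also avoids relying on \t inside sed expressions, which not every sed implementation understands.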

Full pipeline:

echo "Det ligger en bok på bordet." | udpipe --tokenize --tag --parse nob.udpipe

Calculate accuracy

udpipe --accuracy --parse nob.udpipe no_bokmaal-ud-dev.conllu
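
Parser accuracy is usually reported as UAS (the share of tokens with the correct head) and LAS (correct head and correct relation label). To make those two numbers concrete, here is a minimal sketch that computes them from a gold file and a parsed file. The function name uas_las is hypothetical, and it assumes both files have the same tokenisation and line up line for line, as with the gold-tag run above. It is an illustration only, not a replacement for udpipe --accuracy.

```shell
# uas_las GOLD PRED: hypothetical helper, not part of UDPipe.
# Pastes the two CoNLL-U files side by side, so gold columns are 1-10
# and predicted columns are 11-20, then compares HEAD (7 vs 17) and DEPREL (8 vs 18).
# Only lines whose ID is a plain integer are counted; comments, blank lines,
# multiword token ranges (1-2) and empty nodes (1.1) are skipped.
uas_las() {
  paste "$1" "$2" | awk -F'\t' '
    $1 ~ /^[0-9]+$/ && NF >= 20 {
      n++
      if ($7 == $17) {
        uas++
        if ($8 == $18) las++
      }
    }
    END {
      if (n) printf "tokens: %d, UAS: %.2f%%, LAS: %.2f%%\n", n, 100 * uas / n, 100 * las / n
    }'
}

# Possible usage, with "output" produced by the gold-tag parsing run:
#   uas_las no_bokmaal-ud-dev.conllu output
```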

Parameters