Shallow syntactic function labeller
Revision as of 06:50, 15 August 2017
This is a Google Summer of Code 2017 project.
A repository for the whole project: https://github.com/deltamachine/shallow_syntactic_function_labeller
A workplan and progress notes can be found here: Shallow syntactic function labeller/Workplan
Description
The shallow syntactic function labeller takes a string in Apertium stream format, parses it into a sequence of morphological tags and passes that sequence to a classifier. The classifier is a simple RNN model trained on datasets prepared from parsed, syntactically labelled corpora (mostly UD treebanks). The classifier analyzes the given sequence of morphological tags and outputs a sequence of labels, which the labeller then applies to the original string.
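The three steps above (extract tags, classify, write labels back) can be sketched roughly as follows. This is only an illustration and not the project's code: it uses a plain regular expression instead of Streamparser, assumes one reading per lexical unit and no escaped characters, and replaces the RNN with a hypothetical rule-based stand-in. The function names, the label names and the choice to append each label as an extra tag are all assumptions.

```python
import re

# One disambiguated lexical unit: ^surface/lemma<tag1><tag2>...$
# Assumption: a single reading per unit and no escaped characters.
LU_RE = re.compile(r"\^([^/$]*)/([^<$]*)((?:<[^>]+>)*)\$")
TAG_RE = re.compile(r"<([^>]+)>")


def extract_tag_sequences(stream):
    """Return one list of morphological tags per lexical unit."""
    return [TAG_RE.findall(m.group(3)) for m in LU_RE.finditer(stream)]


def dummy_classifier(tag_sequences):
    """Hypothetical stand-in for the RNN: one label per tag sequence."""
    labels = []
    for tags in tag_sequences:
        if "V" in tags:
            labels.append("@FMV")    # placeholder label name
        elif "Nom" in tags:
            labels.append("@SUBJ")   # placeholder label name
        else:
            labels.append("@X")      # placeholder label name
    return labels


def apply_labels(stream, labels):
    """Append each label as an extra tag to the matching lexical unit."""
    if len(labels) != len(LU_RE.findall(stream)):
        raise ValueError("need exactly one label per lexical unit")
    remaining = iter(labels)

    def add_label(match):
        surface, lemma, tags = match.groups()
        return "^{}/{}{}<{}>$".format(surface, lemma, tags, next(remaining))

    return LU_RE.sub(add_label, stream)


# Illustrative input, written by hand for this sketch.
example = "^Mun/mun<Pron><Pers><Sg1><Nom>$ ^lean/leat<V><IV><Ind><Prs><Sg1>$"
labelled = apply_labels(example, dummy_classifier(extract_tag_sequences(example)))
print(labelled)
```

Text between lexical units (here a single space) is left untouched, so the output remains a valid stream for the next stage.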
Labeller in the pipeline
In sme-nob the labeller runs between sme-nob-disam and sme-nob-pretransfer, in the same position as the original syntax module.
... | cg-proc 'sme-nob.mor.rlx.bin' | python 'sme-nob-labeller.py' | apertium-pretransfer | lt-proc -b 'sme-nob.autobil.bin' | ...
In other language pairs it may run between the morphological analyzer and pretransfer.
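Because the labeller sits in the middle of a shell pipeline, it has to behave as a filter: read the stream from standard input and write the labelled stream to standard output. A minimal sketch of that contract is shown below. The names run_filter and label_line are hypothetical, and label_line is a placeholder that returns its input unchanged rather than the real labelling step.

```python
import io


def label_line(line):
    """Placeholder for the real labelling step; returns the line unchanged."""
    return line


def run_filter(source, sink, transform=label_line):
    """Copy every line from source to sink, passing it through transform."""
    for line in source:
        sink.write(transform(line))
    sink.flush()


# In a pipeline this would be called as run_filter(sys.stdin, sys.stdout).
# Here in-memory streams are used so the sketch can run on its own.
demo_in = io.StringIO("^Mun/mun<Pron><Pers><Sg1><Nom>$\n")
demo_out = io.StringIO()
run_filter(demo_in, demo_out)
print(demo_out.getvalue(), end="")
```

Keeping the transformation as a parameter makes it possible to test the input and output handling separately from the classifier.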
Prerequisites
1. Python libraries:
- DyNet (installation instructions can be found here: http://dynet.readthedocs.io/en/latest/python.html)
- Streamparser (https://github.com/goavki/streamparser)
2. Precompiled language pairs which support the labeller (sme-nob)
Installation
Currently only a test version for the sme-nob pair is available.
git clone https://github.com/deltamachine/sme-nob_testpack.git
cd sme-nob_testpack
The script install_labeller.py adds all the needed files to the apertium-sme-nob directory and changes all files that contain modes.
Arguments:
- apertium_path: path to your apertium-sme-nob directory
- python_path: path to the current Python interpreter (NB: if you just type "python" instead of the full path, some dependencies might not work)
- work_mode: -install for installing the labeller and changing modes, -change for just changing modes.
- type_of_change: -lb for using the labeller in the pipeline, -cg for using the original syntax module (sme-nob.syn.rlx.bin) in the pipeline.
For example, this command will install the labeller and add it to the pipeline:
python install_labeller.py /home/user/apertium/apertium-sme-nob /home/user/anaconda3/bin/python -install -lb
And this command will revert the mode changes:
python install_labeller.py /home/user/apertium/apertium-sme-nob /home/user/anaconda3/bin/python -change -cg
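The argument convention described above (four positional arguments, the last two being dash-prefixed keywords) could be validated along the following lines. This is a sketch under stated assumptions, not the actual contents of install_labeller.py: the function name parse_install_args and the error messages are hypothetical, and only the four argument names and their allowed values are taken from this page.

```python
# Allowed values, as listed in the argument description on this page.
WORK_MODES = ("-install", "-change")
CHANGE_TYPES = ("-lb", "-cg")


def parse_install_args(argv):
    """Validate the four positional arguments and return them as a dict.

    argv is the argument list without the script name, for example
    sys.argv[1:].
    """
    if len(argv) != 4:
        raise ValueError(
            "expected: apertium_path python_path work_mode type_of_change"
        )
    apertium_path, python_path, work_mode, type_of_change = argv
    if work_mode not in WORK_MODES:
        raise ValueError("work_mode must be one of: " + ", ".join(WORK_MODES))
    if type_of_change not in CHANGE_TYPES:
        raise ValueError(
            "type_of_change must be one of: " + ", ".join(CHANGE_TYPES)
        )
    return {
        "apertium_path": apertium_path,
        "python_path": python_path,
        "work_mode": work_mode,
        "type_of_change": type_of_change,
    }


args = parse_install_args([
    "/home/user/apertium/apertium-sme-nob",
    "/home/user/anaconda3/bin/python",
    "-install",
    "-lb",
])
print(args["work_mode"], args["type_of_change"])
```

The arguments are checked by hand rather than with argparse because values such as -install and -lb begin with a dash, which argparse would normally interpret as option flags rather than positional values.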
To do
- Add an ability to handle more than one sentence. (done)
- Do more tests. MORE.
- Write docstrings and refactor the main code.
- Take the trash out of the GitHub repository before the final evaluation.
- Continue improving the performance of the models.