Shallow syntactic function labeller

From Apertium
Revision as of 06:50, 15 August 2017

This is a Google Summer of Code 2017 project.

A repository for the whole project: https://github.com/deltamachine/shallow_syntactic_function_labeller

A workplan and progress notes can be found here: Shallow syntactic function labeller/Workplan

Description

The shallow syntactic function labeller takes a string in Apertium stream format, parses it into a sequence of morphological tags and passes that sequence to a classifier. The classifier is a simple RNN model trained on datasets prepared from parsed, syntax-labelled corpora (mostly UD treebanks). It outputs a sequence of syntactic function labels, which the labeller then applies to the original string.
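A minimal sketch of the two string-handling steps around the classifier, assuming each lexical unit has a single reading after disambiguation (^surface/lemma<tags>$) and that a label is written back as an extra tag of the hypothetical form <@LABEL>. The function names, the label format and the example labels are illustrative, not taken from the project's code, and the RNN itself is left out:

```python
import re
from typing import List

# One lexical unit with a single reading: ^surface/lemma<tag1><tag2>...$
# (assumes no escaped '/', '<' or '$' inside the surface form or lemma)
LU_RE = re.compile(r"\^([^/$]*)/([^<$]*)((?:<[^>]+>)*)\$")
TAG_RE = re.compile(r"<([^>]+)>")


def extract_tags(stream: str) -> List[List[str]]:
    """Return the morphological tags of each lexical unit, in order."""
    return [TAG_RE.findall(m.group(3)) for m in LU_RE.finditer(stream)]


def apply_labels(stream: str, labels: List[str]) -> str:
    """Append one label, as a hypothetical <@LABEL> tag, to each lexical unit."""
    label_iter = iter(labels)

    def add_label(m: "re.Match") -> str:
        return f"^{m.group(1)}/{m.group(2)}{m.group(3)}<@{next(label_iter)}>$"

    # Text between lexical units (spaces, punctuation blanks) is left untouched.
    return LU_RE.sub(add_label, stream)


stream = "^Mun/mun<Pron><Pers><Sg1><Nom>$ ^boađán/boahtit<V><IV><Ind><Prs><Sg1>$"
print(extract_tags(stream))
# → [['Pron', 'Pers', 'Sg1', 'Nom'], ['V', 'IV', 'Ind', 'Prs', 'Sg1']]
print(apply_labels(stream, ["SUBJ", "FMV"]))
# → ^Mun/mun<Pron><Pers><Sg1><Nom><@SUBJ>$ ^boađán/boahtit<V><IV><Ind><Prs><Sg1><@FMV>$
```

In the real labeller the list passed to apply_labels would come from the classifier's prediction on the output of extract_tags; here it is hard-coded.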

Labeller in the pipeline

In sme-nob, the labeller runs between sme-nob-disam and sme-nob-pretransfer, taking the place of the original syntax module.

... | cg-proc 'sme-nob.mor.rlx.bin' | python 'sme-nob-labeller.py' | apertium-pretransfer | lt-proc -b 'sme-nob.autobil.bin' | ...

In other language pairs, it may run between the morphological analyzer and pretransfer.
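Because the labeller sits between two `|` stages, it only needs to behave as a filter: read the stream on standard input and write the labelled stream to standard output. A minimal sketch under two assumptions: the input is processed one line at a time (the real script may chunk its input differently), and passthrough is a hypothetical placeholder for the actual tags-to-RNN-to-labels step:

```python
import sys
from typing import Callable, Iterable, Iterator


def passthrough(chunk: str) -> str:
    """Hypothetical placeholder for the real labelling step."""
    return chunk


def label_lines(lines: Iterable[str], label: Callable[[str], str] = passthrough) -> Iterator[str]:
    """Label the stream one line at a time (an assumption about chunking)."""
    for line in lines:
        yield label(line)


def main() -> None:
    # Like any other stage in a mode pipeline: stream in on stdin, stream out on stdout.
    sys.stdout.writelines(label_lines(sys.stdin))
    # Flush so the next stage (apertium-pretransfer) receives the output promptly.
    sys.stdout.flush()
```

A script built this way would call main() under the usual `if __name__ == "__main__":` guard, so that `python 'sme-nob-labeller.py'` can be dropped into the pipeline in place of a cg-proc stage.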

Prerequisites

1. Python libraries:

2. Precompiled language pairs which support the labeller (sme-nob)

Installation

Currently, only a test version for the sme-nob pair is available.

git clone https://github.com/deltamachine/sme-nob_testpack.git
cd sme-nob_testpack

The script install_labeller.py adds all the needed files to the apertium-sme-nob directory and updates all the mode files.

Arguments:

  • apertium_path: path to your apertium-sme-nob directory
  • python_path: path to the current Python interpreter (NB: if you just type "python" instead of the full path, some dependencies might not work)
  • work_mode: -install for installing the labeller and changing modes, -change for just changing modes.
  • type_of_change: -lb for using the labeller in the pipeline, -cg for using the original syntax module (sme-nob.syn.rlx.bin) in the pipeline.
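What "changing modes" with -lb or -cg amounts to can be sketched as swapping one stage of a mode pipeline. The stage strings below are assumptions modelled on the pipeline shown earlier (the real .mode files may use other flags and paths), and switch_syntax_stage is a hypothetical helper, not a function from install_labeller.py:

```python
# Hypothetical stage strings; the real .mode files may differ.
CG_SYNTAX_STAGE = "cg-proc 'sme-nob.syn.rlx.bin'"
LABELLER_STAGE_TEMPLATE = "{python} 'sme-nob-labeller.py'"


def switch_syntax_stage(mode_line: str, type_of_change: str, python_path: str) -> str:
    """Swap the syntax stage: -lb puts the labeller in, -cg restores the CG syntax module."""
    labeller_stage = LABELLER_STAGE_TEMPLATE.format(python=python_path)
    if type_of_change == "-lb":
        old, new = CG_SYNTAX_STAGE, labeller_stage
    elif type_of_change == "-cg":
        old, new = labeller_stage, CG_SYNTAX_STAGE
    else:
        raise ValueError(f"unknown type_of_change: {type_of_change!r}")
    # Split the pipeline into stages, replace the matching one, and re-join.
    stages = [stage.strip() for stage in mode_line.split("|")]
    return " | ".join(new if stage == old else stage for stage in stages)


python_path = "/home/user/anaconda3/bin/python"
mode = ("lt-proc 'sme-nob.automorf.bin' | cg-proc 'sme-nob.mor.rlx.bin' | "
        "cg-proc 'sme-nob.syn.rlx.bin' | apertium-pretransfer")
with_labeller = switch_syntax_stage(mode, "-lb", python_path)
restored = switch_syntax_stage(with_labeller, "-cg", python_path)
```

Applying -lb and then -cg returns the original pipeline, which mirrors running the install script with -install -lb and later with -change -cg.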


For example, this command installs the labeller and adds it to the pipeline:

python install_labeller.py /home/user/apertium/apertium-sme-nob /home/user/anaconda3/bin/python -install -lb

And this command reverts the mode changes, restoring the original syntax module in the pipeline:

python install_labeller.py /home/user/apertium/apertium-sme-nob /home/user/anaconda3/bin/python -change -cg

To do

  • Add an ability to handle more than one sentence (done).
  • Do more tests. MORE.
  • Write docstrings and refactor the main code.
  • Clean up the GitHub repository before the final evaluation.
  • Continue improving the performance of the models.