Shallow syntactic function labeller

This is a Google Summer of Code 2017 project.

Architecture

1. The labeller takes a string in Apertium stream format with morphological tags:

^vino<n><m><sg>$ = INPUT

2. Parses it into a sequence of morphological tags:

<n><m><sg>

3. Loads the trained model for this language, which is stored in the same directory as either a .json file or a .pkl file.

4. The algorithm analyzes the string and outputs a sequence of syntactic tags:

<@nsubj>

5. The labeller applies the predicted labels to the original string:

^vino<n><m><sg><@nsubj>$ = OUTPUT
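
Steps 1, 2 and 5 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the names `parse_units` and `apply_labels` are hypothetical, and the regular expression assumes disambiguated input with one analysis per unit and no escaped characters.

```python
import re

# One lexical unit in Apertium stream format: ^lemma<tag><tag>...$
# Assumption: a single analysis per unit and no escaped characters.
UNIT_RE = re.compile(r"\^([^<$]*)((?:<[^>]+>)*)\$")
TAG_RE = re.compile(r"<[^>]+>")


def parse_units(stream):
    """Return one (lemma, [tags]) pair for each ^...$ unit in the stream."""
    return [
        (match.group(1), TAG_RE.findall(match.group(2)))
        for match in UNIT_RE.finditer(stream)
    ]


def apply_labels(stream, labels):
    """Append one syntactic label to each unit, just before its closing $."""
    n_units = len(UNIT_RE.findall(stream))
    if n_units != len(labels):
        raise ValueError(f"expected {n_units} labels, got {len(labels)}")
    remaining = iter(labels)
    # Drop the trailing "$", add the label, then restore the "$".
    return UNIT_RE.sub(
        lambda match: match.group(0)[:-1] + next(remaining) + "$", stream
    )


if __name__ == "__main__":
    print(parse_units("^vino<n><m><sg>$"))
    print(apply_labels("^vino<n><m><sg>$", ["<@nsubj>"]))
```

With the example above, `parse_units` yields the tag sequence `<n>`, `<m>`, `<sg>` for `vino`, and `apply_labels` produces `^vino<n><m><sg><@nsubj>$`. Text between units is left unchanged.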

So, in the end, the deliverable will consist of two parts: the labeller module itself and a file containing the trained model.
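
As an illustration of how the module might locate its model file, here is a hedged Python sketch. The function name `load_model` and the `<language code>.json` / `<language code>.pkl` naming scheme are assumptions made for this example. The page only states that the model sits in the same directory and is a .json or .pkl file.

```python
import json
import pickle
from pathlib import Path


def load_model(directory, lang):
    """Load <lang>.json or <lang>.pkl from the directory, whichever exists.

    The file naming scheme is hypothetical; adapt it to the real layout.
    """
    base = Path(directory)
    json_path = base / f"{lang}.json"
    pkl_path = base / f"{lang}.pkl"
    if json_path.exists():
        with json_path.open(encoding="utf-8") as handle:
            return json.load(handle)
    if pkl_path.exists():
        # Only unpickle trusted files: pickle can execute arbitrary code.
        with pkl_path.open("rb") as handle:
            return pickle.load(handle)
    raise FileNotFoundError(f"no model file for '{lang}' in {base}")
```

A caller would then do something like `model = load_model(Path(__file__).parent, "sme")`. What the returned object looks like depends on how the model was saved.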

Workplan

Week Dates To do
1 30th May — 5th June
  • Handling discrepancies between Apertium sme-nob and Sami corpus tagsets
  • Writing a script for parsing Sami corpus
2 6th June — 12th June
3 13th June — 19th June
4 20th June — 26th June
First evaluation: ready-to-use datasets
5 27th June — 3rd July
  • Building the model
6 4th July — 10th July
  • Training the classifier
  • Evaluating the quality of the prototype
7 11th July — 17th July
  • Further training
  • Working on improvements of the model
8 18th July — 24th July
  • Final testing
  • Writing a script that applies labels to the original string in Apertium stream format
Second evaluation: well-trained model at least for North Sami
9 25th July — 31st July
  • Collecting all parts of the labeller together
  • Replacing the syntax labelling part of the sme-nob CG module with the machine-learned module in order to test it
10 1st August — 7th August
  • Replacing the syntax labelling part of the sme-nob CG module with the machine-learned module in order to test it
11 8th August — 14th August
  • Testing
  • Fixing bugs
12 15th August — 21st August
  • Cleaning up the code
  • Writing documentation
Final evaluation: the prototype shallow syntactic function labeller