Shallow syntactic function labeller
This is a Google Summer of Code 2017 project.
Architecture
1. The labeller takes a string in Apertium stream format with morphological tags:
^vino<n><m><sg>$ = INPUT
2. It parses the string into a sequence of morphological tags:
<n><m><sg>
3. It restores the model for this language, which is stored in the same directory as a .json or .pkl file.
4. The algorithm analyzes the tag sequence and outputs a sequence of syntactic tags:
<@nsubj>
5. The labeller appends the predicted labels to the original string:
^vino<n><m><sg><@nsubj>$ = OUTPUT
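The five steps above can be sketched in Python roughly as follows. This is a minimal sketch, not the project's implementation: it assumes every lexical unit looks like the example (`^lemma<tag>...$`, with no escaped `^`/`$` and no superblanks), and `parse_tags`, `label_stream` and `toy_predict` are hypothetical names. `toy_predict` is a placeholder lookup table standing in for the trained model.

```python
import re
from typing import Callable, List

# One lexical unit: ^...$ (assumes no escaped "^" or "$" and no superblanks inside).
LU_RE = re.compile(r"\^([^$]*)\$")
# One tag: <...>
TAG_RE = re.compile(r"<([^>]*)>")


def parse_tags(unit: str) -> List[str]:
    """Step 2: 'vino<n><m><sg>' -> ['n', 'm', 'sg']."""
    return TAG_RE.findall(unit)


def label_stream(stream: str,
                 predict: Callable[[List[List[str]]], List[str]]) -> str:
    """Steps 1-5: parse units, predict one label per unit, append it as a tag."""
    units = LU_RE.findall(stream)
    labels = predict([parse_tags(u) for u in units])
    if len(labels) != len(units):
        raise ValueError("model must return exactly one label per lexical unit")
    label_iter = iter(labels)
    # Rewrite each unit in place so any text between units is kept unchanged.
    return LU_RE.sub(lambda m: f"^{m.group(1)}<{next(label_iter)}>$", stream)


def toy_predict(tag_seqs: List[List[str]]) -> List[str]:
    """Hypothetical stand-in for the trained model: label by first tag only."""
    table = {"n": "@nsubj", "vblex": "@root"}
    return [table.get(tags[0], "@dep") if tags else "@dep" for tags in tag_seqs]


if __name__ == "__main__":
    print(label_stream("^vino<n><m><sg>$", toy_predict))
    # ^vino<n><m><sg><@nsubj>$
```

Passing the real model's prediction function in place of `toy_predict` keeps the stream handling separate from the model, so the same wrapper could serve any language whose model can be loaded.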
The final result is therefore the module itself plus a file containing the model.
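Because the model ships as a separate .json or .pkl file next to the module, loading it can be dispatched on the file extension. This is a minimal sketch under assumptions: the `<lang>.json` / `<lang>.pkl` naming scheme and the helpers `find_model`, `save_model` and `load_model` are hypothetical, and the model is assumed to be a plain Python object.

```python
import json
import pickle
import tempfile
from pathlib import Path
from typing import Any


def find_model(lang: str, directory: str) -> Path:
    """Hypothetical naming scheme: <lang>.json or <lang>.pkl in `directory`."""
    for ext in (".json", ".pkl"):
        candidate = Path(directory) / f"{lang}{ext}"
        if candidate.exists():
            return candidate
    raise FileNotFoundError(f"no model for {lang!r} in {directory}")


def save_model(model: Any, path: str) -> None:
    """Write the model as JSON or pickle, chosen by the file extension."""
    p = Path(path)
    if p.suffix == ".json":
        p.write_text(json.dumps(model, ensure_ascii=False), encoding="utf-8")
    elif p.suffix == ".pkl":
        with p.open("wb") as f:
            pickle.dump(model, f)
    else:
        raise ValueError(f"unsupported model format: {p.suffix}")


def load_model(path: str) -> Any:
    """Step 3: restore a model saved by save_model."""
    p = Path(path)
    if p.suffix == ".json":
        return json.loads(p.read_text(encoding="utf-8"))
    if p.suffix == ".pkl":
        with p.open("rb") as f:
            return pickle.load(f)
    raise ValueError(f"unsupported model format: {p.suffix}")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        save_model({"n": "@nsubj"}, str(Path(tmp) / "sme.json"))
        print(load_model(str(find_model("sme", tmp))))  # {'n': '@nsubj'}
```

JSON keeps the model human-readable and portable but only stores dicts (with string keys), lists, strings, numbers, booleans and null, while pickle can store arbitrary Python objects. Pickle files should only be loaded from trusted sources, because unpickling can execute code.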
Workplan
| Week | Dates | To do |
|---|---|---|
| 1 | 30th May – 5th June | Handling possible discrepancies between tagsets; writing a script for parsing the Sami corpus |
| 2 | 6th June – 12th June | |
| 3 | 13th June – 19th June | |
| 4 | 20th June – 26th June | |
| First evaluation | | Ready-to-use datasets |
| 5 | 27th June – 3rd July | Building the model |
| 6 | 4th July – 10th July | |
| 7 | 11th July – 17th July | |
| 8 | 18th July – 24th July | |
| Second evaluation | | A well-trained model, at least for North Sami |
| 9 | 25th July – 31st July | |
| 10 | 1st August – 7th August | |
| 11 | 8th August – 14th August | |
| 12 | 15th August – 21st August | |
| Final evaluation | | The prototype shallow syntactic function labeller |