Latest revision as of 00:30, 15 April 2017

Project title

Chukchi morphological analyser using HFST

Contacts

Vasilisa Andriyanets
blindedbysunshine@gmail.com
github.com/basilisandr
bas_____ on IRC
Moscow (GMT+3)

CCh

Link to GitHub: https://github.com/BasilisAndr/chkchn/blob/master/tables

Synopsis

Chukchi is a language with rich and complicated morphology, including incorporation.
So far, morphological parsers based on regular expressions have not been able to handle it properly, and the platforms themselves were not very user-friendly (no documentation whatsoever).
HFST (Helsinki Finite-State Technology) offers more possibilities than regular expressions for both analysing and generating Chukchi forms.
Apertium is, on the one hand, a platform that uses HFST and, on the other hand, a community that is interested in minority languages.
Chukchi is a minority language of Russia that needs a transducer-based morphological parser, so this seems like a perfect match.

Deliverables

Anticipated result: a morphological analyser for Chukchi that is

well-documented
easy to use

and that handles

nouns
verbs
incorporation (probably)

as they occur in a collection of Chukchi texts.
Viewed from another angle, it will be a simple tool for automated glossing of Chukchi texts, with Russian as the metalanguage.

Benefits

If I get selected, the result of this work would be of great use to linguists investigating Chukchi and an important building block for a morphologically annotated corpus of Chukchi that could be easily updated with automated glosses.
It would also lay the groundwork for future machine translation between Chukchi and Russian.

Timeline

Post-application period

Investigation time:

get to know HFST better
get a full picture of Chukchi morphology
improve skills in building finite-state transducers
make some test cases to aid further development
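
The test cases mentioned in the last item could be kept as a simple mapping from word forms to their expected analyses. The following is only a minimal sketch: `run_test_cases`, the `analyse` callable, and the toy lexicon are hypothetical names introduced for illustration, and the forms and tags are placeholders, not real Chukchi data or the final tagset.

```python
def run_test_cases(cases, analyse):
    """Return a list of (form, expected, got) tuples for the failing cases.

    `cases` maps a surface form to the set of analyses it should receive.
    `analyse` is a hypothetical callable wrapping the analyser; it returns
    a list of analysis strings (empty if the form is not recognised).
    """
    failures = []
    for form, expected in cases.items():
        got = set(analyse(form))
        if got != expected:
            failures.append((form, expected, got))
    return failures


# Placeholder data only: a toy lexicon stands in for the real transducer.
toy_lexicon = {"aaa": ["aaa<n>"], "bbb": ["bbb<v>"]}


def toy_analyse(form):
    return toy_lexicon.get(form, [])


test_cases = {
    "aaa": {"aaa<n>"},   # passes with the toy lexicon
    "ccc": {"ccc<n>"},   # fails: the toy lexicon does not know this form
}

for form, expected, got in run_test_cases(test_cases, toy_analyse):
    print(f"FAIL {form}: expected {sorted(expected)}, got {sorted(got)}")
```

Rerunning such a check after every change to the transducer would show quickly whether a new rule broke a form that was already analysed correctly.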

Community bonding period

start working with nouns

Work period

The most straightforward way to set weekly goals is by the percentage of word forms in the corpus (the collection of texts) that the analyser covers, so the timeline goes roughly like this:

Week 1 40% coverage of the corpus forms
Week 2 55%
Week 3 65%
Week 4 75%

Milestone #1 75% coverage of the corpus

Week 5 80%
Week 6 83%
Week 7 86%
Week 8 90%

Milestone #2 90% coverage of the corpus

Week 9 92%
Week 10 94%
Week 11 96%
Week 12 98% coverage

The corpus is not very large, so hopefully I will be able to analyse all or almost all of the forms.
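
The coverage figures in the timeline can be measured by running every word form of the corpus through the analyser and counting how many receive at least one analysis. The sketch below is only an illustration under stated assumptions: `coverage` and the `analyse` callable are hypothetical names, the data are placeholders rather than real Chukchi, and since the timeline does not say whether coverage is counted per running token or per distinct form, both are computed.

```python
from collections import Counter


def coverage(tokens, analyse):
    """Return (token_coverage, type_coverage) as fractions between 0 and 1.

    `analyse` is a hypothetical callable that returns a list of analyses
    for a word form (an empty list when the form is not recognised).
    """
    counts = Counter(tokens)
    if not counts:
        return 0.0, 0.0
    # A distinct form counts as covered if it gets at least one analysis.
    covered_types = {form for form in counts if analyse(form)}
    covered_tokens = sum(counts[form] for form in covered_types)
    return covered_tokens / sum(counts.values()), len(covered_types) / len(counts)


# Placeholder data only: a toy lexicon stands in for the real transducer.
toy_lexicon = {"aaa": ["aaa<n>"], "bbb": ["bbb<v>"]}
toy_corpus = ["aaa", "aaa", "bbb", "ccc"]

token_cov, type_cov = coverage(toy_corpus, lambda form: toy_lexicon.get(form, []))
print(f"token coverage: {token_cov:.0%}, type coverage: {type_cov:.0%}")
# prints "token coverage: 75%, type coverage: 67%"
```

In the toy corpus three of the four running tokens and two of the three distinct forms are analysed, which gives the two figures printed above.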

Personal information

Skills and Qualifications

4 years of study in Fundamental and Applied Linguistics; bachelor's degree in linguistics at NRU HSE, Moscow, Russia, almost completed.
Languages: Russian (native), English (advanced), German (intermediate), Yiddish (intermediate), Norwegian (intermediate), French (elementary)
Programming skills: Python, R, bash

Non-GSoC summer plans

I will be writing my bachelor's thesis until mid-June, so until then I will only be able to spend 10-15 hours per week on the project.
I am also attending a conference on 9-15 July, so I will be able to spend 15-20 hours on the project that week.
Apart from that, I will work on the project full-time, up to 50 hours a week.


Difference between revisions of "User:Bandrandr/proposal"
