User:Ruthenian8/GSOC 2021 progress report
Revision as of 06:15, 13 July 2021 by Ruthenian8 (talk | contribs)
- Title: Morphological analyzer for Bagvalal
- Proposal: proposal
- Abstract: Bagvalal is an endangered, typologically rare Caucasian language of the Nakh-Daghestanian family. Its conservation and study are constrained by a shortage of NLP tools that can be used to process field data.
My proposal is to develop an FST-powered morphological analyzer for Bagvalal using all the available grammatical and lexical information. In the future, this project could allow Apertium to support morphological analysis for multiple Nakh-Daghestanian languages and to develop the corresponding language pairs.
- GitHub repo: bagvalal
Week | Intended changes | Status |
---|---|---|
Week 1 | Testing and refining the existing rules for the closed word classes (e.g. numerals, clitics and pronouns). | Complete |
Week 2 | Writing documentation and tests. | Complete |
Week 3 & 4 | Testing and refining the existing rules for the open word classes (e.g. verbs, nouns and adjectives). Writing documentation and tests. | Complete |
Week 5 | Adding the missing adjectives and adverbs from the available dictionaries (see the Resources section above). Testing the analysis results and the model performance. | In progress |
Week 6 | Adding the missing nouns. Testing the analysis results and the model performance. | In progress |
Week 7 & 8 | Adding the missing verbs, participles, converbs and masdars. Testing the analysis results and the model performance. | In progress |
Week 9 & 10 | Tokenizing the corpora. Converting the existing annotations to an appropriate format. Creating word-analysis pairs. Writing documentation. | In progress |
Week 11 | Removing false analyses from the model. Testing and debugging. Finishing the documentation. | In progress |
Week 12 | Running all the tests and debugging. | In progress |
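Several weeks of the schedule include "testing the analysis results and the model performance". One common way to do this is to measure naive coverage: the share of corpus tokens that receive at least one analysis. The sketch below is only an illustration, not the project's actual test code. It assumes the analyzer output is available as a string in the Apertium stream format (`^surface/analysis$`, with unknown tokens echoed back as `*surface`); the function name `naive_coverage` and the sample tokens and tags are hypothetical, and escaped characters in the stream are not handled.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# Simplification (assumption): escaped characters such as \/ or \$ are not handled.
LEXICAL_UNIT = re.compile(r"\^([^/$]*)((?:/[^/$]*)*)\$")


def naive_coverage(analyzer_output: str) -> float:
    """Return the share of tokens that received at least one analysis."""
    total = 0
    known = 0
    for match in LEXICAL_UNIT.finditer(analyzer_output):
        # Everything after the surface form is a slash-separated list of analyses.
        analyses = [a for a in match.group(2).split("/") if a]
        total += 1
        # An unknown token is echoed back with a leading asterisk.
        if analyses and not analyses[0].startswith("*"):
            known += 1
    return known / total if total else 0.0


if __name__ == "__main__":
    # Placeholder tokens and tags, not real Bagvalal data.
    sample = "^aaa/aaa<n><abs>$ ^bbb/*bbb$ ^ccc/ccc<v><pst>/ccc<n><abs>$"
    print(f"naive coverage: {naive_coverage(sample):.2f}")
```

Naive coverage only shows that a token got some analysis, not that the analysis is correct, so it complements rather than replaces the word-analysis pairs planned for weeks 9 and 10.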
Development log: Updates coming up