Difference between revisions of "User:Ruthenian8/GSOC 2021 progress report"
Ruthenian8 (talk | contribs) (Created page with "* '''Title''': Morphological analyzer for Bagvalal * '''Proposal''': [https://drive.google.com/file/d/1Y05eQtFP7ioz50z2GlUdvB2Edh4lel6G/view?usp=sharing proposal] * '''Abstrac...") |
Ruthenian8 (talk | contribs) (Add table) |
* '''Title''': Morphological analyzer for Bagvalal
* '''Proposal''': [https://drive.google.com/file/d/1Y05eQtFP7ioz50z2GlUdvB2Edh4lel6G/view?usp=sharing proposal]
* '''Abstract''': Bagvalal is an endangered, typologically rare Caucasian language from the Nakh-Daghestanian family. Its conservation and study are constrained by a lack of NLP tools that can be used to process field data. <br/>My proposal is to develop an FST-powered morphological analyzer for Bagvalal using all the available grammatical and lexical information. In the future, this project could allow Apertium to support morphological analysis for multiple Nakh-Daghestanian languages and to develop the corresponding language pairs.
* '''GitHub repo''': [https://github.com/ruthenian8/bagvalal bagvalal]
* '''Progress''': updates coming up
{| class="wikitable"
|-
! scope="col"| Week
! scope="col"| Intended changes
! scope="col"| Status
|-
! scope="row"| Week 1
| Testing and refining the existing rules for the closed word classes (e.g. numerals, clitics and pronouns).
| Complete
|-
! scope="row"| Week 2
| Writing documentation and tests.
| In progress
|-
! scope="row"| Week 3 & 4
| Testing and refining the existing rules for the open word classes (e.g. verbs, nouns and adjectives).<br/>Writing documentation and tests.
| In progress
|-
! scope="row"| Week 5
| Adding the missing adjectives and adverbs from the available dictionaries (see the Resources section above).<br/>Testing the analysis results and the model performance.
| In progress
|-
! scope="row"| Week 6
| Adding the missing nouns.<br/>Testing the analysis results and the model performance.
| In progress
|-
! scope="row"| Week 7 & 8
| Adding the missing verbs, participles, converbs and masdars.<br/>Testing the analysis results and the model performance.
| In progress
|-
! scope="row"| Week 9 & 10
| Tokenizing the corpora.<br/>Converting the existing annotations to an appropriate format.<br/>Creating word-analysis pairs.<br/>Writing documentation.
| In progress
|-
! scope="row"| Week 11
| Removing false analyses from the model.<br/>Testing and debugging.<br/>Finishing the work on the documentation.
| In progress
|-
! scope="row"| Week 12
| Running all the tests and debugging.
| In progress
|}
Revision as of 10:55, 18 June 2021
- Title: Morphological analyzer for Bagvalal
- Proposal: [proposal](https://drive.google.com/file/d/1Y05eQtFP7ioz50z2GlUdvB2Edh4lel6G/view?usp=sharing)
- Abstract: Bagvalal is an endangered, typologically rare Caucasian language from the Nakh-Daghestanian family. Its conservation and study are constrained by a lack of NLP tools that can be used to process field data.
  My proposal is to develop an FST-powered morphological analyzer for Bagvalal using all the available grammatical and lexical information. In the future, this project could allow Apertium to support morphological analysis for multiple Nakh-Daghestanian languages and to develop the corresponding language pairs.
- GitHub repo: [bagvalal](https://github.com/ruthenian8/bagvalal)
- Progress: updates coming up
Week | Intended changes | Status |
---|---|---|
Week 1 | Testing and refining the existing rules for the closed word classes (e.g. numerals, clitics and pronouns). | Complete |
Week 2 | Writing documentation and tests. | In progress |
Week 3 & 4 | Testing and refining the existing rules for the open word classes (e.g. verbs, nouns and adjectives). Writing documentation and tests. | In progress |
Week 5 | Adding the missing adjectives and adverbs from the available dictionaries (see the Resources section above). Testing the analysis results and the model performance. | In progress |
Week 6 | Adding the missing nouns. Testing the analysis results and the model performance. | In progress |
Week 7 & 8 | Adding the missing verbs, participles, converbs and masdars. Testing the analysis results and the model performance. | In progress |
Week 9 & 10 | Tokenizing the corpora. Converting the existing annotations to an appropriate format. Creating word-analysis pairs. Writing documentation. | In progress |
Week 11 | Removing false analyses from the model. Testing and debugging. Finishing the work on the documentation. | In progress |
Week 12 | Running all the tests and debugging. | In progress |
Development log: Updates coming up
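
Several weeks in the plan above include "testing the analysis results and the model performance". One common way to do this in Apertium projects is to measure naive coverage: the share of tokens that receive at least one analysis. The following is a minimal sketch of that measurement, not the project's actual test code. It assumes the analyzer output is in Apertium stream format (`^surface/analysis$`, with unknown words echoed back as `*surface`), it ignores backslash-escaped special characters, and the tokens and tags in `sample` are hypothetical placeholders rather than real Bagvalal data.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# Simplification: backslash-escaped "^", "$" and "/" are not handled here.
LEXICAL_UNIT = re.compile(r"\^(.*?)\$")


def naive_coverage(stream: str) -> float:
    """Return the share of tokens that received at least one analysis."""
    units = LEXICAL_UNIT.findall(stream)
    if not units:
        return 0.0
    known = 0
    for unit in units:
        # The first field is the surface form; the rest are analyses.
        _surface, *analyses = unit.split("/")
        # An unknown token has a single "analysis" starting with an asterisk.
        if analyses and not all(a.startswith("*") for a in analyses):
            known += 1
    return known / len(units)


# Hypothetical analyzer output: two analyzed tokens, two unknown tokens.
sample = "^foo/foo<n><sg><nom>$ ^bar/*bar$ ^baz/baz<v><pst>/baz<n><pl>$ ^qux/*qux$"
print(naive_coverage(sample))
```

Naive coverage only shows whether a token got some analysis, not whether the analysis is correct, so it would complement rather than replace the word-analysis pairs planned for weeks 9 and 10.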