Difference between revisions of "User:Zigfruid/GSoC Final Report"
Revision as of 16:20, 23 August 2021
Description
This project began with a proposal originally titled "Develop a prototype machine translation system for the uzb-> kaa strategic language pair." After discussing with the mentors the best way to get the most out of Summer of Code, we decided to cover the Uzbek monolingual package as much as possible along with a couple of Uzbek-Karakalpak translations.
Several tasks still need to be completed. For example, during analysis I found words that do not conform to the lexical rules, and more words need to be added to the uzb-kaa.dix file.
In general, a lot of work has been done both on the packages for Turkic translation into Uzbek and Karakalpak and on the Uzbek-Karakalpak translation package. The results show that the coverage targets originally set have almost been met, but the WER/PER results still need to be improved.
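For readers unfamiliar with the two metrics, the sketch below shows one common way to compute word error rate (WER) and position-independent error rate (PER) for a single sentence pair. It is only an illustration under assumptions: the function names are made up for this example, tokenisation is plain whitespace splitting, and the exact figures reported by the evaluation tooling used in this project may be computed differently.

```python
from collections import Counter


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """WER: word-level edit distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)


def per(reference, hypothesis):
    """PER: like WER, but word order is ignored (bag-of-words match)."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)


# Placeholder tokens stand in for a post-edited reference and MT output.
print(wer("a b c d", "a c b d"))  # two words swapped
print(per("a b c d", "a c b d"))  # same words, so order does not matter
```

Because PER ignores word order, it is never higher than WER for the same sentence pair, which is why the two figures are usually reported together.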
Repositories
All contributions can be found in the following repository: https://github.com/apertium/apertium-uzb-kaa
Most of the work collected by the end of the GSoC program can be found here: https://apertium.projectjj.com/gsoc2021/Zigfruid.html
Note that some pull requests have not been merged yet, such as these PRs:
Main Work
There were new additions and some fixes to both the Karakalpak and Uzbek monodixes. The quality of uzb-kaa was improved by focusing on lexical and other translation errors in example texts, and coverage was expanded by adding common stems that were missing in the analysis of large corpora.
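As a rough illustration of how coverage and missing stems can be measured, the sketch below parses analyser output in the Apertium stream format, where each lexical unit looks like `^surface/analysis$` and an unknown word is marked with a leading `*` in its analysis. The function names and the sample tokens and tags are hypothetical, and the parser is deliberately simplified (it ignores escaped characters), so treat it as a sketch and not as the procedure actually used in this project.

```python
import re
from collections import Counter

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# Simplification: escaped characters such as \^ or \/ are not handled.
LEXICAL_UNIT = re.compile(r"\^([^/$^]*)/([^$]*)\$")


def coverage(analysed_text):
    """Fraction of lexical units that received at least one analysis."""
    units = LEXICAL_UNIT.findall(analysed_text)
    if not units:
        raise ValueError("no lexical units found")
    known = sum(1 for _, analyses in units if not analyses.startswith("*"))
    return known / len(units)


def unknown_forms(analysed_text):
    """Unknown surface forms, most frequent first, as (form, count) pairs."""
    counts = Counter(
        surface
        for surface, analyses in LEXICAL_UNIT.findall(analysed_text)
        if analyses.startswith("*")
    )
    return counts.most_common()


# Hypothetical analyser output; the tokens and tags are placeholders.
sample = "^aaa/aaa<n>$ ^bbb/*bbb$ ^ccc/ccc<v>$ ^bbb/*bbb$"
print(coverage(sample))
print(unknown_forms(sample))
```

Sorting the unknown forms by frequency gives a natural work list: adding the most frequent missing stems first raises coverage fastest.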
Future Work
- Add more words to the apertium-uzb.dix file
- Add more words to the apertium-kaa.dix file
- Find words that do not match the lexical rules
- Try to achieve WER < 40% on large wiki articles
Conclusion
Working with Apertium over the past three months has been a great experience for me. I learned a lot and gained a lot of experience. Thanks to my mentor @jonorthwash for his constant responsiveness: every time I had a question, he answered it and helped me with the project.