Tatar and Bashkir/GSOC 2018
Revision as of 01:16, 10 August 2018
This is the report for the Google Summer of Code 2018 project on Tatar-Bashkir machine translation.
List of commits
- The list of all my commits can be found here: https://apertium.projectjj.com/gsoc2018/zu-ann/zu-ann.html.
- A tar.gz archive of the commits can be downloaded here: https://apertium.projectjj.com/gsoc2018/zu-ann.tar.gz.
- A zip archive of the commits can be downloaded here: https://apertium.projectjj.com/gsoc2018/zu-ann.zip.
What was done
- The lexicons in bak.lexc were changed to correspond to those in tat.lexc; missing lexicons and tags were added to bak.lexc, and new rules were added to bak.twol.
- The stems from tat.lexc were translated into Bashkir and added to bak.lexc and bidix.
- Words from the Bashkir frequency list http://lcph.bashedu.ru/index.php?go=wikilist_lemmas were translated into Tatar and added to tat.lexc and bidix.
- Using Russian-Tatar and Russian-Bashkir dictionaries, new stems were added to tat.lexc, bak.lexc and bidix.
- Using Wikidata, new toponyms were added to tat.lexc, bak.lexc and bidix.
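Each translated stem ends up in three places: tat.lexc, bak.lexc and the bidix. The Python sketch below shows one way a list of translated stem pairs could be turned into entries for all three files. It assumes a simplified layout: lexc entries of the form `stem:stem CONTCLASS ; ! "gloss"` and bidix entries following the usual Apertium `<e><p><l>...</l><r>...</r></p></e>` pattern. The `StemPair` type, the continuation class `N1` and the tag `n` are hypothetical placeholders, not necessarily the ones used in the Tatar-Bashkir pair, and stems are assumed to need no lexc `%` escaping.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape


@dataclass(frozen=True)
class StemPair:
    """A Tatar stem, its Bashkir translation and the info needed to list both.

    Hypothetical structure for illustration only; the real continuation
    classes and tags in tat.lexc, bak.lexc and the bidix may differ.
    """
    tat: str
    bak: str
    pos_tag: str        # bidix tag, e.g. "n" (illustrative)
    tat_contclass: str  # continuation class in tat.lexc, e.g. "N1" (illustrative)
    bak_contclass: str  # continuation class in bak.lexc
    gloss: str = ""     # optional English gloss, kept as a lexc comment


def lexc_entry(stem: str, contclass: str, gloss: str = "") -> str:
    """Format one lexc entry as: stem:stem CONTCLASS ; ! "gloss"."""
    comment = f' ! "{gloss}"' if gloss else ""
    return f"{stem}:{stem} {contclass} ;{comment}"


def bidix_entry(tat: str, bak: str, pos_tag: str) -> str:
    """Format one bidix entry with Tatar on the left and Bashkir on the right."""
    attr = escape(pos_tag, {'"': "&quot;"})  # tag name used as an XML attribute
    tag = f'<s n="{attr}"/>'
    return f"<e><p><l>{escape(tat)}{tag}</l><r>{escape(bak)}{tag}</r></p></e>"


def entries_for(pairs):
    """Return (tat.lexc lines, bak.lexc lines, bidix lines) for the pairs."""
    tat_lines = [lexc_entry(p.tat, p.tat_contclass, p.gloss) for p in pairs]
    bak_lines = [lexc_entry(p.bak, p.bak_contclass, p.gloss) for p in pairs]
    bidix_lines = [bidix_entry(p.tat, p.bak, p.pos_tag) for p in pairs]
    return tat_lines, bak_lines, bidix_lines


if __name__ == "__main__":
    # Tatar "су" and Bashkir "һыу" both mean "water".
    for lines in entries_for([StemPair("су", "һыу", "n", "N1", "N1", "water")]):
        print("\n".join(lines))
```

Generating all three files from one list keeps the monolingual and bilingual dictionaries in step, which is what makes the added words usable by the translator rather than only by the analysers.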
Statistics
| | tat.lexc | bak.lexc | bak.twol | bidix | Bilingual Coverage |
|---|---|---|---|---|---|
| Before | | | | | |
| After | | | | | |
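The Bilingual Coverage column is commonly computed as the share of corpus tokens that pass through the whole translation pipeline without being flagged. A minimal Python sketch of that calculation, assuming the translator output keeps Apertium's problem-word marks (`*` for words unknown to the analyser, `@` for words missing from the bilingual dictionary, `#` for words that could not be generated) and that tokens are whitespace-separated. The punctuation set and the choice to count all three marks are assumptions; the figures in the table may have been produced differently.

```python
# Marks Apertium puts in front of problem words in translation output:
#   '*' unknown to the source-language analyser
#   '@' analysed, but missing from the bilingual dictionary
#   '#' could not be generated in the target language
FLAGGED = ("*", "@", "#")

# Assumption: punctuation stripped from token edges before checking marks.
PUNCT = ".,;:!?()[]\"'«»"


def coverage(translated_text: str) -> float:
    """Fraction of word tokens in the translator output that carry no mark.

    Tokens are whitespace-separated; pure punctuation tokens are ignored.
    """
    words = [tok.strip(PUNCT) for tok in translated_text.split()]
    words = [w for w in words if w]  # drop tokens that were only punctuation
    if not words:
        return 0.0
    flagged = sum(1 for w in words if w.startswith(FLAGGED))
    return 1 - flagged / len(words)


if __name__ == "__main__":
    # Two of the four word tokens are flagged.
    print(f"coverage: {coverage('a *b @c d.'):.1%}")
```

Whether `#` should lower coverage is a judgment call: it signals a target-side generation problem rather than a missing dictionary entry, so some reports count only `*` and `@`.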
Future work
- Continue improving the Bashkir monolingual transducer so that more word forms can be analyzed.
- Continue improving the coverage.
- Check (and fix if necessary) the words, mostly proper nouns, that were added using automatic translation.
- Revise the dictionaries and drop duplicates.
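Dropping duplicates from a lexc stem lexicon can be partly automated. The Python sketch below takes the entry lines of one LEXICON block, treats two lines as duplicates when they match after removing the trailing `!` comment and collapsing whitespace, keeps the first occurrence, and sorts the result. The helper names are hypothetical, and it assumes the caller passes only entry lines: blank and comment-only lines are dropped. Note that `sorted()` uses Unicode code-point order, which puts letters such as ә, ө, ү, ң and һ after я; a true Tatar or Bashkir alphabetical order would need a custom sort key.

```python
import re

# An unescaped '!' starts a lexc comment; '%!' is a literal exclamation mark.
COMMENT_RE = re.compile(r"(?<!%)!")


def entry_key(line: str) -> str:
    """Entry text without its trailing comment, with whitespace collapsed."""
    body = COMMENT_RE.split(line, maxsplit=1)[0]
    return " ".join(body.split())


def dedupe_and_sort(entries: list[str]) -> list[str]:
    """Keep the first line for each distinct entry key, sorted by that key."""
    seen: dict[str, str] = {}
    for line in entries:
        key = entry_key(line)
        if key and key not in seen:  # skips blank and comment-only lines
            seen[key] = line.rstrip()
    return [seen[k] for k in sorted(seen)]


if __name__ == "__main__":
    lexicon = [
        'су:су N1 ; ! "water"',
        'алма:алма N1 ; ! "apple"',
        'су:су   N1 ;  ! "water (duplicate)"',
    ]
    print("\n".join(dedupe_and_sort(lexicon)))
```

Entries that differ only in their glosses are treated as duplicates here; entries with the same stem but different continuation classes are kept, since they may be genuine homographs that need a manual check.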