Difference between revisions of "User:Eiji"
Revision as of 13:37, 9 March 2023
Contact Information
- Name: Eiji Miyamoto
- E-mail address: motopon57@gmail.com
- University: University of Manchester
- IRC: thelounge72
- GitHub: https://github.com/yypy22
Why is it that you are interested in Apertium?
I am intrigued by natural language processing and its applications. Apertium is free and open-source software for machine translation, so it matches my interests. The community here is welcoming and supportive too.
Which of the published tasks are you interested in?
Tokenization for spaceless orthographies in Japanese
What do you plan to do?
Investigating a suitable tokenizer for East and South Asian languages that are written without spaces, and implementing it.
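For illustration only, one common baseline for spaceless text is greedy longest-match segmentation against a word list. The sketch below assumes a tiny hypothetical lexicon (`LEXICON`) and a hypothetical function name (`segment`); it is not the hybrid model planned in this proposal, only an example of the kind of algorithm that could be compared.

```python
# Hypothetical toy lexicon; a real system would load a full dictionary.
LEXICON = {"私", "は", "学生", "です", "東京", "東京都", "に", "住む"}


def segment(text, lexicon=LEXICON, max_len=None):
    """Greedy longest-match (maximum matching) segmentation.

    At each position, take the longest lexicon entry that matches;
    if nothing matches, emit a single character as an unknown token.
    """
    if max_len is None:
        max_len = max((len(word) for word in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, falling back to one character.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += length
                break
    return tokens


if __name__ == "__main__":
    print(segment("私は学生です"))
```

Greedy matching can pick the wrong split when a longer dictionary word overlaps the correct boundary, which is one of the trade-offs a comparison of algorithms would need to weigh.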
Reasons why Google and Apertium should sponsor it
Apertium mainly translates between European languages. My proposal for Google Summer of Code 2023 will open up the possibility of future translation for Asian languages, which usually do not put spaces between the words in a sentence.
Work plan
- Week 1: Investigating word segmentation and tokenization in the literature, and summarising useful findings in a report
- Week 2: Testing candidate tokenization algorithms and identifying their pros and cons
- Week 3: Analyzing the drawbacks of the current tokenization in apertium-iii and apertium-jpn
- Week 4: Producing a hybrid model
- Deliverable #1
- Week 5: Analyzing the model and improving it
- Week 6: Testing the hybrid model
- Week 7: Evaluating the model
- Week 8: Porting the model to a faster programming language
- Deliverable #2
- Week 9: Integrating the hybrid model into apertium-jpn
- Week 10: Testing and fixing bugs
- Week 11: Continuing to test
- Week 12: Finalising the GSoC project: writing the report and completing the tests
- Project completed