Difference between revisions of "User:Eiji"

From Apertium

Revision as of 13:41, 9 March 2023

Contact Information

   Name: Eiji Miyamoto
   E-mail address: motopon57@gmail.com
   University: University of Manchester
   IRC: thelounge72
   GitHub: https://github.com/yypy22

Why is it that you are interested in Apertium?

   I am intrigued by natural language processing and its applications. Apertium is free and open-source software for machine translation, so it matches my interests. The community here is also welcoming and supportive.

Which of the published tasks are you interested in?

   Tokenization for spaceless orthographies in Japanese

What do you plan to do?

   Investigating a suitable tokenizer for East/South Asian languages that are written without spaces between words, and implementing it.
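Because these languages do not separate words with spaces, the tokenizer has to decide where each word starts and ends. One common dictionary-based baseline such an investigation might start from is greedy forward maximum matching: at each position, take the longest dictionary word that begins there. The following is only an illustrative sketch under stated assumptions. The toy dictionary and the `max_match` function are hypothetical and are not part of apertium-jpn, which would rely on its own lexicon.

```python
def max_match(text: str, dictionary: set[str], max_len: int = 8) -> list[str]:
    """Greedy forward maximum matching (illustrative sketch).

    At each position, try the longest candidate first and shrink it until it
    is found in the dictionary; fall back to a single character if nothing
    matches, so the loop always makes progress.
    """
    tokens = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += length
                break
    return tokens


# Hypothetical toy dictionary, for illustration only.
TOY_DICTIONARY = {"私", "は", "日本", "日本語", "語", "を", "勉強", "します"}

print(max_match("私は日本語を勉強します", TOY_DICTIONARY))
# ['私', 'は', '日本語', 'を', '勉強', 'します']
```

Greedy matching is fast, but it can commit to a wrong long match when several segmentations are possible, which is one reason to consider combining dictionary lookup with statistical methods in a hybrid model.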

Reasons why Google and Apertium should sponsor it

   Apertium mainly translates between European languages. My proposal for Google Summer of Code 2023 will open up the possibility of future translation
   for Asian languages, which usually do not put spaces between the words of a sentence.

Work plan

  • Phase 1
   Week 1: Investigating word segmentation and tokenization in the literature, and summarising useful findings in a report
   Week 2: Testing candidate tokenization algorithms and identifying their pros and cons
   Week 3: Analyzing the drawbacks of the current tokenization in apertium-iii and apertium-jpn
   Week 4: Producing a hybrid model
  • Phase 2
   Week 5: Analyzing the model and improving it
   Week 6: Testing the hybrid model
   Week 7: Evaluating the model
   Week 8: Porting the model to a faster programming language
  • Phase 3
   Week 9: Integrating the hybrid model into apertium-jpn
   Week 10: Testing and fixing bugs
   Week 11: Further testing
   Week 12: Finalising the GSoC project: writing the report and completing the tests
  • Project completed
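The Week 7 evaluation could, for instance, use word-level precision, recall and F1, a metric commonly reported for word segmentation: a predicted word counts as correct only if both of its boundaries match a word in a gold-standard segmentation. This is a minimal sketch, assuming the gold and predicted segmentations are given as token lists over the same text; `to_spans` and `segmentation_prf` are hypothetical helper names, not functions from any Apertium tool.

```python
def to_spans(tokens: list[str]) -> set[tuple[int, int]]:
    """Map a token sequence to the set of (start, end) character offsets."""
    spans = set()
    start = 0
    for tok in tokens:
        spans.add((start, start + len(tok)))
        start += len(tok)
    return spans


def segmentation_prf(gold: list[str], predicted: list[str]) -> tuple[float, float, float]:
    """Word-level precision, recall and F1 over exact span matches."""
    if "".join(gold) != "".join(predicted):
        raise ValueError("gold and predicted segmentations cover different text")
    gold_spans, pred_spans = to_spans(gold), to_spans(predicted)
    correct = len(gold_spans & pred_spans)  # words whose both boundaries match
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1


# Hypothetical example: the prediction splits 日本語 and merges 勉強 + します.
gold = ["私", "は", "日本語", "を", "勉強", "します"]
predicted = ["私", "は", "日本", "語", "を", "勉強します"]
print(segmentation_prf(gold, predicted))  # (0.5, 0.5, 0.5)
```

Comparing character spans rather than token strings ensures that a word is only credited when it appears at the same position in both segmentations.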