Difference between revisions of "User:Eiji"

Revision as of 10:35, 11 March 2023

Contact Information

   Name: Eiji Miyamoto
   E-mail address: motopon57@gmail.com
   University: University of Manchester
   IRC: thelounge72
   GitHub: https://github.com/yypy22

Why is it that you are interested in Apertium?

   I am intrigued by natural language processing and its applications. Apertium is free, open-source machine translation software, so it matches my interests. The community is also welcoming and supportive.

Which of the published tasks are you interested in?

   Tokenization for spaceless orthographies in Japanese

What do you plan to do?

   Investigating a suitable tokenizer for East/South Asian languages that are written without spaces, and implementing it.
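
   To illustrate the problem, here is a minimal sketch of greedy longest-match left-to-right segmentation, one of the candidate algorithms listed in the work plan. It is not the proposed tokenizer: the function name, the `max_len` parameter and the toy lexicon are all assumptions made up for this example.

```python
def longest_match_tokenize(text, lexicon, max_len=None):
    """Greedy left-to-right longest-match segmentation.

    `lexicon` is a hypothetical set of known surface forms. A character
    that starts no known entry is emitted as a one-character token.
    """
    if max_len is None:
        max_len = max((len(w) for w in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink the window.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No lexicon entry starts here: fall back to a single character.
            tokens.append(text[i])
            i += 1
    return tokens


# Toy lexicon, invented for illustration only.
TOY_LEXICON = {"私", "は", "学生", "です", "東京", "東京都", "都", "に", "住む"}

print(longest_match_tokenize("私は学生です", TOY_LEXICON))
```

   The greedy strategy is simple and fast, but it commits to the longest entry at each position and cannot revise that choice later, which is one reason the plan also lists other algorithms.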

Reasons why Google and Apertium should sponsor it

   Apertium mainly translates between European languages. My proposal for Google Summer of Code 2023 will open up the possibility
   of future translation for Asian languages, which usually do not put spaces between the words of a sentence.

Work plan

Phase1

Project completed

Week 1: May 29 - June 4	Investigating word segmentation and tokenization in the literature, and summarising useful findings in a report	Reading NLP conference papers on word segmentation and tokenization, and reviewing popular Japanese tokenizers such as MeCab.
Week 2: June 5 - June 11	Testing candidate tokenization algorithms and noting the pros and cons of each	N-gram, longest-match left-to-right (LRLM), maximal matching, Viterbi
Week 3: June 12 - June 18	Analyzing the drawbacks of the current tokenization for apertium-iii and apertium-jpn
Week 4: June 19 - June 25	Producing a hybrid model
Week 5: June 26 - July 2	Analyzing the model and improving it
Week 6: July 3 - July 9	Testing the hybrid model
Mid-term Evaluation
Week 7: July 10 - July 16	Evaluating the model
Week 8: July 17 - July 23	Porting the model to a faster programming language
Week 9: July 24 - July 30	Integrating the hybrid model into apertium-jpn
Week 10: July 31 - August 6	Testing and fixing bugs
Week 11: August 7 - August 13	Continuing to test
Week 12: August 14 - August 20	Finalising the GSoC project: writing the report and completing the tests
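
The Week 2 algorithms differ mainly in how they choose between competing segmentations. As a rough sketch of the Viterbi approach, the code below picks the segmentation with the lowest total cost by dynamic programming. The proposal does not specify a cost model, so the function name, the `unknown_cost` and `max_len` parameters, and the cost values (which could stand for negative log probabilities) are assumptions invented for this example.

```python
import math


def viterbi_segment(text, word_cost, unknown_cost=10.0, max_len=8):
    """Return the minimum-total-cost segmentation of `text`.

    `word_cost` maps a surface form to a non-negative cost (hypothetical
    values). Any single character not in `word_cost` may be emitted as an
    unknown token at `unknown_cost`, so every input can be segmented.
    """
    n = len(text)
    best = [math.inf] * (n + 1)  # best[i]: cheapest cost of segmenting text[:i]
    back = [0] * (n + 1)         # back[i]: start index of the last token in text[:i]
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if piece in word_cost:
                cost = word_cost[piece]
            elif end - start == 1:
                cost = unknown_cost
            else:
                continue
            if best[start] + cost < best[end]:
                best[end] = best[start] + cost
                back[end] = start
    # Follow the back-pointers from the end of the text to recover the tokens.
    tokens = []
    i = n
    while i > 0:
        tokens.append(text[back[i]:i])
        i = back[i]
    return tokens[::-1]


# Toy costs, invented for illustration only.
TOY_COSTS = {"くるま": 4.0, "で": 2.0, "まつ": 3.0, "くる": 2.5, "まで": 2.0}

print(viterbi_segment("くるまでまつ", TOY_COSTS))
```

With these made-up costs, a greedy longest-match pass would start with くるま (total cost 9.0), whereas the dynamic program finds くる / まで / まつ (total cost 7.5). Which reading is correct depends on context, so a hybrid model would need a better-informed cost function than this toy table.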

@@ Line 18: / Line 18: @@
 == Work plan ==
+{| class="wikitable" border="1"
 * '''Phase1'''
+|-
-    Week 1: Investigating word segmentation and tokenization from paper, and summarising useful findings into a report
+| Week 1
-    Week 2: Testing possible algorithms for tokenization and becoming aware of pros and cons of them
+May 29 - June 4
-    Week 3: Analyzing drawbacks of current tokenization for apertium-iii and apertium-jpn
+|Investigating word segmentation and tokenization from paper, and summarising useful findings into a report
-    Week 4: Producing a hybrid model
+|Looking up NLP conference papers on word segmentation or tokenization, and checking popular Japanese tokenizers such as MeCab as well.
+|-
+|-
+|Week 2:
+June 5 - June 11
+|Testing possible algorithms for tokenization and becoming aware of pros and cons of them
+|N-gram, Longest-match left-to-right (LRLM), Maximal matching, Viterbi
+|-
+|-
+|Week 3:
+June 12 - June 18
+|Analyzing drawbacks of current tokenization for apertium-iii and apertium-jpn
+|-
+|-
+| Week 4:
+June 19 - June 25
+|Producing a hybrid model
+|-
+|-
-* '''Phase2'''
+| Week 5:
+June 26 - July 2
-    Week 5: Analyzing the model and improving it
+|Analyzing the model and improving it
-    Week 6: Testing the hybrid model
+|-
-    Week 7: Evaluation of the model
+|-
-    Week 8: Converting the model into a faster language
+| Week 6:
+July 3 - July 9
-* '''Phase3'''
+|Testing the hybrid model
+|-
-    Week 9: Converting the hybrid model into apertium-jpn
+|-
-    Week 10: Testing and fixing bugs
+|-
-    Week 11: Continue to test
+|Mid-term Evaluation
-    Week 12: Finalise GSOC project: Writing report and complete tests
+|-
+| Week 7:
+July 10 - July 16
+|Evaluation of the model
+|-
+|-
+| Week 8:
+July 17 - July 23
+|Converting the model into a faster language
+|-
+|-
+| Week 9:
+July 24 - July 30
+|Converting the hybrid model into apertium-jpn
+|-
+|-
+| Week 10:
+July 31 - August 6
+|testing and fixing bugs
+|-
+|-
+| Week 11:
+August 7 - August 13
+|Continue to test
+|-
+| Week 12:
+August 14 - August 20
+|Finalise GSOC project: Writing report and complete tests
+|-
+|}
 * '''Project completed'''
