User:Techievena/Proposal
Contents
Contact Information
Name: Abinash Senapati
IRC Nickname: Techievena
E-mail address: techievena@users.sourceforge.net
Website: https://techievena.github.io
Github: https://github.com/Techievena
Gitlab: https://gitlab.com/Techievena
Time Zone: UTC +5:30 (IST-India)
Location: Bhubaneswar, India or Bangalore, India
GSoC blog URL: https://techievena.github.io/categories/GSoC
School/Degree: B.Tech. in Electronics and Electrical Communication Engineering, Indian Institute of Technology, Kharagpur, India
Expected Graduation Year: 2019
Code Sample
I have been actively involved with open source organisations in some or other way for more than a year now. During this time I have gained a lot of experience regarding the do’s and the don’ts of the open source communities and have also helped the open source community in my university and at coala to maintain integrity and helped a number of new developers bootstrap their open source journey. I have also been a Google Code-in mentor for coala organisation during 2017-18.
I have previous experience of working with apertium and have got my commits merged into the main repository of apertium. I have improved the apertium-lint, a GSoC 2016 project of Apertium:
I have also been involved with coala for more than a year now and am a member of the aspects team of coala. I am also a member of the open source development team of gitmate. All my contribution to the open source project are listed below:
Other significant contributions I would like to mention:
- I have been a constant and daily reviewer and have helped in newcomer orientation by helping them at times when they get stuck. The link to reviews:
- I have also opened up significant number of issues in some repositories:
- I have created a NuGet package for LLVM 3.4.1 for the use of coala and am currently working to improve that as a part of an issue. Here is the link for that package.
University projects/ Personal projects
Interest in machine translation
It is quite uncomfortable to regard language just as a communication method. Huge experiences, history and views of the world human being have been soaked deeply into a language since the beginning of the world. I come from country that houses over 1600 languages of which around 150 have a sizeable speaking population even today. Most of them are on the verge of extinction and with ordered morphological rules, RBMT is the promising option for their restoration.
As I have taken up Formal Languages and Automata Theory course this semester in my university, I am familiar with the concepts of Deterministic Finite Automata, Non-deterministic Finite Automata, Weighted Transducers, Finite State Machines and other concepts essential to take up a project in machine translation.
Interest in Apertium project
Apertium is a rule-based machine translation platform. I began working on Apertium about an year ago to modify apertium-lint tool to support multiple configuration files, as it was essential for writing the ApertiumlintBear module for coala.
From then, I wanted to explore the territory of linguistics and machine translation. Apertium is quite old and a popular project and is even used by Wikimedia for translation purposes. It is a very active open-source organisation in which maintenance and development cycle are pretty regular and therefore it is a great learning platform for noobs to learn from the veterans through interaction and contribution.
Project Info
Proposal Title
Apertium: Extend lttoolbox to have the power of HFST
Lttoolbox is a toolbox in the Apertium project for lexical processing, morphological analysis and generation of words. With the current format of Apertium it is quite hard to perform complex morphological transformations with lttoolbox. This project aims on extending the capability of performing morphographemics and adding lexical weights to the lttoolbox transducer in order to enable more complex translations with the transducer. By the end of this project, lttoolbox will be sophisticated enough to support most of the language pairs.
Proposal Abstract
Proposal Description
Motivation / Benefit to the society
Currently, most languages pairs in Apertium use the lttoolbox due to its straightforward syntax and has some very useful features for validation of lemmas, but it is not suited to agglutinative languages, or languages with complex and regular morphophonology. This void thus makes it harder for new developers for they must necessarily learn working with both the platforms to improve or start a new language pair with Apertium. Here comes the necessity for having a single module in Apertium for all the language pair translations to make the project more developer friendly.
Reason for sponsorship
I am a volunteer and am working in my free time to improve Apertium. I am a student and I need to earn money during the summers one way or another. A sponsorship can reinforce my self-confidence and give me greater powers of resilience. The writing of a preprocessor for lttoolbox is a huge effort, as a big rework is required for this, but with high benefits in the area of machine translation. I would like to flip bits instead of burgers this year. Making a full job out of this means that I can focus all of my energy and dedication to this project, and this effort needs full time work to be successful.
Implementation
Road to the future
Timeline
MILESTONES
Community Bonding Period (April 23 - May 13)
Do an extensive research on the currently used architecture for morphological analysis.
Coding Phase I (May 14 - June 10)
- Week 1 (May 14 - May 20)
- Week 2 (May 21 - May 27)
- Week 3 (May 28 - June 3)
- Week 4 (June 04 - June 10)
Coding Phase II (June 11 - July 8)
- Week 5 (June 11 - June 17)
- Week 6 (June 18 - June 24)
- Week 7 (June 25 - July 1)
- Week 8 (July 2 - July 8)
Coding Phase III (July 9 - August 14)
- Week 9 (July 9 - July 15)
- Week 10 (July 16 - July 22)
- Week 11 (July 23 - July 29)
- Week 12 (July 30 - August 5)
- Week 13 (August 6 - August 14)
Coding Challenges
- Install Apertium.
- lexc2dix → hfst lexc/twolc to monodix++ converter.
Other Commitments
I consider GSoC as a full time job and do not have any other commitments as such. I only have my departmental summer course during this period which won’t take much of my time. I am going to spend most of my time coding and reading wiki and articles for my project. I am willing to contribute 40+ hours per week for my GSoC project and may extend that if required to fulfill the milestones of my timeline. In case of any unplanned schedules or short-term commitments, I will ensure that my mentors are informed about my unavailability beforehand and make suitable adjustments in my schedule to compensate for that. My university has end semester examinations in the beginning of the month of May and a short vacation from 10th May to 15th May. So, I will be a bit occupied during the Community Bonding Period, but surely I’ll ensure that it won’t affect my dedication for GSoC.