User:Vaydheesh/Proposal
Contents
- 1 GSoC Proposal : Python API/library for Apertium
- 2 Basic Details
- 3 Why am I interested in Machine Translation?
- 4 Why is it that I am interested in Apertium?
- 5 Which of the Ideas List am I interested in?
- 6 Why should Google and Apertium sponsor the project of Python API for Apertium?
- 7 How and who will benefit from this project?
- 8 Coding Challenge
- 9 Detailed project plan and workflow
- 10 Examinations
- 11 About me: Education and Experience
- 12 Non-Summer Of Code Plans
- 13 Post GSoC Plans
GSoC Proposal: Python API/library for Apertium
Basic Details
Name | Lokendra Singh |
Email Address | lokendras1998@gmail.com |
IRC Nick | loke98 |
Country & TimeZone | India (UTC + 5:30) |
Link to GitHub | https://github.com/vaydheesh |
Why am I interested in Machine Translation?
The broader perspective:
I belong to a diverse country, India, where "Every two miles the water changes, every four miles the speech". I have encountered many dialects of the Hindi language, such as Shauraseni, Hindustani, Braj Bhasha, Haryanvi, Bundeli, Kannauji, Awadhi, Bagheli, Chhattisgarhi, and Bombay Hindi. Because a single language can vary so much, linguistics has always fascinated me. Combining this with my passion for Python and my desire to contribute to the open source community, Apertium is my choice for GSoC 2019.
Why is it that I am interested in Apertium?
During my Machine Learning projects, I came across Natural Language Processing, which opened up the world of computational linguistics for me. While browsing the list of organisations, Apertium Machine Translation caught my eye: it offers a nice combination of coding challenges and linguistics. I have been using free software for the past few years, and now I want to start contributing to the community. Apertium seems to be the right choice for me.
Which of the Ideas List am I interested in?
Initially, I was torn between the Unsupervised Learning idea and the Python API, but I have decided on the Python API/library for Apertium.
Why should Google and Apertium sponsor the project of Python API for Apertium?
Apertium is written in C++, which offers very high performance and a high level of abstraction and is well standardized. However, it has a few shortcomings: it is not very beginner-friendly, and writing user interfaces in C++ is cumbersome. Python, on the other hand, is an interpreted, high-level language. A Python wrapper generated with SWIG, combined with Jupyter Notebooks, can provide flexibility and ease of installation, debugging, and testing.
How and who will benefit from this project?
This project would make work easier for many developers. Python is a high-level language with many features that make it easy to grasp. Many people like to use Python in Jupyter Notebooks, and a Python module would grow the user community. The installation process of Apertium can also be simplified by making it available on PyPI, which would open the Apertium library to a large user base on Microsoft Windows™. Hence I believe that a Python API for Apertium would help a large community of developers, linguists, computational linguists, and everyone keen on using the wide range of linguistic tools that we provide.
Coding Challenge
I worked on Coding Challenge 1: a working installation of Apertium via a setup.py file in a Windows environment. The challenge was really interesting to work on. Though it seemed easy at first, it had its own set of hidden challenges. I had to get familiar with the Apertium Bash helper script and the underlying binaries it uses. I had to add the Apertium binaries to the process's PATH without permanently polluting the user's environment variables. Some tweaks to the existing code base were required to ensure that the Apertium Python module works out of the box without creating issues for its users.
While working on this coding challenge I became familiar with the Apertium code base. To create the setup.py file, I had to understand the entire Apertium Python project, to ensure that all the minor tweaks were compatible with the existing code and did not result in unexpected errors.
As of now, all the checks are passing and the pull request is waiting to be merged by an organisation member. Link to Pull Request
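The PATH handling described above can be sketched roughly as follows. This is a minimal illustration, not the code from the pull request: the helper name `prepend_to_path` and the example directory are hypothetical. It relies only on the fact that changes to `os.environ` affect the current process and the child processes it starts, and are not written back to the user's persistent environment variables.

```python
import os
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def prepend_to_path(directory: str) -> Iterator[None]:
    """Temporarily put `directory` at the front of this process's PATH.

    Hypothetical helper for illustration; only the current process
    (and any children it spawns) sees the modified value.
    """
    original = os.environ.get("PATH")
    parts = [directory] if original is None else [directory, original]
    os.environ["PATH"] = os.pathsep.join(parts)
    try:
        yield
    finally:
        # Restore the exact previous state, including "PATH was unset".
        if original is None:
            del os.environ["PATH"]
        else:
            os.environ["PATH"] = original


# Example use (the directory is a made-up placeholder):
# with prepend_to_path(r"C:\apertium\bin"):
#     ...  # binaries in that directory can be found by name here
```

Wrapping the change in a context manager is a design choice made here so that the original value is restored even if the wrapped code raises an exception.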
Detailed project plan and workflow
1. Tools To Be Used
As suggested in the Ideas List, I plan to use SWIG. The Simplified Wrapper and Interface Generator is an open-source software tool used to connect programs or libraries written in C or C++ with scripting languages, in this case Python. The current implementation calls the Apertium binaries as subprocesses, which carries its own share of overhead and slows down the translation process. SWIG can be used to create a wrapper around the C++ files and generate modules that can be imported in Python files. This should give us the speed of C++ and the ease of use of Python.
Figure: Flowchart describing the process of generating the Python wrapper
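To make the subprocess overhead mentioned above concrete, here is a small sketch of that pattern, assuming each pipeline stage is an external command that reads standard input and writes standard output. The function `run_pipeline` and the stand-in stage `UPPER_STAGE` are hypothetical; a real pipeline would list Apertium binaries instead. Every stage costs one process launch plus a copy of the text through pipes, which is the cost a SWIG-generated module would avoid by calling the C++ code in-process.

```python
import subprocess
import sys
from typing import List


def run_pipeline(commands: List[List[str]], text: str) -> str:
    """Feed `text` through each command in turn, one new process per stage."""
    data = text.encode("utf-8")
    for cmd in commands:
        # Each call launches a fresh process and pipes the data through it.
        completed = subprocess.run(
            cmd, input=data, stdout=subprocess.PIPE, check=True
        )
        data = completed.stdout
    return data.decode("utf-8")


# Stand-in stage (an assumption for illustration): a tiny Python one-liner
# that upper-cases its input, used here in place of a real Apertium binary.
UPPER_STAGE = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]

# Example use:
# run_pipeline([UPPER_STAGE], "cats")
```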
2. Timeline
Goals for the various phases:
PHASE | OBJECTIVE OF PHASE |
---|---|
Community Bonding Period |
|
Phase 1 |
|
Phase 2 |
|
Phase 3 |
|
3. Biweekly Goals:
WEEK AND DATE | TASK EXPLANATION |
---|---|
Community Bonding Period |
|
Week 1&2, 27 May to 9 June |
|
Week 3&4, 10 June to 23 June |
|
Week 5&6, 24 June to 7 July |
|
Week 7&8, 8 July to 21 July |
|
Week 9&10, 22 July to 4 August |
|
Week 11&12, 5 August to 18 August |
|
4. Monthly Deliverables
DELIVERABLE | EXPLANATION |
---|---|
Deliverable 1 |
|
Deliverable 2 |
|
Deliverable 3 |
|
Examinations
My theory exams should be over by the 4th week of May (25th May, 2019). My practical exams will be conducted in the following two weeks, i.e. 27th May, 2019 to 8th June, 2019. This might reduce my efficiency in the first two weeks of the internship. Hence I plan to start the initial work during the community bonding period, before the Coding Period begins (27th May, 2019). This should give me the head start required for timely submission of the project deliverables. I expect that the Morphological Analyzer, being the first component to be implemented, might take its share of time. To stick to my timeline I plan to work overtime, which should let me absorb unexpected delays due to my examinations.
About me: Education and Experience
I am a final-year student at Maharaja Agrasen Institute of Technology, Delhi, India, pursuing a B.Tech in Mechanical and Automation Engineering. I have worked with C++ (competitive programming) and Python (machine learning and web scraping), and I have been using Arch Linux as my primary operating system for the past 4 years. With this experience, I am confident that I will be able to build a decent cross-platform Pythonic API.
Non-Summer Of Code Plans
My college vacations fall during the months of Google Summer of Code, so I will be able to devote around 40 hours every week. I have no vacation plans.
Post GSoC Plans
1. Create SWIG wrapper for remaining lttoolbox files.
2. Convert the remaining codebase into Python modules.
3. Work on the remaining portion and implement it in