Difference between revisions of "User:Arghya1998"
Revision as of 04:26, 19 March 2018
Contents
- 1 GSoC Proposal : Python API/library for Apertium
- 2 Basic Details
- 3 Why am I interested in Machine Translation?
- 4 Why is it that I am interested in Apertium?
- 5 Which of the published tasks am I interested in?
- 6 Why should Google and Apertium sponsor the project of Python API for Apertium?
- 7 How and Who will benefit from this project?
- 8 Detailed project plan and workflow
- 9 Coding Challenge
- 10 About me: Education and Experience
GSoC Proposal : Python API/library for Apertium
Basic Details
Name | Arghya Bhattacharya |
Email Address | arghya.b@research.iit.ac.in |
Alternate Email Address | arghyatiger@gmail.com |
IRC Nick | arghya[m] |
Mobile | +91 9831325363 |
TimeZone | UTC + 5:30 |
Link to GitHub | https://github.com/arghyatiger |
Link to Gitlab | https://gitlab.com/arghyatiger |
Why am I interested in Machine Translation?
The broader perspective:
Coming from a country as diverse as India, with over 22 officially registered languages and over 1500 mother tongues (150 of them sizeable), I’ve always been curious about how languages serve as the basic unit of communication. During my childhood I lived in various places in India, so I had the chance to interact closely with people of different linguistic backgrounds, and in the process I ended up learning quite a few languages, including Hindi, Bengali, English, Tamil and Oriya. The language diversity in my country is fascinating, but it also brings a lot of problems in communication. I believe that efficient machine translation can ease many of these problems, breaking the “language barrier” not just across the country but across the globe, and connecting people better.
Academic Interests:
I am currently pursuing a dual degree, B.Tech in Computer Science and M.S. by Research in Computational Linguistics, at IIIT-Hyderabad, India. A good portion of our academic focus is on Machine Translation, and I find it a really interesting area to work on. Working with Apertium will help me develop my Computational Linguistics skills and give me a chance to give back to the community with a solid contribution.
Why is it that I am interested in Apertium?
As a student whose primary academic focus is Computational Linguistics, I use Apertium as one of the important tools for my university assignments. The Apertium projects provide a nice blend of linguistic and coding tasks, which makes them interesting to me. Also, as part of my long-term goal of contributing to the community, I think contributions to Apertium would have a significant impact on the Computational Linguistics community around the globe, and that further motivates me to work for Apertium.
Which of the published tasks am I interested in?
All the published tasks seem interesting to me, so it was difficult to choose only one. I have narrowed my choice down to the project I like the most: Python API/library for Apertium.
Why should Google and Apertium sponsor the project of Python API for Apertium?
The Apertium code base is primarily written in C++. C++ offers fairly high performance, supports low-level systems programming, is available almost everywhere and is reasonably well standardized, but it has a few shortcomings as well. These include its non-interactive nature, the compile/debug/nap cycle, and the difficulty of extending and modifying modules. Also, once a module has been developed, improvements such as writing user interfaces and integrating with other systems become really cumbersome in C++. Python, on the other hand, is an interpreted, high-level programming environment, so a Python wrapper can bring flexibility and interactivity to Apertium’s code base, along with ease of debugging, ease of testing, and rapid prototyping.
How and Who will benefit from this project?
The project would put a lot of developers at ease, since Python is a high-level language with many features that make it easy to grasp, and it would increase the scalability of Apertium in the future. Many people also like to work with Jupyter notebooks and Python, so I believe a Python API for Apertium would be helpful to a large community of developers, linguists, computational linguists, and anyone keen on using a wide range of linguistic tools.
Detailed project plan and workflow
1. Detailed Project Goal:
The goal of the project is to create structured Python wrappers for the core modules of Apertium that meet the following requirements:
a.) The modules should be importable from Python. The Pythonic usage would be as follows:
from apertium.lttoolbox import transducer
b.) The modules should be nested
apertium.lttoolbox.transducer
c.) The internal usage of the functions should be as follows:
import apertium.transducer.internal
t = apertium.transducer.internal.Transducer().insertSingleTransduction()
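The nesting described in a.) and b.) can be illustrated with a minimal sketch. It builds a throwaway package tree in a temporary directory and imports it. The Transducer class here is a pure-Python placeholder invented for illustration, not the real lttoolbox class, and its arguments and return value are assumptions; in the actual project the innermost module would be generated by the binding tool.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for the generated wrapper module. The real module
# would expose the C++ Transducer class instead of this pure-Python stub.
TRANSDUCER_STUB = '''
class Transducer:
    """Pure-Python placeholder, not the real lttoolbox Transducer."""

    def __init__(self):
        self.transitions = []

    def insertSingleTransduction(self, tag, source):
        # Record the transition and return a made-up target state id.
        self.transitions.append((source, tag))
        return len(self.transitions)
'''


def build_demo_package(root: Path) -> None:
    """Create apertium/lttoolbox/transducer.py under root."""
    pkg = root / "apertium" / "lttoolbox"
    pkg.mkdir(parents=True)
    # An __init__.py at each level is what makes the nested import work.
    (root / "apertium" / "__init__.py").write_text("")
    (pkg / "__init__.py").write_text("")
    (pkg / "transducer.py").write_text(TRANSDUCER_STUB)


demo_root = Path(tempfile.mkdtemp())
build_demo_package(demo_root)
sys.path.insert(0, str(demo_root))
importlib.invalidate_caches()

from apertium.lttoolbox import transducer  # usage a.) from the list above

t = transducer.Transducer()
state = t.insertSingleTransduction(tag=0, source=0)
print(transducer.__name__, state)
```

Each directory level maps to one component of the dotted name, so the layout of the wrapper sources decides what users type in their import statements.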
2. Tool to be used:
For this project, I plan to use SWIG to bind the C++ code. SWIG is a software development tool that simplifies the task of interfacing different languages to C and C++ programs: it is a compiler that takes C declarations and creates the wrappers needed to access those declarations from other languages. The other options I explored for the project are Pyrex, ctypes, SIP and Boost.Python. But for a project of this scale, SWIG seems to be the most convenient due to a lot of features explained later in the proposal.
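As a rough illustration of what SWIG consumes, the sketch below generates the text of a minimal interface file and the matching command line, without actually running SWIG. The header path lttoolbox/transducer.h, the module name and the build directory are assumptions made for the example, and the helper functions are hypothetical, not part of any existing build script.

```python
from pathlib import Path
from typing import List


def swig_interface(module: str, header: str) -> str:
    """Return the text of a minimal SWIG interface file for one C++ header."""
    return (
        f"%module {module}\n"
        "%{\n"
        f'#include "{header}"\n'   # copied verbatim into the generated wrapper
        "%}\n"
        f'%include "{header}"\n'   # declarations SWIG should parse and wrap
    )


def swig_command(interface_file: Path, out_dir: Path) -> List[str]:
    """Build (but do not run) a SWIG invocation for a C++ to Python wrapper."""
    wrapper_cpp = out_dir / (interface_file.stem + "_wrap.cpp")
    return [
        "swig", "-c++", "-python",
        "-outdir", str(out_dir),   # where the generated .py module goes
        "-o", str(wrapper_cpp),    # where the generated C++ wrapper goes
        str(interface_file),
    ]


interface_text = swig_interface("transducer", "lttoolbox/transducer.h")
command = swig_command(Path("transducer.i"), Path("build"))
print(interface_text)
print(" ".join(command))
```

The generated C++ wrapper would still have to be compiled and linked against the library before Python can import it; that step is left out here.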
3. Timeline :
Goals for the various phases:
PHASE | OBJECTIVE OF PHASE |
---|---|
Community Bonding Period | |
Phase 1 | |
Phase 2 | |
Phase 3 | |
Week-Wise Goals:
Phase And Brief Description | Duration | Task Explanation | Deliverable for the week |
---|---|---|---|
COMMUNITY BONDING PERIOD | START: April 23rd, END: May 13th | | |
WEEK ONE: Lttoolbox setup | START: May 14th, END: May 20th | | |
WEEK TWO: Variable handling in SWIG for Lttoolbox module | START: May 21st, END: May 27th | | |
WEEK THREE: Templating and Object Handling for Lttoolbox module | START: May 28th, END: June 3rd | | |
WEEK FOUR: Testing and improving cross-language polymorphism, making the module more Pythonic, Exception Handling | START: June 4th, END: June 10th | | |
WEEK FIVE: Apertium setup | START: June 11th, END: June 17th | | |
WEEK SIX: Variable handling in SWIG for Apertium module | START: June 18th, END: June 24th | | |
WEEK SEVEN: Templating and Object Handling for Lttoolbox module | START: June 25th, END: July 1st | | |
WEEK EIGHT: Testing and improving cross-language polymorphism, making the module more Pythonic, Exception Handling | START: July 2nd, END: July 8th | | |
WEEK NINE: Extensive alpha testing of modules built | START: July 9th, END: July 15th | | |
WEEK TEN: Finishing Documentation | START: July 16th, END: July 22nd | | |
WEEK ELEVEN: Beta testing and changes (if any) | START: July 23rd, END: July 29th | | |
WEEK TWELVE: Deciding on the library structure and making module pip installable | START: July 30th, END: August 5th | | |
WEEK THIRTEEN: Final reviews and bug report | START: August 6th, END: August 14th | | |
Coding Challenge
1.) Make the Transducer model importable from Python
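One simple way to check whether a module has become importable is to ask the import system to locate it. The helper below is a hypothetical sketch and is not part of the challenge submission; the standard-library module json.decoder and a made-up package name stand in for the real wrapper.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if module_name can be located by the import system."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package of a dotted name does not exist.
        return False


found_stdlib = is_importable("json.decoder")
found_missing = is_importable("no_such_package_xyz.transducer")
print(found_stdlib, found_missing)
```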
About me: Education and Experience
I am a sophomore at IIIT-Hyderabad, India, pursuing a dual degree in Computer Science and Computational Linguistics. I’ve worked closely with C++ and Python in many projects, and I take a keen interest in machine learning as well. I love building fun applications. The details of my work experience can be found here.