<span style="font-family:Courier;">

=== GSoC Proposal: Python API/library for Apertium ===

== Basic Details ==

{|class="wikitable" style="width:100%"
|-
|
Name
|
Arghya Bhattacharya
|-
|
Email address
|
arghya.b@research.iiit.ac.in
|-
|
Alternate email address
|
arghyatiger@gmail.com
|-
|
IRC nick
|
arghya
|-
|
Mobile
|
+91 9831325363
|-
|
Time zone
|
UTC+5:30
|-
|
Link to GitHub
|
https://github.com/arghyatiger/
|}

== Why am I interested in Machine Translation? ==

'''The broader perspective:'''

I come from India, a linguistically diverse country with 22 officially recognised languages and over 1500 mother tongues (about 150 of them sizeable), so I have always been curious about how languages serve as the basic unit of interaction. As a child I lived in various parts of India, which gave me the chance to interact closely with people from different linguistic backgrounds. In the process I learned quite a few languages, including Hindi, Bengali, English, Tamil and Oriya. The language diversity in my country is fascinating, but it also brings many problems. I believe that efficient machine translation can help solve many of these problems, break the "language barrier" across the country and the globe, and connect people better.

'''Academic Interests:'''

I am currently pursuing a dual degree (B.Tech in Computer Science and M.S. by Research in Computational Linguistics) at IIIT-Hyderabad, India. A good portion of our academic focus is on machine translation, and I find it an interesting area to work on. Working with Apertium will help me develop my computational linguistics skills and give me a chance to contribute to the community in whatever way I can.

== Why is it that I am interested in Apertium? ==

As a student whose primary academic focus is computational linguistics, I use Apertium as one of the important tools for my university assignments. The Apertium projects offer a good blend of linguistic and coding tasks, which makes them interesting to me. As part of my long-term goal of contributing to the community, I also think that contributions to Apertium would have a significant impact on the computational linguistics community around the globe, and that further motivates me to work on Apertium.

== Which of the published tasks am I interested in? ==

All of the published tasks seem interesting to me, so it is difficult to choose only one. I have narrowed my choice down to the project called "Python API/library for Apertium".

== Why should Google and Apertium sponsor the project of Python API for Apertium? ==

The Apertium code base is written primarily in C++. C++ offers high performance, supports low-level systems programming, is available almost everywhere and is reasonably well standardised. It also has shortcomings: it is not interactive, development follows the compile/debug/nap cycle, and extending or modifying modules can be difficult. Once a module has been developed, improvements such as writing user interfaces and integrating with other systems become cumbersome in C++. Python, on the other hand, is an interpreted, high-level programming environment. A Python wrapper can therefore bring flexibility and interactivity to Apertium's code base, along with easier debugging, easier testing and rapid prototyping.

== How and Who will benefit from this project? ==

The project would make life easier for many developers, because Python is a high-level language whose features make it easy to pick up, and it should make Apertium easier to build on in the future. Many people also like to work in Python and Jupyter notebooks. I therefore believe that a Python API for Apertium would help a large community of developers, linguists, computational linguists and anyone keen on using a wide range of linguistic tools.

== Detailed project plan and workflow ==

'''1. Detailed Project Goal:'''

The goal of the project is to create structured Python wrappers for the core modules of Apertium, namely:

* [[Apertium/Lttoolbox]]
* [[Apertium/Apertium]]

a.) The modules should be importable from Python. The Pythonic usage would be as follows:

* from apertium.lttoolbox import transducer

b.) The modules should be nested:

* apertium.lttoolbox.transducer

c.) The internal usage of the functions should be as follows:

* import apertium.transducer.internal
* t = apertium.transducer.internal.Transducer().insertSingleTransduction()
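
The nested layout described above can be tried out before any binding exists. The following is a minimal sketch, assuming a pure-Python stub in place of the SWIG-generated module: the stub class body, the argument names of <code>insertSingleTransduction</code> and the temporary directory are all placeholders for illustration, not the real lttoolbox API.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stub standing in for the SWIG-generated module. The real
# Transducer class would be generated from the lttoolbox C++ headers.
STUB_SOURCE = '''
class Transducer:
    """Pure-Python placeholder for the wrapped C++ class."""

    def __init__(self):
        self.transitions = []

    def insertSingleTransduction(self, tag, source):
        # Placeholder behaviour: record the call and return a counter.
        self.transitions.append((source, tag))
        return len(self.transitions)
'''


def build_stub_package(root: Path) -> None:
    """Create apertium/lttoolbox/transducer.py plus the __init__.py files."""
    package_dir = root / "apertium" / "lttoolbox"
    package_dir.mkdir(parents=True)
    (root / "apertium" / "__init__.py").write_text("")
    (package_dir / "__init__.py").write_text("")
    (package_dir / "transducer.py").write_text(STUB_SOURCE)


# Build the package in a throwaway directory and make it importable.
tmp_root = Path(tempfile.mkdtemp())
build_stub_package(tmp_root)
sys.path.insert(0, str(tmp_root))
importlib.invalidate_caches()

from apertium.lttoolbox import transducer

t = transducer.Transducer()
state = t.insertSingleTransduction(0, 0)
print(transducer.__name__, state)
```

In the real library the <code>transducer.py</code> stub would be replaced by the module that SWIG generates, while the <code>__init__.py</code> files would keep the same nesting.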

'''2. Tool to be used:'''

For the project, I plan to use SWIG to bind the C++ code. SWIG is a software development tool that simplifies the task of interfacing different languages to C and C++ programs. It is a compiler that takes C declarations and creates the wrappers needed to access those declarations from other languages. The other options I explored for the project are Pyrex, ctypes, SIP and Boost.Python. For a project of this scale, however, SWIG seems to be the most convenient because of a number of features explained later in the proposal.
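
For contrast, here is a minimal sketch of what one of the alternatives, ctypes, looks like in practice. It calls the C function <code>strlen</code> from the system C library, assuming a POSIX system where that library can be loaded this way. Every signature has to be declared by hand, and only functions with a C interface can be reached, so C++ classes would first need hand-written C shims. That manual work is what a wrapper generator such as SWIG is meant to avoid.

```python
import ctypes
import ctypes.util

# Locate the C library. On POSIX systems, passing None to CDLL falls back
# to the symbols already loaded into the running process.
libc_path = ctypes.util.find_library("c")
libc = ctypes.CDLL(libc_path)

# With ctypes, argument and return types must be declared manually.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

length = libc.strlen(b"apertium")
print(length)
```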

'''3. Timeline:'''

Goals for the various phases:

{|class="wikitable" style="width:100%"
|-
|
<center>'''PHASE'''</center>
|
<center>'''OBJECTIVE'''</center>
|-
|
<center>COMMUNITY BONDING PERIOD</center>
|
<center>A good understanding of all the modules and of the intricacies of binding each one, plus a detailed report on the modules</center>
|-
|
<center>CODING PHASE 1</center>
|
<center>Binding/Testing the Lttoolbox Module</center>
|-
|
<center>CODING PHASE 2</center>
|
<center>Binding/Testing the Apertium Module</center>
|-
|
<center>CODING PHASE 3</center>
|
<center>Documentation of the usage of the Python modules and library organization of the modules made in the previous phases</center>
|}

Week-wise goals:

{|class="wikitable"
|
<center>'''TIME PERIOD'''</center>
|
<center>'''TASK PLAN'''</center>
|-
|
<center>COMMUNITY BONDING PERIOD</center>
<center>DATES:</center>
* START: April 23rd
* END: May 13th
|
* Experimenting with the lttoolbox and apertium modules: using every function and understanding all of their flags and arguments.
* Reading up on the details of SWIG.
* Gathering input from Apertium users on what their ideal implementation would look like.
|-
|
<center>WEEK ONE:</center>
* Lttoolbox setup
<center>DATES:</center>
* START: May 14th
* END: May 20th
|
* Setting up Distutils for the lttoolbox module and making the basic layout importable in Python.
|-
|
<center>WEEK TWO:</center>
* Variable handling in SWIG for Lttoolbox module
<center>DATES:</center>
* START: May 21st
* END: May 27th
|
* Making explicit declarations of the module's constants and enumerations in the SWIG interface.
* Testing all pointer-based data manipulation for errors (a common problem with SWIG bindings).
* Looking for data members that need to be read-only and making the necessary changes in the interface file.
* Identifying static class members: older versions of Python had no support for static methods, and no version of Python supports static member variables in a manner that SWIG can utilize. SWIG therefore generates wrappers that work around some of these issues, but the remaining issues have to be handled manually.
* Resolving SWIG's namespace problem manually (it occurs when there are multiple namespaces).
|-
|
<center>WEEK THREE:</center>
* Templating and Object Handling for Lttoolbox module
<center>DATES:</center>
* START: May 28th
* END: June 3rd
|
* Template instantiation: SWIG only creates wrappers for templates when it is told which instantiation to wrap. Every template therefore has to be declared explicitly for the data types it is used with.
* C++ reference-counted objects: referencing and dereferencing of objects have to be handled carefully so that no errors occur. This is another area that SWIG does not handle automatically.
* Handling C++ overloaded functions: overloading support is not quite as flexible as in C++. Sometimes there are methods that SWIG cannot disambiguate. If such errors appear, they have to be resolved manually in the wrapper's interface file.
|-
|
<center>WEEK FOUR:</center>
* Testing and improving cross-language polymorphism
* Making the module more Pythonic
* Exception Handling
<center>DATES:</center>
* START: June 4th
* END: June 10th
|
* Implementing director classes: by default, no mechanism exists to pass method calls down the inheritance chain from C++ to Python. In particular, if a C++ class has been extended in Python, these extensions are not visible from C++ code, so virtual method calls from C++ cannot reach the lowest implementation in the inheritance chain. SWIG provides a feature called directors, whose job is to route method calls correctly, either to C++ implementations higher in the inheritance chain or to Python implementations lower in the inheritance chain.
* Writing C++ helper functions: sometimes the SWIG module misses bits of functionality because there is no easy way to construct and manipulate a suitable datatype. For those cases, C++ helper functions need to be written.
* Writing high-level Python functions that provide a high-level Python interface on top of the low-level helper functions.
* Error handling: if C++ throws an error, it is better to convert it into a Python exception.
|-
|
<center>WEEK FIVE:</center>
* Apertium setup
<center>DATES:</center>
* START: June 11th
* END: June 17th
|
* Same tasks as week one, applied to the Apertium module.
|-
|
<center>WEEK SIX:</center>
* Variable handling in SWIG for Apertium module
<center>DATES:</center>
* START: June 18th
* END: June 24th
|
* Same tasks as week two, applied to the Apertium module.
|-
|
<center>WEEK SEVEN:</center>
* Templating and Object Handling for Apertium module
<center>DATES:</center>
* START: June 25th
* END: July 1st
|
* Same tasks as week three, applied to the Apertium module.
|-
|
<center>WEEK EIGHT:</center>
* Testing and improving cross-language polymorphism
* Making the module more Pythonic
* Exception Handling
<center>DATES:</center>
* START: July 2nd
* END: July 8th
|
* Same tasks as week four, applied to the Apertium module.
|-
|
<center>WEEK NINE:</center>
* Extensive alpha testing of modules built
<center>DATES:</center>
* START: July 9th
* END: July 15th
|
* Testing the modules built so far and starting the documentation.
|-
|
<center>WEEK TEN:</center>
* Finishing Documentation
<center>DATES:</center>
* START: July 16th
* END: July 22nd
|
* Finishing the documentation of the module and distributing it for beta testing.
|-
|
<center>WEEK ELEVEN:</center>
* Beta testing and changes (if any)
<center>DATES:</center>
* START: July 23rd
* END: July 29th
|
* Collecting reviews from beta testing and implementing changes, if any.
|-
|
<center>WEEK TWELVE:</center>
* Deciding on the library structure
* Making the module pip installable
<center>DATES:</center>
* START: July 30th
* END: August 5th
|
* Making the super wrapper for the modules.
* Making the module pip installable.
* Updating the documentation.
|-
|
<center>WEEK THIRTEEN:</center>
* Final reviews and bug report analysis
<center>DATES:</center>
* START: August 6th
* END: August 14th
|
* Analysing the code and writing a bug report for the bugs found.
* Writing the final documentation.
* Releasing the final module.
|}
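
The week-four tasks of writing a high-level Python interface and converting C++ errors into Python exceptions could look roughly like the sketch below. It assumes a stand-in class in place of the SWIG-generated proxy. The names <code>LowLevelTransducer</code>, <code>link_states</code> and <code>TransducerError</code> are hypothetical and not part of lttoolbox, and the low-level error is assumed to surface as a <code>RuntimeError</code>.

```python
class LowLevelTransducer:
    """Hypothetical stand-in for a SWIG-generated proxy class.

    The method names and the RuntimeError are illustrative assumptions,
    not the real lttoolbox API.
    """

    def __init__(self):
        self._transitions = []

    def link_states(self, source, target, tag):
        # Assumed behaviour: a C++ exception surfaces as RuntimeError.
        if source < 0 or target < 0:
            raise RuntimeError("state index out of range")
        self._transitions.append((source, target, tag))

    def size(self):
        return len(self._transitions)

    def transition_at(self, index):
        return self._transitions[index]


class TransducerError(Exception):
    """Domain-specific exception exposed by the high-level layer."""


class Transducer:
    """High-level wrapper giving the low-level object a Pythonic surface."""

    def __init__(self, backend=None):
        self._backend = backend if backend is not None else LowLevelTransducer()

    def add_transition(self, source, target, tag):
        # Translate the generic low-level error into a library exception.
        try:
            self._backend.link_states(source, target, tag)
        except RuntimeError as exc:
            raise TransducerError(
                f"cannot link {source} -> {target}: {exc}"
            ) from exc

    def __len__(self):
        return self._backend.size()

    def __iter__(self):
        for index in range(len(self)):
            yield self._backend.transition_at(index)


fst = Transducer()
fst.add_transition(0, 1, "a")
fst.add_transition(1, 2, "b")
transitions = list(fst)

try:
    fst.add_transition(-1, 0, "c")
    failed = False
except TransducerError:
    failed = True

print(len(fst), transitions, failed)
```

Keeping the generated proxy behind a thin wrapper like this means the public Python API can stay stable even if the interface file changes.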

== Coding Challenge ==

1.) Make the Transducer model importable from Python

== About me: Education and Experience ==

I am a sophomore at IIIT-Hyderabad, India, pursuing a dual degree in Computer Science and Computational Linguistics. I have worked closely with C++ and Python in many projects, and I take a keen interest in machine learning as well. I love building fun applications. The details of my work experience can be found [[here]].

</span>