Difference between revisions of "User:Arghya1998"

 
<span style="font-family:Courier;">
 
=== <center> GSoC Proposal : Python API/library for Apertium </center> ===
 
 
== Basic Details ==
 
 
 
{| class="wikitable" style="width:100%"
 
| Name
 
| Arghya Bhattacharya
 
|-
 
| Email Address
 
| arghya.b@research.iit.ac.in
 
|-
 
| Alternate Email Address
 
| arghyatiger@gmail.com
 
|-
 
| IRC Nick
 
| arghya[m]
 
|-
 
| Mobile
 
| +91 9831325363
 
|-
 
| TimeZone
 
| UTC + 5:30
 
|-
 
| Link to GitHub
 
| https://github.com/arghyatiger
 
|-
 
| Link to GitLab
 
| https://gitlab.com/arghyatiger
 
|}
 
 
== Why am I interested in Machine Translation? ==
 
 
'''The broader perspective:'''
 
 
Coming from a country as diverse as India, with over 22 officially registered languages and over 1500 mother tongues (150 of them sizeable), I have always been curious about how languages serve as the basic unit of communication. During my childhood I lived in various parts of India, so I had the chance to interact closely with people from different linguistic backgrounds, and in the process I learned several languages, including Hindi, Bengali, English, Tamil and Oriya. The linguistic diversity of my country is fascinating, but it also creates many communication problems. I believe that efficient machine translation can ease many of these problems, break the "language barrier" not just across the country but across the globe, and connect people better.
 
 
 
'''Academic Interests:'''
 
 
I am currently pursuing a dual degree (B.Tech in Computer Science and M.S. by Research in Computational Linguistics) at IIIT-Hyderabad, India. A good portion of our academic focus is on machine translation, and I find it a really interesting area to work in. Working with Apertium will help me develop my computational linguistics skills and give me a chance to give back to the community with a solid contribution.
 
 
 
 
== Why is it that I am interested in Apertium? ==
 
 
As a student whose primary academic focus is Computational Linguistics, I use Apertium as one of the important tools for my university assignments. The Apertium projects offer a nice blend of linguistic and coding tasks, which makes them interesting to me. As part of my long-term goal of contributing to the community, I also think that contributions to Apertium would have a significant impact on the Computational Linguistics community around the globe, and that further motivates me to work for Apertium.
 
 
 
 
== Which of the published tasks am I interested in? ==
 
 
All the published tasks seem interesting to me, so it was difficult to choose only one. I have nevertheless narrowed my choice down to the project I like the most: Python API/library for Apertium.
 
 
 
 
== Why should Google and Apertium sponsor the project of Python API for Apertium? ==
 
 
The Apertium code base is written primarily in C++. C++ offers fairly high performance, supports low-level systems programming, is available almost everywhere and is reasonably well standardized. However, it has a few shortcomings as well: it is not interactive, it imposes a compile/debug/nap cycle, and its modules are difficult to extend and modify. Also, once a module has been developed, improvements such as writing user interfaces and integrating with other systems become really cumbersome in C++. Python, on the other hand, offers an interpreted, high-level programming environment. A Python wrapper can therefore bring flexibility and interactivity to Apertium's code base, along with other benefits such as ease of debugging, ease of testing and rapid prototyping.
 
 
 
 
== How and Who will benefit from this project? ==
 
 
This project would put many developers at ease, since Python is a high-level language whose features make it easy to pick up, and it would increase the scalability of Apertium in the future. Many people also like to work in Python and Jupyter notebooks. I therefore believe that a Python API for Apertium would help a large community of developers, linguists, computational linguists and anyone keen on using a wide range of linguistic tools.
 
 
 
 
== Detailed project plan and workflow ==
 
 
'''1. Detailed Project Goal:'''
 
 
The goal of the project is to create structured Python wrappers for the core modules of Apertium, namely:
 
 
* [[Apertium/Lttoolbox]]
 
* [[Apertium/Apertium]]
 
 
a.) The modules should be importable from Python. Pythonic usage would be as follows:
 
from apertium.lttoolbox import transducer
 
b.) The modules should be nested:
 
apertium.lttoolbox.transducer
 
c.) The internal usage of the functions should be as follows:
 
import apertium.transducer.internal
 
t = apertium.transducer.internal.Transducer().insertSingleTransduction()
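The nested layout described in a.) and b.) can be tried out without any C++ at all. The following is only a minimal sketch: it writes a throwaway package to a temporary directory, and the stub <code>Transducer</code> class and its <code>transitions</code> attribute are hypothetical stand-ins for what a SWIG-generated module would eventually provide.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_stub_package(root: Path) -> None:
    """Create a throwaway apertium/lttoolbox/transducer.py layout under root."""
    pkg = root / "apertium" / "lttoolbox"
    pkg.mkdir(parents=True)
    # Empty __init__.py files turn the directories into regular packages.
    (root / "apertium" / "__init__.py").write_text("")
    (pkg / "__init__.py").write_text("")
    # Hypothetical stand-in for the SWIG-generated module (no C++ involved).
    (pkg / "transducer.py").write_text(
        "class Transducer:\n"
        "    def __init__(self):\n"
        "        self.transitions = []\n"
    )


def import_stub_transducer():
    """Build the stub package, import it, and return the imported module."""
    with tempfile.TemporaryDirectory() as tmp:
        build_stub_package(Path(tmp))
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            # Same import style as in a.) above.
            from apertium.lttoolbox import transducer
            return transducer
        finally:
            sys.path.remove(tmp)


transducer = import_stub_transducer()
t = transducer.Transducer()
print(transducer.__name__)
```

If this layout is kept, the only thing that changes later is the body of <code>transducer.py</code>, which would be replaced by the generated wrapper; user code that imports the module stays the same.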
 
 
'''2. Tool to be used: '''
 
 
For this project, I plan to use SWIG to bind the C++ code. SWIG is a software development tool that simplifies the task of interfacing different languages to C and C++ programs: it is a compiler that takes C declarations and creates the wrappers needed to access those declarations from other languages. The other options I explored for the project are Pyrex, ctypes, SIP and Boost.Python. For a project of this scale, however, SWIG seems the most convenient, owing to a number of features explained later in the proposal.
 
 
'''3. Timeline :'''
 
 
Goals for the various phases:
 
{| class="wikitable" style="width:100%"
 
! PHASE
 
! OBJECTIVE OF PHASE
 
|- style="background-color:#d2f7b2;"
 
| Community Bonding Period
 
|
 
* A good understanding of all the modules and of the intricacies of binding each one, plus a detailed report on the modules
 
|- style="background-color:#ccd3ff;"
 
| Phase 1
 
|
 
* Binding/Testing the Lttoolbox Module
 
|- style="background-color:#f2c1f2;"
 
| Phase 2
 
|
 
* Binding/Testing the Apertium Module
 
|- style="background-color:#f9c7d0;"
 
| Phase 3
 
|
 
* Documenting the usage of the Python modules and organizing the modules built in the previous phases into a library
 
|}
 
 
 
Week-Wise Goals:
 
 
{| class="wikitable"
 
! Phase And Brief Description
 
! Duration
 
! Task Explanation
 
! Deliverable for the week
 
|-style="background-color:#ccffcc;"
 
| COMMUNITY BONDING PERIOD
 
| START: April 23rd
 
END: May 13th
 
|
 
* Playing around with the lttoolbox and apertium modules and using every function and understanding all the flags and arguments of the functions.
 
* Reading up on the details of SWIG.
 
* Taking inputs from various apertium users on what would be the ideal implementation that they would want.
 
|
 
|-style="background-color:#b3ffb3;"
 
| Week ONE : Lttoolbox setup
 
| START : May 14th
 
END : May 20th
 
|
 
* Making explicit declarations of Constants and Enumerations of the module in SWIG interface
 
* Testing all pointer based data manipulation for any errors. (A common problem that might occur with swig bindings)
 
* Looking for Data Members that need to be made read-only and making necessary changes in the interface file

* Identifying Static Class members: prior to Python 2.2, Python classes had no support for static methods, and no version of Python supports static member variables in a manner that SWIG can utilize. SWIG therefore generates wrappers that try to work around some of these issues, but the remaining issues have to be taken care of manually.

* Resolving the SWIG namespace problem manually (it occurs if there are multiple namespaces)
 
|
 
* First importable wrapper of Lttoolbox module
 
|-style="background-color:#80ff80;"
 
| WEEK TWO: Variable handling in SWIG for Lttoolbox module
 
| START : May 21st
 
END : May 27th
 
|
 
* Making explicit declarations of Constants and Enumerations of the module in SWIG interface
 
* Testing all pointer based data manipulation for any errors. (A common problem that might occur with swig bindings)
 
* Looking for Data Members that need to be made read-only and making necessary changes in the interface file
 
* Identifying Static Class members: prior to Python 2.2, Python classes had no support for static methods, and no version of Python supports static member variables in a manner that SWIG can utilize. SWIG therefore generates wrappers that try to work around some of these issues, but the remaining issues have to be taken care of manually.

* Resolving the SWIG namespace problem manually (it occurs if there are multiple namespaces)
 
|
 
* Second version of the wrapper with all data type usage support
 
|-style="background-color:#99ff99;"
 
| WEEK THREE: Templating and Object Handling for Lttoolbox module
 
| START : May 28th
 
END : June 3rd
 
|
 
* In order to create wrappers for templates, one has to tell SWIG to create wrappers for a particular template instantiation. Hence every template has to be explicitly instantiated for the data types that are manipulated through it.

* C++ Reference Counted Objects: referencing and dereferencing of objects have to be taken care of manually so that no errors occur; this is another place where SWIG isn’t smart enough.
 
* Handling C++ overloaded functions: Overloading support is not quite as flexible as in C++. Sometimes there are methods that SWIG can't disambiguate, if such errors appear then they have to be taken care of manually in the interface file of the wrapper.
 
|
 
* Third version of the wrapper with all functions importable from python.
 
|-style="background-color:#66ff66;"
 
| WEEK FOUR: Testing and improving cross-language polymorphism, making the module more Pythonic, exception handling
 
| START : June 4th
 
END : June 10th
 
|
 
* Implement Director Classes: without directors, no mechanism exists to pass method calls down the inheritance chain from C++ to Python. In particular, if a C++ class has been extended in Python, these extensions will not be visible from C++ code. Virtual method calls from C++ are thus not able to access the lowest implementation in the inheritance chain. SWIG implements a feature called directors, whose job is to route method calls correctly, either to C++ implementations higher in the inheritance chain or to Python implementations lower in the inheritance chain.

* Writing C++ helper functions: sometimes the SWIG module misses bits of functionality because there is no easy way to construct and manipulate a suitable datatype; for those cases C++ helper functions need to be written.

* Writing high-level Python functions: providing a high-level Python interface built on top of the low-level helper functions.

* Error handling: if C++ throws an error, it is better to convert it into a Python exception.
 
|
 
* Fourth and final version with input functions and all helper functions written in python.
 
|-style="background-color: #4dff4d;"
 
| WEEK FIVE: Apertium setup
 
| START : June 11th
 
END : June 17th
 
|
 
* Ref Week 1
 
|
 
* First version of the apertium module that is python importable
 
|-style="background-color:#1aff1a;"
 
| WEEK SIX: Variable handling in SWIG for Apertium module
 
| START : June 18th
 
END : June 24th
 
|
 
* Ref Week 2
 
|
 
* Second version of the apertium wrapper
 
|-style="background-color:#00e600;"
 
| WEEK SEVEN: Templating and Object Handling for Apertium module
 
| START : June 25th
 
END : July 1st
 
|
 
* Ref Week 3
 
|
 
* Third version of the apertium wrapper with all functions importable from python
 
|-style="background-color:#00cc00;"
 
| WEEK EIGHT: Testing and improving cross-language polymorphism, making the module more Pythonic, exception handling
 
| START : July 2nd
 
END : July 8th
 
|
 
* Ref Week 4
 
|
 
* Fourth and final version with input functions and all helper functions written in python.
 
|-style="background-color: #00b300;"
 
| WEEK NINE: Extensive alpha testing of modules built
 
| START : July 9th
 
END : July 15th
 
|
 
* Testing the modules built and starting the documentation.
 
|
 
* Version 1 Documentation written
 
* Tests written for the lttoolbox module
 
|-style="background-color:#009900;"
 
| WEEK TEN: Finishing Documentation
 
| START : July 16th
 
END : July 22nd
 
|
 
* Finishing the documentation of the module and distributing it for Beta testing
 
|
 
* Tests written for apertium module
 
* Documentation version 2.
 
|-style="background-color:#008000;"
 
| WEEK ELEVEN: Beta testing and changes(if any)
 
| START : July 23rd
 
END : July 29th
 
|
 
* Taking reviews of beta testing and implementing changes if any.
 
|
 
* Review-fix version of the wrapper released
 
|-style="background-color:#006600;"
 
| WEEK TWELVE: Deciding on the library structure and making module pip installable
 
| START : July 30th
 
END : August 5th
 
|
 
* Making the super wrapper for the modules.
 
* Making the module pip installable
 
* Update Documentation
 
|
 
* One wrapper with the 2 created wrappers inside it
 
* Pip Installable module
 
|-style="background-color:#004d00;"
 
| WEEK THIRTEEN: Final reviews and bug report
 
| START : August 6th
 
END : August 14th
 
|
 
* Analyse and make bug report for the bugs in the code.
 
* Make Final documentation
 
* Release Final Module
 
|
 
* Final Release of the wrapper.
 
|}
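Several of the Week Four tasks in the table above (high-level Python functions, error handling) and the read-only data members mentioned in Weeks One and Two can be illustrated in plain Python. This is a rough sketch only: <code>_LowLevelTransducer</code> is a hypothetical stand-in for a SWIG-generated proxy class, <code>ApertiumError</code>, <code>add_arc</code> and <code>arc_count</code> are invented names, and it is assumed that the binding layer reports C++ failures as <code>RuntimeError</code>.

```python
class ApertiumError(Exception):
    """Hypothetical Python-level exception for errors coming from the bindings."""


class _LowLevelTransducer:
    """Stand-in for a SWIG-generated proxy class (not the real binding)."""

    def __init__(self):
        self._arcs = []

    def insert_single_transduction(self, tag, source):
        if tag < 0 or source < 0:
            # Assumption: C++ failures reach Python as RuntimeError.
            raise RuntimeError("invalid tag or state")
        self._arcs.append((source, tag))
        return len(self._arcs)  # pretend this is the id of the new state


class Transducer:
    """High-level wrapper built on top of the low-level proxy."""

    def __init__(self):
        self._impl = _LowLevelTransducer()

    def add_arc(self, tag, source=0):
        """Add one arc, translating low-level errors into ApertiumError."""
        try:
            return self._impl.insert_single_transduction(tag, source)
        except RuntimeError as exc:
            raise ApertiumError(f"could not add arc: {exc}") from exc

    @property
    def arc_count(self):
        """Read-only: a property without a setter cannot be assigned to."""
        return len(self._impl._arcs)


t = Transducer()
first_state = t.add_arc(3)
print(first_state, t.arc_count)
```

Keeping the translation in a thin Python layer means callers only ever need to catch one exception type, while the original error stays reachable through <code>__cause__</code> for debugging.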
 
 
== Coding Challenge ==
 
 
1.) Make the Transducer model importable from Python
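One simple way to check the outcome of this challenge is a smoke test that asks Python whether a module can be located. The sketch below uses only the standard library; the dotted name <code>apertium.lttoolbox.transducer</code> is the layout proposed earlier and is an assumption, not an existing package.

```python
import importlib.util


def is_importable(dotted_name: str) -> bool:
    """Return True if Python can locate the module.

    find_spec imports the parent packages of dotted_name, but not the
    target module itself.
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package in the dotted path is missing.
        return False


# Standard-library module, so this check works on any Python installation.
print(is_importable("json.decoder"))
# The proposed wrapper; the result depends on whether it has been built.
print(is_importable("apertium.lttoolbox.transducer"))
```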
 
 
 
 
== About me: Education and Experience ==
 
 
I am a sophomore at IIIT-Hyderabad, India, pursuing a dual degree in Computer Science and Computational Linguistics. I've worked closely with C++ and Python in many projects, and I take a keen interest in machine learning as well. I love building fun applications. The details of my work experience can be found [[here]].
 
 
</span>
 
 
[[Category:GSoC_2018_student_proposals]]
 

Latest revision as of 09:19, 27 March 2018