Difference between revisions of "User:Arghya1998"
Arghya1998 (talk | contribs) |
Arghya1998 (talk | contribs) (Blanked the page) |
||
(17 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
− | |||
− | <span style="font-family:Courier;background-color:#b3ffb3;"> |
||
− | === <center> GSoC Proposal : Python API/library for Apertium </center> === |
||
− | |||
− | == Basic Details == |
||
− | |||
− | |||
− | {| class="wikitable" style="width:100%" |
||
− | | Name |
||
− | | Arghya Bhattacharya |
||
− | |- |
||
− | | Email Address |
||
− | | arghya.b@research.iit.ac.in |
||
− | |- |
||
− | | Alternate Email Address |
||
− | | arghyatiger@gmail.com |
||
− | |- |
||
− | | IRC Nick |
||
− | | arghya[m] |
||
− | |- |
||
− | | Mobile |
||
− | | +91 9831325363 |
||
− | |- |
||
− | | TimeZone |
||
− | | UTC + 5:30 |
||
− | |- |
||
− | | Link to GitHub |
||
− | | https://github.com/arghyatiger |
||
− | |- |
||
− | | Link to GitLab |
||
− | | https://gitlab.com/arghyatiger |
||
− | |} |
||
− | |||
− | == Why am I interested in Machine Translation? == |
||
− | |||
− | '''The broader perspective:''' |
||
− | |||
− | Being from a diverse country like India, with over 22 officially registered languages and over 1500 mother tongue languages (150 of them are sizeable), I’ve always been curious as to how |
||
− | languages serve as the basic means of communication. During my childhood I lived in various places in India, so I had the chance to interact closely with people from different linguistic backgrounds, and in the process I learned quite a few languages, including Hindi, Bengali, English, Tamil and Oriya. The linguistic diversity of my country is fascinating, but it also brings many problems in communication. I believe that efficient machine translation can ease many of these problems, break the “language barrier” not just across the country but across the globe, and connect people better. |
||
− | |||
− | |||
− | '''Academic Interests:''' |
||
− | |||
− | I am currently enrolled in the dual-degree program (B.Tech in Computer Science and M.S. by Research in Computational Linguistics) at IIIT-Hyderabad, India. A good portion of our academic focus is on machine translation, and I find it a really interesting area to work on. Working with Apertium will help me develop my Computational Linguistics skills and give me a chance to give back to the community with a solid contribution. |
||
− | |||
− | |||
− | |||
− | == Why is it that I am interested in Apertium? == |
||
− | |||
− | As a student whose primary academic focus is Computational Linguistics, I use Apertium as one of the main tools for my university assignments. The Apertium projects offer a good blend of linguistic and coding tasks, which makes them interesting to me. As part of my long-term goal of contributing to the community, I also think that contributions to Apertium can have a significant impact on the Computational Linguistics community around the globe, and that further motivates me to work on Apertium. |
||
− | |||
− | |||
− | |||
− | == Which of the published tasks am I interested in? == |
||
− | |||
− | All the published tasks seem interesting to me, so it was difficult to choose only one. I have narrowed it down to the project I like the most: Python API/library for Apertium. |
||
− | |||
− | |||
− | |||
− | == Why should Google and Apertium sponsor the project of Python API for Apertium? == |
||
− | |||
− | The Apertium code base is primarily written in C++. C++ offers fairly high performance, supports low-level systems programming, is available almost everywhere and is reasonably well standardized. However, it has a few shortcomings as well: it is not interactive, development follows a compile/debug/nap cycle, and extending or modifying modules can be difficult. Also, once a module has been developed, improvements such as writing user interfaces and systems integration become cumbersome in C++. Python, on the other hand, is an interpreted, high-level programming environment. A Python wrapper can therefore add flexibility and interactivity to Apertium’s code base, along with other benefits such as ease of debugging, ease of testing and rapid prototyping. |
||
− | |||
− | |||
− | |||
− | == How and Who will benefit from this project? == |
||
− | |||
− | The project would make Apertium more approachable for developers, since Python is a high-level language that is easy to pick up, and it would make Apertium easier to build on in the future. Many people also like to work in Jupyter notebooks and Python. I therefore believe that a Python API for Apertium would be helpful to a large community of developers, linguists, computational linguists and anyone keen on using a wide range of linguistic tools. |
||
− | |||
− | |||
− | |||
− | == Detailed project plan and workflow == |
||
− | |||
− | '''1. Detailed Project Goal:''' |
||
− | |||
− | The goal of the project is to create structured Python wrappers for the core modules of Apertium, namely: |
||
− | |||
− | * [[Apertium/Lttoolbox]] |
||
− | * [[Apertium/Apertium]] |
||
− | |||
− | a.) The modules should be importable from Python. Idiomatic usage would be as follows: |
||
− | from apertium.lttoolbox import transducer |
||
− | b.) The modules should be nested |
||
− | apertium.lttoolbox.transducer |
||
− | c.) The internal usage of the functions should be as follows: |
||
− | import apertium.transducer.internal |
||
− | t = apertium.transducer.internal.Transducer().insertSingleTransduction() |
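As a rough illustration of points a.) to c.), the nested layout can be mimicked in pure Python with placeholder modules. Everything in this sketch is a stand-in: the `Transducer` class is a hypothetical stub rather than the real SWIG-generated binding, the method name `insert_single_transduction` and its arguments are invented for the example, and the module names simply follow the import path shown in a.).

```python
import sys
import types


class Transducer:
    """Hypothetical stub standing in for a SWIG-generated class."""

    def __init__(self):
        self.transductions = []

    def insert_single_transduction(self, tag, source):
        # Record the transduction and return a fake target state id.
        self.transductions.append((tag, source))
        return len(self.transductions)


def register_placeholder_packages():
    """Register apertium -> lttoolbox -> transducer as nested modules."""
    apertium = types.ModuleType("apertium")
    apertium.__path__ = []  # an (empty) __path__ marks the module as a package
    lttoolbox = types.ModuleType("apertium.lttoolbox")
    lttoolbox.__path__ = []
    transducer = types.ModuleType("apertium.lttoolbox.transducer")
    transducer.Transducer = Transducer

    # Wire up the attribute chain and make the names importable.
    apertium.lttoolbox = lttoolbox
    lttoolbox.transducer = transducer
    sys.modules["apertium"] = apertium
    sys.modules["apertium.lttoolbox"] = lttoolbox
    sys.modules["apertium.lttoolbox.transducer"] = transducer


register_placeholder_packages()

# The import style proposed in a.) now resolves against the placeholders.
from apertium.lttoolbox import transducer

t = transducer.Transducer()
state = t.insert_single_transduction(0, 0)
```

In a real build the innermost module would be the extension generated by SWIG, and the outer packages would be ordinary directories with `__init__.py` files; the sketch only shows how the nesting looks from the caller's side.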
||
− | |||
− | '''2. Tool to be used: ''' |
||
− | |||
− | For the project, I plan on using SWIG to bind the C++ code. SWIG is a software development tool that simplifies the task of interfacing different languages with C and C++ programs. It is a compiler that takes C and C++ declarations and creates the wrappers needed to access those declarations from other languages. The other options that I explored for the project are Pyrex, ctypes, SIP and Boost.Python. For a project of this scale, however, SWIG seems to be the most convenient because of a number of features explained later in the proposal. |
||
− | |||
− | '''3. Timeline :''' |
||
− | |||
− | Goals for the various phases: |
||
− | {| class="wikitable" style="width:100%" |
||
− | ! PHASE |
||
− | ! OBJECTIVE OF PHASE |
||
− | |- style="background-color:#d2f7b2;" |
||
− | | Community Bonding Period |
||
− | | |
||
− | * Good Understanding of all the modules, all the intricacies of binding each module and a detailed report of the modules |
||
− | |- style="background-color:#ccd3ff;" |
||
− | | Phase 1 |
||
− | | |
||
− | * Binding/Testing the Lttoolbox Module |
||
− | |- style="background-color:#f2c1f2;" |
||
− | | Phase 2 |
||
− | | |
||
− | * Binding/Testing the Apertium Module |
||
− | |- style="background-color:#f9c7d0;" |
||
− | | Phase 3 |
||
− | | |
||
− | * Documentation of usage of the python modules and library organization of the modules made in previous phases |
||
− | |} |
||
− | |||
− | |||
− | Week-Wise Goals: |
||
− | |||
− | {| class="wikitable" style="background-color:#b3ffb3;" |
||
− | ! Phase And Brief Description |
||
− | ! Duration |
||
− | ! Task Explanation |
||
− | ! Deliverable for the week |
||
− | |-style="background-color:#ccffcc;" |
||
− | | COMMUNITY BONDING PERIOD |
||
− | | |
||
− | * START:April 23rd |
||
− | * END:May 13th |
||
− | | |
||
− | * Playing around with the lttoolbox and apertium modules and using every function and understanding all the flags and arguments of the functions. |
||
− | * Reading up on the details of SWIG. |
||
− | * Taking inputs from various apertium users on what would be the ideal implementation that they would want. |
||
− | | |
||
− | |-style="background-color:#b3ffb3;" |
||
− | | WEEK ONE: Lttoolbox setup |
||
− | | |
||
− | * START:May 14th |
||
− | * END:May 20th |
||
− | | |
||
− | * Making explicit declarations of Constants and Enumerations of the module in SWIG interface |
||
− | * Testing all pointer-based data manipulation for errors (a common problem with SWIG bindings). |
||
− | * Looking for data members that need to be made read-only and making the necessary changes in the interface file. |
||
− | * Identifying static class members: before Python 2.2, Python classes had no support for static methods, and no version of Python supports static member variables in a manner that SWIG can utilize. SWIG therefore generates wrappers that try to work around some of these issues, but the remaining issues have to be taken care of manually. |
||
− | * Resolving SWIG namespace problems manually (these occur if there are multiple namespaces). |
||
− | | |
||
− | * First importable wrapper of Lttoolbox module |
||
− | |-style="background-color:#80ff80;" |
||
− | | WEEK TWO: Variable handling in SWIG for Lttoolbox module |
||
− | | |
||
− | * START:May 21st |
||
− | * END:May 27th |
||
− | | |
||
− | * Making explicit declarations of Constants and Enumerations of the module in SWIG interface |
||
− | * Testing all pointer-based data manipulation for errors (a common problem with SWIG bindings). |
||
− | * Looking for data members that need to be made read-only and making the necessary changes in the interface file. |
||
− | * Identifying static class members: before Python 2.2, Python classes had no support for static methods, and no version of Python supports static member variables in a manner that SWIG can utilize. SWIG therefore generates wrappers that try to work around some of these issues, but the remaining issues have to be taken care of manually. |
||
− | * Resolving SWIG namespace problems manually (these occur if there are multiple namespaces). |
||
− | | |
||
− | * Second version of the wrapper with all data type usage support |
||
− | |-style="background-color:#99ff99;" |
||
− | | WEEK THREE: Templating and Object Handling for Lttoolbox module |
||
− | | |
||
− | * START:May 28th |
||
− | * END:June 3rd |
||
− | | |
||
− | * SWIG only creates wrappers for the template instantiations it is told about. Hence every template has to be explicitly instantiated for the data types it is used with. |
||
− | * C++ reference-counted objects: referencing and dereferencing of objects has to be handled carefully so that no errors occur. This is another area that SWIG does not handle automatically. |
||
− | * Handling C++ overloaded functions: Overloading support is not quite as flexible as in C++. Sometimes there are methods that SWIG can't disambiguate, if such errors appear then they have to be taken care of manually in the interface file of the wrapper. |
||
− | | |
||
− | * Third version of the wrapper with all functions importable from python. |
||
− | |-style="background-color:#66ff66;" |
||
− | | WEEK FOUR: Testing and improving cross-language polymorphism, making the module more Pythonic, exception handling |
||
− | | |
||
− | * START:June 4th |
||
− | * END:June 10th |
||
− | | |
||
− | * Implementing director classes: by default, no mechanism exists to pass method calls down the inheritance chain from C++ to Python. In particular, if a C++ class has been extended in Python, these extensions will not be visible from C++ code. Virtual method calls from C++ are thus not able to access the lowest implementation in the inheritance chain. SWIG provides a feature called directors. The job of the directors is to route method calls correctly, either to C++ implementations higher in the inheritance chain or to Python implementations lower in the inheritance chain. |
||
− | * Writing C++ helper functions: sometimes the SWIG module misses bits of functionality because there is no easy way to construct and manipulate a suitable datatype. For those cases, C++ helper functions need to be written. |
||
− | * Writing high-level Python functions that provide a high-level Python interface built on top of the low-level helper functions. |
||
− | * Error handling: if C++ throws an error, it is better to convert it into a Python exception. |
||
− | | |
||
− | * Fourth and final version with input functions and all helper functions written in python. |
||
− | |-style="background-color: #4dff4d;" |
||
− | | WEEK FIVE: Apertium setup |
||
− | | |
||
− | * START:June 11th |
||
− | * END:June 17th |
||
− | | |
||
− | * Ref Week 1 |
||
− | | |
||
− | * First version of the apertium module that is python importable |
||
− | |-style="background-color:#1aff1a;" |
||
− | | WEEK SIX: Variable handling in SWIG for Apertium module |
||
− | | |
||
− | * START:June 18th |
||
− | * END:June 24th |
||
− | | |
||
− | * Ref Week 2 |
||
− | | |
||
− | * Second version of the apertium wrapper |
||
− | |-style="background-color:#00e600;" |
||
− | | WEEK SEVEN: Templating and Object Handling for Apertium module |
||
− | | |
||
− | * START:June 25th |
||
− | * END:July 1st |
||
− | | |
||
− | * Ref Week 3 |
||
− | | |
||
− | * Third version of the apertium wrapper with all functions importable from python |
||
− | |-style="background-color:#00cc00;" |
||
− | | WEEK EIGHT: Testing and improving cross-language polymorphism, making the module more Pythonic, exception handling |
||
− | | |
||
− | * START:July 2nd |
||
− | * END:July 8th |
||
− | | |
||
− | * Ref Week 4 |
||
− | | |
||
− | * Fourth and final version with input functions and all helper functions written in python. |
||
− | |-style="background-color: #00b300;" |
||
− | | WEEK NINE: Extensive alpha testing of modules built |
||
− | | |
||
− | * START:July 9th |
||
− | * END:July 15th |
||
− | | |
||
− | * Testing the modules built by writing unit-tests for the functions in the modules |
||
− | * Starting the documentation of the modules. Since there are a lot of functions and SWIG-generated code behaves a little differently from plain Python, proper documentation of all the modules and their usage is required. |
||
− | | |
||
− | * Version 1 Documentation written |
||
− | * Tests written for the lttoolbox module |
||
− | |-style="background-color:#009900;" |
||
− | | WEEK TEN: Finishing Documentation |
||
− | | |
||
− | * START:July 16th |
||
− | * END:July 22nd |
||
− | | |
||
− | * Finishing the documentation of the module |
||
− | * Distribute for Beta testing, so that end users validate the usability, functionality, compatibility, and reliability |
||
− | | |
||
− | * Tests written for apertium module |
||
− | * Documentation version 2. |
||
− | |-style="background-color:#008000;" |
||
− | | WEEK ELEVEN: Beta testing and changes(if any) |
||
− | | |
||
− | * START:July 23rd |
||
− | * END:July 29th |
||
− | | |
||
− | * Taking reviews of beta testing and implementing changes if any. |
||
− | | |
||
− | * Review-fix version of the wrapper release |
||
− | |-style="background-color:#006600;" |
||
− | | WEEK TWELVE: Deciding on the library structure and making module pip installable |
||
− | | |
||
− | * START:July 30th |
||
− | * END:August 5th |
||
− | | |
||
− | * Making the super wrapper for the modules. |
||
− | * Making the module pip installable by writing scripts and uploading to PyPI |
||
− | * Update Documentation |
||
− | | |
||
− | * One wrapper with the 2 created wrappers inside it |
||
− | * Pip Installable module |
||
− | |-style="background-color:#004d00;" |
||
− | | WEEK THIRTEEN: Final reviews and bug report |
||
− | | |
||
− | * START:August 6th |
||
− | * END:August 14th |
||
− | | |
||
− | * Analyse and make bug report for the bugs in the code. |
||
− | * Make Final documentation |
||
− | * Release Final Module |
||
− | | |
||
− | * Final Release of the wrapper. |
||
− | |} |
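The Week Four tasks in the table above (a high-level Python interface on top of low-level functions, and conversion of C++ errors into Python exceptions) could look roughly like the following minimal sketch. It assumes a stand-in class, `_LowLevelTransducer`, in place of the real SWIG-generated proxy, and the names `ApertiumError`, `add_arc` and `arcs_from` are hypothetical and chosen only for illustration. The stub raises `RuntimeError` to imitate a C++ failure surfaced by the binding.

```python
class ApertiumError(Exception):
    """Hypothetical Python-level exception for errors coming from the bindings."""


class _LowLevelTransducer:
    """Stand-in for a SWIG-generated proxy class; not the real Lttoolbox API."""

    def __init__(self):
        self._arcs = {}

    def insert_arc(self, source, symbol, target):
        # Imitate a failure that the C++ layer might report.
        if source < 0 or target < 0:
            raise RuntimeError("invalid state id")
        self._arcs.setdefault(source, []).append((symbol, target))


class Transducer:
    """High-level wrapper that hides the low-level object behind a small API."""

    def __init__(self):
        self._impl = _LowLevelTransducer()

    def add_arc(self, source, symbol, target):
        """Add an arc, re-raising low-level failures as ApertiumError."""
        try:
            self._impl.insert_arc(source, symbol, target)
        except RuntimeError as exc:
            # Chain the original error so the low-level cause stays visible.
            raise ApertiumError(f"could not add arc: {exc}") from exc

    def arcs_from(self, source):
        """Return a copy of the (symbol, target) pairs leaving a state."""
        return list(self._impl._arcs.get(source, []))
```

Keeping the translation in one wrapper layer means callers only ever need to catch a single exception type, whichever C++ routine failed underneath.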
||
− | |||
− | == Coding Challenge == |
||
− | |||
− | 1.) Make the Transducer model importable from Python |
||
− | |||
− | |||
− | |||
− | == About me: Education and Experience == |
||
− | |||
− | I am a sophomore at IIIT-Hyderabad, India, pursuing a dual degree in Computer Science and Computational Linguistics. I’ve worked closely with C++ and Python in a lot of projects, and I take a keen interest in machine learning as well. I love building fun applications. The details of my work experience can be found [[here]]. |
||
− | |||
− | </span> |
||
− | |||
− | [[Category:GSoC_2018_student_proposals]] |