https://wiki.apertium.org/w/api.php?action=feedcontributions&user=Vaydheesh&feedformat=atomApertium - User contributions [en]2024-03-29T01:58:39ZUser contributionsMediaWiki 1.34.1https://wiki.apertium.org/w/index.php?title=User:Vaydheesh/GSoC2019Report&diff=70405User:Vaydheesh/GSoC2019Report2019-08-26T16:43:34Z<p>Vaydheesh: Created page with "== Python API/Library for Apertium == For this [https://summerofcode.withgoogle.com/projects/#5948355719462912 project], I coded swig wrappers to be used in the [https://githu..."</p>
<hr />
<div>== Python API/Library for Apertium ==<br />
For this [https://summerofcode.withgoogle.com/projects/#5948355719462912 project], I wrote SWIG wrappers for use in [https://github.com/apertium/apertium-python/ Apertium Python].<br />
<br />
==== Mentors ====<br />
[http://wiki.apertium.org/wiki/User:Sushain Sushain Cherivirala]<br />
<br />
Shout-out to the various Apertium members who helped throughout the project: [http://wiki.apertium.org/wiki/User:Tino_Didriksen Tino Didriksen], [http://wiki.apertium.org/wiki/User:Unhammer Kevin Brubeck Unhammer] and [http://wiki.apertium.org/wiki/User:Francis_Tyers Francis Tyers].<br />
<br />
=== Documentation ===<br />
<br />
Markdown documentation for Apertium Python is available in the [https://github.com/apertium/apertium-python/blob/master/README.md README] and on [https://apertium-python.readthedocs.io/en/latest/ Read the Docs].<br />
<br />
<br />
== Work Done during GSoC 2019 ==<br />
1. SWIG wrappers for the binaries called by Apertium mode files (lt-proc, lrx-proc, apertium-transfer, apertium-interchunk, apertium-postchunk, apertium-pretransfer, apertium-tagger and cg-proc), and their integration into apertium-python. When the wrappers are not installed, apertium-python falls back to subprocess calls. Initialized wrapper objects are cached to speed up successive calls.<br />
<br />
2. An installer for Ubuntu and Windows, run with ''python setup.py install''. It installs ''apertium-all-dev'' and the apertium-eng and apertium-en-es language packages, and on Ubuntu it also installs the wrapper binaries.<br />
<br />
3. Added support for the tagger.<br />
<br />
4. Updated the documentation to reflect the changes in Apertium Python.<br />
<br />
==== Code for the module ====<br />
<br />
Main Repository: https://github.com/apertium/apertium-python<br />
<br />
==== Changes made to Apertium Code ====<br />
<br />
https://apertium.projectjj.com/gsoc2019/Vaydheesh/Vaydheesh.html<br />
<br />
1. Apertium Python<br />
- https://github.com/apertium/apertium-python/pull/42<br />
- https://github.com/apertium/apertium-python/pull/46<br />
- https://github.com/apertium/apertium-python/pull/53<br />
- https://github.com/apertium/apertium-python/pull/56<br />
- https://github.com/apertium/apertium-python/pull/57<br />
- https://github.com/apertium/apertium-python/pull/58<br />
- https://github.com/apertium/apertium-python/pull/59<br />
- https://github.com/apertium/apertium-python/pull/62<br />
- https://github.com/apertium/apertium-python/pull/63<br />
<br />
2. lttoolbox<br />
- https://github.com/apertium/lttoolbox/pull/53<br />
- https://github.com/apertium/lttoolbox/pull/58<br />
- https://github.com/apertium/lttoolbox/pull/64<br />
- https://github.com/apertium/lttoolbox/pull/67<br />
- https://github.com/apertium/lttoolbox/pull/69<br />
<br />
3. Apertium core tools<br />
- https://github.com/apertium/apertium/pull/51<br />
- https://github.com/apertium/apertium/pull/52<br />
- https://github.com/apertium/apertium/pull/54<br />
- https://github.com/apertium/apertium/pull/56<br />
<br />
4. Apertium lex tools<br />
- https://github.com/apertium/apertium-lex-tools/pull/22<br />
- https://github.com/apertium/apertium-lex-tools/pull/26<br />
- https://github.com/apertium/apertium-lex-tools/pull/28<br />
<br />
5. Constraint Grammar 3<br />
- SWIG wrapper for cg-proc: https://github.com/TinoDidriksen/cg3/pull/37<br />
<br />
<br />
==== Experience ====<br />
Overall, it was a wonderful and satisfying journey. I had a great learning experience and a great time coding.<br />
Debugging the wrappers had its own set of challenges, and I was stuck on it for some time during the GSoC period. Without the help of my mentor and other members of Apertium, I don't think I could have fixed those bugs.<br />
Fortunately, all of these issues were fixed and I was able to complete the wrapping work on time.<br />
<br />
<br />
==== Open Pull Request ====<br />
<br />
1. https://github.com/apertium/apertium-python/pull/64<br />
<br />
<br />
== TODO ==<br />
<br />
1. Install the wrappers on Windows during ''python setup.py install''.<br />
<br />
2. Optimise the wrapper caching process<br />
<br />
3. Other issues mentioned in [https://github.com/apertium/apertium-python/issues Apertium Python Issues]<br />
<br />
<br />
== Endnote ==<br />
Thanks a lot to all the members of Apertium. I was very fortunate to get the opportunity to work with this wonderful organisation. My mentor Sushain Cherivirala and the other Apertium members were very helpful, and this project would not have been possible without their constant help and guidance. I would really like to thank all of them.</div>Vaydheeshhttps://wiki.apertium.org/w/index.php?title=User:Vaydheesh&diff=70376User:Vaydheesh2019-08-26T08:44:41Z<p>Vaydheesh: </p>
<hr />
<div>[[IRC]] nick: vaydheesh8<br />
<br />
[https://www.linkedin.com/in/singh-lokendra/ LinkedIn] Lokendra Singh</div>Vaydheeshhttps://wiki.apertium.org/w/index.php?title=User:Vaydheesh/Proposal&diff=69630User:Vaydheesh/Proposal2019-04-09T13:17:00Z<p>Vaydheesh: </p>
<hr />
<div>== <center>GSoC Proposal : Python API/library for Apertium </center> ==<br />
<br />
== Basic Details ==<br />
<br />
<br />
{| class="wikitable" style="width:100%"<br />
| Name<br />
| Lokendra Singh<br />
|-<br />
| Email Address<br />
| lokendras1998@gmail.com<br />
|-<br />
| IRC Nick<br />
| loke98<br />
|-<br />
| Country & TimeZone<br />
| India (UTC + 5:30)<br />
|-<br />
| Link to GitHub<br />
| https://github.com/vaydheesh<br />
|}<br />
<br />
<br />
<br />
== Why am I interested in Machine Translation? ==<br />
<br />
'''The broader perspective:'''<br />
<br />
I come from India, a diverse country where "every two miles the water changes, every four miles the speech". I have encountered many dialects of Hindi, such as Shauraseni, Hindustani, Braj Bhasha, Haryanvi, Bundeli, Kannauji, Awadhi, Bagheli, Chhattisgarhi and Bombay Hindi. Because there is so much variation within a single language, linguistics has always fascinated me. Combined with my passion for Python and my desire to contribute to the open-source community, this made Apertium my choice for GSoC 2019.<br />
<br />
<br />
<br />
== Why is it that I am interested in Apertium? ==<br />
<br />
While working on machine learning projects, I came across Natural Language Processing, which opened up the world of computational linguistics for me. While browsing the list of organisations, Apertium caught my eye: it offers a nice combination of coding challenges and linguistics. I have been using free software for the past few years, and now I want to start contributing to the community. Apertium seems to be the right choice for me.<br />
<br />
<br />
<br />
== Which of the Ideas List am I interested in? ==<br />
<br />
Initially, I was torn between the Unsupervised Learning and Python API ideas, but I have decided on the '''Python API/library for Apertium'''.<br />
<br />
<br />
<br />
== Why should Google and Apertium sponsor the project of Python API for Apertium? ==<br />
<br />
Apertium is written in C++, which offers very high performance and a high level of abstraction and is well standardized. However, C++ has a few shortcomings: it is not very beginner-friendly, and writing user interfaces in it is cumbersome. Python, on the other hand, is an interpreted, high-level language with a rich set of features. A Python wrapper generated with SWIG, combined with Jupyter Notebooks, can make Apertium more flexible and easier to install, debug and test.<br />
<br />
<br />
<br />
== How and who will benefit from this project? ==<br />
<br />
The project would make life easier for many developers. Python is a high-level language whose features make it easy for developers to pick up. Many people like to work in Jupyter Notebooks, and a Python module would grow Apertium's user community. The installation process of Apertium can also be simplified by making it available on PyPI, which would open the Apertium library to the large user base on Microsoft Windows™. Hence I believe that a Python API for Apertium would help a large community of developers, linguists, computational linguists and everyone keen on using the wide range of linguistic tools that Apertium provides.<br />
<br />
<br />
== Coding Challenge ==<br />
I've worked on '''Coding challenge 1''': a working installation of Apertium via a setup.py file in a Windows environment.<br />
The coding challenge was really interesting to work on. Though it seemed easy, it had its own set of hidden challenges. I had to get familiar with the Apertium bash helper script and the underlying binaries it uses. I had to add the Apertium binaries to the process's PATH without permanently polluting the user's environment variables. Some tweaks to the existing code base were also required so that the Apertium Python module works out of the box without creating any issues for its users.<br />
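Below is a minimal sketch of the PATH technique described above, assuming a hypothetical install directory. It is not the code from the pull request. Changes to `os.environ` only affect the current Python process and the child processes it starts, so the user's persistent environment variables are left untouched.

```python
import os


def add_to_process_path(bin_dir: str) -> None:
    """Prepend bin_dir to PATH for this process and its children only.

    os.environ belongs to the running process, so the change is never
    written back to the user's persistent environment settings.
    """
    current = os.environ.get("PATH", "")
    entries = current.split(os.pathsep) if current else []
    if bin_dir not in entries:  # avoid adding the same directory twice
        os.environ["PATH"] = os.pathsep.join([bin_dir] + entries)
```

For example, `add_to_process_path(r"C:\apertium-all-dev\bin")` (a hypothetical location) would let later `subprocess` calls find the Apertium binaries by name for the lifetime of the Python process.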
<br />
While working on this coding challenge, I became familiar with the Apertium code base. To create this setup.py file, I had to understand the entire Apertium Python project. This ensured that all the minor tweaks were compatible with the existing code and didn't result in unexpected errors.<br />
<br />
As of now, all checks pass and the [https://github.com/apertium/apertium-python/pull/38 pull request] is waiting to be merged by an organisation member.<br />
<br />
<br />
== Detailed project plan and workflow ==<br />
<br />
'''1. Tools To Be Used'''<br />
As suggested in the Ideas List, I plan to use SWIG. SWIG (Simplified Wrapper and Interface Generator) is an open-source tool that connects programs and libraries written in C or C++ with scripting languages, in this case Python. The current implementation calls the Apertium binaries as subprocesses, which adds overhead and slows down the translation process. SWIG can generate wrappers around the C++ code, producing modules that can be imported in Python. This should give us the speed of C++ with the ease of use of Python.<br />
Flowchart describing the process of generating the Python wrapper:<br />
[[File:SWIG.png]]<br />
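The sketch below illustrates why per-call subprocesses add overhead. It compares an in-process function with the same transformation run in a child process. For portability, it assumes that a fresh Python interpreter stands in for an Apertium binary such as lt-proc, and that upper-casing stands in for real processing. Absolute timings will differ on any given machine. The point is that every subprocess call pays the cost of starting a new process, which an in-process SWIG-wrapped call avoids.

```python
import subprocess
import sys
import time

# Stand-in for an external binary such as lt-proc: a fresh interpreter that
# reads stdin and writes the transformed text to stdout.
CHILD = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"]


def upper_in_process(text: str) -> str:
    """Stand-in for a wrapped C++ call: runs inside the current process."""
    return text.upper()


def upper_via_subprocess(text: str) -> str:
    """Same work, but a new process is started for every call."""
    return subprocess.run(CHILD, input=text, capture_output=True,
                          text=True, check=True).stdout


def average_seconds(func, text: str, repeats: int = 5) -> float:
    """Average wall-clock time per call of func(text)."""
    start = time.perf_counter()
    for _ in range(repeats):
        func(text)
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    sample = "the cat sat on the mat\n"
    assert upper_in_process(sample) == upper_via_subprocess(sample)
    print(f"in-process: {average_seconds(upper_in_process, sample):.6f} s/call")
    print(f"subprocess: {average_seconds(upper_via_subprocess, sample):.6f} s/call")
```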
<br />
'''2. Timeline'''<br />
<br />
Goals for the various phases:<br />
<br />
{| class="wikitable" style="width:100%"<br />
! PHASE<br />
! OBJECTIVE OF PHASE<br />
|-<br />
| Community Bonding Period<br />
| <br />
* Understand lttoolbox and the other dependencies of Apertium Python<br />
|-<br />
| Phase 1<br />
|<br />
* Create SWIG Interface files and shared libraries from them for Morphological Analysis and Generation<br />
|-<br />
| Phase 2<br />
| <br />
* Create SWIG interface files, and shared libraries from them, for performing translations, along with a setup.py for various Linux distributions<br />
|-<br />
| Phase 3<br />
| <br />
* Publish the package on PyPI, with Jupyter Notebooks, and Documentation on Apertium Wiki<br />
|}<br />
<br />
<br />
'''3. Biweekly Goals'''<br />
<br />
<br />
{| class="wikitable" <br />
! WEEK AND DATE<br />
! TASK EXPLANATION<br />
|-<br />
| Community Bonding Period<br />
| <br />
* The current code incurs the overhead of calling the lttoolbox binaries as separate processes. This can be made faster by importing the C++ code as libraries in the Python module. I plan to work with the lttoolbox maintainers to understand how lttoolbox works, which should smooth the process of writing the SWIG interface files.<br />
* Understand the various data types and arguments used in the code.<br />
* Read the SWIG documentation to handle the various cases that arise when writing interface files.<br />
|-<br />
| Week 1&2, 27 May to 9 June<br />
|<br />
* Analysis, Generation and Translation are working well with the current implementation. I plan to implement each of these features individually.<br />
* Write the interface files for the C++ files required for analysis.<br />
* Generate shared libraries from the interface files.<br />
* Generate the Python module ''apertium.analysis'' from the interface files.<br />
* Write unit tests for the generated Python module.<br />
|-<br />
| Week 3&4, 10 June to 23 June<br />
|<br />
* Repeat the above steps for the Morphological Generation, ''apertium.generation''<br />
|-<br />
| Week 5&6, 24 June to 7 July<br />
| <br />
* Having become familiar with the codebase, implement the same for translation, ''apertium.translation''.<br />
* Make a super wrapper for the analyzer, generator and translator, ''apertium.__init__''.<br />
|-<br />
| Week 7&8, 8 July to 21 July<br />
|<br />
* Write a Python script that generates the shared libraries on every platform.<br />
* I plan to use g++ on GNU/Linux and MinGW on Windows to generate shared libraries and DLL files respectively.<br />
* Modify setup.py to make it compatible with various distributions (Debian and Red Hat).<br />
* ''./setup.py install'' will install the Apertium package appropriate for the user's distribution.<br />
|-<br />
| Week 9&10, 22 July to 4 August<br />
|<br />
* Prepare Jupyter Notebooks for users.<br />
* Publish the code on PyPI, to make it pip installable.<br />
|-<br />
| Week 11&12, 5 August to 18 August<br />
|<br />
* Create documentation and tutorials for the codebase on the Apertium wiki.<br />
* Add usage instructions in Markdown and include them in README.md.<br />
* Collect feedback from alpha testing and make the necessary changes.<br />
* Fix errors reported by users.<br />
|}<br />
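The Week 7&8 plan of a Python script that builds the shared libraries with g++ (or MinGW's g++ on Windows) could start from something like the sketch below. It is a hypothetical helper, not part of apertium-python. It only assembles the compiler command for a SWIG-generated C++ wrapper file and assumes that `swig -python -c++` has already produced the `*_wrap.cxx` file. The file, module and library names are placeholders. A real Windows build may need further linker flags, for example for the Python import library.

```python
import sys
import sysconfig
from typing import List


def build_command(wrapper_source: str, module_name: str, libraries: List[str]) -> List[str]:
    """Assemble a g++ command that compiles a SWIG-generated wrapper into an
    importable Python extension module for the current platform."""
    include_dir = sysconfig.get_paths()["include"]  # directory containing Python.h
    # Platform-specific extension suffix, e.g. ".cpython-311-x86_64-linux-gnu.so"
    # on Linux or ".cp311-win_amd64.pyd" on Windows.
    suffix = sysconfig.get_config_var("EXT_SUFFIX") or ".so"
    # SWIG's Python proxy module imports the compiled part as _<module_name>.
    output = f"_{module_name}{suffix}"
    command = ["g++", "-shared", "-O2", f"-I{include_dir}"]
    if not sys.platform.startswith("win"):
        command.append("-fPIC")  # position-independent code for Linux shared objects
    command += [wrapper_source, "-o", output]
    command += [f"-l{name}" for name in libraries]  # link the wrapped C++ library
    return command


if __name__ == "__main__":
    # Hypothetical names: lttoolbox_wrap.cxx generated by SWIG, linked against lttoolbox.
    print(" ".join(build_command("lttoolbox_wrap.cxx", "lttoolbox", ["lttoolbox"])))
```

The script could pass the returned list to `subprocess.run(..., check=True)`. Keeping command construction separate from execution makes the per-platform logic easy to test without a compiler installed.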
<br />
'''4. Monthly Deliverables'''<br />
<br />
{| class="wikitable" style="width:100%"<br />
! Deliverable <br />
! EXPLANATION<br />
|-<br />
| Deliverable 1<br />
| <br />
* Pythonic wrapper for both Morphological Analyzer and Morphological Generator<br />
|-<br />
| Deliverable 2<br />
|<br />
* Pythonic script to automate the build process<br />
* Pythonic wrapper for performing translation, and a super wrapper for the analyzer, generator and translator.<br />
|-<br />
| Deliverable 3<br />
|<br />
* Cross platform setup.py<br />
* Pip installable apertium-python<br />
* Documentation and tutorials on the Apertium wiki, plus usage notes in Markdown, with examples in the form of Jupyter Notebooks<br />
|}<br />
<br />
<br />
== Examinations ==<br />
<br />
My theory exams should be over by the 4th week of May (25 May 2019). My practical exams will be held in the following two weeks, i.e. 27 May 2019 to 8 June 2019, which might reduce my efficiency in the first two weeks of the internship. I therefore plan to start the initial work during the community bonding period, before the coding period begins (27 May 2019). This should give me the head start required to submit the project deliverables on time. I expect the Morphological Analyzer to take a fair share of time, since it is the first component to be implemented. To stick to my timeline, I plan to work overtime, which should let me absorb any unexpected delays caused by my examinations.<br />
<br />
== About me: Education and Experience ==<br />
<br />
I am a final-year student at Maharaja Agrasen Institute of Technology, Delhi, India, pursuing a B.Tech in Mechanical and Automation Engineering. I&rsquo;ve worked with C++ (competitive programming) and Python (machine learning and web scraping), and I have been using Arch Linux as my primary operating system for the past 4 years. With this experience, I am confident that I can build a decent cross-platform Pythonic API.<br />
<br />
<br />
<br />
== Non-Summer Of Code Plans ==<br />
<br />
My college vacations fall during the months of Google Summer of Code, so I will be able to devote around 40 hours every week. I have no vacation plans.<br />
<br />
<br />
<br />
== Post GSoC Plans ==<br />
1. Create SWIG wrappers for the remaining lttoolbox files.<br />
<br />
2. Convert the remaining codebase into Python modules.<br />
<br />
3. Work on the remaining portion and implement it in</div>Vaydheeshhttps://wiki.apertium.org/w/index.php?title=File:SWIG.png&diff=69629File:SWIG.png2019-04-09T13:13:23Z<p>Vaydheesh: Vaydheesh uploaded a new version of &quot;File:SWIG.png&quot;</p>
<hr />
<div>Flowchart describing the process of generating python wrapper for C++ files</div>Vaydheeshhttps://wiki.apertium.org/w/index.php?title=User:Vaydheesh/Proposal&diff=69628User:Vaydheesh/Proposal2019-04-09T13:11:53Z<p>Vaydheesh: /* Detailed project plan and workflow */</p>
<hr />
<div></div>Vaydheeshhttps://wiki.apertium.org/w/index.php?title=File:SWIG.png&diff=69627File:SWIG.png2019-04-09T13:10:13Z<p>Vaydheesh: Vaydheesh uploaded a new version of &quot;File:SWIG.png&quot;</p>
<hr />
<div>Flowchart describing the process of generating python wrapper for C++ files</div>Vaydheeshhttps://wiki.apertium.org/w/index.php?title=File:SWIG.png&diff=69626File:SWIG.png2019-04-09T13:04:40Z<p>Vaydheesh: Flowchart describing the process of generating python wrapper for C++ files</p>
<hr />
<div>Flowchart describing the process of generating python wrapper for C++ files</div>Vaydheeshhttps://wiki.apertium.org/w/index.php?title=User:Vaydheesh/Proposal&diff=69618User:Vaydheesh/Proposal2019-04-09T12:34:00Z<p>Vaydheesh: Created page with "== <center>GSoC Proposal : Python API/library for Apertium </center> == == Basic Details == {| class="wikitable" style="width:100%" | Name | Lokendra Singh |- | Email Addr..."</p>
<hr />
<div>== <center>GSoC Proposal : Python API/library for Apertium </center> ==<br />
<br />
== Basic Details ==<br />
<br />
<br />
{| class="wikitable" style="width:100%"<br />
| Name<br />
| Lokendra Singh<br />
|-<br />
| Email Address<br />
| lokendras1998@gmail.com<br />
|-<br />
| IRC Nick<br />
| loke98<br />
|-<br />
| Country & TimeZone<br />
| India (UTC + 5:30)<br />
|-<br />
| Link to Gihub<br />
| https://github.com/vaydheesh<br />
|}<br />
<br />
<br />
<br />
== Why am I interested in Machine Translation? ==<br />
<br />
'''The broader perspective:'''<br />
<br />
I belong to a diverse country, India, where "Every two miles the water changes, every four miles the speech". Having encountered many dilects of Hindi language such as Shauraseni, Hindustani, Braj Bhasha, Haryanvi, Bundeli, Kannauji, Awadhi, Bagheli, Chhattisgarhi, Bombay Hindi. Due to so much of variation in a language, linguistics has always fascinated me. Upon combining this with my passion of python and desire for contributing to open source community, Apertium is my choice for GSoC 2019.<br />
<br />
<br />
<br />
== Why is it that I am interested in Apertium? ==<br />
<br />
During my projects on Machine Learning, I came across Natuaral Language Processing, which opened the world of Computer Linguistics for me. While browsing the list of organisations, Apertium Machine Translation caught my eye. It has a nice combination of coding challenges and linguistics. I have been using FREE softwares for past few years and now I want to start contributing to community. And Apertium seems to be the right choice to me. <br />
<br />
<br />
<br />
== Which of the Ideas List am I interested in? ==<br />
<br />
Initially, I was confused between Unsupervised Learning and Python API, but I have decided upon the '''Python API/library for Apertium'''.<br />
<br />
<br />
<br />
== Why should Google and Apertium sponsor the project of Python API for Apertium? ==<br />
<br />
Apertium is written in C++ which has very high performance, with high level of abstraction and is well standardized, however, it has few shortcomings. It is not so much beginner friendly and writing User-Interfaces in C++ is cumbersome. Python on the other hand, has a lot of features. Python has interpreted high-level programming environment. A python wrapper in SWIG combined with Jupyter Notebooks can provide flexibility, ease of installation, debugging, testing. <br />
<br />
<br />
<br />
== How and who will benefit from this project? ==<br />
<br />
The project would bring a lot of developers at ease. Python is a high-level language with a lot of features that make it easier to grasp for developers. A lot of people like to use Python Jupyter Notebooks , and a Python module would increase the user community. Also the installation process of Apertium can be simplified by making it available on PyPI. This would also open the Apertium Library to a large user base on Microsoft Windows™. Hence I believe that if Apertium has a Python API, it would be helpful to a large community of developers, linguists, computational linguistics and all people keen on using the wide range of linguistic tools that we provide.<br />
<br />
<br />
== Coding Challenge ==<br />
I've worked on '''Coding Challenge 1''': a working installation of Apertium via a setup.py file in a Windows environment.<br />
The coding challenge was really interesting to work on. Though it seemed easy, it had its own set of hidden challenges. I had to get familiar with the Apertium Bash helper script and the underlying binaries it uses. I had to add the Apertium binaries to the process's PATH without permanently polluting the user's environment variables. Some tweaks to the existing code base were also required so that the apertium-python module works out of the box, without creating any issues for its users.<br />
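A minimal sketch of the PATH idea described above, assuming a hypothetical install directory (`~/apertium/bin`); it illustrates the technique rather than reproducing the code in the pull request. Changing `os.environ` affects only the running Python process and the subprocesses it launches, so nothing persists in the user's environment once Python exits.<br />
```python
import os
import subprocess
import sys


def add_to_process_path(directory):
    """Prepend `directory` to PATH for this process (and its children) only.

    os.environ is this process's own copy of the environment, so the change
    disappears when Python exits; the user's persistent environment variables
    are never modified.
    """
    current = os.environ.get("PATH", "")
    entries = current.split(os.pathsep) if current else []
    if directory not in entries:
        os.environ["PATH"] = os.pathsep.join([directory] + entries)


# Hypothetical install location; the real one depends on the installer.
apertium_bin = os.path.join(os.path.expanduser("~"), "apertium", "bin")
add_to_process_path(apertium_bin)

# A child process (as lt-proc would be) inherits the modified PATH,
# so the binaries can then be found by name.
child_path = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['PATH'])"],
    capture_output=True, text=True, check=True,
).stdout.strip()
print(child_path.split(os.pathsep)[0] == apertium_bin)
```
The membership check keeps repeated calls (for example, on every import of the module) from growing PATH indefinitely.<br />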
<br />
While working on this coding challenge, I became familiar with the Apertium code base. To create the setup.py file, I had to understand the entire Apertium Python project, to ensure that all the minor tweaks were compatible with the existing code and did not result in unexpected errors.<br />
<br />
As of now, all the checks are passing, and the [https://github.com/apertium/apertium-python/pull/38 Pull Request] is waiting to be merged by an organisation member.<br />
<br />
<br />
== Detailed project plan and workflow ==<br />
<br />
'''1. Tools To Be Used'''<br />
As suggested in the Ideas List, I plan to use SWIG. The Simplified Wrapper and Interface Generator (SWIG) is an open-source tool that connects programs or libraries written in C or C++ with scripting languages, in this case Python. The current implementation calls the Apertium binaries as subprocesses, which carries its own overhead and slows down the translation process. SWIG can generate wrappers around the C++ code and produce modules that can be imported from Python. This should give us the speed of C++ together with the ease of use of Python.<br />
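The subprocess overhead mentioned above can be illustrated with a rough, hypothetical measurement. Here a freshly spawned Python interpreter stands in for an Apertium binary such as lt-proc, and a plain function stands in for a SWIG-wrapped call; this is not an Apertium benchmark, and absolute numbers depend on the machine. The point is only that every subprocess call pays a process start-up cost that an in-process call avoids.<br />
```python
import subprocess
import sys
import time


def in_process_call(text):
    # Stand-in for a SWIG-wrapped C++ function: the work happens in-process.
    return text.upper()


def subprocess_call(text):
    # Stand-in for invoking a binary: a new process is spawned for every
    # call, and the data travels through pipes.
    result = subprocess.run(
        [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"],
        input=text, capture_output=True, text=True, check=True,
    )
    return result.stdout


def time_calls(func, text, repeat=5):
    # Average wall-clock time per call over `repeat` calls.
    start = time.perf_counter()
    for _ in range(repeat):
        func(text)
    return (time.perf_counter() - start) / repeat


assert in_process_call("hello") == subprocess_call("hello")
per_call_in_process = time_calls(in_process_call, "hello")
per_call_subprocess = time_calls(subprocess_call, "hello")
print(f"in-process: {per_call_in_process:.6f}s, subprocess: {per_call_subprocess:.6f}s")
```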
<br />
'''2. Timeline'''<br />
<br />
Goals for the various phases:<br />
<br />
{| class="wikitable" style="width:100%"<br />
! PHASE<br />
! OBJECTIVE OF PHASE<br />
|-<br />
| Community Bonding Period<br />
| <br />
* Understand lttoolbox and the other dependencies of Apertium Python<br />
|-<br />
| Phase 1<br />
|<br />
* Create SWIG Interface files and shared libraries from them for Morphological Analysis and Generation<br />
|-<br />
| Phase 2<br />
| <br />
* Create SWIG interface files, and shared libraries from them, for performing translations, along with a setup.py for various Linux distributions<br />
|-<br />
| Phase 3<br />
| <br />
* Publish the package on PyPI, with Jupyter Notebooks, and Documentation on Apertium Wiki<br />
|}<br />
<br />
<br />
'''3. Biweekly Goals'''<br />
<br />
<br />
{| class="wikitable" <br />
! WEEK AND DATE<br />
! TASK EXPLANATION<br />
|-<br />
| Community Bonding Period<br />
| <br />
* The current code has the overhead of invoking the lttoolbox binaries as separate processes. The process can be made faster by importing the C++ code as libraries in the Python module, reducing the time spent in computation. I plan to work with the lttoolbox maintainers to understand how lttoolbox works, to smooth the process of writing the SWIG interface files.<br />
* Understand the various data types and arguments used in the code.<br />
* Read the SWIG documentation to handle the various cases that arise while writing interface files.<br />
|-<br />
| Week 1&2, 27 May to 9 June<br />
|<br />
* Analysis, generation and translation already work with the current implementation. I plan to port each of these features individually.<br />
* Write the interface files for the C++ files required for analysis.<br />
* Generate shared libraries from the interface files.<br />
* Generate the Python module ''apertium.analysis'' from the interface files.<br />
* Write unit tests for the generated Python module.<br />
|-<br />
| Week 3&4, 10 June to 23 June<br />
|<br />
* Repeat the above steps for morphological generation, ''apertium.generation''.<br />
|-<br />
| Week 5&6, 24 June to 7 July<br />
| <br />
* Having become familiar with the codebase, implement the same for translation, ''apertium.translation''.<br />
* Make a super wrapper for the analyzer, generator and translator, ''apertium.__init__''.<br />
|-<br />
| Week 7&8, 8 July to 21 July<br />
|<br />
* Write a cross-platform Python script to generate the shared libraries.<br />
* I plan to use g++ on GNU/Linux and MinGW on Windows to generate shared libraries and DLL files, respectively.<br />
* Modify setup.py to make it compatible with various distributions (Debian and Red Hat).<br />
* ''./setup.py install'' will install the Apertium package appropriate for the user's distribution.<br />
|-<br />
| Week 9&10, 22 July to 4 August<br />
|<br />
* Prepare Jupyter Notebooks for users.<br />
* Publish the code on PyPI, to make it pip-installable.<br />
|-<br />
| Week 11&12, 5 August to 18 August<br />
|<br />
* Create documentation and tutorials for the codebase on the Apertium wiki.<br />
* Add usage instructions in Markdown to README.md.<br />
* Collect feedback from alpha testing and make the necessary changes.<br />
* Fix errors reported by users.<br />
|}<br />
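The cross-platform build script planned for Week 7&8 could start roughly like the sketch below. It only assembles the commands: `swig -c++ -python` generates `<name>_wrap.cxx` plus a `<name>.py` proxy, and g++ (MinGW's g++ on Windows) compiles the wrapper into `_<name>`, which Python expects as `.so` on GNU/Linux and `.pyd` on Windows. The module name `lttoolbox` and the helper names are hypothetical, and the lttoolbox headers/libraries and the Windows link flags for the Python library are deliberately left out.<br />
```python
import sys
import sysconfig


def swig_command(interface_file):
    # SWIG reads the interface file and writes <name>_wrap.cxx plus a
    # <name>.py proxy module next to it.
    return ["swig", "-c++", "-python", interface_file]


def compile_command(module_name, target=sys.platform):
    """Return the g++ invocation that turns the SWIG wrapper into a module.

    Python imports the compiled part as `_<module_name>`: a `.so` shared
    library on GNU/Linux, or a `.pyd` (a DLL with another suffix) on Windows.
    """
    include_dir = sysconfig.get_paths()["include"]  # Python.h lives here
    source = f"{module_name}_wrap.cxx"
    if target.startswith("win"):
        # MinGW's g++; linking against the Python library is also needed
        # on Windows and is left out of this sketch.
        return ["g++", "-shared", "-o", f"_{module_name}.pyd", source, f"-I{include_dir}"]
    # GNU/Linux: shared objects need position-independent code.
    return ["g++", "-shared", "-fPIC", "-o", f"_{module_name}.so", source, f"-I{include_dir}"]


# Hypothetical module name; a real script would run these with subprocess.run.
for cmd in (swig_command("lttoolbox.i"), compile_command("lttoolbox")):
    print(" ".join(cmd))
```
Keeping the commands as plain lists makes the platform logic easy to unit-test without a compiler installed.<br />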
<br />
'''4. Monthly Deliverables'''<br />
<br />
{| class="wikitable" style="width:100%"<br />
! DELIVERABLE<br />
! EXPLANATION<br />
|-<br />
| Deliverable 1<br />
| <br />
* Pythonic wrapper for both the morphological analyzer and the morphological generator<br />
|-<br />
| Deliverable 2<br />
|<br />
* Python script to automate the build process<br />
* Pythonic wrapper for performing translation, and a super wrapper for the analyzer, generator and translator<br />
|-<br />
| Deliverable 3<br />
|<br />
* Cross-platform setup.py<br />
* pip-installable apertium-python<br />
* Documentation and tutorials on the Apertium wiki, usage instructions in Markdown, and examples in the form of Jupyter Notebooks<br />
|}<br />
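One way the planned super wrapper could tie the analyzer, generator and translator together is sketched below. Everything here is hypothetical: the class and method names are illustrative, and stub factories stand in for the SWIG-wrapped components. The design choice shown is to build each backend on first use and cache it per language (or language pair), so repeated calls do not reinitialize it.<br />
```python
class Apertium:
    """Hypothetical super wrapper: one entry point for analysis, generation
    and translation. Each factory builds a backend (in the real module this
    would be a SWIG-wrapped object); backends are cached per task and
    language(s), so they are created once rather than on every call."""

    def __init__(self, make_analyzer, make_generator, make_translator):
        self._factories = {
            "analyze": make_analyzer,
            "generate": make_generator,
            "translate": make_translator,
        }
        self._cache = {}

    def _backend(self, task, *key):
        # Build the backend for (task, language[s]) on first use, then reuse it.
        if (task, key) not in self._cache:
            self._cache[(task, key)] = self._factories[task](*key)
        return self._cache[(task, key)]

    def analyze(self, lang, text):
        return self._backend("analyze", lang)(text)

    def generate(self, lang, text):
        return self._backend("generate", lang)(text)

    def translate(self, src, dst, text):
        return self._backend("translate", src, dst)(text)


# Stub factory standing in for the SWIG-wrapped components; it records
# every backend it creates so the caching is visible.
created = []


def make_stub(*langs):
    created.append(langs)
    return lambda text: f"{'-'.join(langs)}:{text}"


api = Apertium(make_stub, make_stub, make_stub)
print(api.analyze("en", "cats"))
print(api.analyze("en", "dogs"))  # reuses the cached "en" analyzer
print(api.translate("en", "es", "cats"))
print(created)
```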
<br />
<br />
== Examinations ==<br />
My theory exams should be over by the 4th week of May (25th May, 2019). My practical exams will be held in the following two weeks, i.e. from 27th May, 2019 to 8th June, 2019. This might reduce my efficiency in the first two weeks of the internship. Hence I plan to start the initial work during the community bonding period, before the coding period begins (27th May, 2019). This should give me the head start required to submit the project deliverables on time. I expect the morphological analyzer, being the first component to be implemented, to take its share of time. To stick to my timeline, I plan to work overtime, which will let me absorb any unexpected delays caused by my examinations.<br />
<br />
<br />
== About me: Education and Experience ==<br />
<br />
I am a final-year student at Maharaja Agrasen Institute of Technology, Delhi, India, pursuing a B.Tech in Mechanical and Automation Engineering. I've worked with C++ (competitive programming) and Python (machine learning and web scraping), and I have been using Arch Linux as my primary operating system for the past 4 years. With this experience, I am confident that I will be able to build a solid cross-platform Pythonic API.<br />
<br />
<br />
<br />
== Non-Summer Of Code Plans ==<br />
<br />
My college vacation falls during the Google Summer of Code period, so I will be able to devote around 40 hours every week. I have no other plans for the vacation.</div>Vaydheeshhttps://wiki.apertium.org/w/index.php?title=User:Vaydheesh&diff=69609User:Vaydheesh2019-04-09T10:13:01Z<p>Vaydheesh: </p>
<hr />
<div>[[IRC]] nick: loke98<br />
<br />
[https://www.linkedin.com/in/singh-lokendra/ LinkedIn] Lokendra Singh</div>Vaydheeshhttps://wiki.apertium.org/w/index.php?title=User:Vaydheesh&diff=69442User:Vaydheesh2019-04-07T14:44:55Z<p>Vaydheesh: Created page with "IRC nick: loke98 Lokendra Singh"</p>
<hr />
<div>[[IRC]] nick: loke98<br />
<br />
Lokendra Singh</div>Vaydheesh