Difference between revisions of "User:Niks/Application"
Revision as of 18:53, 21 March 2014
Contents
- 1 Contact Information
- 2 Why is it you are interested in machine translation?
- 3 Why is it that you are interested in the Apertium project?
- 4 Which of the published tasks are you interested in? What do you plan to do?
- 5 Proposal
- 6 Workplan
- 7 Skills and qualification
- 8 Non-Summer-of-Code plans for the summer
- 9 References
Contact Information
- Name: Binay Neekhra
- E-mail: neekhra.binay@gmail.com
- Time Zone: UTC+05:30
Other contact information can be provided to the mentor.
Why is it you are interested in machine translation?
Languages are great tools for sharing ideas. Much of the credit for human progress goes to the evolution of language and the way ideas are shared. Natural languages are very diverse. People are often excluded from a vast ocean of knowledge when it is not available in the language they use, and for the same reason sharing ideas with a broader audience becomes difficult. Machine translation can help lower the language barrier more economically and in less time than human translation. Machine translation is also considered an AI-complete problem. I find it exciting to contribute to pushing the boundaries of MT systems and narrowing the gap between AI completeness and existing systems. This motivated me to pursue research in machine translation.
Why is it that you are interested in the Apertium project?
I am interested in languages, and my current work is in machine translation. I also want to learn about open-source development and how it works. Apertium is a good platform for me to learn and contribute, and I find the Apertium community very open and supportive of learning. I also believe that the MT landscape should be more fragmented and that large corporations should not control the entire MT pool; Apertium addresses this very issue. It will give me the opportunity to work on a real-world open-source project in the area of MT and to learn from some great people around the globe.
Which of the published tasks are you interested in? What do you plan to do?
I am interested in developing the Apertium assimilation evaluation toolkit.
Proposal
- Title: Apertium assimilation evaluation toolkit
Why Google and Apertium should sponsor it
Online MT systems are used mainly for assimilation purposes. A good example is Google Translate, which provides a billion translations a day for around 200 million users. However, it is difficult to measure how much the user has understood and how close that understanding is to the original meaning. The evaluation of MT systems for assimilation purposes needs more attention. Using cloze tests on the reference translation is a new approach, and it is safe to say that the score correlates with the usefulness of an MT system. Hence it can also be used for comparing MT systems. Unlike other evaluation procedures, this method measures the usefulness of MT systems without much cost or effort. I am interested in implementing it with more features, which should be useful for better analysis of MT.
How and who it will benefit in society
The assimilation evaluation toolkit will help people working in MT to measure the usefulness of MT systems from the end-user's perspective without much cost or effort. The primary objective of language is to communicate ideas. If the user can get the gist of the original text from the machine translation system, we can say that the system is helpful. The toolkit can also be used to compare the performance of different MT systems with respect to assimilation evaluation, and the feedback can be used to improve those systems. The method can later be extended with gamification for more user engagement.
Workplan
I am interested in developing text-based and web-based interfaces for assimilation evaluation.
I want to implement the following features:
1. An option to select what percentage of words is masked.
2. A stopword list for every language, i.e. a list of words that are never chosen as holes, e.g. the word ‘and’, as well as dates and numbers.
3. Privileged users can add new MT systems for evaluation.
4. Admins can add new language pairs.
5. The system can either use a synonym list to look up similar words (acceptable answers) or use binary evaluation. For a proper name, figure or date, it may be difficult for the user to guess the correct output; these fields may be handled separately, in which case a plausible but wrong guess may be acceptable.
6. Comparing the performance of different MT systems.
7. Measuring the time taken by the user to fill the holes.
8. Offering different hints to the user.
9. Controlling the length of a user session.
10. Toolkit instructions available in all targeted languages; users can choose their preferred language.
11. In my opinion, the end-users and toolkit admins are among the most important resources, so usability should be given considerable weight. I am thinking of showing results in visual formats, e.g. bar charts, pie charts and graphs.
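Features 1 and 2 above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the coding challenge: the function name `make_cloze`, the `_____` placeholder, the punctuation set, and the choice to apply the percentage to eligible words (rather than to all words) are all hypothetical.

```python
import random
import re

PUNCT = ".,;:!?\"'()"  # assumed set of punctuation to ignore around a word


def make_cloze(reference, mask_percent, stopwords, seed=0):
    """Mask about mask_percent % of the eligible words in a reference translation.

    A word is eligible if it is not in the stopword list and contains no digit
    (so dates and numbers are never turned into holes).
    Returns the masked text and the list of hidden words, in sentence order.
    """
    tokens = reference.split()
    eligible = []
    for i, tok in enumerate(tokens):
        word = tok.strip(PUNCT)
        if not word or word.lower() in stopwords or re.search(r"\d", word):
            continue
        eligible.append(i)

    # Assumption: the percentage applies to eligible words only.
    n_holes = round(len(eligible) * mask_percent / 100)
    n_holes = max(0, min(len(eligible), n_holes))

    rng = random.Random(seed)  # fixed seed so a test case is reproducible
    holes = sorted(rng.sample(eligible, n_holes))

    answers = []
    masked = list(tokens)
    for i in holes:
        word = tokens[i].strip(PUNCT)
        answers.append(word)
        # Keep surrounding punctuation, replace only the word itself.
        masked[i] = tokens[i].replace(word, "_____", 1)
    return " ".join(masked), answers


if __name__ == "__main__":
    text, answers = make_cloze(
        "The cat and the dog arrived on 21 March 2014 at noon.",
        40,
        {"the", "and", "on", "at"},
    )
    print(text)
    print(answers)
```

Seeding the random generator is a design choice made here so that every user sees the same holes for a given test sentence, which keeps the scores of different users and different MT systems comparable.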
In their paper, H. Somers and E. Wild mention that "we feel confident that the exact-answer scoring method is adequate, and that allowing near synonyms and so on does not give a different result". I feel, however, that in the case of gisting, using 'acceptable answers' will be significant. This is also reflected in the results obtained in the “Peeking through....on apertium.org” paper. Implementing this feature will require finding the distance between words (whether two words are the same, closely related, or unrelated).
If time permits, I want to include some kind of gamification in the toolkit for better user engagement. The idea is inspired by Prof. Luis von Ahn of Carnegie Mellon University, who developed a game for word sense disambiguation described in the paper “Word sense disambiguation via Human Computation”.
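The difference between exact-answer scoring and 'acceptable answers' scoring discussed above can be illustrated with a small sketch. It assumes a hand-made synonym table (the `synonyms` dictionary below is hypothetical, not a real lexical resource) and treats any listed synonym as fully correct; a graded word distance would need a different scoring rule.

```python
def score_guesses(answers, guesses, synonyms=None):
    """Return the fraction of holes filled correctly.

    With synonyms=None this is binary exact-answer scoring.
    With a synonym table, a guess also counts if it is listed as an
    acceptable alternative for the hidden word.
    """
    synonyms = synonyms or {}
    if not answers:
        return 0.0
    correct = 0
    for answer, guess in zip(answers, guesses):
        answer_norm = answer.lower()
        guess_norm = guess.strip().lower()
        accepted = {answer_norm}
        accepted.update(s.lower() for s in synonyms.get(answer_norm, ()))
        if guess_norm in accepted:
            correct += 1
    return correct / len(answers)


if __name__ == "__main__":
    answers = ["house", "quick", "river"]
    guesses = ["home", "Quick", "lake"]
    # Hypothetical synonym table for illustration only.
    synonyms = {"house": {"home"}}
    print(score_guesses(answers, guesses))            # exact-answer scoring
    print(score_guesses(answers, guesses, synonyms))  # acceptable-answer scoring
```

Running both variants on the same set of guesses shows directly how much the acceptable-answer rule changes the score for a given MT system.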
Work I have done
- I have implemented a web-based application as part of my coding challenge. Link: [1]
The source code can be accessed with the administrator password “toolkit” under the ‘Site Building’ option. The code is also hosted on GitHub. Link: [2]
I have added the following features:
- Users can add new test cases.
- 'View Records' shows the status of the test sentences: which of them have been evaluated and for which hint level.
- The Results page contains an analysis of the evaluated test cases. For each case, it contains a table of the masked words and the corresponding user guesses, the hint level, the accuracy, and other details.
- Login and authorisation feature.
- Prior to developing the GUI, I built a basic Python program for the coding challenge. Link: [3]
- I have read the paper suggested by Mikel (Peeking through the language barrier: the development of open-source gisting system for Basque to English based on apertium.org). I have also read the paper by H. Somers and E. Wild, 'Evaluating Machine Translation: the Cloze Procedure Revisited'.
- I have gone through the Apertium documentation and, briefly, the specifications of its modules. I have installed Apertium and am running it for the Basque-English and Esperanto-English pairs.
week | dates | goals |
---|---|---|
community bonding period | 21 April - 19 May | |
1 | 19 - 24 May | |
2 | 25 - 31 May | |
3 | 1 - 7 June | |
4 | 8 - 14 June | |
Deliverable #1 | | Web-based and text-based interface for binary evaluation, in which authorized users can add new MT systems and site admins can add new language pairs and new MT systems. |
5 | 15 - 21 June | |
6 (midterm eval) | 23 - 29 June | |
7 | 30 June - 5 July | |
8 | 14 - 20 July | |
Deliverable #2 | | |
9 | 21 - 27 July | Project completed: released version of the web-based and text-based interfaces of the Apertium assimilation evaluation toolkit |
Skills and qualification
I am pursuing a Bachelor of Technology and a Master of Science (by research) in Computer Science and Engineering at the International Institute of Information Technology, Hyderabad, India. I am pursuing my M.S. at the Language Technology Research Centre, IIIT-H. My research interests are in the areas of machine translation and machine learning. In my free time, I like to study the theoretical aspects of computer science.
Programming skills: Python, C++, C, Java, Octave. I am well versed in HTML5, CSS, XML, and GUI development with Python and Java.
Frameworks: web2py, Django and Ruby on Rails.
I have taken the following courses related to this project: Programming, Data Structures, Algorithms, Software Engineering, Databases, Software Systems and Design, Artificial Intelligence, Computational Linguistics and Natural Language Processing.
My particular focus for this project is to develop the toolkit with high standards of usability for both end-users and site admins.
Operating systems: Ubuntu, Fedora and Windows 7.
Relevant work done:
- Research Assistant, May 2011 - July 2011, IIIT-H.
- Designed an interactive Virtual Memory Simulation Application using Java. GitHub link: [4]
- Designed a mini course portal and an online Music Library using web2py.
- Wrote a term paper on “Estimation of Complexity in Hindi Sentences”.
I am new to open-source development, although I have been using open-source software for more than 4 years. I want to be a part of the community, understand the basic workflow and contribute to it. I am excited about this project because of its application in understanding the end-user’s perspective and in evaluating and comparing MT systems. I am confident that I will accomplish the task well with hard work and honest effort.
Non-Summer-of-Code plans for the summer
I wish to attend Indian classical music concerts around 8-14 June. Summer of Code will be my main focus for the whole summer, and I am keeping 40-50 hours per week for this project. If I attend the concerts, I will put in extra hours to work on the project.