Difference between revisions of "Top tips for GSOC applications"

From Apertium

Revision as of 20:19, 25 March 2019

Writing your GSOC application

Here are the main tips to help you in writing your GSOC application with Apertium.

  • Be realistic
    • We're more likely to accept ideas which are realistic than ones which are "way out there". But if you have a "way out there" idea, don't panic! We're still interested, but we'll try to find a subset of it which is achievable in the time scale available.
  • Be appropriate
    • Demonstrate that you know Apertium, how it works, and the problem it has that you'd like to solve. The Apertium 2.0: Official documentation is considered essential reading.
  • Have a plan
    • Three months may seem like a long time, but it isn't. Show you have a definite plan with dates and deliverables; splitting it into weeks is probably best. Don't forget to leave time for getting familiar with the platform (ideally before, or during, the community bonding period) and for documentation. Anyone thinking of working on a language pair should read about testvoc and other quality controls, and factor those in. If you know of any breaks or absences beforehand, be upfront about them and plan around them.
  • Get in contact ASAP!
    • We get a lot of proposals: some good, most bad. Get in contact with your potential mentor as soon as possible by sending your proposal to the mailing list, and asking for feedback. Be responsive to feedback. Refine your application based on feedback. If the mentors remember you, your chances of being picked are higher.
  • Read the Ideas Page!
    • If you find yourself asking 'do you have any Java/Python/Fortran/x86 assembler projects...' -- you didn't read the ideas page. Read the ideas page.
  • Do the coding challenge
    • Every idea has a coding challenge. It is basically a test to see whether you have the skills required for the project, or can acquire them in a short amount of time.

Other tips

We're not saying that following the advice below will automatically get you a mentor, but going through it will give you a pretty good chance!

  • Join IRC: even if you're idling or don't say anything, you'll discover more about how Apertium works.
  • Subscribe to the apertium-stuff mailing list.
  • Create a GitHub username (if you don't already have one).
  • Ask for an account on the Wiki.

Then:

  • First install Apertium and a language pair; read through the new language pair HOWTO. This might even give you some more ideas!
    • You typically want the language data from GitHub and the core tools from our repo (the official Debian packages are out of date). See Installation.
  • When you think of Apertium, think Wikipedia (Be bold!) or think Nike (Just Do It!). Preferably, both.
  • Rule 1 here: Ask questions! Keep asking. The more you ask, the better.
  • Rule 2: No questions are stupid. We have all been new to Apertium once, and we have all needed to ask questions. Asking them is proof to us that you are serious.
  • Even better: Write your questions, and a summary of the answers you get, on this wiki. A good summary shows us that you have understood what we told you.
  • Browse the wiki again, especially Apertium New Language Pair HOWTO.
  • Update the wiki so the next reader won't encounter the same problems as you did.
  • Play with some language pairs.
  • In a language pair of your own choice, try to edit the files, break stuff, and then make it work again — and then tell us about it.
  • If you think you know the problem better than the mentor does, it could be that you have misunderstood it. Read more about Apertium before making assumptions based on your existing experience.
  • While your code is compiling, look through the GSoC student guide from FLOSS Manuals.
  • Ask for an account on this Wiki, that way we can work collaboratively on applications. You should ask on our mailing list or on the IRC channel you just joined and an admin will create it for you.

Frequently asked questions

Do I first have to do the coding challenge and only then I get selected?

The way it works is this: first you need to find a mentor, then you need to write a proposal, then you need to submit the proposal to the Google Mélange site. After this, we read, evaluate and rank the proposals. Then Google tells us how many slots we got, and we take the top n ranked proposals, where n is the number of slots we got.
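The final selection step described above can be pictured with a minimal Python sketch. The function name and the proposal titles are hypothetical and only illustrate "take the top n of a ranked list"; this is not the organisation's actual tooling.

```python
def select_proposals(ranked_proposals, slots):
    """Return the top `slots` proposals from a list already ordered best-first."""
    if slots < 0:
        raise ValueError("slots must be non-negative")
    # Slicing handles the case where there are fewer proposals than slots.
    return ranked_proposals[:slots]


# Hypothetical proposal titles, already ranked by the mentors (best first).
ranked = ["proposal A", "proposal B", "proposal C", "proposal D"]

# Suppose Google grants two slots (an assumed number, for illustration only).
accepted = select_proposals(ranked, slots=2)
print(accepted)
```

The point of the sketch is that ranking happens before the number of slots is known, so a proposal's position in the ranking is what matters.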

You don't have to do the coding challenge, but it will help you with (a) finding a mentor, and (b) writing your proposal. You are unlikely to be able to write a good proposal without knowing something about Apertium -- which the coding challenge will help you with. And by asking questions, hanging out on IRC, you will get to know the mentors, increasing the chances of finding one who is interested in your proposal.

Is it possible to see applications from previous years?

Some of the applications from previous years are available on our Wiki under the category Category:Student proposals for the Google Summer of Code.

How can I get an account on the Wiki?

There are a couple of options: either email our mailing list (apertium-stuff@lists.sourceforge.net) or come on IRC and ask one of the mentors to create an account for you. In both cases you will need an email address and a username.

Apertium GSOC 2019

Template

Contact Information

Name – Neerav Mathur

Location – Agra, Uttar Pradesh, India -282001

E-mail – nmathur54@gmail.com

Mobile no. - +919719009548

Github - https://github.com/ommathur54

Time Zone - UTC+5:30

Skills And Experience

My name is Neerav Mathur. I am a fourth-semester postgraduate student at K.M. Institute of Hindi and Linguistics, Dr. Bhimrao Ambedkar University, Agra, Uttar Pradesh, India. I am studying Linguistics, and during my previous semesters I learnt Python, Machine Translation and XML. As part of my previous semester's course project, I trained and tested MaltParser for the Magahi language. To do this, my classmate Mohit and I also developed a small treebank for the language. In the current semester we are expanding the treebank further, and we plan to implement the first full-fledged parser for the language. I am also working on the development of a machine translation system for the English-Magahi language pair. As you will notice, I am more generally interested in developing resources and technologies for under-resourced Indian languages. As part of this, I am interested in building a morphological analyzer for the Braj language (spoken in the Braj region: Agra, Mathura, Aligarh, Bharatpur, etc., in Uttar Pradesh and Rajasthan, India) using the Apertium platform. I am also working on a digital dictionary for the Sema language. All of this has increased my interest in NLP, machine learning and related areas.


University Courses:

Programming (Python, XML, HTML)

Computer Tools for Linguistics Research

Linguistics Courses (Phonetics, Morphology, Syntax, Semantics, Field work, Sign Language, Machine Translation)

Theories of Machine Translation and Machine Translation (practical)


Technical Skills:

Programming Language – Python.

Web Design – HTML.

Databases – MySQL.

Languages – Hindi (Native), Braj, English.


Why Am I Interested In Machine Translation?

I am studying Linguistics, and during my previous semesters I learnt Python, Machine Translation and XML. As part of my previous semester's course project, I trained and tested MaltParser for the Magahi language. To do this, my classmate and I also developed a small treebank for the language. In the current semester we are expanding the treebank further, and we plan to implement the first full-fledged parser for the language. I am also working on the development of a machine translation system for the English-Magahi language pair. As you will notice, I am more generally interested in developing resources and technologies for under-resourced Indian languages. During my fourth semester I attended a hands-on workshop on machine translation, where I learned about MT systems (Apertium) for under-resourced languages and how a morphological analyzer helps improve the performance of rule-based MT. That is where my interest in MT grew.

Why Am I Interested In Apertium?

This organization works on things that are very interesting to me as a linguist and computational linguist: (rule-based) machine translation, languages, NLP and so on. I became more interested in Apertium when I learned at the MT workshop that Apertium works with all kinds of languages, which is very important for supporting all languages. The Apertium community is also very friendly and helpful to new members: its members are always ready to help and reply to whatever queries we have. This encourages me to work with Apertium.

Task And Plan

Task - I am interested in building a morphological analyzer for the Braj language (spoken in the Braj region: Agra, Mathura, Aligarh, Bharatpur, etc., in Uttar Pradesh and Rajasthan, India) using the Apertium platform.


Reasons Why Apertium And Google Should Sponsor It

Braj is one of the most under-resourced Indian languages. It is spoken in the Braj region (Agra, Mathura, Aligarh, Bharatpur, etc.) in Uttar Pradesh and Rajasthan, India, by 1,556,314 native speakers (according to the 2011 census). No Braj translator is available online or offline, and there is no rule-based translator with a morphological analyzer. I therefore believe we can improve the quality of translation by applying a rule-based model (Apertium).


Description Of How And Whom It Will Benefit In Society

Firstly, morphological analyzers help improve the performance of machine translation in rule-based systems, especially for morphologically rich languages. I want to build this morphological analyzer as a step towards developing an English-Braj MT system. Secondly, no such MT work has been done for the Braj language, so my work will help reduce human effort and improve translation for Braj.

Work Plan

Week 1 - Preparing linguistic rules for the morphological analyzer.

Week 2 - Preparing linguistic rules for the morphological analyzer.

Week 3 - Tokenizing the data.

Week 4 - Preparing the tag set.

Deliverable #1

Submit the tokenized data and the prepared tag set.

Week 5 - Preparing the affix list and validating the prepared suffix list against the corpus.

Week 6 - Writing the program for the Braj morphological analyzer.

Week 7 - Writing the program for the Braj morphological analyzer.

Week 8 - Training and testing the model.


Deliverable #2

Submit the program and the trained and tested model.


Week 9 - Testing the model on words from different domains.

Week 10 - Fixing the errors found in the model.

Week 11 - Retraining and retesting the model.

Week 12 - Evaluating the results of the model.


Project completed: submission of the project.
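
The Week 3 and Week 5 steps in the plan above (tokenizing the data and checking a prepared suffix list against the corpus) could be sketched roughly as follows. This is a plain-Python illustration under stated assumptions, not Apertium tooling and not necessarily the method the proposal intends: the function names are hypothetical, and the sample text and suffixes are English stand-ins, not real Braj data.

```python
import string

# Punctuation to strip from token edges; the Devanagari danda and
# double danda are added because the target corpus would use them.
PUNCT = string.punctuation + "\u0964\u0965"


def tokenize(text):
    """Split on whitespace and strip punctuation from both ends of each token."""
    tokens = (chunk.strip(PUNCT) for chunk in text.split())
    return [tok for tok in tokens if tok]


def suffix_coverage(tokens, suffixes):
    """Count how many tokens end with each candidate suffix.

    A token only counts if it is longer than the suffix, so that a
    non-empty stem would remain after stripping it.
    """
    counts = {suffix: 0 for suffix in suffixes}
    for tok in tokens:
        for suffix in suffixes:
            if len(tok) > len(suffix) and tok.endswith(suffix):
                counts[suffix] += 1
    return counts


# English placeholder corpus and suffix list (assumptions for illustration).
sample_text = "walked, walking talks talked. jumping"
candidate_suffixes = ["ed", "ing", "s"]

sample_tokens = tokenize(sample_text)
print(sample_tokens)
print(suffix_coverage(sample_tokens, candidate_suffixes))
```

Whitespace splitting is used here instead of a `\w+` regular expression because Python's `\w` does not match Devanagari vowel signs, which would break words in the middle. Suffixes that are never attested in the corpus (a count of zero) are candidates for removal from the list, or for a second look at the data.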