Difference between revisions of "User:Aida/Application"
Line 94: | Line 94: | ||
# adding transfer rules |
# adding transfer rules |
||
# 500-word evaluation, WER ~30% |
# 500-word evaluation, WER ~30% |
||
# trimmed coverage 51% |
|||
| {{Workeval5|0}} |
| {{Workeval5|0}} |
||
| |
| |
||
Line 103: | Line 103: | ||
!style="text-align: right"| 23 - 29 June |
!style="text-align: right"| 23 - 29 June |
||
| |
| |
||
# total |
# total 2200 stems in dix |
||
# clean testvoc for {{tag|num}} {{tag|post}} |
# clean testvoc for {{tag|num}} {{tag|post}} |
||
# adding transfer rules |
|||
# trimmed coverage 53% |
|||
| {{Workeval5|0}} |
| {{Workeval5|0}} |
||
| |
| |
||
# - |
|||
# stems in dix: 408 |
|||
| |
| |
||
* did not show up —[[User:Firespeaker|Firespeaker]] 02:28, 2 July 2013 (UTC) |
|||
|- |
|- |
||
! 3 |
! 3 |
||
!style="text-align: right"| 30 - 6 July |
!style="text-align: right"| 30 - 6 July |
||
| |
| |
||
# total |
# total 2900 stems in dix |
||
# clean testvoc for {{tag|cnjcoo}} {{tag|cnjadv}} {{tag|cnjsub}} |
# clean testvoc for {{tag|cnjcoo}} {{tag|cnjadv}} {{tag|cnjsub}} |
||
# adding transfer rules |
|||
# trimmed coverage 55% |
|||
| {{Workeval5|2}} |
| {{Workeval5|2}} |
||
| |
| |
||
# -- |
|||
# stems in dix: 508 |
|||
# -- |
|||
# trimmed coverage: 59.5%,51.5% |
|||
| |
| |
||
* trimmed coverage good |
|||
* too narrow a focus on a single corpus |
|||
* number of stems too low |
|||
* no new WER text |
|||
* no testvoc |
|||
—[[User:Firespeaker|Firespeaker]] 20:43, 8 July 2013 (UTC) |
|||
|- |
|- |
||
! 4 |
! 4 |
||
!style="text-align: right"| 7 - 13 July |
!style="text-align: right"| 7 - 13 July |
||
| |
| |
||
# total |
# total 3600 stems in dix |
||
# clean testvoc for {{tag|adv}} |
# clean testvoc for {{tag|adv}} |
||
# adding transfer rules |
|||
# trimmed coverage 59% |
|||
|{{Workeval5|2}} |
|{{Workeval5|2}} |
||
|rowspan="2"| |
|rowspan="2"| |
||
# -- |
|||
# stems in dix: 2574 |
|||
# -- |
|||
# trimmed coverage: 69.3%,63.8% |
|||
# azattyq_24455849 WER: 14.78% |
|||
# completed most of TODO-list |
|||
|rowspan="2"| |
|rowspan="2"| |
||
* good progress on adding stems |
|||
* fixed little things as directed |
|||
* good progress on post-editing process |
|||
* didn't make good progress on reducing WER |
|||
* still no testvoc |
|||
* committed once every 3 or 4 days; '''should be committing every day''' |
|||
* poor communication with mentors; needs to be around more often |
|||
—[[User:Firespeaker|Firespeaker]] 22:16, 22 July 2013 (UTC) |
|||
|- |
|- |
||
! 5 |
! 5 |
||
!style="text-align: right"| 14 - 20 July |
!style="text-align: right"| 14 - 20 July |
||
| |
| |
||
# total |
# total 4200 stems in dix |
||
# clean testvoc for {{tag|prn}} {{tag|det}} |
# clean testvoc for {{tag|prn}} {{tag|det}} |
||
# adding transfer rules |
|||
# trimmed coverage 63% |
|||
|{{Workeval5|3}} |
|{{Workeval5|3}} |
||
|- |
|- |
||
Line 163: | Line 153: | ||
!style="text-align: right"| 21 - 27 July |
!style="text-align: right"| 21 - 27 July |
||
| |
| |
||
# total |
# total 4900 stems in dix |
||
# clean testvoc for {{tag|adj}} {{tag|adj}}{{tag|advl}} |
# clean testvoc for {{tag|adj}} {{tag|adj}}{{tag|advl}} |
||
# adding transfer rules |
|||
# trimmed coverage 68% |
|||
|{{Workeval5|3}} |
|{{Workeval5|3}} |
||
|rowspan="3"| |
|rowspan="3"| |
||
# -- |
|||
# stems in dix: 5552 |
|||
# trimmed coverage: 72%,67% |
|||
# azattyq_24455849 WER: 18.01% |
|||
|rowspan="2"| |
|rowspan="2"| |
||
* good improvement in dix |
|||
** should be checking for errors (e.g., extra spaces) |
|||
* not much progress with WER text |
|||
** simple lrx and t1x should be enough here |
|||
* still no indication of progress with testvoc |
|||
* better communication and commit frequency, but could still improve |
|||
—[[User:Firespeaker|Firespeaker]] 18:21, 1 August 2013 (UTC) |
|||
|- |
|- |
||
! 7 |
! 7 |
||
!style="text-align: right"| 28 - 3 August |
!style="text-align: right"| 28 - 3 August |
||
| |
| |
||
# total |
# total 5600 stems in dix |
||
# adding transfer rules |
|||
# trimmed coverage 70% |
|||
|{{Workeval5|2}} |
|{{Workeval5|2}} |
||
|- |
|- |
||
!colspan="2" style="text-align: right"| |
!colspan="2" style="text-align: right"| 4 - 11 August |
||
| |
| |
||
# total |
# total 6300 stems in dix |
||
# 500-word evaluation, WER ~ |
# 500-word evaluation, WER ~30% |
||
# trimmed coverage 72% |
|||
|{{Workeval5|2}} |
|{{Workeval5|2}} |
||
| |
| |
||
* midterm TODO list goals only partially attained |
|||
* overall progress has been mediocre |
|||
* among the lowest-performing students |
|||
* noticeable improvement in the last few weeks |
|||
* needs to improve more to pass the final |
|||
—[[User:Firespeaker|Firespeaker]] 18:26, 1 August 2013 (UTC) |
|||
|- |
|- |
||
! 8 |
! 8 |
Revision as of 19:36, 14 March 2014
- Name: Sundetova Aida
- E-mail address: sun27aida@gmail.com
- Other information that may be useful to contact you: nick on the #apertium channel: Aida
- Why is it you are interested in machine translation?
- I started learning about machine translation in 2012, when I joined a project that included developing machine translation from English to Kazakh. Before that I was already interested in artificial intelligence and the automation of processes. I continued developing English-Kazakh machine translation on Apertium and tried to learn more about the Apertium free/open-source machine translation platform. My knowledge of languages and programming, and my aim of making translation better, helped me to learn a new language, XML, and to improve my working skills.
- Why is it that you are interested in the Apertium project?
- First, Apertium is a free/open-source machine translation platform, which means that developers from other countries, like me, can join and start working on translation for a new language pair. Apertium uses Unix “pipelines”, which are very useful for fast diagnosis and debugging, and I can insert additional modules between the existing ones, such as HFST (Helsinki Finite-State Technology) for morphological analysis and generation of Kazakh.
- Which of the published tasks are you interested in? What do you plan to do?
- I plan to improve the Apertium English-Kazakh pair until it reaches good translation quality. I am already developing this pair, but it lacks corpora and has too little vocabulary to produce adequate translations. My goal is to increase vocabulary coverage by using corpora from news and Wikipedia, and to come close to a working translator.
- Include a proposal, including
- a title, --what title??
- reasons why Google and Apertium should sponsor it,
- a description of how and who it will benefit in society,
- English-to-Kazakh machine translation is very important: Kazakh is a Turkic language, so the transfer rules I write for this pair can be useful for other English-Turkic language pairs.
- and a detailed work plan (including, if possible, a brief schedule with milestones and deliverables). Include time needed to think, to program, to document and to disseminate.
- I plan to work more on vocabulary, and to add transfer rules if needed.
- List your skills and give evidence of your qualifications. Tell us what is your current field of study, major, etc. Convince us that you can do the work. In particular we would like to know whether you have programmed before in open-source projects.
- I have been developing English-to-Kazakh machine translation on Apertium since 2012 and Kazakh-to-English since 2013. I have extensive experience in writing and correcting transfer rules, lexical selection rules and constraint grammar rules, as well as in adding vocabulary to monolingual and bilingual dictionaries. I am in the 4th year of a bachelor's degree in Information Systems and will graduate in summer 2014. I know the programming languages C, C++ and C#, as well as HTML and XML, and I have basic knowledge of PHP. In addition, I can work with databases and know SQL. My native language is Kazakh, and I also speak English and Russian.
- List any non-Summer-of-Code plans you have for the Summer, especially employment, if you are applying for internships, and class-taking. Be specific about schedules and time commitments. We would like to be sure you have at least 30 free hours a week to develop for our project.
- Before 20 June I have final exams, but I will have about 3 hours a day to work on the project. After graduating from university I will be free and can spend 30 hours a week on Apertium.
Major goals
- Good WER
- Clean testvoc
- 8'000 stems in bidix (~700 stems per week, or ~100 per day)
- Additional rules: transfer, lexical selection, constraint grammar (~10 per week, or ~2 per day)
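The first goal, a good WER (word error rate), can be made concrete with a small sketch. WER is commonly computed as the word-level edit distance between the system output and a reference translation, divided by the number of words in the reference. The function name `word_error_rate` and the plain whitespace tokenisation are assumptions for illustration; this is not the evaluation tool used in the project.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: both texts are tokenised by splitting on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far (minus the current one) and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one missing word against a 4-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

Because the distance is divided by the reference length, a hypothesis much longer than the reference can give a value above 1.0.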
Schedule
Timeline
See GSoC 2014 Timeline for complete timeline. Important coding dates follow:
- April 22nd: begin working on project
- June 27th - August 17th: midterm evaluations
- August 18th: 'pencils down' date
- August 22nd: final evaluation
Workplan
week | dates | goals | eval | accomplishments | notes |
---|---|---|---|---|---|
post-application period 23 March - 17 April |
|
|
| ||
community bonding period 17 April - 1 June |
note: should be in IRC every day |
|
| ||
1 | 1 - 22 June |
|
|||
2 | 23 - 29 June |
|
|
||
3 | 30 - 6 July |
|
|
||
4 | 7 - 13 July |
|
|
||
5 | 14 - 20 July |
|
|||
6 | 21 - 27 July |
|
|
||
7 | 28 - 3 August |
|
|||
4 - 11 August |
|
||||
8 | 4 - 10 August |
|
|
||
9 | 11 - 17 August |
|
|||
10 | 18 - 24 August |
|
|||
11 | 25 - 31 August |
|
|
||
12 | 1 - 7 September |
|
|
—Firespeaker 07:29, 10 September 2013 (UTC) | |
13 | 8 - 15 September |
|
|
—Firespeaker 20:06, 22 September 2013 (UTC) | |
pencils-down week final evaluation 16 - 23 September |
|
|
| ||
Final evaluation |
|
Tips and Tricks
Adding stems quickly
- Add top stems from frequency lists of unknown forms
- Use spectie's dix-entries-to-be-checked script
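One way to build a frequency list of unknown forms is to count the words the analyser could not analyse. The sketch below assumes the analyser output is in the Apertium stream format, where an unknown form appears as `^form/*form$`. The function name `top_unknown_forms` and the sample text are hypothetical, and escaped characters inside a form are not handled.

```python
import re
from collections import Counter

# Assumption: unknown forms look like ^surface/*surface$ in the analyser
# output. The surface form is captured; known analyses do not start with '*'.
UNKNOWN_FORM = re.compile(r"\^([^/$^]+)/\*[^$]*\$")


def top_unknown_forms(analysed_text: str, n: int = 10) -> list[tuple[str, int]]:
    """Return the n most frequent unknown surface forms, lowercased."""
    counts = Counter(
        match.group(1).lower()
        for match in UNKNOWN_FORM.finditer(analysed_text)
    )
    return counts.most_common(n)


if __name__ == "__main__":
    # Hypothetical analyser output: one known word and three unknown tokens.
    sample = "^the/the<det>$ ^alpha/*alpha$ ^beta/*beta$ ^Alpha/*Alpha$"
    for form, count in top_unknown_forms(sample, n=2):
        print(count, form)
```

The most frequent forms at the top of the list are the stems that raise coverage most when added to the dictionaries.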