Difference between revisions of "Apertium-kaz-tat/paper"
Firespeaker (talk | contribs) (→Ilnar)
Revision as of 06:15, 18 April 2013
We're submitting a paper on apertium-kaz-tat to MT Summit 2013. DEADLINE: APRIL 20.
TODO
Ideal benchmarks:
- document rules in the rlx with example sentences
- more like 100-150 (currently ~40) disambiguation rules in -kaz
Ilnar
- Development corpus (lots and lots of text)
- Work on increasing coverage (via lexc) and trimmed coverage (via dix) to 90%
- Work on making sure testvoc passes
- Add rules: disambiguation (CG), lexical selection, and transfer.
- Test corpus (about 10 pages; don't base rules on this text!)
- Make a gold standard translation/correct some tests for error-rate testing
- Paper
  - Add affiliation to paper
  - Help JNW come up with some more contrastive stuff (see below / FIXME: Ilnars in paper)
    - Tatar equivalent of барайын деп жатырмын "I'm planning on going"?
    - What was it that you noticed with -ғалы/-гелі (and its correspondent in Tatar)?
  - Find some exemplary bidix entries for figure 2.
  - New example for table 3
    - maybe Kazakh equivalent of original sentence: "Ауа райы бүгін өте/әбден жақсы, жылы."
    - maybe "Ол енді ол дыбысты анығырақ ести бастады" (some good ambiguity). Unfortunately, current output is "Ул иңне ул тавышны аныграк ишетә башлады". Could we fix this?
Fran
- Delegate out error-rate testing tasks
- New version of Table 2
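For the error-rate testing tasks, one possible measure is word error rate (WER) between the system output and the gold standard translation. The notes do not name a metric, so WER is an assumption here, and the function name `word_error_rate` is hypothetical. A minimal sketch using word-level edit distance:

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length.

    `reference` is the gold standard translation, `hypothesis` the MT output.
    Tokenisation is plain whitespace splitting (an assumption).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)
```

Called on each sentence pair of the test corpus, this gives a per-sentence score; a corpus-level figure would sum the edit distances and divide by the total number of reference words.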
JNW
- Work on last few issues in -tat twol
- Write up background
- Contrastive analysis of Kazakh and Tatar
  - phonological differences (a generalised summary, 2 or 3 small specific examples)
  - orthographical differences (a generalised summary, 1 or 2 small specific examples)
  - lexical and morphological differences (2 or 3 specific examples)
  - morphotactic differences (2 or 3 specific examples)
  - syntactic differences (2 or 3 specific examples)
- Coverage stuff
- divide corpora into 10 pieces and run coverage for each to get stddev
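The coverage step above can be sketched roughly as follows. This is only an illustration: `is_known` is a hypothetical predicate standing in for a lookup in the morphological analyser, the corpus is assumed to be already tokenised, and the pieces are assumed to be contiguous chunks of near-equal size.

```python
import statistics


def split_into_pieces(tokens, n_pieces=10):
    """Split a token list into n_pieces contiguous chunks of near-equal size."""
    size, extra = divmod(len(tokens), n_pieces)
    pieces = []
    start = 0
    for i in range(n_pieces):
        # The first `extra` pieces get one additional token each.
        end = start + size + (1 if i < extra else 0)
        pieces.append(tokens[start:end])
        start = end
    return pieces


def coverage(tokens, is_known):
    """Fraction of tokens that the (hypothetical) analyser lookup recognises."""
    if not tokens:
        return 0.0
    return sum(1 for token in tokens if is_known(token)) / len(tokens)


def coverage_stats(tokens, is_known, n_pieces=10):
    """Mean and sample standard deviation of coverage over the pieces."""
    scores = [coverage(piece, is_known)
              for piece in split_into_pieces(tokens, n_pieces)]
    return statistics.mean(scores), statistics.stdev(scores)
```

`statistics.stdev` computes the sample standard deviation; `statistics.pstdev` would be the choice if the ten pieces are treated as the whole population.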
Over-all
Sections: 1, 2, 3 (3.1, 3.2, 3.3, 3.4), 4 (4.1, 4.2, 4.3, 4.4, 4.5), 5 (5.1), 6, Acknowledgements, References