User:Eden/GSoC progress

From Apertium

Community Bonding Period

  • Find Swahili-Lingala resources
  • Update Lingala lexc transducer to lexd
  • New lexd transducer for Swahili
  • Keep track of coverage for Lin and Swa transducers
  • Get familiar with apertium-recursive
  • Set up swa-lin pair using apertium-recursive
  • Update GSoC progress page
  • James and Mary story and a Wikipedia article in Swahili and Lingala

Goals

  • By first evaluation: get a story about kids or similar text to a WER/PER of around 20% (work with all stages of translation, focusing on the "lowest-hanging fruit" relevant to the text)
  • By second evaluation: increase [trimmed] coverage to around 90% (work focused on lexicons, adding from frequency lists)
  • By final evaluation: work towards a clean testvoc (work focused on transfer, making sure everything is dealt with one way or another)
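The WER/PER figures for these goals come from apertium-eval-translator-line (see Notes). As a rough illustration of what word error rate measures, here is a minimal shell/awk sketch, assuming whitespace tokenization and a single hypothesis/reference line pair: the word-level edit distance (substitutions, insertions, deletions) divided by the number of reference words. The function name `wer` is hypothetical; it does not compute PER, and its numbers may differ from the Apertium tool, which may normalize text differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: WER (%) of one hypothesis line against one reference line.
# Usage: wer "HYPOTHESIS" "REFERENCE"
# Assumption: words are separated by whitespace; no other normalization.
wer() {
  awk -v hyp="$1" -v ref="$2" 'BEGIN {
    n = split(ref, r, " ")   # reference words
    m = split(hyp, h, " ")   # hypothesis words
    # d[i,j] = edits needed between the first i ref words and the first j hyp words
    for (i = 0; i <= n; i++) d[i, 0] = i
    for (j = 0; j <= m; j++) d[0, j] = j
    for (i = 1; i <= n; i++)
      for (j = 1; j <= m; j++) {
        cost = (r[i] == h[j]) ? 0 : 1
        best = d[i-1, j] + 1                                         # deletion
        if (d[i, j-1] + 1 < best) best = d[i, j-1] + 1               # insertion
        if (d[i-1, j-1] + cost < best) best = d[i-1, j-1] + cost     # substitution or match
        d[i, j] = best
      }
    if (n == 0) { print "0.00"; exit }
    printf "%.2f\n", 100 * d[n, m] / n
  }'
}
```

For example, `wer "the dog sat" "the cat sat"` gives one substitution over three reference words, i.e. 33.33.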

Status table

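Week | Dates | Stems: swa | Stems: lin | Stems: swa-lin | Naïve coverage: swa | Naïve coverage: lin | Naïve coverage: swa-lin | WER, PER: swa→lin | WER, PER: lin→swa | Evaluation | Notes
0 (community bonding) | May 4 - May 31 | 86 | 1,444 | 26 | | | | | | |
1 | June 1 - June 7 | 86 | 1,444 | 26 | | | | | | |
2 | June 8 - June 14 | 170 | 1,444 | 26 | | | | | | | Number of stems in the lin transducer comes from previous estimates; stems in the swa transducer counted manually.
3 | June 15 - June 21 | | | | | | | | | |
4 | June 22 - June 28 | | | | | | | | | |
5 | June 29 - July 5 | | | | | | | | | |
6 | July 6 - July 12 | | | | | | | | | |
7 | July 13 - July 19 | | | | | | | | | |
8 | July 20 - July 26 | | | | | | | | | |
9 | July 27 - Aug 2 | | | | | | | | | |
10 | Aug 3 - Aug 9 | | | | | | | | | |
11 | Aug 10 - Aug 16 | | | | | | | | | |
12 | Aug 17 - Aug 23 | | | | | | | | | |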
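Naïve coverage is usually taken as the share of tokens in a corpus that the analyser recognizes at all, whether or not the analysis is correct. A minimal sketch, assuming lt-proc's stream format, where each token is a lexical unit `^...$` and unknown words come out as `^word/*word$`. The function name `naive_coverage` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: naive coverage (%) of an lt-proc analysis stream read on stdin.
# Assumption: unknown tokens are marked with "/*" inside their ^...$ lexical unit.
naive_coverage() {
  # Extract each ^...$ lexical unit onto its own line, then count.
  grep -o '\^[^$]*\$' | awk '
    { total++ }                 # every lexical unit
    /\/\*/ { unknown++ }        # units the analyser did not recognize
    END {
      if (total == 0) { print "0.00"; exit }
      printf "%.2f\n", 100 * (total - unknown) / total
    }'
}
```

For example, something like `apertium -d . swa-morph < corpus.txt | naive_coverage`, run inside the monolingual package directory, assuming the package defines a swa-morph mode; corpus.txt is a hypothetical plain-text corpus.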

Work

  • June 8 - June 14

- verb, noun and adjective morphotactics in the swa transducer

  • June 15 - June 21

- add missing verb TAM and extensions (continuative, reciprocal, causative)
- more subsections in 'Verb Morphotactics'
- add stems in swa transducer
- start writing transfer rules

Notes

  • To count stems in lexc, try:
 grep -E ":\w+.*;" apertium-lin.lin.lexc | grep -v "[<>]" | wc -l
  • To count stems in the bidix, try this:
 grep "<p>" apertium-eng-lin.eng-lin.dix | wc -l
  • To get WER and PER, use apertium-eval-translator-line.
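The lexc command above does not carry over to the lexd Swahili transducer, whose stems were counted by hand in week 2. A minimal awk sketch, assuming one entry per line, `#` starting a comment, and that the stem lexicons share a naming pattern. The function name `count_lexd_stems`, the lexicon names and the file name apertium-swa.swa.lexd are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count entry lines in lexd LEXICON blocks whose name matches a regex.
# Usage: count_lexd_stems NAME_REGEX FILE
# Assumptions: one entry per line; "#" starts a comment; a new section starts
# with LEXICON, PATTERNS, PATTERN or ALIAS as the first word of a line.
count_lexd_stems() {
  local pattern="$1" file="$2"
  awk -v pat="$pattern" '
    $1 == "LEXICON" || $1 == "PATTERNS" || $1 == "PATTERN" || $1 == "ALIAS" {
      # Only LEXICON blocks whose name matches the regex are counted.
      in_block = ($1 == "LEXICON" && $2 ~ pat)
      next
    }
    in_block {
      line = $0
      sub(/#.*/, "", line)              # drop comments
      if (line ~ /[^[:space:]]/) n++    # count non-blank entry lines
    }
    END { print n + 0 }
  ' "$file"
}
```

For example, `count_lexd_stems 'Stem' apertium-swa.swa.lexd` would count the entries of every lexicon whose name contains "Stem", if the file follows that (assumed) naming convention.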