The 'English-Lingala' pair wasn't on the ideas list for GSOC19. I'm planning to work on the 'English-Swahili' pair first as part of GSOC, and then on the 'English-Lingala' pair.
- It doesn't have to be on the list; you can propose your own. Also, I think there is a fairly well-developed Lingala transducer, but none for Swahili. - Francis Tyers (talk) 13:34, 10 March 2019 (CET)
- You're right. I noticed a couple of things. When I run the code, the morphological analyzer (lin.automorf.bin) works perfectly. But I do not see a monolingual dictionary file (like apertium-lin.lin.dix) where I would add vocabulary and paradigms. Everything seems to be encoded in 'apertium-lin.lin.lexc', and I can't find any documentation on how to use it. And lastly, what's a transducer? Is it the monolingual dictionary?
- A copy-paste from the IRC logs:
[17:52:48] <selimcan> begiak, tell eden With spectie and firespeaker we'd discussed which finite-state framework will work better for Swahili and decided that it will need to be based on HFST. See wiki.apertium.org for more info on it.
[17:52:48] <begiak> selimcan: I'll pass that on when eden is around.
- Would you recommend working on Swahili or Lingala? Like you said, the Lingala transducer is fairly well developed, so I would only need to work on disambiguation, transfer rules, and the bilingual dictionary. For Swahili, I would have to build a transducer from scratch on top of everything else. Which one do you think is more feasible given the short time period of GSOC? (And yes, HFST is more appropriate given the complex morphology of both languages.)