User talk:Eden

From Apertium
Revision as of 09:03, 12 March 2019 by Eden (talk | contribs)

Why Swahili instead of Lingala? - Francis Tyers (talk) 23:55, 9 March 2019 (CET)

The 'English-Lingala' pair wasn't on the ideas list for GSoC 2019. I'm planning to work on the 'English-Swahili' pair first as part of GSoC, and then on the 'English-Lingala' pair.

It doesn't have to be on the list, you can propose your own. Also, I think there is a fairly well developed Lingala transducer. But none for Swahili. - Francis Tyers (talk) 13:34, 10 March 2019 (CET)
You're right. I noticed a couple of things: when I run the code, the morphological analyzer (lin.automorf.bin) works perfectly. But I do not see a monolingual dictionary file (like apertium-lin.lin.dix) where I would add vocabulary and paradigms. Everything seems to be encoded in 'apertium-lin.lin.lexc', and I can't find any documentation on how to use it. And lastly, what's a transducer? Is it the monolingual dictionary?
A copy-paste from the IRC logs:
[17:52:48] <selimcan> begiak, tell eden With spectie and firespeaker we'd discussed which finite-state framework will work better for Swahili and decided that it will need to be based on HFST. See for more info on it.
[17:52:48] <begiak> selimcan: I'll pass that on when eden is around.

selimcan (talk) 05:18, 11 March 2019 (CET)

Would you recommend working on Swahili or Lingala? As you said, the Lingala transducer is fairly well developed, so I would only need to work on disambiguation, transfer rules, and the bilingual dictionary. For Swahili, I would have to build a transducer from scratch on top of everything else. Which one do you think is more feasible given the short time period of GSoC? (And yes, I agree that HFST is more appropriate for dealing with the complex morphology of both languages.)