User talk:Eden
Why Swahili instead of Lingala? - Francis Tyers (talk) 23:55, 9 March 2019 (CET)
The 'English-Lingala' pair wasn't part of the ideas list for GSOC19. I'm planning to work on the 'English-Swahili' pair first as part of GSOC, and then on the 'English-Lingala' pair.
- It doesn't have to be on the list, you can propose your own. Also, I think there is a fairly well developed Lingala transducer. But none for Swahili. - Francis Tyers (talk) 13:34, 10 March 2019 (CET)
- You're right. I noticed a couple of things: when I run the code, the morphological analyzer (lin.automorf.bin) works perfectly. But I do not see a mono-dictionary file where I would add vocabulary and paradigms (like apertium-lin.lin.dix). Everything seems to be encoded in 'apertium-lin.lin.lexc', and I can't find any documentation on how to use it. And lastly, what's a transducer? Is it the mono-dictionary?
- A copy-paste from the IRC logs:
[17:52:48] <selimcan> begiak, tell eden With spectie and firespeaker we'd discussed which finite-state framework will work better for Swahili and decided that it will need to be based on HFST. See wiki.apertium.org for more info on it.
[17:52:48] <begiak> selimcan: I'll pass that on when eden is around.
selimcan (talk) 05:18, 11 March 2019 (CET)
- Would you recommend working on Swahili or Lingala? As you said, the Lingala transducer is fairly well developed, so I would only need to work on disambiguation, transfer rules, and the bilingual dictionary. For Swahili, I would have to build a transducer from scratch on top of everything else. Which one do you think is more feasible given the short time period of GSOC? (And yes, I agree HFST is more appropriate for dealing with the complex morphology of both languages.)