Ideas for Google Summer of Code/Morphological analyser

From Apertium
Revision as of 15:27, 5 April 2021 by Francis Tyers (talk | contribs)

Implement a transducer-based morphological analyser/generator for a new language.

Tentative requirements:

  • a description of the morphology of your language
  • a dictionary (preferably in digital form; around 10k lexemes expected)
  • a corpus of text (50k+ tokens that can be made public)

These can be adjusted with the consent of the mentor.

Coding challenge:

Take an excerpt of 200-300 tokens from your corpus and implement an analyser for it. The analyser should give an analysis for every token in at least one sentence, and you should aim for as close to complete coverage of the excerpt as possible.
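
One way to track progress on the excerpt is to measure naive coverage, that is, the share of tokens that receive at least one analysis. The following is only a minimal sketch, assuming the analyser's output is in the Apertium stream format, where each token appears as ^surface/analysis1/analysis2$ and an unknown token is marked with a leading asterisk (^word/*word$). The function name, the sample string and its tags are illustrative, and escaped characters in the stream are not handled.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# Simplification: backslash-escaped ^, / and $ inside a unit are not handled.
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def coverage(analysed: str) -> float:
    """Return the fraction of tokens that received at least one analysis."""
    units = LEXICAL_UNIT.findall(analysed)
    if not units:
        return 0.0
    # Unknown tokens are output as ^word/*word$, so a leading "*" means
    # the analyser had no entry for that surface form.
    known = sum(1 for _surface, analyses in units if not analyses.startswith("*"))
    return known / len(units)


# Illustrative analyser output: two known tokens and one unknown token.
sample = "^The/the<det><def><sp>$ ^cat/cat<n><sg>$ ^zzyx/*zzyx$"
print(f"{coverage(sample):.2%}")  # prints 66.67%
```

Running this over the analysed excerpt after each change to the dictionary gives a quick indication of how many tokens are still unknown. Note that it counts any analysis as coverage, whether or not that analysis is correct.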

Present your coding challenge on IRC or on the mailing list and ask for feedback.