Ideas for Google Summer of Code/Backpropagation

From Apertium
Revision as of 14:54, 3 February 2021 by Popcorndude (talk | contribs) (Created page with "This project is kind of like backpropagation but the each layer of the network/pipeline is different and this probably needs to be taken into account. You could probably get...")

This project is kind of like backpropagation, but each layer of the network/pipeline is different, and this probably needs to be taken into account.

You could probably get a lot of the information by stepping through the pipeline in reverse, but there's also some need to pay attention to what each module does.

Some thoughts:

  • For comparison, you'll need to analyze the reference sentence; if you get unknowns, that's a target monodix issue
  • If you have the right words in the wrong order, that's probably transfer
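  • If the correct word was present after bidix but not after lexical selection, lexical selection is the issue
  • Similarly, if the correct analysis was present after morphological analysis but not after disambiguation, morphological disambiguation is the issue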
  • A gender issue on pronouns or subject agreement on verbs might be anaphora (might have to restrict this logic to only pairs that use apertium-anaphora and words with anaphora attached to them)
  • If it's structurally correct but the wrong lemma, probably bidix
  • If the output contains an unknown word mark, that's probably the analyzer

Also, problems may have more than one plausible solution!
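
A few of the heuristics above can be sketched as a function that returns every plausible suspect rather than a single answer. This is only an illustration: the WordTrace record, its field names and the module labels are hypothetical, and it assumes something else has already collected the per-stage information for each expected word. Morphological disambiguation and anaphora are left out to keep it short.

```python
from dataclasses import dataclass, field


@dataclass
class WordTrace:
    """Hypothetical record of what happened to one expected target word."""
    expected: str                                    # desired lemma+POS, e.g. "emisora<n>"
    after_bidix: set = field(default_factory=set)    # target candidates after bidix
    after_lexsel: set = field(default_factory=set)   # candidates left after lexical selection
    reference_unknown: bool = False   # analyzing the reference gave an unknown
    source_unknown: bool = False      # the source word came out with an unknown mark
    in_right_position: bool = True    # the word is in the output, in the expected place


def diagnose(trace):
    """Return a list of modules that might be responsible (possibly several)."""
    suspects = []
    if trace.reference_unknown:
        suspects.append("target monodix")
    if trace.source_unknown:
        suspects.append("analyzer")
    elif trace.expected not in trace.after_bidix:
        # Structurally fine but the wanted lemma was never offered.
        suspects.append("bidix")
    elif trace.expected not in trace.after_lexsel:
        # The wanted lemma was offered, then discarded.
        suspects.append("lexical selection")
    elif not trace.in_right_position:
        # Right words, wrong order.
        suspects.append("transfer")
    return suspects


trace = WordTrace(
    expected="emisora<n>",
    after_bidix={"estación<n>", "emisora<n>"},
    after_lexsel={"estación<n>"},
)
print(diagnose(trace))
```

Returning a list rather than one label is deliberate, since a single error can have more than one plausible fix.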

Coding Challenge

Write a script that takes a bilingual corpus, runs it both through the full pipeline and just up to bidix, and identifies words with lexical selection errors (that is, the desired lemma+POS was in the bidix output but not in the final output). Print each one out with 3 words of context on either side.
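
As a starting point, the comparison step might look something like the sketch below. It does not run the pipeline itself: it assumes you already have the bidix output for a sentence as a string in Apertium stream format (^source/target1/target2$), plus two sets of lemma+POS strings obtained by analyzing the reference translation and the final output. The function names, the bag-of-words matching and the sample sentence are all made up for illustration, and escaped characters in the stream are ignored.

```python
import re

# One lexical unit in Apertium stream format: ^ ... $
LU_RE = re.compile(r"\^(.*?)\$")


def lemma_pos(reading):
    """Reduce a reading like 'house<n><sg>' to lemma plus first tag: 'house<n>'."""
    m = re.match(r"([^<]*)(<[^>]+>)?", reading)
    return m.group(1) + (m.group(2) or "")


def bidix_units(stream):
    """Yield (source reading, [target lemma+POS, ...]) for each lexical unit."""
    for lu in LU_RE.findall(stream):
        source, *targets = lu.split("/")
        yield source, [lemma_pos(t) for t in targets]


def lexsel_errors(bidix_stream, reference, output, window=3):
    """Find units where bidix offered a lemma+POS that the reference uses
    but the final output does not, with `window` words of context per side."""
    units = list(bidix_units(bidix_stream))
    words = [source for source, _ in units]
    errors = []
    for i, (source, targets) in enumerate(units):
        missed = [t for t in targets if t in reference and t not in output]
        if missed:
            context = words[max(0, i - window): i + window + 1]
            errors.append((source, missed, context))
    return errors


# Made-up sample data, not taken from a real language pair.
bidix = ("^the<det><def><sp>/el<det><def><GD><ND>$ "
         "^station<n><sg>/estación<n><f><sg>/emisora<n><f><sg>$")
reference = {"el<det>", "emisora<n>"}
output = {"el<det>", "estación<n>"}

for source, missed, context in lexsel_errors(bidix, reference, output):
    print(source, missed, " ".join(context))
```

Matching against sets ignores word order and repeated words, so a real solution would probably want to align reference and output words more carefully.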