Ideas for Google Summer of Code/Backpropagation

This project is somewhat like backpropagation, except that each layer of the network (here, each module of the pipeline) is different, and this probably needs to be taken into account.

You could probably get much of the information by stepping through the pipeline in reverse, but you also need to pay attention to what each module does.
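A minimal sketch of the reverse walk, assuming each stage's output has already been reduced to a list of target-language lemmas. The helper `stage_where_lost` and the stage names are hypothetical, not part of Apertium. Starting from the final stage and moving backwards, it finds the last stage that still contains the desired word; the stage right after that one is the one that dropped it.

```python
from typing import List, Optional, Sequence, Tuple


def stage_where_lost(stages: Sequence[Tuple[str, List[str]]], word: str) -> Optional[str]:
    """Return the name of the stage that dropped `word`, or None if it survived.

    `stages` is an ordered list of (stage_name, lemmas) pairs, first module first.
    Hypothetical helper: stage names and lemma lists are supplied by the caller.
    """
    if not stages or word in stages[-1][1]:
        return None  # nothing to check, or the word made it to the end
    # Walk backwards, starting from the second-to-last stage.
    for i in range(len(stages) - 2, -1, -1):
        if word in stages[i][1]:
            return stages[i + 1][0]  # present here, missing in the next stage
    return stages[0][0]  # never present: blame the first stage checked
```

Only the stages passed in are checked, so the caller decides which stretch of the pipeline to walk (for example, from bidix onwards for target-side words).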

Some thoughts:

For comparison, you'll need to analyze the reference sentence; if you get unknowns, that's a target monodix issue
If you have the right words in the wrong order, that's probably transfer
If the correct word was present after bidix but not after lexical selection, that's a lexical selection issue
Similarly for morphological disambiguation: if the correct analysis was present after the analyzer but not after the tagger, that's the issue
A gender error on pronouns or a subject-agreement error on verbs might be anaphora (this logic may need to be restricted to pairs that use apertium-anaphora, and to words that have anaphora information attached)
If it's structurally correct but the wrong lemma, probably bidix
An unknown-word mark in the output? Probably the analyzer

Also, problems may have more than one plausible solution!
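The rules of thumb above could be encoded as a simple rule-based classifier. This is only a sketch: the `Observation` fields are hypothetical flags that some earlier comparison step would have to compute, and `blame` returns every matching module rather than a single answer, since a problem may have more than one plausible explanation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Observation:
    """Hypothetical per-word facts gathered by comparing stage outputs with a reference."""
    unknown_in_reference: bool = False       # reference word unknown to the target analyzer
    unknown_mark_in_output: bool = False     # MT output carries an unknown-word mark
    right_words_wrong_order: bool = False
    lost_in_lexical_selection: bool = False  # correct word after bidix, gone after lexical selection
    lost_in_disambiguation: bool = False     # correct analysis after analyzer, gone after tagger
    gender_or_agreement_error: bool = False  # pronoun gender or verb subject agreement
    pair_uses_anaphora: bool = False         # pair uses apertium-anaphora, word has anaphora attached
    wrong_lemma_only: bool = False           # structurally correct, but wrong lemma


def blame(obs: Observation) -> List[str]:
    """Return every plausible culprit module, in pipeline-agnostic rule order."""
    suspects = []
    if obs.unknown_in_reference:
        suspects.append("target monodix")
    if obs.unknown_mark_in_output:
        suspects.append("analyzer")
    if obs.right_words_wrong_order:
        suspects.append("transfer")
    if obs.lost_in_lexical_selection:
        suspects.append("lexical selection")
    if obs.lost_in_disambiguation:
        suspects.append("morphological disambiguation")
    # Only blame anaphora where the pair actually uses it.
    if obs.gender_or_agreement_error and obs.pair_uses_anaphora:
        suspects.append("anaphora")
    if obs.wrong_lemma_only:
        suspects.append("bidix")
    return suspects
```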

Coding Challenge

Write a script that takes a bilingual corpus, runs it both through the full pipeline and through the pipeline only up to bidix, and identifies words with lexical selection errors (that is, words where the desired lemma+POS was in the bidix output but not in the final output). Print them out with 3 words of context on either side.
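A possible core for such a script, sketched under several assumptions: the bidix output is in the Apertium stream format (`^source<tags>/target1<tags>/target2<tags>$`); both the final output and the reference sentence have been run back through the target-language analyzer, so they are also streams of `^surface/analysis$` units; the first tag is taken as the POS; and escaped characters and superblanks are ignored. Running the pipeline itself is left out, and all function names here are hypothetical. The context printed is source lemmas rather than surface forms, for simplicity.

```python
import re
from typing import List, Set, Tuple

LU_RE = re.compile(r"\^(.*?)\$")  # one lexical unit: ^...$


def lemma_pos(analysis: str) -> Tuple[str, str]:
    """'casa<n><f><sg>' -> ('casa', 'n'): lemma plus first tag, taken as the POS."""
    m = re.match(r"([^<]*)<([^>]+)>", analysis)
    return (m.group(1), m.group(2)) if m else (analysis, "")


def parse_lus(stream: str) -> List[List[str]]:
    """Split every ^...$ unit on '/'. Escaped characters are not handled, for brevity."""
    return [lu.split("/") for lu in LU_RE.findall(stream)]


def analysed_set(stream: str) -> Set[Tuple[str, str]]:
    """All lemma+POS pairs in analyzer output such as ^casas/casa<n><f><pl>$."""
    return {lemma_pos(a) for lu in parse_lus(stream) for a in lu[1:]}


def lexsel_errors(bidix_out: str, final_analysed: str, reference_analysed: str, window: int = 3):
    """Flag source words whose desired translation was offered by bidix but is missing later."""
    lus = parse_lus(bidix_out)
    words = [lemma_pos(lu[0])[0] for lu in lus]  # source lemmas, used as context
    wanted = analysed_set(reference_analysed)
    produced = analysed_set(final_analysed)
    errors = []
    for i, lu in enumerate(lus):
        offered = {lemma_pos(t) for t in lu[1:]}  # every translation bidix offered
        lost = sorted((offered & wanted) - produced)
        if lost:
            left = " ".join(words[max(0, i - window):i])
            right = " ".join(words[i + 1:i + 1 + window])
            errors.append((left, words[i], right, lost))
    return errors


if __name__ == "__main__":
    bidix = "^the<det><def><sp>/el<det><def><GD><ND>$ ^house<n><sg>/casa<n><f><sg>/hogar<n><m><sg>$"
    final = "^el/el<det><def><m><sg>$ ^hogar/hogar<n><m><sg>$"
    ref = "^la/el<det><def><f><sg>$ ^casa/casa<n><f><sg>$"
    for left, word, right, lost in lexsel_errors(bidix, final, ref):
        print(f"{left} [{word}] {right}  missing: {lost}")
```

Comparing against sets of lemma+POS pairs avoids having to align the MT output with the reference word by word, at the cost of missing errors where the desired lemma happens to appear elsewhere in the sentence.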