Difference between revisions of "Ideas for Google Summer of Code/Dictionary induction from parallel corpora"

Latest revision as of 18:17, 21 March 2024

Coding Challenge

Write a script that reads the two sides of a parallel corpus, runs each side through the appropriate monolingual tagger, aligns the tagged text with a word aligner (eflomal is a straightforward choice if you don't know where to begin), and then prints the resulting list of paired words.

$ cat eng.txt
The cat ate the fish.
$ cat spa.txt
El gato comió el pez.
$ alignment-script apertium-eng/ eng.txt apertium-spa/ spa.txt
the<det><def><mf><sp> - el<det><def><m><sg>
cat<n><sg> - gato<n><m><sg>
...
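
The last step, turning tagger output and alignments into word pairs, can be sketched in Python as below. This is a minimal sketch under stated assumptions, not a reference solution: it assumes both sides have already been tagged into the Apertium stream format (^surface/lemma<tags>$ units, one sentence per line, the two files line-parallel), that the aligner was run on exactly the token sequence produced by the hypothetical aligner_input() helper, and that the aligner's output uses the i-j link format (0-0 1-1 ...) written by fast_align-style tools such as eflomal. The script name and all function names are illustrative only.

```python
import re
import sys
from typing import List, Tuple

# Matches one Apertium lexical unit, e.g. ^cat/cat<n><sg>$ or ^cat<n><sg>$.
# Simplification (assumption): escaped characters such as \$ or \/ are not handled.
UNIT_RE = re.compile(r"\^(.*?)\$")


def parse_tagged_line(line: str) -> List[str]:
    """Return one 'lemma<tags>' string per lexical unit on a tagged line."""
    # If the surface form is present (surface/analysis), keep only the analysis.
    return [unit.split("/")[-1] for unit in UNIT_RE.findall(line)]


def aligner_input(line: str) -> str:
    """Hypothetical helper: one whitespace-separated token per unit, for the aligner."""
    # Multiword lemmas contain spaces; join them so token indices stay in step.
    return " ".join(tok.replace(" ", "_") for tok in parse_tagged_line(line))


def parse_links(line: str) -> List[Tuple[int, int]]:
    """Parse 'i-j' alignment links (Pharaoh/fast_align style) into index pairs."""
    pairs = []
    for link in line.split():
        i, j = link.split("-")
        pairs.append((int(i), int(j)))
    return pairs


def paired_words(src_line: str, trg_line: str, link_line: str) -> List[Tuple[str, str]]:
    """Pair source and target analyses according to the alignment links."""
    src = parse_tagged_line(src_line)
    trg = parse_tagged_line(trg_line)
    return [(src[i], trg[j]) for i, j in parse_links(link_line)]


def main(src_path: str, trg_path: str, links_path: str) -> None:
    # The three files are assumed to be line-parallel: line n of each file
    # belongs to the same sentence pair.
    with open(src_path, encoding="utf-8") as src_f, \
         open(trg_path, encoding="utf-8") as trg_f, \
         open(links_path, encoding="utf-8") as links_f:
        for src_line, trg_line, link_line in zip(src_f, trg_f, links_f):
            for s, t in paired_words(src_line, trg_line, link_line):
                print(f"{s} - {t}")


if __name__ == "__main__":
    if len(sys.argv) == 4:
        main(*sys.argv[1:])
    else:
        print("usage: pair_words.py SRC.tagged TRG.tagged LINKS", file=sys.stderr)
```

For example, running python3 pair_words.py eng.tagged spa.tagged eng-spa.links (all three file names hypothetical) would print one "source - target" line per alignment link, in the same shape as the expected output above. Keeping only the analysis and dropping the surface form means that inflected variants of the same lemma and tags collapse into the same pair, which is what a dictionary entry needs.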
