Speeding up monodix creation

From Apertium

Revision as of 16:00, 10 April 2008

This page outlines some ideas for increasing the speed at which monolingual dictionaries (analysers) can be created.

Extract

Tag transfer

Try this at some point:

<spectie> you have an aligned corpus
<spectie> polish--czech, czech--slovak, danish--swedish
<spectie> and you have an analyser for polish, czech or danish
<spectie> you want to make an analyser for swedish
<spectie> you make templates from the paradigms in the danish analyser
<spectie> tag the danish of the corpus
<spectie> that you have
<spectie> align it with the swedish side
<spectie> then read off the alignments, taking the surface forms from the right side and the tags from the left side
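
The read-off step at the end of the chat could be sketched roughly as below. This is only an illustration under stated assumptions: the function names (`transfer_tags`, `collect_candidates`), the data layout, and the tag strings are hypothetical, not taken from an existing Apertium tool or analyser. It assumes the Danish side has already been tagged and the word alignment has already been computed by some external aligner.

```python
from collections import Counter, defaultdict


def transfer_tags(src_tags, tgt_tokens, alignment):
    """Pair each aligned target surface form with the source token's tags.

    src_tags   -- tag string for each source (e.g. Danish) token
    tgt_tokens -- surface form of each target (e.g. Swedish) token
    alignment  -- (source_index, target_index) pairs from a word aligner
    """
    return [(tgt_tokens[t], src_tags[s]) for s, t in alignment]


def collect_candidates(corpus):
    """Count how often each (surface form, tag string) pairing is seen."""
    counts = defaultdict(Counter)
    for src_tags, tgt_tokens, alignment in corpus:
        for surface, tags in transfer_tags(src_tags, tgt_tokens, alignment):
            counts[surface.lower()][tags] += 1
    return counts


# Hypothetical toy corpus: the tag strings are illustrative only.
corpus = [
    (["<prn><p1><sg>", "<vblex><pres>", "<n><ut><pl><def>"],
     ["jag", "ser", "bilarna"],
     [(0, 0), (1, 1), (2, 2)]),
    (["<n><ut><pl><def>", "<vblex><pres>"],
     ["Bilarna", "kör"],
     [(0, 0), (1, 1)]),
]

candidates = collect_candidates(corpus)
for surface, tag_counts in sorted(candidates.items()):
    # Print the most frequently transferred tag string for each form.
    print(surface, tag_counts.most_common(1)[0])
```

Counting the pairings rather than trusting a single alignment is a design choice made here because alignments are noisy; a frequency threshold could then decide which form and tag pairs are kept as candidate dictionary entries.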

Another variation of this, without parallel corpora, might be to use extract and then use a bilingual dictionary and a comparable corpus
to disambiguate the possibilities.