Ideas for Google Summer of Code/Complex multiwords
Revision as of 13:32, 14 March 2013 by Francis Tyers (talk | contribs)
Write a bidirectional module for specifying complex multiword units, for example dirección general and zračna luka. In the Romance languages this is not a big problem, but in languages with case inflection (e.g. Serbo-Croatian, Slovenian, Russian, German, Icelandic) you cannot define a multiword as a fixed adjective + noun sequence, because the adjective has many inflected forms.
The module should be bidirectional, that is, it should be usable both for analysing and for generating these multiwords. For example, take the following sentence and its intended translation:
- Prijem ambulantnih pacijenata se obavlja na prijemnom šalteru u prizemlju zgrade za dijagnostički imidžing.
- Outpatient admission takes place at the reception desk on the ground floor of the diagnostic imaging building.
The default translation of "ambulantni" would be "ambulatory", but in this case we want to translate "ambulantnih pacijenata" as "outpatients".
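One way to picture the analysis direction is to match on lemmas rather than surface forms, so that a single entry covers every inflected form of the adjective. The following is only a rough sketch under assumptions: the token layout, the `MULTIWORDS` table, the function name `find_multiwords` and the tag strings are all hypothetical and are not part of Apertium, and the sketch handles only two-word units.

```python
from typing import Dict, List, Tuple

# Hypothetical token layout: (surface form, lemma, tag string).
Token = Tuple[str, str, str]

# Hypothetical multiword table: a sequence of lemmas maps to one target lemma.
MULTIWORDS: Dict[Tuple[str, str], str] = {
    ("ambulantni", "pacijent"): "outpatient",
}


def find_multiwords(
    tokens: List[Token],
    table: Dict[Tuple[str, str], str] = MULTIWORDS,
) -> List[Tuple[str, str]]:
    """Return (lemma, tags) pairs, merging adjacent tokens listed in the table."""
    out: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        # Compare lemmas, not surface forms, so inflection does not matter.
        pair = tuple(tok[1] for tok in tokens[i:i + 2])
        if len(pair) == 2 and pair in table:
            # Assumption: the merged unit inherits the tags of the noun.
            out.append((table[pair], tokens[i + 1][2]))
            i += 2
        else:
            out.append((tokens[i][1], tokens[i][2]))
            i += 1
    return out


# Illustrative tags only; a real analyser may emit different ones.
sentence = [
    ("Prijem", "prijem", "<n><sg><nom>"),
    ("ambulantnih", "ambulantni", "<adj><pl><gen>"),
    ("pacijenata", "pacijent", "<n><pl><gen>"),
]
result = find_multiwords(sentence)
```

Because the lookup key is the lemma pair, the same entry would also match other case and number forms of the adjective and noun. Generation would need the reverse step, which this sketch does not attempt.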
Coding challenge
- Write a stream processor (see Apertium stream format) for the output of lt-proc that parses character by character, respecting superblanks.
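As a starting point, a character-by-character reader might look like the sketch below. It assumes the usual conventions of the Apertium stream format: lexical units are delimited by `^` and `$`, superblanks by `[` and `]`, analyses inside a unit are separated by `/`, and a backslash escapes the following character. The function names `parse_stream` and `split_lu` are made up for this example, and the sample input is invented rather than real lt-proc output.

```python
from typing import Iterator, List, Tuple


def parse_stream(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (kind, content) pairs; kind is "blank", "superblank" or "lu"."""
    kind = "blank"        # which kind of chunk we are currently inside
    buf: List[str] = []   # characters collected for the current chunk
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            # Escaped character: keep both characters, never a delimiter.
            buf.append(text[i:i + 2])
            i += 2
            continue
        if kind == "blank" and ch in "[^":
            if buf:
                yield ("blank", "".join(buf))
            buf = []
            kind = "superblank" if ch == "[" else "lu"
        elif kind == "superblank" and ch == "]":
            # Everything inside a superblank is passed through untouched.
            yield ("superblank", "".join(buf))
            buf = []
            kind = "blank"
        elif kind == "lu" and ch == "$":
            yield ("lu", "".join(buf))
            buf = []
            kind = "blank"
        else:
            buf.append(ch)
        i += 1
    if kind != "blank":
        raise ValueError("unterminated " + kind + " at end of input")
    if buf:
        yield ("blank", "".join(buf))


def split_lu(content: str) -> List[str]:
    """Split a lexical unit into surface form and analyses on unescaped '/'."""
    parts: List[str] = []
    buf: List[str] = []
    i = 0
    while i < len(content):
        ch = content[i]
        if ch == "\\" and i + 1 < len(content):
            buf.append(content[i:i + 2])
            i += 2
            continue
        if ch == "/":
            parts.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
        i += 1
    parts.append("".join(buf))
    return parts


# Invented sample input, not real analyser output.
chunks = list(parse_stream("[<b>]^a/b<n>$ ^c/d<adj>$[</b>]"))
```

Keeping the superblank state separate from the lexical-unit state is what lets `^`, `$` and `/` appear inside formatting without being misread as unit boundaries.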