Ideas for Google Summer of Code/Complex multiwords
Latest revision as of 23:56, 5 April 2013
Write a bidirectional module for specifying complex multiword units, for example dirección general and zračna luka. In the Romance languages this is not a big problem, but in languages with case (e.g. Serbo-Croatian, Slovenian, Russian, German, Icelandic) you cannot simply define a multiword of the form adjective + noun, because the adjective has many inflected forms.
The module should be bidirectional: it should be usable both for analysing and for generating these multiwords. For example, take the following sentence and its English translation:
- Prijem ambulantnih pacijenata se obavlja na prijemnom šalteru u prizemlju zgrade za dijagnostički imidžing.
- Outpatient admission takes place at the reception desk on the ground floor of the diagnostic imaging building.
The default translation of "ambulantni" would be "ambulatory", but in this case we want to translate "ambulantnih pacijenata" as "outpatients".
Tasks
- Work out a way to do unification in finite-state transducers, perhaps using some kind of flags.
- Make a program that can both analyse and generate multiwords, ensuring that they agree for given morphological features.
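The agreement constraint in the tasks above can be illustrated with a small sketch. This is plain dictionary unification in Python, not a finite-state implementation, and it is not how the module is specified to work. The entry layout (`lemmas`, `agree`, `target`) and the feature names and values (`case`, `number`, `gen`, `pl`) are assumptions made up for illustration.

```python
def unify(a, b):
    """Merge two feature dicts; return None if a shared feature conflicts."""
    merged = dict(a)
    for key, value in b.items():
        if merged.setdefault(key, value) != value:
            return None
    return merged


# Hypothetical multiword entry: member lemmas plus the features that must agree.
ENTRY = {
    "lemmas": ["ambulantan", "pacijent"],
    "agree": ["case", "number"],
    "target": "outpatient",
}


def analyse(entry, words):
    """words is a list of (lemma, features) pairs.

    Return the shared agreement features if the words match the entry,
    otherwise None.
    """
    if [lemma for lemma, _ in words] != entry["lemmas"]:
        return None
    shared = {}
    for _, feats in words:
        # Only the features named in "agree" take part in unification.
        restricted = {k: feats[k] for k in entry["agree"] if k in feats}
        shared = unify(shared, restricted)
        if shared is None:
            return None
    return shared


def generate(entry, feats):
    """Inverse direction: give every member the same agreement features."""
    shared = {k: feats[k] for k in entry["agree"] if k in feats}
    return [(lemma, dict(shared)) for lemma in entry["lemmas"]]
```

Because `analyse` and `generate` read the same entry, a single definition serves both directions. A finite-state version would have to encode the same constraint in the transducer itself, for instance with the flags mentioned above.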
Coding challenge
- Extract a set of test sentences from a corpus for an existing language pair into which you would like to incorporate the module.
- Write a stream processor (see Apertium stream format) for the output of lt-proc that parses character by character, respecting superblanks.
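A possible starting point for the stream processor is sketched below. It assumes the usual Apertium stream conventions: lexical units are written as `^surface/analysis/...$`, superblanks are enclosed in `[` and `]`, and a backslash escapes the following character. The function names `parse_stream` and `split_unescaped` and the tags in the usage example are hypothetical, not taken from any existing tool or dictionary.

```python
def split_unescaped(text, sep):
    """Split text on sep, ignoring separators preceded by a backslash."""
    parts, current, escaped = [], [], False
    for ch in text:
        if escaped:
            current.append(ch)
            escaped = False
        elif ch == "\\":
            current.append(ch)
            escaped = True
        elif ch == sep:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_stream(text):
    """Yield ("blank", text) and ("lu", surface, analyses) items in order."""
    blank, unit = [], []
    in_unit = in_superblank = escaped = False
    for ch in text:
        if escaped:
            # The character after a backslash is always taken literally.
            (unit if in_unit else blank).append(ch)
            escaped = False
        elif ch == "\\":
            (unit if in_unit else blank).append(ch)
            escaped = True
        elif in_superblank:
            # Inside [ ... ] nothing is interpreted, not even ^ or $.
            blank.append(ch)
            if ch == "]":
                in_superblank = False
        elif in_unit:
            if ch == "$":
                surface, *analyses = split_unescaped("".join(unit), "/")
                yield ("lu", surface, analyses)
                unit, in_unit = [], False
            else:
                unit.append(ch)
        elif ch == "[":
            blank.append(ch)
            in_superblank = True
        elif ch == "^":
            if blank:
                yield ("blank", "".join(blank))
                blank = []
            in_unit = True
        else:
            blank.append(ch)
    if blank:
        yield ("blank", "".join(blank))
```

A usage example with made-up tags:

```python
line = "[<b>]^ambulantnih/ambulantan<adj><pl><gen>$ ^pacijenata/pacijent<n><m><pl><gen>$[</b>]"
for item in parse_stream(line):
    print(item)
```

Superblanks are kept verbatim as blank items, so a processor built on this can write them back unchanged around whatever it does to the lexical units.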
Frequently asked questions
- none yet, ask us something! :)