Difference between revisions of "Ideas for Google Summer of Code/Complex multiwords"
		
		
		
		
		
		
|  (Created page with '{{TOCD}}  ==Coding challenge==  ==Frequently asked questions==  ==Previous GSOC projects==  Complex multiwords') | |||
| Line 1: | Line 1: | ||
| {{TOCD}} | {{TOCD}} | ||
| Write a bidirectional module for specifying complex multiword units, for example ''dirección general'' and ''zračna luka''. In the Romance languages this is not a big problem, but in languages with case inflection (e.g. Serbo-Croatian, Slovenian, German) a multiword of the form adjective + noun cannot simply be listed as a fixed unit, because the adjective has many inflected forms. | |||
| The module should be bidirectional: it should be usable both for analysing and for generating these multiwords. | |||
| ==Coding challenge== | ==Coding challenge== | ||
| * Write a stream processor (see [[Apertium stream format]]) for the output of apertium-tagger -p -g that parses character by character, respecting [[superblanks]].  | |||
| ==Frequently asked questions== | ==Frequently asked questions== | ||
| ==Previous GSOC projects== | ==Previous GSOC projects== | ||
| ==See also== | |||
| * [[Multiwords]] | |||
| [[Category:Ideas for Google Summer of Code|Complex multiwords]] | [[Category:Ideas for Google Summer of Code|Complex multiwords]] | ||
Revision as of 15:39, 4 March 2012
Write a bidirectional module for specifying complex multiword units, for example dirección general and zračna luka. In the Romance languages this is not a big problem, but in languages with case inflection (e.g. Serbo-Croatian, Slovenian, German) a multiword of the form adjective + noun cannot simply be listed as a fixed unit, because the adjective has many inflected forms.
The module should be bidirectional: it should be usable both for analysing and for generating these multiwords.
Coding challenge
- Write a stream processor (see Apertium stream format) for the output of apertium-tagger -p -g that parses character by character, respecting superblanks.
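
As a starting point for the challenge, here is a minimal sketch in Python of a character-by-character stream reader. It assumes the usual Apertium stream conventions: lexical units are delimited by ^ and $, superblanks by [ and ], and a backslash escapes the following character. The function name parse_stream, the token kinds, and the sample tags are hypothetical choices made for illustration and are not part of any Apertium tool.

```python
def parse_stream(text):
    """Yield (kind, content) pairs from an Apertium-style stream.

    kind is "blank", "superblank" or "lu" (lexical unit).
    Assumed conventions: ^...$ wraps a lexical unit, [...] wraps a
    superblank, and a backslash escapes the next character.
    """
    state = "blank"
    buf = []
    escaped = False
    for ch in text:
        if escaped:
            # The previous character was a backslash: copy this one verbatim.
            buf.append(ch)
            escaped = False
            continue
        if ch == "\\":
            # Keep the backslash in the output and protect the next character.
            buf.append(ch)
            escaped = True
            continue
        if state == "blank":
            if ch in "[^":
                if buf:
                    yield ("blank", "".join(buf))
                buf = []
                state = "superblank" if ch == "[" else "lu"
            else:
                buf.append(ch)
        elif state == "superblank":
            if ch == "]":
                yield ("superblank", "".join(buf))
                buf = []
                state = "blank"
            else:
                # Inside a superblank, ^ and $ are ordinary characters.
                buf.append(ch)
        else:  # state == "lu"
            if ch == "$":
                yield ("lu", "".join(buf))
                buf = []
                state = "blank"
            else:
                buf.append(ch)
    if buf:
        # Trailing text, or an unterminated unit, is emitted as it stands.
        yield (state, "".join(buf))


if __name__ == "__main__":
    # Made-up sample line; the lemma and tags are illustrative only.
    sample = "[<b>]^zračna/zračan<adj>$ ^luka/luka<n>$[</b>]"
    for kind, content in parse_stream(sample):
        print(kind, repr(content))
```

Because the reader tracks its state explicitly, formatting carried in superblanks passes through untouched and is never mistaken for a lexical unit. A multiword module could sit on top of this loop and buffer consecutive "lu" tokens before deciding whether they form a unit.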

