Difference between revisions of "Compounds"

From Apertium


* Koehn, P. and Knight, K. (2003) "[http://www.iccs.inf.ed.ac.uk/~pkoehn/publications/compound2003.pdf Empirical Methods for Compound Splitting]". ''11th Conference of the European Chapter of the Association for Computational Linguistics'' (EACL 2003).
* Brown, R. (2002) "[http://www.eamt.org/archive/tmi2002/conference/02_brown.pdf Corpus-Driven Splitting of Compound Words]". ''TMI 2002''.

Revision as of 09:45, 10 June 2007

Some languages (within Indo-European, particularly the Germanic languages) readily form long compound words. Many of these compounds are rare, so they are unlikely to be found in dictionaries.

  • Afrikaans: foutboodskap, fout+boodskap ("error message"), (cf. groeteboodskap, "greeting message")
  • Dutch: hulppagina, hulp+pagina ("help page"); woordbetekenis, woord+betekenis ("meaning of a word")
  • German: Kontaktlinsenverträglichkeitstest, Kontakt+linsen+verträglichkeits+test ("contact-lens compatibility test")

One possible approach would be to try to split unknown compound words into their constituent parts, each of which is more likely to be found in the dictionary.
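As a rough illustration of the idea, here is a minimal sketch of dictionary-based splitting in Python. It is not Apertium code and not the method of the papers listed under Further reading. The function name `split_compound`, the toy `LEXICON`, the `LINKERS` tuple (an optional linking "s" between parts) and the minimum part length are all assumptions made for the example. The sketch looks for a sequence of known words that covers the whole input and prefers the split with the fewest parts.

```python
from functools import lru_cache

# Hypothetical toy lexicon; a real system would consult the analyser's dictionary.
LEXICON = {
    "kontakt", "linsen", "verträglichkeit", "test",
    "fout", "boodskap", "hulp", "pagina", "woord", "betekenis",
}
# Assumed linking elements that may follow a non-final part ("" means none).
LINKERS = ("", "s")
# Assumed minimum length of a part, to avoid spurious very short matches.
MIN_PART = 3


def split_compound(word, lexicon=LEXICON):
    """Return a list of parts covering `word`, or None if no full split is found."""
    word = word.lower()

    @lru_cache(maxsize=None)
    def solve(start):
        # An empty remainder is covered by zero parts.
        if start == len(word):
            return ()
        best = None
        for end in range(start + MIN_PART, len(word) + 1):
            part = word[start:end]
            if part not in lexicon:
                continue
            for linker in LINKERS:
                if not word.startswith(linker, end):
                    continue
                nxt = end + len(linker)
                # A linking element is only allowed when another part follows.
                if linker and nxt >= len(word):
                    continue
                rest = solve(nxt)
                if rest is None:
                    continue
                candidate = (part + linker,) + rest
                # Prefer the analysis with the fewest parts.
                if best is None or len(candidate) < len(best):
                    best = candidate
        return best

    result = solve(0)
    return list(result) if result else None


print(split_compound("Kontaktlinsenverträglichkeitstest"))
print(split_compound("foutboodskap"))
```

A word that is itself in the lexicon comes back as a single part, and a word that cannot be fully covered yields `None`. Choosing between competing splits by part count alone is a simplification; a corpus-frequency criterion would be one alternative.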

Further reading