From Apertium
Revision as of 08:27, 24 July 2007 by Francis Tyers (talk | contribs)

Some languages (among the Indo-European languages, particularly the Germanic ones) readily form long compound words. Any individual compound tends to have a low frequency, so it is unlikely to be found in dictionaries.

  • Afrikaans: foutboodskap, fout+boodskap ("error message"); cf. groeteboodskap ("greeting message")
  • Dutch: hulppagina, hulp+pagina ("help page"); woordbetekenis, woord+betekenis ("meaning of a word")
  • German: Kontaktlinsenverträglichkeitstest, Kontakt+linsen+verträglichkeits+test ("contact-lens compatibility test")

Perhaps unknown words could be resolved by attempting to split them into constituent parts that are already in the dictionary.
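One way such a method might work is a recursive, longest-prefix-first split against a list of known words. The following is only a minimal sketch under stated assumptions, not an existing Apertium component: the function name split_compound, the toy LEXICON and the list of linking elements in LINKERS are all hypothetical. A real system would look words up in the monolingual dictionary instead.

```python
from typing import List, Optional, Set

# Hypothetical toy lexicon; a real system would query the dictionary instead.
LEXICON: Set[str] = {
    "kontakt", "linsen", "verträglichkeit", "test",
    "fout", "boodskap", "hulp", "pagina",
}

# Assumed linking elements that may follow a non-final part
# ("" means no linker; "s" covers e.g. German "verträglichkeit" + "s").
LINKERS = ("", "s")


def split_compound(word: str,
                   lexicon: Set[str] = LEXICON,
                   min_len: int = 3) -> Optional[List[str]]:
    """Return the parts of `word` if it can be fully covered by lexicon
    entries (plus optional linkers), otherwise None."""
    word = word.lower()
    if word in lexicon:
        return [word]
    # Try the longest candidate prefix first; each part must have at
    # least `min_len` characters.
    for i in range(len(word) - min_len, min_len - 1, -1):
        head = word[:i]
        for linker in LINKERS:
            if linker and not head.endswith(linker):
                continue
            stem = head[:len(head) - len(linker)] if linker else head
            if stem in lexicon:
                rest = split_compound(word[i:], lexicon, min_len)
                if rest is not None:
                    # Keep the linker attached to the part it follows.
                    return [head] + rest
    return None


print(split_compound("Kontaktlinsenverträglichkeitstest"))
print(split_compound("foutboodskap"))
```

The sketch returns the first full split it finds and ignores ambiguity. Ranking competing splits, and deciding which linking elements each language allows, is left open.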

Outstanding questions

  • Where would compound processing go in the pipeline? Presumably after initial analysis, e.g. between lt-proc and apertium-tagger.
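To make the placement question concrete, here is a rough sketch of a filter that could sit between lt-proc and apertium-tagger. It relies on unknown words appearing in the analyser output as ^surface/*surface$. Everything else is an assumption made for illustration: the function name annotate_unknowns, the splitter callback, the sample stream, and above all the output representation (parts joined with "+"), which is not an established convention. The pattern also ignores escaped characters in the stream.

```python
import re
from typing import Callable, List, Optional

# Matches an unknown lexical unit of the form ^surface/*surface$.
# Simplification: escaped characters in the stream are not handled.
UNKNOWN = re.compile(r"\^([^/^$]+)/\*([^/^$]+)\$")


def annotate_unknowns(stream: str,
                      splitter: Callable[[str], Optional[List[str]]]) -> str:
    """Rewrite unknown units that `splitter` can break into two or more
    parts; leave every other unit untouched."""
    def repl(match: "re.Match[str]") -> str:
        surface = match.group(1)
        parts = splitter(surface)
        if not parts or len(parts) < 2:
            return match.group(0)  # still unknown: pass through unchanged
        # Hypothetical output format, chosen only for this illustration.
        return "^" + surface + "/" + "+".join(parts) + "$"

    return UNKNOWN.sub(repl, stream)


# Hypothetical stand-in for a real compound splitter.
toy_splitter = {"foutboodskap": ["fout", "boodskap"]}.get

# Made-up sample of analyser output.
sample = "^die/die<det>$ ^foutboodskap/*foutboodskap$"
print(annotate_unknowns(sample, toy_splitter))
```

Known words and unsplittable unknown words pass through unchanged, so a stage like this would only affect tokens the analyser could not handle.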

Further reading