User talk:Irene/proposal

Revision as of 00:46, 28 May 2017 by Irene (talk | contribs) (idea)

stage 1: MORPHOLOGICAL ANALYSIS

  1. Re-examine every multiword to determine whether it can be discontinuous, e.g. "call (something) off", "cheer (someone) up"
    • finding the multiwords can be done with a language-independent search for the words carrying the multiword marker, but I think determining whether a specific word can be discontinuous has to be done by hand.
    • this differs from what I originally proposed.
  2. If a word is separable, then tag it as such (introduce a new tag symbol).
    • the tag should contain information about which categories of words (np, vp) can split it. This will be useful when it comes to chunking, and it removes the need for some of the hacks that we're currently using
    • maybe this calls for creating a section in the paradigm definitions, since many follow the same pattern: "call (something) off", "cheer (someone) up", "take (it) out" are all verb-np-preposition
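
To make the tagging idea concrete, here is a minimal Python sketch of what a separable entry could carry. Everything in it is an assumption made for illustration: the `<sep:...>` tag format, the `SPLIT_PATTERNS` table (standing in for a shared paradigm section) and the class name are hypothetical, not existing Apertium symbols or APIs.

```python
from dataclasses import dataclass

# Hypothetical pattern table, playing the role of a paradigm section:
# each pattern names the chunk categories allowed to split the multiword.
SPLIT_PATTERNS = {
    "verb-np-prep": frozenset({"np"}),
}


@dataclass(frozen=True)
class SeparableMultiword:
    head: str      # first part, e.g. "call"
    particle: str  # second part, e.g. "off"
    pattern: str   # key into SPLIT_PATTERNS

    def splitters(self):
        """Categories that may appear between head and particle."""
        return SPLIT_PATTERNS[self.pattern]

    def tag(self):
        # Hypothetical tag format, not an existing Apertium symbol.
        return "<sep:" + ",".join(sorted(self.splitters())) + ">"


# The three examples from the list above share one pattern.
LEXICON = [
    SeparableMultiword("call", "off", "verb-np-prep"),
    SeparableMultiword("cheer", "up", "verb-np-prep"),
    SeparableMultiword("take", "out", "verb-np-prep"),
]
```

Keeping the splitter categories in one shared table means the by-hand work is reduced to assigning each multiword a pattern, rather than listing the categories per entry.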

stage 2: CHUNKING

  1. if the appropriate "chunk" is sandwiched between the two parts of a separable word, then reorder the sentence accordingly
    • inter-chunk stage
    • maybe this could be done with a grep
    • check for false positives: "take the thing out of the box" does not use the multiword "take out", whereas "take out the trash" does
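
The reordering step could look roughly like the Python sketch below. It is only an illustration under stated assumptions: chunks are modelled as (category, text) pairs with the verb given as a lemma, the `SEPARABLE` table and the `reorder` function are hypothetical names, and the false-positive check is a deliberately crude heuristic (a particle directly followed by "of" is treated as part of a complex preposition).

```python
# Hypothetical lexicon: (verb, particle) -> chunk categories allowed in between.
SEPARABLE = {
    ("take", "out"): {"np"},
    ("call", "off"): {"np"},
    ("cheer", "up"): {"np"},
}


def reorder(chunks):
    """Return a new chunk list in which a separable multiword that is split
    by an allowed chunk is made contiguous again.

    chunks is a list of (category, text) pairs.
    """
    out = []
    i = 0
    while i < len(chunks):
        if i + 2 < len(chunks):
            (c0, verb), (c1, mid), (_, part) = chunks[i:i + 3]
            allowed = SEPARABLE.get((verb, part))
            following = chunks[i + 3][1] if i + 3 < len(chunks) else None
            # Crude false-positive guard: "out of ..." is not "take out".
            if (c0 == "vp" and allowed is not None and c1 in allowed
                    and following != "of"):
                out.append(("vp", verb + " " + part))
                out.append((c1, mid))
                i += 3
                continue
        out.append(chunks[i])
        i += 1
    return out
```

With these assumptions, the chunks for "take the trash out" are rewritten so that "take out" comes first, while the chunks for "take the thing out of the box" are returned unchanged.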

Links

Multiwords
Talk:Multiwords
Talk:Ideas for Google Summer of Code/Discontiguous multiwords