User talk:Irene/proposal
stage 1: MORPHOLOGICAL ANALYSIS
- Re-examine every multiword to determine whether it can be discontinuous, e.g. call (something) off, cheer (someone) up
- parsing for multiwords can be done with a language-independent search for the words marked with , but I think determining whether or not a specific word can be discontinuous has to be done by hand.
- This differs from what I originally proposed.
- If a word is separable, tag it as such (introduce a new tag symbol).
- The tag should record which categories of words (np, vp) can split the multiword. This will be useful when it comes to chunking, and it removes the need for some of the hacks that we're currently using.
- Maybe this calls for a new section in the paradigm definitions, since many multiwords follow the same pattern: call (something) off, cheer (someone) up, and take (it) out are all verb-np-preposition.
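As a rough illustration of the tagging idea above, here is a minimal Python sketch. It is not the actual dictionary format: the names SeparableMultiword, VERB_NP_PREP, LEXICON and can_split are all hypothetical, and the entries are hand-written to reflect the point that separability has to be decided manually. The shared VERB_NP_PREP set stands in for a paradigm-like pattern that several entries reuse.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeparableMultiword:
    """Hypothetical record for a multiword that may be discontinuous."""
    head: str             # first part, e.g. "call"
    particle: str         # second part, e.g. "off"
    splitters: frozenset  # chunk categories allowed between the two parts


# Hypothetical shared pattern (verb-np-preposition), reused like a paradigm.
VERB_NP_PREP = frozenset({"np"})

# Hand-curated entries; nothing here is derived automatically.
LEXICON = {
    ("call", "off"): SeparableMultiword("call", "off", VERB_NP_PREP),
    ("cheer", "up"): SeparableMultiword("cheer", "up", VERB_NP_PREP),
    ("take", "out"): SeparableMultiword("take", "out", VERB_NP_PREP),
}


def can_split(head: str, particle: str, category: str) -> bool:
    """Return True if a chunk of `category` may sit between head and particle."""
    entry = LEXICON.get((head, particle))
    return entry is not None and category in entry.splitters
```

With these assumed entries, can_split("call", "off", "np") is true, while a vp in the same position, or an unlisted pair, is rejected.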
stage 2: CHUNKING
- If the appropriate "chunk" is sandwiched between the two parts of a separable word, reorder the sentence accordingly.
- inter-chunk stage
- maybe this could be done with a grep
- Check for false positives: take the thing out of the box does not use the multiword take out, as in take out the trash.
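The reordering step and the false-positive check could be sketched roughly as follows. This is only an illustration under several assumptions: chunks are modelled as (category, text) pairs, the SEPARABLE table and the rejoin function are hypothetical names, and the BLOCKING_NEXT rule (skip the match when the particle is followed by "of") is a crude guess at how the take ... out of case might be filtered, not a tested heuristic.

```python
# Hypothetical table: (head, particle) -> chunk categories allowed in between.
SEPARABLE = {
    ("call", "off"): {"np"},
    ("cheer", "up"): {"np"},
    ("take", "out"): {"np"},
}

# Illustrative guess only: if the particle is followed by one of these words,
# it probably belongs to a larger preposition ("out of"), not to the multiword.
BLOCKING_NEXT = {"of"}


def rejoin(chunks):
    """Move the sandwiched chunk after the multiword: head X particle -> head particle X.

    `chunks` is a list of (category, text) pairs; a new list is returned.
    """
    out = []
    i = 0
    while i < len(chunks):
        if i + 2 < len(chunks):
            (_, head), (mid_cat, middle), (_, particle) = chunks[i:i + 3]
            allowed = SEPARABLE.get((head, particle))
            following = chunks[i + 3][1] if i + 3 < len(chunks) else None
            if allowed and mid_cat in allowed and following not in BLOCKING_NEXT:
                out.append(("vp", f"{head} {particle}"))  # contiguous multiword
                out.append((mid_cat, middle))             # chunk that was in the middle
                i += 3
                continue
        out.append(chunks[i])
        i += 1
    return out
```

Under these assumptions, take / the trash / out is reordered so that take out is contiguous, while take / the thing / out / of / the box is left untouched because of follows the particle.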
Links
Multiwords
Talk:Multiwords
Talk:Ideas for Google Summer of Code/Discontiguous multiwords