Difference between revisions of "User talk:Irene/proposal"

From Apertium
(idea)
 
(→‎updated proposal: new section)
 

Latest revision as of 04:07, 30 May 2017

stage 1: MORPHOLOGICAL ANALYSIS

  1. Re-examine every multiword to decide whether it can be discontinuous, e.g. "call (something) off", "cheer (someone) up".
    • Finding the multiwords can be done with a language-independent search for the words that carry the multiword marker, but I think deciding whether a specific word can be discontinuous has to be done by hand.
    • This differs from what I originally proposed.
  2. If a word is separable, tag it as such (introduce a new tag symbol).
    • The tag should say which categories of words (np, vp) can split the word. This will be useful when it comes to chunking, and it removes the need for some of the hacks we currently use.
    • Maybe this calls for a new section in the paradigm definitions, since many words follow the same pattern: "call (something) off", "cheer (someone) up" and "take (it) out" are all verb-np-preposition.
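
A minimal sketch of the tagging idea in stage 1, written in Python only for illustration. The `SEPARABLE` table stands in for the hand-made list of separable multiwords, and the `<sep_np>` tag name is made up here; neither is an existing Apertium symbol or file format.

```python
# Hypothetical hand-curated table: multiword lemma -> categories that may split it.
SEPARABLE = {
    "call off": {"np"},
    "cheer up": {"np"},
    "take out": {"np"},
}


def separability_tag(lemma):
    """Return a tag such as '<sep_np>' for a separable multiword, '' otherwise."""
    splitters = SEPARABLE.get(lemma)
    if not splitters:
        return ""
    # The tag name is an assumption; it only encodes which categories can split the word.
    return "<sep_" + "_".join(sorted(splitters)) + ">"


def tag_entry(lemma, tags):
    """Append the separability tag (if any) to an analysis like 'take out<vblex>'."""
    return lemma + tags + separability_tag(lemma)
```

For example, `tag_entry("take out", "<vblex>")` gives `take out<vblex><sep_np>`, while a multiword that is not in the table keeps its analysis unchanged. Keeping the splitter categories inside the tag means the chunking stage would not need its own copy of the list.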

stage 2: CHUNKING

  1. If the appropriate "chunk" is sandwiched between the two parts of a separable word, reorder the sentence accordingly.
    • This would happen at the inter-chunk stage.
    • Maybe this could be done with a grep.
    • Check for false positives: "take the thing out of the box" does not use "take out" the way "take out the trash" does.
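
The reordering in stage 2 could be sketched roughly as below, again in Python for illustration only. Chunks are modelled as `(category, text)` pairs, which is a simplification and not the real chunk format. The guard that skips a particle followed by "of" is just one possible heuristic for the "out of the box" false positive, not a settled rule.

```python
def reorder(chunks, verb, particle, splitters=frozenset({"np"})):
    """Move `particle` next to `verb` when one splitter chunk sits between them.

    chunks: list of (category, text) pairs (a simplified stand-in for real chunks).
    Returns a new list; the input is not modified.
    """
    out = list(chunks)
    for i in range(len(out) - 2):
        (_, first), (mid_cat, _), (_, last) = out[i], out[i + 1], out[i + 2]
        if first == verb and mid_cat in splitters and last == particle:
            # Assumed false-positive guard: "out of ..." is a prepositional
            # phrase, so the particle does not belong to the verb.
            following = out[i + 3][1] if i + 3 < len(out) else None
            if following == "of":
                continue
            # Swap the splitter chunk and the particle: take X out -> take out X.
            out[i + 1], out[i + 2] = out[i + 2], out[i + 1]
    return out
```

With this sketch, `take | the trash | out` becomes `take | out | the trash`, and `take | the thing | out | of | the box` is left alone.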

Links

Multiwords
Talk:Multiwords
Talk:Ideas for Google Summer of Code/Discontiguous multiwords

updated proposal

  1. tagging
    • Tag every discontinuous word with what can split it (e.g. "take out" can be split by an np/sn).
    • Where should the tagging be inserted?
  2. transfer stage
    • sequences that may contain a discontinuous word
  3. pseudo:
    • If one of the sequences is encountered, look at the tags to see whether it could be an instance of a discontinuous word.
    • If it could, check whether it is a real discontinuous word (see the "Coding challenge" section of Ideas for Google Summer of Code/Discontiguous multiwords).
    • If it is real, do the reordering; if not, do nothing.
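
The three "pseudo" steps could look something like the following Python sketch. Everything named here is an assumption made for illustration: the `SEQUENCES` patterns, the `KNOWN` inventory used as the "real discontinuous word" check, the `sep_np` tag, and the `(lemma, category, tags)` token shape.

```python
# Hypothetical category sequences that may hide a discontinuous word.
SEQUENCES = {("vblex", "np", "adv"), ("vblex", "np", "pr")}

# Hypothetical inventory of real discontinuous words: (verb lemma, particle lemma).
KNOWN = {("take", "out"), ("call", "off"), ("cheer", "up")}


def process(tokens):
    """tokens: list of (lemma, category, tags) triples, where tags is a set.

    Returns a new list with confirmed discontinuous words made contiguous.
    """
    out = list(tokens)
    i = 0
    while i + 2 < len(out):
        verb, mid, part = out[i], out[i + 1], out[i + 2]
        is_sequence = (verb[1], mid[1], part[1]) in SEQUENCES  # step 1: sequence seen
        tag_allows = ("sep_" + mid[1]) in verb[2]              # step 2: look into tags
        is_real = (verb[0], part[0]) in KNOWN                  # step 3: real word?
        if is_sequence and tag_allows and is_real:
            out[i + 1], out[i + 2] = part, mid  # reorder: verb particle object
            i += 3
        else:
            i += 1  # not a discontinuous word: do nothing
    return out
```

A token list for "take trash out" whose verb carries `sep_np` comes back as "take out trash"; if the verb lacks the tag, or the verb and particle pair is not in `KNOWN`, the list is returned in its original order.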