Task ideas for Google Code-in/Manually disambiguate text
Words can have more than one possible interpretation. For example, the English word "tie" can be a noun denoting an item of clothing ("she put on her tie") or a verb ("they tie a knot"). We call this ambiguity: the word "tie" is ambiguous between a noun and a verb. When humans read a text, they automatically choose the appropriate interpretation from the context. Ambiguity can lead to jokes or misunderstandings when the intended meaning is the less frequent one. For example: "How do you make a turtle fast?" "Take away her food." This is funny because "fast" can mean either "quick" or "go without food".
Objective
The objective of this task is to take some text of your choosing and manually disambiguate it. That is, for each ambiguous word you choose the appropriate interpretation in context.
Note that your chosen text should be openly licensed so that copyright isn't an issue. Suitable text can be obtained from places such as Wikipedia and Project Gutenberg. Some common licenses you may find text under are CC BY, CC BY-SA, CC BY-NC, CC BY-NC-SA, the GFDL, and the public domain (including public-domain dedications such as CC0 and the Unlicense).
How to do it
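First, pass your text through your chosen language's morphological analyser. This outputs every analysis the analyser knows for each word, in Apertium stream format, which looks like a jumble of words and tags, but don't worry! The next step is to clean that up.
Next, pass that output through cg-conv, which converts it into Constraint Grammar (CG) format, with each word form followed by one possible analysis per line. Then you can start disambiguating.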
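The full command for processing your text is
cat texts/file.txt | apertium -d . xxx-morph | cg-conv -a > texts/formatted_file.txt
where xxx should be replaced with the corresponding language code (e.g. eng) and file.txt with your text file. The -d . option tells apertium to look for the xxx-morph mode in the current directory, so run the command from inside your compiled language directory. The -a option tells cg-conv that its input is in Apertium stream format.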
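To make the two stages of the pipeline more concrete, here is a simplified sketch, in shell and awk, of the conversion that cg-conv -a performs: each lexical unit in the analyser's stream format (^surface/analysis1/analysis2$) becomes a "<surface>" line followed by one tab-indented reading per analysis. The function name stream_to_cg is hypothetical and the sample input is hand-written rather than real analyser output. The sketch ignores escaped characters, formatting blanks and unknown-word marks, so use the real cg-conv for your task.

```shell
#!/bin/sh
# Simplified sketch of Apertium stream format -> CG format, roughly what
# `cg-conv -a` does. stream_to_cg is a hypothetical name. Not handled:
# escaped characters (e.g. \/), formatting blanks, unknown-word marks (*).
stream_to_cg() {
  # Put each lexical unit (which ends with "$") on its own line, then parse it.
  tr '$' '\n' | awk '
    {
      start = index($0, "^")            # a lexical unit starts with "^"
      if (start == 0) next              # line holds only blanks or text
      n = split(substr($0, start + 1), parts, "/")
      printf "\"<%s>\"\n", parts[1]     # cohort line: the surface form
      for (k = 2; k <= n; k++) {        # one reading per analysis
        a = parts[k]
        lt = index(a, "<")
        if (lt == 0) { lemma = a; tags = "" }
        else { lemma = substr(a, 1, lt - 1); tags = substr(a, lt) }
        gsub(/></, " ", tags)           # <adv><itg> -> <adv itg>
        gsub(/[<>]/, "", tags)          # <adv itg>  -> adv itg
        printf "\t\"%s\"%s%s\n", lemma, (tags == "" ? "" : " "), tags
      }
    }
  '
}
```

For example, printf '^How/how<preadv>/how<adv><itg>$\n' | stream_to_cg prints a "<How>" cohort followed by the readings "how" preadv and "how" adv itg.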
A GCI student made a video tutorial on this task in 2018, which may help if you're still confused.
Example
Take the example sentence "How do you make a turtle fast?" The command above gives you morphologically annotated text in the following format: each word form (e.g. "<fast>") is followed by one line per possible reading. Your task is to add a ; at the start of each reading line that is not an appropriate interpretation in context.
Input | Output |
---|---|
"<How>" | "<How>" |
"how" preadv | ; "how" preadv |
"how" adv itg | "how" adv itg |
"<do>" | "<do>" |
"do" vaux inf | ; "do" vaux inf |
"do" vaux pres | "do" vaux pres |
"<you>" | "<you>" |
"prpers" prn subj p2 mf sp | "prpers" prn subj p2 mf sp |
"prpers" prn obj p2 mf sp | ; "prpers" prn obj p2 mf sp |
"<make>" | "<make>" |
"make" n sg | ; "make" n sg |
"make" vblex inf | "make" vblex inf |
"make" vblex pres | ; "make" vblex pres |
"make" vblex imp | ; "make" vblex imp |
"<a>" | "<a>" |
"a" det ind sg | "a" det ind sg |
"<turtle>" | "<turtle>" |
"turtle" n sg | "turtle" n sg |
"<fast>" | "<fast>" |
"fast" adv | ; "fast" adv |
"fast" adj sint | ; "fast" adj sint |
"fast" n sg | ; "fast" n sg |
"fast" vblex inf | "fast" vblex inf |
"fast" vblex pres | ; "fast" vblex pres |
"fast" vblex imp | ; "fast" vblex imp |
"<?>" | "<?>" |
"?" sent | "?" sent |
Note that "fast" keeps only the verb reading, since the answer "Take away her food" shows that "go without food" is the intended sense.
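Before submitting, it can help to check that you have not left a word with several readings, or commented out every reading of a word by mistake. The following is a minimal sketch of such a check; check_disambiguation is a hypothetical helper, not part of Apertium or VISL CG-3. It assumes the layout produced by cg-conv: cohort lines start with "<, readings are indented, and rejected readings start with ;. Some words may genuinely keep two readings if both are appropriate, so treat its output as a list of things to double-check rather than a list of errors.

```shell
#!/bin/sh
# Hypothetical helper (not part of Apertium or VISL CG-3): list cohorts in a
# hand-disambiguated CG file that do not have exactly one reading left.
# Assumed layout, as printed by cg-conv: cohort lines start with "<, readings
# are indented, and readings you rejected start with ";".
check_disambiguation() {
  awk '
    # Print the previous cohort if it has zero or several readings left.
    function report() {
      if (word != "" && live != 1) {
        printf "%s:%d: %s has %d readings left\n", wfile, wline, word, live
        problems++
      }
    }
    /^"</      { report(); word = $0; wfile = FILENAME; wline = FNR; live = 0; next }
    /^[ \t]+"/ { live++; next }        # indented reading, still in play
    END        { report(); exit (problems ? 1 : 0) }
  ' "$@"
}
```

Running check_disambiguation texts/formatted_file.txt prints one line per cohort to double-check (file, line number, word form and number of readings left) and exits with status 1 if there are any. Lines starting with ; match neither pattern, so rejected readings are simply not counted.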
Language-specific guidelines
For guidance on deciding which analysis or interpretation is most appropriate in a given context, please see the guidelines for your language.
More Info
A more detailed video tutorial is also available.