Task ideas for Google Code-in/Manually disambiguate text

From Apertium

Words can have more than one possible interpretation. For example, "tie" in English can be a noun denoting an item of clothing ("she put on her tie") or a verb ("they tie a knot"). We call this ambiguity: the word "tie" is ambiguous between a noun and a verb. When humans read a text, they automatically choose the appropriate interpretation from the context. Ambiguity can sometimes lead to jokes or misunderstandings when the intended meaning is the less frequent one. For example: "How do you make a turtle fast?" "Take away her food." This is funny because "fast" can mean either "quick" or "go without food".

Objective

The objective of this task is to take some text of your choosing and manually disambiguate it. That is, for each ambiguous word you choose the appropriate interpretation in context.

Note that your chosen text should be open-source so that copyright isn't an issue. Open-source text can be obtained from places such as Wikipedia and Project Gutenberg. Some notable licenses that you may find such text under are CC BY, CC BY-SA, CC BY-NC, CC BY-NC-SA and the GFDL; text in the public domain (including text released under CC0 or the Unlicense) is also fine.

How to do it

First, pass your text through your chosen language's morphological analyser. This lists every possible analysis of each word in the Apertium stream format, which looks like a jumble at first, but don't worry! The next step tidies it up.

Next, pass that output through cg-conv, which converts it into the more readable Constraint Grammar (CG) format. Once that is done, you can start disambiguating.

The full command for processing your text is

cat texts/file.txt | apertium -d . xxx-morph | cg-conv -a > texts/formatted_file.txt

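where xxx should be replaced with the corresponding language code (e.g. eng) and file.txt with the name of your text file. The -d . option tells apertium to look for the xxx-morph mode in the current directory, so run the command from inside the compiled language directory (e.g. apertium-eng).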
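Since most of the manual work lies in the words that have more than one reading, it can help to see how many there are before you start. The sketch below is a hypothetical helper (ambiguity_summary is not part of Apertium or CG-3) and assumes the simple layout shown in the example further down this page: cohort lines start with "< and each reading line starts with a tab.

```shell
#!/bin/sh
# Hypothetical helper, not part of Apertium or CG-3.
# Counts cohorts (words), readings, and ambiguous cohorts (more than one
# reading) in a file produced by cg-conv -a. Assumes cohort lines start
# with "< and that every reading line starts with a tab.
ambiguity_summary() {
  awk '
    /^"</ { cohorts++; if (n > 1) ambiguous++; n = 0; next }
    /^\t/ { n++; readings++ }
    END {
      if (n > 1) ambiguous++   # account for the last cohort in the file
      printf "cohorts: %d, readings: %d, ambiguous: %d\n", cohorts, readings, ambiguous
    }
  ' "$1"
}
```

For example, running ambiguity_summary texts/formatted_file.txt on the input half of the turtle example would report 8 cohorts, 19 readings and 5 ambiguous words.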

If you're still confused, here is a video tutorial made by a GCI student in 2018.

Example

Let's take the example "How do you make a turtle fast?". You will receive morphologically analysed text in the following format. Your task is to add a semicolon (;) at the start of each reading line that does not give the appropriate interpretation in context, so that only the appropriate reading is left unmarked for each word.

Input:
"<How>"
	"how" preadv
	"how" adv itg
"<do>"
	"do" vaux inf
	"do" vaux pres
"<you>"
	"prpers" prn subj p2 mf sp
	"prpers" prn obj p2 mf sp
"<make>"
	"make" n sg
	"make" vblex inf
	"make" vblex pres
	"make" vblex imp
"<a>"
	"a" det ind sg
"<turtle>"
	"turtle" n sg
"<fast>"
	"fast" adv
	"fast" adj sint
	"fast" n sg
	"fast" vblex inf
	"fast" vblex pres
	"fast" vblex imp
"<?>"
	"?" sent

Output:
"<How>"
;	"how" preadv
	"how" adv itg
"<do>"
;	"do" vaux inf
	"do" vaux pres
"<you>"
	"prpers" prn subj p2 mf sp
;	"prpers" prn obj p2 mf sp
"<make>"
;	"make" n sg
	"make" vblex inf
;	"make" vblex pres
;	"make" vblex imp
"<a>"
	"a" det ind sg
"<turtle>"
	"turtle" n sg
"<fast>"
;	"fast" adv
;	"fast" adj sint
;	"fast" n sg
	"fast" vblex inf
;	"fast" vblex pres
;	"fast" vblex imp
"<?>"
	"?" sent
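Before submitting, you may want to check that no word still has several unmarked readings, and that you have not marked every reading of a word by mistake. The following is a minimal sketch under the same layout assumptions as the example above (kept readings start with a tab, discarded ones with ;); check_disambiguation is a hypothetical name, not an existing Apertium or CG-3 tool.

```shell
#!/bin/sh
# Hypothetical checker, not part of Apertium or CG-3.
# Prints every cohort that does not have exactly one reading left unmarked
# (kept readings start with a tab; discarded ones start with ";").
# Returns 1 if any such cohort is found, 0 otherwise.
check_disambiguation() {
  awk '
    function report() {
      if (word != "" && kept != 1) {
        printf "line %d: %s has %d readings left\n", start, word, kept
        bad++
      }
    }
    /^"</ { report(); word = $0; start = NR; kept = 0; next }
    /^\t/ { kept++ }
    END   { report(); exit (bad > 0) }
  ' "$1"
}
```

For example, check_disambiguation texts/formatted_file.txt prints nothing and returns 0 once every word has exactly one reading left. It cannot tell whether you kept the right reading; that still depends on your judgement and the language-specific guidelines.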

Language-specific guidelines

For guidance on deciding which analysis is most appropriate in a given context, please see the language-specific guidelines below:

More info

For a more detailed tutorial, see this video.