Task ideas for Google Code-in/Manually disambiguate text

From Apertium
Revision as of 11:18, 11 December 2019 by Andiqu (talk | contribs) (Fixed a typo)

Words can have more than one possible interpretation. For example, "tie" in English can be a noun denoting an item of clothing ("she put on her tie") or a verb ("they tie a knot"). We call this ambiguity: the word "tie" is ambiguous between being a noun and a verb. When humans read a text, they automatically choose the appropriate interpretation based on the context. Ambiguity can lead to jokes or misunderstandings when the intended meaning is the less frequent one. For example: "How do you make a turtle fast?" "Take away her food." This is funny because "fast" can mean either "quick" or "to go without food".

Objective

The objective of this task is to take some text of your choosing and manually disambiguate it. That is, for each ambiguous word you choose the appropriate interpretation in context.

Note that your chosen text should be in the public domain or under a free licence so that licensing isn't an issue. Such text can be obtained from places such as Project Gutenberg (public domain texts) and Wikipedia (freely licensed under CC BY-SA).

How to do it

First, pass your text through your chosen language's morphological analyser. The raw output lists every possible analysis of every word in a compact format that is hard to read, but don't worry! The next step cleans that up.

Next, pass that output through cg-conv to tidy it up, so that each word and each of its analyses appears on a separate line. You can then start disambiguating.

The full command for processing your text is:

cat texts/file.txt | apertium -d . xxx-morph | cg-conv -a > texts/formatted_file.txt

where xxx should be replaced with the corresponding language code (e.g. eng) and file.txt with your text file.
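
To make the two steps more concrete, here is a minimal Python sketch of the kind of transformation cg-conv -a performs on the analyser's output. It is not a replacement for cg-conv. The sample input is an assumption, reconstructed from the analyses shown in the example on this page, and real analyser output may differ. The name stream_to_cg is made up for illustration, and the sketch ignores escaped characters, multiword units and any text between the words.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis/analysis$
LEXICAL_UNIT = re.compile(r"\^(.*?)\$")
# Tags are written in angle brackets after the lemma, e.g. how<adv><itg>
TAG = re.compile(r"<([^<>]+)>")


def stream_to_cg(stream: str) -> str:
    """Convert simple stream-format text into one-reading-per-line cohorts."""
    lines = []
    for unit in LEXICAL_UNIT.findall(stream):
        surface, *analyses = unit.split("/")
        # The surface form opens the cohort, e.g. "<How>"
        lines.append(f'"<{surface}>"')
        for analysis in analyses:
            lemma = analysis.split("<", 1)[0]
            tags = TAG.findall(analysis)
            # Each reading goes on its own tab-indented line
            lines.append("\t" + " ".join([f'"{lemma}"', *tags]))
    return "\n".join(lines)


# Hypothetical analyser output for the first two words of the example sentence
sample = "^How/how<preadv>/how<adv><itg>$ ^do/do<vaux><inf>/do<vaux><pres>$"
print(stream_to_cg(sample))
```

For the actual task, always use cg-conv itself; the sketch only shows why the second step makes the text easier to work with.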

If you're still confused, there is a video tutorial made by a GCI student in 2018.

Example

Let's take the example "How do you make a turtle fast?". You will receive morphologically annotated text in the following format. Your task is to add a ; at the start of each line that contains an inappropriate interpretation, leaving the appropriate interpretation unmarked.

Input

"<How>"
	"how" preadv
	"how" adv itg
"<do>"
	"do" vaux inf
	"do" vaux pres
"<you>"
	"prpers" prn subj p2 mf sp
	"prpers" prn obj p2 mf sp
"<make>"
	"make" n sg
	"make" vblex inf
	"make" vblex pres
	"make" vblex imp
"<a>"
	"a" det ind sg
"<turtle>"
	"turtle" n sg
"<fast>"
	"fast" adv
	"fast" adj sint
	"fast" n sg
	"fast" vblex inf
	"fast" vblex pres
	"fast" vblex imp
"<?>"
	"?" sent

Output

"<How>"
	"how" preadv
;	"how" adv itg
"<do>"
	"do" vaux inf
;	"do" vaux pres
"<you>"
	"prpers" prn subj p2 mf sp
;	"prpers" prn obj p2 mf sp
"<make>"
;	"make" n sg
	"make" vblex inf
;	"make" vblex pres
;	"make" vblex imp
"<a>"
	"a" det ind sg
"<turtle>"
	"turtle" n sg
"<fast>"
;	"fast" adv
;	"fast" adj sint
;	"fast" n sg
	"fast" vblex inf
;	"fast" vblex pres
;	"fast" vblex imp
"<?>"
	"?" sent

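Once you have marked a file, it can help to check that no word was overlooked. The following is a minimal Python sketch, not an official Apertium tool, that reads text in the format shown above and lists the words that do not have exactly one unmarked analysis. The function names parse_cohorts and unresolved are made up for illustration, and the rule "exactly one analysis remains per word" is an assumption based on the example; the guidelines for your language may allow exceptions.

```python
def parse_cohorts(cg_text):
    """Return a list of (word form, kept readings, discarded readings)."""
    cohorts = []
    for line in cg_text.splitlines():
        if not line.strip():
            continue
        if line.startswith('"<'):
            # A new word starts with its surface form, e.g. "<fast>"
            cohorts.append((line.strip()[2:-2], [], []))
        elif cohorts:
            reading = line.strip()
            if reading.startswith(";"):
                # A leading ; marks a reading that was judged inappropriate
                cohorts[-1][2].append(reading[1:].strip())
            else:
                cohorts[-1][1].append(reading)
    return cohorts


def unresolved(cg_text):
    """Word forms that do not have exactly one kept reading."""
    return [form for form, kept, _ in parse_cohorts(cg_text) if len(kept) != 1]


# A partly disambiguated fragment: "fast" still has two unmarked readings
sample = '''"<make>"
;\t"make" n sg
\t"make" vblex inf
;\t"make" vblex pres
;\t"make" vblex imp
"<fast>"
\t"fast" adv
\t"fast" adj sint
;\t"fast" n sg
'''
print(unresolved(sample))
```

To check your own work, read your formatted file into a string and pass it to unresolved; an empty list means every word has exactly one analysis left.
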
Language-specific guidelines

For guidelines on how to decide which analysis or interpretation is most appropriate in a given context, please see the language-specific guidelines below:

More Info

For a more detailed tutorial, see this video