Task ideas for Google Code-in/Manually disambiguate text

Latest revision as of 17:30, 11 January 2020

Words can have more than one possible interpretation. For example, "tie" in English can be a noun denoting an item of clothing ("she put on her tie") or a verb ("they tie a knot"). We call this ambiguity: the word "tie" is ambiguous between being a noun and a verb. When humans read text, they automatically choose the appropriate interpretation based on the context. Sometimes ambiguity leads to jokes or misunderstandings when the intended meaning is the less frequent one. For example: "How do you make a turtle fast?" "Take away her food." This is funny because "fast" can mean either "quick" or "to go without food".

Objective

The objective of this task is to take some text of your choosing and manually disambiguate it. That is, for each ambiguous word you choose the appropriate interpretation in context.

Note that your chosen text should be open-source so that copyright isn't an issue. Open-source text can be obtained from places such as Wikipedia and Project Gutenberg. Some notable open-source licenses that you may find text under are CC BY, CC BY-SA, CC BY-NC, CC BY-NC-SA, the GFDL, and any form of public domain (including CC0 and the Unlicense).

How to do it

First, pass your text through your chosen language's morphological analyser. The output will look like a huge jumble, because every word is followed by all of its possible analyses, but don't worry! The next step is to clean that up.

Next, pass that output through cg-conv, which puts each word form and each of its analyses on a separate line. After that, you can start disambiguating.

The full command for processing your text is:

cat texts/file.txt | apertium -d . xxx-morph | cg-conv -a > texts/formatted_file.txt

where xxx should be replaced with the corresponding language code (e.g. eng) and file.txt with your text file.
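
The same pipeline can be wrapped in a small shell function so that the language code and file names are passed as arguments. This is only a sketch: the function name analyse_text and its argument order are made up for illustration, and it assumes that apertium and cg-conv are installed and that you run it from the language's directory, as the -d . option in the command above implies.

```shell
#!/bin/sh
# Hypothetical helper (not part of Apertium): wraps the pipeline shown above.
# Usage: analyse_text LANG INPUT OUTPUT
analyse_text() {
    if [ "$#" -ne 3 ]; then
        echo "usage: analyse_text LANG INPUT OUTPUT" >&2
        return 2
    fi
    lang=$1 input=$2 output=$3
    if [ ! -r "$input" ]; then
        echo "cannot read input file: $input" >&2
        return 1
    fi
    # Same steps as the one-line command: analyse, then convert to CG format.
    apertium -d . "$lang-morph" < "$input" | cg-conv -a > "$output"
}
```

For example, analyse_text eng texts/file.txt texts/formatted_file.txt should do the same as the command above with xxx replaced by eng.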

If you're still confused, here is a video tutorial made by a GCI student in 2018.

Example

Let's take the example "How do you make a turtle fast?". You will receive morphologically annotated text in the following format. Your task is to add a ; at the start of each line that does not contain an appropriate interpretation.

Input:
"<How>"
	"how" preadv
	"how" adv itg
"<do>"
	"do" vaux inf
	"do" vaux pres
"<you>"
	"prpers" prn subj p2 mf sp
	"prpers" prn obj p2 mf sp
"<make>"
	"make" n sg
	"make" vblex inf
	"make" vblex pres
	"make" vblex imp
"<a>"
	"a" det ind sg
"<turtle>"
	"turtle" n sg
"<fast>"
	"fast" adv
	"fast" adj sint
	"fast" n sg
	"fast" vblex inf
	"fast" vblex pres
	"fast" vblex imp
"<?>"
	"?" sent

Output:
"<How>"
;	"how" preadv
	"how" adv itg
"<do>"
;	"do" vaux inf
	"do" vaux pres
"<you>"
	"prpers" prn subj p2 mf sp
;	"prpers" prn obj p2 mf sp
"<make>"
;	"make" n sg
	"make" vblex inf
;	"make" vblex pres
;	"make" vblex imp
"<a>"
	"a" det ind sg
"<turtle>"
	"turtle" n sg
"<fast>"
;	"fast" adv
;	"fast" adj sint
;	"fast" n sg
	"fast" vblex inf
;	"fast" vblex pres
;	"fast" vblex imp
"<?>"
	"?" sent
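
Once you have marked a file, it can help to check that no word form was missed. The following is a minimal sketch, assuming that exactly one reading should remain for each word form (the text above does not state this outright) and that the file uses the layout shown in the example. The function name check_disambiguation is hypothetical.

```shell
#!/bin/sh
# Hypothetical checker: lists word forms that do not have exactly one reading left.
# Usage: check_disambiguation FILE
check_disambiguation() {
    awk '
        function report() {
            if (form != "" && kept != 1)
                printf "%s: %d readings kept\n", form, kept
        }
        /^"</      { report(); form = $0; kept = 0; next }  # a new word form starts
        /^;/       { next }                                  # reading marked as removed
        /^[ \t]+"/ { kept++ }                                # reading that is still kept
        END        { report() }
    ' "$1"
}
```

Running check_disambiguation texts/formatted_file.txt prints nothing when every word form has exactly one reading left. Otherwise it names each word form that still has several readings, or none at all.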

Language-specific guidelines

For specific guidelines on how to decide which analysis or interpretation is most appropriate in a given context, please see the language-specific guidelines below:

More Info

For a more detailed tutorial, see this video