Difference between revisions of "Ideas for Google Summer of Code/Improving support for non-standard text input"

From Apertium

Revision as of 16:04, 12 February 2014

Create a module that standardises non-standard text input, such as slang and abbreviations, before it is passed on for further processing.

Some examples from English

  • Extra space: "he he" (hehe)
  • Spacing and hyphen variation: no-one, noone, no one
  • Optional hyphen: re-integrate, reintegrate
  • Missing apostrophe: "shes thinking about it" (she's thinking about it)
  • Non-standard capitalisation: "im thinking about it" (I'm thinking about it)
  • Abbreviated words: "fav" (favourite)
  • Emoticons: :)
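
To make the idea concrete, here is a minimal sketch of how the examples above could be handled with a lookup table. It is only an illustration, not the proposed design: the names `REPLACEMENTS` and `normalise` are hypothetical, the table contains only the forms listed above, and the choice of which variant counts as standard (for instance "no one" rather than "no-one") is an assumption made for the example. Emoticons are passed through unchanged.

```python
import re

# Hypothetical lookup table: non-standard form -> assumed standard form.
# The entries are taken from the examples above; which variant is treated
# as "standard" is an assumption made for this sketch.
REPLACEMENTS = {
    "he he": "hehe",
    "noone": "no one",
    "no-one": "no one",
    "re-integrate": "reintegrate",
    "shes": "she's",
    "im": "I'm",
    "fav": "favourite",
}

# Build one alternation, longest keys first, so that multi-word entries
# are tried before shorter ones. \b keeps matches on word boundaries.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(key) for key in sorted(REPLACEMENTS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def normalise(text: str) -> str:
    """Replace every known non-standard form in text with its table entry."""
    return _PATTERN.sub(lambda m: REPLACEMENTS[m.group(0).lower()], text)


if __name__ == "__main__":
    for sample in ["im thinking about it", "shes my fav :)", "noone said he he"]:
        print(sample, "->", normalise(sample))
```

A fixed table like this cannot tell when a form is ambiguous (for example "he he" as two genuine pronouns), and it does not preserve the capitalisation of the original word, so a real module would probably need context-sensitive rules or a statistical model on top of it.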

Coding challenge

Tasks

  • Do a literature review of papers on normalisation of input.

Frequently asked questions

See also