Difference between revisions of "Task ideas for Google Code-in/Categorise words from frequency list"

From Apertium

Latest revision as of 22:00, 8 December 2019

Objective

Example codes

Part-of-speech          Code
Noun                    n
Verb                    v
Adjective               a
Adverb                  r
Toponym (place name)    t
Male first name         m
Female first name       f
Last name               c

Categorize words by frequency into one of the major part-of-speech categories.

You will receive a frequency list. Work from top to bottom. At the beginning of each line, you should put a letter which categorizes the word form by its part-of-speech. For example, n for noun, v for verb, etc.

If you do not recognize a word, you can skip it. If a word can have more than one part of speech, copy the line, paste it directly below, and give the copy the other code.

If a word is misspelled, you can leave it or mark it with an x.
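
As a rough sanity check, a marked list can be scanned for letters that are not in the table above. This is only a sketch: `find_unknown_codes` is a hypothetical helper name, and it assumes that a marked line starts with a single character followed by a space or tab, and that unmarked lines start with whitespace or with `^`.

```shell
# Hypothetical helper: print "line number: line" for every marked line
# whose first character is not one of the codes n v a r t m f c (or x).
# Assumption: a marked line is one character followed by a space or tab.
find_unknown_codes() {
    awk '/^[^ \t][ \t]/ && !/^[nvartmfcx][ \t]/ { print NR ": " $0 }' "$1"
}

# Usage (file name taken from the example further down):
#   find_unknown_codes bel.hitparade
```

If the command prints nothing, every marked line uses one of the listed codes.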

Example

Consider this example of a Belarusian frequency list. The raw list is shown first, followed by the same list after part-of-speech letters have been added.

Before:

   ^4606/4606<num>$ ^былі/*былі$
   ^4493/4493<num>$ ^была/*была$
   ^4484/4484<num>$ ^Беларусі/*Беларусі$
   ^4394/4394<num>$ ^На/на<pr>$
   ^3570/3570<num>$ ^горад/*горад$
   ^3570/3570<num>$ ^але/*але$
   ^3511/3511<num>$ ^пасля/*пасля$
   ^3473/3473<num>$ ^было/*было$
   ^3381/3381<num>$ ^пры/*пры$
   ^2491/2491<num>$ ^тэрыторыі/*тэрыторыі$
   ^2470/2470<num>$ ^Расіі/*Расіі$
   ^2442/2442<num>$ ^дзе/*дзе$
   ^2409/2409<num>$ ^вайны/*вайны$
   ^2316/2316<num>$ ^цэнтр/*цэнтр$

After:

v   ^4606/4606<num>$ ^былі/*былі$
v   ^4493/4493<num>$ ^была/*была$
t   ^4484/4484<num>$ ^Беларусі/*Беларусі$
   ^4394/4394<num>$ ^На/на<pr>$
n   ^3570/3570<num>$ ^горад/*горад$
   ^3570/3570<num>$ ^але/*але$
   ^3511/3511<num>$ ^пасля/*пасля$
v   ^3473/3473<num>$ ^было/*было$
   ^3381/3381<num>$ ^пры/*пры$
n   ^2491/2491<num>$ ^тэрыторыі/*тэрыторыі$
   ^2470/2470<num>$ ^Расіі/*Расіі$
   ^2442/2442<num>$ ^дзе/*дзе$
n   ^2409/2409<num>$ ^вайны/*вайны$
n   ^2316/2316<num>$ ^цэнтр/*цэнтр$

Useful commands

To find out how many words you have categorized for a particular part of speech:

cat <filename> | grep "^<code>" | wc -l

e.g. for nouns:

cat bel.hitparade | grep "^n" | wc -l
4
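
To see the totals for every code at once rather than one code per command, something along these lines may help. It is a sketch, not part of the original task description: `count_codes` is a hypothetical helper name, and it assumes a marked line starts with one lowercase letter followed by a space or tab.

```shell
# Hypothetical helper: tally how many lines carry each category letter.
# Assumption: a marked line is one lowercase letter, then a space or tab.
count_codes() {
    awk '/^[a-z][ \t]/ { n[substr($0, 1, 1)]++ }
         END { for (c in n) print c, n[c] }' "$1" | sort
}

# Usage:
#   count_codes bel.hitparade
```

Each output line is a code followed by the number of lines marked with it; unmarked lines are ignored.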