Task ideas for Google Code-in/Categorise words from frequency list
Edited by ScoopGracie
Latest revision as of 22:00, 8 December 2019
Objective
Part-of-speech | Code |
---|---|
Noun | `n` |
Verb | `v` |
Adjective | `a` |
Adverb | `r` |
Toponym (place name) | `t` |
Male first name | `m` |
Female first name | `f` |
Last name | `c` |
Categorize the words of a frequency list, from most to least frequent, into the major part-of-speech categories listed in the table.
You will receive a frequency list. Work from top to bottom. At the beginning of each line, put the letter that categorizes the word form by its part of speech: for example, `n` for noun, `v` for verb, and so on.
If you do not recognize a word, you can skip it. If a word can have more than one part of speech, copy the line, paste the copy directly below it, and give the copy the other code.
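Most people will do this in a text editor, but the duplication can also be sketched with `sed`. This is only an illustration: the helper name `dup_line` and its argument order are hypothetical, and it prints the edited list to standard output instead of changing the file.

```shell
#!/bin/sh
# Hypothetical helper: dup_line FILE LINE CODE1 CODE2
# Prints FILE with line LINE written twice, once prefixed with CODE1 and
# once with CODE2; every other line is printed unchanged.
dup_line() {
    # h: copy the line to the hold space; s/^/CODE1 /: prefix the first code;
    # p: print that copy; g: fetch the original line back from the hold space;
    # s/^/CODE2 /: prefix the second code (sed prints it at the end of the cycle).
    sed "$2{h;s/^/$3 /;p;g;s/^/$4 /;}" "$1"
}

# Example (hypothetical line number): dup_line bel.hitparade 42 n v > bel.hitparade.new
```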
If a word is misspelled, you can either leave it as it is or mark it with an `x`.
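Before handing in a list, it can help to check that every line starts either with one of the codes above or with the untouched `^` of the raw entry. The sketch below does this with `grep`; the function name `check_codes` is hypothetical, and it assumes that a code is a single letter followed by one space and that the misspelling mark `x` goes where a code would go.

```shell
#!/bin/sh
# Hypothetical helper: print, with line numbers, every line that starts
# neither with "^" nor with "<code> ^".
# Assumes single-letter codes followed by one space, and that the
# misspelling mark "x" is placed where a code would go.
check_codes() {
    # -E: extended regex; -v: print non-matching lines; -n: add line numbers.
    # In the pattern, \^ is a literal caret, the start of the raw entry.
    grep -n -v -E '^([nvartmfcx] )?\^' "$1"
}

# Example: check_codes bel.hitparade
```

Empty output means every line is either untouched or carries a known code; a typo such as `b ^...` would be reported together with its line number.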
Example
Consider this example of a Belarusian frequency list. The left column shows the raw list; the right column shows the same list after part-of-speech letters have been added.
Before | After |
---|---|
`^4606/4606<num>$ ^былі/*былі$` | `v ^4606/4606<num>$ ^былі/*былі$` |
`^4493/4493<num>$ ^была/*была$` | `v ^4493/4493<num>$ ^была/*была$` |
`^4484/4484<num>$ ^Беларусі/*Беларусі$` | `t ^4484/4484<num>$ ^Беларусі/*Беларусі$` |
`^4394/4394<num>$ ^На/на<pr>$` | `^4394/4394<num>$ ^На/на<pr>$` |
`^3570/3570<num>$ ^горад/*горад$` | `n ^3570/3570<num>$ ^горад/*горад$` |
`^3570/3570<num>$ ^але/*але$` | `^3570/3570<num>$ ^але/*але$` |
`^3511/3511<num>$ ^пасля/*пасля$` | `^3511/3511<num>$ ^пасля/*пасля$` |
`^3473/3473<num>$ ^было/*было$` | `v ^3473/3473<num>$ ^было/*было$` |
`^3381/3381<num>$ ^пры/*пры$` | `^3381/3381<num>$ ^пры/*пры$` |
`^2491/2491<num>$ ^тэрыторыі/*тэрыторыі$` | `n ^2491/2491<num>$ ^тэрыторыі/*тэрыторыі$` |
`^2470/2470<num>$ ^Расіі/*Расіі$` | `^2470/2470<num>$ ^Расіі/*Расіі$` |
`^2442/2442<num>$ ^дзе/*дзе$` | `^2442/2442<num>$ ^дзе/*дзе$` |
`^2409/2409<num>$ ^вайны/*вайны$` | `n ^2409/2409<num>$ ^вайны/*вайны$` |
`^2316/2316<num>$ ^цэнтр/*цэнтр$` | `n ^2316/2316<num>$ ^цэнтр/*цэнтр$` |
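Because a line without a code still begins with the `^` of the raw entry, progress through the list can be measured by counting those lines. A minimal sketch, assuming that convention holds for every line; `progress` is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: report how many lines of a list already carry a code.
# Assumes a line without a code still starts with the raw "^" of the entry.
progress() {
    # grep -c '' counts every line in the file.
    # grep exits with status 1 when the count is 0, so that status is ignored.
    total=$(grep -c '' "$1" || true)
    # Lines that still start with a literal "^" have no code yet.
    remaining=$(grep -c '^\^' "$1" || true)
    echo "$((total - remaining)) of $total lines have a code"
}
```

Run on a file containing the After column of the example, it would print `8 of 14 lines have a code`. Note that a word given two codes counts as two lines.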
Useful commands
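To find out how many words you have categorised for a particular part of speech, count the lines that begin with its code (replace `<code>` with the code letter and `<filename>` with your file):
```shell
grep -c "^<code>" <filename>
```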
For example, for nouns:
```shell
$ grep -c "^n" bel.hitparade
4
```
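Rather than running one `grep` per code, all codes can be counted in a single pass. This is a sketch that assumes the code is always the first space-separated field of a line; `tally_codes` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: count lines per part-of-speech code in one pass.
# Assumes the code is the first space-separated field; lines that still
# start with "^" have no code and are skipped.
tally_codes() {
    # cut keeps the first field, grep drops uncategorized lines,
    # sort groups identical codes, uniq -c prints "count code" per group.
    cut -d ' ' -f 1 "$1" | grep -v '^\^' | sort | uniq -c
}

# Example: tally_codes bel.hitparade
```

For the After column of the example, this would report 4 lines for `n`, 1 for `t` and 3 for `v`; the noun count agrees with the single-code `grep` shown for nouns.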