Task ideas for Google Code-in/Categorise words from frequency list
Revision as of 00:39, 1 November 2013
Objective
Categorise words by frequency into one of the major part-of-speech categories.
You will receive a frequency list. Work from top to bottom. At the beginning of each line, put a letter that categorises the word form by its part of speech, for example n for noun, v for verb, etc.
If you cannot recognise a word, you can skip it. If a word form can belong to more than one part of speech, copy the line, paste the copy directly below it, and give the copy the other code.
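Lines you have not categorised yet (including ones you skip) still start with the ^ that every raw line begins with, so your progress can be checked with grep. The function below is a minimal sketch: progress is a hypothetical helper, not an Apertium tool, and it assumes every categorised line starts with a single letter code followed by a space.

```shell
# progress FILE
# Hypothetical helper, not part of Apertium. Assumes categorised lines look
# like "n ^3570/3570<num>$ ^горад/*горад$" (one letter code, then a space)
# and untouched lines still start with "^".
progress() {
  # grep -c prints 0 but exits with status 1 when nothing matches;
  # "|| true" keeps the function usable in scripts run with "set -e".
  done_count=$(grep -c '^[[:alpha:]] ' "$1" || true)
  todo_count=$(grep -c '^\^' "$1" || true)
  echo "categorised: $done_count, remaining: $todo_count"
}
```

For example, progress bel.hitparade prints a line such as "categorised: 2, remaining: 1". Lines you duplicated for ambiguous words are counted once per copy.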
Example
Consider this example of a Belarusian frequency list. The raw list is on the left; on the right is the same list after the part-of-speech letters have been added.
Before | After |
---|---|
^4606/4606<num>$ ^былі/*былі$ | v ^4606/4606<num>$ ^былі/*былі$ |
^4493/4493<num>$ ^была/*была$ | v ^4493/4493<num>$ ^была/*была$ |
^4484/4484<num>$ ^Беларусі/*Беларусі$ | t ^4484/4484<num>$ ^Беларусі/*Беларусі$ |
^4394/4394<num>$ ^На/на<pr>$ | ^4394/4394<num>$ ^На/на<pr>$ |
^3570/3570<num>$ ^горад/*горад$ | n ^3570/3570<num>$ ^горад/*горад$ |
^3570/3570<num>$ ^але/*але$ | ^3570/3570<num>$ ^але/*але$ |
^3511/3511<num>$ ^пасля/*пасля$ | ^3511/3511<num>$ ^пасля/*пасля$ |
^3473/3473<num>$ ^было/*было$ | v ^3473/3473<num>$ ^было/*было$ |
^3381/3381<num>$ ^пры/*пры$ | ^3381/3381<num>$ ^пры/*пры$ |
^2491/2491<num>$ ^тэрыторыі/*тэрыторыі$ | n ^2491/2491<num>$ ^тэрыторыі/*тэрыторыі$ |
^2470/2470<num>$ ^Расіі/*Расіі$ | ^2470/2470<num>$ ^Расіі/*Расіі$ |
^2442/2442<num>$ ^дзе/*дзе$ | ^2442/2442<num>$ ^дзе/*дзе$ |
^2409/2409<num>$ ^вайны/*вайны$ | n ^2409/2409<num>$ ^вайны/*вайны$ |
^2316/2316<num>$ ^цэнтр/*цэнтр$ | n ^2316/2316<num>$ ^цэнтр/*цэнтр$ |
Useful commands
To find out how many words you have categorised for a particular part of speech:
cat <filename> | grep "^<code>" | wc -l
e.g. for nouns:
cat bel.hitparade | grep "^n" | wc -l
4
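To see the counts for every code at once, instead of running the command once per letter, the categorised lines can be piped through cut, sort and uniq -c. This is a sketch under the same assumption as above (one letter code followed by a space at the start of each categorised line); tally is a hypothetical name.

```shell
# tally FILE
# Hypothetical helper, not part of Apertium: prints "count code" for each
# part-of-speech letter used in FILE, most frequent first.
tally() {
  grep '^[[:alpha:]] ' "$1" |  # keep only categorised lines
    cut -d' ' -f1 |            # the first space-separated field is the code
    sort | uniq -c |           # count each distinct code
    sort -rn                   # highest count first
}
```

uniq -c pads the counts with leading spaces. Applied to the After column of the example, it would report 4 for n, 3 for v and 1 for t.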