Difference between revisions of "Task ideas for Google Code-in/Scrape inflection information from Wiktionary"

From Apertium

Revision as of 12:18, 2 January 2014

Objective

The objective of this task is to convert tables of inflectional information on Wiktionary into a format useful for Apertium, e.g. speling format.

Example

For example, take the Bulgarian noun вода. The Wiktionary page for вода includes its Bulgarian inflection information, in a table that looks something like:

            Singular   Plural
indefinite  вода       води
definite    водата     водите
vocative    водо       води

The equivalent in speling format would be:

вода; вода; sg.ind; n.f
вода; водата; sg.def; n.f
вода; водо; sg.voc; n.f
вода; води; pl.ind; n.f
вода; водите; pl.def; n.f
вода; води; pl.voc; n.f

Where n.f means "noun, feminine" (this information will also typically be on the Wiktionary page).
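
The conversion above can be sketched in Python. This is only a minimal sketch, assuming the table has already been extracted from the page into a nested mapping; the function name `table_to_speling`, the two tag mappings and the input layout are hypothetical and not part of any existing Apertium tool.

```python
# Hypothetical mappings from the Wiktionary row/column labels to the tags
# used in the speling example above.
NUMBER_TAGS = {"Singular": "sg", "Plural": "pl"}
FORM_TAGS = {"indefinite": "ind", "definite": "def", "vocative": "voc"}


def table_to_speling(lemma, pos, table):
    """Return speling lines for a table given as {row_label: {column_label: form}}."""
    lines = []
    # Emit all singular forms first, then all plural forms, as in the example.
    for number, number_tag in NUMBER_TAGS.items():
        for form_label, form_tag in FORM_TAGS.items():
            surface = table.get(form_label, {}).get(number)
            if surface is None:
                continue  # skip cells that are missing from the table
            lines.append(f"{lemma}; {surface}; {number_tag}.{form_tag}; {pos}")
    return lines


# The вода table from the example, typed in by hand (assumed input layout).
voda = {
    "indefinite": {"Singular": "вода", "Plural": "води"},
    "definite": {"Singular": "водата", "Plural": "водите"},
    "vocative": {"Singular": "водо", "Plural": "води"},
}

for line in table_to_speling("вода", "n.f", voda):
    print(line)
```

Keeping the label-to-tag mappings in plain dictionaries means another language or part of speech would only need different mappings, not a different function.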


Note: for most parts of speech, the fourth column contains only the part of speech, and all other sub-tags go in the third column. For example, adjectives look like:

vacker; vackert; abs.ind.sg.nt; adj
vacker; vackra; abs.pl; adj
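
To check generated output against this column layout, a line can be split back into its four fields. The following is a sketch only, assuming (from the examples on this page, not from a format specification) that fields are separated by semicolons and tags by dots; `SpelingEntry` and `parse_speling_line` are hypothetical names.

```python
from typing import List, NamedTuple


class SpelingEntry(NamedTuple):
    lemma: str
    surface: str
    tags: List[str]  # sub-tags from the third column
    pos: List[str]   # part-of-speech tags from the fourth column


def parse_speling_line(line):
    """Split one 'lemma; surface; tags; pos' line into its four fields."""
    fields = [field.strip() for field in line.split(";")]
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    lemma, surface, tags, pos = fields
    return SpelingEntry(lemma, surface, tags.split("."), pos.split("."))


entry = parse_speling_line("vacker; vackert; abs.ind.sg.nt; adj")
print(entry.tags)  # ['abs', 'ind', 'sg', 'nt']
print(entry.pos)   # ['adj']
```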


See User:Wei2912's Crawler if you want to build on some previous work.