Task ideas for Google Code-in/Scrape inflection information from Wiktionary
Latest revision as of 12:10, 26 May 2023
Objective
The objective of this task is to convert tables of inflectional information on Wiktionary into a format useful for Apertium, e.g. speling format.
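As a rough illustration of the scraping half of the task, the sketch below uses only Python's standard-library `html.parser` to pull the text of every cell out of an HTML table, one list per row. It assumes the page HTML has already been downloaded. The `SAMPLE_HTML` string and the `TableGridParser` class are hypothetical: the markup is a heavily simplified stand-in for a real Wiktionary inflection table, which carries many more attributes, links, nested elements and merged cells, so a real scraper will need extra handling.

```python
from html.parser import HTMLParser


class TableGridParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row.

    A minimal sketch: it ignores rowspan/colspan and nested tables,
    both of which can appear in real Wiktionary inflection tables.
    """

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only keep text that sits inside a cell (including text in links).
        if self._cell is not None:
            self._cell.append(data)


# Hypothetical, simplified markup for a table of Bulgarian noun forms.
SAMPLE_HTML = """
<table>
  <tr><th></th><th>singular</th><th>plural</th></tr>
  <tr><th>indefinite</th><td><a href="#">вода</a></td><td>води</td></tr>
  <tr><th>definite</th><td>водата</td><td>водите</td></tr>
  <tr><th>vocative</th><td>водо</td><td>води</td></tr>
</table>
"""

parser = TableGridParser()
parser.feed(SAMPLE_HTML)
for row in parser.rows:
    print(row)
```

Each row comes out as a list of strings with its heading first; the empty string in the first row stands for the blank top-left corner of the table.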
Example
For example, take the Bulgarian noun вода ("water"). Its Wiktionary page contains the Bulgarian inflection table, which looks something like this:
| | Singular | Plural |
|---|---|---|
| indefinite | вода | води |
| definite | водата | водите |
| vocative | водо | води |
The equivalent in speling format would be:
вода; вода; sg.ind; n.f
вода; водата; sg.def; n.f
вода; водо; sg.voc; n.f
вода; води; pl.ind; n.f
вода; водите; pl.def; n.f
вода; води; pl.voc; n.f
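Each line has four semicolon-separated fields: the lemma, the inflected form, the tags for that form, and the part of speech. Here n.f means "noun, feminine" (this information will also typically be on the Wiktionary page).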
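Once a table has been read into rows of cells, producing speling lines is mostly a matter of mapping row and column headings to tags. The sketch below assumes the shape of the вода table above (column headings in the first row, a row heading at the start of every other row) and a hand-written, hypothetical `HEADING_TAGS` mapping; real tables vary by language and part of speech, so each kind of table would need its own mapping. The function name `grid_to_speling` is likewise only illustrative.

```python
# Hypothetical mapping from table headings to Apertium-style tags.
HEADING_TAGS = {
    "singular": "sg",
    "plural": "pl",
    "indefinite": "ind",
    "definite": "def",
    "vocative": "voc",
}


def grid_to_speling(lemma, pos, grid):
    """Turn a header-first grid of cells into speling lines.

    grid[0] holds the column headings (its first cell is the empty
    corner); every later row starts with its row heading.
    """
    col_tags = [HEADING_TAGS[h.lower()] for h in grid[0][1:]]
    lines = []
    # Walk column by column so all singular forms come before the plural
    # ones, and put the column tag first (sg.ind), as in the example.
    for col, col_tag in enumerate(col_tags, start=1):
        for row in grid[1:]:
            row_tag = HEADING_TAGS[row[0].lower()]
            form = row[col]
            lines.append(f"{lemma}; {form}; {col_tag}.{row_tag}; {pos}")
    return lines


grid = [
    ["", "Singular", "Plural"],
    ["indefinite", "вода", "води"],
    ["definite", "водата", "водите"],
    ["vocative", "водо", "води"],
]

for line in grid_to_speling("вода", "n.f", grid):
    print(line)
```

Keeping the heading-to-tag mapping in a plain dictionary makes it easy to add one mapping per language, and an unknown heading fails loudly with a `KeyError` instead of silently producing a wrong tag.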
Note: for most parts of speech, the fourth column contains only the part of speech, and all other sub-tags go in the third column. For example, adjectives look like this:
vacker; vackert; abs.ind.sg.nt; adj
vacker; vackra; abs.pl; adj
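Reading speling lines back in can be handy for checking scraper output. The hypothetical helper below splits a line into its four semicolon-separated fields. It treats the first dot-separated component of the fourth field as the part of speech and any remainder (such as the f in n.f) as extra sub-tags; that reading of the fourth column is an assumption based only on the examples on this page.

```python
from typing import NamedTuple


class SpelingEntry(NamedTuple):
    lemma: str
    form: str
    tags: list      # sub-tags from the third column, e.g. ["abs", "pl"]
    pos: str        # part of speech, e.g. "adj" or "n"
    pos_tags: list  # any sub-tags kept in the fourth column, e.g. ["f"]


def parse_speling_line(line):
    """Split one 'lemma; form; tags; pos' line into its fields."""
    fields = [f.strip() for f in line.split(";")]
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    lemma, form, tags, pos_field = fields
    pos, *pos_tags = pos_field.split(".")
    return SpelingEntry(lemma, form, tags.split("."), pos, pos_tags)


for line in ["vacker; vackert; abs.ind.sg.nt; adj",
             "vacker; vackra; abs.pl; adj",
             "вода; водата; sg.def; n.f"]:
    print(parse_speling_line(line))
```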
Resources
- User:Wei2912's Crawler
- https://github.com/wswu/yawipa / https://www.cs.jhu.edu/~winston/yawipa-data.html – seems to be a very complete project that handles many different kinds of tables