Difference between revisions of "OmegaWiki"
Line 87: | Line 87: | ||
+----------+-----------+--------------------+-------------+ |
+----------+-----------+--------------------+-------------+ |
||
</pre> |
</pre> |
||
This assumes that the <code>syntrans_sid</code> is the same as the <code>uw_option_attribute_values.object_id</code> which may not always be the case. |
Revision as of 11:54, 30 July 2007
The OmegaWiki database layout is pretty dreadful, hopefully this will make things slightly easier for anyone brave enough to look near it.
Retrieving a list of POS tags
First find the language which you would like to retrieve the POS tags for:
mysql> select * from language_names where language_name = 'Welsh'; +-------------+------------------+---------------+ | language_id | name_language_id | language_name | +-------------+------------------+---------------+ | 153 | 85 | Welsh | | 153 | 89 | Welsh | +-------------+------------------+---------------+ 2 rows in set (0.00 sec)
So the language_id
is '153', we'll need to use this later on.
Now we need to retrieve the list of parts of speech, to do this we need 3 tables:
mysql> select option_id,attribute_id,option_mid,uw_option_attribute_options.language_id,uw_defined_meaning.expression_id,spelling -> from uw_option_attribute_options,uw_defined_meaning,uw_expression_ns -> where attribute_id = '409106' and uw_option_attribute_options.language_id = '153' and -> uw_defined_meaning.defined_meaning_id = option_mid and uw_expression_ns.expression_id = -> uw_defined_meaning.expression_id; +-----------+--------------+------------+-------------+---------------+-----------+ | option_id | attribute_id | option_mid | language_id | expression_id | spelling | +-----------+--------------+------------+-------------+---------------+-----------+ | 435748 | 409106 | 5612 | 153 | 121924 | noun | | 435751 | 409106 | 6100 | 153 | 124600 | verb | | 435753 | 409106 | 6102 | 153 | 124610 | adjective | +-----------+--------------+------------+-------------+---------------+-----------+ 3 rows in set (0.00 sec)
uw_expression_ns.spelling
is the way the word is spelt.uw_defined_meaning.defined_meaning_id
is the "defined meaning" of the part of speech, e.g. it describes what a "verb" is, or an "adjective".uw_option_attribute_options.attribute_id
defines that this "defined meaning" is a "part of speech" option.
Retrieving a list of lemmata that match a POS tag
So, lets retrieve all Welsh nouns!
First retrive the option_id
of the POS tag from the uw_option_attribute_options
table:
Remember, option_mid
is the defined meaning of the part of speech that you want, in this case '5612' is "noun".
mysql> select option_id,attribute_id,option_mid,language_id -> from uw_option_attribute_options -> where option_mid = '5612' and language_id = '153'; +-----------+--------------+------------+-------------+ | option_id | attribute_id | option_mid | language_id | +-----------+--------------+------------+-------------+ | 435748 | 409106 | 5612 | 153 | +-----------+--------------+------------+-------------+
Now to retrieve the list of nouns. We need to take the option_id
from above, and then paste it into this query!
Note: This query could take over a minute, so go to grab a cup of coffee or something!
mysql> select value_id,object_id,uw_defined_meaning.defined_meaning_id,spelling -> from uw_option_attribute_values,uw_syntrans,uw_defined_meaning,uw_expression_ns -> where uw_option_attribute_values.option_id = '435748' -> and uw_syntrans.syntrans_sid = uw_option_attribute_values.object_id -> and uw_defined_meaning.defined_meaning_id = uw_syntrans.defined_meaning_id -> and uw_expression_ns.expression_id = uw_defined_meaning.expression_id; +----------+-----------+--------------------+-------------+ | value_id | object_id | defined_meaning_id | spelling | +----------+-----------+--------------------+-------------+ | 438988 | 438983 | 5930 | skill | | 437913 | 437904 | 437893 | bargain | | 438025 | 438006 | 437948 | bankruptcy | | 439078 | 439059 | 439017 | chassis | | 439079 | 439061 | 439017 | chassis | | 440330 | 440318 | 440185 | diplomat | | 442533 | 442508 | 442442 | defendant | | 444812 | 444805 | 444787 | taxpayer | | 444887 | 444874 | 444834 | traditional | | 473807 | 473789 | 473754 | equilibrium | | 474801 | 474791 | 474762 | enterprise | | 475455 | 475442 | 475412 | volunteer | +----------+-----------+--------------------+-------------+
This assumes that the syntrans_sid
is the same as the uw_option_attribute_values.object_id
which may not always be the case.