Difference between revisions of "OmegaWiki"

From Apertium
Jump to navigation Jump to search
Line 87: Line 87:
+----------+-----------+--------------------+-------------+
+----------+-----------+--------------------+-------------+
</pre>
</pre>

This assumes that the <code>syntrans_sid</code> is the same as the <code>uw_option_attribute_values.object_id</code> which may not always be the case.

Revision as of 11:54, 30 July 2007

The OmegaWiki database layout is pretty dreadful, hopefully this will make things slightly easier for anyone brave enough to look near it.

Retrieving a list of POS tags

First find the language which you would like to retrieve the POS tags for:

mysql> select * from language_names where language_name = 'Welsh';
+-------------+------------------+---------------+
| language_id | name_language_id | language_name |
+-------------+------------------+---------------+
|         153 |               85 | Welsh         | 
|         153 |               89 | Welsh         | 
+-------------+------------------+---------------+
2 rows in set (0.00 sec)

So the language_id is '153', we'll need to use this later on.

Now we need to retrieve the list of parts of speech, to do this we need 3 tables:

mysql> select option_id,attribute_id,option_mid,uw_option_attribute_options.language_id,uw_defined_meaning.expression_id,spelling 
    -> from uw_option_attribute_options,uw_defined_meaning,uw_expression_ns
    -> where attribute_id = '409106' and uw_option_attribute_options.language_id = '153' and 
    -> uw_defined_meaning.defined_meaning_id = option_mid and uw_expression_ns.expression_id =
    -> uw_defined_meaning.expression_id;
+-----------+--------------+------------+-------------+---------------+-----------+
| option_id | attribute_id | option_mid | language_id | expression_id | spelling  |
+-----------+--------------+------------+-------------+---------------+-----------+
|    435748 |       409106 |       5612 |         153 |        121924 | noun      | 
|    435751 |       409106 |       6100 |         153 |        124600 | verb      | 
|    435753 |       409106 |       6102 |         153 |        124610 | adjective | 
+-----------+--------------+------------+-------------+---------------+-----------+
3 rows in set (0.00 sec)
  • uw_expression_ns.spelling is the way the word is spelt.
  • uw_defined_meaning.defined_meaning_id is the "defined meaning" of the part of speech, e.g. it describes what a "verb" is, or an "adjective".
  • uw_option_attribute_options.attribute_id defines that this "defined meaning" is a "part of speech" option.

Retrieving a list of lemmata that match a POS tag

So, lets retrieve all Welsh nouns!

First retrive the option_id of the POS tag from the uw_option_attribute_options table:

Remember, option_mid is the defined meaning of the part of speech that you want, in this case '5612' is "noun".

mysql> select option_id,attribute_id,option_mid,language_id
     -> from uw_option_attribute_options
     -> where option_mid = '5612' and language_id = '153';
+-----------+--------------+------------+-------------+
| option_id | attribute_id | option_mid | language_id |
+-----------+--------------+------------+-------------+
|    435748 |       409106 |       5612 |         153 | 
+-----------+--------------+------------+-------------+

Now to retrieve the list of nouns. We need to take the option_id from above, and then paste it into this query!

Note: This query could take over a minute, so go to grab a cup of coffee or something!

mysql> select value_id,object_id,uw_defined_meaning.defined_meaning_id,spelling
    -> from uw_option_attribute_values,uw_syntrans,uw_defined_meaning,uw_expression_ns
    -> where uw_option_attribute_values.option_id = '435748' 
    -> and uw_syntrans.syntrans_sid = uw_option_attribute_values.object_id
    -> and uw_defined_meaning.defined_meaning_id = uw_syntrans.defined_meaning_id
    -> and uw_expression_ns.expression_id = uw_defined_meaning.expression_id;
+----------+-----------+--------------------+-------------+
| value_id | object_id | defined_meaning_id | spelling    |
+----------+-----------+--------------------+-------------+
|   438988 |    438983 |               5930 | skill       | 
|   437913 |    437904 |             437893 | bargain     | 
|   438025 |    438006 |             437948 | bankruptcy  | 
|   439078 |    439059 |             439017 | chassis     | 
|   439079 |    439061 |             439017 | chassis     | 
|   440330 |    440318 |             440185 | diplomat    | 
|   442533 |    442508 |             442442 | defendant   | 
|   444812 |    444805 |             444787 | taxpayer    | 
|   444887 |    444874 |             444834 | traditional | 
|   473807 |    473789 |             473754 | equilibrium | 
|   474801 |    474791 |             474762 | enterprise  | 
|   475455 |    475442 |             475412 | volunteer   | 
+----------+-----------+--------------------+-------------+

This assumes that the syntrans_sid is the same as the uw_option_attribute_values.object_id which may not always be the case.