Difference between revisions of "OmegaWiki"

From Apertium
Jump to navigation Jump to search
Line 69: Line 69:
-> and uw_syntrans.syntrans_sid = uw_option_attribute_values.object_id
-> and uw_syntrans.syntrans_sid = uw_option_attribute_values.object_id
-> and uw_defined_meaning.defined_meaning_id = uw_syntrans.defined_meaning_id
-> and uw_defined_meaning.defined_meaning_id = uw_syntrans.defined_meaning_id
-> and uw_expression_ns.expression_id = uw_defined_meaning.expression_id;
-> and uw_expression_ns.expression_id = uw_syntrans.expression_id;
+----------+-----------+--------------------+-------------+
+----------+-----------+--------------------+--------------+
| value_id | object_id | defined_meaning_id | spelling |
| value_id | object_id | defined_meaning_id | spelling |
+----------+-----------+--------------------+-------------+
+----------+-----------+--------------------+--------------+
| 438988 | 438983 | 5930 | skill |
| 437913 | 437904 | 437893 | bargen |
| 437913 | 437904 | 437893 | bargain |
| 438025 | 438006 | 437948 | methdaliad |
| 438025 | 438006 | 437948 | bankruptcy |
| 438988 | 438983 | 5930 | gallu |
| 439078 | 439059 | 439017 | chassis |
| 439078 | 439059 | 439017 | siasi |
| 439079 | 439061 | 439017 | chassis |
| 439079 | 439061 | 439017 | ffrâm |
| 440330 | 440318 | 440185 | diplomat |
| 440330 | 440318 | 440185 | diplomydd |
| 442533 | 442508 | 442442 | defendant |
| 442533 | 442508 | 442442 | diffynnydd |
| 444812 | 444805 | 444787 | taxpayer |
| 444812 | 444805 | 444787 | trethdalwr |
| 444887 | 444874 | 444834 | traditional |
| 444887 | 444874 | 444834 | traddodiadol |
| 473807 | 473789 | 473754 | equilibrium |
| 473807 | 473789 | 473754 | cydbwysedd |
| 474801 | 474791 | 474762 | enterprise |
| 474801 | 474791 | 474762 | menter |
| 475455 | 475442 | 475412 | volunteer |
| 475455 | 475442 | 475412 | gwirfoddolwr |
+----------+-----------+--------------------+-------------+
+----------+-----------+--------------------+--------------+
</pre>
</pre>



Revision as of 12:10, 30 July 2007

The OmegaWiki database layout is pretty dreadful, hopefully this will make things slightly easier for anyone brave enough to look near it.

Retrieving a list of POS tags

First find the language which you would like to retrieve the POS tags for:

mysql> select * from language_names where language_name = 'Welsh';
+-------------+------------------+---------------+
| language_id | name_language_id | language_name |
+-------------+------------------+---------------+
|         153 |               85 | Welsh         | 
|         153 |               89 | Welsh         | 
+-------------+------------------+---------------+
2 rows in set (0.00 sec)

So the language_id is '153', we'll need to use this later on.

Now we need to retrieve the list of parts of speech, to do this we need 3 tables:

mysql> select option_id,attribute_id,option_mid,uw_option_attribute_options.language_id,uw_defined_meaning.expression_id,spelling 
    -> from uw_option_attribute_options,uw_defined_meaning,uw_expression_ns
    -> where attribute_id = '409106' and uw_option_attribute_options.language_id = '153' and 
    -> uw_defined_meaning.defined_meaning_id = option_mid and uw_expression_ns.expression_id =
    -> uw_defined_meaning.expression_id;
+-----------+--------------+------------+-------------+---------------+-----------+
| option_id | attribute_id | option_mid | language_id | expression_id | spelling  |
+-----------+--------------+------------+-------------+---------------+-----------+
|    435748 |       409106 |       5612 |         153 |        121924 | noun      | 
|    435751 |       409106 |       6100 |         153 |        124600 | verb      | 
|    435753 |       409106 |       6102 |         153 |        124610 | adjective | 
+-----------+--------------+------------+-------------+---------------+-----------+
3 rows in set (0.00 sec)
  • uw_expression_ns.spelling is the way the word is spelt.
  • uw_defined_meaning.defined_meaning_id is the "defined meaning" of the part of speech, e.g. it describes what a "verb" is, or an "adjective".
  • uw_option_attribute_options.attribute_id defines that this "defined meaning" is a "part of speech" option.

Retrieving a list of lemmata that match a POS tag

So, lets retrieve all Welsh nouns!

First retrive the option_id of the POS tag from the uw_option_attribute_options table:

Remember, option_mid is the defined meaning of the part of speech that you want, in this case '5612' is "noun".

mysql> select option_id,attribute_id,option_mid,language_id
     -> from uw_option_attribute_options
     -> where option_mid = '5612' and language_id = '153';
+-----------+--------------+------------+-------------+
| option_id | attribute_id | option_mid | language_id |
+-----------+--------------+------------+-------------+
|    435748 |       409106 |       5612 |         153 | 
+-----------+--------------+------------+-------------+

Now to retrieve the list of nouns. We need to take the option_id from above, and then paste it into this query!

Note: This query could take over a minute, so go to grab a cup of coffee or something!

mysql> select value_id,object_id,uw_defined_meaning.defined_meaning_id,spelling
    -> from uw_option_attribute_values,uw_syntrans,uw_defined_meaning,uw_expression_ns
    -> where uw_option_attribute_values.option_id = '435748' 
    -> and uw_syntrans.syntrans_sid = uw_option_attribute_values.object_id
    -> and uw_defined_meaning.defined_meaning_id = uw_syntrans.defined_meaning_id
    -> and uw_expression_ns.expression_id = uw_syntrans.expression_id;
+----------+-----------+--------------------+--------------+
| value_id | object_id | defined_meaning_id | spelling     |
+----------+-----------+--------------------+--------------+
|   437913 |    437904 |             437893 | bargen       | 
|   438025 |    438006 |             437948 | methdaliad   | 
|   438988 |    438983 |               5930 | gallu        | 
|   439078 |    439059 |             439017 | siasi        | 
|   439079 |    439061 |             439017 | ffrâm        | 
|   440330 |    440318 |             440185 | diplomydd    | 
|   442533 |    442508 |             442442 | diffynnydd   | 
|   444812 |    444805 |             444787 | trethdalwr   | 
|   444887 |    444874 |             444834 | traddodiadol | 
|   473807 |    473789 |             473754 | cydbwysedd   | 
|   474801 |    474791 |             474762 | menter       | 
|   475455 |    475442 |             475412 | gwirfoddolwr | 
+----------+-----------+--------------------+--------------+

This assumes that the syntrans_sid is the same as the uw_option_attribute_values.object_id which may not always be the case.