OmegaWiki

From Apertium

Latest revision as of 11:49, 14 July 2011

The OmegaWiki database layout is pretty dreadful; hopefully this page will make things slightly easier for anyone brave enough to look near it. You might want to read the documentation for the OmegaWiki database (http://www.omegawiki.org/Help:OmegaWiki_database_layout) and this description of "DefinedMeaning" (http://www.omegawiki.org/DefinedMeaning).

Retrieving a list of POS tags

First find the language whose POS tags you would like to retrieve:

mysql> select * from language_names where language_name = 'Welsh';
+-------------+------------------+---------------+
| language_id | name_language_id | language_name |
+-------------+------------------+---------------+
|         153 |               85 | Welsh         | 
|         153 |               89 | Welsh         | 
+-------------+------------------+---------------+
2 rows in set (0.00 sec)

So the language_id is '153'; we'll need this later on.

Now we need to retrieve the list of parts of speech. To do this we need to join three tables:

mysql> select option_id,attribute_id,option_mid,uw_option_attribute_options.language_id,uw_defined_meaning.expression_id,spelling 
    -> from uw_option_attribute_options,uw_defined_meaning,uw_expression_ns
    -> where attribute_id = '409106' and uw_option_attribute_options.language_id = '153' and 
    -> uw_defined_meaning.defined_meaning_id = option_mid and uw_expression_ns.expression_id =
    -> uw_defined_meaning.expression_id;
+-----------+--------------+------------+-------------+---------------+-----------+
| option_id | attribute_id | option_mid | language_id | expression_id | spelling  |
+-----------+--------------+------------+-------------+---------------+-----------+
|    435748 |       409106 |       5612 |         153 |        121924 | noun      | 
|    435751 |       409106 |       6100 |         153 |        124600 | verb      | 
|    435753 |       409106 |       6102 |         153 |        124610 | adjective | 
+-----------+--------------+------------+-------------+---------------+-----------+
3 rows in set (0.00 sec)
  • uw_expression_ns.spelling is the way the word is spelt.
  • uw_defined_meaning.defined_meaning_id is the "defined meaning" of the part of speech, e.g. it describes what a "verb" is, or an "adjective".
  • uw_option_attribute_options.attribute_id defines that this "defined meaning" is a "part of speech" option.
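
The three-table join above can be tried out without a full OmegaWiki dump. What follows is only a rough sketch: it uses Python's sqlite3 module with an in-memory database as a stand-in for MySQL, models only the columns that the query touches, and fills the tables with the three rows from the output above. The function name pos_tags is made up for illustration, and the default attribute_id of 409106 is simply the value used in the query above.

```python
import sqlite3

# Toy stand-in for the OmegaWiki tables. Only the columns used by the
# query above are modelled; the rows are copied from its output.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE uw_option_attribute_options (
    option_id INTEGER, attribute_id INTEGER, option_mid INTEGER, language_id INTEGER);
CREATE TABLE uw_defined_meaning (defined_meaning_id INTEGER, expression_id INTEGER);
CREATE TABLE uw_expression_ns (expression_id INTEGER, spelling TEXT);
INSERT INTO uw_option_attribute_options VALUES
    (435748, 409106, 5612, 153),
    (435751, 409106, 6100, 153),
    (435753, 409106, 6102, 153);
INSERT INTO uw_defined_meaning VALUES
    (5612, 121924), (6100, 124600), (6102, 124610);
INSERT INTO uw_expression_ns VALUES
    (121924, 'noun'), (124600, 'verb'), (124610, 'adjective');
""")


def pos_tags(conn, language_id, attribute_id=409106):
    """Return (option_id, option_mid, spelling) for each POS option of a language."""
    sql = """
        SELECT o.option_id, o.option_mid, e.spelling
        FROM uw_option_attribute_options AS o
        JOIN uw_defined_meaning AS dm ON dm.defined_meaning_id = o.option_mid
        JOIN uw_expression_ns AS e ON e.expression_id = dm.expression_id
        WHERE o.attribute_id = ? AND o.language_id = ?
        ORDER BY o.option_id
    """
    return conn.execute(sql, (attribute_id, language_id)).fetchall()


tags = pos_tags(conn, 153)
for row in tags:
    print(row)
```

The explicit JOIN ... ON form is equivalent to the comma-separated table list used above; it just keeps each join condition next to the table it belongs to, which makes it harder to forget one.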

Retrieving a list of lemmata that match a POS tag

So, let's retrieve all Welsh nouns!

First retrieve the option_id of the POS tag from the uw_option_attribute_options table. Remember, option_mid is the defined meaning of the part of speech that you want; in this case '5612' is "noun".

mysql> select option_id,attribute_id,option_mid,language_id
     -> from uw_option_attribute_options
     -> where option_mid = '5612' and language_id = '153';
+-----------+--------------+------------+-------------+
| option_id | attribute_id | option_mid | language_id |
+-----------+--------------+------------+-------------+
|    435748 |       409106 |       5612 |         153 | 
+-----------+--------------+------------+-------------+

Now to retrieve the list of nouns. Take the option_id from above and paste it into this query.

Note: This query could take over a minute, so go and grab a cup of coffee or something!

mysql> select value_id,object_id,uw_defined_meaning.defined_meaning_id,spelling
    -> from uw_option_attribute_values,uw_syntrans,uw_defined_meaning,uw_expression_ns
    -> where uw_option_attribute_values.option_id = '435748' 
    -> and uw_syntrans.syntrans_sid = uw_option_attribute_values.object_id
    -> and uw_defined_meaning.defined_meaning_id = uw_syntrans.defined_meaning_id
    -> and uw_expression_ns.expression_id = uw_syntrans.expression_id;
+----------+-----------+--------------------+--------------+
| value_id | object_id | defined_meaning_id | spelling     |
+----------+-----------+--------------------+--------------+
|   437913 |    437904 |             437893 | bargen       | 
|   438025 |    438006 |             437948 | methdaliad   | 
|   438988 |    438983 |               5930 | gallu        | 
|   439078 |    439059 |             439017 | siasi        | 
|   439079 |    439061 |             439017 | ffrâm        | 
|   440330 |    440318 |             440185 | diplomydd    | 
|   442533 |    442508 |             442442 | diffynnydd   | 
|   444812 |    444805 |             444787 | trethdalwr   | 
|   444887 |    444874 |             444834 | traddodiadol | 
|   473807 |    473789 |             473754 | cydbwysedd   | 
|   474801 |    474791 |             474762 | menter       | 
|   475455 |    475442 |             475412 | gwirfoddolwr | 
+----------+-----------+--------------------+--------------+

This assumes that uw_option_attribute_values.object_id always refers to a syntrans_sid, which may not always be the case.
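
The two steps above (look up the option_id, then paste it into the second query) can also be folded into a single query. Here is a hedged sketch of that idea, again using an in-memory SQLite database as a stand-in for MySQL. The function name lemmata_for_pos is invented for this example. The noun rows are taken from the output above, but the expression_id values (1 to 4) and the whole 'invented-verb' row are placeholders, since the real values are not shown on this page. The join through uw_defined_meaning is left out because uw_syntrans already carries the defined_meaning_id.

```python
import sqlite3

# In-memory stand-in for the real tables, modelling only the columns used.
# expression_id values 1-4 and the 'invented-verb' row (ids 90000x) are
# placeholders made up for this sketch; the other ids come from the
# output above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE uw_option_attribute_options (
    option_id INTEGER, attribute_id INTEGER, option_mid INTEGER, language_id INTEGER);
CREATE TABLE uw_option_attribute_values (
    value_id INTEGER, object_id INTEGER, option_id INTEGER);
CREATE TABLE uw_syntrans (
    syntrans_sid INTEGER, defined_meaning_id INTEGER, expression_id INTEGER);
CREATE TABLE uw_expression_ns (expression_id INTEGER, spelling TEXT);
INSERT INTO uw_option_attribute_options VALUES
    (435748, 409106, 5612, 153), (435751, 409106, 6100, 153);
INSERT INTO uw_option_attribute_values VALUES
    (437913, 437904, 435748), (439078, 439059, 435748),
    (439079, 439061, 435748), (900001, 900002, 435751);
INSERT INTO uw_syntrans VALUES
    (437904, 437893, 1), (439059, 439017, 2),
    (439061, 439017, 3), (900002, 900003, 4);
INSERT INTO uw_expression_ns VALUES
    (1, 'bargen'), (2, 'siasi'), (3, 'ffrâm'), (4, 'invented-verb');
""")


def lemmata_for_pos(conn, option_mid, language_id):
    """Return (value_id, object_id, defined_meaning_id, spelling) rows for one POS."""
    sql = """
        SELECT v.value_id, v.object_id, s.defined_meaning_id, e.spelling
        FROM uw_option_attribute_options AS o
        JOIN uw_option_attribute_values AS v ON v.option_id = o.option_id
        JOIN uw_syntrans AS s ON s.syntrans_sid = v.object_id
        JOIN uw_expression_ns AS e ON e.expression_id = s.expression_id
        WHERE o.option_mid = ? AND o.language_id = ?
        ORDER BY v.value_id
    """
    return conn.execute(sql, (option_mid, language_id)).fetchall()


# 5612 is the defined meaning of "noun", 153 is Welsh.
nouns = lemmata_for_pos(conn, 5612, 153)
for row in nouns:
    print(row)
```

Because these are inner joins, any uw_option_attribute_values row whose object_id does not match a syntrans_sid is silently dropped, so the caveat above still applies to this version.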

Retrieving a list of translations

You need to know the defined_meaning_id of the word, and the language_id of the language in which you would like the language_name column displayed (this is the value matched against name_language_id).

mysql> select syntrans_sid,defined_meaning_id,identical_meaning,language_name,spelling 
    -> from uw_syntrans,uw_expression_ns,language_names 
    -> where defined_meaning_id = '437893' 
    -> and uw_expression_ns.expression_id = uw_syntrans.expression_id 
    -> and language_names.language_id = uw_expression_ns.language_id 
    -> and language_names.name_language_id = '85';
+--------------+--------------------+-------------------+---------------+-----------+
| syntrans_sid | defined_meaning_id | identical_meaning | language_name | spelling  |
+--------------+--------------------+-------------------+---------------+-----------+
|       437894 |             437893 |                 1 | English       | bargain   | 
|       437897 |             437893 |                 1 | Dutch         | koopje    | 
|       437898 |             437893 |                 1 | French        | occasion  | 
|       437900 |             437893 |                 1 | Japanese      | お買得     | 
|       437902 |             437893 |                 1 | Castilian     | ganga     | 
|       437904 |             437893 |                 1 | Welsh         | bargen    | 
|       438194 |             437893 |                 1 | French        | affaire   | 
+--------------+--------------------+-------------------+---------------+-----------+

The identical_meaning flag means that the meaning of the words is thought to be identical in each language.

You can restrict this to just the two languages you want with a further constraint (note: 85 = English, 153 = Welsh):


mysql> select identical_meaning,language_name,spelling 
    -> from uw_syntrans,uw_expression_ns,language_names 
    -> where defined_meaning_id = '437893' 
    -> and uw_expression_ns.expression_id = uw_syntrans.expression_id 
    -> and language_names.language_id = uw_expression_ns.language_id 
    -> and language_names.name_language_id = '85'
    -> and (uw_expression_ns.language_id = '85' or uw_expression_ns.language_id = '153');
+-------------------+---------------+----------+
| identical_meaning | language_name | spelling |
+-------------------+---------------+----------+
|                 1 | English       | bargain  | 
|                 1 | Welsh         | bargen   | 
+-------------------+---------------+----------+
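
If what you are after is word pairs for a bilingual dictionary rather than one row per word, the same tables can be joined against themselves. This is only a sketch under assumptions, not something the OmegaWiki documentation prescribes: it uses an in-memory SQLite database in place of MySQL, the function name translation_pairs is made up, the expression_id values are placeholders, and the language_id 999 for Dutch is a placeholder too (only 85 = English and 153 = Welsh are given on this page).

```python
import sqlite3

# In-memory stand-in for the real tables. syntrans_sid and
# defined_meaning_id come from the output above; expression_id values
# 1-3 and the Dutch language_id 999 are placeholders for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE uw_syntrans (
    syntrans_sid INTEGER, defined_meaning_id INTEGER, expression_id INTEGER);
CREATE TABLE uw_expression_ns (
    expression_id INTEGER, language_id INTEGER, spelling TEXT);
INSERT INTO uw_syntrans VALUES
    (437894, 437893, 1), (437897, 437893, 2), (437904, 437893, 3);
INSERT INTO uw_expression_ns VALUES
    (1, 85, 'bargain'), (2, 999, 'koopje'), (3, 153, 'bargen');
""")


def translation_pairs(conn, source_lang, target_lang):
    """Return (defined_meaning_id, source spelling, target spelling) rows.

    Two expressions are paired when they are attached, via uw_syntrans,
    to the same defined meaning.
    """
    sql = """
        SELECT s1.defined_meaning_id, e1.spelling, e2.spelling
        FROM uw_syntrans AS s1
        JOIN uw_expression_ns AS e1 ON e1.expression_id = s1.expression_id
        JOIN uw_syntrans AS s2 ON s2.defined_meaning_id = s1.defined_meaning_id
        JOIN uw_expression_ns AS e2 ON e2.expression_id = s2.expression_id
        WHERE e1.language_id = ? AND e2.language_id = ?
        ORDER BY s1.defined_meaning_id, e1.spelling, e2.spelling
    """
    return conn.execute(sql, (source_lang, target_lang)).fetchall()


# Welsh (153) to English (85).
pairs = translation_pairs(conn, 153, 85)
for row in pairs:
    print(row)
```

On the real database you would probably also want to look at the identical_meaning flag before trusting a pair, and expect this self-join to be slow without suitable indexes.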