Difference between revisions of "Wikidata"
Mouse-over things like "wd:Q6256" to show what they refer to, or look them up at urls like https://www.wikidata.org/wiki/Q6256 or https://www.wikidata.org/wiki/Property:P279

To get lots of hits, click the "🔗Link▼" button and right-click and copy the link to "REST Endpoint"; you can curl this with an increased LIMIT into a big file, e.g.

<pre>curl -o result.xml 'https://query.wikidata.org/bigdata/namespace/wdq/sparql?query=PREFIX+wikibase%3A+%3Chttp%3A%2F%2Fwikiba.se%2Fontology%23%3E%0D%0APREFIX+wd%3A+%3Chttp%3A%2F%2Fwww.wikidata.org%2Fentity%2F%3E%0D%0APREFIX+wdt%3A+%3Chttp%3A%2F%2Fwww.wikidata.org%2Fprop%2Fdirect%2F%3E%0D%0APREFIX+rdfs%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0D%0APREFIX+p%3A+%3Chttp%3A%2F%2Fwww.wikidata.org%2Fprop%2F%3E%0D%0APREFIX+v%3A+%3Chttp%3A%2F%2Fwww.wikidata.org%2Fprop%2Fstatement%2F%3E%0D%0ASELECT+*+WHERE+%7B%0D%0A+%3Fp+wdt%3AP31%2Fwdt%3AP279+wd%3AQ6256+.%0D%0A++SERVICE+wikibase%3Alabel+%7B%0D%0A++++bd%3AserviceParam+wikibase%3Alanguage+%22nn%22+.%0D%0A++++++++%3Fp+rdfs%3Alabel+%3FnnName+.%0D%0A++%7D%0D%0A++SERVICE+wikibase%3Alabel+%7B%0D%0A++++bd%3AserviceParam+wikibase%3Alanguage+%22no%2Cnb%22+.%0D%0A++++++++%3Fp+rdfs%3Alabel+%3FnbName+.%0D%0A++%7D%0D%0A++SERVICE+wikibase%3Alabel+%7B%0D%0A++++bd%3AserviceParam+wikibase%3Alanguage+%22da%22+.%0D%0A++++++++%3Fp+rdfs%3Alabel+%3FdaName+.%0D%0A++%7D%0D%0A+%7D+LIMIT+1000%0D%0A'
</pre>

==Only where names differ==

This might be a more interesting list:

<pre>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX v: <http://www.wikidata.org/prop/statement/>
SELECT * WHERE {
  ?p wdt:P31/wdt:P279 wd:Q6256 .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "nn" .
    ?p rdfs:label ?nnName .
  }
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "da" .
    ?p rdfs:label ?daName .
  }
  FILTER (!(?nnName = ?daName))
} LIMIT 10
</pre>

==Getting labels from dumps==

See https://www.wikidata.org/wiki/Wikidata:Database_download for the dumps; downloads are at https://dumps.wikimedia.org/wikidatawiki/entities/ – then you can run:

<pre>
$ bzcat wikidata-20160229-all.json.bz2 \
  | grep '^{' |sed 's/,$//' \
  | jq -c '{ "da":.labels.da.value, "sv":.labels.sv.value, "nn":.labels.nn.value }'
</pre>

(The grep+sed is necessary so jq won't try to fit the whole array in memory: the dump is a single JSON array with one entity per line.)

You'll get some silliness like <code>{"da":"CSS","sv":"Cascading Style Sheets","nn":"Stilark"}</code> but there's probably some gold in there as well.

A simple way to get only toponyms is to check that the entry "claims" a [https://www.wikidata.org/wiki/Property:P625 coordinate location], i.e.

<pre>
$ bzcat wikidata-20160229-all.json.bz2 \
  | grep '^{' |sed 's/,$//' \
  | jq -c 'if .claims.P625 then { "da":.labels.da.value, "sv":.labels.sv.value, "nn":.labels.nn.value } else null end'
</pre>

(You can't just grep for '"datatype":"globe-coordinate"' or P625 or whatever, since it has to be the ''top-level'' entry which has that property. If you just do a simple grep, you'll also get properties-of-properties, e.g. [https://www.wikidata.org/wiki/Q909 Borges] was buried at a coordinate location.)

More info at https://meta.wikimedia.org/wiki/Grants:Learning_patterns/Using_Wikidata_to_make_Machine_Translation_dictionary_entries

==See also==

* [[Building_dictionaries#Generating_bilingual_dictionary_entries]]

[[Category:Writing dictionaries]]
[[Category:Documentation in English]]
Latest revision as of 09:41, 21 June 2016
Here's an example query to get proper name translations for countries in Nynorsk/Bokmål/Danish from Wikidata:
<pre>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX v: <http://www.wikidata.org/prop/statement/>
SELECT * WHERE {
  ?p wdt:P31/wdt:P279 wd:Q6256 .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "nn" .
    ?p rdfs:label ?nnName .
  }
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "no,nb" .
    ?p rdfs:label ?nbName .
  }
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "da" .
    ?p rdfs:label ?daName .
  }
} LIMIT 10
</pre>
You can paste that into https://query.wikidata.org/ to get the first 10 hits.
Mouse-over things like "wd:Q6256" to show what they refer to, or look them up at urls like https://www.wikidata.org/wiki/Q6256 or https://www.wikidata.org/wiki/Property:P279
To get lots of hits, click the "🔗Link▼" button and right-click and copy the link to "REST Endpoint"; you can curl this with an increased LIMIT into a big file, e.g.
curl -o result.xml 'https://query.wikidata.org/bigdata/namespace/wdq/sparql?query=PREFIX+wikibase%3A+%3Chttp%3A%2F%2Fwikiba.se%2Fontology%23%3E%0D%0APREFIX+wd%3A+%3Chttp%3A%2F%2Fwww.wikidata.org%2Fentity%2F%3E%0D%0APREFIX+wdt%3A+%3Chttp%3A%2F%2Fwww.wikidata.org%2Fprop%2Fdirect%2F%3E%0D%0APREFIX+rdfs%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0D%0APREFIX+p%3A+%3Chttp%3A%2F%2Fwww.wikidata.org%2Fprop%2F%3E%0D%0APREFIX+v%3A+%3Chttp%3A%2F%2Fwww.wikidata.org%2Fprop%2Fstatement%2F%3E%0D%0ASELECT+*+WHERE+%7B%0D%0A+%3Fp+wdt%3AP31%2Fwdt%3AP279+wd%3AQ6256+.%0D%0A++SERVICE+wikibase%3Alabel+%7B%0D%0A++++bd%3AserviceParam+wikibase%3Alanguage+%22nn%22+.%0D%0A++++++++%3Fp+rdfs%3Alabel+%3FnnName+.%0D%0A++%7D%0D%0A++SERVICE+wikibase%3Alabel+%7B%0D%0A++++bd%3AserviceParam+wikibase%3Alanguage+%22no%2Cnb%22+.%0D%0A++++++++%3Fp+rdfs%3Alabel+%3FnbName+.%0D%0A++%7D%0D%0A++SERVICE+wikibase%3Alabel+%7B%0D%0A++++bd%3AserviceParam+wikibase%3Alanguage+%22da%22+.%0D%0A++++++++%3Fp+rdfs%3Alabel+%3FdaName+.%0D%0A++%7D%0D%0A+%7D+LIMIT+1000%0D%0A'
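Hand-editing the LIMIT inside a percent-encoded URL is error-prone, so here is a minimal Python sketch that builds the same kind of REST URL from the readable query. The endpoint is copied from the curl command above; <code>build_url</code> and <code>fetch</code> are hypothetical helper names, and the <code>Accept</code> and <code>User-Agent</code> headers in <code>fetch</code> are assumptions about what the endpoint honours, not something taken from this page.

```python
import urllib.parse
import urllib.request

# Endpoint copied from the curl command above.
ENDPOINT = "https://query.wikidata.org/bigdata/namespace/wdq/sparql"

# The nn/da part of the country query, with a placeholder for LIMIT.
QUERY_TEMPLATE = """\
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * WHERE {
  ?p wdt:P31/wdt:P279 wd:Q6256 .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "nn" .
    ?p rdfs:label ?nnName .
  }
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "da" .
    ?p rdfs:label ?daName .
  }
} LIMIT %d
"""


def build_url(limit: int) -> str:
    """Return the REST URL for the query with the given LIMIT (hypothetical helper)."""
    query = QUERY_TEMPLATE % limit
    # urlencode percent-encodes the query and turns spaces into "+".
    return ENDPOINT + "?" + urllib.parse.urlencode({"query": query})


def fetch(limit: int, path: str) -> None:
    """Download the result to `path`. Needs network access, so it is not called here."""
    request = urllib.request.Request(
        build_url(limit),
        headers={
            # Assumption: the endpoint honours the standard SPARQL JSON media type.
            "Accept": "application/sparql-results+json",
            # Assumption: a descriptive User-Agent; the name is made up.
            "User-Agent": "label-fetch-example/0.1",
        },
    )
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())


url = build_url(1000)
print(url)
```

Calling <code>fetch(1000, "result.json")</code> would then play the role of the curl command, under the assumptions stated above.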
==Only where names differ==
This might be a more interesting list:
<pre>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX v: <http://www.wikidata.org/prop/statement/>
SELECT * WHERE {
  ?p wdt:P31/wdt:P279 wd:Q6256 .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "nn" .
    ?p rdfs:label ?nnName .
  }
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "da" .
    ?p rdfs:label ?daName .
  }
  FILTER (!(?nnName = ?daName))
} LIMIT 10
</pre>
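If you have already downloaded the unfiltered result, the same comparison can be done client-side. This is only a sketch: it assumes the result was saved in the standard SPARQL JSON results layout (<code>results.bindings</code>), the three sample rows are typed in by hand rather than fetched, and <code>differing_pairs</code> is a hypothetical name.

```python
import json

# Hand-written sample in the SPARQL JSON results layout; not real query output.
SAMPLE = """
{
  "head": {"vars": ["p", "nnName", "daName"]},
  "results": {"bindings": [
    {"p": {"type": "uri", "value": "http://www.wikidata.org/entity/Q183"},
     "nnName": {"type": "literal", "xml:lang": "nn", "value": "Tyskland"},
     "daName": {"type": "literal", "xml:lang": "da", "value": "Tyskland"}},
    {"p": {"type": "uri", "value": "http://www.wikidata.org/entity/Q40"},
     "nnName": {"type": "literal", "xml:lang": "nn", "value": "Austerrike"},
     "daName": {"type": "literal", "xml:lang": "da", "value": "Østrig"}},
    {"p": {"type": "uri", "value": "http://www.wikidata.org/entity/Q142"},
     "nnName": {"type": "literal", "xml:lang": "nn", "value": "Frankrike"},
     "daName": {"type": "literal", "xml:lang": "da", "value": "Frankrig"}}
  ]}
}
"""


def differing_pairs(results):
    """Yield (nn, da) label pairs that differ, like the FILTER in the query above."""
    for row in results["results"]["bindings"]:
        nn = row.get("nnName", {}).get("value")
        da = row.get("daName", {}).get("value")
        # Rows with a missing label are skipped rather than treated as different.
        if nn is not None and da is not None and nn != da:
            yield nn, da


pairs = list(differing_pairs(json.loads(SAMPLE)))
for nn, da in pairs:
    print(nn, da, sep="\t")
```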
==Getting labels from dumps==
See https://www.wikidata.org/wiki/Wikidata:Database_download for the dumps; downloads are at https://dumps.wikimedia.org/wikidatawiki/entities/ – then you can run:
<pre>
$ bzcat wikidata-20160229-all.json.bz2 \
  | grep '^{' |sed 's/,$//' \
  | jq -c '{ "da":.labels.da.value, "sv":.labels.sv.value, "nn":.labels.nn.value }'
</pre>
(The grep+sed is necessary so jq won't try to fit the whole array in memory: the dump is a single JSON array with one entity per line.)

You'll get some silliness like <code>{"da":"CSS","sv":"Cascading Style Sheets","nn":"Stilark"}</code> but there's probably some gold in there as well.
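The same line-by-line trick can be sketched in Python, which may be handy if you want to post-process the labels further. This is an illustration under assumptions, not a replacement for the pipeline above: <code>iter_entities</code> and <code>labels</code> are hypothetical names, and the two-entity sample dump (including its Q-ids) is made up so the example runs without the real multi-gigabyte file.

```python
import bz2
import json
import os
import tempfile

# Made-up miniature dump: one JSON array, one entity per line, like the real file.
SAMPLE_DUMP = (
    '[\n'
    '{"id":"Q900001","labels":{"da":{"language":"da","value":"CSS"},'
    '"sv":{"language":"sv","value":"Cascading Style Sheets"},'
    '"nn":{"language":"nn","value":"Stilark"}}},\n'
    '{"id":"Q900002","labels":{"da":{"language":"da","value":"Eksempel"}}}\n'
    ']\n'
)


def iter_entities(path):
    """Yield one entity dict per line of a bzip2-compressed JSON dump."""
    with bz2.open(path, "rt", encoding="utf-8") as dump:
        for line in dump:
            line = line.strip()
            if not line.startswith("{"):
                continue  # skip the "[" and "]" lines (the grep step)
            # Drop the trailing comma (the sed step) and parse a single entity.
            yield json.loads(line.rstrip(","))


def labels(entity, languages=("da", "sv", "nn")):
    """Return {language: label or None}, similar in shape to the jq output."""
    found = entity.get("labels", {})
    return {lang: found.get(lang, {}).get("value") for lang in languages}


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.json.bz2")
    with bz2.open(path, "wt", encoding="utf-8") as out:
        out.write(SAMPLE_DUMP)
    rows = [labels(entity) for entity in iter_entities(path)]

for row in rows:
    print(json.dumps(row, ensure_ascii=False))
```

Only one line is parsed at a time, so memory use stays flat no matter how large the dump is.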
A simple way to get only toponyms is to check that the entry "claims" a [https://www.wikidata.org/wiki/Property:P625 coordinate location], i.e.
<pre>
$ bzcat wikidata-20160229-all.json.bz2 \
  | grep '^{' |sed 's/,$//' \
  | jq -c 'if .claims.P625 then { "da":.labels.da.value, "sv":.labels.sv.value, "nn":.labels.nn.value } else null end'
</pre>
(You can't just grep for '"datatype":"globe-coordinate"' or P625 or whatever, since it has to be the ''top-level'' entry which has that property. If you just do a simple grep, you'll also get properties-of-properties, e.g. [https://www.wikidata.org/wiki/Q909 Borges] was buried at a coordinate location.)
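To make the difference concrete, here is a small Python sketch comparing a grep-style substring test with a check on the top-level claims. Both entities are heavily simplified, made-up stand-ins (the ids and the burial statement are illustrative, not copied from the dump), and <code>has_coordinates</code> is a hypothetical name.

```python
import json


def has_coordinates(entity):
    """True only if P625 is a top-level claim of the entity itself."""
    return "P625" in entity.get("claims", {})


# Made-up place: P625 appears directly under "claims".
PLACE = {
    "id": "Q_PLACE",
    "claims": {
        "P625": [{"mainsnak": {"property": "P625", "datatype": "globe-coordinate"}}]
    },
}

# Made-up person: P625 only appears as a qualifier on another statement.
PERSON = {
    "id": "Q_PERSON",
    "claims": {
        "P119": [
            {
                "mainsnak": {"property": "P119"},
                "qualifiers": {
                    "P625": [{"property": "P625", "datatype": "globe-coordinate"}]
                },
            }
        ]
    },
}

# Substring test on the serialized entity, which is what a plain grep would do.
naive = ['"P625"' in json.dumps(entity) for entity in (PLACE, PERSON)]
# Structural test, matching the `.claims.P625` check in the jq filter above.
strict = [has_coordinates(entity) for entity in (PLACE, PERSON)]

print("naive: ", naive)
print("strict:", strict)
```

The substring test flags both entities, while the structural check keeps only the place.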