Wikidata
Here's an example query to get proper name translations for countries in Nynorsk/Bokmål/Danish from Wikidata:
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX v: <http://www.wikidata.org/prop/statement/>
SELECT * WHERE {
  ?p wdt:P31/wdt:P279 wd:Q6256 .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "nn" .
    ?p rdfs:label ?nnName .
  }
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "no,nb" .
    ?p rdfs:label ?nbName .
  }
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "da" .
    ?p rdfs:label ?daName .
  }
} LIMIT 10
You can paste that into https://query.wikidata.org/ to get the first 10 hits.
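Mouse over identifiers like "wd:Q6256" to see what they refer to, or look them up at URLs like https://www.wikidata.org/wiki/Q6256 or https://www.wikidata.org/wiki/Property:P279. (Q6256 is "country", P31 is "instance of" and P279 is "subclass of". The pattern wdt:P31/wdt:P279 wd:Q6256 therefore matches items that are an instance of some direct subclass of country.)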
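If you want to turn prefixed names from a query into page links in bulk, the URL pattern is regular enough to generate. The following is a minimal sketch. It assumes only the two URL shapes shown above: items live at /wiki/Q…, and properties live at /wiki/Property:P…. The helper name is hypothetical.

```python
def wikidata_page_url(prefixed: str) -> str:
    """Hypothetical helper: map e.g. 'wd:Q6256' or 'wdt:P279' to its Wikidata page."""
    _prefix, sep, local = prefixed.partition(":")
    if not sep or not local:
        raise ValueError(f"not a prefixed name: {prefixed!r}")
    # Properties (P...) live under /wiki/Property:P..., items (Q...) under /wiki/Q...
    if local.startswith("P"):
        return "https://www.wikidata.org/wiki/Property:" + local
    if local.startswith("Q"):
        return "https://www.wikidata.org/wiki/" + local
    raise ValueError(f"unsupported identifier: {local!r}")


print(wikidata_page_url("wd:Q6256"))   # https://www.wikidata.org/wiki/Q6256
print(wikidata_page_url("wdt:P279"))   # https://www.wikidata.org/wiki/Property:P279
```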
To get more results, click the "🔗Link▼" button, then right-click "REST Endpoint" and copy the link. After raising the LIMIT, you can fetch that URL with curl into a big file, e.g.:
curl -o result.xml 'https://query.wikidata.org/bigdata/namespace/wdq/sparql?query=PREFIX+wikibase%3A+%3Chttp%3A%2F%2Fwikiba.se%2Fontology%23%3E%0D%0APREFIX+wd%3A+%3Chttp%3A%2F%2Fwww.wikidata.org%2Fentity%2F%3E%0D%0APREFIX+wdt%3A+%3Chttp%3A%2F%2Fwww.wikidata.org%2Fprop%2Fdirect%2F%3E%0D%0APREFIX+rdfs%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0D%0APREFIX+p%3A+%3Chttp%3A%2F%2Fwww.wikidata.org%2Fprop%2F%3E%0D%0APREFIX+v%3A+%3Chttp%3A%2F%2Fwww.wikidata.org%2Fprop%2Fstatement%2F%3E%0D%0ASELECT+*+WHERE+%7B%0D%0A+%3Fp+wdt%3AP31%2Fwdt%3AP279+wd%3AQ6256+.%0D%0A++SERVICE+wikibase%3Alabel+%7B%0D%0A++++bd%3AserviceParam+wikibase%3Alanguage+%22nn%22+.%0D%0A++++++++%3Fp+rdfs%3Alabel+%3FnnName+.%0D%0A++%7D%0D%0A++SERVICE+wikibase%3Alabel+%7B%0D%0A++++bd%3AserviceParam+wikibase%3Alanguage+%22no%2Cnb%22+.%0D%0A++++++++%3Fp+rdfs%3Alabel+%3FnbName+.%0D%0A++%7D%0D%0A++SERVICE+wikibase%3Alabel+%7B%0D%0A++++bd%3AserviceParam+wikibase%3Alanguage+%22da%22+.%0D%0A++++++++%3Fp+rdfs%3Alabel+%3FdaName+.%0D%0A++%7D%0D%0A+%7D+LIMIT+1000%0D%0A'
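Hand-encoding a URL like that is error-prone. Here is a sketch that builds it from the query text instead. It assumes the endpoint path used in the curl example above. The helper names are hypothetical, and the unused p: and v: prefixes are left out.

```python
import urllib.parse

# Endpoint path copied from the curl example above.
ENDPOINT = "https://query.wikidata.org/bigdata/namespace/wdq/sparql"

# (language fallback list, output variable) for each label service block.
LABELS = [("nn", "nnName"), ("no,nb", "nbName"), ("da", "daName")]


def country_label_query(limit: int) -> str:
    """Hypothetical helper: the country-label query with a configurable LIMIT."""
    services = "".join(
        "  SERVICE wikibase:label {\n"
        f'    bd:serviceParam wikibase:language "{langs}" .\n'
        f"    ?p rdfs:label ?{var} .\n"
        "  }\n"
        for langs, var in LABELS
    )
    return (
        "PREFIX wikibase: <http://wikiba.se/ontology#>\n"
        "PREFIX wd: <http://www.wikidata.org/entity/>\n"
        "PREFIX wdt: <http://www.wikidata.org/prop/direct/>\n"
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT * WHERE {\n"
        "  ?p wdt:P31/wdt:P279 wd:Q6256 .\n"
        + services
        + f"}} LIMIT {limit}\n"
    )


def rest_url(query: str) -> str:
    # urlencode() percent-encodes the query and uses '+' for spaces,
    # like the URL in the curl example.
    return ENDPOINT + "?" + urllib.parse.urlencode({"query": query})


if __name__ == "__main__":
    print(rest_url(country_label_query(1000)))
```

You can then pass the printed URL to curl as above, e.g. `curl -o result.xml "$(python3 make_url.py)"`, where make_url.py is whatever you saved the script as.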
Only where names differ
A more interesting list might be only the countries whose Nynorsk and Danish names differ:
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX v: <http://www.wikidata.org/prop/statement/>
SELECT * WHERE {
  ?p wdt:P31/wdt:P279 wd:Q6256 .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "nn" .
    ?p rdfs:label ?nnName .
  }
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "da" .
    ?p rdfs:label ?daName .
  }
  # Compare the text only: the labels carry different language tags (nn vs da).
  FILTER (STR(?nnName) != STR(?daName))
} LIMIT 10
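You can also do the same comparison on results you have already downloaded, e.g. to compare a different pair of columns without re-running the query. The following is a minimal sketch. It assumes the results are in the standard SPARQL 1.1 JSON results format, which you should be able to get by sending an `Accept: application/sparql-results+json` header with curl. The function name is hypothetical, and the sample data is hand-written for illustration, not real query output.

```python
from typing import Iterator

# Hand-written sample in the SPARQL 1.1 JSON results format (not real output).
SAMPLE = {
    "head": {"vars": ["p", "nnName", "daName"]},
    "results": {"bindings": [
        {"p": {"type": "uri", "value": "http://www.wikidata.org/entity/Q20"},
         "nnName": {"type": "literal", "xml:lang": "nn", "value": "Noreg"},
         "daName": {"type": "literal", "xml:lang": "da", "value": "Norge"}},
        {"p": {"type": "uri", "value": "http://www.wikidata.org/entity/Q34"},
         "nnName": {"type": "literal", "xml:lang": "nn", "value": "Sverige"},
         "daName": {"type": "literal", "xml:lang": "da", "value": "Sverige"}},
    ]},
}


def differing_names(results: dict, a: str = "nnName", b: str = "daName") -> Iterator[tuple[str, str, str]]:
    """Yield (item URI, label a, label b) for rows whose two labels differ."""
    for row in results["results"]["bindings"]:
        # In this format an unbound variable is simply absent from the row.
        if a not in row or b not in row:
            continue
        name_a, name_b = row[a]["value"], row[b]["value"]
        if name_a != name_b:
            yield row["p"]["value"], name_a, name_b


for item, nn, da in differing_names(SAMPLE):
    print(item, nn, da)  # prints: http://www.wikidata.org/entity/Q20 Noreg Norge
```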
Getting labels from dumps
See https://www.wikidata.org/wiki/Wikidata:Database_download for information about the dumps; the files themselves are at https://dumps.wikimedia.org/wikidatawiki/entities/. Then you can run:
$ bzcat wikidata-20160229-all.json.bz2 \
    | grep '^{' | sed 's/,$//' \
    | jq -c '{ "da":.labels.da.value, "sv":.labels.sv.value, "nn":.labels.nn.value }'
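(The JSON dump is one huge array with one entity per line and a comma at the end of every line but the last. The grep+sed turns it into a stream of standalone JSON objects, so jq doesn't try to fit the whole array in memory.)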
You'll get some silliness like {"da":"CSS","sv":"Cascading Style Sheets","nn":"Stilark"}, but there's probably some gold in there as well.
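If jq isn't available, or you want to do more than print labels, you can write the same streaming approach in Python. This sketch mirrors the pipeline above. It assumes the one-entity-per-line layout that the grep+sed relies on. The function names are hypothetical.

```python
import bz2
import json
from typing import Iterable, Iterator, Optional


def entities(lines: Iterable[str]) -> Iterator[dict]:
    """Parse the one-entity-per-line layout of a Wikidata JSON dump."""
    for line in lines:
        line = line.rstrip("\n")
        # Like grep '^{': skip the "[" and "]" lines that wrap the array.
        if not line.startswith("{"):
            continue
        # Like sed 's/,$//': drop the comma separating array elements.
        if line.endswith(","):
            line = line[:-1]
        yield json.loads(line)


def labels_of(entity: dict, langs: tuple[str, ...]) -> dict[str, Optional[str]]:
    """Pick out label values; missing ones become None (jq prints null)."""
    labels = entity.get("labels") or {}  # tolerate a missing or empty labels field
    return {lang: labels.get(lang, {}).get("value") for lang in langs}


def dump_labels(path: str, langs: tuple[str, ...] = ("da", "sv", "nn")) -> Iterator[dict[str, Optional[str]]]:
    """Stream labels from a .json.bz2 dump, reading it incrementally."""
    with bz2.open(path, "rt", encoding="utf-8") as f:
        for entity in entities(f):
            yield labels_of(entity, langs)
```

For example, `for row in dump_labels("wikidata-20160229-all.json.bz2"): print(json.dumps(row, ensure_ascii=False))` prints one object per entity, much like the jq command. The only difference is that json.dumps puts spaces after separators, unlike jq -c.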
A simple way to get only toponyms is to check that the entity itself has a top-level claim for P625 (coordinate location), i.e.:
$ bzcat wikidata-20160229-all.json.bz2 \
    | grep '^{' | sed 's/,$//' \
    | jq -c 'select(.claims.P625) | { "da":.labels.da.value, "sv":.labels.sv.value, "nn":.labels.nn.value }'
(You can't just grep for '"datatype":"globe-coordinate"' or P625, since the property has to be a top-level claim of the entity itself. A simple grep also matches qualifiers, i.e. properties of other statements. For example, the statement about where Borges was buried has a coordinate location, so Borges himself would show up as a toponym.)
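To make the top-level vs. qualifier distinction concrete, here is a small Python sketch on two hand-made, heavily trimmed entities (not real dump records). The first is a person whose P119 (place of burial) statement carries a P625 qualifier. The second is a place with its own P625 claim. The helper name is hypothetical.

```python
import json

# Hand-made, trimmed entities for illustration (not real dump records).
buried_person = {
    "labels": {"da": {"language": "da", "value": "Example person"}},
    "claims": {"P119": [{
        "mainsnak": {"snaktype": "value", "property": "P119"},
        # P625 appears only as a qualifier on the burial statement.
        "qualifiers": {"P625": [{"snaktype": "value", "property": "P625"}]},
    }]},
}
place = {
    "labels": {"da": {"language": "da", "value": "Example place"}},
    "claims": {"P625": [{"mainsnak": {"snaktype": "value", "property": "P625"}}]},
}


def has_top_level_p625(entity: dict) -> bool:
    """Hypothetical helper: True only if the entity's own claims include P625."""
    claims = entity.get("claims") or {}  # tolerate a missing or empty claims field
    return "P625" in claims


# A naive text search matches both; the structural check matches only the place.
for entity in (buried_person, place):
    print('"P625"' in json.dumps(entity), has_top_level_p625(entity))
# prints:
# True False
# True True
```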