Ideas for Google Summer of Code/Advanced Wikipedia translation

From Apertium

Translating Wikipedia articles, and Wikipedia's wiki syntax, presents a different sort of challenge from the usual formats in Apertium, because much of the formatting exists to convey meaning, and the translator of a Wikipedia article must take this meaning into account.

At the most basic level, links, categories, and templates cannot simply be transferred from one Wikipedia to another, so it is not appropriate to reproduce them exactly in the output. There are, however, typically equivalents in the target Wikipedia that should be used instead; in other words, these items should be 'translated' too.
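
A minimal sketch of that substitution for links and categories, in Python. It assumes an equivalence table (here a plain dict) has already been built, for example from interlanguage links; any entries passed to it in the usage below are illustrative only. The function translate_links is hypothetical. Links with no known equivalent are unlinked, categories with no equivalent are dropped, templates are not handled, and link labels would still go through normal Apertium translation.

```python
import re

# Matches [[target]] and [[target|label]]; nested links are not handled.
LINK_RE = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")


def translate_links(wikitext: str, equivalents: dict[str, str]) -> str:
    """Rewrite link and category targets to their target-wiki equivalents."""

    def replace(match: re.Match) -> str:
        target = match.group(1).strip()
        label = match.group(2)  # None when the link has no '|label' part
        new_target = equivalents.get(target)
        if new_target is not None:
            return f"[[{new_target}|{label}]]" if label is not None else f"[[{new_target}]]"
        if target.startswith("Category:"):
            # No equivalent category on the target wiki: drop it.
            return ""
        # No equivalent article: keep the visible text, unlinked.
        return label if label is not None else target

    return LINK_RE.sub(replace, wikitext)
```

For instance, translate_links(text, {"Plato": "Plató"}) turns [[Plato|Plato's]] into [[Plató|Plato's]] and leaves an unmatched [[Aristotle]] as the plain word Aristotle.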

DBpedia is a database of structured data extracted from Wikipedia. It initially targeted the English Wikipedia and is currently being extended to other languages. At present, for several Wikipedias, it is possible to query DBpedia for the equivalent target-language page of a number of items, such as links and categories.
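
A hedged sketch of such a query, in Python, for a page on the English DBpedia. Several details are assumptions rather than facts taken from this page: the public endpoint URL, that language chapters are linked to English resources through owl:sameAs, and that chapter resource URIs begin with http://<lang>.dbpedia.org/resource/. All function names are hypothetical. The code only builds the request and parses a response in the standard SPARQL JSON results format, so it can be checked without network access.

```python
import json
from urllib.parse import unquote, urlencode
from urllib.request import Request

# Assumption: the public DBpedia SPARQL endpoint.
DBPEDIA_SPARQL = "https://dbpedia.org/sparql"


def chapter_prefix(lang: str) -> str:
    # Assumption: chapter resources live under http://<lang>.dbpedia.org/resource/.
    return f"http://{lang}.dbpedia.org/resource/"


def equivalent_page_query(title: str, target_lang: str) -> str:
    """Build a SPARQL query for the target-language resource of an English page."""
    # Titles containing characters illegal in IRIs (e.g. '>') are not escaped here.
    resource = "http://dbpedia.org/resource/" + title.replace(" ", "_")
    # Assumption: chapters are linked to English resources with owl:sameAs.
    return (
        "PREFIX owl: <http://www.w3.org/2002/07/owl#>\n"
        f"SELECT ?page WHERE {{ <{resource}> owl:sameAs ?page .\n"
        f'  FILTER(STRSTARTS(STR(?page), "{chapter_prefix(target_lang)}")) }}'
    )


def build_request(query: str) -> Request:
    """GET request asking for SPARQL JSON results via the Accept header."""
    url = DBPEDIA_SPARQL + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def page_titles(results_json: str, target_lang: str) -> list[str]:
    """Turn SPARQL JSON results into human-readable target page titles."""
    prefix = chapter_prefix(target_lang)
    titles = []
    for binding in json.loads(results_json)["results"]["bindings"]:
        uri = binding["page"]["value"]
        if uri.startswith(prefix):
            # Undo percent-encoding and underscores to recover the page title.
            titles.append(unquote(uri[len(prefix):]).replace("_", " "))
    return titles
```

Sending the query would then be urllib.request.urlopen(build_request(query)), followed by page_titles(response.read().decode("utf-8"), "ca").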

In addition, the mapping templates that DBpedia uses to extract information from Wikipedia's infoboxes could be used to construct infoboxes for the target article.

Take, for example, the DBpedia mappings for the philosopher infobox on en.wikipedia (first) and ca.wikipedia (second):

{{TemplateMapping
| mapToClass = Philosopher
| mappings =
	{{PropertyMapping | templateProperty = name | ontologyProperty = foaf:name }}
	{{PropertyMapping | templateProperty = birth_date | ontologyProperty = birthDate }}
	{{PropertyMapping | templateProperty = birth_date | ontologyProperty = birthYear }}
	{{PropertyMapping | templateProperty = birth_place | ontologyProperty = birthPlace }}
	{{PropertyMapping | templateProperty = death_date | ontologyProperty = deathDate }}
	{{PropertyMapping | templateProperty = death_date | ontologyProperty = deathYear }}
	{{PropertyMapping | templateProperty = death_place | ontologyProperty = deathPlace }}
	{{PropertyMapping | templateProperty = region | ontologyProperty = region }}
	{{PropertyMapping | templateProperty = era | ontologyProperty = era }}
	{{PropertyMapping | templateProperty = school_tradition | ontologyProperty = philosophicalSchool }}
	{{PropertyMapping | templateProperty = main_interests | ontologyProperty = mainInterest }}
	{{PropertyMapping | templateProperty = notable_ideas  | ontologyProperty = notableIdea }}
	{{PropertyMapping | templateProperty = influences | ontologyProperty = influencedBy }}
	{{PropertyMapping | templateProperty = influenced | ontologyProperty = influenced }}
}}
{{TemplateMapping
| mapToClass = Philosopher
| mappings =
	{{PropertyMapping | templateProperty = nom | ontologyProperty = foaf:name }}
	{{PropertyMapping | templateProperty = naixement | ontologyProperty = birthDate }}
	{{PropertyMapping | templateProperty = mort | ontologyProperty = deathDate }}
	{{PropertyMapping | templateProperty = regio | ontologyProperty = region }}
	{{PropertyMapping | templateProperty = era | ontologyProperty = era }}
	{{PropertyMapping | templateProperty = escola_tradicio | ontologyProperty = philosophicalSchool }}
	{{PropertyMapping | templateProperty = interessos | ontologyProperty = mainInterest }}
	{{PropertyMapping | templateProperty = idees  | ontologyProperty = notableIdea }}
	{{PropertyMapping | templateProperty = influencies | ontologyProperty = influencedBy }}
	{{PropertyMapping | templateProperty = influencia | ontologyProperty = influenced }}
}}
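
Before two such mappings can be compared, they have to be read into a usable form. A minimal Python sketch that extracts (templateProperty, ontologyProperty) pairs from mapping text like the above; the function name is hypothetical, and it assumes every entry is a PropertyMapping with its two parameters in the order shown, so entries of any other shape are silently skipped.

```python
import re

# Matches one {{PropertyMapping | templateProperty = X | ontologyProperty = Y }} entry.
PROPERTY_MAPPING_RE = re.compile(
    r"\{\{\s*PropertyMapping\s*"
    r"\|\s*templateProperty\s*=\s*([^|}]+?)\s*"
    r"\|\s*ontologyProperty\s*=\s*([^|}]+?)\s*\}\}"
)


def parse_property_mappings(mapping_text: str) -> list[tuple[str, str]]:
    """Return (templateProperty, ontologyProperty) pairs in document order.

    A list rather than a dict, because one template parameter can feed
    several ontology properties (birth_date -> birthDate and birthYear).
    """
    return PROPERTY_MAPPING_RE.findall(mapping_text)
```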

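Given a list of mappings between equivalent infoboxes, it would be possible to generate a target infobox by matching parameters on their ontologyProperty values. For example, the English birth_date and the Catalan naixement both map to birthDate, so the value of birth_date in the source infobox becomes the value of naixement in the target. Source parameters whose ontologyProperty does not appear in the target mapping, such as birth_place, have no equivalent slot and would need separate handling.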
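
A sketch of that generation step in Python, under stated assumptions: the pairs are transcribed by hand from the two mappings above, the first target parameter found for an ontologyProperty wins, and source parameters without a counterpart (birth_place, death_place) are simply dropped. The function names are hypothetical, any template name passed in is a placeholder rather than the real Catalan infobox name, and the parameter values would still need their own link translation and Apertium translation.

```python
# (templateProperty, ontologyProperty) pairs transcribed from the mappings above.
EN_PHILOSOPHER = [
    ("name", "foaf:name"), ("birth_date", "birthDate"), ("birth_date", "birthYear"),
    ("birth_place", "birthPlace"), ("death_date", "deathDate"), ("death_date", "deathYear"),
    ("death_place", "deathPlace"), ("region", "region"), ("era", "era"),
    ("school_tradition", "philosophicalSchool"), ("main_interests", "mainInterest"),
    ("notable_ideas", "notableIdea"), ("influences", "influencedBy"), ("influenced", "influenced"),
]
CA_PHILOSOPHER = [
    ("nom", "foaf:name"), ("naixement", "birthDate"), ("mort", "deathDate"),
    ("regio", "region"), ("era", "era"), ("escola_tradicio", "philosophicalSchool"),
    ("interessos", "mainInterest"), ("idees", "notableIdea"),
    ("influencies", "influencedBy"), ("influencia", "influenced"),
]


def parameter_map(source: list[tuple[str, str]], target: list[tuple[str, str]]) -> dict[str, str]:
    """Map each source parameter to the target parameter sharing an ontologyProperty."""
    target_by_ontology: dict[str, str] = {}
    for template_prop, ontology_prop in target:
        target_by_ontology.setdefault(ontology_prop, template_prop)
    result: dict[str, str] = {}
    for template_prop, ontology_prop in source:
        # A source parameter may map to several ontology properties
        # (birth_date: birthDate, birthYear); the first with a target match wins.
        if ontology_prop in target_by_ontology:
            result.setdefault(template_prop, target_by_ontology[ontology_prop])
    return result


def build_target_infobox(template_name: str, source_params: dict[str, str], param_map: dict[str, str]) -> str:
    """Render the target infobox, dropping parameters that have no equivalent."""
    lines = ["{{" + template_name]
    for name, value in source_params.items():
        if name in param_map:
            lines.append(f"| {param_map[name]} = {value}")
    lines.append("}}")
    return "\n".join(lines)
```

Joining on the ontology property rather than pairing parameter names directly means each infobox only has to be mapped once, to the shared ontology, instead of once per language pair.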