Frequently Asked Questions
There are many ways to contribute to Apertium, from sending us lists of words or phrases that are translated incorrectly, to creating a new language pair or working on tools or user interfaces. Here are some questions frequently asked by users.
Our language data is in various formats, including XML and other human-editable texts. Language data is split into single-language packages that can analyse and generate a given language, and translation pairs that perform transfer and transformation between two languages. The single-language packages are shared amongst many pairs.
If you wish to contribute to the language-agnostic native tools, you'll need to know C++.
If you wish to contribute language data to Apertium, your contributions should fit into our existing pipeline; that is, they should be rule-based and deterministic. We will happily help you learn our formats and methods, and we know from experience that it is possible to learn and use Apertium in a short time.
We do not currently include any statistical or neural machine translation tools or methods. We are often asked if contributions can be made with statistical or neural systems, but for now they cannot.
For more information about how to contribute, see Contributing.
How do I start off?
Regardless of the kind of contribution you want to make, there are two things to start with: subscribe to the apertium-stuff mailing list, which is where most of the discussion takes place, and come and idle on the IRC channel.
To decide what you want to contribute, take a look at Development and Projects for some of our ideas about programming and extending the engine, and have a look at the Incubator if you're interested in linguistic issues. If you can't find anything that interests you, just send an email or ask someone on IRC; people will be happy to help.
How do I add or fix words?
If some words are unknown in a certain language pair, you can help out by simply writing a list of the words and their translations, e.g.
house; noun; casa; noun f
dog; noun; perro; noun m
into a file and sending it to the mailing list. Most likely you want the list called "apertium-stuff": subscribe to it, then send the file to the list as an attachment.
You can also send a spreadsheet file if you prefer.
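A plain list like the one above is all we need, but if you are curious how it relates to the dictionaries themselves, here is a minimal Python sketch that turns each line into an entry in the style of an Apertium bilingual XML dictionary. The mapping from "noun" to the symbol `n` and the exact entry layout are assumptions for illustration, not a description of how the maintainers import submitted lists.

```python
"""Sketch: turn a semicolon-separated word list into entries in the style
of an Apertium bilingual dictionary. The tag mapping is an assumption."""
from xml.sax.saxutils import escape

# Hypothetical mapping from the plain labels in the list to symbol names;
# unknown labels are passed through unchanged.
TAGS = {"noun": "n", "f": "f", "m": "m"}


def parse_line(line):
    """Split 'house; noun; casa; noun f' into (word, tags) for each side."""
    fields = [f.strip() for f in line.split(";")]
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    src, src_pos, tgt, tgt_pos = fields
    return (src, src_pos.split()), (tgt, tgt_pos.split())


def symbols(tags):
    """Render each tag as an <s n="..."/> symbol element."""
    return "".join(f'<s n="{TAGS.get(t, t)}"/>' for t in tags)


def to_entry(line):
    """Build one <e> entry: left side is the source word, right the target."""
    (src, src_tags), (tgt, tgt_tags) = parse_line(line)
    return (f"<e><p><l>{escape(src)}{symbols(src_tags)}</l>"
            f"<r>{escape(tgt)}{symbols(tgt_tags)}</r></p></e>")


word_list = """house; noun; casa; noun f
dog; noun; perro; noun m"""

for line in word_list.splitlines():
    if line.strip():
        print(to_entry(line))
```

Keeping the parsing and the XML rendering in separate functions makes it easy to swap in a different tag mapping for another language pair.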
How can I contribute my knowledge?
The Indirect contribution guide has some tips on how to contribute your knowledge of a language to create resources that we use in Apertium, such as
- Writing contrastive analyses
- Cataloguing resources
- Hand-translating text
- Converting dictionaries
- Contributing to related projects
How do I get more involved?
Next, you should install Apertium, lttoolbox and a language pair to play around with.
If you want to create or contribute to a language pair, go through the New language pair HOWTO. This is required reading for anyone who wants to get involved with developing Apertium language pairs. Also, take a look at Contributing to an existing pair, meant for those who want to contribute to existing language pairs. You can improve the quality of the translation for an existing pair by correcting errors in the dictionaries. You will find some hints on the page Finding errors in dictionaries.
Next up, the Apertium EU Workshop site is a comprehensive guide to rule-based machine translation with Apertium (originally made for a four-day course on Apertium for people with little background in machine translation); print it out and read it on the bus, train or boat.
If you're a student, Google Summer of Code (or Google Code-In for high-school students) is a good way to get involved with Apertium, and the ideas page there has lots of project tips if you're more interested in programming than in linguistics or language pairs. If your task requires a wiki account (for example, to adopt a page), contact a mentor to request an account so that you can edit the wiki.
Why are you using XML and not a database?
Isn't XML a really inefficient format for storing dictionaries? With all the spaces and tags, the files are hard to read. Would it not be better to keep all the information in a database, like Postgres or MySQL? Or even in ordinary text files?
XML has several advantages for us:
- Each data item is explicitly marked with a descriptive tag whose meaning is clear
- Document structure can be easily validated using DTDs or schemas
- Many tools exist for working with XML (conversion to and from other formats, interoperability).
- XML is quite easy to process with text-processing tools like sed, cut and awk.
- You can read more practical and theoretical details about the format we use to store dictionaries here: Morphological dictionary.
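To illustrate the point about easy processing, here is a minimal sketch that reads entries from a small, hand-written dictionary fragment using only Python's standard `xml.etree.ElementTree` module. The fragment is a simplified toy in the style of an Apertium bilingual dictionary, not a complete file; real dictionaries also declare an alphabet and symbol definitions, which this sketch ignores.

```python
"""Sketch: read entries from a simplified Apertium-style XML dictionary
fragment with the Python standard library only."""
import xml.etree.ElementTree as ET

# Toy fragment: a real file also contains <alphabet> and <sdefs>.
FRAGMENT = """
<dictionary>
  <section id="main" type="standard">
    <e><p><l>house<s n="n"/></l><r>casa<s n="n"/><s n="f"/></r></p></e>
    <e><p><l>dog<s n="n"/></l><r>perro<s n="n"/><s n="m"/></r></p></e>
  </section>
</dictionary>
"""


def side(elem):
    """Return (word, [symbol names]) for an <l> or <r> element."""
    word = (elem.text or "").strip()
    tags = [s.get("n") for s in elem.findall("s")]
    return word, tags


def entries(xml_text):
    """Yield ((left word, tags), (right word, tags)) for every <p> pair."""
    root = ET.fromstring(xml_text)
    for pair in root.iter("p"):
        yield side(pair.find("l")), side(pair.find("r"))


for (left, ltags), (right, rtags) in entries(FRAGMENT):
    print(f"{left} {ltags} -> {right} {rtags}")
```

Because every piece of information sits in a named element or attribute, a few lines of code are enough to extract it, with no custom parser needed.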
Does Apertium support separable verbs?
Several languages, for example most of the Germanic languages (with the exception of English) and Hungarian, have a phenomenon called "separable verbs", also called "attached prepositions" or other names. This is when the verb's infinitive has a part that is detached and displaced when the verb is conjugated. For example, in Afrikaans the verb "to announce" is "aankondig". The part "aan" is separated when the verb is conjugated, so for example:
- Sterrekundiges kondig [die ontdekking] aan.
- Astronomers announce [the discovery].
The stem "kondig" does not mean anything by itself, only in combination with the particle "aan"; however, this is not always the case. The past participle is formed by inserting "ge" between the particle and the stem, for example:
- Sterrekundiges het [die ontdekking] aangekondig.
- Astronomers have announced [the discovery].
The answer is yes, we do have a module created for exactly this purpose: Apertium-separable. However, this module has not yet been incorporated into many of our existing pairs.
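As a rough illustration of the problem such a module addresses, the toy Python sketch below reattaches the displaced particle "aan" to the stem "kondig", so that later stages could look up the single lemma "aankondig". The lexicon and the matching rule are hypothetical simplifications for illustration only; this is not how Apertium-separable is implemented, and it does not handle the past participle case shown above.

```python
"""Toy sketch of the idea behind separable-verb handling: rejoin a
displaced particle with its verb stem so later stages see one lemma.
NOT the Apertium-separable implementation; illustration only."""

# Hypothetical lexicon: (stem, particle) -> full infinitive.
SEPARABLE = {("kondig", "aan"): "aankondig"}


def join_separable(tokens):
    """Replace 'stem ... particle' with the full verb at the stem's position.

    Only the first separable verb found is rejoined; the particle token
    (the first one after the stem) is removed from the sentence.
    """
    out = list(tokens)
    for i, tok in enumerate(out):
        for (stem, particle), verb in SEPARABLE.items():
            if tok == stem and particle in out[i + 1:]:
                j = out.index(particle, i + 1)
                out[i] = verb
                del out[j]
                return out
    return out


sentence = "Sterrekundiges kondig die ontdekking aan .".split()
print(" ".join(join_separable(sentence)))
```

A real system also has to check that the particle belongs to this verb and not to something else in the clause, which is one reason a dedicated module is needed.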