Frequently Asked Questions
There are many ways to contribute to Apertium, from sending us lists of words or phrases that you find are translated incorrectly, to creating a new language pair or programming tools and user interfaces. Here are some questions frequently asked by users.
How do I start off?
Regardless of the kind of contribution you want to make, the two things to start with are subscribing to the mailing list apertium-stuff, which is where most of the discussion takes place, and idling on the IRC channel #apertium on irc.freenode.net.
To decide what you want to contribute, take a look at Development and Projects for some of the ideas we've had around programming and extending the engine, and have a look at the Incubator if you're interested in linguistic issues. If you can't find anything that interests you, just send an email or ask someone on IRC and they'll be happy to help.
How do I add or fix words?
If you have some words that are unknown in a certain language pair, you can help out simply by writing a list of the words and their translations into a file, e.g.
house; noun; casa; noun f
dog; noun; perro; noun m
and sending the file to the mailing list. Most likely you want the list called "apertium-stuff": subscribe to it, then attach the file and send it to apertium-stuff@lists.sourceforge.net.
You can also send a spreadsheet file if you prefer.
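This page does not prescribe a strict format for word lists. The sketch below assumes one entry per line with four semicolon-separated fields (source word; its tags; target word; its tags), as in the example entries above, and could be used to check a list before sending it. The function name `parse_word_list`, the `Entry` type and the four-field layout are hypothetical, chosen only for illustration.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """One line of the word list (assumed four-field layout)."""
    source: str
    source_tags: str
    target: str
    target_tags: str


def parse_word_list(text: str) -> list[Entry]:
    """Parse 'word; tags; translation; tags' lines, rejecting malformed ones."""
    entries = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # blank lines are allowed and skipped
        fields = [field.strip() for field in line.split(";")]
        if len(fields) != 4 or not all(fields):
            raise ValueError(f"line {lineno}: expected 4 non-empty fields, got {line!r}")
        entries.append(Entry(*fields))
    return entries


if __name__ == "__main__":
    sample = "house; noun; casa; noun f\ndog; noun; perro; noun m\n"
    for entry in parse_word_list(sample):
        print(entry)
```

A spreadsheet exported as CSV would need a different separator, so treat this only as a quick sanity check, not as an official submission format.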
How can I contribute my knowledge?
The Indirect contribution guide has some tips on how to contribute your knowledge of a language to create resources that we use in Apertium, such as
- Writing contrastive analyses
- Cataloguing resources
- Hand-translating text
- Converting dictionaries
- Contributing to related projects
How do I get more involved?
The first thing you should do if you want to get more involved is to introduce yourself on the mailing list and hang out on our IRC channel. There is also a list of Apertium mentors.
Next, install Apertium, lttoolbox and a language pair to play around with.
If you want to create or contribute to a language pair, go through the New language pair HOWTO. This is required reading for anyone who wants to get involved in developing Apertium language pairs. Also take a look at Contributing to an existing pair, which is meant for those who want to work on existing language pairs. You can improve the translation quality of an existing pair by correcting errors in its dictionaries; the page Finding errors in dictionaries has some hints.
Next up, the Apertium EU Workshop site is a comprehensive guide to rule-based machine translation with Apertium (originally made for a four-day course on Apertium for people with little background in machine translation); print it out and read it on the bus, train or boat.
If you're a student, Google Summer of Code (or Google Code-In, for high-school students) is a good way to get involved with Apertium, and the ideas page there has lots of project tips if you're more interested in programming than in linguistics and language pairs. If your task involves requesting a wiki account and adopting a page, contact a mentor to request an account so that you can edit the wiki.
Why are you using XML and not a database?
Isn't XML a really inefficient format for storing dictionaries? With all those spaces and tags, the files are complicated to read. Wouldn't it be better to keep all the information in a database, like PostgreSQL or MySQL? Or even in ordinary text files?
We use XML for the following reasons:
- Each data item is explicitly marked with a descriptive tag whose name has a clear meaning.
- Document structure can be easily validated using DTDs or schemas
- Many technologies exist for working with XML (conversion to and from other formats, interoperability).
- XML is quite easy to process with text-processing tools like sed, cut and awk.
- You can read more practical and theoretical details about the format we use to store dictionaries on the Morphological dictionary page.
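To illustrate the point about tooling, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree` that reads a tiny dictionary in the style of Apertium's .dix format and expands each entry into (surface form, analysis) pairs. The sample dictionary is heavily simplified (the real format is described on the Morphological dictionary page), the function name `expand` is hypothetical, and this is not how lttoolbox itself compiles dictionaries.

```python
import xml.etree.ElementTree as ET

# A trimmed-down monolingual dictionary in the style of an Apertium .dix file
# (simplified for illustration; real dictionaries contain far more).
DIX = """<dictionary>
  <alphabet>abcdefghijklmnopqrstuvwxyz</alphabet>
  <sdefs>
    <sdef n="n"/>
    <sdef n="sg"/>
    <sdef n="pl"/>
  </sdefs>
  <pardefs>
    <pardef n="house__n">
      <e><p><l></l><r><s n="n"/><s n="sg"/></r></p></e>
      <e><p><l>s</l><r><s n="n"/><s n="pl"/></r></p></e>
    </pardef>
  </pardefs>
  <section id="main" type="standard">
    <e lm="house"><i>house</i><par n="house__n"/></e>
  </section>
</dictionary>"""


def expand(dix_text: str) -> list[tuple[str, str]]:
    """Expand each entry of the first section into (surface form, analysis) pairs."""
    root = ET.fromstring(dix_text)
    # Index paradigm definitions by name so entries can refer to them.
    pardefs = {p.get("n"): p for p in root.iter("pardef")}
    forms = []
    for entry in root.find("section").findall("e"):
        stem = entry.find("i").text or ""
        paradigm = pardefs[entry.find("par").get("n")]
        for pe in paradigm.findall("e"):
            suffix = pe.find("p/l").text or ""  # empty <l></l> gives None
            tags = "".join(f"<{s.get('n')}>" for s in pe.find("p/r").findall("s"))
            forms.append((stem + suffix, entry.get("lm") + tags))
    return forms


if __name__ == "__main__":
    for surface, analysis in expand(DIX):
        print(surface, analysis)
```

Because every item is explicitly tagged, each piece of information is a single query away, which is the main practical argument for XML over an untagged text format.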
Does Apertium support separable verbs?
Several languages, for example most Germanic languages (with the exception of English) and Hungarian, have a phenomenon called "separable verbs", also known as "attached prepositions" or by other names. This is when part of the verb's infinitive is detached and moved elsewhere when the verb is conjugated. For example, in Afrikaans the verb "to announce" is "aankondig". The part "aan" is separated when the verb is conjugated, for example:
- Sterrekundiges kondig [die ontdekking] aan.
- Astronomers announce [the discovery].
The stem "kondig" does not mean anything by itself, only in combination with the particle "aan"; however, this is not always the case, and in other separable verbs the stem can be a word in its own right. The past participle is formed by inserting "ge" between the particle and the stem, for example:
- Sterrekundiges het [die ontdekking] aangekondig.
- Astronomers have announced [the discovery].
Essentially, no: for the moment we do not support separable verbs. The problem for Apertium arises when the stem does not mean anything on its own, because it is currently impossible to analyze a word in two parts when they are separated by something as open-ended as a noun phrase (NP). A number of hacks can be tried to work around this deficiency, but none of them work properly. If you would like more information, or have ideas on how to deal with the problem, please see our Separable verbs page.
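To make the difficulty concrete, here is a toy Python sketch of a purely word-by-word analyser; it is not how Apertium works. The lexicon, the function names and the "re-attach a clause-final particle" heuristic are all hypothetical illustrations; only the participle rule (particle + "ge" + stem) follows the description above.

```python
# Hypothetical toy lexicon (not Apertium data):
# separable verb -> (particle, stem, English gloss)
SEPARABLE = {"aankondig": ("aan", "kondig", "announce")}


def participle(verb: str) -> str:
    """Past participle: insert "ge" between the particle and the stem."""
    particle, stem, _ = SEPARABLE[verb]
    return particle + "ge" + stem


def lexicon() -> dict[str, str]:
    """Whole-word forms that a word-by-word analyser can look up."""
    forms = {}
    for verb, (_, _, gloss) in SEPARABLE.items():
        forms[verb] = gloss
        forms[participle(verb)] = gloss + " (past participle)"
    return forms


def analyse(sentence: str) -> list[str]:
    """Look up each token on its own; unknown tokens are marked with '*'."""
    forms = lexicon()
    return [forms.get(tok.lower(), "*" + tok) for tok in sentence.rstrip(".").split()]


def analyse_with_hack(sentence: str) -> list[str]:
    """Naive hack: if the sentence ends in a particle, re-attach it to an earlier stem."""
    tokens = [tok.lower() for tok in sentence.rstrip(".").split()]
    result = analyse(sentence)
    for verb, (particle, stem, gloss) in SEPARABLE.items():
        if tokens and tokens[-1] == particle and stem in tokens[:-1]:
            result[tokens.index(stem)] = gloss
            result[-1] = "(particle of " + verb + ")"
    return result


if __name__ == "__main__":
    print(analyse("Sterrekundiges het die ontdekking aangekondig."))
    print(analyse("Sterrekundiges kondig die ontdekking aan."))
    print(analyse_with_hack("Sterrekundiges kondig die ontdekking aan."))
```

The participle "aangekondig" is found because it is a single word, while "kondig ... aan" is not. The hack only succeeds here because the particle is the last word of the string and the stem appears earlier; it has no notion of clause boundaries and would also misfire wherever the same word ends a clause in another role, which is the kind of fragility that makes such workarounds unreliable.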