Frequently Asked Questions

From Apertium
Revision as of 04:45, 4 December 2019 by Joshuajy03

There are many ways to contribute to Apertium, from sending us lists of words or phrases you find that are incorrectly translated, to getting involved in creating a new language pair or programming on tools or user interfaces. Here are some questions frequently asked by users.

How do I start off?

Regardless of the kind of contribution you want to make, there are two things to start with: subscribe to the apertium-stuff mailing list, which is where most of the discussion takes place, and come and idle on the IRC channel #apertium on irc.freenode.net.

To decide what you want to contribute, take a look at Development and Projects for some of our ideas about programming and extending the engine, and have a look at the Incubator if you're interested in linguistic issues. If you can't find anything that piques your interest, just send an email or ask someone on IRC; they'll be happy to help.

How do I add or fix words?

If you have some words that are unknown in a certain language pair, you can help out by simply writing a list of words and their translations, e.g.

house; noun; casa; noun f
dog; noun; perro; noun m

into a file and sending it to the mailing list. Most likely you want to send it to the one called "apertium-stuff": subscribe to it, then attach the file and send it to apertium-stuff@lists.sourceforge.net.
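Before mailing a word list, it can help to check that every line follows the same layout. The minimal sketch below assumes the four semicolon-separated fields used in the example above (word, part of speech, translation, part of speech plus any gender); `parse_word_list` and `Entry` are hypothetical names chosen for illustration, not part of any Apertium tool.

```python
# Minimal sketch: check a word list in the "word; pos; translation; pos" layout
# from the example before sending it. The four-field layout is an assumption
# taken from that example; parse_word_list and Entry are hypothetical names.
from typing import List, NamedTuple


class Entry(NamedTuple):
    source: str       # word in the source language, e.g. "house"
    source_pos: str   # its part of speech, e.g. "noun"
    target: str       # translation, e.g. "casa"
    target_pos: str   # part of speech plus extra info, e.g. "noun f"


def parse_word_list(text: str) -> List[Entry]:
    """Split each non-empty line on ';' and require exactly four non-empty fields."""
    entries = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = [field.strip() for field in line.split(";")]
        if len(fields) != 4 or not all(fields):
            raise ValueError(f"line {lineno}: expected 4 non-empty fields, got {line!r}")
        entries.append(Entry(*fields))
    return entries


if __name__ == "__main__":
    sample = "house; noun; casa; noun f\ndog; noun; perro; noun m\n"
    for entry in parse_word_list(sample):
        print(entry)
```

A line with a missing field (for example `cat; noun; gato`) raises a `ValueError` naming the line number, which makes it easy to fix the file before attaching it.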

You can also send a spreadsheet file if you prefer.

How can I contribute my knowledge?

The Indirect contribution guide has some tips on how to contribute your knowledge of a language to create resources that we use in Apertium, such as

  • Writing contrastive analyses
  • Cataloguing resources
  • Hand-translating text
  • Converting dictionaries
  • Contributing to related projects

How do I get more involved?

The first thing you should do if you want to get more involved is to introduce yourself on the mailing list and hang out on our IRC channel. There is also a list of Apertium mentors.

Next, you should install apertium, lttoolbox and a language pair to experiment with.

If you want to create or contribute to a language pair, go through the New language pair HOWTO; it is required reading for anyone who wants to develop Apertium language pairs. If you want to work on a pair that already exists, also take a look at Contributing to an existing pair. One way to improve the translation quality of an existing pair is to correct errors in its dictionaries; you will find some hints on the page Finding errors in dictionaries.

Next up, the Apertium EU Workshop site is a comprehensive guide to rule-based machine translation with Apertium (originally made for a four-day course on Apertium for people with little background in machine translation). Print it out and read it on the bus, train or boat.

If you're a student, Google Summer of Code (or Google Code-In for high-school students) is a good way to get involved with Apertium, and the ideas page there has lots of project ideas if you're more interested in programming than in linguistics or language pairs. If your task involves requesting a wiki account and adopting a page, contact a mentor and ask for an account so that you can edit the wiki.

Why are you using XML and not a database?

Isn't XML a really inefficient format for storing dictionaries? With all those spaces and tags it is hard to read. Wouldn't it be better to keep all the information in a database, like Postgres or MySQL, or even in ordinary text files?

Reply: there are several reasons for using XML:

  • Each data item is explicitly tagged with a descriptive tag that has a clear meaning.
  • Document structure can easily be validated using DTDs or schemas.
  • Many technologies exist for working with XML (conversion to and from XML, interoperability).
  • XML is quite easy to process with text-processing tools like sed, cut and awk.

You can read more practical and theoretical details about the format we use to store dictionaries on the page Morphological dictionary.
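As an illustration of why explicit tags help, the minimal sketch below uses Python's standard `xml.etree.ElementTree` to read entries out of a small bilingual dictionary fragment with no custom tokenising. The fragment is simplified and only modelled on the lttoolbox XML format (`<e>`, `<p>`, `<l>`, `<r>`, `<s n="n"/>`); it is not taken from a real pair, and the helper names `side` and `entries` are hypothetical. The check that every symbol is declared in `<sdefs>` is a hand-written stand-in for real validation; validating against the actual DTD would need a validating parser such as lxml.

```python
# Minimal sketch: read a simplified, illustrative bilingual dictionary fragment
# (modelled on the lttoolbox XML format, not a real pair) with the standard
# library parser. Helper names `side` and `entries` are hypothetical.
import xml.etree.ElementTree as ET
from typing import List, Tuple

Side = Tuple[str, List[str]]  # (lemma, list of symbol tags)

BIDIX = """\
<dictionary>
  <sdefs>
    <sdef n="n"/>
    <sdef n="f"/>
    <sdef n="m"/>
  </sdefs>
  <section id="main" type="standard">
    <e><p><l>house<s n="n"/></l><r>casa<s n="n"/><s n="f"/></r></p></e>
    <e><p><l>dog<s n="n"/></l><r>perro<s n="n"/><s n="m"/></r></p></e>
  </section>
</dictionary>
"""


def side(elem: ET.Element) -> Side:
    """Return the lemma text of an <l> or <r> element and its <s> symbol names."""
    lemma = (elem.text or "").strip()
    tags = [s.get("n") for s in elem.findall("s")]
    return lemma, tags


def entries(xml_text: str) -> List[Tuple[Side, Side]]:
    """Collect (left, right) pairs, rejecting symbols not declared in <sdefs>."""
    root = ET.fromstring(xml_text)
    declared = {sdef.get("n") for sdef in root.iter("sdef")}
    result = []
    for pair in root.iter("p"):
        left, right = side(pair.find("l")), side(pair.find("r"))
        # Hand-written structural check, standing in for DTD validation
        for tag in left[1] + right[1]:
            if tag not in declared:
                raise ValueError(f"undeclared symbol: {tag}")
        result.append((left, right))
    return result


if __name__ == "__main__":
    for (l_lemma, l_tags), (r_lemma, r_tags) in entries(BIDIX):
        print(l_lemma, l_tags, "->", r_lemma, r_tags)
```

Running it prints one line per entry, for example `house ['n'] -> casa ['n', 'f']`. Because each piece of information sits in its own named element or attribute, the same data could be converted to another format without guessing where one field ends and the next begins.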