Difference between revisions of "Frequently Asked Questions"

From Apertium
m (Text replacement - "(chat|irc)\.freenode\.net" to "irc.oftc.net")
 
(28 intermediate revisions by 4 users not shown)
Line 1: Line 1:
{{Otherlang|Questions fréquentes|{{French}}}}
There are many ways to contribute to Apertium, from sending us lists of words or phrases you find that are incorrectly translated, to getting involved in creating a new language pair or programming on tools or user interfaces.


There are many ways to contribute to Apertium, from sending us lists of words or phrases you find that are incorrectly translated, to getting involved in creating a new language pair or programming on tools or user interfaces. Here are some questions frequently asked by users.
Regardless of the kind of contribution you want to do, the two things to start with are to subscribe to the mailing list [https://lists.sourceforge.net/lists/listinfo/apertium-stuff apertium-stuff], which is where most of the discussion goes on. Also, come and idle on the [[IRC|IRC channel]] <code>#apertium</code> on <code>irc.freenode.net</code>.


== The Blurb ==
To decide what you want to contribute, take a look at ''[[Development]]'' and ''[[Projects]]'' for some ideas we've had around programming, extending the engine, and have a look at the ''[[Incubator]]'' if you're interested in linguistic issues. If you can't find anything that interests you or piques your interest, just send an email or ask someone on irc and they'll be happy to help.
Apertium is a ''[https://en.wikipedia.org/wiki/Rule-based_machine_translation rule-based machine translation]'' toolchain and ecosystem, with many of our tools based on [https://en.wikipedia.org/wiki/Finite-state_transducer finite-state transducers].


Our language-agnostic tools are native and written in [https://en.wikipedia.org/wiki/C++ C++]. The various development helpers are mostly in [https://python.org/ Python].
==Adding/fixing unknown words==

Our language data is in various formats, including XML and other human-editable texts. Language data is split into single-language packages that can analyse and generate a given language, and translation pairs that perform transfer and transformation between two languages. The single-language packages are shared amongst many pairs.

If you wish to contribute to the language-agnostic native tools, you'll need to know C++.

If you wish to contribute language data to Apertium, your contributions should fit in our [[Apertium_system_architecture|existing pipeline]]. That is, they should be rule-based and deterministic. We will happily [[Contact|help you learn]] our formats and methods, and we know from experience that it is possible to learn and use Apertium in a short time.

We do not currently include any [https://en.wikipedia.org/wiki/Statistical_machine_translation statistical] or [https://en.wikipedia.org/wiki/Neural_machine_translation neural] machine translation tools or methods. We are often asked if contributions can be made with statistical or neural systems, but for now they cannot.

For more information about how to contribute, see [[Contributing]].

==How do I start off?==

Regardless of the kind of contribution you want to make, the two things to start with are subscribing to the mailing list [https://lists.sourceforge.net/lists/listinfo/apertium-stuff apertium-stuff], which is where most of the discussion goes on, and coming to idle on the [[IRC|IRC channel]] <code>#apertium</code> on <code>irc.oftc.net</code>.

To decide what you want to contribute, take a look at ''[[Development]]'' and ''[[Projects]]'' for some ideas we've had around programming and extending the engine, and have a look at the ''[[Incubator]]'' if you're interested in linguistic issues. If you can't find anything that interests you, just send an email or ask someone on IRC and they'll be happy to help.

==How do I add or fix words?==
If you have some words that are unknown in a certain language pair, you can help out by simply writing a list of words and their translations, e.g.
<pre>
<pre>
Line 16: Line 35:
You can also send a spreadsheet file if you prefer.


==Contribute language knowledge==
==How can I contribute my knowledge?==
The [[Indirect contribution guide]] has some tips on how to contribute your knowledge of a language to create resources that we use in Apertium, such as
* Writing contrastive analyses
Line 24: Line 43:
* Contributing to related projects


==Getting more involved==
==How do I get more involved?==
The first thing you should do if you want to get more involved is to introduce yourself on the [[mailing list]] and hang out on our [[IRC]] channel. There is also a [[list of Apertium mentors]].


Line 31: Line 50:
If you want to create or contribute to a language pair, go through the [[New language pair HOWTO]]. This is required reading for anyone who wants to get involved with developing Apertium language pairs. Also, take a look at [[Contributing to an existing pair]], meant for those who want to contribute to existing language pairs. You can improve the quality of the translation for an existing pair by correcting errors in the dictionaries. You will find some hints on the page [[Finding_errors_in_dictionaries]].


Next up, the [https://www.abumatran.eu/wp-content/uploads/2014/12/abumatran-apertium-workshop-data-guide.pdf Apertium EU Workshop site] is a comprehensive guide to rule-based machine translation with Apertium (originally made for a four-day course on Apertium for people with little background in machine translation); print this out and read it on the bus/train/boat.
<!-- link dead? :(

If you're a student, [[Google Summer of Code]] (or [https://codein.withgoogle.com/ Google Code-In] for high-school students) is a good way to get involved with Apertium, and the ideas page there has lots of project tips if you're more interested in programming than linguistics/language pairs. If you are working on the task of requesting a wiki account and adopting a page, contact a mentor to request an account so that you can edit the wiki.

==Why are you using XML and not a database?==
A common objection is that XML is a really inefficient format for storing dictionaries: with all those spaces and tags, the files are complicated to read. Would it not be better to have all the information in a database, like Postgres or MySQL, or even in ordinary text files? We prefer XML for the following reasons:

* Each data item is explicitly marked with a descriptive tag whose name has a clear meaning.
* Document structure can be easily validated using DTDs or schemas
* Several technologies exist for XML (conversion to and from XML, interoperability).
* XML is quite easy to process with text-processing tools like sed, cut and awk.
* For more practical and theoretical details about the format we use to store the dictionaries, see ''[[Morphological dictionary]]''.

==Does Apertium support separable verbs?==
Several languages, for example most of the Germanic languages (with the exception of English) and Hungarian, have a phenomenon called "separable verbs", also known as "attached prepositions" or by other names. Part of the verb's infinitive is detached and displaced when the verb is conjugated. For example, in Afrikaans the verb "to announce" is "aankondig". The part "aan" is separated when the verb is conjugated, for example:

* Sterrekundiges '''kondig''' [die ontdekking] '''aan'''.
* Astronomers '''announce''' [the discovery].

The stem "kondig" does not mean anything by itself, only in conjunction with the particle "aan"; however, this is not the case for every separable verb. The past participle is formed by inserting "ge" between the particle and the stem, for example:


* Sterrekundiges '''het''' [die ontdekking] '''aangekondig'''.
Next up, the [http://wiki.apertium.eu/index.php/Programme_overview Apertium EU Workshop site] is a comprehensive guide to rule based machine translation with Apertium (originally made for a four-day course on Apertium for people with little background in machine translation); print this out and read it on the bus/train/boat :) -->
* Astronomers '''have announced''' [the discovery].


The answer is yes, we do have a module created for exactly this purpose: [[Apertium separable|Apertium-separable]]. However, this module has not yet been incorporated into many of our existing pairs.
If you're a student, [[Google Summer of Code]] (or Google Code-In for high-school students) is a good way to get involved with Apertium, and the ideas page there has lots of project tips if you're more interested in programming than linguistics/language pairs. If you are on the first task, contact a mentor to request an account to gain access to edit the wiki.


[[Category:Documentation in English]]

Latest revision as of 06:27, 27 May 2021


There are many ways to contribute to Apertium, from sending us lists of words or phrases you find that are incorrectly translated, to getting involved in creating a new language pair or programming on tools or user interfaces. Here are some questions frequently asked by users.

The Blurb

Apertium is a rule-based machine translation toolchain and ecosystem, with many of our tools based on finite-state transducers.

Our language-agnostic tools are native and written in C++. The various development helpers are mostly in Python.

Our language data is in various formats, including XML and other human-editable texts. Language data is split into single-language packages that can analyse and generate a given language, and translation pairs that perform transfer and transformation between two languages. The single-language packages are shared amongst many pairs.

If you wish to contribute to the language-agnostic native tools, you'll need to know C++.

If you wish to contribute language data to Apertium, your contributions should fit in our existing pipeline. That is, they should be rule-based and deterministic. We will happily help you learn our formats and methods, and we know from experience that it is possible to learn and use Apertium in a short time.

We do not currently include any statistical or neural machine translation tools or methods. We are often asked if contributions can be made with statistical or neural systems, but for now they cannot.

For more information about how to contribute, see Contributing.

How do I start off?

Regardless of the kind of contribution you want to make, the two things to start with are subscribing to the mailing list apertium-stuff, which is where most of the discussion goes on, and coming to idle on the IRC channel #apertium on irc.oftc.net.

To decide what you want to contribute, take a look at Development and Projects for some ideas we've had around programming and extending the engine, and have a look at the Incubator if you're interested in linguistic issues. If you can't find anything that interests you, just send an email or ask someone on IRC and they'll be happy to help.

How do I add or fix words?

If you have some words that are unknown in a certain language pair, you can help out by simply writing a list of words and their translations, e.g.

house; noun; casa; noun f
dog; noun; perro; noun m

into a file, and sending that to the mailing list. Most likely you want to send to the one called "apertium-stuff"; subscribe here, then attach the file and send it to apertium-stuff@lists.sourceforge.net.

You can also send a spreadsheet file if you prefer.
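Before sending a list, it can help to check that every line has the expected four fields. The following is only a minimal sketch, assuming the semicolon-separated layout of the example above (source word; source category; target word; target category and gender). The names `Entry` and `parse_word_list` are hypothetical and not part of any Apertium tool.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    """One line of the word list (hypothetical helper type)."""
    source: str
    source_tags: str
    target: str
    target_tags: str


def parse_word_list(text: str) -> List[Entry]:
    """Parse lines such as 'house; noun; casa; noun f' into entries."""
    entries = []
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        fields = [field.strip() for field in line.split(";")]
        if len(fields) != 4:
            raise ValueError(
                f"line {number}: expected 4 fields, got {len(fields)}"
            )
        entries.append(Entry(*fields))
    return entries


WORDS = """
house; noun; casa; noun f
dog; noun; perro; noun m
"""

entries = parse_word_list(WORDS)
for entry in entries:
    print(entry.source, "->", entry.target)
```

A line with a missing field raises a `ValueError` naming the line number, so mistakes can be fixed before the file is sent.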

How can I contribute my knowledge?

The Indirect contribution guide has some tips on how to contribute your knowledge of a language to create resources that we use in Apertium, such as

  • Writing contrastive analyses
  • Cataloguing resources
  • Hand-translating text
  • Converting dictionaries
  • Contributing to related projects

How do I get more involved?

The first thing you should do if you want to get more involved is to introduce yourself on the mailing list and hang out on our IRC channel. There is also a list of Apertium mentors.

Next, you should install apertium, lttoolbox and some language pair to play around with.

If you want to create or contribute to a language pair, go through the New language pair HOWTO. This is required reading for anyone who wants to get involved with developing Apertium language pairs. Also, take a look at Contributing to an existing pair, meant for those who want to contribute to existing language pairs. You can improve the quality of the translation for an existing pair by correcting errors in the dictionaries. You will find some hints on the page Finding_errors_in_dictionaries.

Next up, the Apertium EU Workshop site is a comprehensive guide to rule-based machine translation with Apertium (originally made for a four-day course on Apertium for people with little background in machine translation); print this out and read it on the bus/train/boat.

If you're a student, Google Summer of Code (or Google Code-In for high-school students) is a good way to get involved with Apertium, and the ideas page there has lots of project tips if you're more interested in programming than linguistics/language pairs. If you are working on the task of requesting a wiki account and adopting a page, contact a mentor to request an account so that you can edit the wiki.

Why are you using XML and not a database?

A common objection is that XML is a really inefficient format for storing dictionaries: with all those spaces and tags, the files are complicated to read. Would it not be better to have all the information in a database, like Postgres or MySQL, or even in ordinary text files? We prefer XML for the following reasons:

  • Each data item is explicitly marked with a descriptive tag whose name has a clear meaning.
  • Document structure can be easily validated using DTDs or schemas
  • Several technologies exist for XML (conversion to and from XML, interoperability).
  • XML is quite easy to process with text-processing tools like sed, cut and awk.
  • For more practical and theoretical details about the format we use to store the dictionaries, see Morphological dictionary.
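As a rough illustration of the interoperability point, a few lines of Python with the standard library are enough to read entries out of a dictionary-like XML file. This is only a sketch: the fragment below is made up for the example and is merely modelled on the dictionary format (the element and attribute names `e`, `lm`, `i` and `par` are assumptions here; the Morphological dictionary page is the authoritative description), and `list_lemmas` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Made-up fragment for illustration; not copied from a real dictionary.
SAMPLE_DIX = """
<dictionary>
  <section id="main" type="standard">
    <e lm="house"><i>house</i><par n="house__n"/></e>
    <e lm="dog"><i>dog</i><par n="house__n"/></e>
  </section>
</dictionary>
"""


def list_lemmas(xml_text):
    """Return (lemma, paradigm name) pairs for every <e> element."""
    root = ET.fromstring(xml_text.strip())
    pairs = []
    for entry in root.iter("e"):
        paradigm = entry.find("par")
        # An entry without a <par> child gets None as its paradigm name.
        name = paradigm.get("n") if paradigm is not None else None
        pairs.append((entry.get("lm"), name))
    return pairs


lemmas = list_lemmas(SAMPLE_DIX)
for lemma, paradigm_name in lemmas:
    print(lemma, paradigm_name)
```

Because each item is explicitly tagged, the script does not need to know anything about line layout or whitespace in the file.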

Does Apertium support separable verbs?

Several languages, for example most of the Germanic languages (with the exception of English) and Hungarian, have a phenomenon called "separable verbs", also known as "attached prepositions" or by other names. Part of the verb's infinitive is detached and displaced when the verb is conjugated. For example, in Afrikaans the verb "to announce" is "aankondig". The part "aan" is separated when the verb is conjugated, for example:

  • Sterrekundiges kondig [die ontdekking] aan.
  • Astronomers announce [the discovery].

The stem "kondig" does not mean anything by itself, only in conjunction with the particle "aan"; however, this is not the case for every separable verb. The past participle is formed by inserting "ge" between the particle and the stem, for example:

  • Sterrekundiges het [die ontdekking] aangekondig.
  • Astronomers have announced [the discovery].

The answer is yes, we do have a module created for exactly this purpose: Apertium-separable. However, this module has not yet been incorporated into many of our existing pairs.
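To make the two patterns above concrete, here is a toy sketch in Python. It is not how Apertium-separable works and it covers only the Afrikaans example given here; the function names are hypothetical and the verb is assumed to be already split into its particle ("aan") and stem ("kondig").

```python
def present_clause(subject: str, particle: str, stem: str, obj: str) -> str:
    """Conjugated form: the stem follows the subject, the particle moves to the end."""
    return f"{subject} {stem} {obj} {particle}."


def past_participle(particle: str, stem: str) -> str:
    """Past participle: 'ge' is inserted between the particle and the stem."""
    return f"{particle}ge{stem}"


present = present_clause("Sterrekundiges", "aan", "kondig", "die ontdekking")
participle = past_participle("aan", "kondig")

print(present)     # Sterrekundiges kondig die ontdekking aan.
print(participle)  # aangekondig
```

The point of the sketch is that the particle and the stem end up in different positions of the sentence, which is why a translation system needs to recognise them as one lexical unit.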