Indirect contribution guide

From Apertium
Latest revision as of 05:58, 18 March 2015

Many people come to us with a question like "I'm not a programmer/linguist/whatever. Is there any way I can contribute?". This document is intended to show how you can make an "indirect" contribution, by documenting language resources, helping us to build bilingual test sets, translating, promoting, etc.

About This Tutorial

This tutorial will teach you:

  • How to create contrastive analyses.
  • How to catalogue resources.
  • How to convert dictionaries.
  • How to translate.
  • How to help "Apertium" in other ways.

Basics

When in doubt, ask!

If you are participating as part of a programme such as Google Summer of Code or Google Code-in, ask your mentor. Otherwise, ask for help on the IRC channel or on the mailing list. Use the talk pages on the wiki: leave your questions there, so you don't need to remember them later!

Create contrastive analyses

A 'contrastive analysis' is a set of example sentences which show the differences and similarities between a pair of languages. In a sense, it's a 'feature corpus' which we can use to develop and test rule hypotheses: if we see that the pattern noun + adjective becomes adjective + noun, then we have a good basis for building a rule. Think of it as 'raw input to a linguist': when we have a good enough idea of what a pair of languages look like, we later use these analyses to build translation rules.

One thing to note: when a pattern holds only 8 or 9 times out of 10, we need to expand the analysis of the exceptional cases to get a better idea of what is happening. Is it a certain class of words, or just a pure exception?

What you must do

Your task is to make a set of test sentences in the first language, and translate them to the other. A translation in a third language may be useful in enlisting help, but is not required.

A sample sentence in wiki markup looks like this:

* {{test|First language abbreviation|First language.|Second language.|Other translation.}}

Examples:

* {{test|ru|Чашка большая.|Чашата е голямата.|The cup is big.}}

* {{test|el|Τι γίνεσαι?|как си?|How are you?}}

* {{test|bg|Вера се оглежда в огледалото.|Вера смотрит на себя в зеркало.|Vera is looking at herself in the mirror.}}
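If you keep your sentences in a spreadsheet or a script, lines in this format can be generated mechanically. The following is a minimal Python sketch, assuming the field order shown in the sample above. The function name `format_test_line` is made up for illustration, and the check for `|` is only a guess at what would break the template.

```python
def format_test_line(lang, source, target, other=None):
    """Format one sentence pair as a {{test|...}} wiki line.

    Assumed field order: language code, sentence in the first language,
    sentence in the second language, optional third-language translation.
    """
    fields = [lang, source, target]
    if other:
        fields.append(other)
    for field in fields:
        # A literal '|' would be read as a field separator by the template.
        if "|" in field:
            raise ValueError("field contains '|': " + field)
    return "* {{test|" + "|".join(fields) + "}}"


# Reproduces the Greek example from the list above.
print(format_test_line("el", "Τι γίνεσαι?", "как си?", "How are you?"))
```

The third-language translation is optional here because the task description says it is not required.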

The following is a suggested list of features to provide coverage for. All language pairs have slightly different needs, but this list should provide a good general guideline.

  1. Simple syntax
    • Copula
    • Reported speech
    • Clitic placement
  2. Pronouns
    • Personal
    • Demonstrative
    • Relative
    • Possessive
    • Reflexive
    • Interrogative
  3. Nouns
    • General
    • Indefinite and definite forms
    • Noun phrases
      • Indefinite (a, some)
      • Definite (the)
      • Demonstrative (this, that)
      • Quantified (a few, no, all)
  4. Numerals
    • Cardinal
    • Ordinal
  5. Adjectives
    • Comparative
    • Superlative
  6. Adverbs
  7. Verbs
    • To be
    • General
    • Indicative mood
      • Present tense
      • Imperfect tense
      • Aorist tense
      • Perfect tense
      • Pluperfect tense
      • Future tense
    • Conditional mood
    • Imperative mood
  8. Questions

If you want, you can also add:

  1. Interjections
  2. Punctuation marks

You can find some examples here: Bulgarian and Russian, Bulgarian and Greek.

How to do it?

This is quite an easy task if you know both languages. However, if you only know one language well, concentrate on translating to that language. We can always find help with the other direction later, and it helps to know what we need to find help with.

That said, if you know people who can help, ask them to help you!

Some tips:

  1. Ask your friends if they know the language you don't know.
    • If they do, ask them to help you.
    • If you know more than one person who speaks the language, you will finish the job faster and better.
      • If your friend is busy, write your questions down and ask them later.
  2. Get a textbook for the language that you don't know.
    • For some languages this is not practical:
      • Textbooks are hard to find.
      • Textbooks are expensive.
    • Look on the Internet for textbooks.
      • There are many textbooks and it is a good idea to look at more than one.

Catalogue resources

The task is to catalogue as many of the available linguistic resources as possible. Dictionaries, grammars, academic papers, etc., are all resources that give us a general idea of how the language works, and therefore of how to translate from that language.

Linguistic resources:
  • wordlists.
  • grammatical descriptions.
  • dictionaries.
  • spellcheckers.
  • papers.
  • corpora.
  • and more...
How to do it?
  • Use Google, of course.
  • Try other search engines and academic databases, such as:
    • ScienceDirect.
    • JSTOR.
    • DuckDuckGo.

See also Wikipedia's article about search engines (http://en.wikipedia.org/wiki/Web_search_engine).

Translate

The task is to translate text (articles on this wiki, for example) into another language. You should speak the target language (the language you are translating to) natively or near-natively. If you don't, it's best to leave the translation to someone who does.

Some tips:
  • If you aren't sure about something, ask a native speaker or, if possible, a specialist.
  • You must not change the meaning of what you are translating.
    • Follow the meaning, not the words.
  • You must pay attention to language nuances.
  • You must pay attention to the tenses.
  • Follow the original style:
    • If the text style is wordy, colloquial, funny... follow it.
    • Pay attention to the punctuation.
Pay attention!
  • Don't translate if you don't know the language!
  • Don't take a translation task if you know that you can't do it!
  • Don't take a translation task if you don't have the time to do it!

Post-edit

Related to the above: post-edit an Apertium translation, and provide us with the result. Having a set of corrections to refer to can help us to refine the translator. We are particularly interested in Open Content text (text under Free licences, such as the GNU FDL or CC BY/BY-SA), which we can freely redistribute, and translations that are under a similar licence.

Even better, if you have used a translation memory tool (such as OmegaT, http://www.omegat.org), providing us with the TMX file can help us in a variety of ways.
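TMX is an XML format, so the sentence pairs in it can be read with standard tools. The following is a minimal Python sketch, assuming a file laid out as `tu` units that contain one `tuv` per language, each with a `seg`. The sample is a stripped-down illustration rather than a complete TMX file, and `read_tmx` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this expanded name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Stripped-down sample for illustration; a real file has a fuller header.
SAMPLE = """<tmx version="1.4"><header/><body>
<tu>
  <tuv xml:lang="en"><seg>The cup is big.</seg></tuv>
  <tuv xml:lang="ru"><seg>Чашка большая.</seg></tuv>
</tu>
</body></tmx>"""


def read_tmx(text):
    """Return one {language: segment} dict per translation unit."""
    root = ET.fromstring(text)
    units = []
    for tu in root.iter("tu"):
        unit = {}
        for tuv in tu.iter("tuv"):
            # Fall back to a plain 'lang' attribute, as an assumption
            # about older files.
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            seg = tuv.find("seg")
            if lang and seg is not None:
                # itertext() keeps text that sits inside inline markup.
                unit[lang] = "".join(seg.itertext())
        units.append(unit)
    return units


for unit in read_tmx(SAMPLE):
    print(unit)
```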

Convert dictionaries

There are many dictionaries available under free licences that we would like to have converted to Apertium's format. However, it's not always as simple as copying the words over: Apertium (usually) allows only a single translation option per word, and there are also some tagging differences that need to be present in Apertium's lexicon to assist in grammatical operations.
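One way to see how much manual work a word list needs is to find the source words that have more than one translation, since each of those needs a human decision. The following is a minimal Python sketch, assuming the list is already available as (source, target) pairs. The function name and the English-Spanish sample data are illustrative only.

```python
from collections import defaultdict


def find_ambiguous(pairs):
    """Return the source words that have more than one distinct translation."""
    translations = defaultdict(list)
    for source, target in pairs:
        # Ignore exact duplicates so they are not counted as extra options.
        if target not in translations[source]:
            translations[source].append(target)
    return {word: options for word, options in translations.items()
            if len(options) > 1}


# Illustrative data, not taken from any real dictionary file.
word_list = [("bank", "banco"), ("bank", "orilla"), ("cat", "gato"),
             ("cat", "gato")]

for word, options in find_ambiguous(word_list).items():
    print(word, "->", ", ".join(options))
```

Which translation to keep is a linguistic decision; the script only finds the candidates.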

Tagging

If the word list doesn't have part-of-speech information, you will need to add it.

Gender

If one or both of the languages have grammatical gender (masculine, feminine, neuter), the lexicon needs to have information about the gender when it is different for a set of words. It's not strictly necessary in other cases, but it is useful to have in the dictionary for other reasons, so we encourage you to always add grammatical gender!

Aspect

Similarly, if you are translating verbs in a language with aspect pairs (e.g., Slavic languages), tag the aspect (even though it's not usually strictly necessary between languages with similar concepts of aspect):

  • perf = perfective.
  • imperf = imperfective.
How to do it?
  • If you have friends who speak the language, ask for their help.
  • Look in a dictionary.
    • You can buy a dictionary; they are not very expensive.
    • You can search for a free dictionary on the web.
    • It's a good idea to do both.
  • Double check: don't rely on a single source.


Examples:

Bulgarian→Russian

Say you've found a Bulgarian-Russian dictionary, and it says that the imperfective verb обяснява translates into the imperfective обяснять, while the perfective напише translates into the perfective написать. The converted version of this should look like:

<e><p><l>обяснява<s n="vblex"/><s n="imperf"/><s n="tv"/></l><r>обяснять<s n="vblex"/><s n="imperf"/></r></p></e>  
<e><p><l>напише<s n="vblex"/><s n="perf"/><s n="tv"/></l><r>написать<s n="vblex"/><s n="perf"/></r></p></e> 

However, the difficult part is not getting it into this XML format, but getting each pair of verbs, and the important information for each pair (here: aspect), and making sure it is consistent and machine-readable.
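Once the verb pairs and their aspect are in a consistent, machine-readable form, producing the XML is the mechanical part. The following is a minimal Python sketch, assuming the pairs are held as (Bulgarian, Russian, aspect) tuples. The function names are made up for illustration, and the tag layout (`vblex`, aspect, `tv` on the left; `vblex`, aspect on the right) is simply copied from the two entries above, not a general rule for every language pair.

```python
from xml.sax.saxutils import escape, quoteattr

VALID_ASPECTS = {"perf", "imperf"}


def side(name, lemma, tags):
    """Build one <l> or <r> element: the lemma followed by its <s n="..."/> tags."""
    symbols = "".join("<s n={}/>".format(quoteattr(tag)) for tag in tags)
    return "<{0}>{1}{2}</{0}>".format(name, escape(lemma), symbols)


def bidix_entry(left, left_tags, right, right_tags):
    """Build one <e><p>...</p></e> entry as a single line of XML."""
    return "<e><p>{}{}</p></e>".format(side("l", left, left_tags),
                                       side("r", right, right_tags))


def convert_verbs(pairs):
    """Convert (left verb, right verb, aspect) tuples into entry lines."""
    lines = []
    for left, right, aspect in pairs:
        # Reject anything that is not one of the two aspect tags, so that
        # inconsistent input is caught before it reaches the dictionary.
        if aspect not in VALID_ASPECTS:
            raise ValueError("unknown aspect {!r} for {}".format(aspect, left))
        lines.append(bidix_entry(left, ["vblex", aspect, "tv"],
                                 right, ["vblex", aspect]))
    return lines


verbs = [("обяснява", "обяснять", "imperf"), ("напише", "написать", "perf")]
for line in convert_verbs(verbs):
    print(line)
```

The check on the aspect value is the point of the sketch: the script refuses inconsistent input instead of silently writing it out.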

Other ways to help

  • If you find mistakes in a translation or a pending test:
    • Let us know!
    • Correct them.
  • Ask on IRC or the mailing list (or, if you have one, your mentor) if there is another way to help.
  • You can also help us with website localisation!

Other projects

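If you want to contribute to a free/open-source language resources project, but Apertium doesn't quite fit your intentions, here are some other resources:

  • OmegaWiki, http://omegawiki.org (Creative Commons BY & GNU FDL)
  • Tatoeba, http://tatoeba.org (Creative Commons BY-SA)
  • Wiktionary, https://www.wiktionary.org/ (Creative Commons BY & GNU FDL)
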
We often use the data from these projects in our dictionaries, so by contributing to them, you are contributing to Apertium :-)