Indirect contribution guide

From Apertium
Revision as of 16:36, 2 January 2011 by Jimregan

Many people come to us with a question like "I'm not a programmer/linguist/whatever. Is there any way I can contribute?". This document is intended to show how you can make an "indirect" contribution, by documenting language resources, helping us to build bilingual test sets, translating, promoting, etc.

About This Tutorial

This tutorial will teach you:

  • How to create contrastive analyses.
  • How to catalog resources.
  • How to convert dictionaries.
  • How to translate.
  • How to help Apertium in other ways.


When in doubt, ask!

If you are participating as part of a programme such as Google Summer of Code, or Google Code-In, ask your mentor. Otherwise, ask for help on the IRC channel, or on the mailing list. Use the talk pages on the wiki -- leave your questions there, so you don't need to remember them later!

Create contrastive analyses

A 'contrastive analysis' is a set of example sentences which show the differences and similarities between a pair of languages. In a sense, it's a 'feature corpus' which we can use to develop and test rule hypotheses: if we see that the pattern noun + adjective becomes adjective + noun, then we have a good basis for building a rule. Think of it as 'raw input to a linguist': when we have a good enough idea of what a pair of languages look like, we later use these analyses to build translation rules.

Note that when a pattern holds only 9 times out of 10, or 8 times out of 10, we need to expand the exceptional part of the analysis to get a better idea of what is happening: is it a certain class of words, or just a pure exception?
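As a rough illustration of that bookkeeping, here is a minimal Python sketch that counts how often a pattern holds and lists the sentences that need more examples. The function name `summarize_pattern` and the sample data are hypothetical; they are not part of any Apertium tool.

```python
def summarize_pattern(observations):
    """Count how often a pattern holds and collect the exceptions.

    observations: list of (sentence, follows_pattern) pairs, where
    follows_pattern is True if the translation matches the expected rule.
    """
    exceptions = [sentence for sentence, ok in observations if not ok]
    followed = len(observations) - len(exceptions)
    return followed, len(observations), exceptions


# Hypothetical data: ten sentences checked against "noun + adjective
# becomes adjective + noun"; one of them does not follow the pattern.
observations = [(f"sentence {i}", True) for i in range(1, 10)]
observations.append(("sentence 10", False))

followed, total, exceptions = summarize_pattern(observations)
print(f"{followed}/{total} sentences follow the pattern")
for sentence in exceptions:
    print("needs more examples:", sentence)
```

Each sentence listed as an exception is a good starting point for writing a few more test sentences of the same kind.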

What you must do.

Your task is to make a set of test sentences in the first language and translate them into the other. A translation in a third language may be useful in enlisting help, but is not required.

A sample sentence in wiki markup looks like this:

* {{test|First language abbreviation|First language.|Second language.|Other translation.}}


* {{test|ru|Чашка большая.|Чашата е голямата.|The cup is big.}}
* {{test|el|Τι γίνεσαι?|как си?|How are you?}}
* {{test|bg|Вера се оглежда в огледалото.|Вера смотрит на себя в зеркало.|Vera is looking at herself in the mirror.}}
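If you keep your sentences in a spreadsheet or a text file, a small script can write the markup for you. The following is only a sketch in Python: the helper name `format_test_line` is hypothetical, and it assumes the third-language translation can simply be left out of the template when you do not have one.

```python
def format_test_line(lang, source, target, other=None):
    """Return one wiki-markup test line for a sentence pair."""
    fields = [lang, source, target]
    if other is not None:
        # The third-language translation is optional.
        fields.append(other)
    fields = [field.strip() for field in fields]
    for field in fields:
        # A literal pipe would be read as a template argument separator.
        if "|" in field:
            raise ValueError(f"field contains '|': {field!r}")
    return "* {{test|" + "|".join(fields) + "}}"


sentences = [
    ("ru", "Чашка большая.", "Чашата е голямата.", "The cup is big."),
]
for sentence in sentences:
    print(format_test_line(*sentence))
```

The script prints one line per sentence, ready to paste into the wiki page.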

The following is a suggested list of features to provide coverage for. All language pairs have slightly different needs, but this list should provide a good general guideline.

  1. Simple syntax
    • Copula
    • Reported speech
    • Clitic placement
  2. Pronouns
    • Personal
    • Demonstrative
    • Relative
    • Possessive
    • Reflexive
    • Interrogative
  3. Nouns
    • General
    • Indefinite and definite forms
    • Noun phrases
      • Indefinite (a, some)
      • Definite (the)
      • Demonstrative (this, that)
      • Quantified (a few, no, all)
  4. Numerals
    • Cardinal
    • Ordinal
  5. Adjectives
    • Comparative
    • Superlative
  6. Adverbs
  7. Verbs
    • To be
    • General
    • Indicative mood
      • Present tense
      • Imperfect tense
      • Aorist tense
      • Perfect tense
      • Pluperfect tense
      • Future tense
    • Conditional mood
    • Imperative mood
  8. Questions

If you want, you can also add:

  1. Interjections
  2. Punctuation marks

You can find some examples here: Bulgarian and Russian, Bulgarian and Greek.

How can you do it?

This is only easy if you know both languages.
If you do not know both of them, the tips below are worth reading.
One of the languages must be your native language, or one you know very well.
Here are some tips that can help you with your work.

  1. Ask your friends whether they know the language you don't know.
    • If they know it, ask them whether they would help you.
    • If you know more than one person who speaks the language, you will finish the job faster and better.