https://wiki.apertium.org/w/api.php?action=feedcontributions&user=Objectivesea&feedformat=atomApertium - User contributions [en]2024-03-28T10:26:48ZUser contributionsMediaWiki 1.34.1https://wiki.apertium.org/w/index.php?title=Bengali_and_English/BugsAndIssues&diff=30444Bengali and English/BugsAndIssues2011-12-23T00:11:03Z<p>Objectivesea: Made a few suggestions for indicating increased emphasis in English equivalents; these should be checked by a person fluent in Bengali and English</p>
<hr />
<div>{{TOCD}}<br />
<br />
== Nouns ==<br />
<br />
<s># Only 800 tagged pure nouns from anubadok dictionary matched against CRBLP's 20K most freq used word list<br />
:* need to tag more manually (en-es package has 5K approx. need to reach there)</s><br />
:* Anubadok has about 2000 Nouns in its own list<br />
:* Anubadok has about 2300 Proper Nouns in its own list<br />
# Some nouns are always pl or sg, need to tag those<br />
# <s>We are excluding Proper nouns now</s><br />
# We are excluding adjectives that can be used as nouns, right now<br />
# We are keeping track of the plural form generation through animacy; this is good, but in the long run we need to come up with something more sophisticated<br />
# Some nouns can have hybrid animacy; need to tag those later<br />
# Should we tag the subtype of Noun?<br />
<br />
=== Number ===<br />
# মা - মারা, জনক - জনকরা: these generated plurals are wrong; the correct forms are মা - মায়েরা and জনক - জনকেরা. We need a rule to fix this: either mark such nouns as irregular and enter them in a separate table, or find an adequate rule for them.<br />
# There is still some confusion about how to treat definite articles. The indefinite article '''a, an''' is translated as একটা/একটি. For the definite article ''the'', number needs to be taken into account. In the singular, we add টা/টি, e.g. বই - বইটা, মানুষ - মানুষটা, but only if the noun has low animacy. We can safely say বইটা (the book), বিড়ালটা (the cat), মানুষটা (the man), পাগলটা (the madman), but we cannot say রাষ্ট্রপতিটা (gloss: the president); apparently the affix is dropped as animacy gets higher, so রাষ্ট্রপতি can mean both 'president' and 'the president'. The plural is somewhat similar: adding গুলা/গুলি/গুলো to the end of a noun makes it plural and also carries an implicit 'the'. So বইগুলো - the books, বিড়ালগুলো - the cats, মানুষগুলো - the men, but we cannot say সন্যাসীগুলো (gloss: the saints). For higher-animacy plurals, রা or গণ is generally used, but these affixes express indefiniteness: for example, সন্যাসীরা/সন্যাসীগণ means 'saints', NOT 'the saints'. This issue needs to be resolved.<br />
<br />
== Pronouns ==<br />
<br />
== Adjective ==<br />
<br />
* Adjectives can have genitive forms, e.g. অল্পের জন্য বেঁচে গেছি। But this happens only when the adjective is used as a noun, so we need to add these adjectives as nouns too.<br />
<br />
== Verb ==<br />
<br />
* The gerund form of a verb can be used as a noun, so we need to add these gerunds to the noun table and mark them as inanimate.<br />
* Some verbs have alternate spellings that are equally acceptable, e.g. দেই - দিই for the verb দি - দে. The analyser will need to handle both forms. Further examples are উলটা - ওলটা, ঝুলা - ঝোলা, গুছা - গোছা। For now we will focus on only one of the forms.<br />
<br />
== Adverb ==<br />
<br />
* We are marking all the adverbs as <adv> and have not marked <cnjadv> properly; this needs to be changed ASAP<br />
<br />
== Determiner ==<br />
<br />
== Enclitic/Proclitic ==<br />
<br />
=== ও (O)===<br />
<br />
* ও (O): e.g. করে - করেও, পড়ে - পড়েও. When added to past participles, it adds the meaning of 'despite' or 'in spite of'.<br />
:* সে '''পড়ে''' পাস করতে পারল না - He could not pass by studying. সে '''পড়েও''' পাস করতে পারল না - He could not pass, despite studying.<br />
* ও (O): the same enclitic, when added to nouns and pronouns, carries the sense of 'also'/'too'.<br />
:* বাড়িটা - বাড়িটাও -> সে বাড়িটাও বিক্রি করে দিল - He sold the house too.<br />
:* আমি - আমিও -> সবার সাথে আমিও সেখানে গেলাম - I, too, went there '''or''' I, along with others, went there. <br />
* ও has the same effect on adjectives, adverbs and verbs<br />
:* Verb - সে কাজ করে এবং খায়ও খুব - He works and also eats a lot.<br />
:* Adjective - সে সুন্দরী এবং বুদ্ধিমতিও - She is pretty and intelligent as well.<br />
:* Adverb - তুমি এভাবেও কাজটি করতে পার - You can also do the work in this way.<br />
* ও, when added after a gerund, has the meaning of 'even' (adverb).<br />
:* সে পড়ারও সময় পেল না - He did not even get the time to read.<br />
<br />
=== ই (I) ===<br />
* When added after a verb, it acts as an emphasizer, e.g. করব - করবই: আমি কাজটি করব - I shall do the work; আমি কাজটি করবই - I will do the work / I shall surely do the work. The same holds for the infinitive, করে - করেই, e.g. আমি কাজটি করতে গেলাম - I went to do the work; আমি কাজটি করতেই গেলাম - I went only to do the work.<br />
* Adding it after a gerund is somewhat cosmetic, but it still adds emphasis: ওখানে যাওয়াটাই ভুল ছিল - (emph) Going there was a mistake [Can anyone suggest a better translation? :(]<br>: How about this? --> Going there was indeed a mistake.<br />
* ই, added after nouns or pronouns, similarly adds emphasis. রহিমই দোষী - (emph) Rahim is guilty.<br>: How about this? --> Rahim is indeed guilty.<br />
<br />
== Misc ==<br />
<br />
* The word 'কাছ':<br />
: সে আমার কাছে আসল - He came to me/ He came near me (Anubadok translates 'He came to me' - সে আমাকেতে আসেছিল, which is wrong ...)<br />
: সে আমার কাছের লোক - He is a close person of mine (The translation is still incorrect, I don't know the exact translation ...)<br>: How about this? --> He is a person close to me.<br />
: সে আমার কাছ থেকে বইটা নিল - He took the book from me.<br />
<br />
* Another word 'অল্প':<br />
: সে অল্পে খুশি (রয়েছে) - He is satisfied with less.<br />
: আমি অল্প খাই - I eat less<br />
: আমি অল্পের জন্য কাজটা করতে পারলাম না - I could not do the work for (less, or something) [it's tough doing word for word translation :(]</div>Objectiveseahttps://wiki.apertium.org/w/index.php?title=Building_dictionaries&diff=30441Building dictionaries2011-12-22T21:13:06Z<p>Objectivesea: Punctuation improvements; updated reference to Wikipedia</p>
<hr />
<div>{{TOCD}}<br />
Some of you have been brave enough to start to write new language pairs<br />
for Apertium. That makes me (and all of the Apertium troop) very happy<br />
and thankful, but more importantly, it makes Apertium useful to more<br />
people.<br />
<br />
I want to share some lessons I have learned after building<br />
some dictionaries: the importance of frequency estimates. For the new<br />
pairs to have the best possible coverage with a minimum of effort, it is<br />
very important to '''add words and rules in decreasing frequency''', starting<br />
with the most frequent words and phenomena.<br />
<br />
The reason that words should be added in order of frequency is quite intuitive: <br />
the higher the frequency, the more likely the word is to appear in the text you are<br />
trying to translate (see below for Zipf's law).<br />
<br />
For example, in English you can almost be sure that the words "the" or<br />
"a" will appear in all but the most basic sentences; however, how many<br />
times have you seen "hypothyroidism" or "obelisk" written? The higher the frequency of<br />
the word, the more you "gain" from adding it.<br />
<br />
==Frequency==<br />
A person's intuition on which words are important or frequent can be<br />
very deceptive. Therefore, the best one can do is collect a lot of text<br />
(millions of words, if possible) which is representative of what one<br />
wants to translate, and study the frequencies of words and phenomena.<br />
Get it from Wikipedia or from a newspaper archive, or write a robot that harvests<br />
it from the Web. <br />
<br />
It is quite easy to make a crude "hit parade" of words using a simple<br />
Unix command sequence (a single line):<br />
<br />
<pre><br />
$ cat mybigrepresentative.txt | tr ' ' '\012' | sort -f | uniq -c | sort -nr > hitparade.txt<br />
</pre><br />
<br />
[I took this from ''Unix for Poets'', I think.]<br />
<br />
Of course, this may be improved a lot but serves for illustration<br />
purposes.<br />
[[Image:Wikipedia-n-zipf.png|thumb|320px|right|'''Word frequency vs. Word rank''': A plot of word frequency in Wikipedia. The plot is in log-log coordinates. ''X'' is the rank of a word in the frequency table; ''Y'' is the total number of the word’s occurrences. Zipf's law corresponds to the upper linear portion of the curve, roughly following the green (1/''x'') line.]]<br />
<br />
You will find interesting properties in this list. One is that multiplying the rank of a word by its frequency gives a roughly constant number. That's called [http://en.wikipedia.org/wiki/Zipf%27s_law Zipf's Law].<br />
<br />
Another is that about '''half of the list''' consists of ''hapax legomena'' (words that appear only once).<br />
<br />
Third, with about 1,000 words you may have 75% of the text covered.<br />
<br />
So use lists like these when you are building dictionaries.<br />
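<br />
To check the coverage figure on your own corpus, something like the following may help. It is only a sketch: the function name <code>coverage</code> is made up for this example, and it assumes input in the "count word" format that <code>uniq -c | sort -nr</code> produces above.<br />

```shell
# coverage N: read "count word" lines on stdin (most frequent first) and
# print the percentage of all tokens accounted for by the first N lines.
# The name "coverage" is hypothetical, chosen for this illustration.
coverage() {
    awk -v n="$1" '
        { total += $1; if (NR <= n) top += $1 }
        END { if (total > 0) printf "%d\n", 100 * top / total }
    '
}

# Toy frequency list: 6 tokens in total, the top word accounts for 3 of them.
printf '3 the\n2 a\n1 obelisk\n' | coverage 1   # prints 50
```

On a real list you would run, for example, <code>coverage 1000 < hitparade.txt</code> and compare the result with the rough 75% figure quoted above.<br />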
<br />
If one of your languages is English, there are interesting lists:<br />
<br />
* [http://ogden.basic-english.org/words.html Ogden's Basic English] (850 words)<br />
* [http://www.voanews.com/specialenglish Voice of America's Special English]<br />
<br />
Bear in mind, of course, that these lists are also based on a particular usage model of English, which is not "naturally occurring" English.<br />
<br />
The same applies for other linguistic phenomena. Linguists tend to focus<br />
on very infrequent phenomena which are key to the identity of a<br />
language, or on what is different between languages. But these "jewels"<br />
are usually not the "building blocks" you would use to build translation<br />
rules. So do not get carried away. Trust only frequencies and lots of<br />
real text.<br />
<br />
==Corpus catcher==<br />
<br />
* http://translate.sourceforge.net/wiki/corpuscatcher/index<br />
<br />
==Wikipedia dumps==<br />
<br />
* http://download.wikimedia.org/backup-index.html<br />
<br />
For help in processing them, see:<br />
<br />
* http://meta.wikimedia.org/wiki/Help:Export<br />
<br />
The dumps need cleaning up (removing Wiki syntax, XML, etc.), but can<br />
provide a ''substantial'' amount of text &mdash; both for frequency analysis and<br />
as a source of sentences for POS [[tagger training]]. Cleaning them takes some work and isn't as<br />
easy as getting a ready-made corpus, but on the other hand the dumps are available<br />
in some [http://meta.wikimedia.org/wiki/List_of_Wikipedias 275 languages] with at least 100 articles written in each. <br />
<br />
You'll want the one entitled "Articles, templates, image descriptions,<br />
and primary meta-pages. -- This contains current versions of article<br />
content, and is the archive most mirror sites will probably want."<br />
<br />
Something like (for Afrikaans):<br />
<br />
<pre><br />
$ bzcat afwiki-20070508-pages-articles.xml.bz2 | grep '^[A-Z]' | sed 's/$/\n/g' |<br />
sed 's/\[\[[^]|]*|//g' | sed 's/\]\]//g' | sed 's/\[\[//g' |<br />
sed 's/&[^; ]*;/ /g'<br />
</pre><br />
<br />
This gives you a rough list with approximately one sentence per line<br />
(stripping out most of the extraneous formatting). Note that this presumes that your<br />
language uses the Latin alphabet; if it uses another writing system,<br />
you'll need to change the <code>grep '^[A-Z]'</code> filter.<br />
<br />
Try something like (for Afrikaans):<br />
<br />
<pre><br />
$ bzcat afwiki-20070508-pages-articles.xml.bz2 | grep '^[A-Z]' | sed 's/$/\n/g' | <br />
sed 's/\[\[[^]|]*|//g' | sed 's/\]\]//g' | sed 's/\[\[//g' | sed 's/&[^; ]*;/ /g' | tr ' ' '\012' | <br />
sort -f | uniq -c | sort -nr > hitparade.txt<br />
</pre><br />
<br />
Once you have this 'hitparade' of words, it is probably best to first skim <br />
off the top 20,000&ndash;30,000 into a separate file.<br />
<br />
<pre><br />
$ cat hitparade.txt | head -20000 > top.lista.20000.txt<br />
</pre><br />
<br />
Now, if you have already been working on a dictionary, chances are that this<br />
'top list' contains words you have already added. You can remove the word forms<br />
you are already able to analyse using (for example, for Afrikaans):<br />
<br />
<pre><br />
$ cat top.lista.20000.txt | apertium-destxt | lt-proc af-en.automorf.bin | apertium-retxt | grep '\/\*' > words_to_be_added.txt<br />
</pre><br />
<br />
(here <code>lt-proc af-en.automorf.bin</code> will analyse the input stream of Afrikaans words and put an asterisk * on those it doesn't recognise)<br />
<br />
For every 10 words or so you add, it's probably worth going back and repeating this step, especially <br />
for highly inflected languages &mdash; as one lemma can produce many word forms, and the wordlist<br />
is not lemmatised.<br />
<br />
==Getting cheap bilingual dictionary entries==<br />
<br />
A cheap way of getting bilingual dictionary entries between a pair of <br />
languages is as follows:<br />
<br />
First grab yourself a wordlist of ''nouns'' in language ''x''; for<br />
example, grab them out of the Apertium dictionary you are using:<br />
<br />
<pre><br />
$ cat <monolingual dictionary> | grep '<i>' | grep '__n\"' | awk -F'"' '{print $2}' <br />
</pre><br />
<br />
Next, write a basic script, something like:<br />
<br />
<pre><br />
#!/bin/sh<br />
<br />
#language to translate from<br />
LANGF=$2 <br />
#language to translate to<br />
LANGT=$3<br />
#filename of wordlist<br />
LIST=$1<br />
<br />
for LWORD in `cat $LIST`; do <br />
TEXT=`wget -q http://$LANGF.wikipedia.org/wiki/$LWORD -O - | grep 'interwiki-'$LANGT`; <br />
if [ $? -eq '0' ]; then<br />
RWORD=`echo $TEXT | <br />
cut -f4 -d'"' | cut -f5 -d'/' | <br />
python3 -c 'import sys, urllib.parse; print(urllib.parse.unquote(sys.stdin.read()));' |<br />
sed 's/(\w*)//g'`;<br />
echo '<e><p><l>'$LWORD'<s n="n"/></l><r>'$RWORD'<s n="n"/></r></p></e>'; <br />
fi;<br />
sleep 8;<br />
done<br />
</pre><br />
<br />
''Note: The "sleep 8" is so that we don't put undue strain on the Wikimedia servers.''<br />
<br />
If you save this as <code>iw-word.sh</code>, then you can use it at the command line:<br />
<br />
<pre><br />
$ sh iw-word.sh <wordlist> <language code from> <language code to><br />
</pre><br />
For example, to retrieve a bilingual wordlist from English to Afrikaans, use:<br />
<br />
<pre><br />
$ sh iw-word.sh en-af.wordlist en af<br />
</pre><br />
<br />
The method is of variable reliability. Reports of between 70% and 80% <br />
accuracy are common. It is best for unambiguous terms, but also works reasonably well where<br />
terms keep the same ambiguity across languages.<br />
<br />
Any correspondences produced by this method '''must''' be checked by native or <br />
fluent speakers of the language pairs in question.<br />
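<br />
One detail worth handling when turning article titles into entries: Wikipedia URLs use underscores for spaces, whereas Apertium dictionaries mark a blank inside a multiword with <code>&lt;b/&gt;</code>. A minimal sketch of an entry printer that makes this substitution follows; the function name <code>bidix_entry</code> is hypothetical and not part of any Apertium tool, and it only covers the noun case used in the script above.<br />

```shell
# bidix_entry LEFT RIGHT: print a noun entry for a bilingual dictionary,
# turning "_" (as found in Wikipedia article titles) into <b/>.
# Hypothetical helper for illustration; not part of Apertium.
bidix_entry() {
    l=$(printf '%s' "$1" | sed 's/_/<b\/>/g')
    r=$(printf '%s' "$2" | sed 's/_/<b\/>/g')
    printf '<e><p><l>%s<s n="n"/></l><r>%s<s n="n"/></r></p></e>\n' "$l" "$r"
}

bidix_entry South_Africa Suid-Afrika
```

In the loop above, this could stand in for the final <code>echo</code> line, e.g. <code>bidix_entry "$LWORD" "$RWORD"</code>. The output still needs checking by a fluent speaker.<br />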
<br />
==Monodix==<br />
{{main|Monodix}}<br />
<br />
If the language you're working with is fairly regular, and noun inflection is quite easy (for example English or Afrikaans), then the following script may be useful:<br />
<br />
You'll need a large wordlist (of all forms, not just lemmata) and some existing paradigms. It works by first taking all singular forms out of the list, then looking for plural forms, then printing out those which have both singular and plural forms in Apertium format.<br />
<br />
''Note: These will need to be checked, as no language except Esperanto is that regular.''<br />
<br />
<pre><br />
# set this to the location of your wordlist<br />
WORDLIST=/home/spectre/corpora/afrikaans-meester-utf8.txt<br />
<br />
# set the paradigm, and the singular and plural endings.<br />
PARADIGM=sa/ak__n<br />
SINGULAR=aak<br />
PLURAL=ake<br />
# set this to the number of characters that need to be kept from the singular form.<br />
# e.g. [0:-1] means 'cut off one character', [0:-2] means 'cut off two characters' etc.<br />
ECHAR=`echo -n $SINGULAR | python3 -c 'import sys; print(sys.stdin.read()[0:-1]);'`<br />
<br />
PLURALS=`cat $WORDLIST | grep $PLURAL$`<br />
SINGULARS=`cat $WORDLIST | grep $SINGULAR$`<br />
CROSSOVER=""<br />
<br />
for word in $PLURALS; do <br />
SFORM=`echo $word | sed "s/$PLURAL/$SINGULAR/g"`<br />
cat $WORDLIST | grep ^$SFORM$ > /dev/null<br />
# if the form is found then append it to the list<br />
if [ $? -eq 0 ]; then<br />
CROSSOVER=$CROSSOVER" "$SFORM<br />
fi<br />
done<br />
<br />
# print out the list<br />
for pair in $CROSSOVER; do<br />
echo ' <e lm="'$pair'"><i>'`echo $pair | sed "s/$SINGULAR/$ECHAR/g"`'</i><par n="'$PARADIGM'"/></e>';<br />
done<br />
</pre><br />
<br />
==See also==<br />
<br />
* [[Crossdics|How to cross language pairs]]<br />
* [[Getting bilingual dictionaries from OmegaWiki|Getting cheap bilingual dictionaries from OmegaWiki]]<br />
<br />
==Further reading==<br />
<br />
* Mark Pagel, Quentin D. Atkinson & Andrew Meade (2007) "Frequency of word-use predicts rates of lexical evolution throughout Indo-European history". ''Nature'' 449, 665<br />
:"Across all 200 meanings, frequently used words evolve at slower rates and infrequently used words evolve more rapidly. This relationship holds separately and identically across parts of speech for each of the four language corpora, and accounts for approximately 50% of the variation in historical rates of lexical replacement. We propose that the frequency with which specific words are used in everyday language exerts a general and law-like influence on their rates of evolution."<br />
<br />
[[Category:Documentation]]<br />
[[Category:Writing dictionaries]]<br />
[[Category:Documentation in English]]</div>Objectiveseahttps://wiki.apertium.org/w/index.php?title=Probl%C3%A8mes_lors_de_l%27installation&diff=30440Problèmes lors de l'installation2011-12-22T20:15:44Z<p>Objectivesea: /* PCRE ((Expressions regulières compatible POSIX) */ removed superfluous parenthesis</p>
<hr />
<div>Errors that may occur during [[Installation (français)|installation]], and their solutions.<br />
<br />
== Errors ==<br />
<br />
=== ''Package not found'' ===<br />
<br />
==== While running Apertium's configure script ====<br />
<br />
You may encounter an error similar to the following;<br />
the package named in the error may vary<br />
(the output below came from Mandriva Linux 2009).<br />
<br />
<pre><br />
checking pkg-config is at least version 0.9.0... yes<br />
checking for APERTIUM... configure: error: Package<br />
requirements (lttoolbox-3.0 >= 3.0.0 libxml-2.0 >= 2.6.17<br />
libpcre >= 6.4) were not met:<br />
<br />
No package 'lttoolbox-3.0' found<br />
<br />
Consider adjusting the PKG_CONFIG_PATH environment variable<br />
if you installed software in a non-standard prefix.<br />
<br />
Alternatively, you may set the environment variables<br />
APERTIUM_CFLAGS and APERTIUM_LIBS to avoid the need to<br />
call pkg-config. See the pkg-config man page for more details.<br />
</pre><br />
<br />
This happens because Apertium cannot find where the file <code>lttoolbox-3.0.pc</code> was installed. If you installed lttoolbox (which you should have done before trying to install Apertium) under a non-standard prefix (parent directory), or sometimes even under <code>/usr/local</code>, the configure script will not be able to find it.<br />
<br />
First, locate this file (it should be in <code>$(PREFIX)/lib/pkgconfig</code>), then run this command:<br />
<br />
<pre><br />
$ export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig<br />
</pre><br />
<br />
Replace <code>/usr/local</code> with the appropriate prefix.<br />
<br />
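Note that the <code>export</code> above overwrites any value <code>PKG_CONFIG_PATH</code> already had. If other software on your system relies on it, a variant that prepends the directory instead may be safer. This is a sketch only; the function name <code>prepend_pkg_config_path</code> is made up for this example.<br />

```shell
# prepend_pkg_config_path DIR: put DIR at the front of PKG_CONFIG_PATH,
# keeping whatever was already there. Hypothetical helper for illustration.
prepend_pkg_config_path() {
    # ${VAR:+...} expands to ":old-value" only when the variable is non-empty,
    # so no stray ":" is left behind when it was unset.
    PKG_CONFIG_PATH="$1${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}"
    export PKG_CONFIG_PATH
}

# Example use (adjust the prefix to wherever lttoolbox was installed):
#   prepend_pkg_config_path /usr/local/lib/pkgconfig
```

After calling it, run the configure script again from the same shell session.<br />
<br />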
==== While running the configure script for a language pair's data ====<br />
<br />
<pre><br />
checking pkg-config is at least version 0.9.0... yes<br />
checking for APERTIUM... configure: error: Package<br />
requirements (apertium-3.0 >= 3.0.0) were not met:<br />
<br />
No package 'apertium-3.0' found<br />
</pre><br />
<br />
This is similar to the error from Apertium's own configure script, but here it is <code>apertium-3.0.pc</code> that cannot be found. Set the PKG_CONFIG_PATH environment variable to the correct path.<br />
<br />
==== Workaround for apertium-3.1 ====<br />
<br />
If you installed from SVN, you may have checked out apertium-3.1, which for some reason (unknown to the author of this paragraph) causes autogen to fail.<br />
<br />
Here is the workaround used:<br />
<pre><br />
cd /usr/local/lib/pkgconfig/<br />
sudo cp lttoolbox-3.1.pc lttoolbox-3.0.pc<br />
sudo cp apertium-3.1.pc apertium-3.0.pc<br />
</pre><br />
<br />
=== ''Command not found'' ===<br />
<br />
==== While running make for a language pair's data ====<br />
<br />
You may encounter an error like the following when trying to compile the data for a language pair (for example, en-fr):<br />
<br />
<pre><br />
$ make<br />
make all-am<br />
make[1]: Entering directory `/<path>/apertium-en-fr'<br />
apertium-validate-dictionary apertium-en-fr.en.dixtmp1<br />
make[1]: apertium-validate-dictionary: Command not found<br />
make[1]: *** [en-fr.automorf.bin] Error 127<br />
make[1]: Leaving directory `/<path>/apertium-en-fr'<br />
make: *** [all] Error 2<br />
</pre><br />
<br />
This happens because the Apertium programs cannot be found through your PATH variable. Add their location to PATH (for example, for the ''bash'' shell: <code>export PATH=$PATH:/usr/local/bin</code> on the command line, or something similar in your user's .bash_profile file).<br />
<br />
=== ''Shared libraries'' ===<br />
<br />
<pre><br />
lt-comp: error while loading shared libraries:<br />
liblttoolbox3-3.0.so.0: cannot open shared object file:<br />
No such file or directory<br />
</pre><br />
<br />
This happens because lt-comp cannot find where the liblttoolbox shared libraries are installed. You may need to do one or more of the following:<br />
<br />
# If you installed it in an unusual location, run: <code>export LD_LIBRARY_PATH=/path/to/that/location</code><br />
# If you installed it in <code>/usr/local</code>:<br />
## Check whether <code>/usr/local/lib</code> is listed in <code>/etc/ld.so.conf</code>; if it is, run <code>ldconfig</code><br />
## Otherwise, add <code>/usr/local/lib</code> to <code>/etc/ld.so.conf</code> and run <code>ldconfig</code> again, or follow step 1.<br />
<br />
=== PCRE (POSIX-compatible regular expressions) ===<br />
<br />
<pre><br />
checking for pcreposix.h... no<br />
configure: error: *** unable to locate pcreposix.h include<br />
file ***<br />
</pre><br />
<br />
The header files for the PCRE library (POSIX-compatible regular expressions) are not installed. On Debian or Ubuntu, run:<br />
<br />
<pre><br />
# apt-get install libpcre3-dev<br />
</pre><br />
<br />
On Fedora, run:<br />
<br />
<pre><br />
# yum install pcre-devel<br />
</pre><br />
<br />
=== ''Missing pair'' (language pair not installed) ===<br />
<br />
<pre><br />
$ echo "Eso es un test" | apertium es-ca<br />
Error: Mode es-ca does not exist. Try one of:<br />
README<br />
</pre><br />
<br />
It appears that the language pair is not installed. Did you run <code>make install</code> from the language pair's directory?<br />
<br />
If you did, send the output of<br />
<br />
<pre><br />
$ cat /usr/local/bin/apertium | grep -e APERTIUM -e DEFAULT<br />
</pre><br />
<br />
... (replacing /usr/local/bin/apertium with the path to $(prefix)/bin/apertium, of course)<br />
<br />
along with the steps you took to compile apertium, to the apertium-stuff mailing list.<br />
<br />
=== ''You don't have cg-proc installed'' ===<br />
<br />
''Apertium for Welsh'' now requires the constraint grammar package to help with disambiguation. For installation instructions, [[Apertium_et_l'outil_Constraint_Grammar_(vislcg3)#Installation_de_VISL_CG3|see here]].<br />
<br />
[[Category:Installation]]<br />
[[Category:Documentation en français]]</div>Objectiveseahttps://wiki.apertium.org/w/index.php?title=Probl%C3%A8mes_lors_de_l%27installation&diff=30439Problèmes lors de l'installation2011-12-22T20:14:34Z<p>Objectivesea: Ajouté quelques traductions françaises</p>
<hr />
<div>Errors that may occur during [[Installation (français)|installation]], and their solutions.<br />
[[Category:Documentation en français]]</div>Objectiveseahttps://wiki.apertium.org/w/index.php?title=Category:English_machine_translation_examples&diff=30437Category:English machine translation examples2011-12-22T19:42:07Z<p>Objectivesea: Created page with 'Category:English'</p>
<hr />
<div>[[Category:English]]</div>Objectiveseahttps://wiki.apertium.org/w/index.php?title=Category:Funny_machine_translation_examples&diff=30436Category:Funny machine translation examples2011-12-22T19:38:35Z<p>Objectivesea: cat</p>
<hr />
<div>[[Category:Humour]]</div>Objectiveseahttps://wiki.apertium.org/w/index.php?title=Category:Humour&diff=30435Category:Humour2011-12-22T19:36:24Z<p>Objectivesea: cat</p>
<hr />
<div>[[Category:Top-level categories]]</div>Objectiveseahttps://wiki.apertium.org/w/index.php?title=Format_dictionaries&diff=30434Format dictionaries2011-12-22T19:29:46Z<p>Objectivesea: deleted a superfluous closing parenthesis, replaced a period with a colon</p>
<hr />
<div>You can use the [[Apertium-dixtools]] package to format each '''<code><e></code>''' tag in the dictionary.<br />
<br />
$ apertium-dixtools '''format-1line''' <dic> <dic.out><br />
<br />
(''Note that the first character in the '''1line''' parameter is the digit '''1''' (one), not the lowercase "L".'')<br />
<br />
For example, these lines:<br />
<br />
<pre><br />
...<br />
<e><br />
<p><br />
<l>estilo<s n="n"/></l><br />
<r>estil<s n="n"/></r><br />
</p><br />
</e><br />
...<br />
</pre><br />
<br />
will be displayed in one line, instead of being indented to various levels on six lines:<br />
<br />
<pre><br />
...<br />
<e><p><l>estilo<s n="n"/></l><r>estil<s n="n"/></r></p></e><br />
...<br />
</pre><br />
<br />
This single-line format can be useful if you use [http://es.wikipedia.org/wiki/Grep grep] or any similar tool to process dictionaries.<br />
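<br />
As a rough illustration of why one entry per line helps, the sketch below filters a dictionary fragment line by line, much as a basic grep would. The fragment and the helper name <code>grep_entries</code> are made up for this example; they are not part of apertium-dixtools.<br />
<br />
```python
import re

# Hypothetical dictionary fragment in the one-entry-per-line format.
ONE_LINE_DIX = """\
<e><p><l>estilo<s n="n"/></l><r>estil<s n="n"/></r></p></e>
<e><p><l>dum<s n="cnjadv"/></l><r>whereas<s n="cnjadv"/></r></p></e>
<e><p><l>kiam<s n="cnjadv"/></l><r>when<s n="cnjadv"/></r></p></e>
"""


def grep_entries(text, pattern):
    """Return the lines of `text` that match `pattern`, like a basic grep."""
    regex = re.compile(pattern)
    return [line for line in text.splitlines() if regex.search(line)]


# Each entry occupies exactly one line, so every match is a complete entry.
conjunctions = grep_entries(ONE_LINE_DIX, r'<s n="cnjadv"/>')
print(len(conjunctions))
```
<br />
With the indented multi-line layout, the same search would return only the matching fragment of each entry, not the whole entry.<br />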
<br />
== Aligned formatting ==<br />
You can also add two parameters, namely the position of the &lt;p&gt; element and the position of the &lt;r&gt; element.<br />
Here alignP = 10 and alignR = 50:<br />
<pre><br />
<!-- Conjunctions - Conjunctive adverb --><br />
<br />
<e> <p><l>antaŭ<b/>ol<s n="cnjadv"/></l> <r>before<s n="cnjadv"/></r></p></e><br />
<e> <p><l>tiel<b/>ke<s n="cnjadv"/></l> <r>so<b/>that<s n="cnjadv"/></r></p></e><br />
<e> <p><l>krom<b/>se<s n="cnjadv"/></l> <r>unless<s n="cnjadv"/></r></p></e><br />
<e> <p><l>dum<s n="cnjadv"/></l> <r>whereas<s n="cnjadv"/></r></p></e><br />
<e> <p><l>ĉar<s n="cnjadv"/></l> <r>because<s n="cnjadv"/></r></p></e><br />
<e r="RL"><p><l>dum<s n="cnjadv"/></l> <r>while<s n="cnjadv"/></r></p></e><br />
<e> <p><l>ĝis<s n="cnjadv"/></l> <r>until<s n="cnjadv"/></r></p></e><br />
<e> <p><l>kiam<s n="cnjadv"/></l> <r>when<s n="cnjadv"/></r></p></e><br />
<e i="yes"><p><l>kiam<s n="cnjadv"/></l> <r>as<s n="cnjadv"/></r></p></e><br />
<e> <p><l>kiel<s n="cnjadv"/></l> <r>as<s n="cnjadv"/></r></p></e><br />
<e r="LR"><p><l>pro<b/>tio<b/>ke<s n="cnjadv"/></l><r>since<s n="cnjadv"/></r></p></e><br />
</pre><br />
<br />
If a value is zero or negative, the corresponding element is not aligned.<br />
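<br />
The padding logic can be pictured with the following sketch. It is not the apertium-dixtools implementation: the function name <code>align_entry</code>, the regular expression and the choice to count columns from 0 are all assumptions made for illustration. Following the usage examples, where <code>0 50</code> aligns only the &lt;r&gt; element, a non-positive column leaves that element unaligned.<br />
<br />
```python
import re

# Matches a simple one-line entry: <e ...>, then <p><l>...</l>, then <r>... to the end.
ENTRY = re.compile(r'(<e[^>]*>)\s*(<p>.*?</l>)\s*(<r>.*)$')


def align_entry(entry, align_p, align_r):
    """Pad a one-line entry so that <p> and <r> start at the given columns.

    Columns are counted from 0 (an assumption of this sketch).
    A column that is zero or negative disables alignment for that element.
    """
    match = ENTRY.match(entry)
    if match is None:
        return entry  # not a simple <e><p><l>..</l><r>..</r></p></e> entry
    head, left, rest = match.groups()
    if align_p > 0:
        head = head.ljust(align_p)  # ljust never truncates; long heads overflow
    line = head + left
    if align_r > 0:
        line = line.ljust(align_r)
    return line + rest


entry = '<e><p><l>dum<s n="cnjadv"/></l><r>whereas<s n="cnjadv"/></r></p></e>'
print(align_entry(entry, 10, 50))
```
<br />
Entries whose opening tag is already longer than the requested column, such as <code>&lt;e i="yes"&gt;</code> with a column of 10, simply overflow in this sketch.<br />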
<br />
==Usage==<br />
<pre><br />
Usage: dictools format-1line [alignP alignR] <input-dic> <output-dic><br />
where alignP / alignR: column to align <p> and <r> entries. 0 = no indent.<br />
<br />
Example: ' format-1line old.dix new.dix ' will give indent à la<br />
<e><p><l>dum<s n="cnjadv"/></l><r>whereas<s n="cnjadv"/></r></p></e><br />
<br />
Example: ' format-1line 10 50 old.dix new.dix ' will give indent à la<br />
<e> <p><l>dum<s n="cnjadv"/></l> <r>whereas<s n="cnjadv"/></r></p></e><br />
<br />
Example: ' format-1line 0 50 old.dix new.dix ' will give indent à la<br />
<e><p><l>dum<s n="cnjadv"/></l> <r>whereas<s n="cnjadv"/></r></p></e><br />
<br />
Example: ' format-1line 10 0 old.dix new.dix ' will give indent à la<br />
<e> <p><l>dum<s n="cnjadv"/></l><r>whereas<s n="cnjadv"/></r></p></e><br />
</pre><br />
<br />
<br />
[[Category:Tools]]<br />
[[Category:Dixtools]]<br />
[[Category:Documentation in English]]</div>Objectiveseahttps://wiki.apertium.org/w/index.php?title=Format_dictionaries&diff=30433Format dictionaries2011-12-22T19:25:09Z<p>Objectivesea: /* Aligned formatting */ replaced tag delimiter characters with HTML entities so they would display properly in text description</p>
<hr />
<div>You can use the [[Apertium-dixtools]] package to format each '''<code><e></code>''' tag in the dictionary.<br />
<br />
$ apertium-dixtools '''format-1line''' <dic> <dic.out><br />
<br />
(''Note that the first character in the '''1line''' parameter is the digit '''1''' (one), not the lowercase "L".'')<br />
<br />
For example, these lines:<br />
<br />
<pre><br />
...<br />
<e><br />
<p><br />
<l>estilo<s n="n"/></l><br />
<r>estil<s n="n"/></r><br />
</p><br />
</e><br />
...<br />
</pre><br />
<br />
will be displayed in one line, instead of being indented to various levels on six lines.<br />
<br />
<pre><br />
...<br />
<e><p><l>estilo<s n="n"/></l><r>estil<s n="n"/></r></p></e><br />
...<br />
</pre><br />
<br />
This single-line format can be useful if you use [http://es.wikipedia.org/wiki/Grep grep] or any similar tool to process dictionaries.<br />
<br />
== Aligned formatting ==<br />
You can also add two parameters, namely the position of the &lt;p&gt; element and the position of the &lt;r&gt; element.<br />
Here alignP = 10 and alignR = 50:<br />
<pre><br />
<!-- Conjunctions - Conjunctive adverb --><br />
<br />
<e> <p><l>antaŭ<b/>ol<s n="cnjadv"/></l> <r>before<s n="cnjadv"/></r></p></e><br />
<e> <p><l>tiel<b/>ke<s n="cnjadv"/></l> <r>so<b/>that<s n="cnjadv"/></r></p></e><br />
<e> <p><l>krom<b/>se<s n="cnjadv"/></l> <r>unless<s n="cnjadv"/></r></p></e><br />
<e> <p><l>dum<s n="cnjadv"/></l> <r>whereas<s n="cnjadv"/></r></p></e><br />
<e> <p><l>ĉar<s n="cnjadv"/></l> <r>because<s n="cnjadv"/></r></p></e><br />
<e r="RL"><p><l>dum<s n="cnjadv"/></l> <r>while<s n="cnjadv"/></r></p></e><br />
<e> <p><l>ĝis<s n="cnjadv"/></l> <r>until<s n="cnjadv"/></r></p></e><br />
<e> <p><l>kiam<s n="cnjadv"/></l> <r>when<s n="cnjadv"/></r></p></e><br />
<e i="yes"><p><l>kiam<s n="cnjadv"/></l> <r>as<s n="cnjadv"/></r></p></e><br />
<e> <p><l>kiel<s n="cnjadv"/></l> <r>as<s n="cnjadv"/></r></p></e><br />
<e r="LR"><p><l>pro<b/>tio<b/>ke<s n="cnjadv"/></l><r>since<s n="cnjadv"/></r></p></e><br />
</pre><br />
<br />
If a value is zero or negative, the corresponding element is not aligned.<br />
<br />
==Usage==<br />
<pre><br />
Usage: dictools format-1line [alignP alignR] <input-dic> <output-dic><br />
where alignP / alignR: column to align <p> and <r> entries. 0 = no indent.<br />
<br />
Example: ' format-1line old.dix new.dix ' will give indent à la<br />
<e><p><l>dum<s n="cnjadv"/></l><r>whereas<s n="cnjadv"/></r></p></e><br />
<br />
Example: ' format-1line 10 50 old.dix new.dix ' will give indent à la<br />
<e> <p><l>dum<s n="cnjadv"/></l> <r>whereas<s n="cnjadv"/></r></p></e><br />
<br />
Example: ' format-1line 0 50 old.dix new.dix ' will give indent à la<br />
<e><p><l>dum<s n="cnjadv"/></l> <r>whereas<s n="cnjadv"/></r></p></e><br />
<br />
Example: ' format-1line 10 0 old.dix new.dix ' will give indent à la<br />
<e> <p><l>dum<s n="cnjadv"/></l><r>whereas<s n="cnjadv"/></r></p></e><br />
</pre><br />
<br />
<br />
[[Category:Tools]]<br />
[[Category:Dixtools]]<br />
[[Category:Documentation in English]]</div>Objectiveseahttps://wiki.apertium.org/w/index.php?title=Format_dictionaries&diff=30432Format dictionaries2011-12-22T19:22:30Z<p>Objectivesea: Corrected misspelling of element and clarified a potential ambiguity</p>
<hr />
<div>You can use the [[Apertium-dixtools]] package to format each '''<code><e></code>''' tag in the dictionary.<br />
<br />
$ apertium-dixtools '''format-1line''' <dic> <dic.out><br />
<br />
(''Note that the first character in the '''1line''' parameter is the digit '''1''' (one), not the lowercase "L".'')<br />
<br />
For example, these lines:<br />
<br />
<pre><br />
...<br />
<e><br />
<p><br />
<l>estilo<s n="n"/></l><br />
<r>estil<s n="n"/></r><br />
</p><br />
</e><br />
...<br />
</pre><br />
<br />
will be displayed in one line, instead of being indented to various levels on six lines.<br />
<br />
<pre><br />
...<br />
<e><p><l>estilo<s n="n"/></l><r>estil<s n="n"/></r></p></e><br />
...<br />
</pre><br />
<br />
This single-line format can be useful if you use [http://es.wikipedia.org/wiki/Grep grep] or any similar tool to process dictionaries.<br />
<br />
== Aligned formatting ==<br />
You can also add two parameters, namely the position of the &lt;p&gt; element and the position of the &lt;r&gt; element.<br />
Here alignP = 10 and alignR = 50:<br />
<pre><br />
<!-- Conjunctions - Conjunctive adverb --><br />
<br />
<e> <p><l>antaŭ<b/>ol<s n="cnjadv"/></l> <r>before<s n="cnjadv"/></r></p></e><br />
<e> <p><l>tiel<b/>ke<s n="cnjadv"/></l> <r>so<b/>that<s n="cnjadv"/></r></p></e><br />
<e> <p><l>krom<b/>se<s n="cnjadv"/></l> <r>unless<s n="cnjadv"/></r></p></e><br />
<e> <p><l>dum<s n="cnjadv"/></l> <r>whereas<s n="cnjadv"/></r></p></e><br />
<e> <p><l>ĉar<s n="cnjadv"/></l> <r>because<s n="cnjadv"/></r></p></e><br />
<e r="RL"><p><l>dum<s n="cnjadv"/></l> <r>while<s n="cnjadv"/></r></p></e><br />
<e> <p><l>ĝis<s n="cnjadv"/></l> <r>until<s n="cnjadv"/></r></p></e><br />
<e> <p><l>kiam<s n="cnjadv"/></l> <r>when<s n="cnjadv"/></r></p></e><br />
<e i="yes"><p><l>kiam<s n="cnjadv"/></l> <r>as<s n="cnjadv"/></r></p></e><br />
<e> <p><l>kiel<s n="cnjadv"/></l> <r>as<s n="cnjadv"/></r></p></e><br />
<e r="LR"><p><l>pro<b/>tio<b/>ke<s n="cnjadv"/></l><r>since<s n="cnjadv"/></r></p></e><br />
</pre><br />
<br />
If a value is zero or negative, the corresponding element is not aligned.<br />
<br />
==Usage==<br />
<pre><br />
Usage: dictools format-1line [alignP alignR] <input-dic> <output-dic><br />
where alignP / alignR: column to align <p> and <r> entries. 0 = no indent.<br />
<br />
Example: ' format-1line old.dix new.dix ' will give indent à la<br />
<e><p><l>dum<s n="cnjadv"/></l><r>whereas<s n="cnjadv"/></r></p></e><br />
<br />
Example: ' format-1line 10 50 old.dix new.dix ' will give indent à la<br />
<e> <p><l>dum<s n="cnjadv"/></l> <r>whereas<s n="cnjadv"/></r></p></e><br />
<br />
Example: ' format-1line 0 50 old.dix new.dix ' will give indent à la<br />
<e><p><l>dum<s n="cnjadv"/></l> <r>whereas<s n="cnjadv"/></r></p></e><br />
<br />
Example: ' format-1line 10 0 old.dix new.dix ' will give indent à la<br />
<e> <p><l>dum<s n="cnjadv"/></l><r>whereas<s n="cnjadv"/></r></p></e><br />
</pre><br />
<br />
<br />
[[Category:Tools]]<br />
[[Category:Dixtools]]<br />
[[Category:Documentation in English]]</div>Objectiveseahttps://wiki.apertium.org/w/index.php?title=User_talk:Objectivesea&diff=30422User talk:Objectivesea2011-12-22T12:14:17Z<p>Objectivesea: </p>
<hr />
<div>Bienvenue au Wiki d'Apertium. Merci pour votre aide! / Welcome to the Apertium Wiki. Thanks for your help! - [[User:Francis Tyers|Francis Tyers]] 20:06, 21 December 2011 (UTC)<br />
<br />
===General comments on copy-editing===<br />
Hmm, I was aiming for "if you're wanting" instead of "if you want", it feels more familiar. But if you think it's more standard the other way, that's fine. (And yes, I really do say "so" that much in real life) :) - [[User:Francis Tyers|Francis Tyers]] 21:08, 21 December 2011 (UTC)<br />
<br />
: I am having no trouble with the present progressive, but I am thinking it sounds a bit like an Indo-Pakistani dialectal version when not being used with an action verb. I may have removed too many instances of "so", but I like to restrict the word to where there is a causal relationship with the preceding sentence. In my day job I do verbatim transcriptions of debates in the British Columbia Legislative Assembly, and I tend to see "so" used in speeches mainly as a paragraph marker.<br>&mdash; [[User:Objectivesea|Objectivesea]] 21:16, 21 December 2011 (UTC)<br />
<br />
:: For me, "If you're wanting" is an Irishism (e.g. [http://www.mei.ie/index.php?option=com_content&task=view&id=19&Itemid=32 like here]), whereas "I am thinking"/"I am having" would be an Indo-Pakistanism. :) I tend to try and write how I speak, it's sometimes nice to hear someone's voice in their words. But I know that for second-language speakers of English it can be a problem. Btw, if you have some time, could you have a gleg at [[Starting a new language with lttoolbox]]. I wrote it today and yesterday, and it could probably do with a bit of sprucing up! - [[User:Francis Tyers|Francis Tyers]] 22:52, 21 December 2011 (UTC)<br />
<br />
::: English is my second language too (I spoke only Danish and Norwegian till I was 8), so I certainly appreciate and value regional modes of speaking it.<br />
::: I've made the edits to the page you asked me to look at, and I have e-mailed you about a possible error where my subject knowledge is inadequate.<br>&mdash; [[User:Objectivesea|Objectivesea]] 12:14, 22 December 2011 (UTC)</div>Objectiveseahttps://wiki.apertium.org/w/index.php?title=User:Objectivesea&diff=30421User:Objectivesea2011-12-22T11:13:30Z<p>Objectivesea: added cat</p>
<hr />
<div>Editor and indexer for ''Hansard,'' Legislative Assembly of British Columbia. Has studied linguistics, computer science and technical writing at Simon Fraser University, College of the Rockies, Okanagan University College and North Island College. Has worked in graphic design and prepress in British Columbia and elsewhere. Interested in web development, especially XHTML and CSS.<br />
<br />
[[Category:Users|Objectivesea]]</div>Objectiveseahttps://wiki.apertium.org/w/index.php?title=User:Objectivesea&diff=30420User:Objectivesea2011-12-22T11:11:55Z<p>Objectivesea: added cat</p>
<hr />
<div>Editor and indexer for ''Hansard,'' Legislative Assembly of British Columbia. Has studied linguistics, computer science and technical writing at Simon Fraser University, College of the Rockies, Okanagan University College and North Island College. Has worked in graphic design and prepress in British Columbia and elsewhere. Interested in web development, especially XHTML and CSS.<br />
<br />
[[Category:Users]]</div>Objectiveseahttps://wiki.apertium.org/w/index.php?title=Starting_a_new_language_with_lttoolbox&diff=30419Starting a new language with lttoolbox2011-12-22T10:46:06Z<p>Objectivesea: Minor English punctuation improvements throughout</p>
<hr />
<div>{{TOCD}}<br />
:''For information on how to install lttoolbox, see [[lttoolbox]] and [[minimal installation from SVN]]''<br />
<br />
This page describes how to start a new language with [[lttoolbox]]. As lttoolbox is not really suited to agglutinative languages, or to languages with complex and regular morphophonology (or at least no-one has yet written a dictionary from scratch using lttoolbox for one of these languages), we're going to work on one with simpler and less regular morphology. We particularly encourage people to use lttoolbox wherever possible: it has a straightforward syntax, has some very useful features for validation and is a canonical part of Apertium, so it does not require any special software to be installed.<br />
<br />
==Preliminaries==<br />
<br />
A morphological transducer in lttoolbox typically consists of a single file, a <code>.dix</code> file. This defines both how morphemes in the language are joined together, ''morphotactics'', and how changes happen when these morphemes are joined together, ''morphographemics'' (or ''morphophonology''). For example:<br />
<br />
* Morphotactics: wolf<n><pl> → wolf + s <br />
* Morphographemics: wolf + s → wolves <br />
<br />
These two phenomena are treated in the same file.<br />
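<br />
To make the distinction concrete, here is a toy sketch that keeps the two steps apart. The function names and the single spelling rule are invented for this example (the rule is far too crude for English in general: it would turn ''roof'' into ''rooves''); lttoolbox itself does not separate the steps like this.<br />
<br />
```python
def morphotactics(lemma, tags):
    """Join morphemes: pick the suffix that realises the given tags."""
    suffixes = {("n", "sg"): "", ("n", "pl"): "s"}
    return lemma, suffixes[tags]


def morphographemics(stem, suffix):
    """Apply spelling changes at the morpheme boundary (toy rule: f + s -> ves)."""
    if stem.endswith("f") and suffix == "s":
        return stem[:-1] + "ves"
    return stem + suffix


stem, suffix = morphotactics("wolf", ("n", "pl"))  # wolf + s
print(morphographemics(stem, suffix))
```
<br />
In a <code>.dix</code> file both steps are folded into the paradigm entries, which pair each finished surface ending directly with its tags.<br />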
<br />
==The language==<br />
<br />
The language we will be modelling is Upper Sorbian, a Slavic language spoken in Germany. There is a limited grammar available in English [http://serbscina.w.interia.pl/iso/eindex.htm here], and that is what we will be basing our analysis on. The part of speech we're going to look at for this small tutorial is nouns. Nouns in Upper Sorbian have seven cases (nominative, genitive, dative, accusative, locative, instrumental, vocative), three numbers (singular, dual, plural) and three genders (masculine, feminine, neuter). Like other Slavic languages, the category of animacy is distinguished in the masculine.<ref>This description is simplistic; the reality is more complicated, but it will do for a tutorial.</ref><br />
<br />
===Paradigms===<br />
<br />
Here we give four example paradigms; these will form the basis of our implementation.<br />
<br />
;Masculine animate (''nan'' "father")<br />
<br />
{|class=wikitable<br />
! !! Singular !! Dual !! Plural<br />
|-<br />
| Nominative || nan || nan'''aj''' || nan'''ojo'''<br />
|-<br />
| Genitive || nan'''a''' || nan'''ow''' || nan'''ow'''<br />
|-<br />
| Dative || nan'''ej''' || nan'''omaj''' || nan'''am'''<br />
|-<br />
| Accusative || nan'''a''' || nan'''ow''' || nan'''ow'''<br />
|- <br />
| Instrumental || nan'''om''' || nan'''omaj''' || nan'''ami'''<br />
|-<br />
| Locative || nan'''je''' || nan'''omaj''' || nan'''ach'''<br />
|- <br />
| Vocative || nan'''o'''! || nan'''aj'''! || nan'''ojo'''!<br />
|-<br />
|}<br />
<br />
;Masculine inanimate (''hrěch'' "sin")<br />
<br />
The differences from the masculine animate paradigm are indicated in blue.<br />
<br />
{|class=wikitable<br />
! !! Singular !! Dual !! Plural<br />
|-<br />
| Nominative || hrěch || hrěch'''aj''' || <span style="background-color:#cceeff">hrěch'''i'''</span><br />
|-<br />
| Genitive || hrěch'''a''' || hrěch'''ow''' || hrěch'''ow'''<br />
|-<br />
| Dative || hrěch'''ej''' || hrěch'''omaj''' || hrěch'''am'''<br />
|-<br />
| Accusative || <span style="background-color:#cceeff">hrěch</span> || <span style="background-color:#cceeff">hrěch'''aj'''</span> || <span style="background-color:#cceeff">hrěch'''i'''</span><br />
|- <br />
| Instrumental || hrěch'''om''' || hrěch'''omaj''' || hrěch'''ami'''<br />
|-<br />
| Locative || <span style="background-color:#cceeff">hrěch'''u'''</span> || hrěch'''omaj''' || hrěch'''ach'''<br />
|- <br />
| Vocative || hrěch'''o'''! || hrěch'''aj'''! || <span style="background-color:#cceeff">hrěch'''i'''</span>!<br />
|-<br />
|}<br />
<br />
;Feminine (''wróna'' "crow")<br />
<br />
The parts in common with the masculine paradigms are highlighted in green.<br />
<br />
{|class=wikitable<br />
! !! Singular !! Dual !! Plural<br />
|-<br />
| Nominative || wrón'''a''' || wrón'''je''' || wrón'''y'''<br />
|-<br />
| Genitive || wrón'''u''' || <span style="background-color:#ccffcc">wrón'''ow'''</span> || <span style="background-color:#ccffcc">wrón'''ow'''</span><br />
|-<br />
| Dative || wrón'''je''' || <span style="background-color:#ccffcc">wrón'''omaj'''</span> || <span style="background-color:#ccffcc">wrón'''am'''</span><br />
|-<br />
| Accusative || wrón'''u''' || wrón'''je''' || wrón'''y'''<br />
|- <br />
| Instrumental || wrón'''u''' || <span style="background-color:#ccffcc">wrón'''omaj'''</span> || <span style="background-color:#ccffcc">wrón'''ami'''</span><br />
|-<br />
| Locative || wrón'''je''' || <span style="background-color:#ccffcc">wrón'''omaj'''</span> || <span style="background-color:#ccffcc">wrón'''ach'''</span><br />
|- <br />
| Vocative || wrón'''a'''! || wrón'''je'''! || wrón'''u'''!<br />
|-<br />
|}<br />
<br />
;Neuter (''trašidło'' "monster")<br />
<br />
Forms in common with both the masculine and feminine paradigms are highlighted in red.<br />
<br />
{|class=wikitable<br />
! !! Singular !! Dual !! Plural<br />
|-<br />
| Nominative || trašidł'''o''' || trašidł'''e''' || trašidł'''a'''<br />
|-<br />
| Genitive || trašidł'''a''' || <span style="background-color:#ffcccc">trašidł'''ow'''</span> || <span style="background-color:#ffcccc">trašidł'''ow'''</span><br />
|-<br />
| Dative || trašidł'''u''' || <span style="background-color:#ffcccc">trašidł'''omaj'''</span> || <span style="background-color:#ffcccc">trašidł'''am'''</span><br />
|-<br />
| Accusative || trašidł'''o''' || trašidł'''e''' || trašidł'''a'''<br />
|- <br />
| Instrumental || trašidł'''om''' || <span style="background-color:#ffcccc">trašidł'''omaj'''</span> || <span style="background-color:#ffcccc">trašidł'''ami'''</span><br />
|-<br />
| Locative || trašidł'''e''' || <span style="background-color:#ffcccc">trašidł'''omaj'''</span> || <span style="background-color:#ffcccc">trašidł'''ach'''</span><br />
|- <br />
| Vocative || trašidł'''o'''! || trašidł'''e'''! || trašidł'''a'''!<br />
|-<br />
|}<br />
<br />
==Lexicon==<br />
<br />
Given the description above, how do we start to write a morphological description in [[lttoolbox]]? Well, first we start with our filename, <code>hsb.dix</code>, so open up a text editor and save an empty document with that name.<br />
<br />
===The basics===<br />
<br />
;The skeleton<br />
<br />
The basic skeleton of an lttoolbox dictionary looks like the following:<br />
<br />
<pre><br />
<br />
<dictionary><br />
<alphabet>abc...</alphabet><br />
<sdefs><br />
...<br />
</sdefs><br />
<pardefs><br />
...<br />
</pardefs><br />
<section id="main" type="standard"><br />
...<br />
</section><br />
</dictionary><br />
<br />
</pre><br />
<br />
So type this up into the file. This gives the outline of the main parts of our morphology: the alphabet (used for tokenisation); the symbols (or ''tags''), which give us useful mnemonics for grammatical features; the {{tag|pardefs}} section, which gives our inflectional paradigms; and finally the main section of the file, which contains our lexical items.<br />
<br />
;Symbol (tag) definitions<br />
<br />
The first thing we'll start with is the list of symbols which are going to encode our grammatical features (part of speech, gender, number, case). The page [[list of symbols]] gives some common tags in Apertium. Generally, we try to give a feature the same tag in every language; thus, for example, the tag for "nominative" will be {{tag|nom}}, regardless of whether we are talking about Romanian, Serbo-Croatian, Icelandic or Albanian. Symbols are defined in the {{tag|sdefs}} section with {{tag|sdef}} elements.<br />
<br />
<pre><br />
<br />
<sdefs><br />
<sdef n="n" c="Noun"/><br />
<br />
<sdef n="ma" c="Masculine (animate)"/><br />
<sdef n="mi" c="Masculine (inanimate)"/><br />
<sdef n="nt" c="Neuter"/><br />
<sdef n="f" c="Feminine"/><br />
<br />
<sdef n="sg" c="Singular"/><br />
<sdef n="du" c="Dual"/><br />
<sdef n="pl" c="Plural"/><br />
<br />
<sdef n="nom" c="Nominative"/><br />
<sdef n="gen" c="Genitive"/><br />
<sdef n="dat" c="Dative"/><br />
<sdef n="acc" c="Accusative"/><br />
<sdef n="ins" c="Instrumental"/><br />
<sdef n="loc" c="Locative"/><br />
<sdef n="voc" c="Vocative"/><br />
</sdefs><br />
<br />
</pre><br />
<br />
The <code>c</code> after each symbol definition stands for comment and is optional but quite convenient if you have a lot of tags and want a quick reference to what they mean.<br />
<br />
;Our first paradigm<br />
<br />
After we've defined our symbols, then the next thing to do is to write our first paradigm. We'll start with the paradigm for ''nan'' "father". There is a convention in Apertium that each major paradigm identifier is made up of at least the name of an exemplar word and its part of speech. In this case we will also add the gender.<br />
<br />
A paradigm is made up of a series of entries. Each entry has a ''pair'' ({{tag|p}}), which in turn has a ''left'' side ({{tag|l}}) and a ''right'' side ({{tag|r}}). Normally, the [[surface form]] is found on the left and the [[lexical form]] on the right.<br />
<br />
We can use the symbols we defined earlier with {{tag|sdef}} tags by calling them with the {{tag|s}} element.<br />
<br />
<pre><br />
<pardefs><br />
<pardef n="nan__n_ma"><br />
<e><p><l></l><r><s n="n"/><s n="ma"/><s n="sg"/><s n="nom"/></r></p></e><br />
<e><p><l>a</l><r><s n="n"/><s n="ma"/><s n="sg"/><s n="gen"/></r></p></e><br />
<e><p><l>ej</l><r><s n="n"/><s n="ma"/><s n="sg"/><s n="dat"/></r></p></e><br />
<e><p><l>a</l><r><s n="n"/><s n="ma"/><s n="sg"/><s n="acc"/></r></p></e><br />
<e><p><l>om</l><r><s n="n"/><s n="ma"/><s n="sg"/><s n="ins"/></r></p></e><br />
<e><p><l>je</l><r><s n="n"/><s n="ma"/><s n="sg"/><s n="loc"/></r></p></e><br />
<e><p><l>o</l><r><s n="n"/><s n="ma"/><s n="sg"/><s n="voc"/></r></p></e><br />
<br />
<e><p><l>aj</l><r><s n="n"/><s n="ma"/><s n="du"/><s n="nom"/></r></p></e><br />
<e><p><l>ow</l><r><s n="n"/><s n="ma"/><s n="du"/><s n="gen"/></r></p></e><br />
<e><p><l>omaj</l><r><s n="n"/><s n="ma"/><s n="du"/><s n="dat"/></r></p></e><br />
<e><p><l>ow</l><r><s n="n"/><s n="ma"/><s n="du"/><s n="acc"/></r></p></e><br />
<e><p><l>omaj</l><r><s n="n"/><s n="ma"/><s n="du"/><s n="ins"/></r></p></e><br />
<e><p><l>omaj</l><r><s n="n"/><s n="ma"/><s n="du"/><s n="loc"/></r></p></e><br />
<e><p><l>aj</l><r><s n="n"/><s n="ma"/><s n="du"/><s n="voc"/></r></p></e><br />
<br />
<e><p><l>ojo</l><r><s n="n"/><s n="ma"/><s n="pl"/><s n="nom"/></r></p></e><br />
<e><p><l>ow</l><r><s n="n"/><s n="ma"/><s n="pl"/><s n="gen"/></r></p></e><br />
<e><p><l>am</l><r><s n="n"/><s n="ma"/><s n="pl"/><s n="dat"/></r></p></e><br />
<e><p><l>ow</l><r><s n="n"/><s n="ma"/><s n="pl"/><s n="acc"/></r></p></e><br />
<e><p><l>ami</l><r><s n="n"/><s n="ma"/><s n="pl"/><s n="ins"/></r></p></e><br />
<e><p><l>ach</l><r><s n="n"/><s n="ma"/><s n="pl"/><s n="loc"/></r></p></e><br />
<e><p><l>ojo</l><r><s n="n"/><s n="ma"/><s n="pl"/><s n="voc"/></r></p></e><br />
</pardef><br />
</pardefs><br />
</pre><br />
<br />
;Using the paradigm<br />
<br />
Now that we've defined a paradigm, we can add a word that uses it. The obvious choice is "nan", since that is the name of the paradigm.<br />
<br />
<pre><br />
<br />
<section id="main" type="standard"><br />
<e lm="nan"><i>nan</i><par n="nan__n_ma"/></e> <br />
</section><br />
<br />
</pre><br />
<br />
The {{tag|e}} element is the same as in the paradigm, but in the case of lexical entries (as opposed to morphological entries), it commonly contains the attribute <code>lm</code> "lemma". The {{tag|i}} tag stands for "invariant" and means that the left side is the same as the right side.<br />
<br />
So by this point we should have a whole dictionary with a single word in it. Save the file.<br />
<br />
===Compiling===<br />
<br />
Once you've saved the file, you can go to the command line and try to validate it. Presuming that the file is called <code>hsb.dix</code>, the following will check it against the dictionary definition (the DTD):<br />
<br />
<pre><br />
$ apertium-validate-dictionary hsb.dix <br />
</pre><br />
<br />
If the dictionary is valid, you should get no output.<br />
<br />
Validation is a major benefit over related software (e.g. [[HFST]]). If you leave out a symbol definition, you will get an angry message from the validation script, such as the following:<br />
<br />
<pre><br />
$ apertium-validate-dictionary hsb.dix <br />
hsb.dix:25: element s: validity error : IDREF attribute n references an unknown ID "nom"<br />
hsb.dix:33: element s: validity error : IDREF attribute n references an unknown ID "nom"<br />
hsb.dix:41: element s: validity error : IDREF attribute n references an unknown ID "nom"<br />
Document hsb.dix does not validate against /home/fran/local/share/apertium/dix.dtd<br />
</pre><br />
<br />
In this case, it's best to go back and check that all your symbols are defined.<br />
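<br />
If you want to hunt down the missing symbols yourself, a check along the following lines can help. This is only a sketch: the real validation script checks the file against the DTD, whereas this code merely compares the names used in {{tag|s}} elements with those declared in {{tag|sdef}} elements. The embedded dictionary and the function name <code>undefined_symbols</code> are made up for the example.<br />
<br />
```python
import xml.etree.ElementTree as ET

# Hypothetical, deliberately incomplete dictionary: "nom" is used but not defined.
DIX = """\
<dictionary>
  <sdefs>
    <sdef n="n"/>
    <sdef n="ma"/>
    <sdef n="sg"/>
  </sdefs>
  <pardefs>
    <pardef n="nan__n_ma">
      <e><p><l></l><r><s n="n"/><s n="ma"/><s n="sg"/><s n="nom"/></r></p></e>
    </pardef>
  </pardefs>
</dictionary>
"""


def undefined_symbols(dix_text):
    """Return the sorted names used in <s n="..."/> but missing from <sdef>."""
    root = ET.fromstring(dix_text)
    defined = {sdef.get("n") for sdef in root.iter("sdef")}
    used = {s.get("n") for s in root.iter("s")}
    return sorted(used - defined)


print(undefined_symbols(DIX))
```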
<br />
Assuming that our dictionary is valid, we can move to the next step and compile it.<br />
<br />
<pre><br />
$ lt-comp lr hsb.dix hsb-mor.bin<br />
main@standard 29 45<br />
<br />
$ lt-comp rl hsb.dix hsb-gen.bin<br />
main@standard 29 45<br />
</pre><br />
<br />
The <code>lr</code> and <code>rl</code> in the compilation command stand for "left to right" and "right to left", respectively. Presuming that we have our surface form on the left and our lexical form on the right, compiling <code>lr</code> will make a morphological ''analyser'', and compiling <code>rl</code> will make a ''generator''.<br />
<br />
===Usage===<br />
{{see-also|lttoolbox}}<br />
We can then test them both as follows:<br />
<br />
<pre><br />
$ echo "nanow" | lt-proc hsb-mor.bin <br />
^nanow/nan<n><ma><du><gen>/nan<n><ma><du><acc>/nan<n><ma><pl><gen>/nan<n><ma><pl><acc>$<br />
<br />
$ echo "^nan<n><ma><pl><gen>$" | lt-proc -g hsb-gen.bin <br />
nanow<br />
</pre><br />
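<br />
One way to picture the two directions is as two lookup tables built from the same list of pairs. The sketch below is only an analogy (<code>lt-comp</code> builds finite-state transducers, not Python dictionaries), and the name <code>build</code> and the hand-copied pairs are assumptions for illustration.<br />
<br />
```python
from collections import defaultdict

# (surface form, lexical form) pairs for a few cells of the "nan" paradigm.
PAIRS = [
    ("nana", "nan<n><ma><sg><gen>"),
    ("nana", "nan<n><ma><sg><acc>"),
    ("nanow", "nan<n><ma><du><gen>"),
    ("nanow", "nan<n><ma><du><acc>"),
    ("nanow", "nan<n><ma><pl><gen>"),
    ("nanow", "nan<n><ma><pl><acc>"),
]


def build(pairs, direction):
    """Build a lookup table: 'lr' maps surface to lexical, 'rl' the reverse."""
    table = defaultdict(list)
    for surface, lexical in pairs:
        if direction == "lr":
            table[surface].append(lexical)
        else:
            table[lexical].append(surface)
    return dict(table)


analyser = build(PAIRS, "lr")   # surface form -> all possible analyses
generator = build(PAIRS, "rl")  # analysis -> surface form(s)

print("/".join(analyser["nanow"]))
print(generator["nan<n><ma><pl><gen>"][0])
```
<br />
The analyser returns every analysis that shares a surface form, which is why ''nanow'' comes back four ways, while the generator maps one fully specified analysis to a single form.<br />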
<br />
To get a full listing of the dictionary, the command <code>lt-expand</code> can be used:<br />
<br />
<pre><br />
$ lt-expand hsb.dix <br />
nan:nan<n><ma><sg><nom><br />
nana:nan<n><ma><sg><gen><br />
nanej:nan<n><ma><sg><dat><br />
nana:nan<n><ma><sg><acc><br />
nanom:nan<n><ma><sg><ins><br />
nanje:nan<n><ma><sg><loc><br />
nano:nan<n><ma><sg><voc><br />
nanaj:nan<n><ma><du><nom><br />
nanow:nan<n><ma><du><gen><br />
...<br />
</pre><br />
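<br />
To see where such a listing comes from, the sketch below expands a cut-down dictionary by hand: each stem in the main section is combined with every entry of the paradigm it calls. It handles only entries of the form <code>&lt;i&gt;stem&lt;/i&gt;&lt;par n="..."/&gt;</code> and is in no way a replacement for <code>lt-expand</code>; the helper names <code>side_text</code> and <code>expand</code> are invented for the example.<br />
<br />
```python
import xml.etree.ElementTree as ET

# Cut-down dictionary: three cells of the paradigm and one lexical entry.
DIX = """\
<dictionary>
  <pardefs>
    <pardef n="nan__n_ma">
      <e><p><l></l><r><s n="n"/><s n="ma"/><s n="sg"/><s n="nom"/></r></p></e>
      <e><p><l>a</l><r><s n="n"/><s n="ma"/><s n="sg"/><s n="gen"/></r></p></e>
      <e><p><l>ej</l><r><s n="n"/><s n="ma"/><s n="sg"/><s n="dat"/></r></p></e>
    </pardef>
  </pardefs>
  <section id="main" type="standard">
    <e lm="nan"><i>nan</i><par n="nan__n_ma"/></e>
  </section>
</dictionary>
"""


def side_text(element):
    """Render an <l>, <r> or <i> element as text, writing <s n="x"/> as <x>."""
    parts = [element.text or ""]
    for child in element:
        if child.tag == "s":
            parts.append("<" + child.get("n") + ">")
        parts.append(child.tail or "")
    return "".join(parts)


def expand(dix_text):
    """List surface:lexical pairs for entries of the form <i>stem</i><par/>."""
    root = ET.fromstring(dix_text)
    pardefs = {p.get("n"): p for p in root.iter("pardef")}
    lines = []
    for entry in root.find("section").findall("e"):
        stem = side_text(entry.find("i"))
        paradigm = pardefs[entry.find("par").get("n")]
        for suffix in paradigm.findall("e"):
            pair = suffix.find("p")
            surface = stem + side_text(pair.find("l"))
            lexical = stem + side_text(pair.find("r"))
            lines.append(surface + ":" + lexical)
    return lines


print("\n".join(expand(DIX)))
```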
<br />
We've got everything in place for building the dictionary. Now on to our next word.<br />
<br />
==Organising paradigms==<br />
<br />
The obvious thing to do when adding the word ''hrěch'' "sin" would be to duplicate the <code>nan__n_ma</code> paradigm, changing the gender and those surface forms that differ. Then we would end up with a new paradigm, something like:<br />
<br />
<pre><br />
<pardef n="hrěch__n_mi"><br />
<e><p><l></l><r><s n="n"/><s n="mi"/><s n="sg"/><s n="nom"/></r></p></e><br />
<e><p><l>a</l><r><s n="n"/><s n="mi"/><s n="sg"/><s n="gen"/></r></p></e><br />
<e><p><l>ej</l><r><s n="n"/><s n="mi"/><s n="sg"/><s n="dat"/></r></p></e><br />
<e><p><l></l><r><s n="n"/><s n="mi"/><s n="sg"/><s n="acc"/></r></p></e><br />
<e><p><l>om</l><r><s n="n"/><s n="mi"/><s n="sg"/><s n="ins"/></r></p></e><br />
<e><p><l>u</l><r><s n="n"/><s n="mi"/><s n="sg"/><s n="loc"/></r></p></e><br />
<e><p><l>o</l><r><s n="n"/><s n="mi"/><s n="sg"/><s n="voc"/></r></p></e><br />
<br />
<e><p><l>aj</l><r><s n="n"/><s n="mi"/><s n="du"/><s n="nom"/></r></p></e><br />
<e><p><l>ow</l><r><s n="n"/><s n="mi"/><s n="du"/><s n="gen"/></r></p></e><br />
<e><p><l>omaj</l><r><s n="n"/><s n="mi"/><s n="du"/><s n="dat"/></r></p></e><br />
    <e><p><l>aj</l><r><s n="n"/><s n="mi"/><s n="du"/><s n="acc"/></r></p></e><br />
<e><p><l>omaj</l><r><s n="n"/><s n="mi"/><s n="du"/><s n="ins"/></r></p></e><br />
<e><p><l>omaj</l><r><s n="n"/><s n="mi"/><s n="du"/><s n="loc"/></r></p></e><br />
<e><p><l>aj</l><r><s n="n"/><s n="mi"/><s n="du"/><s n="voc"/></r></p></e><br />
<br />
<e><p><l>i</l><r><s n="n"/><s n="mi"/><s n="pl"/><s n="nom"/></r></p></e><br />
<e><p><l>ow</l><r><s n="n"/><s n="mi"/><s n="pl"/><s n="gen"/></r></p></e><br />
<e><p><l>am</l><r><s n="n"/><s n="mi"/><s n="pl"/><s n="dat"/></r></p></e><br />
<e><p><l>i</l><r><s n="n"/><s n="mi"/><s n="pl"/><s n="acc"/></r></p></e><br />
<e><p><l>ami</l><r><s n="n"/><s n="mi"/><s n="pl"/><s n="ins"/></r></p></e><br />
<e><p><l>ach</l><r><s n="n"/><s n="mi"/><s n="pl"/><s n="loc"/></r></p></e><br />
<e><p><l>i</l><r><s n="n"/><s n="mi"/><s n="pl"/><s n="voc"/></r></p></e><br />
</pardef><br />
</pre><br />
<br />
We add an entry in the main section:<br />
<br />
<pre><br />
<e lm="hrěch"><i>hrěch</i><par n="hrěch__n_mi"/></e> <br />
</pre><br />
<br />
All is fine, and it's a good place to start, but if we look at the tables above, the paradigm for ''nan'' and the paradigm for ''hrěch'' share many suffixes. We can call paradigms from other paradigms, so why should we duplicate them? <br />
<br />
As an alternative, the first thing we do is to split out the common suffixes into a separate paradigm. Let's call it <code>common__m</code> (for common masculine suffixes).<br />
<br />
<pre><br />
<pardef n="common__m"><br />
<e><p><l></l><r><s n="sg"/><s n="nom"/></r></p></e><br />
<e><p><l>a</l><r><s n="sg"/><s n="gen"/></r></p></e><br />
<e><p><l>ej</l><r><s n="sg"/><s n="dat"/></r></p></e><br />
<e><p><l>om</l><r><s n="sg"/><s n="ins"/></r></p></e><br />
<e><p><l>o</l><r><s n="sg"/><s n="voc"/></r></p></e><br />
<br />
<e><p><l>aj</l><r><s n="du"/><s n="nom"/></r></p></e><br />
<e><p><l>ow</l><r><s n="du"/><s n="gen"/></r></p></e><br />
<e><p><l>omaj</l><r><s n="du"/><s n="dat"/></r></p></e><br />
<e><p><l>omaj</l><r><s n="du"/><s n="ins"/></r></p></e><br />
<e><p><l>omaj</l><r><s n="du"/><s n="loc"/></r></p></e><br />
    <e><p><l>aj</l><r><s n="du"/><s n="voc"/></r></p></e><br />
<br />
<e><p><l>ow</l><r><s n="pl"/><s n="gen"/></r></p></e><br />
<e><p><l>am</l><r><s n="pl"/><s n="dat"/></r></p></e><br />
<e><p><l>ami</l><r><s n="pl"/><s n="ins"/></r></p></e><br />
<e><p><l>ach</l><r><s n="pl"/><s n="loc"/></r></p></e><br />
</pardef><br />
</pre><br />
<br />
(Note: We don't include the part of speech or gender, as that is different depending on the lemma.)<br />
<br />
Now with this "common" paradigm available, we can simplify both the <code>nan__n_ma</code> and <code>hrěch__n_mi</code> paradigms, thusly:<br />
<br />
<pre><br />
<pardef n="nan__n_ma"><br />
<e><p><l></l><r><s n="n"/><s n="ma"/></r></p><par n="common__m"/></e><br />
<e><p><l>a</l><r><s n="n"/><s n="ma"/><s n="sg"/><s n="acc"/></r></p></e><br />
<e><p><l>je</l><r><s n="n"/><s n="ma"/><s n="sg"/><s n="loc"/></r></p></e><br />
<br />
<e><p><l>ow</l><r><s n="n"/><s n="ma"/><s n="du"/><s n="acc"/></r></p></e><br />
<br />
<e><p><l>ojo</l><r><s n="n"/><s n="ma"/><s n="pl"/><s n="nom"/></r></p></e><br />
<e><p><l>ow</l><r><s n="n"/><s n="ma"/><s n="pl"/><s n="acc"/></r></p></e><br />
<e><p><l>ojo</l><r><s n="n"/><s n="ma"/><s n="pl"/><s n="voc"/></r></p></e><br />
</pardef><br />
<br />
<pardef n="hrěch__n_mi"><br />
<e><p><l></l><r><s n="n"/><s n="mi"/></r></p><par n="common__m"/></e><br />
<e><p><l></l><r><s n="n"/><s n="mi"/><s n="sg"/><s n="acc"/></r></p></e><br />
<e><p><l>u</l><r><s n="n"/><s n="mi"/><s n="sg"/><s n="loc"/></r></p></e><br />
<br />
<e><p><l>oj</l><r><s n="n"/><s n="mi"/><s n="du"/><s n="acc"/></r></p></e><br />
<br />
<e><p><l>i</l><r><s n="n"/><s n="mi"/><s n="pl"/><s n="nom"/></r></p></e><br />
<e><p><l>i</l><r><s n="n"/><s n="mi"/><s n="pl"/><s n="acc"/></r></p></e><br />
<e><p><l>i</l><r><s n="n"/><s n="mi"/><s n="pl"/><s n="voc"/></r></p></e><br />
</pardef><br />
</pre><br />
<br />
Factoring out common suffixes makes paradigms more maintainable but also more complicated to understand. The features of the language, the depth of the description and the intuitions of the person writing the dictionary will dictate to what extent parts can be factored out in this way.<br />
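<br />
To see what a paradigm call amounts to, here is a minimal Python sketch that expands a cut-down version of the <code>hrěch__n_mi</code> and <code>common__m</code> paradigms into full forms. The tuple layout and the function names are assumptions made for illustration; this is not how lttoolbox stores paradigms internally.<br />
<br />
```python
# Hypothetical in-memory model of two pardefs (only a few entries each).
# Each entry is (suffix, tags, name of a paradigm to call, or None).
PARADIGMS = {
    "common__m": [
        ("", ["sg", "nom"], None),
        ("a", ["sg", "gen"], None),
        ("ow", ["pl", "gen"], None),
    ],
    "hrěch__n_mi": [
        ("", ["n", "mi"], "common__m"),          # prefix the tags, then call common__m
        ("u", ["n", "mi", "sg", "loc"], None),
        ("i", ["n", "mi", "pl", "nom"], None),
    ],
}

def expand(paradigm):
    """Yield (suffix, tags) pairs, following calls to other paradigms."""
    for suffix, tags, call in PARADIGMS[paradigm]:
        if call is None:
            yield suffix, tags
        else:
            for inner_suffix, inner_tags in expand(call):
                yield suffix + inner_suffix, tags + inner_tags

def forms(stem, paradigm):
    """Return a dict mapping lexical form -> surface form for one entry."""
    return {
        stem + "".join("<%s>" % tag for tag in tags): stem + suffix
        for suffix, tags in expand(paradigm)
    }

hrech = forms("hrěch", "hrěch__n_mi")
print(hrech["hrěch<n><mi><sg><gen>"])  # hrěcha
```
<br />
The entry that calls <code>common__m</code> contributes only the tags <code>n</code> and <code>mi</code>; the number and case tags come from the called paradigm, which is why the common paradigm can be shared between genders.<br />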
<br />
Now try and add the other two words to the dictionary, along with their inflectional paradigms. A solution can be found on the [[Talk:Starting a new language with lttoolbox|talk page]].<br />
<br />
You can also try adding the alternative forms (for example ''hrěchu'' as a possible genitive singular of ''hrěch'').<br />
<br />
==Notes==<br />
<references/><br />
<br />
==Further reading==<br />
<br />
==See also==<br />
<br />
* [[Monodix basics]]<br />
* [[Starting a new language with HFST]]<br />
<br />
[[Category:Documentation]]</div>Objectiveseahttps://wiki.apertium.org/w/index.php?title=Morphological_dictionary&diff=30403Morphological dictionary2011-12-21T22:14:02Z<p>Objectivesea: Mainly punctuation improvements; corrected apparent error in caption for ''Image:Finite-state transducer wound wounds.svg''; changed ''on the right'' to''above'' to prevent confusion</p>
<hr />
<div>{{TOCD}}<br />
<br />
This page intends to describe how and why the '''morphological dictionaries''' in Apertium work the way they do. Descriptions will be presented, along with examples in code. It is hoped that this will help people understand how the dictionaries work without needing to wade through pages of equations. Morphological dictionaries in Apertium (or more properly [[lttoolbox]]) are based on finite-state transducer technology; in this way they can also be referred to as ''lexical transducers''. The task of a morphological dictionary is to model the rules that govern the internal structure of words in a language. <br />
<br />
For example, speakers of English realise that the words "dog" and "dogs" are related, that "dogs" is to "dog" as "cats" is to "cat". The rules understood by the speaker reflect specific patterns and regularities in the way in which words are formed from smaller units and how those smaller units interact. <br />
<br />
Finite-state transducers are not the only way to model these rules. It is also possible to write the rules in scripting languages such as Perl or Python, or as a lexer (examples include the Buckwalter morphological analyser for Arabic, or IceMorphy for Icelandic). There are, however, a number of downsides to this approach:<br />
<br />
* The analysers created are not reversible; that is, you cannot use the same model to analyse ''and'' generate.<br />
* As the rule content may be both imperative and declarative, programs can be more complicated for non-experts to understand and edit.<br />
<br />
In contrast, finite-state transducers are both reversible (the same description can be used for both analysis and generation) and declarative (a ''description'' of the morphological rules is separate from the algorithm which processes them). Note that analysers may also be described as decorated tries or finite-state acceptors (for example [[hunmorph]]); this may be declarative but non-reversible (i.e. not applicable to generation).<br />
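<br />
Reversibility can be illustrated with a deliberately naive Python sketch in which the "description" is nothing more than a list of pairs. The names <code>PAIRS</code>, <code>analyse</code> and <code>generate</code> are invented for this example, and a real transducer is of course far more compact than a flat list.<br />
<br />
```python
# Illustrative only: a declarative description is just data,
# here a list of (surface form, lexical form) pairs.
PAIRS = [
    ("dog", "dog<n><sg>"),
    ("dogs", "dog<n><pl>"),
    ("cat", "cat<n><sg>"),
    ("cats", "cat<n><pl>"),
]

def analyse(surface):
    """Surface form -> all lexical forms (reads the pairs left to right)."""
    return [lexical for s, lexical in PAIRS if s == surface]

def generate(lexical):
    """Lexical form -> all surface forms (reads the same pairs right to left)."""
    return [surface for surface, l in PAIRS if l == lexical]

print(analyse("dogs"))         # ['dog<n><pl>']
print(generate("cat<n><pl>"))  # ['cats']
```
<br />
Neither function contains any knowledge of English; swapping the direction only changes which side of the pair is matched.<br />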
[[Image:Finite-state acceptor beer.svg|right|thumb|125px|A finite-state acceptor for the string "beer".]]<br />
==Finite-state automata==<br />
<br />
To start with, it is worth defining a finite-state automaton and how the two main types differ. This will not be an exhaustive description, just an overview so that the difference can be distinguished for the purposes of this article. To begin with, some terminology; if you are familiar with graphs (as in the data structure), this might help. A finite-state automaton can be visualised as a graph, with the nodes representing '''states''' and the arcs representing '''transitions'''. <br />
<br />
===Acceptors===<br />
<br />
A finite-state acceptor (or '''recogniser'''), as seen to the right, is an automaton which accepts or rejects input strings. Taking the example at the right, we can see the automaton has:<br />
<br />
* a number of possible '''input characters''', or '''alphabet''' (the characters 'b', 'e' and 'r'), denoted as <math>\Sigma</math><br />
* a '''start state'''; in formal definitions this is usually labelled <math>s_0</math><br />
* a number of intermediate '''states''', often denoted by <math>S</math><br />
* two '''final states''', denoted by <math>F</math><br />
* a number of '''transitions'''<br />
<br />
We can crudely emulate this in a programming language such as Python in order to get an idea of the behaviour of these automata.<br />
<br />
<div style="padding: 1em;border: 1px dashed #2f6fab;color: black;background-color: #f9f9f9;line-height: 1.1em; font-size: 85%"><br />
<source lang="python"><br />
import sys<br />
<br />
states = ['b', 'e', 'e', 'r']    # Symbol expected in each non-final state<br />
current_state = 0                # Set current state to start state<br />
<br />
for c in sys.stdin.read().rstrip('\n'):    # Input loop<br />
    if current_state < len(states) and c == states[current_state]:    # If the input matches the value of the current state<br />
        current_state += 1<br />
    else:    # If the input does not match, or we have run past the final state<br />
        sys.stdout.write('No\n')<br />
        sys.exit(1)<br />
<br />
if current_state == len(states):    # If we've reached the final state with no input left<br />
    sys.stdout.write('Yes\n')<br />
else:<br />
    sys.stdout.write('No\n')<br />
    sys.exit(1)<br />
</source><br />
</div><br />
When the input on <code>stdin</code> is "beer", the program outputs {{sc|yes}}; otherwise it outputs {{sc|no}}.<br />
[[Image:Finite-state acceptor be ast r.svg|thumb|125px|right|This finite-state acceptor will accept any string defined by the regular expression <code>be*r</code>, i.e. br, ber, beer, beeer, beeeer, ...]]<br />
<pre><br />
$ echo "beer" | python fsa.py<br />
Yes<br />
<br />
$ echo "bee" | python fsa.py<br />
No<br />
</pre><br />
It is worth noting that a finite-state acceptor can accept any string that can be defined by a regular expression. For example, if we want to accept the expression <code>bee*r</code>, which can also be written <code>be+r</code> &mdash; that is, "b" followed by one or more "e" followed by "r" (e.g. ber, beer, beeer, ...) &mdash; we could do it with a finite-state acceptor. Finite-state acceptors can be used in applications such as spell-checking, where one of the basic tasks is to check if a word exists or not in a list of words. Using an acceptor is more efficient than the equivalent list, for reasons which will be outlined below.<br />
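<br />
As a rough illustration, the expression <code>be+r</code> can be emulated with an explicit transition table, where the repetition becomes a loop from a state back to itself. The state numbering and the name <code>accepts</code> are assumptions made for this sketch.<br />
<br />
```python
# Hypothetical state numbering; state 3 is the only final state.
TRANSITIONS = {
    (0, 'b'): 1,
    (1, 'e'): 2,   # the first, obligatory "e"
    (2, 'e'): 2,   # loop: any number of further "e"
    (2, 'r'): 3,
}
FINAL_STATES = {3}

def accepts(word):
    """Return True if the acceptor ends in a final state after reading word."""
    state = 0
    for c in word:
        if (state, c) not in TRANSITIONS:
            return False           # no transition: reject
        state = TRANSITIONS[(state, c)]
    return state in FINAL_STATES

for word in ['ber', 'beer', 'beeeer', 'br', 'bee']:
    print(word, accepts(word))
```
<br />
Here "ber", "beer" and "beeeer" are accepted, while "br" (no "e") and "bee" (no final "r") are rejected.<br />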
<br />
<br />
{{comment|:fill in more formal stuff here}}<br />
<br />
===Transducers===<br />
[[Image:Finite-state transducer beer.svg|thumb|right|125px|A finite-state transducer for the strings "beer" and "beers"; the output is the ''lemma'' of the word, "beer", the ''part-of-speech'', <code><n></code> for "noun" and then the number, either singular (<code><sg></code>) or plural (<code><pl></code>).]]<br />
Whilst acceptors are useful, for morphological analysis we need something that will give us an output for a given input. For example, given a [[surface form]] of a word, it will give us the [[lexical form]] (analysis); or given the lexical form, it will give us the surface form (generation). For this, we need a '''transducer'''. A transducer is very much like an acceptor, with this main difference: instead of each transition consuming a character from the input, each transition consumes a character and outputs a character. So instead of having a symbol on each arc, we have a tuple, input and output (see diagram to the right).<br />
<br />
The diagram to the right shows a finite-state transducer<ref>In particular this is a "letter transducer"; that is, each transition is modelled as an arc between two letters in the alphabet</ref> for the strings "beer" and "beers". This transducer has:<br />
<br />
* an '''input alphabet''', <math>\Sigma</math> (the characters 'b', 'e', 'r' and 's')<br />
* an '''output alphabet''', <math>\Gamma</math> (the characters 'b', 'e', 'r' and the multi-character symbols <code><n></code>, <code><sg></code> and <code><pl></code>)<br />
* a '''start state''', <math>s_0</math><br />
* a number of intermediate '''states''', <math>S</math><br />
* a set of '''final states''', <math>F</math><br />
* a number of '''transitions'''<br />
<br />
Note how in the diagram, the "non-accepting" state ({{sc|no}}) has been left out.<br />
<br />
Again, we can emulate this transducer with some Python code: <br />
<br />
<div style="padding: 1em;border: 1px dashed #2f6fab;color: black;background-color: #f9f9f9;line-height: 1.1em; font-size: 85%"><br />
<source lang="python"><br />
import sys<br />
<br />
transitions = {(0,'b'):1, (1,'e'):2, (2,'e'):3, (3,'r'):4, (4,''):5, (4,'s'):6, (5,''):7, (6,''):7}<br />
states = {0:'b', 1:'e', 2:'e', 3:'r', 4:'<n>', 5:'<sg>', 6:'<pl>', 7:''}    # Output symbol emitted when leaving each state<br />
<br />
current_state = 0    # Start state<br />
<br />
def step(state, symbol):    # The current state and input symbol<br />
    sys.stdout.write(states[state])    # Print the output symbol of the transition<br />
    return transitions[(state, symbol)]    # Return the next state<br />
<br />
c = sys.stdin.read(1)<br />
while states[current_state] != '':    # While we aren't in a final state<br />
    current_state = step(current_state, c)    # Step to the next state<br />
<br />
    c = sys.stdin.read(1).replace('\n', '')<br />
<br />
sys.stdout.write('\n')<br />
</source><br />
</div><br />
<br />
<pre><br />
$ echo "beer" | python fst.py <br />
beer<n><sg><br />
<br />
$ echo "beers" | python fst.py <br />
beer<n><pl><br />
</pre><br />
<br />
====Determinism====<br />
[[Image:Finite-state transducer wound wounds.svg|thumb|200px|right|A non-deterministic finite-state transducer for three strings: wind, wound, wounds.]]<br />
The above transducer is '''deterministic'''; that is, for any given state and input symbol, it can only pass on to one state, or accept. In contrast, a '''non-deterministic''' transducer (as on the right) is one which from a given state and input symbol can branch to more than one state. That is, it can take more than one path through the transducer simultaneously. Note that although the transducer above for "beer" and "beers" branches, it branches in a deterministic way; after consuming the string "beer", it can either step to <code>(s:<n>)</code> if the next input symbol is 's', or <code>(θ:<n>)</code> otherwise.<br />
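<br />
The distinction can be checked mechanically. The following Python sketch assumes a transition table that maps <code>(state, input symbol)</code> to a list of <code>(output symbol, next state)</code> arcs, and reports every key that offers more than one arc. The function name and the two small tables are written for this example only; the second table is just the first two steps of a transducer for "wind" and "wound".<br />
<br />
```python
def nondeterministic_choices(transitions):
    """Return the (state, input symbol) keys that lead to more than one arc."""
    return sorted(key for key, arcs in transitions.items() if len(arcs) > 1)

# The "beer"/"beers" transducer: it branches at state 4,
# but on different input symbols, so no key has two arcs.
beer = {
    (0, 'b'): [('b', 1)], (1, 'e'): [('e', 2)], (2, 'e'): [('e', 3)],
    (3, 'r'): [('r', 4)], (4, ''): [('<n>', 5)], (4, 's'): [('<n>', 6)],
}

# A fragment for "wind"/"wound": after "w", the input "o" can either
# be rewritten to "i" (lemma "wind") or copied as "o" (lemma "wound").
wound = {
    (0, 'w'): [('w', 1)],
    (1, 'o'): [('i', 2), ('o', 3)],
}

print(nondeterministic_choices(beer))   # []
print(nondeterministic_choices(wound))  # [(1, 'o')]
```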
<br />
A non-deterministic finite-state transducer is modelled as:<br />
<br />
<br />
<br />
<br />
The benefit of non-deterministic transducers over deterministic transducers is that they allow us to capture the ambiguity inherent in human language. For example the word "wound" in English can be a noun ("I have a wound") or a verb ("He wound the clock", "He wounded me", or "The clock was wound"). The following code implements the transducer on the right, for the following parts of a dictionary:<br />
<pre><br />
<e><p><l>wound</l><r>wound<s n="n"/><s n="sg"/></r></p></e><br />
<e><p><l>wounds</l><r>wound<s n="n"/><s n="pl"/></r></p></e><br />
<e><p><l>wound</l><r>wind<s n="vblex"/><s n="pp"/></r></p></e><br />
</pre><br />
<br />
<br />
<div style="padding: 1em;border: 1px dashed #2f6fab;color: black;background-color: #f9f9f9;line-height: 1.1em; font-size: 85%"><br />
<source lang="python"><br />
import sys<br />
<br />
states = set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13])    # For reference only<br />
<br />
transitions = {    # (state, input symbol) -> list of (output symbol, next state)<br />
    (0,'w'):[('w',1)],<br />
    (1,'o'):[('i',2), ('o',3)],<br />
    (2,'u'):[('',4)], (4,'n'):[('n',5)], (5,'d'):[('d',6)], (6,''):[('<vblex>',7)], (7,''):[('<pp>',8)],<br />
    (3,'u'):[('u',9)], (9,'n'):[('n',10)], (10,'d'):[('d',11)],<br />
<br />
    (11,''):[('<n>',12)],<br />
    (11,'s'):[('<n>',13)],<br />
<br />
    (12,''):[('<sg>',8)],<br />
    (13,''):[('<pl>',8)]<br />
}<br />
<br />
initial_state = 0<br />
accepting_states = set([8])<br />
<br />
def closure(pairs):    # Calculate epsilon closure over a set of (output, state) pairs<br />
    pending = list(pairs)<br />
    while pending:<br />
        output, state = pending.pop()<br />
        for symbol, target in transitions.get((state, ''), []):<br />
            pair = (output + symbol, target)<br />
            if pair not in pairs:<br />
                pairs.add(pair)<br />
                pending.append(pair)<br />
    return pairs<br />
<br />
def step(pairs, c):    # Step the transducer<br />
    reached = set()<br />
    for output, state in pairs:<br />
        for symbol, target in transitions.get((state, c), []):<br />
            reached.add((output + symbol, target))<br />
    return closure(reached)<br />
<br />
surface = sys.stdin.read().strip()<br />
alive = closure(set([('', initial_state)]))    # The set of "alive state-output pairs"<br />
<br />
for c in surface:    # Loop until no input remains<br />
    alive = step(alive, c)<br />
<br />
# Keep only the outputs of pairs that ended in an accepting state<br />
analyses = sorted(output for output, state in alive if state in accepting_states)<br />
print('^' + '/'.join([surface] + analyses) + '$')<br />
</source><br />
</div><br />
<br />
<pre><br />
$ echo "wound" | python nfst.py <br />
^wound/wind<vblex><pp>/wound<n><sg>$<br />
<br />
$ echo "wounds" | python nfst.py <br />
^wounds/wound<n><pl>$<br />
</pre><br />
<br />
A nice exercise might be to extend the state/transition structures to add the missing analyses.<br />
<br />
===Determinisation===<br />
<br />
===Minimisation===<br />
<br />
===Subsequential transducers===<br />
<br />
===<math>p</math>-Subsequential transducers===<br />
<br />
==Application==<br />
[[Image:Finite-state transducer -s.svg|thumb|right|125px|A transducer for the regular ''-s'' plural paradigm in English. Transducers generated from paradigms can be re-used.]]<br />
<br />
===Paradigms===<br />
<br />
<div style="padding: 1em;border: 1px dashed #2f6fab;color: black;background-color: #f9f9f9;line-height: 1.1em; font-size: 85%"><br />
<source lang="xml"><br />
<pardef n="-s"><br />
<e><br />
<p><br />
<l/><br />
<r><s n="n"/><s n="sg"/></r><br />
</p><br />
</e> <br />
<e><br />
<p><br />
<l>s</l><br />
<r><s n="n"/><s n="pl"/></r><br />
</p><br />
</e><br />
</pardef><br />
</source><br />
</div><br />
===Sections===<br />
<br />
;Standard<br />
<br />
;Inconditional section<br />
{{see-also|Inconditional section}}<br />
<br />
;Postblank<br />
<br />
;Preblank<br />
<br />
===Entries===<br />
<br />
==Behaviour==<br />
<br />
* Determinism<br />
* Minimisation<br />
* Tokenisation<br />
<br />
==Terminology==<br />
<br />
* string<br />
* alphabet<br />
* symbol<br />
* empty string<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
==See also==<br />
<br />
* [[Monodix basics]]<br />
<br />
==Notes==<br />
<references/><br />
<br />
==Further reading==<br />
<br />
* Ortiz-Rojas, S., Forcada, M. L., and Ramírez-Sánchez, G. (2005) "Construcción y minimización eficiente de transductores de letras a partir de diccionarios con paradigmas". ''Procesamiento del Lenguaje Natural'', 35, 51–57. [http://www.sepln.org/revistaSEPLN/revista/35/07.pdf PDF]<br />
* A. Garrido-Alenda, M.L. Forcada, (2002) "Comparing nondeterministic and quasideterministic finite-state transducers built from morphological dictionaries", Procesamiento del Lenguaje Natural, (XVIII Congreso de la Sociedad Española de Procesamiento del Lenguaje Natural, Valladolid, Spain, 11-13.09.2002) [http://www.sepln.org/revistaSEPLN/revista/29/29-Pag73.pdf PDF]<br />
* R.C. Carrasco, M.L. Forcada, (2002) "Incremental construction and maintenance of minimal finite-state automata", ''Computational Linguistics'', 28:2, 207-216 [http://www.dlsi.ua.es/~mlf/docum/carrasco02j.pdf PDF]<br />
* Alicia Garrido-Alenda, Mikel L. Forcada, Rafael C. Carrasco, (2002) "Incremental construction and maintenance of morphological analysers based on augmented letter transducers", in ''Proceedings of TMI 2002'' (Theoretical and Methodological Issues in Machine Translation, Keihanna/Kyoto, Japan, March 2002), p. 53-62 [http://www.dlsi.ua.es/~mlf/docum/garrido02p.pdf PDF]<br />
* J. Daciuk, S. Mihov, B. W. Watson, R. E. Watson (2000). "Incremental construction of minimal acyclic finite-state automata", in ''Computational Linguistics'', 26(1):3-16. [http://www.eti.pg.gda.pl/~jandac/incr_fst.ps.gz PS]<br />
<br />
[[Category:Documentation]]<br />
[[Category:Theoretical background]]<br />
[[Category:Documentation in English]]</div>Objectiveseahttps://wiki.apertium.org/w/index.php?title=User:Objectivesea&diff=30400User:Objectivesea2011-12-21T21:35:06Z<p>Objectivesea: Began page with self-characterization</p>
<hr />
<div>Editor and indexer for ''Hansard,'' Legislative Assembly of British Columbia. Has studied linguistics, computer science and technical writing at Simon Fraser University, College of the Rockies, Okanagan University College and North Island College. Has worked in graphic design and prepress in British Columbia and elsewhere. Interested in web development, especially XHTML and CSS.</div>Objectiveseahttps://wiki.apertium.org/w/index.php?title=User_talk:Objectivesea&diff=30399User talk:Objectivesea2011-12-21T21:16:51Z<p>Objectivesea: replied to Francis Tyers</p>
<hr />
<div>Bienvenue au Wiki d'Apertium. Merci pour vôtre aide! / Welcome to the Apertium Wiki. Thanks for your help! - [[User:Francis Tyers|Francis Tyers]] 20:06, 21 December 2011 (UTC)<br />
<br />
:Hmm, I was aiming for "if you're wanting" instead of "if you want", it feels more familiar. But if you think it's more standard the other way, that's fine. (And yes, I really do say "so" that much in real life) :) - [[User:Francis Tyers|Francis Tyers]] 21:08, 21 December 2011 (UTC)<br />
<br />
:: I am having no trouble with the present progressive, but I am thinking it sounds a bit like an Indo-Pakistani dialectal version when not being used with an action verb. I may have removed too many instances of "so", but I like to restrict the word to where there is a causal relationship with the preceding sentence. In my day job I do verbatim transcriptions of debates in the British Columbia Legislative Assembly, and I tend to see "so" used in speeches mainly as a paragraph marker.<br>&mdash; [[User:Objectivesea|Objectivesea]] 21:16, 21 December 2011 (UTC)</div>Objectiveseahttps://wiki.apertium.org/w/index.php?title=Monodix_basics&diff=30394Monodix basics2011-12-21T21:04:04Z<p>Objectivesea: Minor English syntactic and punctuation improvement throughout</p>
<hr />
<div>:''[[Bases sur les dictionnaires unilingues|En français]]''<br />
<br />
{{TOCD}}<br />
We've been told that the Apertium format for dictionaries is rather counterintuitive, which is fair enough if you're not used to thinking of dictionaries in a particular way. This page hopes to be a '''basic''' introduction to how they work and how you can get started reading and writing them!<br />
<br />
This page assumes you are comfortable with HTML and XML, and assumes you can distinguish an element from an attribute and can recognise character data. If you want a quick recap, this should help:<br />
<br />
:<element attribute="value">character data</element><br />
<br />
If that doesn't make sense, you should probably read up some more on XML.<br />
<br />
==Introduction==<br />
<br />
On a global level, the most basic dictionary needs three sections. We're going to, step by step, define a dictionary that will analyse and generate the English word "beer" and its plural form, "beers". The first section defines the alphabet that is used with the dictionary. This is fairly self-explanatory; it will look something like:<br />
<br />
<pre><br />
<alphabet>ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz</alphabet><br />
</pre><br />
<br />
The second section defines the grammatical symbols<ref>In other linguistic literature these are sometimes referred to as "features", or "categories" and "sub-categories".</ref> of the language you are working with. This is normally where people say: "Hang on. What are grammatical symbols?" Well, they're pretty much ways of describing words, and the different forms that words can take. I assume you know what the ''parts of speech''<ref>A part of speech (or lexical category, word class, lexical class, etc.) is a linguistic category of words, which is generally defined by the syntactic or morphological behaviour of the word in question. Common linguistic categories include noun and verb, among others. There are open word classes, which constantly acquire new members, and closed word classes, which acquire new members only infrequently, if at all.</ref> are &mdash; for example, nouns (house, beer, boat, cat, ...) &mdash; and that you can distinguish them from adjectives (red, good, transparent, ...) and verbs (eat, multiply, write, ...). The way we specify these is as follows:<br />
<br />
<pre><br />
<sdefs><br />
<sdef n="noun"/><br />
<sdef n="verb"/><br />
<sdef n="adjective"/><br />
</sdefs><br />
</pre><br />
<br />
People often complain about the brevity of the tags, and typically even the values are abbreviated, so noun becomes "n", verb becomes "vb" and adjective becomes "adj" etc. (see [[list of symbols]] for some common abbreviations). The brevity serves a purpose, however; when you're writing or copying, you want the tags to get in the way as little as possible. For reference, <code><sdef></code> means "symbol definition", and <code><sdefs></code> is simply this in the plural.<br />
<br />
After we've specified the alphabet and symbols, we need to specify the actual words &mdash; the important part of the dictionary. To hold the words we use a section. There can be more than one section in a dictionary, and there is more than one type of section. We will not go into the details here, but traditionally, the largest section is called "main" and is of the "standard" type.<br />
<br />
<pre><br />
<section id="main" type="standard"><br />
<br />
</section><br />
</pre><br />
<br />
The next step is to add an entry. This is slightly more involved, so please read on.<br />
<br />
==Entries==<br />
<br />
The monolingual dictionaries in Apertium are ''morphological''.<ref>A morphological dictionary models the rules that govern the internal structure of words in a language. For example, speakers of English realise that the words "dog" and "dogs" are related, that "dogs" is to "dog" as "cats" is to "cat". The rules understood by the speaker reflect specific patterns and regularities in the way in which words are formed from smaller units and how those smaller units interact.</ref> This means that they not only hold words but also the ways that they inflect and what it means when they inflect. In Apertium we use the morphological dictionaries for two tasks:<br />
<br />
# Analysis &mdash; retrieving all of the possible lexical units from the [[surface form]] of a word.<br />
# Generation &mdash; producing the [[surface form]] of a word from the lexical unit.<br />
<br />
Okay, now to explain ''lexical unit'' and ''surface form''. Remember the example of "beer" and "beers"? We know that "beer" is a noun; we know that it is in the singular; we also know that the only difference between "beer" and "beers" is that "beers" is in the plural. Summarising this knowledge below, we find the following two facts:<br />
<br />
# beer &mdash; is a singular noun;<br />
# beers &mdash; is the plural form of the noun "beer".<br />
<br />
What we mean by ''lexical unit'' is the combination of the lemma,<ref>The lemma (or citation form, base form, head word) is the canonical form of a word. It is the form of the word that is typically used in paper dictionaries.</ref> e.g. "beer", and the grammatical symbols. The surface form of a word is the word as you read it.<ref>Surface forms can be ambiguous, but lexical units cannot. A surface form may have many analyses; for example, "run" can be a verb (''They run on weekends'') or a noun (''I'm going for a run'').</ref> In Apertium style these would be represented something like the following:<br />
<br />
:{|class=wikitable<br />
! Surface form !! Lexical unit<br />
|-<br />
| beer || beer<noun><singular><br />
|-<br />
| beers || beer<noun><plural><br />
|-<br />
|}<br />
<br />
In order to convert between these two forms, we need to define them as a pair. Pairs of surface forms and lexical units in Apertium are indicated by the <code>&lt;p&gt;</code> element. This is rather intuitive, so long as you know the abbreviation. These pair elements may contain a "left side" (<code>&lt;l&gt;</code>) and a "right side" (<code>&lt;r&gt;</code>). The left side almost always contains the surface form of the word, while the right side contains the lexical unit. Our first entry (<code>&lt;e&gt;</code>) might look something like the following:<br />
<br />
<pre><br />
<e><br />
<p><br />
<l>beer</l><br />
<r>beer<s n="noun"/><s n="singular"/></r><br />
</p><br />
</e><br />
</pre><br />
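<br />
To make the left/right split concrete, the following Python sketch reads such an entry with the standard <code>xml.etree.ElementTree</code> module and prints the surface form and the lexical unit. The helper name <code>read_pair</code> is made up for this example, and the sketch only handles an entry holding a single plain pair (no paradigm calls).<br />
<br />
```python
import xml.etree.ElementTree as ET

ENTRY = '<e><p><l>beer</l><r>beer<s n="noun"/><s n="singular"/></r></p></e>'

def read_pair(entry_xml):
    """Return (surface form, lexical unit) for an entry holding one <p> pair."""
    pair = ET.fromstring(entry_xml).find('p')
    surface = pair.findtext('l') or ''
    right = pair.find('r')
    # The lexical unit is the lemma text followed by one <tag> per <s> element.
    tags = ''.join('<%s>' % s.get('n') for s in right.findall('s'))
    lexical = (right.text or '') + tags
    return surface, lexical

print(read_pair(ENTRY))  # ('beer', 'beer<noun><singular>')
```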
<br />
Now, roughly, you need as many of these entries as there are surface forms in the language; however, the astute among you will have realised that creating entries for ''all'' the words in the language is an impossible task. The next section will show how this can be avoided, but in the meantime we now have enough information to compile our first dictionary:<br />
<br />
<pre><br />
<dictionary><br />
<alphabet>abcdefghijklmnopqrstuvwxyz</alphabet><br />
<sdefs><br />
<sdef n="noun"/><br />
<sdef n="singular"/><br />
<sdef n="plural"/><br />
</sdefs><br />
<br />
<section id="main" type="standard"><br />
<e><br />
<p><br />
<l>beer</l><br />
<r>beer<s n="noun"/><s n="singular"/></r><br />
</p><br />
</e><br />
<e><br />
<p><br />
<l>beers</l><br />
<r>beer<s n="noun"/><s n="plural"/></r><br />
</p><br />
</e><br />
</section><br />
</dictionary><br />
</pre><br />
<br />
The entries above will enable us to retrieve the lexical units for "beer" and "beers", and to generate these two surface forms from the same lexical units. <br />
<br />
The dictionary is functional but is intended for teaching purposes; actual dictionary files look somewhat different, because defining each word completely separately from other words which follow the same rules is rather inefficient. <br />
<br />
===Compilation===<br />
{{see-also|lttoolbox}}<br />
Save this into a file called <code>dictionary.dix</code>, then we'll compile the dictionary into a binary form<ref>See [[Dictionaries]] for more complete information on the format</ref> using the tool <code>lt-comp</code>. The command takes three arguments; the first is "direction", then the input file and the output file. The "direction" option is important. <br />
<br />
If we specify the direction as "lr" (left → right), we get an analyser (that is, a dictionary that takes surface forms and outputs lexical units). If we specify the reverse ("rl", right → left), we get a generator, which takes lexical units and outputs surface forms. We might as well generate both:<br />
<br />
<pre><br />
$ lt-comp lr dictionary.dix analyser.bin<br />
main@standard 7 6<br />
<br />
$ lt-comp rl dictionary.dix generator.bin<br />
main@standard 7 6<br />
</pre><br />
<br />
We can now use the dictionary to analyse the noun "beers":<br />
<br />
<pre><br />
$ echo "beers" | lt-proc analyser.bin<br />
^beers/beer<noun><plural>$<br />
</pre><br />
<br />
The analysis gives us the surface form, followed by the lexical unit. If we want to generate the surface form from the lexical unit, we just do:<br />
<br />
<pre><br />
$ echo "^beer<noun><plural>$" | lt-proc -g generator.bin<br />
beers<br />
</pre><br />
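<br />
If you want to post-process the analyser output in a script, a unit such as <code>^beers/beer<noun><plural>$</code> can be split into its surface form and its analyses. The Python sketch below is a simplification: the function name is invented, and it assumes that the surface form and the analyses contain no escaped reserved characters.<br />
<br />
```python
import re

def parse_analysis(unit):
    """Split '^surface/analysis1/analysis2$' into (surface, [analyses])."""
    match = re.fullmatch(r'\^(.*)\$', unit.strip())
    if match is None:
        raise ValueError('not a ^...$ unit: %r' % unit)
    surface, *analyses = match.group(1).split('/')
    return surface, analyses

print(parse_analysis('^beers/beer<noun><plural>$'))
# ('beers', ['beer<noun><plural>'])
```
<br />
An ambiguous surface form simply yields more than one element in the list of analyses.<br />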
<br />
==Paradigms==<br />
<br />
Great! We have a dictionary, and we can analyse and generate the two forms of the word "beer". But what happens when we want to add more words, say "school" or "computer"? Well, one thing we could do is add four more entries in the main section (one for each of "school", "schools", "computer" and "computers"). On the other hand, this would be pretty inefficient. Instead, we can generalise a rule, which in this case is "add ''-s'' to make the plural", using a ''paradigm'', which is literally, "an example serving as a model or pattern". <br />
<br />
In order to define paradigms, we typically take a word that can serve as an example for how other words inflect. In this case, we can say, "the words ''school'' and ''computer'' inflect like ''beer''".<br />
<br />
Paradigms go in a section called <code><pardefs></code> (paradigm definitions), below the <code><sdefs></code> and above the main section. They are defined in <code><pardef></code> (paradigm definition) elements. Each paradigm definition must have an attribute <code>n</code>, which contains a unique name. This name can be anything, but it conventionally takes the form of:<br />
<br />
:<code><lemma>__<part of speech></code>, (e.g. <code>beer__n</code>)<br />
<br />
In order to make the lexical units for beer, beers, computer, computers, etc., we need to distinguish between the part of the surface form that doesn't change (the ''identical'' part), and the part that does change. In the example already given, it is quite straightforward that the identical part is always the singular form. However, this might not always be the case (e.g. "wolf, wolves" or "tooth, teeth").<br />
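<br />
One rough way to find the identical part is to take the longest common prefix of the forms. The Python sketch below does this with the standard <code>os.path.commonprefix</code> function, which compares strings character by character; the helper name <code>split_forms</code> is an assumption for this example, and a dictionary writer may well choose a different split for linguistic reasons.<br />
<br />
```python
import os.path

def split_forms(forms):
    """Return (identical part, list of the remaining suffixes)."""
    stem = os.path.commonprefix(forms)
    return stem, [form[len(stem):] for form in forms]

print(split_forms(['beer', 'beers']))    # ('beer', ['', 's'])
print(split_forms(['wolf', 'wolves']))   # ('wol', ['f', 'ves'])
print(split_forms(['tooth', 'teeth']))   # ('t', ['ooth', 'eeth'])
```
<br />
For "beer" the identical part is the whole singular form, whereas for "wolf" and "tooth" it is shorter than either form.<br />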
<br />
You probably guessed already what the paradigm definition is going to look like, so here it is:<br />
<br />
<pre><br />
<pardef n="beer__n"><br />
<e><br />
<p><br />
<l/><br />
<r><s n="noun"/><s n="singular"/></r><br />
</p><br />
</e><br />
<e><br />
<p><br />
<l>s</l><br />
<r><s n="noun"/><s n="plural"/></r><br />
</p><br />
</e><br />
</pardef><br />
</pre><br />
<br />
The only thing that has changed between these two entries and the first ones we made is that the ''identical'' part has been removed from both sides of the pair.<br />
<br />
The paradigm definition goes into its own part of the dictionary, enclosed in <code><pardefs></code> tags; for example:<br />
<br />
<pre><br />
<pardefs><br />
<br />
... <br />
<br />
</pardefs><br />
</pre><br />
<br />
We can see where this fits in with the rest of the dictionary below:<br />
<br />
<pre><br />
<dictionary><br />
<alphabet>abcdefghijklmnopqrstuvwxyz</alphabet><br />
<sdefs><br />
<br />
...<br />
<br />
</sdefs><br />
<pardefs><br />
<br />
... <br />
<br />
</pardefs><br />
<section id="main" type="standard"><br />
<e lm="beer"><i>beer</i><par n="beer__n"/></e><br />
<e lm="school"><i>school</i><par n="beer__n"/></e><br />
<e lm="computer"><i>computer</i><par n="beer__n"/></e><br />
<e lm="house"><i>house</i><par n="beer__n"/></e><br />
</section><br />
</dictionary><br />
</pre><br />
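<br />
With the elided parts filled in, the whole file might look like the sketch below. It assumes that the three symbols used by the paradigm were declared in <code><sdefs></code> under the names <code>noun</code>, <code>singular</code> and <code>plural</code>; adjust the names if your dictionary declares them differently.<br />

```xml
<?xml version="1.0" encoding="UTF-8"?>
<dictionary>
  <alphabet>abcdefghijklmnopqrstuvwxyz</alphabet>
  <sdefs>
    <!-- every symbol used in an <s n="..."/> must be declared here -->
    <sdef n="noun"/>
    <sdef n="singular"/>
    <sdef n="plural"/>
  </sdefs>
  <pardefs>
    <pardef n="beer__n">
      <e><p><l/><r><s n="noun"/><s n="singular"/></r></p></e>
      <e><p><l>s</l><r><s n="noun"/><s n="plural"/></r></p></e>
    </pardef>
  </pardefs>
  <section id="main" type="standard">
    <!-- each entry supplies the identical part; the paradigm supplies the endings -->
    <e lm="beer"><i>beer</i><par n="beer__n"/></e>
    <e lm="school"><i>school</i><par n="beer__n"/></e>
    <e lm="computer"><i>computer</i><par n="beer__n"/></e>
    <e lm="house"><i>house</i><par n="beer__n"/></e>
  </section>
</dictionary>
```

Adding a new regular noun is then a one-line change in the main section; the paradigm itself does not need to be touched.<br />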
<br />
==Notes==<br />
<references/><br />
<br />
[[Category:Documentation in English]]<br />
[[Category:Writing dictionaries]]</div>Objectiveseahttps://wiki.apertium.org/w/index.php?title=Using_an_lttoolbox_dictionary&diff=30391Using an lttoolbox dictionary2011-12-21T20:26:29Z<p>Objectivesea: Minor English syntactic improvement in description</p>
<hr />
<div>{{TOCD}}<br />
This page is intended as an answer to the question "I've found one of these <code>.dix</code> files; how can I use it to analyse text?" First of all, it is worth explaining what a <code>.dix</code> file is: a finite-state transducer for a language, encoded in XML. More information can be found on the pages [[lttoolbox]] and [[monodix basics]]; this page only covers how such a file is used.<br />
<br />
==Requirements==<br />
<br />
The most basic requirements are:<br />
<br />
* lttoolbox &mdash; A finite-state toolkit<br />
* apertium &mdash; A machine translation software platform<br />
<br />
The second is necessary for the [[deformatters]]. The tools in [[lttoolbox]] treat a set of characters as special, and these must be escaped in running text (see [[Apertium stream format]]).<br />
<br />
If you have a machine running GNU/Linux or Mac OS X, you can probably install both of these programs fairly easily. For lttoolbox:<br />
<br />
<pre><br />
$ svn co http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/lttoolbox<br />
$ cd lttoolbox/<br />
$ sh autogen.sh<br />
$ ./configure<br />
$ make<br />
$ make install<br />
</pre><br />
<br />
And for apertium:<br />
<br />
<pre><br />
$ svn co http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium<br />
$ cd apertium/<br />
$ sh autogen.sh<br />
$ ./configure<br />
$ make<br />
$ make install<br />
</pre><br />
<br />
Subversion (<code>svn</code>) is a version control system. If you don't have it installed, on Debian or Ubuntu GNU/Linux you can use <code>apt-get install subversion</code> (or get it through Synaptic). On Mac OS X you can use <code>port install subversion</code> (requires [http://www.macports.org/ MacPorts]).<br />
<br />
See [[Installation]] for more information and troubleshooting.<br />
<br />
==Using the dictionary==<br />
<br />
Then, you take the <code>.dix</code> file (e.g. <code>apertium-bn-en.bn.dix</code>) that you have downloaded, and compile it:<br />
<br />
===Compile===<br />
{{see-also|Compiling dictionaries}}<br />
<br />
<pre><br />
$ lt-comp lr apertium-bn-en.bn.dix bn.analyser.bin<br />
final@inconditional 8 75<br />
main@standard 6403 13351<br />
</pre><br />
<br />
===Use===<br />
<br />
Note that the <code>apertium-destxt</code> command is important. <br />
<br />
<pre><br />
$ echo "উইকিপিডিয়ার বাংলা সংস্করণে স্বাগতম। এই বিশ্বকোষে যে কেউ অবদান রাখতে পারেন। ২১,২৫৫টি ভুক্তির ওপর কাজ চলছে।" | apertium-destxt | lt-proc bn.analyser.bin <br />
^উইকিপিডিয়ার/*উইকিপিডিয়ার$ ^বাংলা/বাংলা<adj><mf>/বাংলা<n><mf><nn><sg><nom>/বাংলা<n><mf><nn><sg><obj>$ ^সংস্করণে/*সংস্করণে$ ^স্বাগতম/*স্বাগতম$^।/।<sent>$ <br />
^এই/এই<det><dem>$ ^বিশ্বকোষে/*বিশ্বকোষে$ ^যে/যা<prn><p3><infml><rel><aa><mf><sg><nom>$ ^কেউ/কেউ<prn><p3><aa><mf><sp><nom>$ <br />
^অবদান/অবদান<n><nt><nn><sg><nom>/অবদান<n><nt><nn><sg><obj>$ ^রাখতে/রাখ<vblex><inf>/রাখ<vblex><past><hbtl><p2><fam>$ <br />
^পারেন/পার<vblex><pres><smpl><p3><pol>/পার<vblex><pres><smpl><p2><pol>$^।/।<sent>$ ^২১/২১<num>$, ^২৫৫টি/২৫৫<num>$ ^ভুক্তির/*ভুক্তির$ <br />
^ওপর/ওপর<adv>/ওপর<n><mf><nn><sg><nom>/ওপর<n><mf><nn><sg><obj>$ ^কাজ/কাজ<n><nt><nn><sg><nom>/কাজ<n><nt><nn><sg><obj>$ <br />
^চলছে/চল<vblex><pres><cnt><impers>/চল<vblex><pres><cnt><p3><infml>$^।/।<sent>$^./.<sent>$[][<br />
]<br />
</pre><br />
<br />
This matters because, if unescaped special characters appear in the [[Apertium stream format|stream]], you will get a <code>std::exception</code>:<br />
<br />
<pre><br />
$ echo "This is a test ^500" | lt-proc bn.analyser.bin <br />
This is a test std::exception<br />
</pre><br />
<br />
(On a Mac, you will typically see <code>9Exception</code> instead.)<br />
<br />
==See also==<br />
<br />
* [[List of dictionaries]]<br />
<br />
[[Category:Documentation in English]]<br />
[[Category:Lttoolbox|*]]<br />
[[Category:Morphological analysers]]</div>Objectiveseahttps://wiki.apertium.org/w/index.php?title=Task_ideas_for_Google_Code-in&diff=30382Task ideas for Google Code-in2011-12-21T20:03:40Z<p>Objectivesea: removed need for redirect</p>
<hr />
<div>This is the task ideas page for Google Code-in (http://www.google-melange.com/gci/homepage/google/gci2011). Here you can find ideas for interesting tasks that will improve your knowledge of Apertium and help you get into the world of open-source development.<br />
<br />
The People column lists the people you should contact to request further information. The Time column gives the minimum estimated amount of time that should be spent on the task. '''It does not include time taken to install / set up apertium'''.<br />
<br />
Если ты не понимаешь английский язык или предпочитаешь работать над русским языком или другими языками России, смотри: [[Task ideas for Google Code-in/Russian]]<br />
<br />
==Task list==<br />
<br />
{|class="wikitable sortable"<br />
! Area !! Difficulty !! Title !! Description !! Time<br/>(hours) !! People<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Convert existing resource: Urdu morphological analyser || Take Muhammad Humayoun's [http://www.lama.univ-savoie.fr/~humayoun/UrduMorph/ Urdu Morphology] and convert to lttoolbox format. ||align=center| 8&ndash;10 || [[User:Francis Tyers|Francis&nbsp;Tyers]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Convert existing resource: Punjabi morphological analyser || Take Muhammad Humayoun's [http://www.lama.univ-savoie.fr/~humayoun/punjabi/index.html Punjabi Morphology] and convert to lttoolbox format. ||align=center| 8&ndash;10 || [[User:Francis Tyers|Francis&nbsp;Tyers]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Convert existing resource: Kurdish morphological analyser || Take the Alexina [https://gforge.inria.fr/scm/viewvc.php/kurlex/trunk/?root=alexina Kurdish Morphology] and convert to lttoolbox format. ||align=center| 8&ndash;10 || [[User:Francis Tyers|Francis&nbsp;Tyers]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: Reta Vortaro Belarusian-Esperanto || Take the [http://www.reta-vortaro.de/revo/inx/lx_be_a.html Belarusian-Esperanto lexicon] and convert to lttoolbox format. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: Reta Vortaro Breton-Esperanto || Take the [http://www.reta-vortaro.de/revo/inx/lx_br_a.html Breton-Esperanto lexicon] and convert to lttoolbox format. ||align=center| 2&ndash;4 || [[User:Jacob_Nordfalk|Jacob_Nordfalk]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: Reta Vortaro Bulgarian-Esperanto || Take the [http://www.reta-vortaro.de/revo/inx/lx_bg_a.html Bulgarian-Esperanto lexicon] and convert to lttoolbox format. ||align=center| 2&ndash;4 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: Reta Vortaro Czech-Esperanto || Take the [http://www.reta-vortaro.de/revo/inx/lx_cs_a.html Czech-Esperanto lexicon] and convert to lttoolbox format. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: Reta Vortaro Dutch-Esperanto || Take the [http://www.reta-vortaro.de/revo/inx/lx_nl_a.html Dutch-Esperanto lexicon] and convert to lttoolbox format. ||align=center| 2&ndash;4 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: Reta Vortaro Finnish-Esperanto || Take the [http://www.reta-vortaro.de/revo/inx/lx_fi_a.html Finnish-Esperanto lexicon] and convert to lttoolbox format. ||align=center| 2&ndash;4 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: Reta Vortaro German-Esperanto || Take the [http://www.reta-vortaro.de/revo/inx/lx_de_a.html German-Esperanto lexicon] and convert to lttoolbox format. ||align=center| 2&ndash;4 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: Reta Vortaro Greek-Esperanto || Take the [http://www.reta-vortaro.de/revo/inx/lx_el_a.html Greek-Esperanto lexicon] and convert to lttoolbox format. ||align=center| 2&ndash;4 || [[User:Jacob_Nordfalk|Jacob_Nordfalk]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: Reta Vortaro Hebrew-Esperanto || Take the [http://www.reta-vortaro.de/revo/inx/lx_he_a.html Hebrew-Esperanto lexicon] and convert to lttoolbox format. ||align=center| 2&ndash;4 || [[User:Jacob_Nordfalk|Jacob_Nordfalk]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: Reta Vortaro Hungarian-Esperanto || Take the [http://www.reta-vortaro.de/revo/inx/lx_hu_a.html Hungarian-Esperanto lexicon] and convert to lttoolbox format. ||align=center| 2&ndash;4 || [[User:Jacob_Nordfalk|Jacob_Nordfalk]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: Reta Vortaro Italian-Esperanto || Take the [http://www.reta-vortaro.de/revo/inx/lx_it_a.html Italian-Esperanto lexicon] and convert to lttoolbox format. ||align=center| 2&ndash;4 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: Reta Vortaro Persian-Esperanto || Take the [http://www.reta-vortaro.de/revo/inx/lx_fa_alef_madde.html Persian-Esperanto lexicon] and convert to lttoolbox format. ||align=center| 2&ndash;4 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: Reta Vortaro Portuguese-Esperanto || Take the [http://www.reta-vortaro.de/revo/inx/lx_pt_a.html Portuguese-Esperanto lexicon] and convert to lttoolbox format. ||align=center| 2&ndash;4 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: Reta Vortaro Polish-Esperanto || Take the [http://www.reta-vortaro.de/revo/inx/lx_pl_a.html Polish-Esperanto lexicon] and convert to lttoolbox format. ||align=center| 2&ndash;4 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: Reta Vortaro Russian-Esperanto || Take the [http://www.reta-vortaro.de/revo/inx/lx_ru_a.html Russian-Esperanto lexicon] and convert to lttoolbox format. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: Reta Vortaro Slovakian-Esperanto || Take the [http://www.reta-vortaro.de/revo/inx/lx_sk_a.html Slovakian-Esperanto lexicon] and convert to lttoolbox format. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: Reta Vortaro Swedish-Esperanto || Take the [http://www.reta-vortaro.de/revo/inx/lx_sv_a.html Swedish-Esperanto lexicon] and convert to lttoolbox format. ||align=center| 2&ndash;4 || [[User:Jacob_Nordfalk|Jacob_Nordfalk]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: Reta Vortaro Turkish-Esperanto || Take the [http://www.reta-vortaro.de/revo/inx/lx_tr_a.html Turkish-Esperanto lexicon] and convert to lttoolbox format. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|code}} || 2.&nbsp;Medium || Convert Apertium resources: nn-nb for Freedict || Apertium's lexicons would make an excellent start for bilingual dictionaries. FreeDict currently has no Nynorsk-Bokmål dictionary. ||align=center| 2&ndash;4 || [[User:Piotr Bański|Piotr Bański]]<br />
|-<br />
|align=center| {{sc|code}} || 2.&nbsp;Medium || Convert Apertium resources: es-ca for Freedict || Apertium's lexicons would make an excellent start for bilingual dictionaries. FreeDict currently has no Spanish-Catalan dictionary. ||align=center| 2&ndash;4 || [[User:Piotr Bański|Piotr Bański]]<br />
|-<br />
|align=center| {{sc|code}} || 2.&nbsp;Medium || Convert Apertium resources: is-en for Freedict || Apertium's lexicons would make an excellent start for bilingual dictionaries. FreeDict currently has no Icelandic-English dictionary. ||align=center| 2&ndash;4 || [[User:Piotr Bański|Piotr Bański]]<br />
|-<br />
|align=center| {{sc|code}} || 2.&nbsp;Medium || Convert Apertium resources: es-ast for Freedict || Apertium's lexicons would make an excellent start for bilingual dictionaries. FreeDict currently has no Asturian-Spanish dictionary. ||align=center| 2&ndash;4 || [[User:Piotr Bański|Piotr Bański]]<br />
|-<br />
|align=center| {{sc|code}} || 2.&nbsp;Medium || Convert Apertium resources: oc-ca for Freedict || Apertium's lexicons would make an excellent start for bilingual dictionaries. FreeDict currently has no Occitan-Catalan dictionary. ||align=center| 2&ndash;4 || [[User:Piotr Bański|Piotr Bański]]<br />
|-<br />
|align=center| {{sc|code}} || 2.&nbsp;Medium || Convert Apertium resources: mk-bg for Freedict || Apertium's lexicons would make an excellent start for bilingual dictionaries. FreeDict currently has no Macedonian-Bulgarian dictionary. ||align=center| 2&ndash;4 || [[User:Piotr Bański|Piotr Bański]]<br />
|-<br />
|align=center| {{sc|code}} || 2.&nbsp;Medium || Convert Apertium resources: mk-en for Freedict || Apertium's lexicons would make an excellent start for bilingual dictionaries. FreeDict currently has no Macedonian-English dictionary. ||align=center| 2&ndash;4 || [[User:Piotr Bański|Piotr Bański]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: English-Slovakian dictionary || Take [http://www.sk-spell.sk.cx/mass-msas MSAS/MASS] and convert to lttoolbox format. ||align=center| 1&ndash;4 || [[User:Zdenko Podobný|Zdenko Podobný]]<br />
|-<br />
|align=center| {{sc|code}} || 2.&nbsp;Medium || Convert existing resource: Slovakian morphological analyser || Take the morphological analyser distributed with LanguageTool and convert to lttoolbox format. ||align=center| 1&ndash;4 || [[User:Zdenko Podobný|Zdenko Podobný]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: FreeDict afr-deu || Take the [http://sf.net/projects/freedict Freedict] afr-deu dictionary and convert to lttoolbox format. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: FreeDict ckb-kmr || Take the [http://sf.net/projects/freedict Freedict] ckb-kmr dictionary and convert to lttoolbox format. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: FreeDict dan-eng || Take the [http://sf.net/projects/freedict Freedict] dan-eng dictionary and convert to lttoolbox format. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: FreeDict eng-ell || Take the [http://sf.net/projects/freedict Freedict] eng-ell dictionary and convert to lttoolbox format. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: FreeDict eng-hin || Take the [http://sf.net/projects/freedict Freedict] eng-hin dictionary and convert to lttoolbox format. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: FreeDict eng-hrv || Take the [http://sf.net/projects/freedict Freedict] eng-hrv dictionary and convert to lttoolbox format. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: FreeDict eng-pol || Take the [http://sf.net/projects/freedict Freedict] eng-pol dictionary and convert to lttoolbox format. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: FreeDict eng-rom || Take the [http://sf.net/projects/freedict Freedict] eng-rom dictionary and convert to lttoolbox format. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: FreeDict eng-rus || Take the [http://sf.net/projects/freedict Freedict] eng-rus dictionary and convert to lttoolbox format. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: FreeDict gla-deu || Take the [http://sf.net/projects/freedict Freedict] gla-deu dictionary and convert to lttoolbox format. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: FreeDict hrv-eng || Take the [http://sf.net/projects/freedict Freedict] hrv-eng dictionary and convert to lttoolbox format. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: FreeDict jpn-deu || Take the [http://sf.net/projects/freedict Freedict] jpn-deu dictionary and convert to lttoolbox format. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: FreeDict kha-deu || Take the [http://sf.net/projects/freedict Freedict] kha-deu dictionary and convert to lttoolbox format. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: FreeDict kha-eng || Take the [http://sf.net/projects/freedict Freedict] kha-eng dictionary and convert to lttoolbox format. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: FreeDict kur-eng || Take the [http://sf.net/projects/freedict Freedict] kur-eng dictionary and convert to lttoolbox format. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: FreeDict kur-tur || Take the [http://sf.net/projects/freedict Freedict] kur-tur dictionary and convert to lttoolbox format. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: FreeDict lat-deu || Take the [http://sf.net/projects/freedict Freedict] lat-deu dictionary and convert to lttoolbox format. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: FreeDict san-deu || Take the [http://sf.net/projects/freedict Freedict] san-deu dictionary and convert to lttoolbox format. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: FreeDict slo-eng || Take the [http://sf.net/projects/freedict Freedict] slo-eng dictionary and convert to lttoolbox format. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: FreeDict swh-pol || Take the [http://sf.net/projects/freedict Freedict] swh-pol dictionary and convert to lttoolbox format. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: FreeDict ara-eng and eng-ara || Take the [http://sf.net/projects/freedict Freedict] ara-eng and eng-ara dictionaries, convert to lttoolbox format, and merge them. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: FreeDict bre-fra and fra-bre || Take the [http://sf.net/projects/freedict Freedict] bre-fra and fra-bre dictionaries, convert to lttoolbox format, and merge them. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: FreeDict deu-fra and fra-deu || Take the [http://sf.net/projects/freedict Freedict] deu-fra and fra-deu dictionaries, convert to lttoolbox format, and merge them. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: FreeDict deu-ita and ita-deu || Take the [http://sf.net/projects/freedict Freedict] deu-ita and ita-deu dictionaries, convert to lttoolbox format, and merge them. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: FreeDict deu-kur and kur-deu || Take the [http://sf.net/projects/freedict Freedict] deu-kur and kur-deu dictionaries, convert to lttoolbox format, and merge them. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: FreeDict deu-nld and nld-deu || Take the [http://sf.net/projects/freedict Freedict] deu-nld and nld-deu dictionaries, convert to lttoolbox format, and merge them. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: FreeDict deu-por and por-deu || Take the [http://sf.net/projects/freedict Freedict] deu-por and por-deu dictionaries, convert to lttoolbox format, and merge them. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: FreeDict deu-tur and tur-deu || Take the [http://sf.net/projects/freedict Freedict] deu-tur and tur-deu dictionaries, convert to lttoolbox format, and merge them. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: FreeDict eng-afr and afr-eng || Take the [http://sf.net/projects/freedict Freedict] eng-afr and afr-eng dictionaries, convert to lttoolbox format, and merge them. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: FreeDict eng-cym and cym-eng || Take the [http://sf.net/projects/freedict Freedict] eng-cym and cym-eng dictionaries, convert to lttoolbox format, and merge them. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: FreeDict eng-cze and ces-eng || Take the [http://sf.net/projects/freedict Freedict] eng-cze and ces-eng dictionaries, convert to lttoolbox format, and merge them. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: FreeDict eng-deu and deu-eng || Take the [http://sf.net/projects/freedict Freedict] eng-deu and deu-eng dictionaries, convert to lttoolbox format, and merge them. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: FreeDict eng-fra and fra-eng || Take the [http://sf.net/projects/freedict Freedict] eng-fra and fra-eng dictionaries, convert to lttoolbox format, and merge them. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: FreeDict eng-gle and gle-eng || Take the [http://sf.net/projects/freedict Freedict] eng-gle and gle-eng dictionaries, convert to lttoolbox format, and merge them. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: FreeDict eng-hun and hun-eng || Take the [http://sf.net/projects/freedict Freedict] eng-hun and hun-eng dictionaries, convert to lttoolbox format, and merge them. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: FreeDict eng-ita and ita-eng || Take the [http://sf.net/projects/freedict Freedict] eng-ita and ita-eng dictionaries, convert to lttoolbox format, and merge them. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: FreeDict eng-lat and lat-eng || Take the [http://sf.net/projects/freedict Freedict] eng-lat and lat-eng dictionaries, convert to lttoolbox format, and merge them. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: FreeDict eng-lit and lit-eng || Take the [http://sf.net/projects/freedict Freedict] eng-lit and lit-eng dictionaries, convert to lttoolbox format, and merge them. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: FreeDict eng-nld and nld-eng || Take the [http://sf.net/projects/freedict Freedict] eng-nld and nld-eng dictionaries, convert to lttoolbox format, and merge them. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: FreeDict eng-por and por-eng || Take the [http://sf.net/projects/freedict Freedict] eng-por and por-eng dictionaries, convert to lttoolbox format, and merge them. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: FreeDict eng-scr and scr-eng || Take the [http://sf.net/projects/freedict Freedict] eng-scr and scr-eng dictionaries, convert to lttoolbox format, and merge them. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: FreeDict eng-spa and spa-eng || Take the [http://sf.net/projects/freedict Freedict] eng-spa and spa-eng dictionaries, convert to lttoolbox format, and merge them. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: FreeDict eng-swa and swa-eng || Take the [http://sf.net/projects/freedict Freedict] eng-swa and swa-eng dictionaries, convert to lttoolbox format, and merge them. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: FreeDict eng-swe and swe-eng || Take the [http://sf.net/projects/freedict Freedict] eng-swe and swe-eng dictionaries, convert to lttoolbox format, and merge them. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: FreeDict eng-tur and tur-eng || Take the [http://sf.net/projects/freedict Freedict] eng-tur and tur-eng dictionaries, convert to lttoolbox format, and merge them. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: FreeDict eng-wel and wel-eng || Take the [http://sf.net/projects/freedict Freedict] eng-wel and wel-eng dictionaries, convert to lttoolbox format, and merge them. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: FreeDict fra-nld and nld-fra || Take the [http://sf.net/projects/freedict Freedict] fra-nld and nld-fra dictionaries, convert to lttoolbox format, and merge them. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Convert existing resource: FreeDict gle-pol and pol-gle || Take the [http://sf.net/projects/freedict Freedict] gle-pol and pol-gle dictionaries, convert to lttoolbox format, and merge them. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|code}} || 2.&nbsp;Medium || Convert existing resource: Polish-Slovakian transfer rules || Many of the existing rules in Apertium's pl-cs system originated in pl-sk. Take the new rules in pl-cs and apply them to pl-sk. No knowledge of Polish, Slovakian, or Czech is required, though it will help. ||align=center| 1&ndash;4 || [[User:Zdenko Podobný|Zdenko Podobný]]<br />
|-<br />
|align=center| {{sc|outreach}} || 3.&nbsp;Easy || Apertium on Macedonian Wikipedia || Bulgarian WP has 107,355 articles, Macedonian WP has 42,112, less than half as many. Translate some articles from Bulgarian Wikipedia to Macedonian Wikipedia using Apertium, and then postedit them. Explain to the local Wikipedia community what you are doing beforehand. ||align=center| 1&ndash;4 || [[User:Francis Tyers|Francis&nbsp;Tyers]]<br />
|-<br />
|align=center| {{sc|outreach}} || 3.&nbsp;Easy || Apertium on Occitan Wikipedia || Catalan WP has 350,000 articles, Occitan WP has 55,000. Translate some articles from Catalan Wikipedia to Occitan Wikipedia using Apertium, and then postedit them. Explain to the local Wikipedia community what you are doing beforehand. ||align=center| 1&ndash;4 || [[User:Francis Tyers|Francis&nbsp;Tyers]]<br />
|-<br />
|align=center| {{sc|outreach}} || 3.&nbsp;Easy || Apertium on Asturian Wikipedia || Spanish WP has 840,000 articles, Asturian WP has 15,000, fewer than a fiftieth as many. Translate some articles from Spanish Wikipedia to Asturian Wikipedia using Apertium, and then postedit them. Explain to the local Wikipedia community what you are doing beforehand. ||align=center| 1&ndash;4 || [[User:Francis Tyers|Francis&nbsp;Tyers]]<br />
|-<br />
|align=center| {{sc|outreach}} || 3.&nbsp;Easy || Apertium on Aragonese Wikipedia || Spanish WP has 840,000 articles, Aragonese WP has 26,000. Translate some articles from Spanish Wikipedia to Aragonese Wikipedia using Apertium, and then postedit them. Explain to the local Wikipedia community what you are doing beforehand. ||align=center| 1&ndash;4 || [[User:Francis Tyers|Francis&nbsp;Tyers]]<br />
|-<br />
|align=center| {{sc|outreach}} || 3.&nbsp;Easy || Apertium on Esperanto Wikipedia: Catalan || Catalan WP has 350,000 articles, Esperanto WP has 150,000. Translate some articles from Catalan Wikipedia to Esperanto Wikipedia using Apertium, and then postedit them. You can use the utility [http://vikitraduko.saluton.dk:8080/vikitraduko/ Vikitradukilo]. Explain to the Esperanto Wikipedia community what you are doing beforehand. ||align=center| 1&ndash;4 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|outreach}} || 3.&nbsp;Easy || Apertium on Esperanto Wikipedia: Spanish || Spanish WP has 840,000 articles, Esperanto WP has 150,000. Translate some articles from Spanish Wikipedia to Esperanto Wikipedia using Apertium, and then postedit them. You can use the utility [http://vikitraduko.saluton.dk:8080/vikitraduko/ Vikitradukilo]. Explain to the Esperanto Wikipedia community what you are doing beforehand. ||align=center| 1&ndash;4 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|outreach}} || 3.&nbsp;Easy || Apertium on Esperanto Wikipedia: French || French WP has 1,200,000 articles, Esperanto WP has 150,000. Translate some articles from French Wikipedia to Esperanto Wikipedia using Apertium, and then postedit them. You can use the utility [http://vikitraduko.saluton.dk:8080/vikitraduko/ Vikitradukilo]. Explain to the Esperanto Wikipedia community what you are doing beforehand. ||align=center| 1&ndash;4 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|outreach}} || 3.&nbsp;Easy || Apertium on Esperanto Wikipedia: English|| English WP has 3,800,000 articles, Esperanto WP has 150,000. Translate some articles from English Wikipedia to Esperanto Wikipedia using Apertium, and then postedit them. You can use the utility [http://vikitraduko.saluton.dk:8080/vikitraduko/ Vikitradukilo]. Explain to the Esperanto Wikipedia community what you are doing beforehand. ||align=center| 1&ndash;4 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|outreach}} || 3.&nbsp;Easy || Apertium on Portuguese Wikitravel: Spanish||Translate some articles from Spanish Wikitravel to Portuguese Wikitravel using Apertium, and then postedit them. Explain to the Portuguese Wikitravel community what you are doing beforehand. ||align=center| 1&ndash;4 || [[User:Gramirez|Gramirez]]<br />
|-<br />
|align=center| {{sc|outreach}} || 3.&nbsp;Medium || LUG Flyer || Design a flyer that briefly explains Apertium, suitable for handing out at Linux User Group meetings||align=center| 1&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|outreach}} || 3.&nbsp;Medium || School Flyer || Design a flyer that briefly explains Apertium, suitable for handing out at your school||align=center| 1&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|quality}} || 3.&nbsp;Easy || Thorough checkup of bn-en morphological analyser || While the current bn-en morphological analyser has fairly good coverage, it could be higher. Part of the reason is that many verbs have one or two surface forms that differ slightly from the regular ones, and the analyser misses them. Using lt-expand, it is possible to generate all forms of the verbs, check them manually, and then rebuild the analyser file using another script (already in the pair). This checking requires a native speaker of, or an expert in, Bengali. ||align=center| 2&ndash;4 || [[User:Darthxaher|Abu&nbsp;Zaher]]<br />
|-<br />
|align=center| {{sc|code}} || 2.&nbsp;Medium || Dixtools: TEI export || Take the code from Dix2CC.java or Dix2Tiny.java and adapt to export TEI P5 format dictionaries, suitable for FreeDict. This project is suitable for someone interested in learning Java. ||align=center| 2&ndash;4 ||[[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Macedonian and Albanian || Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation between Macedonian and Albanian. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. ||align=center| 4&ndash;6 || [[User:Francis Tyers|Francis&nbsp;Tyers]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Kurdish and Persian || Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation between Kurdish and Persian. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. ||align=center| 4&ndash;6 || [[User:Francis Tyers|Francis&nbsp;Tyers]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Hindi and Urdu || Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation between Hindi and Urdu. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. ||align=center| 4&ndash;6 || [[User:Francis Tyers|Francis&nbsp;Tyers]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Finnish and Estonian || Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation between Finnish and Estonian. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. ||align=center| 4&ndash;6 || [[User:Francis Tyers|Francis&nbsp;Tyers]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Spanish and Italian || Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation between Spanish and Italian. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. ||align=center| 4&ndash;6 || [[User:Gramirez|Gramirez]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Catalan and Sardinian || Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation between Catalan and Sardinian. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. ||align=center| 4&ndash;6 || [[User:Francis Tyers|Francis&nbsp;Tyers]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Italian and Sardinian || Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation between Italian and Sardinian. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. ||align=center| 4&ndash;6 || [[User:Deadbeef|Deadbeef]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Belorussian and Esperanto || Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation between Belorussian and Esperanto. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. ||align=center| 4&ndash;6 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Breton and Esperanto || Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation between Breton and Esperanto. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. ||align=center| 4&ndash;6 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Bulgarian and Esperanto || Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation between Bulgarian and Esperanto. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. ||align=center| 4&ndash;6 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Czech and Esperanto || Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation between Czech and Esperanto. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. ||align=center| 4&ndash;6 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Dutch and Esperanto || Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation between Dutch and Esperanto. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. ||align=center| 4&ndash;6 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: German and Esperanto || Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation between German and Esperanto. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. ||align=center| 4&ndash;6 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Greek and Esperanto || Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation between Greek and Esperanto. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. ||align=center| 4&ndash;6 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Italian and Esperanto || Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation between Italian and Esperanto. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. ||align=center| 4&ndash;6 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Persian and Esperanto || Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation between Persian and Esperanto. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. ||align=center| 4&ndash;6 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Polish and Esperanto || Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation between Polish and Esperanto. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. ||align=center| 4&ndash;6 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Portuguese and Esperanto || Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation between Portuguese and Esperanto. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. ||align=center| 4&ndash;6 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Russian and Esperanto || Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation between Russian and Esperanto. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. ||align=center| 4&ndash;6 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Slovak and Esperanto || Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation between Slovak and Esperanto. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. ||align=center| 4&ndash;6 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Swedish and Esperanto || Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation between Swedish and Esperanto. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. ||align=center| 4&ndash;6 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Spanish and Aragonese || Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation between Spanish and Aragonese. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. ||align=center| 4&ndash;6 || [[User:Juanpabl|Juan Pablo Martínez]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Russian and Esperanto || Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation from Russian to Esperanto. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. ||align=center| 4&ndash;6 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Chuvash and Russian || Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation from Chuvash to Russian. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. ||align=center| 4&ndash;6 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Chuvash and Tatar || Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation from Tatar to Chuvash. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. ||align=center| 4&ndash;6 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Chuvash and Bashkir || Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation from Bashkir to Chuvash. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. ||align=center| 4&ndash;6 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Chuvash and Turkish || Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation from Turkish to Chuvash. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. ||align=center| 4&ndash;6 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Tatar and Russian || Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation from Tatar to Russian. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. ||align=center| 4&ndash;6 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Bashkir and Tatar || Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation from Bashkir to Tatar. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. ||align=center| 4&ndash;6 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Bashkir and Turkish || Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation from Turkish to Bashkir. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. ||align=center| 4&ndash;6 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Yakut and Tatar || Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation from Yakut to Russian. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. ||align=center| 4&ndash;6 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Yakut and Tatar || Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation from Tatar to Yakut. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. ||align=center| 4&ndash;6 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Yakut and Turkish || Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation from Turkish to Yakut. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. ||align=center| 4&ndash;6 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Yakut and Russian || Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation from Russian to Yakut. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. ||align=center| 4&ndash;6 || [[User:Hectoralos|Hèctor&nbsp;Alòs]] [[User:Halan|Halan]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Yakut and English || Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation from English to Yakut. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. ||align=center| 4&ndash;6 || [[User:Hectoralos|Hèctor&nbsp;Alòs]] [[User:Halan|Halan]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Kumyk and Nogai || Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation from Kumyk to Nogai. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. ||align=center| 4&ndash;6 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Kumyk and Tatar || Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation from Tatar to Kumyk. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. ||align=center| 4&ndash;6 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Kumyk and Turkish || Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation from Turkish to Kumyk. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. ||align=center| 4&ndash;6 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Karachay-Balkar and Tatar || Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation from Tatar to Karachay-Balkar. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. ||align=center| 4&ndash;6 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Karachay-Balkar and Turkish || Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation from Turkish to Karachay-Balkar. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. ||align=center| 4&ndash;6 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Tuvan and Khakas || Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation from Tuvan to Khakas. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. ||align=center| 4&ndash;6 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Tuvan and Tatar || Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation from Tatar to Tuvan. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. ||align=center| 4&ndash;6 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Tuvan and Turkish || Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation from Turkish to Tuvan. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. ||align=center| 4&ndash;6 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Ossetian and Russian || Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation from Ossetian to Russian. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. ||align=center| 4&ndash;6 || [[User:Hectoralos|Hèctor&nbsp;Alòs]] [[User:Amikeco|Amikeco]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Ossetian and English || Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation from Ossetian to English. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. ||align=center| 4&ndash;6 || [[User:Hectoralos|Hèctor&nbsp;Alòs]] [[User:Amikeco|Amikeco]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Ossetian and Esperanto || Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation from Ossetian to Esperanto. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. ||align=center| 4&ndash;6 || [[User:Hectoralos|Hèctor&nbsp;Alòs]] [[User:Amikeco|Amikeco]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Buryat and Kalmyk || Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation from Buryat to Kalmyk. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. ||align=center| 4&ndash;6 || [[User:Hectoralos|Hèctor&nbsp;Alòs]] [[User:Jargal|Jargal]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Buryat and Yakut || Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation from Buryat to Yakut. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. ||align=center| 4&ndash;6 || [[User:Hectoralos|Hèctor&nbsp;Alòs]] [[User:Jargal|Jargal]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Buryat and Tuvan || Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation from Buryat to Tuvan. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. ||align=center| 4&ndash;6 || [[User:Hectoralos|Hèctor&nbsp;Alòs]] [[User:Jargal|Jargal]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Udmurt and Russian || Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation from Udmurt to Russian. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. ||align=center| 4&ndash;6 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Udmurt and Finnish || Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation from Finnish to Udmurt. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. ||align=center| 4&ndash;6 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Udmurt and Komi || Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation from Komi to Udmurt. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. ||align=center| 4&ndash;6 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Meadow Mari and Hill Mari || Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation from Hill Mari to Meadow Mari. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. ||align=center| 4&ndash;6 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|исследование}} || 2.&nbsp;Нормальное || Противопоставление: Луговомарийский язык и эрзя || Создать множество тестовых фраз (посмотрите страницы 'Pending tests' и 'Regression tests' в Вики) для перевода с эрзя на луговомарийский язык. Тесты должны содержать как можно больше черт языков. Некоторые из примеров можно найти в грамматике, другие могут быть придуманы. Без использования программирования, только грамматический анализ. ||align=center| 4&ndash;6 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|исследование}} || 2.&nbsp;Нормальное || Противопоставление: Луговомарийский язык и мокша || Создать множество тестовых фраз (посмотрите страницы 'Pending tests' и 'Regression tests' в Вики) для перевода с мокша на луговомарийский язык. Тесты должны содержать как можно больше черт языков. Некоторые из примеров можно найти в грамматике, другие могут быть придуманы. Без использования программирования, только грамматический анализ. ||align=center| 4&ndash;6 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|research}} || 3.&nbsp;Easy || Catalogue resources: Aromanian || Catalogue all the available resources (grammatical descriptions, wordlists, dictionaries, spellcheckers, papers, corpora, etc.) for Aromanian, along with the licences they are under. || || [[User:Francis Tyers|Francis&nbsp;Tyers]]<br />
|-<br />
|align=center| {{sc|translation}} || 2.&nbsp;Medium || Translate the HOWTO: Norwegian || Translate the [[new language pair HOWTO]] into Nynorsk. ||align=center| 5&ndash;8 || [[User:Unhammer|Unhammer]]<br />
|-<br />
|align=center| {{sc|translation}} || 2.&nbsp;Medium || Translate the HOWTO: Dutch || Translate the [[new language pair HOWTO]] into Dutch. ||align=center| 5&ndash;8 || [[User:AureiAnimus|Pim&nbsp;Otte]]<br />
|-<br />
|align=center| {{sc|translation}} || 2.&nbsp;Medium || Translate the HOWTO: Aragonese || Translate the [[new language pair HOWTO]] into Aragonese. ||align=center| 5&ndash;8 || [[User:juanpabl|Juan Pablo Martínez]]<br />
|-<br />
|align=center| {{sc|translation}} || 2.&nbsp;Medium || Translate the HOWTO: Turkish || Translate the [[new language pair HOWTO]] into Turkish. ||align=center| 5&ndash;8 || [[User:Zfe|Zfe]]<br />
|-<br />
|align=center| {{sc|translation}} || 2.&nbsp;Medium || Translate the HOWTO: Esperanto || Finish the translation of [[Kiel aldoni novan lingvoparon]] into Esperanto. ||align=center| 4&ndash;6 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|quality}} || 3.&nbsp;Easy || Update test pages: Esperanto and Catalan || Run the outstanding tests in the [[Catalan_and_Esperanto/Outstanding_tests|outstanding test page]] and move the ones which work to the [[Catalan_and_Esperanto/Regression_tests|regression test page]]. Run the regression tests in the regression test page and move the ones which don't work to the outstanding test page. ||align=center| 1&ndash;2 ||[[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|quality}} || 3.&nbsp;Easy || Update test pages: Esperanto and Spanish || Run the outstanding tests in the [[Spanish_and_Esperanto/Outstanding_tests|outstanding test page]] and move the ones which work to the [[Spanish_and_Esperanto/Regression_tests|regression test page]]. Run the regression tests in the regression test page and move the ones which don't work to the outstanding test page. ||align=center| 1&ndash;2 ||[[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|quality}} || 3.&nbsp;Easy || Update test pages: Esperanto and French || Run the outstanding tests in the [[French_and_Esperanto/Outstanding_tests|outstanding test page]] and move the ones which work to the [[French_and_Esperanto/Regression_tests|regression test page]]. Run the regression tests in the regression test page and move the ones which don't work to the outstanding test page. ||align=center| 1&ndash;2 ||[[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|quality}} || 3.&nbsp;Easy || Add new tests: Esperanto and Catalan || Add 10 new constructions which aren't correctly translated in the [[Catalan_and_Esperanto/Outstanding_tests|outstanding test page]]. ||align=center| 1&ndash;2 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|quality}} || 3.&nbsp;Easy || Add new tests: Esperanto and Spanish || Add 10 new constructions which aren't correctly translated in the [[Spanish_and_Esperanto/Outstanding_tests|outstanding test page]]. ||align=center| 1&ndash;2 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|quality}} || 3.&nbsp;Easy || Add new tests: Esperanto and French || Add 10 new constructions which aren't correctly translated in the [[French_and_Esperanto/Outstanding_tests|outstanding test page]]. ||align=center| 1&ndash;2 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|quality}} || 3.&nbsp;Easy || Quality evaluation: Spanish and French || Perform a human post-editing evaluation of the Spanish and French language pair. This involves taking some free text (e.g. from Wikipedia or Wikinews), running it through the translator and then correcting the output. Then use apertium-eval-translator to calculate the Word Error Rate. The text should be at least 2,000 words long. ||align=center| 4&ndash;8 ||[[User:Francis Tyers|Francis&nbsp;Tyers]]<br />
|-<br />
|align=center| {{sc|quality}} || 3.&nbsp;Easy || Quality evaluation: Spanish and Occitan || Perform a human post-editing evaluation of the Spanish and Occitan language pair. This involves taking some free text (e.g. from Wikipedia or Wikinews), running it through the translator and then correcting the output. Then use apertium-eval-translator to calculate the Word Error Rate. The text should be at least 2,000 words long. ||align=center| 4&ndash;8 ||[[User:mginesti|Mireia Ginestí]] <br />
|-<br />
|align=center| {{sc|quality}} || 3.&nbsp;Easy || Quality evaluation: Spanish and Asturian || Perform a human post-editing evaluation of the Spanish and Asturian language pair. This involves taking some free text (e.g. from Wikipedia or Wikinews), running it through the translator and then correcting the output. Then use apertium-eval-translator to calculate the Word Error Rate. The text should be at least 2,000 words long. ||align=center| 4&ndash;8 ||[[User:Francis Tyers|Francis&nbsp;Tyers]]<br />
|-<br />
|align=center| {{sc|user interface}} || 2.&nbsp;Medium || Design a user-friendly Glade interface for Apertium || Apertium does not currently have a friendly user interface for translators. Look at other translation software on the market, and sketch out some ideas for how to design a user interface. We don't require an implementation, just the XML-based interface mockup ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|user interface}} || 2.&nbsp;Medium || Design a user-friendly web interface for Apertium || Apertium has a friendly user interface for translators, but more attention needs to be paid to its visual appearance. This will involve either a user interface mockup (preferably using GWT), or a "theme" using CSS for the existing interface. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|user interface}} || 2.&nbsp;Medium || Design a user-friendly interface for a web-based dictionary management tool. || Apertium does not currently have a friendly user interface for adding new words to the dictionaries. We need someone with a good sense of design to provide us with a mockup for a web interface for managing a dictionary. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|user interface}} || 2.&nbsp;Medium || Design a user-friendly interface for an Android version of TinyLex || TinyLex is a dictionary tool for J2ME. We would like to port it to Android, but as we are not UI designers, we would prefer it if someone with a sense for visual design took on this task. There are tools available for drawing an interface in XML - it would be better if they were used. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|user interface}} || 2.&nbsp;Medium || Design a user-friendly Android interface for Apertium || Design a mockup of a GUI for Apertium for Android. We don't run on Android yet, but work is ongoing. We would like some ideas for an interface that makes sense on phones, primarily, but taking the tablet form factor into account is also an option. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|training}} || 3.&nbsp;Easy || Step-by-step "become a developer" guide || Write a simple step-by-step guide (on the wiki) for pre-university students (of varying levels of computer literacy) to install a development version of Apertium and make a single change in a language pair. This should include everything, from checking out with [[SVN]] to requesting committer access on SourceForge. Document everything you do! ||align=center| 2&ndash;3 || [[User:mlforcada|Mikel&nbsp;L.&nbsp;Forcada]]<br />
|-<br />
|align=center| {{sc|training}} || 3.&nbsp;Easy || Step-by-step "constraint grammar" guide || Write a simple step-by-step guide (on the wiki) for pre-university students (of varying levels of computer literacy) to install Constraint Grammar and fix 5 disambiguation problems in a single sentence, then committing to the [[incubator]]. This should include everything, from checking out with [[SVN]] to requesting committer access on SourceForge. Document everything you do! ||align=center| 2&ndash;3 || [[User:Unhammer|Unhammer]]<br />
|-<br />
|align=center| {{sc|training}} || 2.&nbsp;Easy || Basics of grammar guide || Write a basic guide that teaches the basics of grammar, with reference to the part of speech tags used in Apertium ||align=center| 2&ndash;3 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|training}} || 2.&nbsp;Medium || Moodle course || Design a Moodle-based course for beginning a new language pair. The New Language Pair HOWTO can be used as a guide ||align=center| 4&ndash;6 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|training}} || 2.&nbsp;Easy || Apertium AWI Screencast || Create a screencast that gives a step-by-step guide to using Apertium via Apertium AWI. It's OK to assume that it has already been set up. ||align=center| 2&ndash;3 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|training}} || 1.&nbsp;Hard || Apertium Regular Expressions guide || lttoolbox allows a limited subset of POSIX regular expressions. Create a guide to the regexes allowed, and to using them for common tasks, such as matching dates. ||align=center| 2&ndash;3 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|quality}} || 3.&nbsp;Easy || Release freshness || Go through all the 25 released pairs and note down their date of last release and how many dictionary entries and rules they have. Then go to SVN and look at the module for the released pair and find out how many dictionary entries and rules it has. Put this into a spreadsheet and email the mailing list. Why? Our release cycle is very slow, and often we get pairs in trunk which have substantial improvements but have not been released. ||align=center| 2&ndash;4 ||[[User:Francis Tyers|Francis&nbsp;Tyers]]<br />
|-<br />
|align=center| {{sc|outreach}} || 3.&nbsp;Easy || Translate the Wikipedia article on Apertium: Aragonese || Translate the article on Apertium into Aragonese for the Aragonese Wikipedia ||align=center| 1h || [[User:juanpabl|Juan Pablo Martínez]]<br />
|-<br />
|align=center| {{sc|outreach}} || 3.&nbsp;Easy || Translate the Wikipedia article on Apertium: Chuvash || Translate the article on Apertium into Chuvash for the Chuvash Wikipedia ||align=center| 1h || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|outreach}} || 3.&nbsp;Easy || Translate the Wikipedia article on Apertium: Tatar || Translate the article on Apertium into Tatar for the Tatar Wikipedia ||align=center| 1h || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|outreach}} || 3.&nbsp;Easy || Translate the Wikipedia article on Apertium: Bashkir || Translate the article on Apertium into Bashkir for the Bashkir Wikipedia ||align=center| 1h || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|outreach}} || 3.&nbsp;Easy || Translate the Wikipedia article on Apertium: Yakut || Translate the article on Apertium into Yakut for the Yakut Wikipedia ||align=center| 1h || [[User:Hectoralos|Hèctor&nbsp;Alòs]] [[User:Halan|Halan]]<br />
|-<br />
|align=center| {{sc|outreach}} || 3.&nbsp;Easy || Translate the Wikipedia article on Apertium: Komi || Translate the article on Apertium into Komi for the Komi Wikipedia ||align=center| 1h || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|outreach}} || 3.&nbsp;Easy || Translate the Wikipedia article on Apertium: Udmurt || Translate the article on Apertium into Udmurt for the Udmurt Wikipedia ||align=center| 1h || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|outreach}} || 3.&nbsp;Easy || Translate the Wikipedia article on Apertium: Meadow Mari || Translate the article on Apertium into Meadow Mari for the Meadow Mari Wikipedia ||align=center| 1h || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|outreach}} || 3.&nbsp;Easy || Translate the Wikipedia article on Apertium: Hill Mari || Translate the article on Apertium into Hill Mari for the Hill Mari Wikipedia ||align=center| 1h || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|outreach}} || 3.&nbsp;Easy || Translate the Wikipedia article on Apertium: Ossetian || Translate the article on Apertium into Ossetian for the Ossetian Wikipedia ||align=center| 1h || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|outreach}} || 3.&nbsp;Easy || Translate the Wikipedia article on Apertium: Buryat || Translate the article on Apertium into Buryat for the Buryat Wikipedia ||align=center| 1h || [[User:Hectoralos|Hèctor&nbsp;Alòs]] [[User:Jargal|Jargal]]<br />
|-<br />
|align=center| {{sc|documentation}} || 3.&nbsp;Easy || Create a dictionary crossing guide || Create a full guide to crossing dictionaries, using notes that will be provided. ||align=center| 2&ndash;3 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|documentation}} || 3.&nbsp;Easy || Create an installation guide for Windows users || We have some installation notes, but they were not written with an average user in mind. Write a new installation guide, specifically for Windows users, that doesn't presume a high level of technical knowledge. ||align=center| 2&ndash;3 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|documentation}} || 3.&nbsp;Easy || Create an installation guide for Mac users || We have some installation notes, but they were not written with an average user in mind. Write a new installation guide, specifically for Mac users, that doesn't presume a high level of technical knowledge. ||align=center| 2&ndash;3 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|documentation}} || 3.&nbsp;Easy || Create an installation guide for Ubuntu users || We have some installation notes, but they were not written with an average user in mind. Write a new installation guide, specifically for Ubuntu users, that doesn't presume a high level of technical knowledge. In particular, steer people away from installing the dated Debian/Ubuntu packages. ||align=center| 2&ndash;3 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|outreach}} || 3.&nbsp;Easy || Write a quick guide on 'What Apertium can and cannot do to help you with your homework'. || Students around the world use Apertium (and other MT systems) to do their second-language homework. The guide would summarize the do's and don'ts, and could also explain how students using Apertium for their homework might discover ways in which Apertium could be improved. ||align=center| 2&ndash;3 ||[[User:mlforcada|Mikel L. Forcada]]<br />
|-<br />
|align=center| {{sc|documentation}} || 3.&nbsp;Easy || Document undocumented features: manpages || Work through each of the manpages in apertium and lttoolbox, checking that each of the options listed by --help is documented. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]] <br />
|-<br />
|align=center| {{sc|research}} || 3.&nbsp;Easy || Create manually tagged corpora: Occitan || Fix tagging errors in a piece of analysed text, for use in tagger training. This will involve taking some free text (such as from Wikipedia), running it through the analyser and tagger, and replacing incorrect analyses with the correct one. ||align=center| 2&ndash;4 || [[User:mginesti|Mireia Ginestí]] <br />
|-<br />
|align=center| {{sc|research}} || 3.&nbsp;Easy || Create manually tagged corpora: Italian || Fix tagging errors in a piece of analysed text, for use in tagger training. This will involve taking some free text (such as from Wikipedia), running it through the analyser and tagger, and replacing incorrect analyses with the correct one. ||align=center| 2&ndash;4 || [[User:mginesti|Mireia Ginestí]] <br />
|-<br />
|align=center| {{sc|research}} || 3.&nbsp;Easy || Create manually tagged corpora: Catalan || Fix tagging errors in a piece of analysed text, for use in tagger training. This will involve taking the corpus in the es-ca package, and adapting it in terms of the multiwords present in en-ca, but absent in es-ca. ||align=center| 2&ndash;4 || [[User:mginesti|Mireia Ginestí]]<br />
|-<br />
|align=center| {{sc|research}} || 3.&nbsp;Easy || Create manually tagged corpora: Polish || Fix tagging errors in a piece of analysed text, for use in tagger training. This will involve taking some free text (such as from Wikipedia), running it through the analyser and tagger, and replacing incorrect analyses with the correct one. It may be preferable to use LanguageTool's tagger. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|research}} || 3.&nbsp;Easy || Create manually tagged corpora: Czech || Fix tagging errors in a piece of analysed text, for use in tagger training. This will involve taking some free text (such as from Wikipedia), running it through the analyser and tagger, and replacing incorrect analyses with the correct one. It may be preferable to use LanguageTool's tagger. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|research}} || 3.&nbsp;Easy || Create manually tagged corpora: Slovakian || Fix tagging errors in a piece of analysed text, for use in tagger training. This will involve taking some free text (such as from Wikipedia), running it through the analyser and tagger, and replacing incorrect analyses with the correct one. It may be preferable to use LanguageTool's tagger. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|research}} || 3.&nbsp;Easy || Create manually tagged corpora: Russian || Fix tagging errors in a piece of analysed text, for use in tagger training. This will involve taking some free text (such as from Wikipedia), running it through the analyser and tagger, and replacing incorrect analyses with the correct one. It may be preferable to use LanguageTool's tagger. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|research}} || 3.&nbsp;Easy || Create manually tagged corpora: Ukrainian || Fix tagging errors in a piece of analysed text, for use in tagger training. This will involve taking some free text (such as from Wikipedia), running it through the analyser and tagger, and replacing incorrect analyses with the correct one. It may be preferable to use LanguageTool's tagger. ||align=center| 2&ndash;4 || [[User:Jimregan|Jimregan]]<br />
|-<br />
|align=center| {{sc|quality}} || 1.&nbsp;Hard || Improve a language pair: Welsh-English || Find some faults in Welsh-English and fix them. ||align=center| 8&ndash;12 || [[User:Francis Tyers|Francis&nbsp;Tyers]]<br />
|-<br />
|align=center| {{sc|quality}} || 1.&nbsp;Hard || Improve a language pair: Breton-French || Find some faults in Breton-French and fix them. ||align=center| 8&ndash;12 || [[User:Francis Tyers|Francis&nbsp;Tyers]]<br />
|-<br />
|align=center| {{sc|quality}} || 1.&nbsp;Hard || Improve a language pair: Basque-Spanish || Find some faults in Basque-Spanish and fix them. ||align=center| 8&ndash;12 || [[User:mginesti|Mireia Ginestí]]<br />
|-<br />
|align=center| {{sc|quality}} || 1.&nbsp;Hard || Improve a language pair: French-Esperanto || Find some faults in French-Esperanto and fix them. ||align=center| 8&ndash;12 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|quality}} || 1.&nbsp;Hard || Improve a language pair: Spanish-Esperanto || Find some faults in Spanish-Esperanto and fix them. ||align=center| 8&ndash;12 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|quality}} || 1.&nbsp;Hard || Improve a language pair: Catalan-Esperanto || Find some faults in Catalan-Esperanto and fix them. ||align=center| 8&ndash;12 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|quality}} || 1.&nbsp;Hard || Improve a language pair: English-Esperanto || Find some faults in English-Esperanto and fix them. ||align=center| 8&ndash;12 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|documentation}} || 2.&nbsp;Medium || Document undocumented features: cascaded interchunk || Update the Apertium manual to document cascaded interchunk. ||align=center| 4&ndash;8 || [[User:mlforcada|Mikel L. Forcada]]<br />
|-<br />
|align=center| {{sc|documentation}} || 2.&nbsp;Medium || Document undocumented features: transliteration || Update the Apertium manual to document the transliteration features in lttoolbox. ||align=center| 4&ndash;8 || [[User:Francis Tyers|Francis&nbsp;Tyers]]<br />
|-<br />
|align=center| {{sc|quality}} || 1.&nbsp;Hard || Fix some tagger errors in Swedish->Danish || [[apertium-sv-da]] could be improved with a [[Constraint Grammar]]. Find 10 sentences that get wrong translations due to tagging, and write CG rules to fix them. The student should have good knowledge of Swedish, or at least some Scandinavian language. ||align=center| 8&ndash;12 ||[[User:Unhammer|Unhammer]]<br />
|-<br />
|align=center| {{sc|quality}} || 3.&nbsp;Easy || Improve Swedish-Danish dictionaries || Add 50 nouns you feel are missing in translations from [[apertium-sv-da|Swedish to Danish]]. ||align=center| 3&ndash;6 || [[User:Jacob Nordfalk|Jacob Nordfalk]]<br />
|-<br />
|align=center| {{sc|quality}} || 3.&nbsp;Easy || Improve English-Esperanto dictionaries || Add 50 words you feel are missing in translations from [[apertium-eo-en|English to Esperanto]]. ||align=center| 3&ndash;6 || [[User:Jacob Nordfalk|Jacob Nordfalk]]<br />
|-<br />
|align=center| {{sc|quality}} || 3.&nbsp;Easy || Improve Spanish-Esperanto dictionaries || Add 50 words you feel are missing in translations from [[apertium-eo-es|Spanish to Esperanto]]. ||align=center| 3&ndash;6 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|quality}} || 3.&nbsp;Easy || Improve Catalan-Esperanto dictionaries || Add 50 words you feel are missing in translations from [[apertium-eo-ca|Catalan to Esperanto]]. ||align=center| 3&ndash;6 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|quality}} || 3.&nbsp;Easy || Improve French-Esperanto dictionaries || Add 50 words you feel are missing in translations from [[apertium-eo-fr|French to Esperanto]]. ||align=center| 3&ndash;6 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|quality}} || 3.&nbsp;Easy || Improve Spanish-Aragonese || Add 50 nouns you feel are missing in translations from [[apertium-eo-ca|Aragonese to Spanish]]. ||align=center| 3&ndash;6 || [[User:Juanpabl|Juan Pablo Martínez]]<br />
|-<br />
|align=center| {{sc|quality}} || 3.&nbsp;Easy || Improve Spanish-Aragonese || Add 50 verbs you feel are missing in translations from [[apertium-eo-ca|Aragonese to Spanish]]. ||align=center| 3&ndash;6 || [[User:Juanpabl|Juan Pablo Martínez]]<br />
|-<br />
|align=center| {{sc|quality}} || 3.&nbsp;Easy || Improve Spanish-Aragonese || Add 50 adjectives you feel are missing in translations from [[apertium-eo-ca|Aragonese to Spanish]]. ||align=center| 3&ndash;6 || [[User:Juanpabl|Juan Pablo Martínez]]<br />
|-<br />
|align=center| {{sc|quality}} || 3.&nbsp;Easy || Improve Afrikaans-Dutch dictionaries || Add 50 words you feel are missing in translations from [[apertium-af-nl|Afrikaans-Dutch]]. A list of unknown words can be provided.||align=center| 3&ndash;6 || [[User:AureiAnimus|Pim&nbsp;Otte]]<br />
|-<br />
|align=center| {{sc|quality}} || 3.&nbsp;Easy || Improve Spanish-Portuguese dictionaries for the tourist domain || Add 50 tourism-related nouns you feel are missing in translations from [[apertium-es-pt|Spanish to Portuguese]]. ||align=center| 3&ndash;6 || [[User:Gramirez|Gramirez]]<br />
|-<br />
|align=center| {{sc|quality}} || 3.&nbsp;Easy || Afrikaans-Dutch tests || Finish the Afrikaans-Dutch pending tests and move the passing tests to a separate page for regression testing. ||align=center| 3&ndash;6 || [[User:AureiAnimus|Pim&nbsp;Otte]]<br />
|-<br />
|align=center| {{sc|quality}} || 3.&nbsp;Easy || Afrikaans-Dutch cleanup || Add about 100 missing words to the bidix of Afrikaans-Dutch and possibly the Dutch side ||align=center| 3&ndash;6 || [[User:AureiAnimus|Pim&nbsp;Otte]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Create a Chuvash-Russian dictionary || Create a Chuvash-Russian dictionary of 100 words in Apertium's lttoolbox format. ||align=center| 2&ndash;4 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Create a Chuvash-Tatar dictionary || Create a Chuvash-Tatar dictionary of 100 words in Apertium's lttoolbox format. ||align=center| 2&ndash;4 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Create a Chuvash-Bashkir dictionary || Create a Chuvash-Bashkir dictionary of 100 words in Apertium's lttoolbox format. ||align=center| 2&ndash;4 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Create a Chuvash-Yakut dictionary || Create a Chuvash-Yakut dictionary of 100 words in Apertium's lttoolbox format. ||align=center| 2&ndash;4 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Create a Tatar-Russian dictionary || Create a Tatar-Russian dictionary of 100 words in Apertium's lttoolbox format. ||align=center| 2&ndash;4 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Create a Tatar-Turkish dictionary || Create a Tatar-Turkish dictionary of 100 words in Apertium's lttoolbox format. ||align=center| 2&ndash;4 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Create a Tatar-Bashkir dictionary || Create a Tatar-Bashkir dictionary of 100 words in Apertium's lttoolbox format. ||align=center| 2&ndash;4 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Create a Tatar-Yakut dictionary || Create a Tatar-Yakut dictionary of 100 words in Apertium's lttoolbox format. ||align=center| 2&ndash;4 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Create a Bashkir-Russian dictionary || Create a Bashkir-Russian dictionary of 100 words in Apertium's lttoolbox format. ||align=center| 2&ndash;4 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Create a Bashkir-Turkish dictionary || Create a Bashkir-Turkish dictionary of 100 words in Apertium's lttoolbox format. ||align=center| 2&ndash;4 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Create a Bashkir-Yakut dictionary || Create a Bashkir-Yakut dictionary of 100 words in Apertium's lttoolbox format. ||align=center| 2&ndash;4 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Create an Ossetian-Russian dictionary || Create an Ossetian-Russian dictionary of 100 words in Apertium's lttoolbox format. ||align=center| 2&ndash;4 || [[User:Hectoralos|Hèctor&nbsp;Alòs]] [[User:Amikeco|Amikeco]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Create an Ossetian-English dictionary || Create an Ossetian-English dictionary of 100 words in Apertium's lttoolbox format. ||align=center| 2&ndash;4 || [[User:Hectoralos|Hèctor&nbsp;Alòs]] [[User:Amikeco|Amikeco]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Create an Ossetian-Esperanto dictionary || Create an Ossetian-Esperanto dictionary of 100 words in Apertium's lttoolbox format. ||align=center| 2&ndash;4 || [[User:Hectoralos|Hèctor&nbsp;Alòs]] [[User:Amikeco|Amikeco]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Create a Buryat-Kalmyk dictionary || Create a Buryat-Kalmyk dictionary of 100 words in Apertium's lttoolbox format. ||align=center| 2&ndash;4 || [[User:Hectoralos|Hèctor&nbsp;Alòs]] [[User:Jargal|Jargal]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Create a Buryat-Yakut dictionary || Create a Buryat-Yakut dictionary of 100 words in Apertium's lttoolbox format. ||align=center| 2&ndash;4 || [[User:Hectoralos|Hèctor&nbsp;Alòs]] [[User:Jargal|Jargal]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Create a Buryat-Tuvan dictionary || Create a Buryat-Tuvan dictionary of 100 words in Apertium's lttoolbox format. ||align=center| 2&ndash;4 || [[User:Hectoralos|Hèctor&nbsp;Alòs]] [[User:Jargal|Jargal]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Create a Russian-Yakut dictionary || Create a Russian-Yakut dictionary of 100 words in Apertium's lttoolbox format. ||align=center| 2&ndash;4 || [[User:Hectoralos|Hèctor&nbsp;Alòs]] [[User:Halan|Halan]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Create an English-Yakut dictionary || Create an English-Yakut dictionary of 100 words in Apertium's lttoolbox format. ||align=center| 2&ndash;4 || [[User:Hectoralos|Hèctor&nbsp;Alòs]] [[User:Halan|Halan]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Create a Russian-Udmurt dictionary || Create a Russian-Udmurt dictionary of 100 words in Apertium's lttoolbox format. ||align=center| 2&ndash;4 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Create a Finnish-Udmurt dictionary || Create a Finnish-Udmurt dictionary of 100 words in Apertium's lttoolbox format. ||align=center| 2&ndash;4 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Create a Komi-Udmurt dictionary || Create a Komi-Udmurt dictionary of 100 words in Apertium's lttoolbox format. ||align=center| 2&ndash;4 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Create a Meadow Mari-Hill Mari dictionary || Create a Meadow Mari-Hill Mari dictionary of 100 words in Apertium's lttoolbox format. ||align=center| 2&ndash;4 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Create a Finnish-Hill Mari dictionary || Create a Finnish-Hill Mari dictionary of 100 words in Apertium's lttoolbox format. ||align=center| 2&ndash;4 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Create a Meadow Mari-Erzya dictionary || Create a Meadow Mari-Erzya dictionary of 100 words in Apertium's lttoolbox format. ||align=center| 2&ndash;4 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Create a Meadow Mari-Moksha dictionary || Create a Meadow Mari-Moksha dictionary of 100 words in Apertium's lttoolbox format. ||align=center| 2&ndash;4 || [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|quality}} || 3.&nbsp;Easy || Improve Irish-Manx Gaelic coverage || I can provide a list of the most common Irish words not covered by the bilingual dictionary, along with their English translations. Manx translations are needed for these words. ||align=center| 3&ndash;6 || [[User:Kevin Scannell|Kevin Scannell]]<br />
|-<br />
|align=center| {{sc|quality}} || 3.&nbsp;Easy || Add gender information to Manx dictionary || Most of the nouns in the Manx dictionary have gender information in place - look up and add any that are missing. ||align=center| 3&ndash;6 || [[User:Kevin Scannell|Kevin Scannell]]<br />
|-<br />
|align=center| {{sc|quality}} || 3.&nbsp;Easy || Proofread Albanian analyser || We have a morphological analyser for Albanian, but it has been written by a non-native speaker and needs to be checked. ||align=center| 6&ndash;10 || [[User:Francis Tyers|Francis&nbsp;Tyers]]<br />
|-<br />
|align=center| {{sc|translation}} || 3.&nbsp;Easy || Proofread Catalan-Sardinian dictionary || Go through the Catalan-Sardinian dictionary and check the entries; there are only around a thousand. ||align=center| 1&ndash;2 || [[User:Francis Tyers|Francis&nbsp;Tyers]]<br />
|-<br />
|align=center| {{sc|quality}} || 2.&nbsp;Medium || Improve Spanish-Aragonese coverage || Create a corpus from the Aragonese Wikipedia. Then add the 50-100 most frequent words that are not covered by the Apertium es-an dictionaries. ||align=center| 6&ndash;10 || [[User:Juanpabl|Juan Pablo Martínez]]<br />
|-
|align=center| {{sc|code}} || 2.&nbsp;Medium || Add toponyms to the Spanish-Aragonese dictionaries || Extract from Wikipedia the Aragonese names of the world's countries, their capital cities, the main Spanish cities, and the municipalities of Aragon, and add them to the es-an dictionaries. ||align=center| 6&ndash;10 || [[User:Juanpabl|Juan Pablo Martínez]]<br />
|-<br />
|align=center| {{sc|code}} || 2.&nbsp;Medium || Update the Apertium TinyLex J2ME apps for one language pair || Update Apertium TinyLex J2ME packages (http://www.tinylex.com/) to contain the most recent versions of dictionaries for one language pair ||align=center| 4&ndash;6 per package || [[User:mlforcada|Mikel&nbsp;L.&nbsp;Forcada]]<br />
|-<br />
|align=center| {{sc|code}} || 2.&nbsp;Medium || Create an Apertium TinyLex J2ME app for a new language pair || Create an Apertium TinyLex J2ME package (http://www.tinylex.com/) from an existing Apertium language pair that is not yet offered in TinyLex. ||align=center| 6&ndash;10 per package || [[User:mlforcada|Mikel&nbsp;L.&nbsp;Forcada]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Russian--Spanish (Simple noun phrases) || Write a contrastive grammar of Russian and Spanish for the translation of noun phrases from Russian to Spanish. The grammar should be written as a series of human readable rules, with example sentences. ||align=center| 3 hours || [[User:Francis Tyers|Francis Tyers]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Russian--Spanish (Prepositional phrases) || Write a contrastive grammar of Russian and Spanish for the translation of prepositions/prepositional phrases from Russian to Spanish. The grammar should be written as a series of human readable rules, with example sentences. ||align=center| 3 hours || [[User:Francis Tyers|Francis Tyers]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Russian--Spanish (Tenses) || Write a contrastive grammar of Russian and Spanish for the translation of verb tenses from Russian to Spanish. The grammar should be written as a series of human readable rules, with example sentences. ||align=center| 3 hours || [[User:Francis Tyers|Francis Tyers]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Russian--Spanish (Aspect) || Write a contrastive grammar of Russian and Spanish for the translation of verbal aspect from Russian to Spanish. The grammar should be written as a series of human readable rules, with example sentences. ||align=center| 3 hours || [[User:Francis Tyers|Francis Tyers]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Russian--Spanish (Pronouns) || Write a comprehensive contrastive grammar of Russian and Spanish for the translation of pronouns from Russian to Spanish. The grammar should be written as a series of human readable rules, with example sentences. ||align=center| 3 hours || [[User:Francis Tyers|Francis Tyers]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Russian--Spanish (Impersonal constructions) || Write a comprehensive contrastive grammar of Russian and Spanish for the translation of impersonal constructions from Russian to Spanish. The grammar should be written as a series of human readable rules, with example sentences. ||align=center| 3 hours || [[User:Francis Tyers|Francis Tyers]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Russian--Spanish (Verbs of motion) || Write a comprehensive contrastive grammar of Russian and Spanish for the translation of verbs of motion from Russian to Spanish. The grammar should be written as a series of human readable rules, with example sentences. ||align=center| 3 hours || [[User:Francis Tyers|Francis Tyers]]<br />
|-<br />
|align=center| {{sc|research}} || 2.&nbsp;Medium || Contrastive analysis: Russian--Spanish (Particles and adverbs) || Write a comprehensive contrastive grammar of Russian and Spanish for the translation of particles and adverbs from Russian to Spanish, paying special attention to word/constituent order. The grammar should be written as a series of human readable rules, with example sentences. ||align=center| 3 hours || [[User:Francis Tyers|Francis Tyers]]<br />
|-<br />
|align=center| {{sc|code}} || 2.&nbsp;Medium || TRmorph phonological rule conversion: <code>010-exception_deye.fst</code> || Convert the TRmorph <code>010-exception_deye.fst</code> into XFST syntax and test it. ||align=center| 5-8 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]], [[User:Zfe|Zfe]], [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|code}} || 2.&nbsp;Medium || TRmorph phonological rule conversion: <code>015-exception_obs.fst</code> || Convert the TRmorph <code>015-exception_obs.fst</code> into XFST syntax and test it. ||align=center| 5-8 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]], [[User:Zfe|Zfe]], [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|code}} || 2.&nbsp;Medium || TRmorph phonological rule conversion: <code>017-exception_i.fst</code> || Convert the TRmorph <code>017-exception_i.fst</code> into XFST syntax and test it. ||align=center| 5-8 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]], [[User:Zfe|Zfe]], [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|code}} || 2.&nbsp;Medium || TRmorph phonological rule conversion: <code>020-compn.fst</code> || Convert the TRmorph <code>020-compn.fst</code> into XFST syntax and test it. ||align=center| 5-8 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]], [[User:Zfe|Zfe]], [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|code}} || 2.&nbsp;Medium || TRmorph phonological rule conversion: <code>040-exception_ben.fst</code> || Convert the TRmorph <code>040-exception_ben.fst</code> into XFST syntax and test it. ||align=center| 5-8 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]], [[User:Zfe|Zfe]], [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|code}} || 2.&nbsp;Medium || TRmorph phonological rule conversion: <code>050-exception_su.fst</code> || Convert the TRmorph <code>050-exception_su.fst</code> into XFST syntax and test it. ||align=center| 5-8 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]], [[User:Zfe|Zfe]], [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|code}} || 2.&nbsp;Medium || TRmorph phonological rule conversion: <code>060-exception_del_bS.fst</code> || Convert the TRmorph <code>060-exception_del_bS.fst</code> into XFST syntax and test it. ||align=center| 5-8 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]], [[User:Zfe|Zfe]], [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|code}} || 2.&nbsp;Medium || TRmorph phonological rule conversion: <code>070-exception_del_buff.fst</code> || Convert the TRmorph <code>070-exception_del_buff.fst</code> into XFST syntax and test it. ||align=center| 5-8 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]], [[User:Zfe|Zfe]], [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|code}} || 2.&nbsp;Medium || TRmorph phonological rule conversion: <code>080-vowel_epenth.fst</code> || Convert the TRmorph <code>080-vowel_epenth.fst</code> into XFST syntax and test it. ||align=center| 5-8 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]], [[User:Zfe|Zfe]], [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|code}} || 2.&nbsp;Medium || TRmorph phonological rule conversion: <code>090-duplication.fst</code> || Convert the TRmorph <code>090-duplication.fst</code> into XFST syntax and test it. ||align=center| 5-8 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]], [[User:Zfe|Zfe]], [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|code}} || 2.&nbsp;Medium || TRmorph phonological rule conversion: <code>100-fs_devoicing.fst</code> || Convert the TRmorph <code>100-fs_devoicing.fst</code> into XFST syntax and test it. ||align=center| 5-8 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]], [[User:Zfe|Zfe]], [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|code}} || 2.&nbsp;Medium || TRmorph phonological rule conversion: <code>110-v_assimilation.fst</code> || Convert the TRmorph <code>110-v_assimilation.fst</code> into XFST syntax and test it. ||align=center| 5-8 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]], [[User:Zfe|Zfe]], [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|code}} || 2.&nbsp;Medium || TRmorph phonological rule conversion: <code>120-passive_ln.fst</code> || Convert the TRmorph <code>120-passive_ln.fst</code> into XFST syntax and test it. ||align=center| 5-8 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]], [[User:Zfe|Zfe]], [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|code}} || 2.&nbsp;Medium || TRmorph phonological rule conversion: <code>130-exception_yor.fst</code> || Convert the TRmorph <code>130-exception_yor.fst</code> into XFST syntax and test it. ||align=center| 5-8 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]], [[User:Zfe|Zfe]], [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|code}} || 2.&nbsp;Medium || TRmorph phonological rule conversion: <code>140-v_harmony.fst</code> || Convert the TRmorph <code>140-v_harmony.fst</code> into XFST syntax and test it. ||align=center| 5-8 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]], [[User:Zfe|Zfe]], [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|code}} || 2.&nbsp;Medium || TRmorph phonological rule conversion: <code>phon+bm.fst</code> || Convert the TRmorph <code>phon+bm.fst</code> into XFST syntax and test it. ||align=center| 5-8 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]], [[User:Zfe|Zfe]], [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|code}} || 2.&nbsp;Medium || TRmorph phonological rule conversion: <code>phon.fst</code> || Convert the TRmorph <code>phon.fst</code> into XFST syntax and test it. ||align=center| 5-8 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]], [[User:Zfe|Zfe]], [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|code}} || 2.&nbsp;Medium || TRmorph lexicon conversion: Nouns || Convert the TRmorph noun lexicon into [[lexc]] syntax and test it. ||align=center| 5-8 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]], [[User:Zfe|Zfe]], [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|code}} || 2.&nbsp;Medium || TRmorph lexicon conversion: Verbs || Convert the TRmorph verb lexicon into [[lexc]] syntax and test it. ||align=center| 5-8 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]], [[User:Zfe|Zfe]], [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|code}} || 2.&nbsp;Medium || TRmorph lexicon conversion: Adjectives || Convert the TRmorph adjective lexicon into [[lexc]] syntax and test it. ||align=center| 5-8 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]], [[User:Zfe|Zfe]], [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|code}} || 2.&nbsp;Medium || TRmorph lexicon conversion: Closed categories and adverbs || Convert the TRmorph closed categories and adverb lexicons into [[lexc]] syntax and test them. ||align=center| 5-8 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]], [[User:Zfe|Zfe]], [[User:Hectoralos|Hèctor&nbsp;Alòs]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kazakh lexicon: add 500 nouns (1) || Add 500 nouns to the Kazakh lexicon in [[lexc]]. ||align=center| 5-8 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kazakh lexicon: add 500 nouns (2) || Add 500 nouns to the Kazakh lexicon in [[lexc]]. ||align=center| 5-8 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kazakh lexicon: add 500 nouns (3) || Add 500 nouns to the Kazakh lexicon in [[lexc]]. ||align=center| 5-8 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kazakh lexicon: add 500 nouns (4) || Add 500 nouns to the Kazakh lexicon in [[lexc]]. ||align=center| 5-8 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kazakh lexicon: add 500 verbs (1) || Add 500 verbs to the Kazakh lexicon in [[lexc]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kazakh lexicon: add 500 verbs (2) || Add 500 verbs to the Kazakh lexicon in [[lexc]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kazakh lexicon: add 500 verbs (3) || Add 500 verbs to the Kazakh lexicon in [[lexc]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kazakh lexicon: add 500 verbs (4) || Add 500 verbs to the Kazakh lexicon in [[lexc]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kazakh lexicon: add 500 adjectives (1) || Add 500 adjectives to the Kazakh lexicon in [[lexc]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kazakh lexicon: add 500 adjectives (2) || Add 500 adjectives to the Kazakh lexicon in [[lexc]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kazakh lexicon: add 500 adjectives (3) || Add 500 adjectives to the Kazakh lexicon in [[lexc]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kazakh lexicon: add 500 adjectives (4) || Add 500 adjectives to the Kazakh lexicon in [[lexc]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kazakh lexicon: add 50 adverbs || Add 50 adverbs to the Kazakh lexicon in [[lexc]], avoiding compositional forms of verbs and nouns. ||align=center| 3-5 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kazakh--Mongolian bilingual dictionary: add 500 nouns (1) || Add 500 nouns to the Kazakh--Mongolian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kazakh--Mongolian bilingual dictionary: add 500 nouns (2) || Add 500 nouns to the Kazakh--Mongolian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kazakh--Mongolian bilingual dictionary: add 500 nouns (3) || Add 500 nouns to the Kazakh--Mongolian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kazakh--Mongolian bilingual dictionary: add 500 nouns (4) || Add 500 nouns to the Kazakh--Mongolian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kazakh--Mongolian bilingual dictionary: add 500 verbs (1) || Add 500 verbs to the Kazakh--Mongolian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kazakh--Mongolian bilingual dictionary: add 500 verbs (2) || Add 500 verbs to the Kazakh--Mongolian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kazakh--Mongolian bilingual dictionary: add 500 verbs (3) || Add 500 verbs to the Kazakh--Mongolian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kazakh--Mongolian bilingual dictionary: add 500 verbs (4) || Add 500 verbs to the Kazakh--Mongolian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kazakh--Mongolian bilingual dictionary: add 500 adjectives (1) || Add 500 adjectives to the Kazakh--Mongolian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kazakh--Mongolian bilingual dictionary: add 500 adjectives (2) || Add 500 adjectives to the Kazakh--Mongolian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kazakh--Mongolian bilingual dictionary: add 500 adjectives (3) || Add 500 adjectives to the Kazakh--Mongolian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kazakh--Mongolian bilingual dictionary: add 500 adjectives (4) || Add 500 adjectives to the Kazakh--Mongolian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kazakh--Mongolian bilingual dictionary: add 50 adverbs || Add 50 adverbs to the Kazakh--Mongolian [[bidix]]. ||align=center| 3-5 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kazakh--Russian bilingual dictionary: add 500 nouns (1) || Add 500 nouns to the Kazakh--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kazakh--Russian bilingual dictionary: add 500 nouns (2) || Add 500 nouns to the Kazakh--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kazakh--Russian bilingual dictionary: add 500 nouns (3) || Add 500 nouns to the Kazakh--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kazakh--Russian bilingual dictionary: add 500 nouns (4) || Add 500 nouns to the Kazakh--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kazakh--Russian bilingual dictionary: add 500 verbs (1) || Add 500 verbs to the Kazakh--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kazakh--Russian bilingual dictionary: add 500 verbs (2) || Add 500 verbs to the Kazakh--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kazakh--Russian bilingual dictionary: add 500 verbs (3) || Add 500 verbs to the Kazakh--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kazakh--Russian bilingual dictionary: add 500 verbs (4) || Add 500 verbs to the Kazakh--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kazakh--Russian bilingual dictionary: add 500 adjectives (1) || Add 500 adjectives to the Kazakh--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kazakh--Russian bilingual dictionary: add 500 adjectives (2) || Add 500 adjectives to the Kazakh--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kazakh--Russian bilingual dictionary: add 500 adjectives (3) || Add 500 adjectives to the Kazakh--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kazakh--Russian bilingual dictionary: add 500 adjectives (4) || Add 500 adjectives to the Kazakh--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kazakh--Russian bilingual dictionary: add 50 adverbs || Add 50 adverbs to the Kazakh--Russian [[bidix]]. ||align=center| 3-5 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kazakh--Kyrgyz bilingual dictionary: add 500 nouns (1) || Add 500 nouns to the Kazakh--Kyrgyz [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kazakh--Kyrgyz bilingual dictionary: add 500 nouns (2) || Add 500 nouns to the Kazakh--Kyrgyz [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kazakh--Kyrgyz bilingual dictionary: add 500 nouns (3) || Add 500 nouns to the Kazakh--Kyrgyz [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kazakh--Kyrgyz bilingual dictionary: add 500 nouns (4) || Add 500 nouns to the Kazakh--Kyrgyz [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kazakh--Kyrgyz bilingual dictionary: add 500 verbs (1) || Add 500 verbs to the Kazakh--Kyrgyz [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kazakh--Kyrgyz bilingual dictionary: add 500 verbs (2) || Add 500 verbs to the Kazakh--Kyrgyz [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kazakh--Kyrgyz bilingual dictionary: add 500 verbs (3) || Add 500 verbs to the Kazakh--Kyrgyz [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kazakh--Kyrgyz bilingual dictionary: add 500 verbs (4) || Add 500 verbs to the Kazakh--Kyrgyz [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kazakh--Kyrgyz bilingual dictionary: add 500 adjectives (1) || Add 500 adjectives to the Kazakh--Kyrgyz [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kazakh--Kyrgyz bilingual dictionary: add 500 adjectives (2) || Add 500 adjectives to the Kazakh--Kyrgyz [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kazakh--Kyrgyz bilingual dictionary: add 500 adjectives (3) || Add 500 adjectives to the Kazakh--Kyrgyz [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kazakh--Kyrgyz bilingual dictionary: add 500 adjectives (4) || Add 500 adjectives to the Kazakh--Kyrgyz [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kazakh--Kyrgyz bilingual dictionary: add 50 adverbs || Add 50 adverbs to the Kazakh--Kyrgyz [[bidix]]. ||align=center| 3-5 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kyrgyz--Russian bilingual dictionary: add 500 nouns (1) || Add 500 nouns to the Kyrgyz--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kyrgyz--Russian bilingual dictionary: add 500 nouns (2) || Add 500 nouns to the Kyrgyz--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kyrgyz--Russian bilingual dictionary: add 500 nouns (3) || Add 500 nouns to the Kyrgyz--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kyrgyz--Russian bilingual dictionary: add 500 nouns (4) || Add 500 nouns to the Kyrgyz--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kyrgyz--Russian bilingual dictionary: add 500 verbs (1) || Add 500 verbs to the Kyrgyz--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kyrgyz--Russian bilingual dictionary: add 500 verbs (2) || Add 500 verbs to the Kyrgyz--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kyrgyz--Russian bilingual dictionary: add 500 verbs (3) || Add 500 verbs to the Kyrgyz--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kyrgyz--Russian bilingual dictionary: add 500 verbs (4) || Add 500 verbs to the Kyrgyz--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kyrgyz--Russian bilingual dictionary: add 500 adjectives (1) || Add 500 adjectives to the Kyrgyz--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kyrgyz--Russian bilingual dictionary: add 500 adjectives (2) || Add 500 adjectives to the Kyrgyz--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kyrgyz--Russian bilingual dictionary: add 500 adjectives (3) || Add 500 adjectives to the Kyrgyz--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kyrgyz--Russian bilingual dictionary: add 500 adjectives (4) || Add 500 adjectives to the Kyrgyz--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Kyrgyz--Russian bilingual dictionary: add 50 adverbs || Add 50 adverbs to the Kyrgyz--Russian [[bidix]]. ||align=center| 3-5 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Mongolian lexicon: add 500 nouns (1) || Add 500 nouns to the Mongolian lexicon in [[lexc]]. ||align=center| 5-8 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Mongolian lexicon: add 500 nouns (2) || Add 500 nouns to the Mongolian lexicon in [[lexc]]. ||align=center| 5-8 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Mongolian lexicon: add 500 nouns (3) || Add 500 nouns to the Mongolian lexicon in [[lexc]]. ||align=center| 5-8 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Mongolian lexicon: add 500 nouns (4) || Add 500 nouns to the Mongolian lexicon in [[lexc]]. ||align=center| 5-8 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Mongolian lexicon: add 100 nouns with complete paradigms (1) || Add 100 nouns to the Mongolian lexicon in [[lexc]], along with all paradigm information. ||align=center| 5-8 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Mongolian lexicon: add 100 nouns with complete paradigms (2) || Add 100 nouns to the Mongolian lexicon in [[lexc]], along with all paradigm information. ||align=center| 5-8 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Mongolian lexicon: add 100 nouns with complete paradigms (3) || Add 100 nouns to the Mongolian lexicon in [[lexc]], along with all paradigm information. ||align=center| 5-8 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Mongolian lexicon: add 500 verbs (1) || Add 500 verbs to the Mongolian lexicon in [[lexc]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Mongolian lexicon: add 500 verbs (2) || Add 500 verbs to the Mongolian lexicon in [[lexc]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Mongolian lexicon: add 500 verbs (3) || Add 500 verbs to the Mongolian lexicon in [[lexc]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Mongolian lexicon: add 500 verbs (4) || Add 500 verbs to the Mongolian lexicon in [[lexc]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Mongolian lexicon: add 500 adjectives (1) || Add 500 adjectives to the Mongolian lexicon in [[lexc]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Mongolian lexicon: add 500 adjectives (2) || Add 500 adjectives to the Mongolian lexicon in [[lexc]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Mongolian lexicon: add 500 adjectives (3) || Add 500 adjectives to the Mongolian lexicon in [[lexc]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Mongolian lexicon: add 500 adjectives (4) || Add 500 adjectives to the Mongolian lexicon in [[lexc]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Mongolian lexicon: add 50 adverbs || Add 50 adverbs to the Mongolian lexicon in [[lexc]], avoiding compositional forms of verbs and nouns. ||align=center| 3-5 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Buriad lexicon: add 500 nouns (1) || Add 500 nouns to the Buriad lexicon in [[lexc]]. ||align=center| 5-8 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Buriad lexicon: add 500 nouns (2) || Add 500 nouns to the Buriad lexicon in [[lexc]]. ||align=center| 5-8 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Buriad lexicon: add 500 nouns (3) || Add 500 nouns to the Buriad lexicon in [[lexc]]. ||align=center| 5-8 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Buriad lexicon: add 500 nouns (4) || Add 500 nouns to the Buriad lexicon in [[lexc]]. ||align=center| 5-8 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Buriad lexicon: add 500 verbs (1) || Add 500 verbs to the Buriad lexicon in [[lexc]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Buriad lexicon: add 500 verbs (2) || Add 500 verbs to the Buriad lexicon in [[lexc]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Buriad lexicon: add 500 verbs (3) || Add 500 verbs to the Buriad lexicon in [[lexc]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Buriad lexicon: add 500 verbs (4) || Add 500 verbs to the Buriad lexicon in [[lexc]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Buriad lexicon: add 500 adjectives (1) || Add 500 adjectives to the Buriad lexicon in [[lexc]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Buriad lexicon: add 500 adjectives (2) || Add 500 adjectives to the Buriad lexicon in [[lexc]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Buriad lexicon: add 500 adjectives (3) || Add 500 adjectives to the Buriad lexicon in [[lexc]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Buriad lexicon: add 500 adjectives (4) || Add 500 adjectives to the Buriad lexicon in [[lexc]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Buriad lexicon: add 50 adverbs || Add 50 adverbs to the Buriad lexicon in [[lexc]], avoiding compositional forms of verbs and nouns. ||align=center| 3-5 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Buriad--Mongolian bilingual dictionary: add 500 nouns (1) || Add 500 nouns to the Buriad--Mongolian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Buriad--Mongolian bilingual dictionary: add 500 nouns (2) || Add 500 nouns to the Buriad--Mongolian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Buriad--Mongolian bilingual dictionary: add 500 nouns (3) || Add 500 nouns to the Buriad--Mongolian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Buriad--Mongolian bilingual dictionary: add 500 nouns (4) || Add 500 nouns to the Buriad--Mongolian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Buriad--Mongolian bilingual dictionary: add 500 verbs (1) || Add 500 verbs to the Buriad--Mongolian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Buriad--Mongolian bilingual dictionary: add 500 verbs (2) || Add 500 verbs to the Buriad--Mongolian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Buriad--Mongolian bilingual dictionary: add 500 verbs (3) || Add 500 verbs to the Buriad--Mongolian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Buriad--Mongolian bilingual dictionary: add 500 verbs (4) || Add 500 verbs to the Buriad--Mongolian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Buriad--Mongolian bilingual dictionary: add 500 adjectives (1) || Add 500 adjectives to the Buriad--Mongolian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Buriad--Mongolian bilingual dictionary: add 500 adjectives (2) || Add 500 adjectives to the Buriad--Mongolian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Buriad--Mongolian bilingual dictionary: add 500 adjectives (3) || Add 500 adjectives to the Buriad--Mongolian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Buriad--Mongolian bilingual dictionary: add 500 adjectives (4) || Add 500 adjectives to the Buriad--Mongolian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Buriad--Mongolian bilingual dictionary: add 50 adverbs || Add 50 adverbs to the Buriad--Mongolian [[bidix]]. ||align=center| 3-5 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Buriad--Russian bilingual dictionary: add 500 nouns (1) || Add 500 nouns to the Buriad--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Buriad--Russian bilingual dictionary: add 500 nouns (2) || Add 500 nouns to the Buriad--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Buriad--Russian bilingual dictionary: add 500 nouns (3) || Add 500 nouns to the Buriad--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Buriad--Russian bilingual dictionary: add 500 nouns (4) || Add 500 nouns to the Buriad--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Buriad--Russian bilingual dictionary: add 500 verbs (1) || Add 500 verbs to the Buriad--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Buriad--Russian bilingual dictionary: add 500 verbs (2) || Add 500 verbs to the Buriad--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Buriad--Russian bilingual dictionary: add 500 verbs (3) || Add 500 verbs to the Buriad--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Buriad--Russian bilingual dictionary: add 500 verbs (4) || Add 500 verbs to the Buriad--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Buriad--Russian bilingual dictionary: add 500 adjectives (1) || Add 500 adjectives to the Buriad--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Buriad--Russian bilingual dictionary: add 500 adjectives (2) || Add 500 adjectives to the Buriad--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Buriad--Russian bilingual dictionary: add 500 adjectives (3) || Add 500 adjectives to the Buriad--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Buriad--Russian bilingual dictionary: add 500 adjectives (4) || Add 500 adjectives to the Buriad--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Buriad--Russian bilingual dictionary: add 50 adverbs || Add 50 adverbs to the Buriad--Russian [[bidix]]. ||align=center| 3-5 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Altay--Russian bilingual dictionary: add 500 nouns (1) || Add 500 nouns to the Altay--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Altay--Russian bilingual dictionary: add 500 nouns (2) || Add 500 nouns to the Altay--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Altay--Russian bilingual dictionary: add 500 nouns (3) || Add 500 nouns to the Altay--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Altay--Russian bilingual dictionary: add 500 nouns (4) || Add 500 nouns to the Altay--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Altay--Russian bilingual dictionary: add 500 verbs (1) || Add 500 verbs to the Altay--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Altay--Russian bilingual dictionary: add 500 verbs (2) || Add 500 verbs to the Altay--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Altay--Russian bilingual dictionary: add 500 verbs (3) || Add 500 verbs to the Altay--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Altay--Russian bilingual dictionary: add 500 verbs (4) || Add 500 verbs to the Altay--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Altay--Russian bilingual dictionary: add 500 adjectives (1) || Add 500 adjectives to the Altay--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Altay--Russian bilingual dictionary: add 500 adjectives (2) || Add 500 adjectives to the Altay--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Altay--Russian bilingual dictionary: add 500 adjectives (3) || Add 500 adjectives to the Altay--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Altay--Russian bilingual dictionary: add 500 adjectives (4) || Add 500 adjectives to the Altay--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Altay--Russian bilingual dictionary: add 50 adverbs || Add 50 adverbs to the Altay--Russian [[bidix]]. ||align=center| 3-5 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Tuvan--Russian bilingual dictionary: add 500 nouns (1) || Add 500 nouns to the Tuvan--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Tuvan--Russian bilingual dictionary: add 500 nouns (2) || Add 500 nouns to the Tuvan--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Tuvan--Russian bilingual dictionary: add 500 nouns (3) || Add 500 nouns to the Tuvan--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Tuvan--Russian bilingual dictionary: add 500 nouns (4) || Add 500 nouns to the Tuvan--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Tuvan--Russian bilingual dictionary: add 500 verbs (1) || Add 500 verbs to the Tuvan--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Tuvan--Russian bilingual dictionary: add 500 verbs (2) || Add 500 verbs to the Tuvan--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Tuvan--Russian bilingual dictionary: add 500 verbs (3) || Add 500 verbs to the Tuvan--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Tuvan--Russian bilingual dictionary: add 500 verbs (4) || Add 500 verbs to the Tuvan--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Tuvan--Russian bilingual dictionary: add 500 adjectives (1) || Add 500 adjectives to the Tuvan--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Tuvan--Russian bilingual dictionary: add 500 adjectives (2) || Add 500 adjectives to the Tuvan--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Tuvan--Russian bilingual dictionary: add 500 adjectives (3) || Add 500 adjectives to the Tuvan--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Tuvan--Russian bilingual dictionary: add 500 adjectives (4) || Add 500 adjectives to the Tuvan--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Tuvan--Russian bilingual dictionary: add 50 adverbs || Add 50 adverbs to the Tuvan--Russian [[bidix]]. ||align=center| 3-5 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek lexicon: add 500 nouns (1) || Add 500 nouns to the Uzbek lexicon in [[lexc]]. ||align=center| 5-8 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek lexicon: add 500 nouns (2) || Add 500 nouns to the Uzbek lexicon in [[lexc]]. ||align=center| 5-8 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek lexicon: add 500 nouns (3) || Add 500 nouns to the Uzbek lexicon in [[lexc]]. ||align=center| 5-8 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek lexicon: add 500 nouns (4) || Add 500 nouns to the Uzbek lexicon in [[lexc]]. ||align=center| 5-8 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek lexicon: add 500 verbs (1) || Add 500 verbs to the Uzbek lexicon in [[lexc]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek lexicon: add 500 verbs (2) || Add 500 verbs to the Uzbek lexicon in [[lexc]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek lexicon: add 500 verbs (3) || Add 500 verbs to the Uzbek lexicon in [[lexc]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek lexicon: add 500 verbs (4) || Add 500 verbs to the Uzbek lexicon in [[lexc]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek lexicon: add 500 adjectives (1) || Add 500 adjectives to the Uzbek lexicon in [[lexc]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek lexicon: add 500 adjectives (2) || Add 500 adjectives to the Uzbek lexicon in [[lexc]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek lexicon: add 500 adjectives (3) || Add 500 adjectives to the Uzbek lexicon in [[lexc]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek lexicon: add 500 adjectives (4) || Add 500 adjectives to the Uzbek lexicon in [[lexc]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek lexicon: add 50 adverbs || Add 50 adverbs to the Uzbek lexicon in [[lexc]], avoiding compositional forms of verbs and nouns. ||align=center| 3-5 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek--Russian bilingual dictionary: add 500 nouns (1) || Add 500 nouns to the Uzbek--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek--Russian bilingual dictionary: add 500 nouns (2) || Add 500 nouns to the Uzbek--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek--Russian bilingual dictionary: add 500 nouns (3) || Add 500 nouns to the Uzbek--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek--Russian bilingual dictionary: add 500 nouns (4) || Add 500 nouns to the Uzbek--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek--Russian bilingual dictionary: add 500 verbs (1) || Add 500 verbs to the Uzbek--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek--Russian bilingual dictionary: add 500 verbs (2) || Add 500 verbs to the Uzbek--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek--Russian bilingual dictionary: add 500 verbs (3) || Add 500 verbs to the Uzbek--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek--Russian bilingual dictionary: add 500 verbs (4) || Add 500 verbs to the Uzbek--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek--Russian bilingual dictionary: add 500 adjectives (1) || Add 500 adjectives to the Uzbek--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek--Russian bilingual dictionary: add 500 adjectives (2) || Add 500 adjectives to the Uzbek--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek--Russian bilingual dictionary: add 500 adjectives (3) || Add 500 adjectives to the Uzbek--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek--Russian bilingual dictionary: add 500 adjectives (4) || Add 500 adjectives to the Uzbek--Russian [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek--Russian bilingual dictionary: add 50 adverbs || Add 50 adverbs to the Uzbek--Russian [[bidix]]. ||align=center| 3-5 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek--Kazakh bilingual dictionary: add 500 nouns (1) || Add 500 nouns to the Uzbek--Kazakh [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek--Kazakh bilingual dictionary: add 500 nouns (2) || Add 500 nouns to the Uzbek--Kazakh [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek--Kazakh bilingual dictionary: add 500 nouns (3) || Add 500 nouns to the Uzbek--Kazakh [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek--Kazakh bilingual dictionary: add 500 nouns (4) || Add 500 nouns to the Uzbek--Kazakh [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek--Kazakh bilingual dictionary: add 500 verbs (1) || Add 500 verbs to the Uzbek--Kazakh [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek--Kazakh bilingual dictionary: add 500 verbs (2) || Add 500 verbs to the Uzbek--Kazakh [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek--Kazakh bilingual dictionary: add 500 verbs (3) || Add 500 verbs to the Uzbek--Kazakh [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek--Kazakh bilingual dictionary: add 500 verbs (4) || Add 500 verbs to the Uzbek--Kazakh [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek--Kazakh bilingual dictionary: add 500 adjectives (1) || Add 500 adjectives to the Uzbek--Kazakh [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek--Kazakh bilingual dictionary: add 500 adjectives (2) || Add 500 adjectives to the Uzbek--Kazakh [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek--Kazakh bilingual dictionary: add 500 adjectives (3) || Add 500 adjectives to the Uzbek--Kazakh [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek--Kazakh bilingual dictionary: add 500 adjectives (4) || Add 500 adjectives to the Uzbek--Kazakh [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek--Kazakh bilingual dictionary: add 50 adverbs || Add 50 adverbs to the Uzbek--Kazakh [[bidix]]. ||align=center| 3-5 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek--Kyrgyz bilingual dictionary: add 500 nouns (1) || Add 500 nouns to the Uzbek--Kyrgyz [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek--Kyrgyz bilingual dictionary: add 500 nouns (2) || Add 500 nouns to the Uzbek--Kyrgyz [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek--Kyrgyz bilingual dictionary: add 500 nouns (3) || Add 500 nouns to the Uzbek--Kyrgyz [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek--Kyrgyz bilingual dictionary: add 500 nouns (4) || Add 500 nouns to the Uzbek--Kyrgyz [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek--Kyrgyz bilingual dictionary: add 500 verbs (1) || Add 500 verbs to the Uzbek--Kyrgyz [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek--Kyrgyz bilingual dictionary: add 500 verbs (2) || Add 500 verbs to the Uzbek--Kyrgyz [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek--Kyrgyz bilingual dictionary: add 500 verbs (3) || Add 500 verbs to the Uzbek--Kyrgyz [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek--Kyrgyz bilingual dictionary: add 500 verbs (4) || Add 500 verbs to the Uzbek--Kyrgyz [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek--Kyrgyz bilingual dictionary: add 500 adjectives (1) || Add 500 adjectives to the Uzbek--Kyrgyz [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek--Kyrgyz bilingual dictionary: add 500 adjectives (2) || Add 500 adjectives to the Uzbek--Kyrgyz [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek--Kyrgyz bilingual dictionary: add 500 adjectives (3) || Add 500 adjectives to the Uzbek--Kyrgyz [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek--Kyrgyz bilingual dictionary: add 500 adjectives (4) || Add 500 adjectives to the Uzbek--Kyrgyz [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek--Kyrgyz bilingual dictionary: add 50 adverbs || Add 50 adverbs to the Uzbek--Kyrgyz [[bidix]]. ||align=center| 3-5 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek--Turkish bilingual dictionary: add 500 nouns (1) || Add 500 nouns to the Uzbek--Turkish [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek--Turkish bilingual dictionary: add 500 nouns (2) || Add 500 nouns to the Uzbek--Turkish [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek--Turkish bilingual dictionary: add 500 nouns (3) || Add 500 nouns to the Uzbek--Turkish [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek--Turkish bilingual dictionary: add 500 nouns (4) || Add 500 nouns to the Uzbek--Turkish [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek--Turkish bilingual dictionary: add 500 verbs (1) || Add 500 verbs to the Uzbek--Turkish [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek--Turkish bilingual dictionary: add 500 verbs (2) || Add 500 verbs to the Uzbek--Turkish [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek--Turkish bilingual dictionary: add 500 verbs (3) || Add 500 verbs to the Uzbek--Turkish [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek--Turkish bilingual dictionary: add 500 verbs (4) || Add 500 verbs to the Uzbek--Turkish [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek--Turkish bilingual dictionary: add 500 adjectives (1) || Add 500 adjectives to the Uzbek--Turkish [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek--Turkish bilingual dictionary: add 500 adjectives (2) || Add 500 adjectives to the Uzbek--Turkish [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek--Turkish bilingual dictionary: add 500 adjectives (3) || Add 500 adjectives to the Uzbek--Turkish [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek--Turkish bilingual dictionary: add 500 adjectives (4) || Add 500 adjectives to the Uzbek--Turkish [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Uzbek--Turkish bilingual dictionary: add 50 adverbs || Add 50 adverbs to the Uzbek--Turkish [[bidix]]. ||align=center| 3-5 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt lexicon: add 500 nouns (1) || Add 500 nouns to the Udmurt lexicon in [[lexc]]. ||align=center| 5-8 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt lexicon: add 500 nouns (2) || Add 500 nouns to the Udmurt lexicon in [[lexc]]. ||align=center| 5-8 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt lexicon: add 500 nouns (3) || Add 500 nouns to the Udmurt lexicon in [[lexc]]. ||align=center| 5-8 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt lexicon: add 500 nouns (4) || Add 500 nouns to the Udmurt lexicon in [[lexc]]. ||align=center| 5-8 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt lexicon: add 500 verbs (1) || Add 500 verbs to the Udmurt lexicon in [[lexc]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt lexicon: add 500 verbs (2) || Add 500 verbs to the Udmurt lexicon in [[lexc]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt lexicon: add 500 verbs (3) || Add 500 verbs to the Udmurt lexicon in [[lexc]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt lexicon: add 500 verbs (4) || Add 500 verbs to the Udmurt lexicon in [[lexc]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt lexicon: add 500 adjectives (1) || Add 500 adjectives to the Udmurt lexicon in [[lexc]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt lexicon: add 500 adjectives (2) || Add 500 adjectives to the Udmurt lexicon in [[lexc]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt lexicon: add 500 adjectives (3) || Add 500 adjectives to the Udmurt lexicon in [[lexc]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt lexicon: add 500 adjectives (4) || Add 500 adjectives to the Udmurt lexicon in [[lexc]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt lexicon: add 50 adverbs || Add 50 adverbs to the Udmurt lexicon in [[lexc]], avoiding compositional forms of verbs and nouns. ||align=center| 3-5 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt--Komi bilingual dictionary: add 500 nouns (1) || Add 500 nouns to the Udmurt--Komi [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt--Komi bilingual dictionary: add 500 nouns (2) || Add 500 nouns to the Udmurt--Komi [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt--Komi bilingual dictionary: add 500 nouns (3) || Add 500 nouns to the Udmurt--Komi [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt--Komi bilingual dictionary: add 500 nouns (4) || Add 500 nouns to the Udmurt--Komi [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt--Komi bilingual dictionary: add 500 verbs (1) || Add 500 verbs to the Udmurt--Komi [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt--Komi bilingual dictionary: add 500 verbs (2) || Add 500 verbs to the Udmurt--Komi [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt--Komi bilingual dictionary: add 500 verbs (3) || Add 500 verbs to the Udmurt--Komi [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt--Komi bilingual dictionary: add 500 verbs (4) || Add 500 verbs to the Udmurt--Komi [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt--Komi bilingual dictionary: add 500 adjectives (1) || Add 500 adjectives to the Udmurt--Komi [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt--Komi bilingual dictionary: add 500 adjectives (2) || Add 500 adjectives to the Udmurt--Komi [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt--Komi bilingual dictionary: add 500 adjectives (3) || Add 500 adjectives to the Udmurt--Komi [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt--Komi bilingual dictionary: add 500 adjectives (4) || Add 500 adjectives to the Udmurt--Komi [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt--Komi bilingual dictionary: add 50 adverbs || Add 50 adverbs to the Udmurt--Komi [[bidix]]. ||align=center| 3-5 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt--Mari bilingual dictionary: add 500 nouns (1) || Add 500 nouns to the Udmurt--Mari [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt--Mari bilingual dictionary: add 500 nouns (2) || Add 500 nouns to the Udmurt--Mari [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt--Mari bilingual dictionary: add 500 nouns (3) || Add 500 nouns to the Udmurt--Mari [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt--Mari bilingual dictionary: add 500 nouns (4) || Add 500 nouns to the Udmurt--Mari [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt--Mari bilingual dictionary: add 500 verbs (1) || Add 500 verbs to the Udmurt--Mari [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt--Mari bilingual dictionary: add 500 verbs (2) || Add 500 verbs to the Udmurt--Mari [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt--Mari bilingual dictionary: add 500 verbs (3) || Add 500 verbs to the Udmurt--Mari [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt--Mari bilingual dictionary: add 500 verbs (4) || Add 500 verbs to the Udmurt--Mari [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt--Mari bilingual dictionary: add 500 adjectives (1) || Add 500 adjectives to the Udmurt--Mari [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt--Mari bilingual dictionary: add 500 adjectives (2) || Add 500 adjectives to the Udmurt--Mari [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt--Mari bilingual dictionary: add 500 adjectives (3) || Add 500 adjectives to the Udmurt--Mari [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt--Mari bilingual dictionary: add 500 adjectives (4) || Add 500 adjectives to the Udmurt--Mari [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt--Mari bilingual dictionary: add 50 adverbs || Add 50 adverbs to the Udmurt--Mari [[bidix]]. ||align=center| 3-5 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt--Finnish bilingual dictionary: add 500 nouns (1) || Add 500 nouns to the Udmurt--Finnish [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt--Finnish bilingual dictionary: add 500 nouns (2) || Add 500 nouns to the Udmurt--Finnish [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt--Finnish bilingual dictionary: add 500 nouns (3) || Add 500 nouns to the Udmurt--Finnish [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt--Finnish bilingual dictionary: add 500 nouns (4) || Add 500 nouns to the Udmurt--Finnish [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt--Finnish bilingual dictionary: add 500 verbs (1) || Add 500 verbs to the Udmurt--Finnish [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt--Finnish bilingual dictionary: add 500 verbs (2) || Add 500 verbs to the Udmurt--Finnish [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt--Finnish bilingual dictionary: add 500 verbs (3) || Add 500 verbs to the Udmurt--Finnish [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt--Finnish bilingual dictionary: add 500 verbs (4) || Add 500 verbs to the Udmurt--Finnish [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt--Finnish bilingual dictionary: add 500 adjectives (1) || Add 500 adjectives to the Udmurt--Finnish [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt--Finnish bilingual dictionary: add 500 adjectives (2) || Add 500 adjectives to the Udmurt--Finnish [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt--Finnish bilingual dictionary: add 500 adjectives (3) || Add 500 adjectives to the Udmurt--Finnish [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt--Finnish bilingual dictionary: add 500 adjectives (4) || Add 500 adjectives to the Udmurt--Finnish [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt--Finnish bilingual dictionary: add 50 adverbs || Add 50 adverbs to the Udmurt--Finnish [[bidix]]. ||align=center| 3-5 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt--Mordvin bilingual dictionary: add 500 nouns (1) || Add 500 nouns to the Udmurt--Mordvin [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt--Mordvin bilingual dictionary: add 500 nouns (2) || Add 500 nouns to the Udmurt--Mordvin [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt--Mordvin bilingual dictionary: add 500 nouns (3) || Add 500 nouns to the Udmurt--Mordvin [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt--Mordvin bilingual dictionary: add 500 nouns (4) || Add 500 nouns to the Udmurt--Mordvin [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt--Mordvin bilingual dictionary: add 500 verbs (1) || Add 500 verbs to the Udmurt--Mordvin [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt--Mordvin bilingual dictionary: add 500 verbs (2) || Add 500 verbs to the Udmurt--Mordvin [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt--Mordvin bilingual dictionary: add 500 verbs (3) || Add 500 verbs to the Udmurt--Mordvin [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt--Mordvin bilingual dictionary: add 500 verbs (4) || Add 500 verbs to the Udmurt--Mordvin [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt--Mordvin bilingual dictionary: add 500 adjectives (1) || Add 500 adjectives to the Udmurt--Mordvin [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt--Mordvin bilingual dictionary: add 500 adjectives (2) || Add 500 adjectives to the Udmurt--Mordvin [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt--Mordvin bilingual dictionary: add 500 adjectives (3) || Add 500 adjectives to the Udmurt--Mordvin [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt--Mordvin bilingual dictionary: add 500 adjectives (4) || Add 500 adjectives to the Udmurt--Mordvin [[bidix]]. ||align=center| 12-16 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Udmurt--Mordvin bilingual dictionary: add 50 adverbs || Add 50 adverbs to the Udmurt--Mordvin [[bidix]]. ||align=center| 3-5 hours || [[User:Francis Tyers|Francis Tyers]], [[User:Firespeaker|Firespeaker]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Fix memory hogging in lttoolbox compound analyser || Described in [http://bugs.apertium.org/cgi-bin/bugzilla/show_bug.cgi?id=109 bug report 109], the compound analyser in lttoolbox seems to cache too much without releasing memory. Fix this bug so it keeps memory usage constant without running slower and slower for every line of input. Requires C++ knowledge. ||align=center| 12-16 hours || [[User:Unhammer]]<br />
|-<br />
|align=center| {{sc|code}} || 3.&nbsp;Easy || Proofread 100 entries in the Serbo-Croatian morphological analyser || Go through a list of 100 words and check their morphological paradigms. Correct typos and other errors in word entries and paradigms. If a word is in the wrong paradigm, assign it to another one. ||align=center| 3-5 hours || [[User:Krvoje|Hrvoje Peradin]], [[User:Francis Tyers|Francis Tyers]]<br />
|-<br />
|align=center| {{sc|code}} || 2.&nbsp;Medium || Proofread 200 entries in the Serbo-Croatian morphological analyser || Go through a list of 200 words and check their morphological paradigms. Correct typos and other errors in word entries and paradigms. If a word is in the wrong paradigm, assign it to another one. ||align=center| 8-10 hours || [[User:Krvoje|Hrvoje Peradin]], [[User:Francis Tyers|Francis Tyers]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Proofread 400 entries in the Serbo-Croatian morphological analyser || Go through a list of 400 words and check their morphological paradigms. Correct typos and other errors in word entries and paradigms. If a word is in the wrong paradigm, assign it to another one. ||align=center| 13-15 hours || [[User:Krvoje|Hrvoje Peradin]], [[User:Francis Tyers|Francis Tyers]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Increase coverage for the Serbo-Croatian - Macedonian language pair || Add 80 words from a frequency list: assign each one a paradigm in the Serbo-Croatian analyser, translate it, put the translation in the bidix and assign a paradigm to the translation in the Macedonian analyser. ||align=center| 13-15 hours || [[User:Krvoje|Hrvoje Peradin]], [[User:Francis Tyers|Francis Tyers]]<br />
|-<br />
|align=center| {{sc|code}} || 1.&nbsp;Hard || Even up the coverage of the Serbo-Croatian and Macedonian morphological analysers || There are words in the Macedonian morphological analyser which have no counterpart in the Serbo-Croatian analyser. Extract 100 of them, translate them, add them to the bidix and assign each one a paradigm in the Serbo-Croatian analyser. ||align=center| 13-15 hours || [[User:Krvoje|Hrvoje Peradin]], [[User:Francis Tyers|Francis Tyers]]<br />
|}<br />
[[Category:Google Code-in]]</div>Objectiveseahttps://wiki.apertium.org/w/index.php?title=Kiel_aldoni_novan_lingvoparon&diff=30377Kiel aldoni novan lingvoparon2011-12-21T20:01:25Z<p>Objectivesea: moved Kiel aldoni novan lingvan duon to Kiel aldoni novan lingvoparon:&#32;"Paro" is a better translation for "pair"; as a prefix, "duon-" actually means "half"</p>
<hr />
<div>{{TOCD}}<br />
<br />
''This page is being translated. Please help!''<br />
<br />
This document describes how to start adding a new language pair to the Apertium machine translation system from scratch.<br />
<br />
It does not assume any knowledge of linguistics or machine translation beyond the ability to tell nouns, verbs, prepositions and so on apart.<br />
<br />
==Introduction==<br />
<br />
Apertium is, as you may already have gathered, a machine translation system. Well, not quite: it is a machine translation platform. It provides an engine and a toolbox that you can use to build your own machine translation systems. All you have to supply is the data. At a basic level, the data consists of three dictionaries and a few rules (for reordering words and handling other grammatical matters).<br />
<br />
For a more detailed introduction to how it all works, there are some excellent papers on the project website, apertium.sourceforge.net.<br />
<br />
==You will need==<br />
<br />
* [[lttoolbox]] (>= 3.0.0)<br />
* libxml utilities (xmllint etc.)<br />
* apertium (>= 3.0.0)<br />
* a text editor (or a specialised XML editor, if you prefer one)<br />
<br />
This document does not describe how to install these packages. For more information on that, please see the documentation section of the Apertium website.<br />
<br />
==What is a language pair?==<br />
Apertium is a shallow-transfer machine translation system. This simply means that it works with dictionaries and shallow transfer rules. Shallow transfer differs from deep transfer in that it does not perform a full syntactic analysis; the rules typically operate on groups of lexical units rather than on parse trees. At a basic level, there are three main dictionaries:<br />
# The morphological dictionary for language xx: this contains the rules for how words in language xx are inflected. In this example it is called apertium-sh-en.sh.dix<br />
# The morphological dictionary for language yy: this contains the rules for how words in language yy are inflected. In this example it is called apertium-sh-en.en.dix<br />
# The bilingual dictionary: this contains the correspondences between the words and symbols of the two languages. In this example it is called apertium-sh-en.sh-en.dix<br />
<br />
In a translation pair, either language can serve as the source or the target; these are relative terms.<br />
<br />
There are also two files for transfer rules. These rules govern how words are reordered in a sentence: for example chat noir -> cat black -> black cat. They also govern gender and number agreement, and so on. The rules can also be used to insert or delete lexical items, as will be described later. These files are:<br />
* Transfer rules for language xx to language yy: this file contains the rules for how language xx is changed into language yy. In this example it is: apertium-sh-en.trules-sh-en.xml<br />
* Transfer rules for language yy to language xx: this file contains the rules for how language yy is changed into language xx. In this example it is: apertium-sh-en.trules-en-sh.xml<br />
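<br />
As a rough preview of what such a transfer file can contain, the sketch below shows a single rule that swaps a noun followed by an adjective, as in the ''chat noir'' example. It is an illustrative fragment only and is not taken from any released pair: the category names ''nom'' and ''adj'', the attribute ''nbr'' and the variable ''number'' are assumptions chosen for the example, and the tags it matches would have to agree with whatever symbols your own dictionaries define.<br />
<pre><br />
<?xml version="1.0" encoding="UTF-8"?><br />
<transfer><br />
  <section-def-cats><br />
    <!-- a noun followed by any further tags --><br />
    <def-cat n="nom"><br />
      <cat-item tags="n.*"/><br />
    </def-cat><br />
    <!-- an adjective, with or without further tags --><br />
    <def-cat n="adj"><br />
      <cat-item tags="adj"/><br />
      <cat-item tags="adj.*"/><br />
    </def-cat><br />
  </section-def-cats><br />
  <section-def-attrs><br />
    <!-- not used by the rule below; declared so the file has the usual sections --><br />
    <def-attr n="nbr"><br />
      <attr-item tags="sg"/><br />
      <attr-item tags="pl"/><br />
    </def-attr><br />
  </section-def-attrs><br />
  <section-def-vars><br />
    <def-var n="number"/><br />
  </section-def-vars><br />
  <section-rules><br />
    <rule><br />
      <!-- match: noun followed by adjective --><br />
      <pattern><br />
        <pattern-item n="nom"/><br />
        <pattern-item n="adj"/><br />
      </pattern><br />
      <action><br />
        <out><br />
          <!-- output the adjective first, then the noun --><br />
          <lu><clip pos="2" side="tl" part="whole"/></lu><br />
          <b pos="1"/><br />
          <lu><clip pos="1" side="tl" part="whole"/></lu><br />
        </out><br />
      </action><br />
    </rule><br />
  </section-rules><br />
</transfer><br />
</pre><br />
The rule only reorders the two words; agreement in gender and number would need further instructions inside the action.<br />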
<br />
Many of the language pairs currently available have other files as well, but we do not cover them here. Only the files mentioned above are needed to create a working system.<br />
<br />
==Language pair==<br />
As the file names suggest, this HOWTO uses the example of translating from Serbo-Croatian to English to explain how to create a basic system. This is not an ideal pair, since Apertium works better for more closely related languages; for the simple examples in this document, that does not matter.<br />
<br />
==A brief note on terms==<br />
There are a number of terms that you need to understand before continuing.<br />
<br />
The first is '''lemma'''. A '''lemma''' is the citation form of a word: the word stripped of any grammatical information. For example, the lemma of the Esperanto word ''katojn'' is ''kat'' without ''-o''. In Esperanto, nouns are typically cited in the singular nominative form. For verbs, the lemma is the infinitive without ''-i''; for example, the lemma of ''amis'' is ''am''.<br />
<br />
The second term is ''symbol''. In the context of the Apertium system, ''symbol'' refers to a grammatical label. The word ''katoj'' is a plural noun, so it will have a noun symbol and a plural symbol. In the input and output of Apertium modules, these labels appear between angle brackets, as follows:<br />
<br />
* <n> for noun.<br />
* <pl> for plural.<br />
<br />
Other examples of symbols are <sg> for singular, <p1> for first person, <pri> for present indicative, and so on. A symbol written between angle brackets is also called a tag. Note that in many of the currently available language pairs, the symbol names are acronyms or abbreviations of Catalan words. For example, vbhaver comes from ''vb'' (verb) and ''haver'' ("to have" in Catalan). Symbols are defined in <sdef> tags and used in <nowiki><s></nowiki> tags.<br />
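<br />
To make the link between the two tags concrete, here is a minimal sketch of a dictionary that defines three symbols and then uses two of them in a single full-form entry. The English word ''cats'', the plain Latin alphabet and the section name ''main'' are assumptions chosen purely for illustration.<br />
<pre><br />
<?xml version="1.0" encoding="UTF-8"?><br />
<dictionary><br />
  <alphabet>abcdefghijklmnopqrstuvwxyz</alphabet><br />
  <sdefs><br />
    <!-- each symbol is defined once --><br />
    <sdef n="n"/><br />
    <sdef n="sg"/><br />
    <sdef n="pl"/><br />
  </sdefs><br />
  <section id="main" type="standard"><br />
    <!-- the surface form "cats" maps to the lemma "cat" plus two symbols --><br />
    <e><p><l>cats</l><r>cat<s n="n"/><s n="pl"/></r></p></e><br />
  </section><br />
</dictionary><br />
</pre><br />
With a dictionary like this, the analysis of ''cats'' would be the lemma followed by its tags, that is cat<n><pl>.<br />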
<br />
The third term is ''paradigm''. In the context of the Apertium system, a paradigm describes how a group of words inflects. In the morphological dictionary, a lemma (see above) is linked to a paradigm that describes the lemma's inflection, so you do not have to list every ending for every lemma.<br />
<br />
Consider the following example, which shows why this scheme is useful. Suppose we want to store the English adjectives ''happy'' and ''lazy''. Rather than storing two nearly identical lists such as:<br />
<br />
* happy, happ (y, ier, iest), and<br />
* lazy, laz (y, ier, iest),<br />
<br />
we simply store the pattern of one (for example ''happy'') and then state that "''lazy'' inflects like ''happy''", "''friendly'' inflects like ''happy''", "''naughty'' inflects like ''happy''", etc. In this example, ''happy'' is the pattern, or paradigm, that describes the inflection of the other words. The precise way a paradigm is defined is described shortly. Paradigms are defined in ''<pardef>'' tags and used in ''<par>'' tags.<br />
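<br />
As a preview of the dictionary notation introduced in the following sections, the idea that "''lazy'' inflects like ''happy''" might be written roughly as in the sketch below. The paradigm name 'happ/y__adj' and the tags 'adj', 'comp' and 'sup' are assumptions chosen for illustration; they are not taken from an existing dictionary.<br />

```xml
<?xml version="1.0" encoding="UTF-8"?>
<dictionary>
  <alphabet>abcdefghijklmnopqrstuvwxyz</alphabet>
  <sdefs>
    <sdef n="adj"/>   <!-- adjective (assumed tag name) -->
    <sdef n="comp"/>  <!-- comparative (assumed tag name) -->
    <sdef n="sup"/>   <!-- superlative (assumed tag name) -->
  </sdefs>
  <pardefs>
    <!-- The endings y / ier / iest are stored once, in one paradigm -->
    <pardef n="happ/y__adj">
      <e><p><l>y</l><r>y<s n="adj"/></r></p></e>
      <e><p><l>ier</l><r>y<s n="adj"/><s n="comp"/></r></p></e>
      <e><p><l>iest</l><r>y<s n="adj"/><s n="sup"/></r></p></e>
    </pardef>
  </pardefs>
  <section id="main" type="standard">
    <!-- Each adjective only names its stem and the shared paradigm -->
    <e lm="happy"><i>happ</i><par n="happ/y__adj"/></e>
    <e lm="lazy"><i>laz</i><par n="happ/y__adj"/></e>
  </section>
</dictionary>
```

Adding ''friendly'' or ''naughty'' would then take one more line each in the main section, with no new endings to write out.<br />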
<br />
==Getting started==<br />
<br />
===Monolingual dictionaries===<br />
Let's start by making our first source-language dictionary. The dictionary is an XML file. Fire up your text editor and type the following:<br />
<pre><br />
<?xml version="1.0" encoding="UTF-8"?><br />
<dictionary><br />
<br />
</dictionary><br />
</pre><br />
<br />
So far, the file just declares that we want to start a dictionary. To make it useful, we need to add some entries. The first is the alphabet: this defines the set of letters that will be used in the dictionary (Serbo-Croatian in this case). It looks like the following, containing all the letters of the Serbo-Croatian alphabet:<br />
<pre><br />
<alphabet>ABCČĆDDžĐEFGHIJKLLjMNNjOPRSŠTUVZŽabcčćddžđefghijklljmnnjoprsštuvzž</alphabet><br />
</pre><br />
<br />
Place the alphabet below the <dictionary> tag.<br />
<br />
Next, we need to define some symbols. Let's start with the simple stuff: singular (sg) and plural (pl) nouns (n):<br />
<pre><br />
<sdefs><br />
<sdef n="n"/><br />
<sdef n="sg"/><br />
<sdef n="pl"/><br />
</sdefs><br />
</pre><br />
The symbol names do not have to be this short. In fact, you could write them out in full. However, you will be typing them a lot, so it makes sense to abbreviate.<br />
<br />
Unfortunately, it isn't quite that simple. Nouns in Serbo-Croatian inflect for more than just number; they also inflect for gender and case. However, for this example we'll assume that the noun is masculine and in the nominative case (a full example can be found at the end of this document).<br />
<br />
Next, we need to define a section for the paradigms:<br />
<pre><br />
<pardefs><br />
<br />
</pardefs><br />
</pre><br />
and a dictionary section:<br />
<pre><br />
<section id="main" type="standard"><br />
<br />
</section><br />
</pre><br />
There are two types of section:<br />
#Standard sections, which contain words, enclitics (which are a kind of suffix), etc.,<br />
#Inconditional sections, which typically contain punctuation, etc. We don't have an inconditional section here, although we will see one later.<br />
<br />
So, our file should now look something like:<br />
<pre><br />
<?xml version="1.0" encoding="UTF-8"?><br />
<dictionary><br />
<alphabet>ABCČĆDDžĐEFGHIJKLLjMNNjOPRSŠTUVZŽabcčćddžđefghijklljmnnjoprsštuvzž</alphabet><br />
<sdefs><br />
<sdef n="n"/><br />
<sdef n="sg"/><br />
<sdef n="pl"/><br />
</sdefs><br />
<pardefs><br />
<br />
</pardefs><br />
<section id="main" type="standard"><br />
<br />
</section><br />
</dictionary><br />
</pre><br />
Now that we have the skeleton of the dictionary, we can start by adding a noun. We'll use 'gramofon' (which means 'gramophone').<br />
<br />
Since we don't have any paradigms yet, we need to define one.<br />
<br />
Remember that we are assuming the noun is masculine and in the nominative case. The singular form of the noun is 'gramofon' and the plural is 'gramofoni'. So:<br />
<pre><br />
<pardef n="gramofon__n"><br />
<e><br />
<p><br />
<l/><br />
<r><s n="n"/><s n="sg"/></r><br />
</p><br />
</e><br />
<e><br />
<p><br />
<l>i</l><br />
<r><s n="n"/><s n="pl"/></r><br />
</p><br />
</e><br />
</pardef><br />
</pre><br />
Note: '<l/>' (equivalent to <l></l>) means that nothing further is added to the stem for the singular.<br />
<br />
This may seem like a rather long-winded way of describing it, but there are good reasons for it, and it soon becomes second nature. You may be wondering what <e>, <p>, <l> and <r> stand for. Well:<br />
* e, for entry.<br />
* p, for pair.<br />
* l, for left.<br />
* r, for right.<br />
<br />
Why left and right? Well, the morphological dictionaries are compiled into finite-state machines. Compiling them left to right produces analyses from words, and compiling them right to left produces words from analyses. For example:<br />
<pre><br />
* gramofoni (left to right) gramofon<n><pl> (analysis)<br />
* gramofon<n><pl> (right to left) gramofoni (word)<br />
</pre><br />
Now that we have defined a paradigm, we need to link it to its lemma, ''gramofon''. We put this in the section that we defined.<br />
<br />
The following entry goes into the <section>:<br />
<pre><br />
<e lm="gramofon"><i>gramofon</i><par n="gramofon__n"/></e><br />
</pre><br />
A quick rundown of the abbreviations:<br />
* lm, for lemma.<br />
* i, for identity (the left and the right are the same).<br />
* par, for paradigm.<br />
<br />
This entry gives the lemma of the word, ''gramofon'', the stem, ''gramofon'', and the paradigm with which it inflects, 'gramofon__n'. The difference between the lemma and the stem is that the lemma is the citation form of the word, while the stem is the substring of the lemma to which suffixes are added. This will become clearer later, when we show an entry where the two differ.<br />
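<br />
Putting the pieces from this section together, the whole of apertium-sh-en.sh.dix might look like the sketch below. The indentation, the comments and the one-line layout of the paradigm entries are choices made here for readability; the content is only what has been described so far.<br />

```xml
<?xml version="1.0" encoding="UTF-8"?>
<dictionary>
  <alphabet>ABCČĆDDžĐEFGHIJKLLjMNNjOPRSŠTUVZŽabcčćddžđefghijklljmnnjoprsštuvzž</alphabet>
  <sdefs>
    <sdef n="n"/>
    <sdef n="sg"/>
    <sdef n="pl"/>
  </sdefs>
  <pardefs>
    <pardef n="gramofon__n">
      <!-- singular: nothing is added to the stem -->
      <e><p><l/><r><s n="n"/><s n="sg"/></r></p></e>
      <!-- plural: stem + i -->
      <e><p><l>i</l><r><s n="n"/><s n="pl"/></r></p></e>
    </pardef>
  </pardefs>
  <section id="main" type="standard">
    <!-- lemma 'gramofon', stem 'gramofon', inflects with gramofon__n -->
    <e lm="gramofon"><i>gramofon</i><par n="gramofon__n"/></e>
  </section>
</dictionary>
```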
<br />
We're now ready to test the dictionary. Save it and return to the shell. We first need to compile it (with lt-comp), then we can test it (with lt-proc).<br />
<pre><br />
$ lt-comp lr apertium-sh-en.sh.dix sh-en.automorf.bin<br />
</pre><br />
This should produce the output:<br />
<pre><br />
main@standard 12 12<br />
</pre><br />
As we are compiling it left to right, we are producing an analyser. Let's make a generator too:<br />
<pre><br />
$ lt-comp rl apertium-sh-en.sh.dix sh-en.autogen.bin<br />
</pre><br />
At this stage, the command should produce the same output.<br />
<br />
We can now test these. Run lt-proc with the analyser:<br />
<pre><br />
$ lt-proc sh-en.automorf.bin<br />
</pre><br />
Now try it out. Type ''gramofoni'' (gramophones) and see the output:<br />
<pre><br />
^gramofoni/gramofon<n><pl>$<br />
</pre><br />
Now, for the English dictionary, do the same thing, but use the English word ''gramophone'' and change the plural inflection. What if you want to use the more appropriate term 'record player'? Well, we'll see how to do that later.<br />
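<br />
For comparison, a minimal English dictionary built the same way might look like the following sketch. The plain A-Z alphabet is an assumption made here; the paradigm name 'gramophone__n' is the one this document uses later for 'record player'.<br />

```xml
<?xml version="1.0" encoding="UTF-8"?>
<dictionary>
  <alphabet>ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz</alphabet>
  <sdefs>
    <sdef n="n"/>
    <sdef n="sg"/>
    <sdef n="pl"/>
  </sdefs>
  <pardefs>
    <pardef n="gramophone__n">
      <!-- singular: nothing is added to the stem -->
      <e><p><l/><r><s n="n"/><s n="sg"/></r></p></e>
      <!-- plural: stem + s (the only change from the Serbo-Croatian paradigm) -->
      <e><p><l>s</l><r><s n="n"/><s n="pl"/></r></p></e>
    </pardef>
  </pardefs>
  <section id="main" type="standard">
    <e lm="gramophone"><i>gramophone</i><par n="gramophone__n"/></e>
  </section>
</dictionary>
```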
<br />
You should now have two files in the directory:<br />
* apertium-sh-en.sh.dix, which contains a (very) basic Serbo-Croatian morphological dictionary, and<br />
* apertium-sh-en.en.dix, which contains a (very) basic English morphological dictionary.<br />
<br />
===Bilingual dictionary===<br />
So we now have two morphological dictionaries. The next thing to make is the bilingual dictionary. This describes the mappings between words. All dictionaries use the same format (which is specified in the DTD, dix.dtd).<br />
<br />
Create a new file, apertium-sh-en.sh-en.dix, and add the basic skeleton:<br />
<pre><br />
<?xml version="1.0" encoding="UTF-8"?><br />
<dictionary><br />
<alphabet/><br />
<sdefs><br />
<sdef n="n"/><br />
<sdef n="sg"/><br />
<sdef n="pl"/><br />
</sdefs><br />
<br />
<section id="main" type="standard"><br />
<br />
</section><br />
</dictionary><br />
</pre><br />
Now we need to add an entry to translate between the two words. Something like:<br />
<pre><br />
<e><p><l>gramofon<s n="n"/></l><r>gramophone<s n="n"/></r></p></e><br />
</pre><br />
Because there will be many of these entries, each one is typically written on a single line to make the file easier to read. Again with the 'l' and 'r', right? Well, we compile it left to right to produce the Serbo-Croatian → English dictionary, and right to left to produce the English → Serbo-Croatian dictionary.<br />
<br />
So, once that is done, run the following commands:<br />
<pre><br />
$ lt-comp lr apertium-sh-en.sh.dix sh-en.automorf.bin<br />
$ lt-comp rl apertium-sh-en.en.dix sh-en.autogen.bin<br />
<br />
$ lt-comp lr apertium-sh-en.en.dix en-sh.automorf.bin<br />
$ lt-comp rl apertium-sh-en.sh.dix en-sh.autogen.bin<br />
<br />
$ lt-comp lr apertium-sh-en.sh-en.dix sh-en.autobil.bin<br />
$ lt-comp rl apertium-sh-en.sh-en.dix en-sh.autobil.bin<br />
</pre><br />
These generate the morphological analysers (automorf), the morphological generators (autogen) and the word lookup tables (autobil); "bil" stands for "bilingual".<br />
<br />
===Transfer rules===<br />
<br />
So, now we have two morphological dictionaries and a bilingual dictionary. All we need now is a transfer rule for nouns. Transfer rule files have their own DTD (transfer.dtd), which can be found in the Apertium package. If you need to write a rule, it is often a good idea to look first at the rule files of other language pairs. Many rules can be reused between language pairs. For example, the one below would be useful for any null-subject language.<br />
<br />
Start out, as with all the others, with a basic skeleton:<br />
<pre><br />
<?xml version="1.0" encoding="UTF-8"?><br />
<transfer><br />
<br />
</transfer><br />
</pre><br />
At the moment, because we're ignoring case, we just need to make a rule that takes the grammatical symbols input and outputs them again.<br />
<br />
We first need to define categories and attributes. Categories and attributes both allow us to group grammatical symbols. Categories allow us to group symbols for the purposes of matching (for example, 'n.*' is all nouns). Attributes allow us to group a set of symbols that can be chosen from (for example, 'sg' and 'pl' may be grouped as an attribute 'number').<br />
<br />
Let's add the necessary sections:<br />
<pre><br />
<section-def-cats><br />
<br />
</section-def-cats><br />
<section-def-attrs><br />
<br />
</section-def-attrs><br />
</pre><br />
As we're only inflecting nouns in singular and plural, we need to add a category for nouns and an attribute for number. Something like the following will suffice:<br />
<br />
Into section-def-cats add:<br />
<pre><br />
<def-cat n="nom"><br />
<cat-item tags="n.*"/><br />
</def-cat><br />
</pre><br />
This catches all nouns (lemmas followed by <n> then anything) and refers to them as "nom" (we'll see how that's used later).<br />
<br />
Into the section section-def-attrs, add:<br />
<pre><br />
<def-attr n="nbr"><br />
<attr-item tags="sg"/><br />
<attr-item tags="pl"/><br />
</def-attr><br />
</pre><br />
and then<br />
<pre><br />
<def-attr n="a_nom"><br />
<attr-item tags="n"/><br />
</def-attr><br />
</pre><br />
The first defines the attribute nbr (number), which can be either singular (sg) or plural (pl).<br />
<br />
The second defines the attribute a_nom (attribute noun).<br />
<br />
Next we need to add a section for global variables:<br />
<pre><br />
<section-def-vars><br />
<br />
</section-def-vars><br />
</pre><br />
These variables are used to store or transfer attributes between rules. We need only one for now,<br />
<pre><br />
<def-var n="number"/><br />
</pre><br />
Finally, we need to add a rule, to take in the noun and then output it in the correct form. We'll need a rules section...<br />
<pre><br />
<section-rules><br />
<br />
</section-rules><br />
</pre><br />
Changing the pace from the previous examples, I'll just paste this rule, then go through it, rather than the other way round.<br />
<pre><br />
<rule><br />
<pattern><br />
<pattern-item n="nom"/><br />
</pattern><br />
<action><br />
<out><br />
<lu><br />
<clip pos="1" side="tl" part="lem"/><br />
<clip pos="1" side="tl" part="a_nom"/><br />
<clip pos="1" side="tl" part="nbr"/><br />
</lu><br />
</out><br />
</action><br />
</rule><br />
</pre><br />
<br />
The first tag is obvious: it defines a rule. The second tag, pattern, basically says: "apply this rule if this pattern is found". In this example the pattern consists of a single noun (defined by the category item nom). Note that patterns are matched longest-match first. So if you have three rules, where the first catches "<prn><vblex><n>", the second catches "<prn><vblex>" and the third catches "<n>", the pattern matched and the rule executed will be the first.<br />
<br />
For each pattern there is an associated action, which produces an associated output, out. The output is a lexical unit (lu).<br />
<br />
The clip tag allows a user to select and manipulate attributes and parts of the source language (side="sl"), or target language (side="tl") lexical item.<br />
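<br />
Assembled in one place, the rules file built up above might look like the following sketch. The sections appear in the order in which they were introduced; the indentation and comments are added here for readability.<br />

```xml
<?xml version="1.0" encoding="UTF-8"?>
<transfer>
  <!-- categories: what a pattern can match -->
  <section-def-cats>
    <def-cat n="nom">
      <cat-item tags="n.*"/>
    </def-cat>
  </section-def-cats>
  <!-- attributes: sets of tags that a clip can pick out -->
  <section-def-attrs>
    <def-attr n="nbr">
      <attr-item tags="sg"/>
      <attr-item tags="pl"/>
    </def-attr>
    <def-attr n="a_nom">
      <attr-item tags="n"/>
    </def-attr>
  </section-def-attrs>
  <!-- global variables -->
  <section-def-vars>
    <def-var n="number"/>
  </section-def-vars>
  <!-- rules: match a single noun and output lemma, noun tag and number -->
  <section-rules>
    <rule>
      <pattern>
        <pattern-item n="nom"/>
      </pattern>
      <action>
        <out>
          <lu>
            <clip pos="1" side="tl" part="lem"/>
            <clip pos="1" side="tl" part="a_nom"/>
            <clip pos="1" side="tl" part="nbr"/>
          </lu>
        </out>
      </action>
    </rule>
  </section-rules>
</transfer>
```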
<br />
Let's compile it and test it. Transfer rules are compiled with:<br />
<pre><br />
$ apertium-preprocess-transfer apertium-sh-en.trules-sh-en.xml trules-sh-en.bin<br />
</pre><br />
Which will generate a trules-sh-en.bin file.<br />
<br />
Now we're ready to test our machine translation system. There is one crucial part missing, the part-of-speech (PoS) tagger, but that will be explained shortly. In the meantime we can test it as is:<br />
<br />
First, let's analyse a word, gramofoni:<br />
<pre><br />
$ echo "gramofoni" | lt-proc sh-en.automorf.bin <br />
^gramofoni/gramofon<n><pl>$<br />
</pre><br />
Now, normally here the POS tagger would choose the right version based on the part of speech, but we don't have a POS tagger yet, so we can use this little gawk script (thanks to Sergio) that will just output the first item retrieved.<br />
<pre><br />
$ echo "gramofoni" | lt-proc sh-en.automorf.bin | \<br />
  gawk 'BEGIN{RS="$"; FS="/";}{nf=split($1,COMPONENTS,"^"); for(i = 1; i<nf; i++) printf COMPONENTS[i]; if($2 != "") printf("^%s$",$2);}'<br />
^gramofon<n><pl>$<br />
</pre><br />
Now let's process that with the transfer rule:<br />
<pre><br />
$ echo "gramofoni" | lt-proc sh-en.automorf.bin | \<br />
gawk 'BEGIN{RS="$"; FS="/";}{nf=split($1,COMPONENTS,"^"); for(i = 1; i<nf; i++) printf COMPONENTS[i]; if($2 != "") printf("^%s$",$2);}' | \<br />
apertium-transfer apertium-sh-en.trules-sh-en.xml trules-sh-en.bin sh-en.autobil.bin<br />
</pre><br />
It will output:<br />
<pre><br />
^gramophone<n><pl>$^@<br />
</pre><br />
* 'gramophone' is the target language (side="tl") lemma (lem) at position 1 (pos="1").<br />
* '<n>' is the target language a_nom at position 1.<br />
* '<pl>' is the target language attribute of number (nbr) at position 1.<br />
<br />
Try commenting out one of these clip statements, recompiling and seeing what happens.<br />
<br />
So, now we have the output from the transfer, the only thing that remains is to generate the target-language inflected forms. For this, we use lt-proc, but in generation (-g), not analysis mode.<br />
<pre><br />
$ echo "gramofoni" | lt-proc sh-en.automorf.bin | \<br />
gawk 'BEGIN{RS="$"; FS="/";}{nf=split($1,COMPONENTS,"^"); for(i = 1; i<nf; i++) printf COMPONENTS[i]; if($2 != "") printf("^%s$",$2);}' | \<br />
apertium-transfer apertium-sh-en.trules-sh-en.xml trules-sh-en.bin sh-en.autobil.bin | \<br />
lt-proc -g sh-en.autogen.bin<br />
<br />
gramophones\@<br />
</pre><br />
And c'est ça. You now have a machine translation system that translates a Serbo-Croatian noun into an English noun. Obviously this isn't very useful, but we'll get onto the more complex stuff soon. Oh, and don't worry about the '@' symbol; I'll explain that soon too.<br />
<br />
Think of a few other words that inflect the same as gramofon. How about adding those? We don't need to add any paradigms, just the entries in the main section of the monolingual and bilingual dictionaries.<br />
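<br />
For instance, assuming that 'telefon' (telephone) forms its nominative plural as 'telefoni' and so inflects like 'gramofon', the three new entries might look like the sketch below. The choice of word is an illustration added here, the English paradigm is assumed to be called 'gramophone__n', and each entry belongs in the main section of a different file, as the comments indicate.<br />

```xml
<!-- apertium-sh-en.sh.dix: reuses the existing gramofon__n paradigm -->
<e lm="telefon"><i>telefon</i><par n="gramofon__n"/></e>

<!-- apertium-sh-en.en.dix: reuses the English noun paradigm -->
<e lm="telephone"><i>telephone</i><par n="gramophone__n"/></e>

<!-- apertium-sh-en.sh-en.dix: links the two lemmas -->
<e><p><l>telefon<s n="n"/></l><r>telephone<s n="n"/></r></p></e>
```

All three dictionaries need recompiling before the new word is recognised.<br />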
<br />
==Bring on the verbs==<br />
<br />
OK, so we have a system that translates nouns, but that's pretty useless; we want to translate verbs too, and even whole sentences! How about we start with the verb 'to see'? In Serbo-Croatian this is 'videti'. Serbo-Croatian is a null-subject language, which means that it doesn't typically use personal pronouns before the conjugated form of the verb. English is not. So, for example, 'I see' in English would be translated as 'vidim' in Serbo-Croatian.<br />
<br />
* Vidim<br />
* see<p1><sg><br />
* I see<br />
<br />
Note: <p1> denotes first person<br />
<br />
This will be important when we come to write the transfer rule for verbs. Other examples of null-subject languages include Spanish, Romanian and Polish. This also has the effect that, while we only need to add the verb in the Serbo-Croatian morphological dictionary, we need to add both the verb and the personal pronouns in the English morphological dictionary. We'll go through both of these.<br />
<br />
The other forms of the verb videti are: vidiš, vidi, vidimo, vidite, and vide; which correspond to: you see (singular), he sees, we see, you see (plural), and they see.<br />
<br />
There are two forms of you see, one is plural and formal singular (vidite) and the other is singular and informal (vidiš).<br />
<br />
We're going to try and translate the sentence: "Vidim gramofoni" into "I see gramophones". In the interests of space, we'll just add enough information to do the translation and will leave filling out the paradigms (adding the other conjugations of the verb) as an exercise to the reader.<br />
<br />
The astute reader will have realised by this point that we can't just translate vidim gramofoni, because it is not a grammatically correct sentence in Serbo-Croatian. The correct sentence would be vidim gramofone, as the noun takes the accusative case. We'll have to add that form too; there is no need to add the case information for now, though, so we just add it as another option for plural. So, just copy the <e> block for 'i' and change the 'i' to 'e' there.<br />
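<br />
After that change, the noun paradigm might look like the sketch below (the compact one-line layout and the comment are added here):<br />

```xml
<pardef n="gramofon__n">
  <e><p><l/><r><s n="n"/><s n="sg"/></r></p></e>
  <e><p><l>i</l><r><s n="n"/><s n="pl"/></r></p></e>
  <!-- new: accusative plural ending, analysed simply as plural for now -->
  <e><p><l>e</l><r><s n="n"/><s n="pl"/></r></p></e>
</pardef>
```

Because 'i' and 'e' now lead to the same analysis, a generator compiled from this dictionary would have two candidate forms for the plural. That does not matter while Serbo-Croatian is only being analysed; if it becomes a problem, the dix format's r="LR" attribute on an <e> element is one way to restrict an entry to analysis.<br />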
<br />
The first thing we need to do is add some more symbols. We first need a symbol for 'verb', which we'll call "vblex" (this means lexical verb, as opposed to modal verbs and other types). Verbs have 'person' and 'tense' along with number, so let's add a couple of those as well. We need to translate "I see", so for person we should add "p1", or 'first person', and for tense "pri", or 'present indicative'.<br />
<pre><br />
<sdef n="vblex"/><br />
<sdef n="p1"/><br />
<sdef n="pri"/><br />
</pre><br />
After we've done this, as with the nouns, we add a paradigm for the verb conjugation. The first line will be:<br />
<pre><br />
<pardef n="vid/eti__vblex"><br />
</pre><br />
The '/' marks the point in the word where the endings (the parts between the <l> </l> tags) are attached.<br />
<br />
Then the inflection for first person singular:<br />
<pre><br />
<e><br />
<p><br />
<l>im</l><br />
<r>eti<s n="vblex"/><s n="pri"/><s n="p1"/><s n="sg"/></r><br />
</p><br />
</e><br />
</pre><br />
The 'im' denotes the ending (as in 'vidim'), it is necessary to add 'eti' to the <r> section, as this will be chopped off by the definition. The rest is fairly straightforward, 'vblex' is lexical verb, 'pri' is present indicative tense, 'p1' is first person and 'sg' is singular. We can also add the plural which will be the same, except 'imo' instead of 'im' and 'pl' instead of 'sg'.<br />
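<br />
With the plural added, the complete paradigm might look like the following sketch; the comments are added here, and the closing </pardef> tag completes the block that was opened above.<br />

```xml
<pardef n="vid/eti__vblex">
  <!-- vid + im: first person singular of videti -->
  <e>
    <p>
      <l>im</l>
      <r>eti<s n="vblex"/><s n="pri"/><s n="p1"/><s n="sg"/></r>
    </p>
  </e>
  <!-- vid + imo: first person plural of videti -->
  <e>
    <p>
      <l>imo</l>
      <r>eti<s n="vblex"/><s n="pri"/><s n="p1"/><s n="pl"/></r>
    </p>
  </e>
</pardef>
```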
<br />
After this we need to add a lemma-to-paradigm mapping to the main section:<br />
<pre><br />
<e lm="videti"><i>vid</i><par n="vid/eti__vblex"/></e><br />
</pre><br />
Note: the content of <i> </i> is the root, not the lemma.<br />
<br />
That's the work on the Serbo-Croatian dictionary done for now. Let's compile it, then test it.<br />
<pre><br />
$ lt-comp lr apertium-sh-en.sh.dix sh-en.automorf.bin<br />
main@standard 23 25<br />
$ echo "vidim" | lt-proc sh-en.automorf.bin<br />
^vidim/videti<vblex><pri><p1><sg>$<br />
$ echo "vidimo" | lt-proc sh-en.automorf.bin<br />
^vidimo/videti<vblex><pri><p1><pl>$<br />
</pre><br />
Ok, so now we do the same for the English dictionary (remember to add the same symbol definitions here as you added to the Serbo-Croatian one).<br />
<br />
The paradigm is:<br />
<pre><br />
<pardef n="s/ee__vblex"><br />
</pre><br />
because the past tense is 'saw'. Now, we can do one of two things, we can add both first and second person, but they are the same form. In fact, all forms (except third person singular) of the verb 'to see' are 'see'. So instead we make one entry for 'see' and give it only the 'pri' symbol.<br />
<pre><br />
<e><br />
<p><br />
<l>ee</l><br />
<r>ee<s n="vblex"/><s n="pri"/></r><br />
</p><br />
</e><br />
</pre><br />
and as always, an entry in the main section:<br />
<pre><br />
<e lm="see"><i>s</i><par n="s/ee__vblex"/></e><br />
</pre><br />
Then let's save, recompile and test:<br />
<pre><br />
$ lt-comp lr apertium-sh-en.en.dix en-sh.automorf.bin<br />
main@standard 18 19<br />
<br />
$ echo "see" | lt-proc en-sh.automorf.bin<br />
^see/see<vblex><pri>$<br />
</pre><br />
Now for the obligatory entry in the bilingual dictionary:<br />
<pre><br />
<e><p><l>videti<s n="vblex"/></l><r>see<s n="vblex"/></r></p></e><br />
</pre><br />
(again, don't forget to add the sdefs from earlier)<br />
<br />
And recompile:<br />
<pre><br />
$ lt-comp lr apertium-sh-en.sh-en.dix sh-en.autobil.bin<br />
main@standard 18 18<br />
$ lt-comp rl apertium-sh-en.sh-en.dix en-sh.autobil.bin<br />
main@standard 18 18<br />
</pre><br />
Now to test:<br />
<pre><br />
$ echo "vidim" | lt-proc sh-en.automorf.bin | \<br />
gawk 'BEGIN{RS="$"; FS="/";}{nf=split($1,COMPONENTS,"^"); for(i = 1; i<nf; i++) printf COMPONENTS[i]; if($2 != "") printf("^%s$",$2);}' | \<br />
apertium-transfer apertium-sh-en.trules-sh-en.xml trules-sh-en.bin sh-en.autobil.bin <br />
<br />
^see<vblex><pri><p1><sg>$^@<br />
</pre><br />
We get the analysis passed through correctly, but when we try and generate a surface form from this, we get a '#', like below:<br />
<pre><br />
$ echo "vidim" | lt-proc sh-en.automorf.bin | \<br />
gawk 'BEGIN{RS="$"; FS="/";}{nf=split($1,COMPONENTS,"^"); for(i = 1; i<nf; i++) printf COMPONENTS[i]; if($2 != "") printf("^%s$",$2);}' | \<br />
apertium-transfer apertium-sh-en.trules-sh-en.xml trules-sh-en.bin sh-en.autobil.bin | \<br />
lt-proc -g sh-en.autogen.bin<br />
#see\@<br />
</pre><br />
This '#' means that the generator cannot generate the correct lexical form because it does not contain it. Why is this?<br />
<br />
Basically the analyses don't match, the 'see' in the dictionary is see<vblex><pri>, but the see delivered by the transfer is see<vblex><pri><p1><sg>. The Serbo-Croatian side has more information than the English side requires. You can test this by adding the missing symbols to the English dictionary, and then recompiling, and testing again.<br />
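<br />
For that experiment, the English paradigm entry might temporarily be changed along the lines of the sketch below, assuming that 'p1' and 'sg' are already defined as symbols in the English dictionary. This only covers the first person singular and is meant as a test, not as the final solution.<br />

```xml
<pardef n="s/ee__vblex">
  <!-- same surface form 'see', but now carrying person and number as well -->
  <e>
    <p>
      <l>ee</l>
      <r>ee<s n="vblex"/><s n="pri"/><s n="p1"/><s n="sg"/></r>
    </p>
  </e>
</pardef>
```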
<br />
However, a more paradigmatic way of taking care of this is by writing a rule. So, we open up the rules file (apertium-sh-en.trules-sh-en.xml in case you forgot).<br />
<br />
We need to add a new category for 'verb'.<br />
<pre><br />
<def-cat n="vrb"><br />
<cat-item tags="vblex.*"/><br />
</def-cat><br />
</pre><br />
We also need to add attributes for tense and for person. We'll make it really simple for now, you can add p2 and p3, but I won't in order to save space.<br />
<pre><br />
<def-attr n="temps"><br />
<attr-item tags="pri"/><br />
</def-attr><br />
<br />
<def-attr n="pers"><br />
<attr-item tags="p1"/><br />
</def-attr><br />
</pre><br />
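If you do want the second and third person, the 'pers' attribute could be extended along these lines. This is a sketch: the tags 'p2' and 'p3' follow the pattern of 'p1', and matching <sdef> entries would also be needed in the dictionaries.<br />

```xml
<def-attr n="pers">
  <attr-item tags="p1"/>
  <attr-item tags="p2"/>  <!-- second person -->
  <attr-item tags="p3"/>  <!-- third person -->
</def-attr>
```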
We should also add an attribute for verbs.<br />
<pre><br />
<def-attr n="a_verb"><br />
<attr-item tags="vblex"/><br />
</def-attr><br />
</pre><br />
Now onto the rule:<br />
<pre><br />
<rule><br />
<pattern><br />
<pattern-item n="vrb"/><br />
</pattern><br />
<action><br />
<out><br />
<lu><br />
<clip pos="1" side="tl" part="lem"/><br />
<clip pos="1" side="tl" part="a_verb"/><br />
<clip pos="1" side="tl" part="temps"/><br />
</lu><br />
</out><br />
</action><br />
</rule><br />
</pre><br />
Remember when you tried commenting out the 'clip' tags in the previous rule example and they disappeared from the transfer output? Well, that's pretty much what we're doing here. We take in a verb with a full analysis, but only output a partial analysis (lemma + verb tag + tense tag).<br />
<br />
So now, if we recompile that, we get:<br />
<pre><br />
$ echo "vidim" | lt-proc sh-en.automorf.bin | \<br />
gawk 'BEGIN{RS="$"; FS="/";}{nf=split($1,COMPONENTS,"^"); for(i = 1; i<nf; i++) printf COMPONENTS[i]; if($2 != "") printf("^%s$",$2);}' | \<br />
apertium-transfer apertium-sh-en.trules-sh-en.xml trules-sh-en.bin sh-en.autobil.bin<br />
^see<vblex><pri>$^@<br />
</pre><br />
and:<br />
<pre><br />
$ echo "vidim" | lt-proc sh-en.automorf.bin | \<br />
gawk 'BEGIN{RS="$"; FS="/";}{nf=split($1,COMPONENTS,"^"); for(i = 1; i<nf; i++) printf COMPONENTS[i]; if($2 != "") printf("^%s$",$2);}' | \<br />
apertium-transfer apertium-sh-en.trules-sh-en.xml trules-sh-en.bin sh-en.autobil.bin | \<br />
lt-proc -g sh-en.autogen.bin<br />
see\@<br />
</pre><br />
Try it with 'vidimo' (we see) to see if you get the correct output.<br />
<br />
Now try it with "vidim gramofone":<br />
<pre><br />
$ echo "vidim gramofone" | lt-proc sh-en.automorf.bin | \<br />
gawk 'BEGIN{RS="$"; FS="/";}{nf=split($1,COMPONENTS,"^"); for(i = 1; i<nf; i++) printf COMPONENTS[i]; if($2 != "") printf("^%s$",$2);}' | \<br />
apertium-transfer apertium-sh-en.trules-sh-en.xml trules-sh-en.bin sh-en.autobil.bin | \<br />
lt-proc -g sh-en.autogen.bin<br />
see gramophones\@<br />
</pre><br />
<br />
==But what about personal pronouns?==<br />
<br />
Well, that's great, but we're still missing the personal pronoun that is necessary in English. In order to add it in, we first need to edit the English morphological dictionary.<br />
<br />
As before, the first thing to do is add the necessary symbols:<br />
<pre><br />
<sdef n="prn"/><br />
<sdef n="subj"/><br />
</pre><br />
Of the two symbols, prn is pronoun, and subj is subject (as in the subject of a sentence).<br />
<br />
Because there is no root, or 'lemma' for personal subject pronouns, we just add the pardef as follows:<br />
<pre><br />
<pardef n="prsubj__prn"><br />
<e><br />
<p><br />
<l>I</l><br />
<r>prpers<s n="prn"/><s n="subj"/><s n="p1"/><s n="sg"/></r><br />
</p><br />
</e><br />
</pardef><br />
</pre><br />
With 'prsubj' being 'personal subject'. The rest of them (You, We etc.) are left as an exercise to the reader.<br />
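<br />
As a start on that exercise, the first person plural might be added as in the sketch below. The 'we' entry is an addition made here and simply mirrors the entry for 'I' with 'pl' in place of 'sg'.<br />

```xml
<pardef n="prsubj__prn">
  <!-- first person singular, as above -->
  <e>
    <p>
      <l>I</l>
      <r>prpers<s n="prn"/><s n="subj"/><s n="p1"/><s n="sg"/></r>
    </p>
  </e>
  <!-- first person plural (added in this sketch) -->
  <e>
    <p>
      <l>we</l>
      <r>prpers<s n="prn"/><s n="subj"/><s n="p1"/><s n="pl"/></r>
    </p>
  </e>
</pardef>
```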
<br />
We can add an entry to the main section as follows:<br />
<pre><br />
<e lm="personal subject pronouns"><i/><par n="prsubj__prn"/></e><br />
</pre><br />
So, save, recompile and test, and we should get something like:<br />
<pre><br />
$ echo "I" | lt-proc en-sh.automorf.bin<br />
^I/PRPERS<prn><subj><p1><sg>$<br />
</pre><br />
<br />
(Note: it's in capitals because 'I' is in capitals).<br />
<br />
Now we need to amend the 'verb' rule to output the subject personal pronoun along with the correct verb form.<br />
<br />
First, add a category (this must be getting pretty pedestrian by now):<br />
<pre><br />
<def-cat n="prpers"><br />
<cat-item lemma="prpers" tags="prn.*"/><br />
</def-cat><br />
</pre><br />
Now add the types of pronoun as attributes. We might as well add the 'obj' type while we're at it, although we won't need to use it for now:<br />
<pre><br />
<def-attr n="tipus_prn"><br />
<attr-item tags="prn.subj"/><br />
<attr-item tags="prn.obj"/><br />
</def-attr><br />
</pre><br />
And now to input the rule:<br />
<pre><br />
<rule><br />
<pattern><br />
<pattern-item n="vrb"/><br />
</pattern><br />
<action><br />
<out><br />
<lu><br />
<lit v="prpers"/><br />
<lit-tag v="prn"/><br />
<lit-tag v="subj"/><br />
<clip pos="1" side="tl" part="pers"/><br />
<clip pos="1" side="tl" part="nbr"/><br />
</lu><br />
<b/><br />
<lu><br />
<clip pos="1" side="tl" part="lem"/><br />
<clip pos="1" side="tl" part="a_verb"/><br />
<clip pos="1" side="tl" part="temps"/><br />
</lu><br />
</out><br />
</action><br />
</rule><br />
</pre><br />
This is pretty much the same rule as before, only we made a couple of small changes.<br />
<br />
We needed to output:<br />
<pre><br />
^prpers<prn><subj><p1><sg>$ ^see<vblex><pri>$<br />
</pre><br />
so that the generator could choose the right pronoun and the right form of the verb.<br />
<br />
So, a quick rundown:<br />
<br />
* <lit>, prints a literal string, in this case "prpers"<br />
* <lit-tag>, prints a literal tag, because we can't get the tags from the verb, we add these ourself, "prn" for pronoun, and "subj" for subject.<br />
* <b/>, prints a blank, a space.<br />
<br />
Note that we retrieved the person and number for the pronoun, and the tense for the verb, directly from the verb.<br />
<br />
So, now if we recompile and test that again:<br />
<pre><br />
$ echo "vidim gramofone" | lt-proc sh-en.automorf.bin | \<br />
gawk 'BEGIN{RS="$"; FS="/";}{nf=split($1,COMPONENTS,"^"); for(i = 1; i<nf; i++) printf COMPONENTS[i]; if($2 != "") printf("^%s$",$2);}' | \<br />
apertium-transfer apertium-sh-en.trules-sh-en.xml trules-sh-en.bin sh-en.autobil.bin | \<br />
lt-proc -g sh-en.autogen.bin<br />
I see gramophones<br />
</pre><br />
Which, while it isn't exactly prize-winning prose (much like this HOWTO), is a fairly accurate translation.<br />
<br />
==So tell me about the record player (Multiwords)==<br />
While 'gramophone' is an English word, it isn't the best translation. 'Gramophone' is typically used for the very old kind, with a needle instead of a stylus and no amplification. A better translation would be 'record player'. Although this is more than one word, we can treat it as if it were one word by using multiword (multipalabra) constructions.<br />
<br />
We don't need to touch the Serbo-Croatian dictionary, just the English one and the bilingual one. So open them up.<br />
<br />
The plural of 'record player' is 'record players', so it takes the same paradigm as 'gramophone' (gramophone__n): we just add 's'. All we need to do is add a new element to the main section.<br />
<pre><br />
<e lm="record player"><i>record<b/>player</i><par n="gramophone__n"/></e><br />
</pre><br />
The only thing that differs is the use of the <b/> tag, although this isn't entirely new, as we saw it in the rules file.<br />
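<br />
The bilingual dictionary needs a matching change so that 'gramofon' is translated as the multiword. A minimal sketch, assuming that the earlier 'gramophone' entry is replaced rather than kept alongside the new one:<br />

```xml
<!-- apertium-sh-en.sh-en.dix: 'gramofon' now maps to the multiword lemma -->
<e><p><l>gramofon<s n="n"/></l><r>record<b/>player<s n="n"/></r></p></e>
```

The <b/> inside the <r> element stands for the space between the two words, just as it does in the monolingual entry.<br />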
<br />
So, recompile and test in the orthodox fashion:<br />
<pre><br />
$ echo "vidim gramofone" | lt-proc sh-en.automorf.bin | \<br />
gawk 'BEGIN{RS="$"; FS="/";}{nf=split($1,COMPONENTS,"^"); for(i = 1; i<nf; i++) printf COMPONENTS[i]; if($2 != "") printf("^%s$",$2);}' | \<br />
apertium-transfer apertium-sh-en.trules-sh-en.xml trules-sh-en.bin sh-en.autobil.bin | \<br />
lt-proc -g sh-en.autogen.bin<br />
I see record players<br />
</pre><br />
Perfect. A big benefit of using multiwords is that you can translate idiomatic expressions as a whole, rather than word by word. For example, the English phrase "at the moment" would be translated into Serbo-Croatian as "trenutno" ('trenutak' = ''moment'' and 'trenutno' is its adverbial form); it would not be possible to translate it word by word into Serbo-Croatian.<br />
<br />
==Dealing with minor variation==<br />
<br />
Serbo-Croatian typically has a few ways of writing each word because of dialectal variation. It has a cool phonetic writing system so you write how you speak. For example, people speaking in Ijekavian would say "rječnik", while someone speaking Ekavian would say "rečnik", which reflects the differences in pronunciation of the proto-Slavic vowel ''yat''.<br />
<br />
===Analysis===<br />
<br />
There should be a fairly easy way of dealing with this, and there is, using paradigms again. Paradigms aren't only used for adding grammatical symbols, but they can also be used to replace any character/symbol with another. For example, here is a paradigm for accepting both "e" and "je" in the analysis. The paradigm should, as with the others go into the monolingual dictionary for Serbo-Croatian.<br />
<br />
<pre><br />
<pardef n="e_je__yat"><br />
<e><br />
<p><br />
<l>e</l><br />
<r>e</r><br />
</p><br />
</e><br />
<e><br />
<p><br />
<l>je</l><br />
<r>e</r><br />
</p><br />
</e><br />
</pardef><br />
</pre><br />
<br />
Then in the "main section":<br />
<br />
<pre><br />
<e lm="rečnik"><i>r</i><par n="e_je__yat"/><i>čni</i><par n="rečni/k__n"/></e><br />
</pre><br />
<br />
However, this only allows us to analyse both forms; more work is necessary if we want to generate both forms.<br />
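<br />
As a rough illustration, here is a minimal Python sketch, outside the Apertium toolchain, of what the entry above amounts to during analysis. It expands the literal parts and the paradigms into a lookup table. The contents of RECNIK_N are a hypothetical stand-in, since the "rečni/k__n" paradigm is not shown here, and the real analyser is a compiled finite-state transducer rather than a Python dictionary.<br />
<br />
```python
from itertools import product

# In-memory copy of the "e_je__yat" paradigm as (surface, lexical) pairs.
E_JE_YAT = [("e", "e"), ("je", "e")]
# Hypothetical stand-in for the noun paradigm "rečni/k__n" (singular only).
RECNIK_N = [("k", "k<n><sg>")]


def expand(parts):
    """Expand an entry made of literal strings and paradigms into a
    {surface form: analysis} table."""
    # A literal string behaves like a one-entry paradigm with identical sides.
    choices = [[(p, p)] if isinstance(p, str) else p for p in parts]
    table = {}
    for combo in product(*choices):
        surface = "".join(left for left, _ in combo)
        lexical = "".join(right for _, right in combo)
        table[surface] = lexical
    return table


# Mirrors: <i>r</i><par n="e_je__yat"/><i>čni</i><par n="rečni/k__n"/>
ANALYSES = expand(["r", E_JE_YAT, "čni", RECNIK_N])
```
<br />
Both "rečnik" and "rječnik" end up as keys that map to the same analysis, which is why this entry alone cannot tell a generator which of the two spellings to produce.<br />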
<br />
===Generation===<br />
<br />
==See also==<br />
<br />
*[[Building dictionaries]]<br />
<br />
[[Category:Documentation]]<br />
[[Category:HOWTO]]<br />
[[Category:Esperanto]]</div>Objectiveseahttps://wiki.apertium.org/w/index.php?title=Kiel_aldoni_novan_lingvan_duon&diff=30378Kiel aldoni novan lingvan duon2011-12-21T20:01:25Z<p>Objectivesea: moved Kiel aldoni novan lingvan duon to Kiel aldoni novan lingvoparon:&#32;"Paro" is a better translation for "pair"; as a prefix, "duon-" actually means "half"</p>
<hr />
<div>#REDIRECT [[Kiel aldoni novan lingvoparon]]</div>Objectiveseahttps://wiki.apertium.org/w/index.php?title=Talk:Kiel_aldoni_novan_lingvoparon&diff=30379Talk:Kiel aldoni novan lingvoparon2011-12-21T20:01:25Z<p>Objectivesea: moved Talk:Kiel aldoni novan lingvan duon to Talk:Kiel aldoni novan lingvoparon:&#32;"Paro" is a better translation for "pair"; as a prefix, "duon-" actually means "half"</p>
<hr />
<div>== Lingva duo → lingva paro ==<br />
<br />
Wouldn't the word "paro" be more suitable than "duo"? --IP 23:01, 16 December 2009 (UTC)</div>Objectiveseahttps://wiki.apertium.org/w/index.php?title=Talk:Kiel_aldoni_novan_lingvan_duon&diff=30380Talk:Kiel aldoni novan lingvan duon2011-12-21T20:01:25Z<p>Objectivesea: moved Talk:Kiel aldoni novan lingvan duon to Talk:Kiel aldoni novan lingvoparon:&#32;"Paro" is a better translation for "pair"; as a prefix, "duon-" actually means "half"</p>
<hr />
<div>#REDIRECT [[Talk:Kiel aldoni novan lingvoparon]]</div>Objectiveseahttps://wiki.apertium.org/w/index.php?title=French_and_Esperanto/Outstanding_tests&diff=30372French and Esperanto/Outstanding tests2011-12-21T19:50:55Z<p>Objectivesea: slightly clarified English explanation at top; corrected typo in "naissance"</p>
<hr />
<div>[[Maŝintradukado al Esperanto]]: This is for tests that don't work (yet). <br />
<br />
Tests are categorized according to the kind of work that needs to be done to make them work. <br />
<br />
When a test is working, move it to [[French and Esperanto/Regression tests]].<br />
<br />
If getting a particular test to succeed seems a long way off and it will (probably) never be done, move it to [[French and Esperanto/Rejected tests]].<br />
<br />
When the list is nearly empty, move something from [[French and Esperanto/Proposed future tests]] to here.<br />
<br />
=Explanation=<br />
<br />
Below you can add complete sentences that are not translated well, together with their correct translations, for the project [[Maŝintradukado al Esperanto]]. The sentences should be real or at least realistic (i.e. not too strange or contrived). If need be, you can add just the translation of a term or a specific word, but preferably within a sentence.<br />
<br />
The aim is to point the project's developers to things that need correcting, mainly grammatical structures, but possibly also lexical errors (take care with ambiguous words, mainly nouns and adjectives: at present the translator cannot distinguish whether, e.g., "table" is "tablo" or "tabelo"; phraseology can be added, however).<br />
<br />
The format is as follows:<br />
* (fr) ''Elle a été accusée par les grecs'' → Ŝi estis akuzita de la grekoj<br />
Please keep the prefix "* (fr)" and the symbol "→" between the original sentence and the proposed translation.<br />
Please begin both sentences with a capital letter.<br />
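<br />
For anyone who wants to process these test lines automatically, the following is a minimal Python sketch of a parser for the line format shown above. The function name and the regular expression are illustrative assumptions, not part of any Apertium tool.<br />
<br />
```python
import re

# One test line looks like: * (fr) ''source sentence'' → proposed translation
TEST_LINE = re.compile(
    r"^\* \((?P<lang>[a-z]+)\) ''(?P<source>.+?)'' →\s*(?P<target>.+)$"
)


def parse_test_line(line):
    """Return (language, source, target), or None if the line does not
    follow the expected format."""
    match = TEST_LINE.match(line.strip())
    if match is None:
        return None
    return (
        match.group("lang"),
        match.group("source"),
        match.group("target").strip(),
    )
```
<br />
Lines that drop the "* (fr)" prefix or the "→" symbol are rejected, which is one reason to keep both exactly as given.<br />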
<br />
You can use [http://xixona.dlsi.ua.es/testing/index.php?direction=fr-eo the current development version of the translator] (note that it is not working at present).<br />
<br />
If you notice that an item listed here is already translated correctly, please move it to [[French and Esperanto/Regression tests]].<br />
<br />
Note: you do not have to register, but it is a good idea; after every edit, an unregistered user must type in the answer to a calculation in order to save the changes.<br />
<br />
<br />
=Click [Edit] on the right, copy these two lines, and modify the copy, writing your name, the failing sentence and the correct translation (mind the capitalisation in the examples!)=<br />
* (fr) ''Elle a été accusée par les grecs'' → Ŝi estis akuzita de la grekoj<br />
<br />
=Hèctor Alòs i Font=<br />
<br />
<br />
==Negative sentences==<br />
<br />
=== être ===<br />
* (fr) ''Je ne le suis pas'' → Mi ne estas tia<br />
* (fr) ''Ils ne le sont pas'' → Ili ne estas tiaj<br />
* (fr) ''Je ne l'ai pas été'' → Mi ne estis tia<br />
* (fr) ''Ils ne l'ont pas été'' → Ili ne estis tiaj<br />
* (fr) ''Je ne l'avais pas été'' → Mi ne estis tia<br />
* (fr) ''Ils ne l'avaient pas été'' → Ili ne estis tiaj<br />
* (fr) ''Ils n'avaient pas beaucoup été'' → Ili ne multe estis<br />
* (fr) ''Je ne l'ai pas beaucoup été'' → Mi ne multe estis tia<br />
* (fr) ''Je ne l'avais pas beaucoup été'' → Mi ne multe estis tia<br />
* (fr) ''Ils ne l'avaient pas beaucoup été'' → Ili ne multe estis tiaj<br />
<br />
== passive ==<br />
* (fr) ''Ne pas s'être opposé'' → Ne esti kontraŭstarinta sin<br />
* (fr) ''Ne pas s'être beaucoup opposé'' → Ne multe esti kontraŭstarinta sin<br />
<br />
=== ne... que ===<br />
* (fr) ''Ils ne travaillent que pour manger'' → Ili laboras nur por manĝi<br />
* (fr) ''Ils n'ont travaillé que pour manger'' → Ili laboris nur por manĝi<br />
* (fr) ''Ils ne sont venus que pour manger'' → Ili venis nur por manĝi<br />
* (fr) ''Ils ne veulent travailler que pour manger'' → Ili volas labori nur por manĝi<br />
* (fr) ''Ils n'ont voulu travailler que pour manger'' → Ili volis labori nur por manĝi<br />
* (fr) ''Ils ne sont venus travailler que pour manger'' → Ili venis labori nur por manĝi<br />
<br />
* (fr) ''Ils ne sont plus utilisés que pour ça'' → Ili estas uzataj nur por tio<br />
* (fr) ''Il ne reste plus que du pain'' → Restas nur pano<br />
<br />
* (fr) ''Pour ne faire qu'un récital traditionnel'' → Por fari nur tradician recitalon<br />
<br />
===autres===<br />
* (fr) ''Il n’y a plus de résidents'' → Ne plu estas loĝantoj<br />
* (fr) ''Nous n'avons aucune intention'' → Ni havas neniun intencon<br />
<br />
== article pluriel, partitif ==<br />
* (fr) ''Donne-moi du pain et du beurre'' → Donu al mi panon kaj buteron<br />
* (fr) ''Compter des objets, des animaux ou des personnes'' → Kalkuli objektojn, bestojn aŭ personojn<br />
* (fr) ''Il n'y avait que des enfants et des vieillards'' → Estis nur infanoj kaj maljunuloj<br />
<br />
== -u ==<br />
* (fr) ''Je veux que tu viennes'' → Mi volas ke vi venu<br />
* (fr) ''J'aime que tu viennes'' → Mi ŝatas ke vi venas<br />
* (fr) ''J'aurais aimé que tu sois venu'' → Mi estus ŝatinta ke vi estu veninta<br />
* (fr) ''J'aurais aimé que tu aies mangé'' → Mi estus volinta ke vi estu manĝinta<br />
<br />
== verb groups ==<br />
* (fr) ''Ils m'ont toujours intéressé.'' → Ili ĉiam interesis min<br />
<br />
== oble ==<br />
* (fr) ''60 fois plus massive'' → 60-oble pli masiva<br />
* (fr) ''50-60 fois plus massive'' → 50-60-oble pli masiva<br />
<br />
==comparisons==<br />
* (fr) ''Tant au Nouveau-Brunswick qu'au Québec'' → Tiel en Nov-Brunsviko kiel en Kebekio<br />
* (fr) ''Ses domaines français sont alors aussi étendus que ceux du roi lui-même'' → Liaj francaj fakoj estas tiam tiel vastaj kiel tiuj de la reĝo mem<br />
* (fr) ''Il faut attendre un peu plus longtemps que prévu'' → Necesas atendi iom pli longe ol antaŭvidite<br />
<br />
==tel... que==<br />
* (fr) ''La réussite fut d'une telle ampleur que le film connut deux suites'' → La sukceso estis de tia amplekso ke la filmo konis du daŭrigojn<br />
<br />
==c'est ... que==<br />
(t2x)<br />
* (fr) ''C'est dans ce contexte qu'il exhorte'' → En tiu kunteksto li admonas<br />
* (fr) ''C'est ici qu'il vient'' → Ĉi tie li venis<br />
<br />
==être ADJ de/à INF==<br />
* (fr) ''C'est rigolo de voir'' → Estas komike vidi<br />
<br />
==une fois (SN) pp==<br />
* (fr) ''Une fois une vitesse suffisante atteinte'' → Atinginte sufiĉan rapidecon<br />
<br />
==questions==<br />
* (fr) ''S'est-il caché en attendant la nuit ou a-t-il volé une nouvelle voiture?'' → Ĉu li sin estas kaŝinta atendante la nokton aŭ ĉu li ŝtelis novan aŭton?<br />
<br />
==en (pron)==<br />
* (fr) ''J'en veux quatre'' → Mi volas kvar<br />
* (fr) ''J'en veux beaucoup'' → Mi volas multajn<br />
* (fr) ''J'en ai voulu quatre'' → Mi volis kvar<br />
* (fr) ''J'en ai voulu beaucoup'' → Mi volis multajn<br />
<br />
=Arno Lagrange=<br />
==Modèle:Infobox Écrivain==<br />
For Wikipedia templates, the program should translate the name of the template according to the interlanguage links, and its parameters according to the appropriate equivalents. This is a special case that depends on context. (I have left the parameters in lower case and struck through the incorrect translation.)<br />
* (fr) ''Modèle:Infobox Écrivain'' → Ŝablono:Informkesto verkisto<br />
* (fr) ''image'' → dosiero / <s>bildo</s><br />
* (fr) ''légende'' → priskribo / <s>legendo</s><br />
* (fr) ''activité'' → profesio / <s>aktiveco</s><br />
* (fr) ''mouvement'' → movado / <s>movo</s><br />
<br />
=== Note ===<br />
A verb in French is generally preceded by a subject (a pronoun or a noun phrase). When a word could be either a verb or a noun (date = dato / datumas), it is most likely a noun used without an article, for example in a parameter name (Date de naissance → Dato de naskiĝo / Naskiĝdato).<br />
<br />
<br />
<br />
[[Category:French and Esperanto]]</div>Objectiveseahttps://wiki.apertium.org/w/index.php?title=Kiel_aldoni_novan_lingvoparon&diff=30336Kiel aldoni novan lingvoparon2011-12-21T12:25:20Z<p>Objectivesea: Lingvoparo-sekcioj</p>
<hr />
<div>{{TOCD}}<br />
<br />
''This page is being translated. Please help!''<br />
<br />
This document describes how to start adding a new language pair to the Apertium machine translation system from scratch.<br />
<br />
It does not assume any knowledge of linguistics or machine translation beyond the ability to distinguish nouns, verbs, prepositions, etc.<br />
<br />
==Introduction==<br />
<br />
Apertium is, as you have probably realised by now, a machine translation system. Well, not quite; it is a machine translation platform. It provides an engine and a toolbox that you can use to build your own machine translation systems. You only need to supply the data. The data, at a basic level, consist of three dictionaries and a few rules (for reordering words and handling other grammatical matters).<br />
<br />
For a more detailed introduction to how it works, there are some excellent articles on the project website, apertium.sourceforge.net.<br />
<br />
==You will need==<br />
<br />
* [[lttoolbox]] (>= 3.0.0)<br />
* libxml utilities (xmllint etc.)<br />
* apertium (>= 3.0.0)<br />
* a text editor (or a specialised XML editor, if you prefer)<br />
<br />
This document does not describe how to install these packages. For more information on that, please see the documentation section of the Apertium website.<br />
<br />
==What is a language pair?==<br />
Apertium is a shallow-transfer machine translation system. This simply means that it uses dictionaries and shallow transfer rules. Shallow transfer differs from deep transfer in that it does not do a full syntactic analysis; typically, the rules operate on groups of lexical units rather than on parse trees. At a basic level, there are three main dictionaries:<br />
# The morphological dictionary for language xx: this contains the rules for how words in language xx are inflected. In this example it is called apertium-sh-en.sh.dix<br />
# The morphological dictionary for language yy: this contains the rules for how words in language yy are inflected. In this example it is called apertium-sh-en.en.dix<br />
# The bilingual dictionary: this contains correspondences between words and symbols in the two languages. In this example it is called apertium-sh-en.sh-en.dix<br />
<br />
In a translation pair, either language can be the source or the target; these are relative terms.<br />
<br />
There are also two files for transfer rules. These rules govern how words are reordered in sentences, for example chat noir -> cat black -> black cat. They also govern agreement of gender and number, etc. The rules can also be used to insert or delete lexical items, as will be described later. These files are:<br />
* Transfer rules from language xx to language yy: this file contains the rules for how language xx is changed into language yy. In this example it is apertium-sh-en.trules-sh-en.xml<br />
* Transfer rules from language yy to language xx: this file contains the rules for how language yy is changed into language xx. In this example it is apertium-sh-en.trules-en-sh.xml<br />
<br />
Many of the language pairs currently available have other files as well, but we won't describe them here. Only the files mentioned above are needed to build a working system.<br />
<br />
==Language pair==<br />
As the file names suggest, this HOWTO uses the example of translating from Serbo-Croatian to English to explain how to create a basic system. This pair is not ideal, since the Apertium system works better for more closely related languages; for the simple examples in this document, that does not matter.<br />
<br />
==A brief note on terms==<br />
There are a number of terms that you need to understand before continuing.<br />
<br />
The first is '''lemma'''. A '''lemma''' is the citation form of a word: the word stripped of any grammatical information. For example, the lemma of the Esperanto word ''katojn'' is ''kat'', without ''-o''. In Esperanto, nouns are typically cited in the singular, nominative form. For verbs, the lemma is the infinitive without ''-i''; for example, the lemma of ''amis'' is ''am''.<br />
<br />
The second term is ''symbol''. In the context of the Apertium system, a ''symbol'' is a grammatical label. The word ''katoj'' is a plural noun, so it will have a noun symbol and a plural symbol. In the input and output of Apertium modules, these labels appear between angle brackets, as follows:<br />
<br />
* <n> for noun.<br />
* <pl> for plural.<br />
<br />
Other examples of symbols are <sg> for singular, <p1> for first person, <pri> for present indicative, etc. A symbol written in angle brackets is also called a tag. It is worth noting that in many of the currently available language pairs, the symbol definitions are acronyms or contractions of Catalan words. For example, vbhaver comes from ''vb'' (verb) and ''haver'' ("to have" in Catalan). Symbols are defined in <sdef> tags and used in <nowiki><s></nowiki> tags.<br />
<br />
The third term is ''paradigm''. In the context of the Apertium system, a paradigm describes how a group of words inflects. In the morphological dictionary, a lemma (see above) is linked to a paradigm that describes its inflection, so we do not have to list all the endings of each lemma.<br />
<br />
Consider the following example, which shows the usefulness of this scheme: we want to store the English adjectives ''happy'' and ''lazy''; instead of storing two very similar entries such as:<br />
<br />
* happy, happ (y, ier, iest), and<br />
* lazy, laz (y, ier, iest),<br />
<br />
we simply store the pattern of one (for example ''happy'') and then state that "''lazy'' inflects like ''happy''", "''friendly'' inflects like ''happy''", "''naughty'' inflects like ''happy''", etc. In this example, ''happy'' is the pattern or paradigm that describes the inflection of the other words. The precise way a paradigm is defined will be described shortly. Paradigms are defined in ''<pardef>'' tags and used in ''<par>'' tags.<br />
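<br />
The saving can be sketched in a few lines of Python. This is only an illustration of the idea; the names PARADIGMS, ENTRIES and forms are assumptions made for the example and do not correspond to anything in lttoolbox.<br />
<br />
```python
# One paradigm, named after its model word: stem "happ" plus these endings.
PARADIGMS = {
    "happ/y": {"y": "positive", "ier": "comparative", "iest": "superlative"},
}

# Each entry stores only its stem and the name of the paradigm it follows.
ENTRIES = {
    "happy": ("happ", "happ/y"),
    "lazy": ("laz", "happ/y"),
    "friendly": ("friendl", "happ/y"),
}


def forms(lemma):
    """List every inflected form of a lemma by combining its stem with
    the endings of its paradigm."""
    stem, paradigm = ENTRIES[lemma]
    return [stem + ending for ending in PARADIGMS[paradigm]]
```
<br />
Adding a new adjective of this type then costs one line in ENTRIES rather than a full list of endings.<br />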
<br />
==Getting started==<br />
<br />
===Monolingual dictionaries===<br />
Let's start by making our first source-language dictionary. The dictionary is an XML file. Fire up your text editor and type the following:<br />
<pre><br />
<?xml version="1.0" encoding="UTF-8"?><br />
<dictionary><br />
<br />
</dictionary><br />
</pre><br />
<br />
So far, the file declares that we are starting a dictionary. To make it useful, we need to add some more entries. The first is the alphabet: this defines the set of letters that may be used in the dictionary (Serbo-Croatian, in this case). It will look like the following, containing all the letters of the Serbo-Croatian alphabet:<br />
<pre><br />
<alphabet>ABCČĆDDžĐEFGHIJKLLjMNNjOPRSŠTUVZŽabcčćddžđefghijklljmnnjoprsštuvzž</alphabet><br />
</pre><br />
<br />
Place the alphabet below the <dictionary> tag.<br />
<br />
Next, we need to define some symbols. Let's start with the simple things: nouns (n) in the singular (sg) and plural (pl):<br />
<pre><br />
<sdefs><br />
<sdef n="n"/><br />
<sdef n="sg"/><br />
<sdef n="pl"/><br />
</sdefs><br />
</pre><br />
The symbol names do not have to be this short; in fact, you could write them out in full. However, you will be typing them a lot, so it makes sense to abbreviate.<br />
<br />
Unfortunately, it isn't quite that simple. Nouns in Serbo-Croatian inflect for more than just number; they also inflect for gender and case. However, for the purposes of this example we'll assume that the noun is masculine and in the nominative case (a complete example can be found at the end of this document).<br />
<br />
Next, we need to define a section for the paradigms:<br />
<pre><br />
<pardefs><br />
<br />
</pardefs><br />
</pre><br />
and a dictionary section:<br />
<pre><br />
<section id="main" type="standard"><br />
<br />
</section><br />
</pre><br />
There are two types of section:<br />
#A standard section, which contains words, enclitics (suffix-like elements), etc.<br />
#An inconditional section, which typically contains punctuation and so on. We don't have an inconditional section here, although we will see one later.<br />
<br />
So, our file should now look something like:<br />
<pre><br />
<?xml version="1.0" encoding="UTF-8"?><br />
<dictionary><br />
<sdefs><br />
<sdef n="n"/><br />
<sdef n="sg"/><br />
<sdef n="pl"/><br />
</sdefs><br />
<pardefs><br />
<br />
</pardefs><br />
<section id="main" type="standard"><br />
<br />
</section><br />
</dictionary><br />
</pre><br />
Now that we have the skeleton of the dictionary, we can start by adding a noun. We'll use 'gramofon' (which means 'gramophone' or 'record player').<br />
<br />
Since we don't have any paradigms yet, we need to define one.<br />
<br />
Remember that we are assuming masculine gender and nominative case. The singular form of the noun is 'gramofon' and the plural is 'gramofoni'. So:<br />
<pre><br />
<pardef n="gramofon__n"><br />
<e><br />
<p><br />
<l/><br />
<r><s n="n"/><s n="sg"/></r><br />
</p><br />
</e><br />
<e><br />
<p><br />
<l>i</l><br />
<r><s n="n"/><s n="pl"/></r><br />
</p><br />
</e><br />
</pardef><br />
</pre><br />
Note: '<l/>' (equivalent to <l></l>) means that there is nothing more to add to the stem for the singular.<br />
<br />
This may seem like a rather verbose way of describing it, but there are good reasons for it, and it soon becomes second nature. You may be wondering what <e>, <p>, <l> and <r> stand for. Well:<br />
* e, for entry.<br />
* p, for pair.<br />
* l, for left.<br />
* r, for right.<br />
<br />
Why left and right? Well, the morphological dictionaries will later be compiled into finite-state machines. Compiling them left to right produces analyses from words, and compiling them right to left produces words from analyses. For example:<br />
<pre><br />
* gramofoni (left to right) gramofon<n><pl> (analysis)<br />
* gramofon<n><pl> (right to left) gramofoni (word)<br />
</pre><br />
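The effect of the two compilation directions can be imitated with a short Python sketch. It is a deliberate simplification under stated assumptions: lt-comp builds finite-state transducers, whereas the hypothetical compile_entry function below just builds a dictionary in one direction or the other.<br />
```python
# The two <e> entries of the "gramofon__n" paradigm as (left, right) pairs.
GRAMOFON_N = [("", "<n><sg>"), ("i", "<n><pl>")]


def compile_entry(stem, paradigm, direction):
    """Build a lookup table for one dictionary entry.

    "lr" maps surface forms to analyses (an analyser);
    "rl" maps analyses to surface forms (a generator).
    """
    table = {}
    for left, right in paradigm:
        surface, analysis = stem + left, stem + right
        if direction == "lr":
            table[surface] = analysis
        else:
            table[analysis] = surface
    return table


ANALYSER = compile_entry("gramofon", GRAMOFON_N, "lr")
GENERATOR = compile_entry("gramofon", GRAMOFON_N, "rl")
```
The same list of pairs serves both purposes; only the reading direction changes.<br />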
Now that we have defined a paradigm, we need to link it to its lemma, ''gramofon''. We put this in the section that we defined.<br />
<br />
The entry to put in the <section> looks like this:<br />
<pre><br />
<e lm="gramofon"><i>gramofon</i><par n="gramofon__n"/></e><br />
</pre><br />
A quick run-down of the abbreviations:<br />
* lm, for lemma.<br />
* i, for identity (the left and the right are the same).<br />
* par, for paradigm.<br />
<br />
This entry gives the lemma of the word, ''gramofon'', the stem, ''gramofon'', and the paradigm with which it inflects, 'gramofon__n'. The difference between the lemma and the stem is that the lemma is the citation form of the word, while the stem is the substring of the lemma to which suffixes are added. This will become clearer later, when we show an entry where the two differ.<br />
<br />
We are now ready to test the dictionary. Save it and return to the shell. We first need to compile it (with lt-comp), and then we can test it (with lt-proc).<br />
<pre><br />
$ lt-comp lr apertium-sh-en.sh.dix sh-en.automorf.bin<br />
</pre><br />
This should produce the output:<br />
<pre><br />
main@standard 12 12<br />
</pre><br />
As we are compiling it left to right, we are producing an analyser. Let's make a generator too:<br />
<pre><br />
$ lt-comp rl apertium-sh-en.sh.dix sh-en.autogen.bin<br />
</pre><br />
At this stage, the command should produce the same output.<br />
<br />
We can now test these. Run lt-proc with the analyser:<br />
<pre><br />
$ lt-proc sh-en.automorf.bin<br />
</pre><br />
Now try it out. Type in ''gramofoni'' (gramophones) and see the output:<br />
<pre><br />
^gramofoni/gramofon<n><pl>$<br />
</pre><br />
Now, for the English dictionary, do the same thing, but use the English word ''gramophone'' and change the plural inflection. What if you want to use the more accurate term 'record player'? Well, we'll see how to do that later.<br />
<br />
You should now have two files in the directory:<br />
* apertium-sh-en.sh.dix, which contains a (very) basic Serbo-Croatian morphological dictionary, and<br />
* apertium-sh-en.en.dix, which contains a (very) basic English morphological dictionary.<br />
<br />
===Bilingual dictionary===<br />
So we now have two morphological dictionaries. The next thing to make is the bilingual dictionary. This describes the mappings between words. All dictionaries use the same format (which is specified in the DTD, dix.dtd).<br />
<br />
Create a new file, apertium-sh-en.sh-en.dix, and add the basic skeleton:<br />
<pre><br />
<?xml version="1.0" encoding="UTF-8"?><br />
<dictionary><br />
<alphabet/><br />
<sdefs><br />
<sdef n="n"/><br />
<sdef n="sg"/><br />
<sdef n="pl"/><br />
</sdefs><br />
<br />
<section id="main" type="standard"><br />
<br />
</section><br />
</dictionary><br />
</pre><br />
Now we need to add an entry to translate between the two words. Something like:<br />
<pre><br />
<e><p><l>gramofon<s n="n"/></l><r>gramophone<s n="n"/></r></p></e><br />
</pre><br />
Because there are a lot of these entries, they are typically written on one line to make the file easier to read. 'l' and 'r' again, right? Well, we compile it left to right to produce the Serbo-Croatian → English dictionary, and right to left to produce the English → Serbo-Croatian dictionary.<br />
<br />
So, once this is done, run the following commands:<br />
<pre><br />
$ lt-comp lr apertium-sh-en.sh.dix sh-en.automorf.bin<br />
$ lt-comp rl apertium-sh-en.en.dix sh-en.autogen.bin<br />
<br />
$ lt-comp lr apertium-sh-en.en.dix en-sh.automorf.bin<br />
$ lt-comp rl apertium-sh-en.sh.dix en-sh.autogen.bin<br />
<br />
$ lt-comp lr apertium-sh-en.sh-en.dix sh-en.autobil.bin<br />
$ lt-comp rl apertium-sh-en.sh-en.dix en-sh.autobil.bin<br />
</pre><br />
These generate the morphological analysers (automorf), the morphological generators (autogen) and the word lookup tables (autobil); the "bil" stands for "bilingual".<br />
<br />
===Transfer rules===<br />
<br />
So, now we have two morphological dictionaries and a bilingual dictionary. All that is needed now is a transfer rule for nouns. Transfer rule files have their own DTD (transfer.dtd), which can be found in the Apertium package. If you need to write a rule, it is often a good idea to look first at the rule files of other language pairs. Many rules can be reused between different language pairs. For example, the rule described below would be useful for any null-subject language.<br />
<br />
Start, as with all the others, with a basic skeleton:<br />
<pre><br />
<?xml version="1.0" encoding="UTF-8"?><br />
<transfer><br />
<br />
</transfer><br />
</pre><br />
At the moment, because we're ignoring case, we just need a rule that takes the grammatical symbols in the input and outputs them again.<br />
<br />
We first need to define categories and attributes. Categories and attributes both allow us to group grammatical symbols. Categories allow us to group symbols for the purposes of matching (for example, 'n.*' is all nouns). Attributes allow us to group a set of symbols that can be chosen from (for example, 'sg' and 'pl' may be grouped as an attribute 'number').<br />
<br />
Let's add the necessary sections:<br />
<pre><br />
<section-def-cats><br />
<br />
</section-def-cats><br />
<section-def-attrs><br />
<br />
</section-def-attrs><br />
</pre><br />
As we're only inflecting nouns in singular and plural, we need to add a category for nouns and an attribute for number. Something like the following will suffice:<br />
<br />
Into section-def-cats add:<br />
<pre><br />
<def-cat n="nom"><br />
<cat-item tags="n.*"/><br />
</def-cat><br />
</pre><br />
This catches all nouns (lemmas followed by <n> then anything) and refers to them as "nom" (we'll see how that's used later).<br />
<br />
Into the section section-def-attrs, add:<br />
<pre><br />
<def-attr n="nbr"><br />
<attr-item tags="sg"/><br />
<attr-item tags="pl"/><br />
</def-attr><br />
</pre><br />
and then<br />
<pre><br />
<def-attr n="a_nom"><br />
<attr-item tags="n"/><br />
</def-attr><br />
</pre><br />
The first defines the attribute nbr (number), which can be either singular (sg) or plural (pl).<br />
<br />
The second defines the attribute a_nom (attribute noun).<br />
<br />
Next we need to add a section for global variables:<br />
<pre><br />
<section-def-vars><br />
<br />
</section-def-vars><br />
</pre><br />
These variables are used to store or transfer attributes between rules. We need only one for now:<br />
<pre><br />
<def-var n="number"/><br />
</pre><br />
Finally, we need to add a rule, to take in the noun and then output it in the correct form. We'll need a rules section...<br />
<pre><br />
<section-rules><br />
<br />
</section-rules><br />
</pre><br />
Changing the pace from the previous examples, I'll just paste this rule, then go through it, rather than the other way round.<br />
<pre><br />
<rule><br />
<pattern><br />
<pattern-item n="nom"/><br />
</pattern><br />
<action><br />
<out><br />
<lu><br />
<clip pos="1" side="tl" part="lem"/><br />
<clip pos="1" side="tl" part="a_nom"/><br />
<clip pos="1" side="tl" part="nbr"/><br />
</lu><br />
</out><br />
</action><br />
</rule><br />
</pre><br />
<br />
The first tag is obvious: it defines a rule. The second tag, pattern, basically says "apply this rule if this pattern is found". In this example the pattern consists of a single noun (defined by the category item nom). Note that patterns are matched longest-match first. So if you have three rules, where the first catches "<prn><vblex><n>", the second catches "<prn><vblex>" and the third catches "<n>", then for the input "<prn><vblex><n>" the pattern matched, and the rule executed, will be the first.<br />
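<br />
Longest-match selection can be illustrated with a small Python sketch. It only models the choice between competing patterns, using the three hypothetical rules from the paragraph above; it is not how apertium-transfer is implemented.<br />
<br />
```python
# The three example rules, each written as a sequence of category names.
RULES = [
    ("prn", "vblex", "n"),
    ("prn", "vblex"),
    ("n",),
]


def pick_rule(tags, position):
    """Return the longest rule whose pattern matches the tags starting at
    the given position, or None if no rule matches."""
    best = None
    for pattern in RULES:
        window = tuple(tags[position:position + len(pattern)])
        if window == pattern and (best is None or len(pattern) > len(best)):
            best = pattern
    return best
```
<br />
For the input prn, vblex, n the three-item rule wins, even though the two-item rule also matches at the same position.<br />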
<br />
For each pattern, there is an associated action, which produces an associated output, out. The output is a lexical unit (lu).<br />
<br />
The clip tag allows a user to select and manipulate attributes and parts of the source-language (side="sl") or target-language (side="tl") lexical item.<br />
<br />
Let's compile it and test it. Transfer rules are compiled with:<br />
<pre><br />
$ apertium-preprocess-transfer apertium-sh-en.trules-sh-en.xml trules-sh-en.bin<br />
</pre><br />
Which will generate a trules-sh-en.bin file.<br />
<br />
Now we're ready to test our machine translation system. There is one crucial part missing, the part-of-speech (PoS) tagger, but that will be explained shortly. In the meantime we can test it as is:<br />
<br />
First, let's analyse a word, gramofoni:<br />
<pre><br />
$ echo "gramofoni" | lt-proc sh-en.automorf.bin <br />
^gramofoni/gramofon<n><pl>$<br />
</pre><br />
Normally, the POS tagger would now choose the right analysis based on the part of speech, but we don't have a POS tagger yet, so we can use this little gawk script (thanks to Sergio) that just outputs the first analysis retrieved.<br />
<pre><br />
$ echo "gramofoni" | lt-proc sh-en.automorf.bin | \<br />
gawk 'BEGIN{RS="$"; FS="/";}{nf=split($1,COMPONENTS,"^"); for(i = 1; i<nf; i++) printf COMPONENTS[i]; if($2 != "") printf("^%s$",$2);}'<br />
^gramofon<n><pl>$<br />
</pre><br />
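For readers who find the gawk one-liner hard to follow, here is a rough Python equivalent. It is a sketch under simplifying assumptions: the function name first_analysis is made up for this example, escaped characters in the stream are not handled, and units without any analysis are passed through unchanged, whereas the gawk script drops them.<br />
```python
import re


def first_analysis(stream):
    """Keep only the first analysis of every ^surface/analysis1/analysis2$
    unit in an lt-proc output stream."""

    def keep_first(match):
        fields = match.group(1).split("/")
        # fields[0] is the surface form; fields[1] is the first analysis.
        if len(fields) > 1:
            return "^" + fields[1] + "$"
        return match.group(0)

    return re.sub(r"\^([^$]*)\$", keep_first, stream)
```
Text between units, such as the spaces separating words, is left untouched.<br />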
Now let's process that with the transfer rule:<br />
<pre><br />
$ echo "gramofoni" | lt-proc sh-en.automorf.bin | \<br />
gawk 'BEGIN{RS="$"; FS="/";}{nf=split($1,COMPONENTS,"^"); for(i = 1; i<nf; i++) printf COMPONENTS[i]; if($2 != "") printf("^%s$",$2);}' | \<br />
apertium-transfer apertium-sh-en.trules-sh-en.xml trules-sh-en.bin sh-en.autobil.bin<br />
</pre><br />
It will output:<br />
<pre><br />
^gramophone<n><pl>$^@<br />
</pre><br />
* 'gramophone' is the target language (side="tl") lemma (lem) at position 1 (pos="1").<br />
* '<n>' is the target language a_nom at position 1.<br />
* '<pl>' is the target language attribute of number (nbr) at position 1.<br />
<br />
Try commenting out one of these clip statements, recompiling and seeing what happens.<br />
<br />
So, now we have the output from the transfer, the only thing that remains is to generate the target-language inflected forms. For this, we use lt-proc, but in generation (-g), not analysis mode.<br />
<pre><br />
$ echo "gramofoni" | lt-proc sh-en.automorf.bin | \<br />
gawk 'BEGIN{RS="$"; FS="/";}{nf=split($1,COMPONENTS,"^"); for(i = 1; i<nf; i++) printf COMPONENTS[i]; if($2 != "") printf("^%s$",$2);}' | \<br />
apertium-transfer apertium-sh-en.trules-sh-en.xml trules-sh-en.bin sh-en.autobil.bin | \<br />
lt-proc -g sh-en.autogen.bin<br />
<br />
gramophones\@<br />
</pre><br />
And that's it. You now have a machine translation system that translates a Serbo-Croatian noun into an English noun. Obviously this isn't very useful, but we'll get on to the more complex stuff soon. Oh, and don't worry about the '@' symbol; I'll explain that soon too.<br />
<br />
Think of a few other words that inflect the same way as gramofon. How about adding those? We don't need to add any paradigms, just the entries in the main section of the monolingual and bilingual dictionaries.<br />
<br />
==Bring on the verbs==<br />
<br />
OK, so we have a system that translates nouns, but that's pretty useless; we want to translate verbs too, and even whole sentences! How about we start with the verb 'to see'? In Serbo-Croatian this is 'videti'. Serbo-Croatian is a null-subject language, which means that it doesn't typically use personal pronouns before the conjugated form of the verb. English is not. So, for example, 'I see' in English would be translated as 'vidim' in Serbo-Croatian.<br />
<br />
* Vidim<br />
* see<p1><sg><br />
* I see<br />
<br />
Note: <p1> denotes first person<br />
<br />
This will be important when we come to write the transfer rule for verbs. Other examples of null-subject languages include Spanish, Romanian and Polish. This also has the effect that while we only need to add the verb in the Serbo-Croatian morphological dictionary, we need to add both the verb and the personal pronouns in the English morphological dictionary. We'll go through both of these.<br />
<br />
The other forms of the verb videti are: vidiš, vidi, vidimo, vidite, and vide; which correspond to: you see (singular), he sees, we see, you see (plural), and they see.<br />
<br />
There are two forms of you see, one is plural and formal singular (vidite) and the other is singular and informal (vidiš).<br />
<br />
We're going to try and translate the sentence: "Vidim gramofoni" into "I see gramophones". In the interests of space, we'll just add enough information to do the translation and will leave filling out the paradigms (adding the other conjugations of the verb) as an exercise to the reader.<br />
<br />
The astute reader will have realised by this point that we can't just translate vidim gramofoni because it is not a grammatically correct sentence in Serbo-Croatian. The correct sentence would be vidim gramofone, as the noun takes the accusative case. We'll have to add that form too. There is no need to add the case information for now, though; we just add it as another option for plural. So, just copy the <e> block for 'i' in the noun paradigm and change the 'i' to 'e' there.<br />
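<br />
After that change, the noun paradigm might look like the sketch below. It assumes the gramofon__n paradigm from the noun section, with its singular and 'i' plural entries; only the last entry is new:<br />
<pre><br />
<pardef n="gramofon__n"><br />
<e><p><l/><r><s n="n"/><s n="sg"/></r></p></e><br />
<e><p><l>i</l><r><s n="n"/><s n="pl"/></r></p></e><br />
<!-- new: 'gramofone', treated for now as just another plural form --><br />
<e><p><l>e</l><r><s n="n"/><s n="pl"/></r></p></e><br />
</pardef><br />
</pre><br />
Both plural entries now share the same analysis, which is harmless when translating from Serbo-Croatian to English. If you later generate Serbo-Croatian from this dictionary, one option is to mark the new entry with r="LR" on the <e> element so that it is only used for analysis.<br />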
<br />
The first thing we need to do is add some more symbols. We need to first add a symbol for 'verb', which we'll call "vblex" (this means lexical verb, as opposed to modal verbs and other types). Verbs have 'person' and 'tense' along with number, so let's add a couple of those as well. We need to translate "I see", so for person we should add "p1", or 'first person', and for tense "pri", or 'present indicative'.<br />
<pre><br />
<sdef n="vblex"/><br />
<sdef n="p1"/><br />
<sdef n="pri"/><br />
</pre><br />
After we've done this, as with the nouns, we add a paradigm for the verb conjugation. The first line will be:<br />
<pre><br />
<pardef n="vid/eti__vblex"><br />
</pre><br />
The '/' is used to demarcate where the stem ends, which is where the endings (the parts between the <l> </l> tags) are added.<br />
<br />
Then the inflection for first person singular:<br />
<pre><br />
<e><br />
<p><br />
<l>im</l><br />
<r>eti<s n="vblex"/><s n="pri"/><s n="p1"/><s n="sg"/></r><br />
</p><br />
</e><br />
</pre><br />
The 'im' denotes the ending (as in 'vidim'). It is necessary to add 'eti' to the <r> section, as this is chopped off by the paradigm definition. The rest is fairly straightforward: 'vblex' is lexical verb, 'pri' is present indicative tense, 'p1' is first person and 'sg' is singular. We can also add the plural, which will be the same except for 'imo' instead of 'im' and 'pl' instead of 'sg'.<br />
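<br />
Put together, the paradigm with both first person forms might look like this sketch, which assumes the 'sg' and 'pl' symbols from the noun section are already defined:<br />
<pre><br />
<pardef n="vid/eti__vblex"><br />
<!-- vidim: first person singular --><br />
<e><br />
<p><br />
<l>im</l><br />
<r>eti<s n="vblex"/><s n="pri"/><s n="p1"/><s n="sg"/></r><br />
</p><br />
</e><br />
<!-- vidimo: first person plural --><br />
<e><br />
<p><br />
<l>imo</l><br />
<r>eti<s n="vblex"/><s n="pri"/><s n="p1"/><s n="pl"/></r><br />
</p><br />
</e><br />
</pardef><br />
</pre><br />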
<br />
After this we need to add a lemma-to-paradigm mapping to the main section:<br />
<pre><br />
<e lm="videti"><i>vid</i><par n="vid/eti__vblex"/></e><br />
</pre><br />
Note: the content of <i> </i> is the root, not the lemma.<br />
<br />
That's the work on the Serbo-Croatian dictionary done for now. Let's compile it, then test it.<br />
<pre><br />
$ lt-comp lr apertium-sh-en.sh.dix sh-en.automorf.bin<br />
main@standard 23 25<br />
$ echo "vidim" | lt-proc sh-en.automorf.bin<br />
^vidim/videti<vblex><pri><p1><sg>$<br />
$ echo "vidimo" | lt-proc sh-en.automorf.bin<br />
^vidimo/videti<vblex><pri><p1><pl>$<br />
</pre><br />
Ok, so now we do the same for the English dictionary (remember to add the same symbol definitions here as you added to the Serbo-Croatian one).<br />
<br />
The paradigm is:<br />
<pre><br />
<pardef n="s/ee__vblex"><br />
</pre><br />
because the past tense is 'saw'. Now, we could add separate entries for first and second person, but they are the same form. In fact, all present tense forms of the verb 'to see' (except third person singular) are 'see'. So instead we make one entry for 'see' and give it only the 'pri' symbol.<br />
<pre><br />
<e><br />
<p><br />
<l>ee</l><br />
<r>ee<s n="vblex"/><s n="pri"/></r><br />
</p><br />
</e><br />
</pre><br />
and as always, an entry in the main section:<br />
<pre><br />
<e lm="see"><i>s</i><par n="s/ee__vblex"/></e><br />
</pre><br />
Then let's save, recompile and test:<br />
<pre><br />
$ lt-comp lr apertium-sh-en.en.dix en-sh.automorf.bin<br />
main@standard 18 19<br />
<br />
$ echo "see" | lt-proc en-sh.automorf.bin<br />
^see/see<vblex><pri>$<br />
</pre><br />
Now for the obligatory entry in the bilingual dictionary:<br />
<pre><br />
<e><p><l>videti<s n="vblex"/></l><r>see<s n="vblex"/></r></p></e><br />
</pre><br />
(again, don't forget to add the sdefs from earlier)<br />
<br />
And recompile:<br />
<pre><br />
$ lt-comp lr apertium-sh-en.sh-en.dix sh-en.autobil.bin<br />
main@standard 18 18<br />
$ lt-comp rl apertium-sh-en.sh-en.dix en-sh.autobil.bin<br />
main@standard 18 18<br />
</pre><br />
Now to test:<br />
<pre><br />
$ echo "vidim" | lt-proc sh-en.automorf.bin | \<br />
gawk 'BEGIN{RS="$"; FS="/";}{nf=split($1,COMPONENTS,"^"); for(i = 1; i<nf; i++) printf COMPONENTS[i]; if($2 != "") printf("^%s$",$2);}' | \<br />
apertium-transfer apertium-sh-en.trules-sh-en.xml trules-sh-en.bin sh-en.autobil.bin <br />
<br />
^see<vblex><pri><p1><sg>$^@<br />
</pre><br />
We get the analysis passed through correctly, but when we try and generate a surface form from this, we get a '#', like below:<br />
<pre><br />
$ echo "vidim" | lt-proc sh-en.automorf.bin | \<br />
gawk 'BEGIN{RS="$"; FS="/";}{nf=split($1,COMPONENTS,"^"); for(i = 1; i<nf; i++) printf COMPONENTS[i]; if($2 != "") printf("^%s$",$2);}' | \<br />
apertium-transfer apertium-sh-en.trules-sh-en.xml trules-sh-en.bin sh-en.autobil.bin | \<br />
lt-proc -g sh-en.autogen.bin<br />
#see\@<br />
</pre><br />
This '#' means that the generator cannot generate the correct surface form because its dictionary does not contain the lexical form it was given. Why is this?<br />
<br />
Basically the analyses don't match: the 'see' in the dictionary is see<vblex><pri>, but the 'see' delivered by the transfer is see<vblex><pri><p1><sg>. The Serbo-Croatian side has more information than the English side requires. You can test this by adding the missing symbols to the English dictionary, and then recompiling, and testing again.<br />
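<br />
For that quick test, the extra entry in the s/ee__vblex paradigm of the English dictionary could look like the following sketch. It simply mirrors the analysis delivered by the transfer, and assumes 'p1' and 'sg' have been added to the English symbol definitions:<br />
<pre><br />
<!-- temporary entry: 'see' with the full first person singular analysis --><br />
<e><br />
<p><br />
<l>ee</l><br />
<r>ee<s n="vblex"/><s n="pri"/><s n="p1"/><s n="sg"/></r><br />
</p><br />
</e><br />
</pre><br />
Every other person and number would need an entry of its own, so this is only useful as a check.<br />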
<br />
However, a more paradigmatic way of taking care of this is by writing a rule. So, we open up the rules file (apertium-sh-en.trules-sh-en.xml in case you forgot).<br />
<br />
We need to add a new category for 'verb'.<br />
<pre><br />
<def-cat n="vrb"><br />
<cat-item tags="vblex.*"/><br />
</def-cat><br />
</pre><br />
We also need to add attributes for tense and for person. We'll make it really simple for now; you can add p2 and p3, but I won't in order to save space.<br />
<pre><br />
<def-attr n="temps"><br />
<attr-item tags="pri"/><br />
</def-attr><br />
<br />
<def-attr n="pers"><br />
<attr-item tags="p1"/><br />
</def-attr><br />
</pre><br />
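If you do want the other persons, the pers attribute could be extended as in this sketch. The p2 and p3 symbols are assumptions at this point: they would also need sdef entries in the dictionaries.<br />
<pre><br />
<def-attr n="pers"><br />
<attr-item tags="p1"/><br />
<!-- not used in this HOWTO; shown only for completeness --><br />
<attr-item tags="p2"/><br />
<attr-item tags="p3"/><br />
</def-attr><br />
</pre><br />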
We should also add an attribute for verbs.<br />
<pre><br />
<def-attr n="a_verb"><br />
<attr-item tags="vblex"/><br />
</def-attr><br />
</pre><br />
Now onto the rule:<br />
<pre><br />
<rule><br />
<pattern><br />
<pattern-item n="vrb"/><br />
</pattern><br />
<action><br />
<out><br />
<lu><br />
<clip pos="1" side="tl" part="lem"/><br />
<clip pos="1" side="tl" part="a_verb"/><br />
<clip pos="1" side="tl" part="temps"/><br />
</lu><br />
</out><br />
</action><br />
</rule><br />
</pre><br />
Remember when you tried commenting out the 'clip' tags in the previous rule example and they disappeared from the transfer? Well, that's pretty much what we're doing here. We take in a verb with a full analysis, but only output a partial analysis (lemma + verb tag + tense tag).<br />
<br />
So now, if we recompile that, we get:<br />
<pre><br />
$ echo "vidim" | lt-proc sh-en.automorf.bin | \<br />
gawk 'BEGIN{RS="$"; FS="/";}{nf=split($1,COMPONENTS,"^"); for(i = 1; i<nf; i++) printf COMPONENTS[i]; if($2 != "") printf("^%s$",$2);}' | \<br />
apertium-transfer apertium-sh-en.trules-sh-en.xml trules-sh-en.bin sh-en.autobil.bin<br />
^see<vblex><pri>$^@<br />
</pre><br />
and:<br />
<pre><br />
$ echo "vidim" | lt-proc sh-en.automorf.bin | \<br />
gawk 'BEGIN{RS="$"; FS="/";}{nf=split($1,COMPONENTS,"^"); for(i = 1; i<nf; i++) printf COMPONENTS[i]; if($2 != "") printf("^%s$",$2);}' | \<br />
apertium-transfer apertium-sh-en.trules-sh-en.xml trules-sh-en.bin sh-en.autobil.bin | \<br />
lt-proc -g sh-en.autogen.bin<br />
see\@<br />
</pre><br />
Try it with 'vidimo' (we see) to see if you get the correct output.<br />
<br />
Now try it with "vidim gramofone":<br />
<pre><br />
$ echo "vidim gramofone" | lt-proc sh-en.automorf.bin | \<br />
gawk 'BEGIN{RS="$"; FS="/";}{nf=split($1,COMPONENTS,"^"); for(i = 1; i<nf; i++) printf COMPONENTS[i]; if($2 != "") printf("^%s$",$2);}' | \<br />
apertium-transfer apertium-sh-en.trules-sh-en.xml trules-sh-en.bin sh-en.autobil.bin | \<br />
lt-proc -g sh-en.autogen.bin<br />
see gramophones\@<br />
</pre><br />
<br />
==But what about personal pronouns?==<br />
<br />
Well, that's great, but we're still missing the personal pronoun that is necessary in English. In order to add it in, we first need to edit the English morphological dictionary.<br />
<br />
As before, the first thing to do is add the necessary symbols:<br />
<pre><br />
<sdef n="prn"/><br />
<sdef n="subj"/><br />
</pre><br />
Of the two symbols, prn is pronoun, and subj is subject (as in the subject of a sentence).<br />
<br />
Because there is no root, or 'lemma' for personal subject pronouns, we just add the pardef as follows:<br />
<pre><br />
<pardef n="prsubj__prn"><br />
<e><br />
<p><br />
<l>I</l><br />
<r>prpers<s n="prn"/><s n="subj"/><s n="p1"/><s n="sg"/></r><br />
</p><br />
</e><br />
</pardef><br />
</pre><br />
With 'prsubj' being 'personal subject'. The rest of them (You, We etc.) are left as an exercise to the reader.<br />
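<br />
As a starting point for that exercise, here is a sketch of the paradigm with a first person plural entry added. It reuses only symbols that have already been defined:<br />
<pre><br />
<pardef n="prsubj__prn"><br />
<!-- I: first person singular --><br />
<e><br />
<p><br />
<l>I</l><br />
<r>prpers<s n="prn"/><s n="subj"/><s n="p1"/><s n="sg"/></r><br />
</p><br />
</e><br />
<!-- we: first person plural --><br />
<e><br />
<p><br />
<l>we</l><br />
<r>prpers<s n="prn"/><s n="subj"/><s n="p1"/><s n="pl"/></r><br />
</p><br />
</e><br />
</pardef><br />
</pre><br />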
<br />
We can add an entry to the main section as follows:<br />
<pre><br />
<e lm="personal subject pronouns"><i/><par n="prsubj__prn"/></e><br />
</pre><br />
So, save, recompile and test, and we should get something like:<br />
<pre><br />
$ echo "I" | lt-proc en-sh.automorf.bin<br />
^I/PRPERS<prn><subj><p1><sg>$<br />
</pre><br />
<br />
(Note: it's in capitals because 'I' is in capitals).<br />
<br />
Now we need to amend the 'verb' rule to output the subject personal pronoun along with the correct verb form.<br />
<br />
First, add a category (this must be getting pretty pedestrian by now):<br />
<pre><br />
<def-cat n="prpers"><br />
<cat-item lemma="prpers" tags="prn.*"/><br />
</def-cat><br />
</pre><br />
Now add the types of pronoun as attributes. We might as well add the 'obj' type while we're at it, although we won't need to use it for now:<br />
<pre><br />
<def-attr n="tipus_prn"><br />
<attr-item tags="prn.subj"/><br />
<attr-item tags="prn.obj"/><br />
</def-attr><br />
</pre><br />
And now to input the rule:<br />
<pre><br />
<rule><br />
<pattern><br />
<pattern-item n="vrb"/><br />
</pattern><br />
<action><br />
<out><br />
<lu><br />
<lit v="prpers"/><br />
<lit-tag v="prn"/><br />
<lit-tag v="subj"/><br />
<clip pos="1" side="tl" part="pers"/><br />
<clip pos="1" side="tl" part="nbr"/><br />
</lu><br />
<b/><br />
<lu><br />
<clip pos="1" side="tl" part="lem"/><br />
<clip pos="1" side="tl" part="a_verb"/><br />
<clip pos="1" side="tl" part="temps"/><br />
</lu><br />
</out><br />
</action><br />
</rule><br />
</pre><br />
This is pretty much the same rule as before, only we made a couple of small changes.<br />
<br />
We needed to output:<br />
<pre><br />
^prpers<prn><subj><p1><sg>$ ^see<vblex><pri>$<br />
</pre><br />
so that the generator could choose the right pronoun and the right form of the verb.<br />
<br />
So, a quick rundown:<br />
<br />
* <lit>, prints a literal string, in this case "prpers"<br />
* <lit-tag>, prints a literal tag. Because we can't get these tags from the verb, we add them ourselves: "prn" for pronoun, and "subj" for subject.<br />
* <b/>, prints a blank, a space.<br />
<br />
Note that we retrieved the person and number for the pronoun, and the tense for the verb, directly from the verb.<br />
<br />
So, now if we recompile and test that again:<br />
<pre><br />
$ echo "vidim gramofone" | lt-proc sh-en.automorf.bin | \<br />
gawk 'BEGIN{RS="$"; FS="/";}{nf=split($1,COMPONENTS,"^"); for(i = 1; i<nf; i++) printf COMPONENTS[i]; if($2 != "") printf("^%s$",$2);}' | \<br />
apertium-transfer apertium-sh-en.trules-sh-en.xml trules-sh-en.bin sh-en.autobil.bin | \<br />
lt-proc -g sh-en.autogen.bin<br />
I see gramophones<br />
</pre><br />
Which, while it isn't exactly prize-winning prose (much like this HOWTO), is a fairly accurate translation.<br />
<br />
==So tell me about the record player (Multiwords)==<br />
While 'gramophone' is an English word, it isn't the best translation. 'Gramophone' is typically used for the very old kind, with a needle instead of a stylus and no amplification. A better translation would be 'record player'. Although this is more than one word, we can treat it as if it were a single word by using multiword (multipalabra) constructions.<br />
<br />
We don't need to touch the Serbo-Croatian dictionary, only the English one and the bilingual one. So open them up.<br />
<br />
The plural of 'record player' is 'record players', so it takes the same paradigm as 'gramophone' (gramophone__n); we just add 's'. So all we need to do is add a new entry to the main section.<br />
<pre><br />
<e lm="record player"><i>record<b/>player</i><par n="gramophone__n"/></e><br />
</pre><br />
The only thing that differs is the use of the <b/> tag, although this isn't entirely new, since we saw it in the rules file.<br />
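<br />
The bilingual dictionary needs a matching change, which is not spelled out here. A sketch, assuming the noun entry follows the same format as the verb entry used for videti and see, and that it replaces the existing gramofon entry:<br />
<pre><br />
<!-- bilingual dictionary: gramofon now maps to the multiword 'record player' --><br />
<e><p><l>gramofon<s n="n"/></l><r>record<b/>player<s n="n"/></r></p></e><br />
</pre><br />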
<br />
So, recompile and test in the orthodox fashion:<br />
<pre><br />
$ echo "vidim gramofone" | lt-proc sh-en.automorf.bin | \<br />
gawk 'BEGIN{RS="$"; FS="/";}{nf=split($1,COMPONENTS,"^"); for(i = 1; i<nf; i++) printf COMPONENTS[i]; if($2 != "") printf("^%s$",$2);}' | \<br />
apertium-transfer apertium-sh-en.trules-sh-en.xml trules-sh-en.bin sh-en.autobil.bin | \<br />
lt-proc -g sh-en.autogen.bin<br />
I see record players<br />
</pre><br />
Perfect. A big benefit of using multiwords is that idiomatic expressions can be translated as whole units. For example, the English phrase "at the moment" would be translated into Serbo-Croatian as "trenutno" ('trenutak' = ''moment'', and 'trenutno' is the adverbial form); it would not be possible to translate it word by word into Serbo-Croatian.<br />
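<br />
A sketch of how that phrase could be entered follows. The 'adv' (adverb) symbol is an assumption, since this HOWTO has not defined it, and it would need an sdef in all three dictionaries. The entries are written without a paradigm because the phrase does not inflect:<br />
<pre><br />
<!-- English monolingual dictionary, main section --><br />
<e lm="at the moment"><p><l>at<b/>the<b/>moment</l><r>at<b/>the<b/>moment<s n="adv"/></r></p></e><br />
<br />
<!-- Serbo-Croatian monolingual dictionary, main section --><br />
<e lm="trenutno"><p><l>trenutno</l><r>trenutno<s n="adv"/></r></p></e><br />
<br />
<!-- bilingual dictionary --><br />
<e><p><l>trenutno<s n="adv"/></l><r>at<b/>the<b/>moment<s n="adv"/></r></p></e><br />
</pre><br />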
<br />
==Dealing with minor variation==<br />
<br />
Serbo-Croatian typically has a few ways of writing each word because of dialectal variation. It has a cool phonetic writing system so you write how you speak. For example, people speaking in Ijekavian would say "rječnik", while someone speaking Ekavian would say "rečnik", which reflects the differences in pronunciation of the proto-Slavic vowel ''yat''.<br />
<br />
===Analysis===<br />
<br />
There should be a fairly easy way of dealing with this, and there is, using paradigms again. Paradigms aren't only used for adding grammatical symbols; they can also be used to replace any character or symbol with another. For example, here is a paradigm for accepting both "e" and "je" in the analysis. The paradigm should, as with the others, go into the monolingual dictionary for Serbo-Croatian.<br />
<br />
<pre><br />
<pardef n="e_je__yat"><br />
<e><br />
<p><br />
<l>e</l><br />
<r>e</r><br />
</p><br />
</e><br />
<e><br />
<p><br />
<l>je</l><br />
<r>e</r><br />
</p><br />
</e><br />
</pardef><br />
</pre><br />
<br />
Then in the "main section":<br />
<br />
<pre><br />
<e lm="rečnik"><i>r</i><par n="e_je__yat"/><i>čni</i><par n="rečni/k__n"/></e><br />
</pre><br />
<br />
This only allows us to analyse both forms however... more work is necessary if we want to generate both forms.<br />
<br />
===Generation===<br />
<br />
==See also==<br />
<br />
*[[Building dictionaries]]<br />
<br />
[[Category:Documentation]]<br />
[[Category:HOWTO]]<br />
[[Category:Esperanto]]</div>Objectiveseahttps://wiki.apertium.org/w/index.php?title=Lttoolbox&diff=30296Lttoolbox2011-12-20T22:53:42Z<p>Objectivesea: Several stylistic improvements to English text</p>
<hr />
<div>{{TOCD}}<br />
'''lttoolbox''' is a toolbox for lexical processing, [[morphological analysis]] and generation of words. ''Analysis'' is the process of splitting a word (e.g. cats) into its lemma 'cat' and the grammatical information <code><n><pl></code>. ''Generation'' is the opposite process.<br />
<br />
The package is split into three programs, <code>lt-comp</code>, the compiler, <code>lt-proc</code>, the processor, and <code>lt-expand</code>, which generates all possible mappings between [[surface form]]s and [[lexical form]]s in the dictionary.<br />
<br />
==Creation==<br />
{{main|Monodix basics}}<br />
<br />
Morphological analyser specification files, or morphological dictionaries may be found in all of our [[language pair]] packages, from the [[incubator]], or you may elect to create your own (more instructions at the page ''[[Monodix basics]]''). You can also check out our [[list of dictionaries]], which has statistics on names, locations and number of entries of each of the dictionaries.<br />
<br />
==Usage==<br />
<br />
===Compilation===<br />
{{see-also|Compiling dictionaries}}<br />
Compilation into the binary format is achieved by means of the <code>lt-comp</code> program. You can compile a given <code>.dix</code> from left to right (<code>LR</code>), or from right to left (<code>RL</code>). Compiling <code>LR</code> usually creates an ''analyser'', compiling <code>RL</code> usually creates a ''generator''.<ref>In all current linguistic packages, the left-to-right direction of compilation is ''analysis'', whereas the right-to-left direction is ''generation''. This is not, however, a software restriction.</ref><br />
<br />
;Example<br />
<br />
Compile the <code>apertium-es-ca.ca.dix</code> dictionary in a left-to-right manner into the binary <code>ca.bin</code>.<br />
<br />
<pre><br />
$ lt-comp lr apertium-es-ca.ca.dix ca.bin<br />
</pre><br />
<br />
===Processing===<br />
<br />
There are two main modes of use for the processor (<code>lt-proc</code>), analysis (which is the default mode) and generation. Analysis converts surface forms into the set of possible lexical forms, while generation converts a lexical form into the corresponding surface form.<br />
<br />
====Analysis====<br />
<br />
After compiling the <code>apertium-es-ca.ca.dix</code> file left-to-right into <code>ca.morf.bin</code>, we can analyse Catalan:<br />
<br />
;Example<br />
<br />
<pre><br />
$ echo "prova" | lt-proc ca.morf.bin<br />
<br />
^prova/prova<n><f><sg>/provar<vblex><pri><p3><sg>/provar<vblex><imp><p2><sg>$<br />
</pre><br />
<br />
====Generation====<br />
<br />
And compiling it right-to-left, we can generate:<br />
<br />
;Example<br />
<br />
<pre><br />
$ echo "^prova<n><f><pl>$" | lt-proc -g ca.gen.bin<br />
<br />
proves<br />
</pre><br />
<br />
===Expansion===<br />
<br />
Sometimes you want to be able to see the complete output of the dictionary &mdash; i.e., all of the mappings between lexical and surface forms. For this you can use the <code>lt-expand</code> tool. This output is often useful in finding bugs in the assignment of paradigms, etc.<br />
<br />
;Example<br />
<br />
Here are the first ten lines that are produced as output from the command to expand the Catalan dictionary in the <code>apertium-es-ca</code> pair. (At last count, the total length of the output was over 2.3 million lines.)<br />
<br />
<pre><br />
$ lt-expand apertium-es-ca.ca.dix <br />
<br />
abdominals:abdominal<adj><mf><pl><br />
abdominal:abdominal<adj><mf><sg><br />
absents:absent<adj><mf><pl><br />
absent:absent<adj><mf><sg><br />
absolutes:absolut<adj><f><pl><br />
absoluta:absolut<adj><f><sg><br />
absoluts:absolut<adj><m><pl><br />
absolut:absolut<adj><m><sg><br />
abstractes:abstracte<adj><mf><pl><br />
abstracta:abstracte<adj><f><sg><br />
</pre><br />
<br />
;Note<br />
<br />
You cannot run lt-expand directly on a <code>.dix.xml</code> file. The <code>.dix</code> files in (for example) the <code>apertium-en-af</code> pair have their symbols in a separate file. You need to first run <code>xmllint</code>:<br />
<br />
<pre><br />
$ xmllint --xinclude apertium-en-af.af.dix.xml > apertium-en-af.af.dix<br />
</pre><br />
<br />
Then run <code>lt-expand</code> on the <code>apertium-en-af.af.dix</code> file.<br />
<br />
==Troubleshooting==<br />
<br />
;Empty left side<br />
<br />
If you get a message like:<br />
<br />
<pre><br />
Error: Invalid dictionary (hint: the left side of an entry is empty)<br />
</pre><br />
<br />
Try searching for empty left sides in your dictionary by using <code>lt-expand</code> and <code>grep</code>. For example, in the Icelandic dictionary,<br />
<br />
<pre><br />
$ lt-expand apertium-fo-is.is.dix | grep ^:<br />
:kunna<vblex><imp><p2><sg><br />
:kunna<vblex><imp><p1><pl><br />
:kunna<vblex><imp><p2><pl><br />
</pre><br />
<br />
The empty left side will look something like:<br />
<br />
<pre><br />
<e><br />
<p><br />
<l></l><br />
<r>kunna<s n="vblex"/><s n="imp"/><s n="p2"/><s n="pl"/></r><br />
</p><br />
</e><br />
</pre><br />
<br />
It is not possible to have an empty left side in a paradigm if you have no invariant (<code>&lt;i&gt;</code>) section in the main section entry, e.g.<br />
<br />
<pre><br />
<e lm="kunna"><i></i><par n="/kunna__vblex"/></e><br />
</pre><br />
<br />
This means you should look for the "kunna" verb; where the left side is empty, you should either put something there or add something to the invariant section.<br />
<br />
==Speed==<br />
<br />
<pre><br />
$ yes word | head -n 1000000 > /tmp/foo<br />
<br />
$ head /tmp/foo<br />
word<br />
word<br />
word<br />
...<br />
<br />
$ wc -l /tmp/foo<br />
1000000 /tmp/foo<br />
<br />
$ time cat /tmp/foo | lt-proc en-ca.automorf.bin >/dev/null<br />
<br />
real 0m17.606s<br />
user 0m17.281s<br />
sys 0m0.036s<br />
<br />
58,823 words / second<br />
</pre><br />
<br />
==Using as a library==<br />
See [[Lttoolbox API]] for how to analyse and generate words with lttoolbox from C++ or Python.<br />
<br />
==Wishlist==<br />
<br />
* Being able to have multichar symbols/tags without '<' and '>'<br />
<br />
==See also==<br />
<br />
* [[Monodix basics]]<br />
* [[Using an lttoolbox dictionary]]<br />
* [[lttoolbox and lexc]]<br />
* [[Lttoolbox-java]]<br />
* [[Basic lttoolbox example]]<br />
<br />
==Notes==<br />
<references/><br />
<br />
<br />
[[Category:Lttoolbox|*]]<br />
[[Category:Morphological analysers]]</div>Objectiveseahttps://wiki.apertium.org/w/index.php?title=Apertium_New_Language_Pair_HOWTO&diff=30287Apertium New Language Pair HOWTO2011-12-20T22:17:27Z<p>Objectivesea: /* Introduction */ deleted extra period</p>
<hr />
<div>{{TOCD}}<br />
Apertium New Language Pair HOWTO<br />
<br />
This HOWTO document will describe how to start a new language pair for the Apertium machine translation system from scratch.<br />
<br />
It does not assume any knowledge of linguistics or machine translation above the level of being able to distinguish nouns from verbs (and prepositions, etc.).<br />
<br />
==Introduction==<br />
<br />
Apertium is, as you've probably realised by now, a machine translation system. Well, not quite, it's a machine translation platform. It provides an engine and toolbox that allow you to build your own machine translation systems. The only thing you need to do is write the data. The data consists, on a basic level, of three dictionaries and a few rules (to deal with word re-ordering and other grammatical stuff).<br />
<br />
For a more detailed introduction into how it all works, there are some excellent papers on the [[Publications]] page.<br />
<br />
==You will need==<br />
<br />
* [[lttoolbox]] (>= 3.0.0)<br />
* libxml utils (xmllint etc.)<br />
* apertium (>= 3.0.0)<br />
* a text editor (or a specialised XML editor if you prefer)<br />
<br />
This document will not describe how to install these packages. For more information, please see the documentation section of the Apertium website.<br />
<br />
==What does a language pair consist of?==<br />
<br />
Apertium is a shallow-transfer type machine translation system. Thus, it basically works on dictionaries and shallow transfer rules. In operation, shallow transfer is distinguished from deep transfer in that it doesn't do full syntactic parsing; the rules are typically operations on groups of lexical units, rather than operations on parse trees. At a basic level, there are three main dictionaries:<br />
# The morphological dictionary for language xx: this contains the rules of how words in language xx are inflected. In our example this will be called: <code>apertium-sh-en.sh.dix</code><br />
# The morphological dictionary for language yy: this contains the rules of how words in language yy are inflected. In our example this will be called: <code>apertium-sh-en.en.dix</code><br />
# Bilingual dictionary: contains correspondences between words and symbols in the two languages. In our example this will be called: <code>apertium-sh-en.sh-en.dix</code><br />
<br />
In a translation pair, both languages can be either source or target for translation; these are relative terms.<br />
<br />
There are also two files for transfer rules. These are the rules that govern how words are re-ordered in sentences, e.g. chat noir -> cat black -> black cat. They also govern agreement of gender, number, etc. The rules can also be used to insert or delete lexical items, as will be described later. These files are:<br />
<br />
* language xx to language yy transfer rules: this file contains rules for how language xx will be changed into language yy. In our example this will be: <code>apertium-sh-en.sh-en.t1x</code><br />
* language yy to language xx transfer rules: this file contains rules for how language yy will be changed into language xx. In our example this will be: <code>apertium-sh-en.en-sh.t1x</code><br />
<br />
Many of the language pairs currently available have other files, but we won't cover them here. These files are the only ones required to generate a functional system.<br />
<br />
==Language pair==<br />
<br />
As you may have gathered from the file names, this HOWTO will use the example of translating Serbo-Croatian to English to explain how to create a basic system. This is not an ideal pair, since the system works better for more closely related languages. This shouldn't present a problem for the simple examples given here.<br />
<br />
==A brief note on terms==<br />
<br />
There are a number of terms that will need to be understood before we continue.<br />
<br />
The first is ''lemma''. A lemma is the citation form of a word. It is the word stripped of any grammatical information. For example, the lemma of the word cats is ''cat''. In English nouns this will typically be the singular form of the word in question. For verbs, the lemma is the infinitive stripped of to, e.g. the lemma of ''was'' would be ''be''.<br />
<br />
The second is ''symbol''. In the context of the Apertium system, symbol refers to a grammatical label. The word cats is a plural noun, therefore it will have the noun symbol and the plural symbol. In the input and output of Apertium modules these are typically given between angle brackets, as follows:<br />
<br />
* <code><n></code>; for noun.<br />
* <code><pl></code>; for plural.<br />
<br />
Other examples of symbols are <sg>; singular, <p1> first person, <pri> present indicative, etc. When written in angle brackets, the symbols may also be referred to as tags. It is worth noting that in many of the currently available language pairs the symbol definitions are acronyms or contractions of words in Catalan. For example, vbhaver — from vb (verb) and haver ("to have" in Catalan). Symbols are defined in <sdef> tags and used in <nowiki><s></nowiki> tags.<br />
<br />
The third word is ''paradigm''. In the context of the Apertium system, paradigm refers to an example of how a particular group of words inflect. In the morphological dictionary, lemmas (see above) are linked to paradigms that allow us to describe how a given lemma inflects without having to write out all of the endings.<br />
<br />
An example of the utility of this is, if we wanted to store the two adjectives ''happy'' and ''lazy'', instead of storing two lots of the same thing:<br />
<br />
* happy, happ (y, ier, iest)<br />
* lazy, laz (y, ier, iest)<br />
<br />
We can simply store one, and then say "lazy, inflects like happy", or indeed "shy inflects like happy", "naughty inflects like happy", "friendly inflects like happy", etc. In this example, happy would be the paradigm, the model for how the others inflect. The precise description of how this is defined will be explained shortly. Paradigms are defined in <pardef> tags, and used in <par> tags.<br />
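<br />
As a preview of that description, the sketch below shows roughly how the adjective example could be written in a dictionary. The symbols 'adj', 'comp' and 'sup' (adjective, comparative, superlative) and the paradigm name are assumptions made for illustration; they are not defined anywhere in this HOWTO:<br />
<pre><br />
<!-- one paradigm describes the endings once --><br />
<pardef n="happ/y__adj"><br />
<e><p><l>y</l><r>y<s n="adj"/></r></p></e><br />
<e><p><l>ier</l><r>y<s n="adj"/><s n="comp"/></r></p></e><br />
<e><p><l>iest</l><r>y<s n="adj"/><s n="sup"/></r></p></e><br />
</pardef><br />
<br />
<!-- each adjective then only needs a one-line entry pointing at it --><br />
<e lm="happy"><i>happ</i><par n="happ/y__adj"/></e><br />
<e lm="lazy"><i>laz</i><par n="happ/y__adj"/></e><br />
</pre><br />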
<br />
==Getting started==<br />
<!-- Ur yezh indezeuropek eo ar brezhoneg --><br />
<br />
===Monolingual dictionaries===<br />
{{see-also|List of dictionaries|Incubator}}<br />
Let's start by making our first source language dictionary. The dictionary is an XML file. Fire up your text editor and type the following:<br />
<pre><br />
<?xml version="1.0" encoding="UTF-8"?><br />
<dictionary><br />
<br />
</dictionary><br />
</pre><br />
So far, the file only declares that we want to start a dictionary. In order for it to be useful, we need to add some more entries. The first is an alphabet, which defines the set of letters that may be used in the dictionary. For Serbo-Croatian, it will look something like the following, containing all the letters of the Serbo-Croatian alphabet:<br />
<pre><br />
<alphabet>ABCČĆDDžĐEFGHIJKLLjMNNjOPRSŠTUVZŽabcčćddžđefghijklljmnnjoprsštuvzž</alphabet><br />
</pre><br />
<br />
Place the alphabet below the <dictionary> tag.<br />
<br />
Next we need to define some symbols. Let's start off with the simple stuff, noun (n) in singular (sg) and plural (pl).<br />
<pre><br />
<sdefs><br />
<sdef n="n"/><br />
<sdef n="sg"/><br />
<sdef n="pl"/><br />
</sdefs><br />
</pre><br />
The symbol names do not have to be so short; in fact, they could just be written out in full, but as you'll be typing them a lot, it makes sense to abbreviate.<br />
<br />
Unfortunately, it isn't quite so simple. Nouns in Serbo-Croatian inflect for more than just number, they are also inflected for case, and have a gender. However, we'll assume for the purposes of this example that the noun is masculine and in the nominative case (a full example may be found at the end of this document).<br />
<br />
The next thing is to define a section for the paradigms,<br />
<pre><br />
<pardefs><br />
<br />
</pardefs><br />
</pre><br />
and a dictionary section:<br />
<pre><br />
<section id="main" type="standard"><br />
<br />
</section><br />
</pre><br />
There are two types of sections, the first is a standard section, that contains words, enclitics, etc. The second type is an [[inconditional section]] which typically contains punctuation, and so forth. We don't have an inconditional section here.<br />
<br />
So, our file should now look something like:<br />
<pre><br />
<?xml version="1.0" encoding="UTF-8"?><br />
<dictionary><br />
<alphabet>ABCČĆDDžĐEFGHIJKLLjMNNjOPRSŠTUVZŽabcčćddžđefghijklljmnnjoprsštuvzž</alphabet><br />
<sdefs><br />
<sdef n="n"/><br />
<sdef n="sg"/><br />
<sdef n="pl"/><br />
</sdefs><br />
<pardefs><br />
<br />
</pardefs><br />
<section id="main" type="standard"><br />
<br />
</section><br />
</dictionary><br />
</pre><br />
Now we've got the skeleton in place, we can start by adding a noun. The noun in question will be 'gramofon' (which means 'gramophone' or 'record player').<br />
<br />
The first thing we need to do, as we have no prior paradigms, is to define a paradigm.<br />
<br />
Remember, we're assuming masculine gender and nominative case. The singular form of the noun is 'gramofon', and the plural is 'gramofoni'. So:<br />
<pre><br />
<pardef n="gramofon__n"><br />
<e><p><l/><r><s n="n"/><s n="sg"/></r></p></e><br />
<e><p><l>i</l><r><s n="n"/><s n="pl"/></r></p></e><br />
</pardef><br />
</pre><br />
Note: the '<l/>' (equivalent to <l></l>) denotes that there is no extra material to be added to the stem for the singular.<br />
<br />
This may seem like a rather verbose way of describing it, but there are reasons for this and it quickly becomes second nature. You're probably wondering what the <e>, <p>, <l> and <r> stand for. Well,<br />
<br />
* e, is for entry.<br />
* p, is for pair.<br />
* l, is for left.<br />
* r, is for right.<br />
<br />
Why left and right? Well, the morphological dictionaries will later be compiled into finite state machines. Compiling them left to right produces analyses from words, and from right to left produces words from analyses. For example:<br />
<pre><br />
* gramofoni (left to right) gramofon<n><pl> (analysis)<br />
* gramofon<n><pl> (right to left) gramofoni (generation)<br />
</pre><br />
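As a rough illustration of the two directions, the Python sketch below expands the gramofon__n paradigm into (surface form, analysis) pairs and builds one lookup table per direction. It is a toy model using plain dictionaries, not how lt-comp actually builds its finite state machines; the names PARADIGM, expand, analyser and generator are made up for this example.<br />

```python
# Toy model of a paradigm: each entry pairs a left side (surface ending)
# with a right side (analysis tags), as in <e><p><l>..</l><r>..</r></p></e>.
PARADIGM = [
    ("", "<n><sg>"),   # <l/>     : nothing added for the singular
    ("i", "<n><pl>"),  # <l>i</l> : plural ending
]


def expand(root, paradigm):
    """Return (surface form, analysis) pairs for one root."""
    return [(root + left, root + right) for left, right in paradigm]


pairs = expand("gramofon", PARADIGM)

# Reading left to right gives an analyser: surface form -> analysis.
analyser = {surface: analysis for surface, analysis in pairs}
# Reading right to left gives a generator: analysis -> surface form.
generator = {analysis: surface for surface, analysis in pairs}

print(analyser["gramofoni"])
print(generator["gramofon<n><pl>"])
```

The same list of pairs serves both directions, which is why one dictionary file can be compiled into both an analyser and a generator.<br />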
Now we've defined a paradigm, we need to link it to its lemma, gramofon. We put this in the section that we've defined.<br />
<br />
The entry to put in the <section> will look like:<br />
<pre><br />
<e lm="gramofon"><i>gramofon</i><par n="gramofon__n"/></e><br />
</pre><br />
A quick run down on the abbreviations:<br />
<br />
* lm, is for lemma.<br />
* i, is for identity (the left and the right are the same).<br />
* par, is for paradigm.<br />
<br />
This entry states the lemma of the word, gramofon, the root, gramofon and the paradigm with which it inflects gramofon__n. The difference between the lemma and the root is that the lemma is the citation form of the word, while the root is the substring of the lemma to which suffixes are added. This will become clearer later when we show an entry where the two are different.<br />
<br />
We're now ready to test the dictionary. Save it, and then return to the shell. We first need to compile it (with lt-comp), then we can test it (with lt-proc). For those who are new to cygwin just take note that you need to save the dictionary file inside the home folder (for example C:\Apertium\home\Username\filename_of_dictionary). Otherwise you will not be able to compile.<br />
<br />
<pre><br />
$ lt-comp lr apertium-sh-en.sh.dix sh-en.automorf.bin<br />
</pre><br />
Should produce the output:<br />
<pre><br />
main@standard 12 12<br />
</pre><br />
As we are compiling it left to right, we're producing an analyser. Let's make a generator too.<br />
<pre><br />
$ lt-comp rl apertium-sh-en.sh.dix sh-en.autogen.bin<br />
</pre><br />
At this stage, the command should produce the same output.<br />
<br />
We can now test these. Run lt-proc on the analyser.<br />
<pre><br />
$ lt-proc sh-en.automorf.bin<br />
</pre><br />
Now try it out, type in gramofoni (gramophones), and see the output:<br />
<pre><br />
^gramofoni/gramofon<n><pl>$<br />
</pre><br />
Now, for the English dictionary, do the same thing, but substitute the English word gramophone for gramofon, and change the plural inflection. What if you want to use the more correct word 'record player'? Well, we'll explain how to do that later.<br />
<br />
You should now have two files in the directory:<br />
<br />
* apertium-sh-en.sh.dix which contains a (very) basic Serbo-Croatian morphological dictionary, and<br />
* apertium-sh-en.en.dix which contains a (very) basic English morphological dictionary.<br />
<br />
===Bilingual dictionary===<br />
<br />
So we now have two morphological dictionaries, next thing to make is the bilingual dictionary. This describes mappings between words. All dictionaries use the same format (which is specified in the DTD, dix.dtd).<br />
<br />
Create a new file, apertium-sh-en.sh-en.dix and add the basic skeleton:<br />
<pre><br />
<?xml version="1.0" encoding="UTF-8"?><br />
<dictionary><br />
<alphabet/><br />
<sdefs><br />
<sdef n="n"/><br />
<sdef n="sg"/><br />
<sdef n="pl"/><br />
</sdefs><br />
<br />
<section id="main" type="standard"><br />
<br />
</section><br />
</dictionary><br />
</pre><br />
Now we need to add an entry to translate between the two words. Something like:<br />
<pre><br />
<e><p><l>gramofon<s n="n"/></l><r>gramophone<s n="n"/></r></p></e><br />
</pre><br />
Because there are a lot of these entries, they're typically written on one line to facilitate easier reading of the file. Again with the 'l' and 'r' right? Well, we compile it left to right to produce the Serbo-Croatian → English dictionary, and right to left to produce the English → Serbo-Croatian dictionary.<br />
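To make the two compilation directions concrete, here is a small Python sketch of a bilingual lookup. It assumes, as the transfer output later in this HOWTO suggests, that only the lemma and first tag are swapped and any remaining tags are carried over unchanged. The names ENTRIES, sh_to_en, en_to_sh and translate are hypothetical, and returning None for a failed lookup is a simplification of this sketch.<br />

```python
# One bilingual entry: (left side, right side), lemma plus first tag only.
ENTRIES = [("gramofon<n>", "gramophone<n>")]

# Left to right: Serbo-Croatian -> English.
sh_to_en = {left: right for left, right in ENTRIES}
# Right to left: English -> Serbo-Croatian.
en_to_sh = {right: left for left, right in ENTRIES}


def translate(lexical_form, table):
    """Swap the matched prefix and keep the remaining tags as they are."""
    for source, target in table.items():
        if lexical_form.startswith(source):
            return target + lexical_form[len(source):]
    return None  # simplification: no entry found


print(translate("gramofon<n><pl>", sh_to_en))
print(translate("gramophone<n><sg>", en_to_sh))
```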
<br />
So, once this is done, run the following commands:<br />
<pre><br />
$ lt-comp lr apertium-sh-en.sh.dix sh-en.automorf.bin<br />
$ lt-comp rl apertium-sh-en.en.dix sh-en.autogen.bin<br />
<br />
$ lt-comp lr apertium-sh-en.en.dix en-sh.automorf.bin<br />
$ lt-comp rl apertium-sh-en.sh.dix en-sh.autogen.bin<br />
<br />
$ lt-comp lr apertium-sh-en.sh-en.dix sh-en.autobil.bin<br />
$ lt-comp rl apertium-sh-en.sh-en.dix en-sh.autobil.bin<br />
</pre><br />
These commands generate the morphological analysers (automorf), the morphological generators (autogen) and the word lookups (autobil); the 'bil' is for "bilingual".<br />
<br />
===Transfer rules===<br />
<br />
So, now we have two morphological dictionaries, and a bilingual dictionary. All that we need now is a transfer rule for nouns. Transfer rule files have their own DTD (transfer.dtd) which can be found in the Apertium package. If you need to implement a rule it is often a good idea to look in the rule files of other language pairs first. Many rules can be recycled/reused between languages. For example the one described below would be useful for any null-subject language.<br />
<br />
Start out like all the others with a basic skeleton ( apertium-sh-en.sh-en.t1x ) :<br />
<pre><br />
<?xml version="1.0" encoding="UTF-8"?><br />
<transfer><br />
<br />
</transfer><br />
</pre><br />
At the moment, because we're ignoring case, we just need to make a rule that takes the grammatical symbols input and outputs them again.<br />
<br />
We first need to define categories and attributes. Categories and attributes both allow us to group grammatical symbols. Categories allow us to group symbols for the purposes of matching (for example 'n.*' is all nouns). Attributes allow us to group a set of symbols that can be chosen from (for example, 'sg' and 'pl' may be grouped as an attribute 'number').<br />
<br />
Let's add the necessary sections:<br />
<pre><br />
<section-def-cats><br />
<br />
</section-def-cats><br />
<section-def-attrs><br />
<br />
</section-def-attrs><br />
</pre><br />
As we're only inflecting nouns in singular and plural, we need to add a category for nouns and an attribute for number. Something like the following will suffice:<br />
<br />
Into section-def-cats add:<br />
<pre><br />
<def-cat n="nom"><br />
<cat-item tags="n.*"/><br />
</def-cat><br />
</pre><br />
This catches all nouns (lemmas followed by <n> then anything) and refers to them as "nom" (we'll see how that's used later).<br />
<br />
Into the section section-def-attrs, add:<br />
<pre><br />
<def-attr n="nbr"><br />
<attr-item tags="sg"/><br />
<attr-item tags="pl"/><br />
</def-attr><br />
</pre><br />
and then<br />
<pre><br />
<def-attr n="a_nom"><br />
<attr-item tags="n"/><br />
</def-attr><br />
</pre><br />
The first defines the attribute nbr (number), which can be either singular (sg) or plural (pl).<br />
<br />
The second defines the attribute a_nom (attribute noun).<br />
<br />
Next we need to add a section for global variables:<br />
<pre><br />
<section-def-vars><br />
<br />
</section-def-vars><br />
</pre><br />
These variables are used to store or transfer attributes between rules. We need only one for now,<br />
<pre><br />
<def-var n="number"/><br />
</pre><br />
Finally, we need to add a rule, to take in the noun and then output it in the correct form. We'll need a rules section...<br />
<pre><br />
<section-rules><br />
<br />
</section-rules><br />
</pre><br />
Changing the pace from the previous examples, I'll just paste this rule, then go through it, rather than the other way round.<br />
<pre><br />
<rule><br />
<pattern><br />
<pattern-item n="nom"/><br />
</pattern><br />
<action><br />
<out><br />
<lu><br />
<clip pos="1" side="tl" part="lem"/><br />
<clip pos="1" side="tl" part="a_nom"/><br />
<clip pos="1" side="tl" part="nbr"/><br />
</lu><br />
</out><br />
</action><br />
</rule><br />
</pre><br />
<br />
The first tag is obvious: it defines a rule. The second tag, pattern, basically says: "apply this rule, if this pattern is found". In this example the pattern consists of a single noun (defined by the category item nom). Note that patterns are matched longest-match first. So, say you have three rules: the first catches "<prn><vblex><n>", the second catches "<prn><vblex>" and the third catches "<n>". Given input that matches all three, the rule executed would be the first one.<br />
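The longest-match idea can be sketched in a few lines of Python. This is only an illustration of the selection principle, with words reduced to their category names; the rule names and the pick_rule function are invented for the example and say nothing about how apertium-transfer is implemented.<br />

```python
# Three rules, each described by the sequence of categories it matches.
RULES = {
    "rule 1": ["prn", "vblex", "n"],
    "rule 2": ["prn", "vblex"],
    "rule 3": ["n"],
}


def pick_rule(words, start=0):
    """Return the rule whose pattern covers the most words from `start`."""
    best = None
    for name, pattern in RULES.items():
        window = words[start:start + len(pattern)]
        if window == pattern:
            if best is None or len(pattern) > len(RULES[best]):
                best = name
    return best


print(pick_rule(["prn", "vblex", "n"]))
print(pick_rule(["prn", "vblex"]))
```

With all three words present, rules 1 and 2 both match at the first word, and the longer pattern wins.<br />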
<br />
For each pattern, there is an associated action, which produces an associated output, out. The output, is a lexical unit (lu).<br />
<br />
The clip tag allows a user to select and manipulate attributes and parts of the source language (side="sl"), or target language (side="tl") lexical item.<br />
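To get a feel for what the three clip statements in the noun rule do, here is a toy Python version. It assumes a lexical form is written as a lemma followed by tags in angle brackets, and that an attribute which is not present contributes nothing to the output. ATTRS mirrors the def-attr definitions from this HOWTO; clip and noun_rule are hypothetical helper names, not part of Apertium.<br />

```python
import re

# Attributes from <section-def-attrs>: a name and the tags it can take.
ATTRS = {
    "a_nom": ["n"],
    "nbr": ["sg", "pl"],
}


def clip(lexical_form, part):
    """Return the lemma, or the first tag that belongs to the attribute."""
    if part == "lem":
        return lexical_form.split("<", 1)[0]
    for tag in re.findall(r"<([^>]+)>", lexical_form):
        if tag in ATTRS[part]:
            return "<" + tag + ">"
    return ""  # assumption: an absent attribute adds nothing


def noun_rule(tl_form):
    """Output one lexical unit built from lemma, a_nom and nbr."""
    parts = [clip(tl_form, "lem"), clip(tl_form, "a_nom"), clip(tl_form, "nbr")]
    return "^" + "".join(parts) + "$"


print(noun_rule("gramophone<n><pl>"))
```

Dropping one of the three entries from the parts list is the sketch's equivalent of commenting out a clip statement in the rule.<br />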
<br />
Let's compile it and test it. Transfer rules are compiled with:<br />
<pre><br />
$ apertium-preprocess-transfer apertium-sh-en.sh-en.t1x sh-en.t1x.bin<br />
</pre><br />
Which will generate a <code>sh-en.t1x.bin</code> file.<br />
<br />
Now we're ready to test our machine translation system. There is one crucial part missing, the part-of-speech (PoS) tagger, but that will be explained shortly. In the meantime we can test it as is:<br />
<br />
First, let's analyse a word, gramofoni:<br />
<pre><br />
$ echo "gramofoni" | lt-proc sh-en.automorf.bin <br />
^gramofoni/gramofon<n><pl>$<br />
</pre><br />
Now, normally here the POS tagger would choose the right version based on the part of speech, but we don't have a POS tagger yet, so we can use this little gawk script (thanks to Sergio) that will just output the first item retrieved.<br />
<pre><br />
$ echo "gramofoni" | lt-proc sh-en.automorf.bin | \<br />
gawk 'BEGIN{RS="$"; FS="/";}{nf=split($1,COMPONENTS,"^"); for(i = 1; i<nf; i++) printf COMPONENTS[i]; if($2 != "") printf("^%s$",$2);}'<br />
^gramofon<n><pl>$<br />
</pre><br />
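If gawk is not your thing, roughly the same selection can be written in Python. This sketch assumes the analyser output has the shape ^surface/analysis1/analysis2$ with no escaped '/', '^' or '$' characters inside a unit; UNIT and first_analysis are names invented for the example. Unlike the gawk script it leaves a trailing newline in place.<br />

```python
import re

# '^surface/analysis1/analysis2$' becomes '^analysis1$'.
# Text between lexical units (spaces, for instance) is left untouched.
UNIT = re.compile(r"\^[^/$]*/([^/$]*)(?:/[^$]*)?\$")


def first_analysis(stream):
    """Keep only the first analysis of every lexical unit."""
    return UNIT.sub(lambda m: "^" + m.group(1) + "$", stream)


print(first_analysis("^gramofoni/gramofon<n><pl>$"))
```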
Now let's process that with the transfer rule:<br />
<pre><br />
$ echo "gramofoni" | lt-proc sh-en.automorf.bin | \<br />
gawk 'BEGIN{RS="$"; FS="/";}{nf=split($1,COMPONENTS,"^"); for(i = 1; i<nf; i++) printf COMPONENTS[i]; if($2 != "") printf("^%s$",$2);}' | \<br />
apertium-transfer apertium-sh-en.sh-en.t1x sh-en.t1x.bin sh-en.autobil.bin<br />
</pre><br />
It will output:<br />
<pre><br />
^gramophone<n><pl>$^@<br />
</pre><br />
* 'gramophone' is the target language (side="tl") lemma (lem) at position 1 (pos="1").<br />
* '<n>' is the target language a_nom at position 1.<br />
* '<pl>' is the target language attribute of number (nbr) at position 1.<br />
<br />
Try commenting out one of these clip statements, recompiling and seeing what happens.<br />
<br />
So, now we have the output from the transfer, the only thing that remains is to generate the target-language inflected forms. For this, we use lt-proc, but in generation (-g), not analysis mode.<br />
<pre><br />
$ echo "gramofoni" | lt-proc sh-en.automorf.bin | \<br />
gawk 'BEGIN{RS="$"; FS="/";}{nf=split($1,COMPONENTS,"^"); for(i = 1; i<nf; i++) printf COMPONENTS[i]; if($2 != "") printf("^%s$",$2);}' | \<br />
apertium-transfer apertium-sh-en.sh-en.t1x sh-en.t1x.bin sh-en.autobil.bin | \<br />
lt-proc -g sh-en.autogen.bin<br />
<br />
gramophones\@<br />
</pre><br />
And c'est ça. You now have a machine translation system that translates a Serbo-Croatian noun into an English noun. Obviously this isn't very useful, but we'll get onto the more complex stuff soon. Oh, and don't worry about the '@' symbol, I'll explain that soon too.<br />
<br />
Think of a few other words that inflect the same as gramofon. How about adding those. We don't need to add any paradigms, just the entries in the main section of the monolingual and bilingual dictionaries.<br />
<br />
==Bring on the verbs==<br />
<br />
Ok, so we have a system that translates nouns, but that's pretty useless, we want to translate verbs too, and even whole sentences! How about we start with the verb to see. In Serbo-Croatian this is videti. Serbo-Croatian is a null-subject language, this means that it doesn't typically use personal pronouns before the conjugated form of the verb. English is not. So for example: I see in English would be translated as vidim in Serbo-Croatian.<br />
<br />
* Vidim<br />
* see<p1><sg><br />
* I see<br />
<br />
Note: <code><p1></code> denotes first person<br />
<br />
This will be important when we come to write the transfer rule for verbs. Other examples of null-subject languages include: Spanish, Romanian and Polish. This also has the effect that while we only need to add the verb in the Serbo-Croatian morphological dictionary, we need to add both the verb, and the personal pronouns in the English morphological dictionary. We'll go through both of these.<br />
<br />
The other forms of the verb videti are: vidiš, vidi, vidimo, vidite, and vide; which correspond to: you see (singular), he sees, we see, you see (plural), and they see.<br />
<br />
There are two forms of you see, one is plural and formal singular (vidite) and the other is singular and informal (vidiš).<br />
<br />
We're going to try and translate the sentence: "Vidim gramofoni" into "I see gramophones". In the interests of space, we'll just add enough information to do the translation and will leave filling out the paradigms (adding the other conjugations of the verb) as an exercise to the reader.<br />
<br />
The astute reader will have realised by this point that we can't just translate vidim gramofoni because it is not a grammatically correct sentence in Serbo-Croatian. The correct sentence would be vidim gramofone, as the noun takes the accusative case. We'll have to add that form too, no need to add the case information for now though, we just add it as another option for plural. So, in the paradigm definition just copy the 'e' block for 'i' and change the 'i' to 'e' there.<br />
<br />
<pre><br />
<pardef n="gramofon__n"><br />
<e><p><l/><r><s n="n"/><s n="sg"/></r></p></e><br />
<e><p><l>i</l><r><s n="n"/><s n="pl"/></r></p></e><br />
<e><p><l>e</l><r><s n="n"/><s n="pl"/></r></p></e><br />
</pardef><br />
</pre><br />
<br />
First thing we need to do is add some more symbols. We need to first add a symbol for 'verb', which we'll call "vblex" (this means lexical verb, as opposed to modal verbs and other types). Verbs have 'person', and 'tense' along with number, so let's add a couple of those as well. We need to translate "I see", so for person we should add "p1", or 'first person', and for tense "pri", or 'present indicative'.<br />
<pre><br />
<sdef n="vblex"/><br />
<sdef n="p1"/><br />
<sdef n="pri"/><br />
</pre><br />
After we've done this, as with the nouns, we add a paradigm for the verb conjugation. The first line will be:<br />
<pre><br />
<pardef n="vid/eti__vblex"><br />
</pre><br />
The '/' is used to demarcate where the root ends, that is, the point at which the endings (the parts between the <l> </l> tags) are attached.<br />
<br />
Then the inflection for first person singular:<br />
<pre><br />
<br />
<e><p><l>im</l><r>eti<s n="vblex"/><s n="pri"/><s n="p1"/><s n="sg"/></r></p></e><br />
<br />
</pre><br />
The 'im' denotes the ending (as in 'vidim'). It is necessary to add 'eti' to the <r> section because the entry in the main section only supplies the root 'vid', so the paradigm has to restore the rest of the lemma 'videti'. The rest is fairly straightforward: 'vblex' is lexical verb, 'pri' is present indicative tense, 'p1' is first person and 'sg' is singular. We can also add the plural, which will be the same, except 'imo' instead of 'im' and 'pl' instead of 'sg'.<br />
<br />
After this we need to add a lemma-to-paradigm mapping to the main section:<br />
<pre><br />
<e lm="videti"><i>vid</i><par n="vid/eti__vblex"/></e><br />
</pre><br />
Note: the content of <nowiki><i> </i></nowiki> is the root, not the lemma.<br />
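The interplay between the root 'vid' and the 'eti' in the right-hand side can be checked with a short Python sketch. It is a toy expansion with plain dictionaries, not what lt-comp does internally; VID_ETI and analyses_for are names made up for the example, and the plural entry is the one this HOWTO suggests adding.<br />

```python
# Entries of the vid/eti__vblex paradigm: (surface ending, analysis suffix).
VID_ETI = [
    ("im", "eti<vblex><pri><p1><sg>"),
    ("imo", "eti<vblex><pri><p1><pl>"),
]


def analyses_for(root, paradigm):
    """Map each surface form to its analysis for the given root."""
    return {root + left: root + right for left, right in paradigm}


table = analyses_for("vid", VID_ETI)
print(table["vidim"])
print(table["vidimo"])
```

The surface side never contains 'eti', while the analysis side always does, so the analysis carries the lemma 'videti' rather than the bare root.<br />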
<br />
That's the work on the Serbo-Croatian dictionary done for now. Let's compile it then test it.<br />
<pre><br />
$ lt-comp lr apertium-sh-en.sh.dix sh-en.automorf.bin<br />
main@standard 23 25<br />
$ echo "vidim" | lt-proc sh-en.automorf.bin<br />
^vidim/videti<vblex><pri><p1><sg>$<br />
$ echo "vidimo" | lt-proc sh-en.automorf.bin<br />
^vidimo/videti<vblex><pri><p1><pl>$<br />
</pre><br />
Ok, so now we do the same for the English dictionary (remember to add the same symbol definitions here as you added to the Serbo-Croatian one).<br />
<br />
The paradigm is:<br />
<pre><br />
<pardef n="s/ee__vblex"><br />
</pre><br />
because the past tense is 'saw'. Now, we can do one of two things, we can add both first and second person, but they are the same form. In fact, all forms (except third person singular) of the verb 'to see' are 'see'. So instead we make one entry for 'see' and give it only the 'pri' symbol.<br />
<pre><br />
<br />
<e><p><l>ee</l><r>ee<s n="vblex"/><s n="pri"/></r></p></e><br />
<br />
</pre><br />
and as always, an entry in the main section:<br />
<pre><br />
<e lm="see"><i>s</i><par n="s/ee__vblex"/></e><br />
</pre><br />
Then let's save, recompile and test:<br />
<pre><br />
$ lt-comp lr apertium-sh-en.en.dix en-sh.automorf.bin<br />
main@standard 18 19<br />
<br />
$ echo "see" | lt-proc en-sh.automorf.bin<br />
^see/see<vblex><pri>$<br />
</pre><br />
Now for the obligatory entry in the bilingual dictionary:<br />
<pre><br />
<e><p><l>videti<s n="vblex"/></l><r>see<s n="vblex"/></r></p></e><br />
</pre><br />
(again, don't forget to add the sdefs from earlier)<br />
<br />
And recompile:<br />
<pre><br />
$ lt-comp lr apertium-sh-en.sh-en.dix sh-en.autobil.bin<br />
main@standard 18 18<br />
$ lt-comp rl apertium-sh-en.sh-en.dix en-sh.autobil.bin<br />
main@standard 18 18<br />
</pre><br />
Now to test:<br />
<pre><br />
$ echo "vidim" | lt-proc sh-en.automorf.bin | \<br />
gawk 'BEGIN{RS="$"; FS="/";}{nf=split($1,COMPONENTS,"^"); for(i = 1; i<nf; i++) printf COMPONENTS[i]; if($2 != "") printf("^%s$",$2);}' | \<br />
apertium-transfer apertium-sh-en.sh-en.t1x sh-en.t1x.bin sh-en.autobil.bin<br />
<br />
^see<vblex><pri><p1><sg>$^@<br />
</pre><br />
We get the analysis passed through correctly, but when we try and generate a surface form from this, we get a '#', like below:<br />
<pre><br />
$ echo "vidim" | lt-proc sh-en.automorf.bin | \<br />
gawk 'BEGIN{RS="$"; FS="/";}{nf=split($1,COMPONENTS,"^"); for(i = 1; i<nf; i++) printf COMPONENTS[i]; if($2 != "") printf("^%s$",$2);}' | \<br />
apertium-transfer apertium-sh-en.sh-en.t1x sh-en.t1x.bin sh-en.autobil.bin | \<br />
lt-proc -g sh-en.autogen.bin<br />
#see\@<br />
</pre><br />
This '#' means that the generator cannot generate the correct lexical form because it does not contain it. Why is this?<br />
<br />
Basically the analyses don't match, the 'see' in the dictionary is see<vblex><pri>, but the see delivered by the transfer is see<vblex><pri><p1><sg>. The Serbo-Croatian side has more information than the English side requires. You can test this by adding the missing symbols to the English dictionary, and then recompiling, and testing again.<br />
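The mismatch is easy to reproduce in a toy generator. The Python sketch below assumes the generator is an exact-match lookup from analysis to surface form and that a failed lookup is reported as '#' plus the lemma, which is what the output in this HOWTO shows; GENERATOR and generate are hypothetical names.<br />

```python
# What the English generator knows after adding the 'see' entry.
GENERATOR = {"see<vblex><pri>": "see"}


def generate(lexical_form):
    """Return the surface form, or '#lemma' if the analysis is unknown."""
    if lexical_form in GENERATOR:
        return GENERATOR[lexical_form]
    lemma = lexical_form.split("<", 1)[0]
    return "#" + lemma


print(generate("see<vblex><pri><p1><sg>"))  # extra tags, no exact match
print(generate("see<vblex><pri>"))
```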
<br />
However, a more paradigmatic way of taking care of this is by writing a rule. So, we open up the rules file (<code>apertium-sh-en.sh-en.t1x</code> in case you forgot).<br />
<br />
We need to add a new category for 'verb'.<br />
<pre><br />
<def-cat n="vrb"><br />
<cat-item tags="vblex.*"/><br />
</def-cat><br />
</pre><br />
We also need to add attributes for tense and for person. We'll make it really simple for now, you can add p2 and p3, but I won't in order to save space.<br />
<pre><br />
<def-attr n="temps"><br />
<attr-item tags="pri"/><br />
</def-attr><br />
<br />
<def-attr n="pers"><br />
<attr-item tags="p1"/><br />
</def-attr><br />
</pre><br />
We should also add an attribute for verbs.<br />
<pre><br />
<def-attr n="a_verb"><br />
<attr-item tags="vblex"/><br />
</def-attr><br />
</pre><br />
Now onto the rule:<br />
<pre><br />
<rule><br />
<pattern><br />
<pattern-item n="vrb"/><br />
</pattern><br />
<action><br />
<out><br />
<lu><br />
<clip pos="1" side="tl" part="lem"/><br />
<clip pos="1" side="tl" part="a_verb"/><br />
<clip pos="1" side="tl" part="temps"/><br />
</lu><br />
</out><br />
</action><br />
</rule><br />
</pre><br />
Remember when you tried commenting out the 'clip' tags in the previous rule example and they disappeared from the transfer, well, that's pretty much what we're doing here. We take in a verb with a full analysis, but only output a partial analysis (lemma + verb tag + tense tag).<br />
<br />
So now, if we recompile that, we get:<br />
<pre><br />
$ echo "vidim" | lt-proc sh-en.automorf.bin | \<br />
gawk 'BEGIN{RS="$"; FS="/";}{nf=split($1,COMPONENTS,"^"); for(i = 1; i<nf; i++) printf COMPONENTS[i]; if($2 != "") printf("^%s$",$2);}' | \<br />
apertium-transfer apertium-sh-en.sh-en.t1x sh-en.t1x.bin sh-en.autobil.bin<br />
^see<vblex><pri>$^@<br />
</pre><br />
and:<br />
<pre><br />
$ echo "vidim" | lt-proc sh-en.automorf.bin | \<br />
gawk 'BEGIN{RS="$"; FS="/";}{nf=split($1,COMPONENTS,"^"); for(i = 1; i<nf; i++) printf COMPONENTS[i]; if($2 != "") printf("^%s$",$2);}' | \<br />
apertium-transfer apertium-sh-en.sh-en.t1x sh-en.t1x.bin sh-en.autobil.bin | \<br />
lt-proc -g sh-en.autogen.bin<br />
see\@<br />
</pre><br />
Try it with 'vidimo' (we see) to see if you get the correct output.<br />
<br />
Now try it with "vidim gramofone":<br />
<pre><br />
$ echo "vidim gramofone" | lt-proc sh-en.automorf.bin | \<br />
gawk 'BEGIN{RS="$"; FS="/";}{nf=split($1,COMPONENTS,"^"); for(i = 1; i<nf; i++) printf COMPONENTS[i]; if($2 != "") printf("^%s$",$2);}' | \<br />
apertium-transfer apertium-sh-en.sh-en.t1x sh-en.t1x.bin sh-en.autobil.bin | \<br />
lt-proc -g sh-en.autogen.bin<br />
see gramophones\@<br />
</pre><br />
<br />
==But what about personal pronouns?==<br />
<br />
Well, that's great, but we're still missing the personal pronoun that is necessary in English. In order to add it in, we first need to edit the English morphological dictionary.<br />
<br />
As before, the first thing to do is add the necessary symbols:<br />
<pre><br />
<sdef n="prn"/><br />
<sdef n="subj"/><br />
</pre><br />
Of the two symbols, prn is pronoun, and subj is subject (as in the subject of a sentence).<br />
<br />
Because there is no root, or 'lemma' for personal subject pronouns, we just add the pardef as follows:<br />
<pre><br />
<pardef n="prsubj__prn"><br />
<e><p><l>I</l><r>prpers<s n="prn"/><s n="subj"/><s n="p1"/><s n="sg"/></r></p></e><br />
</pardef><br />
</pre><br />
With 'prsubj' being 'personal subject'. The rest of them (You, We etc.) are left as an exercise to the reader.<br />
<br />
We can add an entry to the main section as follows:<br />
<pre><br />
<e lm="personal subject pronouns"><i/><par n="prsubj__prn"/></e><br />
</pre><br />
So, save, recompile and test, and we should get something like:<br />
<pre><br />
$ echo "I" | lt-proc en-sh.automorf.bin<br />
^I/PRPERS<prn><subj><p1><sg>$<br />
</pre><br />
<br />
(Note: it's in capitals because 'I' is in capitals).<br />
<br />
Now we need to amend the 'verb' rule to output the subject personal pronoun along with the correct verb form.<br />
<br />
First, add a category (this must be getting pretty pedestrian by now):<br />
<pre><br />
<def-cat n="prpers"><br />
<cat-item lemma="prpers" tags="prn.*"/><br />
</def-cat><br />
</pre><br />
Now add the types of pronoun as attributes, we might as well add the 'obj' type as we're at it, although we won't need to use it for now:<br />
<pre><br />
<def-attr n="tipus_prn"><br />
<attr-item tags="prn.subj"/><br />
<attr-item tags="prn.obj"/><br />
</def-attr><br />
</pre><br />
And now to input the rule:<br />
<pre><br />
<rule><br />
<pattern><br />
<pattern-item n="vrb"/><br />
</pattern><br />
<action><br />
<out><br />
<lu><br />
<lit v="prpers"/><br />
<lit-tag v="prn"/><br />
<lit-tag v="subj"/><br />
<clip pos="1" side="tl" part="pers"/><br />
<clip pos="1" side="tl" part="nbr"/><br />
</lu><br />
<b/><br />
<lu><br />
<clip pos="1" side="tl" part="lem"/><br />
<clip pos="1" side="tl" part="a_verb"/><br />
<clip pos="1" side="tl" part="temps"/><br />
</lu><br />
</out><br />
</action><br />
</rule><br />
</pre><br />
This is pretty much the same rule as before, only we made a couple of small changes.<br />
<br />
We needed to output:<br />
<pre><br />
^prpers<prn><subj><p1><sg>$ ^see<vblex><pri>$<br />
</pre><br />
so that the generator could choose the right pronoun and the right form of the verb.<br />
<br />
So, a quick rundown:<br />
<br />
* <code><lit></code>, prints a literal string, in this case "prpers"<br />
* <code><lit-tag></code>, prints a literal tag; because we can't get these tags from the verb, we add them ourselves: "prn" for pronoun, and "subj" for subject.<br />
* <code><b/></code>, prints a blank, a space.<br />
<br />
Note that we retrieved the information for person and number (for the pronoun) and for tense (for the verb) directly from the verb.<br />
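Here is the same rule as a toy Python function, to show where each piece of the output comes from. It assumes lexical forms are a lemma followed by tags in angle brackets and that an absent attribute contributes nothing. ATTRS mirrors the def-attr definitions used so far; clip and verb_rule are hypothetical helpers, not Apertium functions.<br />

```python
import re

ATTRS = {
    "a_verb": ["vblex"],
    "temps": ["pri"],
    "pers": ["p1"],
    "nbr": ["sg", "pl"],
}


def clip(lexical_form, part):
    """Return the lemma, or the first tag that belongs to the attribute."""
    if part == "lem":
        return lexical_form.split("<", 1)[0]
    for tag in re.findall(r"<([^>]+)>", lexical_form):
        if tag in ATTRS[part]:
            return "<" + tag + ">"
    return ""  # assumption: an absent attribute adds nothing


def verb_rule(tl_verb):
    """Output a subject pronoun unit, a blank, then the verb unit."""
    # <lit v="prpers"/>, two <lit-tag>s, then person and number from the verb.
    pronoun = "prpers" + "<prn>" + "<subj>" + clip(tl_verb, "pers") + clip(tl_verb, "nbr")
    # Lemma, verb tag and tense only; person and number are dropped here.
    verb = clip(tl_verb, "lem") + clip(tl_verb, "a_verb") + clip(tl_verb, "temps")
    return "^" + pronoun + "$" + " " + "^" + verb + "$"


print(verb_rule("see<vblex><pri><p1><sg>"))
```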
<br />
So, now if we recompile and test that again:<br />
<pre><br />
$ echo "vidim gramofone" | lt-proc sh-en.automorf.bin | \<br />
gawk 'BEGIN{RS="$"; FS="/";}{nf=split($1,COMPONENTS,"^"); for(i = 1; i<nf; i++) printf COMPONENTS[i]; if($2 != "") printf("^%s$",$2);}' | \<br />
apertium-transfer apertium-sh-en.sh-en.t1x sh-en.t1x.bin sh-en.autobil.bin | \<br />
lt-proc -g sh-en.autogen.bin<br />
I see gramophones<br />
</pre><br />
Which, while it isn't exactly prize-winning prose (much like this HOWTO), is a fairly accurate translation.<br />
<br />
==So tell me about the record player (Multiwords)==<br />
<br />
While gramophone is an English word, it isn't the best translation. Gramophone is typically used for the very old kind, you know with the needle instead of the stylus, and no powered amplification. A better translation would be 'record player'. Although this is more than one word, we can treat it as if it is one word by using multiword (multipalabra) constructions.<br />
<br />
We don't need to touch the Serbo-Croatian dictionary, just the English one and the bilingual one, so open those up.<br />
<br />
The plural of 'record player' is 'record players', so it takes the same paradigm as gramophone (gramophone__n) — in that we just add 's'. All we need to do is add a new element to the main section.<br />
<pre><br />
<e lm="record player"><i>record<b/>player</i><par n="gramophone__n"/></e><br />
</pre><br />
The only thing different about this is the use of the <b/> tag, although this isn't entirely new as we saw it in use in the rules file.<br />
<br />
So, recompile and test in the orthodox fashion:<br />
<pre><br />
$ echo "vidim gramofone" | lt-proc sh-en.automorf.bin | \<br />
gawk 'BEGIN{RS="$"; FS="/";}{nf=split($1,COMPONENTS,"^"); for(i = 1; i<nf; i++) printf COMPONENTS[i]; if($2 != "") printf("^%s$",$2);}' | \<br />
apertium-transfer apertium-sh-en.sh-en.t1x sh-en.t1x.bin sh-en.autobil.bin | \<br />
lt-proc -g sh-en.autogen.bin<br />
I see record players<br />
</pre><br />
Perfect. A big benefit of using multiwords is that you can translate idiomatic expressions as a single unit, without having to do word-by-word translation. For example, the English phrase "at the moment" would be translated into Serbo-Croatian as "trenutno" (trenutak = ''moment'', trenutno being the adverb derived from it); it would not be possible to translate this English phrase word-by-word into Serbo-Croatian.<br />
<br />
==Dealing with minor variation==<br />
<br />
Serbo-Croatian is an umbrella term for several standard languages, so there are differences in pronunciation and orthography. There is a cool phonetic writing system, so you write how you speak, which means differences in pronunciation show up in spelling. A notable example is the pronunciation of the Proto-Slavic vowel ''yat''. The word for dictionary can, for instance, be either "rječnik" (called Ijekavian) or "rečnik" (called Ekavian).<br />
<br />
===Analysis===<br />
<br />
There should be a fairly easy way of dealing with this, and there is, using paradigms again. Paradigms aren't only used for adding grammatical symbols, but they can also be used to replace any character/symbol with another. For example, here is a paradigm for accepting both "e" and "je" in the analysis. The paradigm should, as with the others go into the monolingual dictionary for Serbo-Croatian.<br />
<br />
<pre><br />
<pardef n="e_je__yat"><br />
<e><br />
<p><br />
<l>e</l><br />
<r>e</r><br />
</p><br />
</e><br />
<e><br />
<p><br />
<l>je</l><br />
<r>e</r><br />
</p><br />
</e><br />
</pardef><br />
</pre><br />
<br />
Then in the "main section":<br />
<br />
<pre><br />
<e lm="rečnik"><i>r</i><par n="e_je__yat"/><i>čni</i><par n="rečni/k__n"/></e><br />
</pre><br />
<br />
This only allows us to analyse both forms however... more work is necessary if we want to generate both forms.<br />
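To see why this entry analyses both spellings to the same lemma, the Python sketch below expands it piece by piece. It is a toy model only. The content of the rečni/k__n paradigm is not shown in this HOWTO, so the single singular entry used for it here is an assumption, and ident, ENTRY and expand are names invented for the example.<br />

```python
from itertools import product

# e_je__yat: both 'e' and 'je' on the surface map to 'e' in the analysis.
YAT = [("e", "e"), ("je", "e")]
# Assumed content of rečni/k__n (singular only), for illustration.
NOUN = [("k", "k<n><sg>")]


def ident(text):
    """An <i> piece: the same text on the left and on the right."""
    return [(text, text)]


# <i>r</i> <par n="e_je__yat"/> <i>čni</i> <par n="rečni/k__n"/>
ENTRY = [ident("r"), YAT, ident("čni"), NOUN]


def expand(entry):
    """Combine every choice of each piece into surface -> analysis."""
    table = {}
    for combo in product(*entry):
        surface = "".join(left for left, _ in combo)
        analysis = "".join(right for _, right in combo)
        table[surface] = analysis
    return table


table = expand(ENTRY)
print(table["rečnik"])
print(table["rječnik"])
```

Turning this table around gives two surface forms for one analysis, which is exactly why generation needs more work.<br />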
<br />
===Generation===<br />
<br />
==See also==<br />
<br />
*[[Building dictionaries]]<br />
*[[Cookbook]] <br />
*[[Chunking]]<br />
*[[Contributing to an existing pair]]<br />
<br />
[[Category:Documentation in English]]<br />
[[Category:HOWTO]]<br />
[[Category:Writing dictionaries]]</div>Objectiveseahttps://wiki.apertium.org/w/index.php?title=English_and_Esperanto/Evaluation&diff=30283English and Esperanto/Evaluation2011-12-20T21:58:47Z<p>Objectivesea: /* Archimedes */ Archimedes --> Arkimedo; Syracuse --> Sirakuzo</p>
<hr />
<div>{{TOCD}}<br />
<br />
==Archimedes==<br />
<br />
Date: 08 May 2009<br /><br />
Word error rate (WER): 15.79 % <br /><br />
Position-independent word error rate (PER): 15.79 % <br /><br />
<br />
<br />
;English<br />
<pre><br />
Archimedes of Syracuse was an ancient Greek mathematician, physicist and <br />
engineer. Although little is known of his life, he is regarded as one of the leading scientists <br />
in classical antiquity. In addition to making discoveries in the fields of mathematics and geometry, he is <br />
credited with producing machines that were well ahead of their time. <br />
<br />
He laid the foundations of hydrostatics, and explained the principle of the lever, the device <br />
on which mechanics is based. His early advances in calculus included the first known <br />
summation of an infinite series with a method that is still used today. The historians of Ancient Rome <br />
showed a strong interest in Archimedes and wrote accounts of his life and works, while the relatively <br />
few copies of his treatises that survived through the Middle Ages were an influential source of ideas for <br />
scientists during the Renaissance.<br />
</pre><br />
<br />
;Apertium<br />
<pre><br />
*Archimedes de *Syracuse estis antikva greka matematikisto, fizikisto kaj <br />
inĝeniero. Kvankam malmulte estas sciita de lia vivo, li estas rigardita kiel unu el la eminentaj sciencistoj <br />
en klasika antikveco. Krom faranta eltrovoj en la kampoj de matematiko kaj geometrio, li estas <br />
kreditita kun produktanta maŝinoj kiu estis bone antaŭen de ilia tempo. <br />
<br />
Li metis la fundamentojn de *hydrostatics, kaj klarigis la principo de la levilo, la aparato <br />
sur kiu mekanikoj estas bazita. Liaj fruaj antaŭenigoj en kalkulado inkluzivis la unua sciata <br />
*summation de senlima serio kun metodo kiu estas ankoraŭ uzita hodiaŭ. La historiistoj de Antikva Romo <br />
montris fortan intereson en *Archimedes kaj skribis kontoj de lia vivo kaj laboroj, dum la relative <br />
malabundaj kopioj de liaj traktatoj kiu supervivis tra la mezepoko estis influa fonto de ideoj por <br />
sciencistoj dum la Renesanco.<br />
</pre><br />
<br />
;Post-edited<br />
<pre><br />
Arkimedo de Sirakuzo estis antikva greka matematikisto, fizikisto kaj <br />
inĝeniero. Kvankam malmulte estas sciata pri lia vivo, li estas konsiderata unu el la unuarangaj sciencistoj <br />
en klasika antikveco. Krom fari eltrovojn en la kampoj de matematiko kaj geometrio, li estas <br />
atribuita al produkti maŝinojn kiu estis bone antaŭ de ilia tempo. <br />
<br />
Li metis la fundamentojn de hidrostatiko, kaj klarigis la principon de la levilo, la aparato <br />
sur kiu mekaniko estas bazita. Liaj fruaj antaŭenigoj en kalkulado inkluzivis la unua sciata <br />
sumigo de senlima serio kun metodo kiu estas ankoraŭ uzata hodiaŭ. La historiistoj de Antikva Romo <br />
montris fortan intereson en Arkimedo kaj skribis rakontojn pri lia vivo kaj laboroj, dum la relative <br />
maloftaj kopioj de liaj pritraktoj kiu supervivis tra la mezepoko estis influa fonto de ideoj por <br />
sciencistoj dum la Renesanco.<br />
</pre><br />
<br />
==Russian Empire==<br />
<pre><br />
$ cd dev<br />
$ apertium-eval-translator -test eval-RussianEmpire.apertium.txt -ref eval-RussianEmpire.post-edited.txt<br />
</pre><br />
<br />
Date: 08 May 2009<br /><br />
Word error rate (WER): 25.92 %<br /><br />
Position-independent word error rate (PER): 21.99 %<br /><br />
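<br />
The two figures above can be read as follows: WER is the word-level edit distance between the Apertium output and the post-edited reference, divided by the reference length, while PER ignores word order and compares the two texts as bags of words. Below is a minimal Python sketch of both measures, assuming plain whitespace tokenisation and the common textbook definitions. The function names <code>wer</code> and <code>per</code> are illustrative, and <code>apertium-eval-translator</code> may tokenise or normalise differently, so this is not expected to reproduce its figures exactly.<br />

```python
from collections import Counter


def wer(hyp: str, ref: str) -> float:
    """Word error rate in percent: word-level Levenshtein distance / reference length."""
    h, r = hyp.split(), ref.split()
    # prev[j] = edit distance between the first i-1 reference words and first j hypothesis words
    prev = list(range(len(h) + 1))
    for i, ref_word in enumerate(r, 1):
        cur = [i]
        for j, hyp_word in enumerate(h, 1):
            cur.append(min(prev[j] + 1,                            # reference word missing from hypothesis
                           cur[j - 1] + 1,                         # extra word in hypothesis
                           prev[j - 1] + (ref_word != hyp_word)))  # match or substitution
        prev = cur
    return 100.0 * prev[-1] / len(r)


def per(hyp: str, ref: str) -> float:
    """Position-independent error rate in percent (bag-of-words comparison)."""
    h, r = hyp.split(), ref.split()
    # Multiset intersection counts the words shared by both texts, regardless of position
    matches = sum((Counter(h) & Counter(r)).values())
    return 100.0 * (max(len(h), len(r)) - matches) / len(r)


if __name__ == "__main__":
    mt = "La rusa Imperio estis stato"
    post_edited = "La Rusa Imperio estis ŝtato"
    print(f"WER: {wer(mt, post_edited):.2f} %")
    print(f"PER: {per(mt, post_edited):.2f} %")
```

Because PER disregards word order, it can never exceed WER under these definitions, which is consistent with the figures reported on this page.<br />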
<br />
<br />
;English<br />
<pre><br />
The Russian Empire (Modern Russian: Российская империя, translit: Rossiyskaya Imperiya) was a state that existed <br />
from 1721 until the Russian Revolution of 1917. It was the successor to the Tsardom of Russia, and the predecessor <br />
of the Soviet Union. It was one of the largest empires the world had seen. At one point in 1866, it stretched from <br />
eastern Europe, across northern Asia, and into North America. At the beginning of the 19th century, Russia was the <br />
largest country in the world, extending from the Arctic Ocean to the north to the Black Sea on the south, from the <br />
Baltic Sea on the west to the Pacific Ocean on the east. Across this vast realm were scattered the Tsar's 150 <br />
million subjects, from poor, illiterate peasants to the noble families of great wealth. Its government, ruled by <br />
the Tsar, was one of the last absolute monarchies left in Europe.<br />
<br />
The Russian Empire was a natural successor to the Tsardom of Muscovy. Though the empire was only officially <br />
proclaimed by Tsar Peter I following the Treaty of Nystad (1721), some historians would argue that it was truly <br />
born when Peter acceeded to the throne in early 1682.<br />
<br />
The administrative boundaries of European Russia, apart from Finland, coincided broadly with the natural limits of <br />
the East-European plains. In the North it met the Arctic Ocean; the islands of Novaya Zemlya, Kolguyev and Vaigach <br />
also belonged to it, but the Kara Sea was reckoned to Siberia. To the East it had the Asiatic dominions of the <br />
empire, Siberia and the Kyrgyz steppes, from both of which it was separated by the Ural Mountains, the Ural River <br />
and the Caspian Sea — the administrative boundary, however, partly extending into Asia on the Siberian slope of the<br />
Urals. To the South it had the Black Sea and Caucasus, being separated from the latter by the Manych depression, <br />
which in Post-Pliocene times connected the Sea of Azov with the Caspian. The West boundary was purely conventional:<br />
it crossed the peninsula of Kola from the Varangerfjord to the Gulf of Bothnia; thence it ran to the Kurisches <br />
Haff in the southern Baltic, and thence to the mouth of the Danube. From the Danube, it took a great circular sweep<br />
to the West to embrace Poland, and separating Russia from Prussia, Austrian Galicia and Romania.<br />
</pre><br />
<br />
;Apertium<br />
<pre><br />
La rusa Imperio (Moderna ruso: Российская империя, *translit: *Rossiyskaya *Imperiya) estis stato kiu ekzistis <br />
de 1721 ĝis la rusa Revolucio de 1917. Ĝi estis la posteulo al la *Tsardom de Rusio, kaj la antaŭulo <br />
de la Sovetio. Ĝi estis unu el la plej grandaj imperioj la mondo vidis. Ĉe unu punkto en 1866, ĝi streĉis de <br />
orienta Eŭropo, trans norda Azio, kaj en Norda Ameriko. Ĉe la komenco de la 19a jarcento, Rusio estis la <br />
plej granda lando en la mondo, etendanta de la Arkta Oceano al la norda al la Nigra Maro sur la sudo, de la <br />
Balta Maro sur la okcidenta al la Pacifika Oceano sur la oriento. Trans ĉi tiu vasta sfero estis disigita la Caro-a 150 <br />
milionaj temoj, de senhavulo, nelegosciaj kamparanoj al la noblaj familioj de granda riĉeco. Ĝia registaro, regita de <br />
la Caro, estis unu el la lastaj absolutaj monarkioj lasita en Eŭropo.<br />
<br />
La rusa Imperio estis natura posteulo al la *Tsardom de *Muscovy. Kvankam la imperio estis nur oficiale <br />
proklamita de Caro Peter I sekvanta la Traktaton de *Nystad (1721), kelkaj historiistoj argumentus ke ĝi estis vere <br />
portita kiam Peter *acceeded al la trono en frua 1682.<br />
<br />
La administraciaj limoj de eŭropa Rusio, krom Finnlando, koincidis larĝe kun la naturaj limoj de <br />
la Orienta-eŭropaj ebenaĵoj. En la Nordo ĝi renkontis la Arktan Oceanon; la insuloj de *Novaya *Zemlya, *Kolguyev kaj *Vaigach <br />
ankaŭ apartenita al ĝi, sed la Kara Maro estis kalkulita al *Siberia. Al la Oriento ĝi havis la *Asiatic superregoj de la <br />
imperio, *Siberia kaj la kirgizaj stepoj, de ambaŭ de kiu ĝi estis apartigita de la *Ural Montoj, la *Ural Rivero <br />
kaj la Kaspia Maro — la administracia limo, tamen, parte etendanta en Azio sur la *Siberian deklivo de la<br />
Uraloj. Al la Sudo ĝi havis la Nigran Maron kaj Kaŭkazion, estanta apartigita de la lasta de la *Manych melankolio, <br />
kiu en Poŝto-*Pliocene tempoj konektis la Maron de *Azov kun la Kaspio. La Okcidenta limo estis sole tradicia:<br />
ĝi transiris la duoninsulon de *Kola de la *Varangerfjord al la Golfo de *Bothnia; *thence ĝi kuris al la *Kurisches <br />
*Haff en la suda Balta, kaj *thence al la buŝo de la *Danube. De la *Danube, ĝi prenis grandan rondan kamentubiston<br />
al la Okcidenta ampleksi Pollandon, kaj apartiganta Rusio de *Prussia, aŭstra Galegio kaj Rumanio.<br />
</pre><br />
<br />
;Post-edited<br />
<pre><br />
La Rusa Imperio (Modernrusa: Российская империя, translit: Rossiyskaya Imperiya) estis ŝtato kiu ekzistis <br />
de 1721 ĝis la Rusa Revolucio de 1917. Ĝi estis la sekvinto de la Car-regado de Rusio, kaj la antaŭirinto<br />
de Sovetio. Ĝi estis unu el la plej grandaj imperioj kiun la mondo vidis. Dum unu fojo en 1866, ĝi streĉis de <br />
orienta Eŭropo, trans norda Azio, kaj enen en Norda Ameriko. Ĉe la komenco de la 19a jarcento, Rusio estis la <br />
plej granda lando en la mondo, etendanta de la Arkta Oceano norden al la Nigra Maro sude, de la <br />
Balta Maro okcidente al la Pacifika Oceano oriente. Tra ĉi tiu vasta regno estis la 150 <br />
milionaj subuloj de la Caro, de malriĉaj, nelegosciaj kamparanoj ĝis noblaj familioj de granda riĉeco. Ĝia registaro, regita de <br />
la Caro, estis unu el la lastaj absolutaj monarkioj en Eŭropo.<br />
<br />
La Rusa Imperio estis natura sekvinto al la Car-regno de Muscovy. Kvankam la imperio estis nur oficiale <br />
proklamita de Caro Peter I post la Traktaton de Nystad (1721), kelkaj historiistoj argumentus ke ĝi estis vere <br />
naskite kiam Peter transprenis la tronon frue en 1682.<br />
<br />
La administraciaj limoj de Eŭropa Rusio, krom Finnlando, koincidis larĝe kun la naturaj limoj de <br />
la orienta-eŭropaj ebenaĵoj. En la nordo ĝi renkontis la Arktan Oceanon; la insuloj de Novaya Zemlya, Kolguyev kaj Vaigach <br />
ankaŭ apartenis al ĝi, sed la Kara Maro estis rekonita kunkalkulita al Siberio. Oriente ĝi havis la Aziaj superregoj de la <br />
imperio, Siberio kaj la kirgizaj stepoj, de ambaŭ apartigita de la Urala Montoj, la Urala Rivero <br />
kaj la Kaspia Maro — la administracia limo, tamen, parte etendanta en Azion sur la Siberia deklivo de la<br />
Uraloj. Sude estis la Nigra Maro kaj Kaŭkazio, apartigita de la lasta de la Manych malaltaĵo, <br />
kiu en post-Pliocenaj tempoj konektis la Maron de Azov kun Kaspio. La okcidenta limo estis sole tradicia:<br />
ĝi transiris la duoninsulon Kola de la Varanger-fjordo al la Golfo de Bothnia; de tie ĝi iris al la Kurisches <br />
Haff en la suda Balto, kaj de tie al la buŝo de Danubo. De Danubo, ĝi prenis grandan rondan balaon<br />
okcidenten por ampleksi Pollandon, kaj apartiganta Rusio de Prusio, Aŭstra Galegio kaj Rumanio.<br />
</pre><br />
<br />
<br />
== Corpus coverage ==<br />
<br />
<br />
=== Detailed data on corpus from Wikipedia ===<br />
<br />
<pre><br />
$ zcat corpa/en.crp.txt.gz | sh corpus-stat.sh<br />
Number of tokenised words in the corpus: 478187<br />
Number of known words in the corpus: 450255<br />
Coverage: 94.2 %<br />
Top unknown words in the corpus:<br />
191 ^Apollo/*Apollo$<br />
104 ^Aramaic/*Aramaic$<br />
91 ^Alberta/*Alberta$<br />
81 ^de/*de$<br />
80 ^Abu/*Abu$<br />
63 ^Bakr/*Bakr$<br />
62 ^Agassi/*Agassi$<br />
59 ^Carnegie/*Carnegie$<br />
58 ^Agrippina/*Agrippina$<br />
58 ^Achilles/*Achilles$<br />
</pre><br />
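<br />
Coverage here is the share of tokens for which the analyser returns at least one analysis; unknown words appear in the analyser output with a leading asterisk, as in <code>^Apollo/*Apollo$</code>. The contents of <code>corpus-stat.sh</code> are not shown on this page, so the following Python sketch is only an assumption about an equivalent calculation over analyser output in Apertium stream format. The function name <code>coverage</code>, the tags in the sample string, and the simplified regular expression (which ignores escaped characters) are all illustrative.<br />

```python
import re
from collections import Counter

# One lexical unit: ^surface/analyses$ (simplified; escaped characters are not handled)
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def coverage(stream: str):
    """Return (total tokens, known tokens, Counter of unknown surface forms)."""
    total = 0
    unknown = Counter()
    for surface, analysis in LEXICAL_UNIT.findall(stream):
        total += 1
        if analysis.startswith("*"):  # the analyser marks unknown words with '*'
            unknown[surface] += 1
    known = total - sum(unknown.values())
    return total, known, unknown


if __name__ == "__main__":
    sample = ("^The/the<det><def><sp>$ ^Apollo/*Apollo$ "
              "^program/program<n><sg>$ ^Apollo/*Apollo$")
    total, known, unknown = coverage(sample)
    print(f"Number of tokenised words in the corpus: {total}")
    print(f"Number of known words in the corpus: {known}")
    print(f"Coverage: {100.0 * known / total:.1f} %")
    for word, count in unknown.most_common(10):
        print(f"{count:7d} ^{word}/*{word}$")
```

The same ratio reproduces the figure reported above: 450255 known words out of 478187 tokens gives 94.2 %.<br />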
<br />
<br />
<br />
<br />
=== Detailed data on corpus from Reuters ===<br />
<br />
<pre><br />
$ zcat corpa/en.crp.txt.gz_org_reuters | sh corpus-stat.sh<br />
<br />
Number of tokenised words in the corpus: 1091016<br />
Number of known words in the corpus: 988758<br />
Coverage: 90.6 %<br />
Top unknown words in the corpus:<br />
8952 ^mln/*mln$<br />
7140 ^dlrs/*dlrs$<br />
6045 ^pct/*pct$<br />
4936 ^Reuter/*Reuter$<br />
3357 ^cts/*cts$<br />
2292 ^Inc/*Inc$<br />
2035 ^Corp/*Corp$<br />
1366 ^REUTER/*REUTER$<br />
1320 ^Co/*Co$<br />
926 ^dlr/*dlr$<br />
</pre><br />
<br />
<br />
<br />
=== Detailed data on another corpus (source not recorded) ===<br />
<br />
<pre><br />
$ zcat corpa/en.crp.txt.gz_2 | sh corpus-stat.sh<br />
Number of tokenised words in the corpus: 496715<br />
Number of known words in the corpus: 474858<br />
Coverage: 95.6 %<br />
Top unknown words in the corpus:<br />
261 ^Corp/*Corp$<br />
242 ^Inc/*Inc$<br />
155 ^Co/*Co$<br />
106 ^anti/*anti$<br />
102 ^ve/*ve$<br />
98 ^Iraq/*Iraq$<br />
97 ^Chicago/*Chicago$<br />
83 ^Iran/*Iran$<br />
81 ^San/*San$<br />
74 ^de/*de$<br />
</pre></div>Objectivesea