Users guide and notes Jacob

These are my notes from making the English-Esperanto translator, but they might be useful to people like me who know next to nothing about linguistics.

I've installed the standard Ubuntu packages and they're working fine:

Using Apertium

$ echo "Jeg vil gå en tur" | apertium da-sv
Jag vill gå en tur

or

$ echo "Jeg vil gå en tur" | apertium -d apertium-sv-da da-sv
Jag vill gå en tur

Don't use the command apertium-translator; it's old and deprecated!

How to add a missing word

You will need to add the word to both the source-language monodix AND the translation dictionary.

Example: I want to add "treeview" which is an English noun.

First I check whether it's in the English monodix apertium-eo-en.en.dix. If it isn't, we'll need to add it.

Next we need to find the regular noun paradigm in English. The paradigm is 'house__n'. Why 'house'? Just because it's a memorable example.
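As a rough sketch of what the new monodix entry could look like, the snippet below builds one with Python's standard library. The `<e lm="..."><i>...</i><par n="..."/></e>` shape is the usual form for an entry that reuses a paradigm, but treating the whole word "treeview" as the stem is an assumption; check how "house" itself is entered in apertium-eo-en.en.dix before copying it. The helper name `monodix_entry` is made up for illustration.

```python
import xml.etree.ElementTree as ET


def monodix_entry(lemma, paradigm):
    """Build a monodix <e> element that attaches `lemma` to `paradigm`."""
    entry = ET.Element("e", {"lm": lemma})
    stem = ET.SubElement(entry, "i")
    # Assumption: the whole word is the invariant stem, as for 'house'.
    stem.text = lemma
    ET.SubElement(entry, "par", {"n": paradigm})
    return entry


# Print the entry as a single line of XML, ready to compare with the
# existing entries in the English monodix.
print(ET.tostring(monodix_entry("treeview", "house__n"), encoding="unicode"))
```

The printed `<e>` element is only a candidate; the matching translation still has to be added to the translation dictionary separately.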

Understanding the files

<e r="LR"><p><l>kataluno<s n="n"/><s n="f"/></l><r>Catalan<s n="n"/></r></p></e>
<e r="LR"><p><l>kataluno<s n="n"/><s n="m"/></l><r>Catalan<s n="n"/></r></p></e>
<e r="RL"><p><l>kataluno<s n="n"/><s n="GD"/></l><r>Catalan<s n="n"/></r></p></e>

<e r="LR"><p><l>katoliko<s n="n"/><s n="f"/></l><r>Catholic<s n="n"/></r></p></e>
<e r="LR"><p><l>katoliko<s n="n"/><s n="m"/></l><r>Catholic<s n="n"/></r></p></e>
<e r="RL"><p><l>katoliko<s n="n"/><s n="GD"/></l><r>Catholic<s n="n"/></r></p></e>
13.04 It's all the same!
 francis.tyers: yep
  that says:
  translating left-to-right: katoliko<n><f> → Catholic<n>
  and
  katoliko<n><m> → Catholic<n>
  
13.05 mig: You could write "katalunino" to say a female Catalan person, but most people wouldn't care and would write "kataluno"
 francis.tyers: translating from right-to-left, Catholic<n> → katoliko<n><GD> (GD = gender to be determined)
 mig: ah
  LR = left-to-right
  the directions
13.06 francis.tyers: yeah
 mig: I have understood
 francis.tyers: left-to-right = esperanto to english
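The reading of the `r="LR"` and `r="RL"` attributes explained in the chat can be sketched in a few lines of Python. This is only an illustration of the direction restrictions, not how Apertium itself compiles the dictionary; the function names are hypothetical, and the `<section>` wrapper is there just to make the sample well-formed. It assumes simple single-word entries like the ones above.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<section>
<e r="LR"><p><l>katoliko<s n="n"/><s n="f"/></l><r>Catholic<s n="n"/></r></p></e>
<e r="LR"><p><l>katoliko<s n="n"/><s n="m"/></l><r>Catholic<s n="n"/></r></p></e>
<e r="RL"><p><l>katoliko<s n="n"/><s n="GD"/></l><r>Catholic<s n="n"/></r></p></e>
</section>"""


def side_to_text(side):
    """Render an <l> or <r> element as the lemma followed by <tag> symbols."""
    lemma = side.text or ""
    tags = "".join("<%s>" % s.get("n") for s in side.findall("s"))
    return lemma + tags


def describe(entry):
    """Return the translations an <e> entry allows, one string per direction."""
    left = side_to_text(entry.find("p/l"))
    right = side_to_text(entry.find("p/r"))
    restriction = entry.get("r")  # "LR", "RL", or None when both directions apply
    rules = []
    if restriction in (None, "LR"):
        rules.append(left + " -> " + right)
    if restriction in (None, "RL"):
        rules.append(right + " -> " + left)
    return rules


for e in ET.fromstring(SAMPLE).iter("e"):
    print(e.get("r"), describe(e))
```

Here the left side is Esperanto and the right side is English, so the two LR entries cover Esperanto to English and the RL entry covers English to Esperanto.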

Why words also need to be in the monolingual dictionary

treeview is not in the English dictionary

mig: ah
 couldn't it just suppose it to be a noun, then :-)

13.53 francis.tyers: nope

mig: or take it from the apertium-eo-en.eo.dix
francis.tyers: everything to be translated needs to be in the analyser
 how would it know the number ?
 how would it know treeview is singular and treeviews is plural ?

13.54 it could guess, but then how would it be able to distinguish between "to treeview" and "he treeviews" (which don't exist)

mig: so I also need to add the word to apertium-eo-en.en.dix.
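The point about the analyser can be illustrated with a toy lookup table. This is a deliberately simplified sketch: the table, the `analyse` function and the output format are all invented for illustration and are not Apertium's real analyser or stream format. The only thing borrowed from Apertium is the convention of marking an unknown word with `*`.

```python
# Hand-written toy table: surface form -> analysis. In a real monodix this
# information comes from a paradigm such as house__n, not from a flat table.
ANALYSES = {
    "house": "house<n><sg>",
    "houses": "house<n><pl>",
}


def analyse(word, table):
    """Return the analysis for `word`, or mark it unknown with a leading '*'."""
    return table.get(word, "*" + word)


# Before the word is added, neither form can be analysed, so nothing is
# known about number or part of speech.
before = dict(ANALYSES)
# After adding it (hypothetically as a regular noun), singular and plural
# become distinguishable.
after = dict(ANALYSES, treeview="treeview<n><sg>", treeviews="treeview<n><pl>")

for w in ["house", "treeview", "treeviews"]:
    print(w, "| before:", analyse(w, before), "| after:", analyse(w, after))
```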

and it has the same declination as all other verbs ?

mig: infinitive
 no, it's quite skew :-)
 declination= ?

13.10 francis.tyers: conjugation

mig: I promise to learn the linguistic words within the week.
francis.tyers: haha :D
 an idea
mig: yes, all declinations (ways of conjugation) are all the same in Esperanto

Why don't projects concerning the same language (e.g. English) share the English monolingual dictionary?

Why can't, for example, en-ca, en-es and en-eo all share the SAME English dictionary? Then we could all contribute to this giant dictionary for the advantage of all 3 projects.

Within each project, if we want to add a word to one dictionary, we need to add it to all of them. For example, if you want to add a word to es-en, you need to add it to all three dictionaries (en, en-es, es), in the appropriate form. Otherwise you get the @ # * symbols.

Because of this, and because not everyone has the time to edit, or speaks all of the languages, we find it more convenient to work with them separately, as language pairs, and then merge when/where possible. You'll note that most of the paradigm names, for example, are shared.

Although the ideal is for each dictionary to be "isolated", it isn't always like that. For example, there are some things it makes sense to distinguish in some language pairs and not in others.

#include clause

Q: In general, is there a way to do something like an #include clause so that I could keep my additions separate from the rest?

A: See apertium-en-es/apertium-en-es.en.metadix.xml:

<?xml version="1.0" encoding="UTF-8"?>

<dictionary>
  <alphabet>·ÀÁÂÄÇÈÉÊËÌÍÎÏÑÒÓÔÖÙÚÛÜàáâäçèéêëìíîïñòóôöùúûüABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz</alphabet>

        <!-- symbols -->
        <xi:include xmlns:xi="http://www.w3.org/2001/XInclude"  href="apertium-en-es.symbols.xml"/>

        <!-- paradigms -->
        <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="apertium-en-es.en.pardefs.xml"/>

And then in apertium-en-es.symbols.xml:

<?xml version="1.0" encoding="UTF-8"?>

  <sdefs>
    <sdef n="comp" />
    <sdef n="detnt" />
    <sdef n="predet" />
    <sdef n="past" />
    <sdef n="atn" />
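What an `xi:include` element does can be tried out with Python's `xml.etree.ElementInclude` module. How the Apertium build itself expands the includes is not covered in these notes, so this is only a sketch of the XInclude mechanism: the file name, the file contents and the in-memory loader are all hypothetical stand-ins for the real files on disk.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical stand-ins for files on disk, kept in memory for the demo.
FILES = {
    "symbols.xml": '<sdefs><sdef n="comp"/><sdef n="past"/></sdefs>',
}

MAIN = """<dictionary>
  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="symbols.xml"/>
</dictionary>"""


def load_from_memory(href, parse, encoding=None):
    """Loader callback: return the parsed element for an included file."""
    return ET.fromstring(FILES[href])


def expand(main_xml):
    """Parse `main_xml` and replace each xi:include with the file it names."""
    root = ET.fromstring(main_xml)
    ElementInclude.include(root, loader=load_from_memory)
    return root


root = expand(MAIN)
print([s.get("n") for s in root.iter("sdef")])
```

After expansion the `<sdefs>` block sits directly inside `<dictionary>`, as if it had been written there, which is what lets personal additions live in a separate file.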

TODO

- go through http://wiki.apertium.org/wiki/Monodix_basics and review the file (the apertium-eo-en.eo.dix file)
- add treeview (and others added to apertium-eo-en.eo-en.dix) to the English monodix 
- make some wiki notes.

File from traduku.net

convert it into EN : EO

 then tag the EO side
 and strip out the nouns and adjectives
 those are most important to start with
 then grab a corpus
 (Wikipedia, or Europarl or something)

22.15 and order them by frequency of the English word

mig: why reorder?
francis.tyers@gmail.com: higher frequency words are more important

22.16 if you translate "the" correctly, you cover ~50% of the text, if you translate "gable" correctly you cover maybe 0.5%

22.17 mig: yes, yes, but why bother if all words get in?

francis.tyers@gmail.com: because someone has to add the inflection for the english side
 the esperanto side is regular, but the english is not always regular

22.18 mig: OK, so reordering is important because we probably won't make all 110000.

francis.tyers@gmail.com: yep
 but the good news is we don't need to make 110000
 we have 93% coverage with ~7,000 words
 so we can get 99% coverage with probably 20,000
