Difference between revisions of "Monodix basics"

From Apertium

Revision as of 18:05, 6 December 2007

We've been told that the Apertium format for dictionaries is rather counter-intuitive, which is fair enough if you're not used to thinking of dictionaries in a particular way. This page hopes to be a basic introduction to how they work and how you can get started reading them, and hopefully writing them!

This page assumes you are comfortable with HTML and XML: you should be able to tell an element from an attribute and know what character data is. If you want a quick recap, this should help:

<element attribute="value">character data</element>

If that doesn't make any sense, you should probably read up some more on XML.

Introduction

At the top level, the most basic dictionary needs three sections. The first section defines the alphabet that is used with the dictionary. This is fairly self-explanatory and will look something like:

  <alphabet>ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz</alphabet>

The second section defines the grammatical symbols[1] of the language you are working with. This is normally where people say, hang on... what are grammatical symbols? Well, they're pretty much ways of describing words and the different forms that words can take. I assume you know what nouns are (house, beer, boat, cat, ...) and can distinguish them from adjectives (red, good, transparent, ...) and verbs (eat, multiply, write, ...). The way we specify these is as follows:

  <sdefs>
    <sdef n="noun"/>
    <sdef n="verb"/>
    <sdef n="adjective"/>
  </sdefs>
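
To see how the two sections described so far sit together in one file, here is a rough sketch. It assumes the root element is called <dictionary>, which the text above does not show, so treat that name as an assumption and check it against a real dictionary file. The third section is left out.

  <?xml version="1.0" encoding="UTF-8"?>
  <!-- Assumption: the root element name "dictionary" is not given in the text above -->
  <dictionary>
    <!-- First section: the alphabet used with the dictionary -->
    <alphabet>ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz</alphabet>
    <!-- Second section: the grammatical symbols -->
    <sdefs>
      <sdef n="noun"/>
      <sdef n="verb"/>
      <sdef n="adjective"/>
    </sdefs>
  </dictionary>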

People often complain about the brevity of the tags, and typically even the values are abbreviated, so noun becomes "n", verb becomes "vb" and adjective becomes "adj". The brevity serves a purpose, however: when you're writing or copying entries, you want the tags to get in the way as little as possible. For reference, <sdef> means "symbol definition", and <sdefs> is simply the plural.
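
With the abbreviated values mentioned above, the same symbol definitions would look roughly like this. It is only a sketch using the three abbreviations from this page; existing dictionaries may use different or longer sets of symbols.

  <sdefs>
    <!-- noun -->
    <sdef n="n"/>
    <!-- verb -->
    <sdef n="vb"/>
    <!-- adjective -->
    <sdef n="adj"/>
  </sdefs>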

Notes

  1. In other linguistic literature these are sometimes referred to as "features", or "categories" and "sub-categories".