Prefixes and infixes
Revision as of 10:11, 30 July 2007
Apertium was initially designed for languages in which word inflection manifests itself as changes in the suffix of words. For instance, in Spanish, cantar (to sing), cantarías (you would sing), cantábamos (we sang), etc., all share the initial stem cant-. Therefore, both Apertium's tagger and structural transfer assume that the lexical forms corresponding to these surface forms consist of a lemma (cantar) followed by a series of morphological symbols. For instance, cantábamos would be cantar.vblex.pii.p1.pl (cantar, lexical verb, imperfect indicative, 1st person, plural).
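The lemma-first assumption can be illustrated with a minimal Python sketch. It works on the dotted notation used on this page; the function name `split_lexical_form` is made up for the illustration and is not part of Apertium.

```python
def split_lexical_form(lexical_form):
    """Split a suffix-style lexical form into (lemma, tags).

    Assumes the first dot-separated symbol is the lemma and that
    everything after it is a morphological symbol.
    """
    lemma, *tags = lexical_form.split(".")
    return lemma, tags


lemma, tags = split_lexical_form("cantar.vblex.pii.p1.pl")
print(lemma)  # cantar
print(tags)   # ['vblex', 'pii', 'p1', 'pl']
```

A form such as pl.n.kitabu would be misread by this function (it would take pl as the lemma), which is the kind of problem that prefixing languages raise.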
But in other languages inflection is expressed through prefixes or infixes. For instance, in Swahili kitabu means book and vitabu means books, so a natural way to represent their lexical forms would be sg.kitabu.n and pl.kitabu.n, or perhaps sg.n.kitabu and pl.n.kitabu. Here "natural" means that morphemes in lexical forms appear in the same order as in surface forms, which makes it possible to build paradigms: the same singular/plural alternation is found in many other Swahili nouns, such as kisu/visu (knife) and kijiko/vijiko (spoon).
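As a rough sketch of what such a prefix paradigm could look like, the Python snippet below generates surface forms from lexical forms written as number.n.lemma. The `KI_VI` table and the `generate` function are hypothetical names for this illustration, and the sketch assumes the lemma is the singular form.

```python
# Hypothetical prefix paradigm for nouns that alternate ki- (sg) / vi- (pl).
KI_VI = {"sg": "ki", "pl": "vi"}


def generate(lexical_form, paradigm=KI_VI):
    """Build a surface form from a lexical form such as 'pl.n.kitabu'."""
    number, pos, lemma = lexical_form.split(".")
    sg_prefix = paradigm["sg"]
    if pos != "n" or not lemma.startswith(sg_prefix):
        raise ValueError(f"{lexical_form!r} does not fit this paradigm")
    stem = lemma[len(sg_prefix):]    # kitabu -> tabu
    return paradigm[number] + stem   # pl + tabu -> vitabu


for form in ("sg.n.kitabu", "pl.n.kitabu", "pl.n.kisu", "pl.n.kijiko"):
    print(form, "->", generate(form))
```

The same table covers kitabu, kisu and kijiko, which is what makes the prefix-ordered lexical form convenient for paradigms.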
Such lexical forms are difficult to handle in Apertium as it is now, so if we want Apertium to be used for more languages, we need to modify the part-of-speech tagger and the structural transfer.
- One possible solution would be to treat lexical forms as sets rather than as sequences, so that, for example, pl.n.kitabu and pl.kitabu.n would be the same (Swahili). A normalization would have to take place somewhere (for instance, to kitabu.n.pl), and the structural transfer module would then have to be able to reorder (de-normalize) these tags into the order expected by the morphological generator.
  - A suitable way of normalizing and denormalizing would be to have a (source-language dependent) file which specifies a 'canonical order' used by the tagger and the transfer, and another one which specifies the 'standard order' of morphemes in the target language. The bilingual dictionary would be in 'normalized form'. Something similar is already performed by the pretransfer module, which normalizes split lemmas such as take.vblex.sep.past_off to take_off.vblex.sep.past.
- Another possibility is to generalize the part-of-speech tagger and the transfer so that they can detect and deal with lexical forms in which the lemma is split or appears in any position. As before, the person writing the tagger definition or the structural transfer rules would be responsible for handling these correctly.
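The first proposal (lexical forms as sets, with a canonical order and a language-specific standard order) can be sketched in Python as follows. Everything here is an assumption made for the illustration: the tag inventories, the order lists and the function names stand in for the order files proposed above and do not correspond to any existing Apertium file format or module.

```python
# Hypothetical tag inventories; any symbol not listed is taken to be the lemma.
POS_TAGS = {"n", "vblex"}
NUMBER_TAGS = {"sg", "pl"}

# Hypothetical contents of the two "order files".
CANONICAL_ORDER = ["lemma", "pos", "number"]   # used by tagger and transfer
SWAHILI_ORDER = ["number", "pos", "lemma"]     # expected by the generator


def classify(symbol):
    """Return the slot a symbol belongs to."""
    if symbol in POS_TAGS:
        return "pos"
    if symbol in NUMBER_TAGS:
        return "number"
    return "lemma"


def reorder(lexical_form, order):
    """Treat the lexical form as a set of symbols and emit it in `order`."""
    slots = {classify(s): s for s in lexical_form.split(".")}
    return ".".join(slots[name] for name in order if name in slots)


def normalize(lexical_form):
    return reorder(lexical_form, CANONICAL_ORDER)


def denormalize(lexical_form, standard_order):
    return reorder(lexical_form, standard_order)


print(normalize("pl.n.kitabu"))                    # kitabu.n.pl
print(normalize("pl.kitabu.n"))                    # kitabu.n.pl
print(denormalize("kitabu.n.pl", SWAHILI_ORDER))   # pl.n.kitabu
```

The sketch keeps one symbol per slot and would misclassify a lemma that happens to look like a tag, so a real implementation would need a richer description of the symbols than two small sets.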
Miscellaneous examples
Lingala
- to ask = kotúna (ko-tún-a): infger.tún.vblex.infger
- to wonder = komítúna (ko-mí-tún-a): infger.ref.tún.vblex.infger
- I asked = natúní (na-tún-í): p1sg.tún.vblex.pastpres
- You asked = otúní (o-tún-í): p2sg.tún.vblex.pastpres
(pastpres is recent past used as present tense with some common verbs)
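The correspondence between the segmented Lingala forms and their lexical forms can be reproduced with a small Python sketch. The three lookup tables are hypothetical and cover only the four examples listed here; they are not a description of Lingala morphology, and the input is assumed to be already segmented with hyphens.

```python
# Hypothetical morpheme-to-tag tables, derived only from the examples above.
PREFIX_TAGS = {"ko": "infger", "na": "p1sg", "o": "p2sg"}
INFIX_TAGS = {"mí": "ref"}
SUFFIX_TAGS = {"a": "infger", "í": "pastpres"}


def analyse(segmented):
    """Map a segmented verb such as 'ko-mí-tún-a' to a lexical form."""
    prefix, *middle, suffix = segmented.split("-")
    if len(middle) not in (1, 2):
        raise ValueError(f"unexpected segmentation: {segmented!r}")
    tags = [PREFIX_TAGS[prefix]]
    if len(middle) == 2:               # optional infix before the stem
        tags.append(INFIX_TAGS[middle[0]])
    stem = middle[-1]
    # Morphemes keep their surface order: prefix tags, stem, vblex, suffix tag.
    return ".".join(tags + [stem, "vblex", SUFFIX_TAGS[suffix]])


for form in ("ko-tún-a", "ko-mí-tún-a", "na-tún-í", "o-tún-í"):
    print(form, "->", analyse(form))
```

Because the stem ends up in the middle of the lexical form, the output is exactly the kind of input that the tagger and the transfer would need to accept under either proposal.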