Difference between revisions of "Transfer rules examples"

[[Exemples de règles de transfert|En français]]

This page supplements the page [[A long introduction to transfer rules]]. The examples are taken from the apertium-eo-fr pair. At the beginning of 2013, it was a released pair for translating French to Esperanto, but the initial developer had not implemented the Esperanto → French direction. Another developer, a complete beginner at writing transfer rules, chose to do that. The examples given here are the first rules written to translate a group of one, two or three Esperanto words into a group of two or three French words.


| align=center | Deformatting
| Marks the zones of the source text that must not be translated. For example, in a Web page, the HTML tags must not be translated, only the text.
| The same software is used for every language pair. The format of the data to be translated determines which deformatter is used.
|-
| align=center | Analysis
| Each [[surface form|word]] of the source text is decomposed into a [[lemma]] followed by the type of the word and its attributes (gender, number, person and tense for a verb, ...). For some words, several analyses are possible. In this case, all of them are sent to the output.
| Valid for every language, it uses the [[morphological dictionary]] of the source language.
|-
| align=center | Disambiguation
| When there are several analyses for a word, this step keeps only one of them.
| Valid for every language, it uses a file with the <code>.prob</code> suffix.<br />For unambiguous languages such as Esperanto, this step remains necessary to remove the [[surface form]] of each analysed word (pre-formatting for the transfer step).
|-
| align=center | Pre-transfer
|-
| align=center | Transfer
| Transforms analyses from the source language into their translated version in the target language.
| Valid for every language, it uses the [[bilingual dictionary]] and the transfer file with the <code>.t1x</code> suffix.
|-
la traduction automatique

'''When analysing this part of the sentence''', we get:

^le<det><def><f><sg>$ ^traduction<n><f><sg>$ ^automatique<adj><mf><sg>$


=== def-cats section ===

The '''def-cats''' section is mandatory. It declares the '''categories''' of words that we will look for in order to apply a particular transfer rule. A category can be a simple word class (a determiner, a noun, an adjective, a verb, ...) or something slightly more specific, such as a noun whose description contains the tag <nom> (nominative), meaning that it is part of the subject of the sentence.
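
As a minimal sketch, assuming the tag order seen in the Esperanto analyses on this page (for example ^tago<n><sg><nom>$), a category for nouns in the nominative could be declared as shown here. The category names '''nom''' and '''nom_sujet''' are hypothetical and only serve as an illustration:

<pre>
<section-def-cats>
  <!-- any noun: the tag <n> followed by any other tags -->
  <def-cat n="nom">
    <cat-item tags="n.*"/>
  </def-cat>
  <!-- hypothetical category: nouns in the nominative, singular or plural -->
  <def-cat n="nom_sujet">
    <cat-item tags="n.sg.nom"/>
    <cat-item tags="n.pl.nom"/>
  </def-cat>
</section-def-cats>
</pre>

In a '''tags''' attribute, the tags are written without angle brackets and separated by dots.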


=== def-attrs section ===

The '''def-attrs''' section is mandatory. It groups by function the '''attribute''' names defined for words in the '''sdefs''' section of a [[morphological dictionary]]. For example, this section groups together all the tags corresponding to:

* the gender of a word
* the number of a word (singular, plural, ...)
* the person of a verb
* the tense of a verb
* ...




=== def-vars section ===

The '''def-vars''' section is mandatory and must contain at least one element with the following syntax: <code><def-var n="..."/></code>. It lists the global variables used in the transfer rules. However, the rules described on this page do not need any of these variables.

=== def-macros section ===

The '''def-macros''' section is optional. Nevertheless, it is very useful for writing shorter transfer files, because it avoids duplicating identical (or almost identical) operations performed in several transfer rules.


=== rules section ===

Finally, the '''rules''' section is mandatory. It is the longest section of the transfer file and the one that justifies its existence: it defines the operations to be performed to translate groups of words (or sometimes single words, as we will see).
== Examples of transfer rules ==


=== Transferring two words making them agree ===

We will start by translating into French the Esperanto determiner '''la''' followed by a common noun.

==== Search for modifications ====

In Esperanto, the definite determiner '''la''' is invariable, while in French it has three forms, '''le''', '''la''' and '''les''', according to the gender and number of the noun with which it agrees.

The common noun has two forms in Esperanto, depending on whether it belongs to the subject or to the direct object of the sentence. In French, it is written the same way in both cases.

'''Examples:'''


{|class=wikitable
! Esperanto !! Esperanto analysis !! French !! French analysis
|-
| la tago<br/>la tagon || ^la<det><def><sp>$ ^tago<n><sg><nom>$<br/>^la<det><def><sp>$ ^tago<n><sg><acc>$
|}


Let us examine what the lexical translation of the Esperanto analysis gives and compare it to the French analysis that we want to submit to the generator:

{|class=wikitable
! Esperanto analysis !! Esperanto analysis translated into French !! French analysis that we want to get
|-
| ^la<det><def><sp>$ ^tago<n><sg><nom>$<br/>^la<det><def><sp>$ ^tago<n><sg><acc>$
|}


We can note that:

* for the determiner, the lexical translation always gives ^le<det><def><sp>$. The last tag <sp> (singular or plural) will have to be replaced by the tags of the common noun that give its gender and number.
* for the common noun, the lexical translation found (in the [[bilingual dictionary]]) the gender of the noun translated into French. To know whether this noun is singular or plural, it kept the number attribute of the source language. However, the attribute <nom> or <acc>, which is not needed in French, was also kept, and it would prevent the word from being generated. This attribute will therefore have to be removed by the transfer rule.


==== Writing the transfer rule ====

For this first rule, we start from an "empty" file with the <code>.t1x</code> suffix, having the structure described [[Transfer rules examples#Structure of a .t1x file|here]].

As the '''def-macros''' section is optional and is not used by the first transfer rules described on this page, we will leave it out for now.

The '''def-vars''' section is mandatory. Although it is never used in the examples on this page, we will give it minimal content so that the <code>.t1x</code> file can be compiled:


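
A minimal sketch of such a section, assuming a single placeholder variable. The name '''unused''' is hypothetical; any name can be used, since no rule on this page reads the variable:

<pre>
<section-def-vars>
  <!-- hypothetical placeholder variable, never used by the rules on this page -->
  <def-var n="unused"/>
</section-def-vars>
</pre>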


The other sections may contain useful information for our first transfer rule.

===== def-cats section =====

In this section, we will define two word categories:

* determiners, named '''det''', which are identified in the analyses by the tag '''<det>''' followed by anything.
* common nouns, named '''nom_commun''', which are identified in the analyses by the tag '''<n>''' followed by anything.

The '''def-cats''' section will be written as follows:




* the names of the word categories are in the '''n''' attribute of the '''<def-cat n="...">''' tags
* the description of what must be found in the analysis to recognize the word category is in the '''tags''' attribute of the '''<cat-item tags="..."/>''' tags.
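
Based on the two categories described here, the section could look like the following sketch. It assumes the usual '''cat-item''' notation, in which tags are written without angle brackets, separated by dots, and <code>*</code> stands for any following tags:

<pre>
<section-def-cats>
  <!-- determiners: <det> followed by anything -->
  <def-cat n="det">
    <cat-item tags="det.*"/>
  </def-cat>
  <!-- common nouns: <n> followed by anything -->
  <def-cat n="nom_commun">
    <cat-item tags="n.*"/>
  </def-cat>
</section-def-cats>
</pre>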


===== def-attrs section =====

We will now define the possible attributes for the various tags of the words:




* in the '''n''' attribute of the '''<def-attr n="...">''' tags, we give a name to each characteristic of the words we want to process
* for each of these characteristics, the '''<attr-item tags="..."/>''' tags indicate its possible values.

For the rule we want to write, we have defined three characteristics:

* '''type_mot''' (word type; it is perhaps not mandatory, but there is no documented alternative solution). For now, the available types are:
** n (common noun)
** det (determiner)
:We will add others later when we write other rules.

* '''genre''' (gender), with the possible values:
** m (masculine)
** f (feminine)
** mf (masculine or feminine)

* '''nombre''' (number), with the possible values:
** sg (singular)
** pl (plural)
** sp (singular or plural)
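
The three characteristics listed here could be declared as in the following sketch, which only uses the attribute names and values given in the list:

<pre>
<section-def-attrs>
  <!-- word type -->
  <def-attr n="type_mot">
    <attr-item tags="n"/>
    <attr-item tags="det"/>
  </def-attr>
  <!-- gender -->
  <def-attr n="genre">
    <attr-item tags="m"/>
    <attr-item tags="f"/>
    <attr-item tags="mf"/>
  </def-attr>
  <!-- number -->
  <def-attr n="nombre">
    <attr-item tags="sg"/>
    <attr-item tags="pl"/>
    <attr-item tags="sp"/>
  </def-attr>
</section-def-attrs>
</pre>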


===== rules section =====

A '''rules''' section containing only the rule we want to write will contain:




The rule is made up of two parts: a '''<pattern>''' part and an '''<action>''' part.

In the '''<pattern>''' part, we specify the successive word categories that must be found in the analysis of the source text for the rule to apply. In this case, we have to find a determiner followed by a common noun. The values of the '''n''' attribute of the '''<pattern-item n="..."/>''' tags must all have been defined in the '''def-cats''' section, otherwise the rule can never be applied.
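
For a determiner followed by a common noun, and assuming the category names '''det''' and '''nom_commun''' declared in the '''def-cats''' section, the pattern could be sketched as:

<pre>
<pattern>
  <!-- first word: a determiner -->
  <pattern-item n="det"/>
  <!-- second word: a common noun -->
  <pattern-item n="nom_commun"/>
</pattern>
</pre>

The order of the '''pattern-item''' elements defines the positions (1, 2, ...) used later by the '''pos''' attribute.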


The most interesting part of the rule starts at the '''<action>''' tag. It has the following structure:

<pre>
<out>
  <lu>
    ... (generation of the lexical unit for the first word)
  </lu>
  <b />
  <lu>
    ... (generation of the lexical unit for the second word)
  </lu>
</out>
</pre>


In this rule, we only generate data that is sent to the output. The content of the '''<action>''' tag is therefore limited to text generation, which is indicated by the '''<out>''' tag.

We have to generate the analyses of two words in the target language. The analysis of each word is a [[lexical unit]] ('''<lu>''' tag), which will be represented in the output by the characters '''^...$''', where the description of the lexical unit replaces the dots.

Between the two lexical units, we leave a space ('''<b />''' tag); otherwise, the two generated words would be stuck together.

Let us examine how each lexical unit is written:


The first tag, '''<clip pos="1" side="tl" part="lem"/>''', has the following meaning, element by element:

{|class=wikitable
! Part !! Meaning
|-
| clip || A keyword that can be read as "get"
|-
| pos="1" || The position of the '''pattern-item''' in the '''<pattern>...</pattern>''' list of the rule. Here, pos="1" corresponds to the analysis of the determiner
|-
| side="tl" || The information is taken from the target language. To access the source language, we would write '''side="sl"'''
|-
| part="lem" || A reserved keyword corresponding to the lemma.
|-
|}


The third tag, '''<lit-tag v="def"/>''', has the following meaning, element by element:

{|class=wikitable
! Part !! Meaning
|-
| lit-tag || A keyword that can be read as "generate a tag"
|-
| v="def" || The content of the tag. In this case, '''<def>''' will be generated.
|-
|}


The five instructions needed to generate the analysis of the determiner have the following meaning:

{|class=wikitable
! width=280 | Instruction !! Meaning
|-
| <clip pos="1" side="tl" part="lem"/> || Get the lemma of the first word of the pattern in the target language. It will always be the French article "le".
|-
| <clip pos="1" side="tl" part="type_mot"/> || Get the type of the first word of the pattern in the target language. It will be '''det'''.
|-
| <lit-tag v="def"/> || Generate a '''def''' tag, that is, the text '''<def>''', which specifies that the determiner is ''definite''.
|-
| <clip pos="2" side="tl" part="genre"/> || Get the gender of the second word of the pattern in the target language, that is, the gender of the common noun.
|-
| <clip pos="2" side="tl" part="nombre"/> || Get the number of the second word of the pattern in the target language, that is, the number of the common noun.
|-
|}


Together, these five elements make up the lexical unit '''<lu>...</lu>''', which is sent to the output by the '''<out>...</out>''' tag.

For the second lexical unit, which corresponds to the translation of the common noun, every line contains '''pos="2" side="tl"''', which means that we simply copy some of the tags of the common noun (the second word of the rule). As the <nom> or <acc> tag is not among the copied tags, it disappears from the output.

Detailed explanation of the four instructions:

{|class=wikitable
! width=280 | Instruction !! Meaning
|-
| <clip pos="2" side="tl" part="lem"/> || Get the lemma of the second word of the pattern in the target language (the common noun in French).
|-
| <clip pos="2" side="tl" part="type_mot"/> || Get the type of the second word. It will be '''n'''.
|-
| <clip pos="2" side="tl" part="genre"/> || Get the gender of the common noun.
|-
| <clip pos="2" side="tl" part="nombre"/> || Get the number of the common noun.
|-
|}
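
Putting the pattern and the nine instructions described in these tables together, the whole rule could be sketched as follows. This is a reconstruction from the explanations on this page, not necessarily the exact text of the apertium-eo-fr file:

<pre>
<rule>
  <pattern>
    <pattern-item n="det"/>
    <pattern-item n="nom_commun"/>
  </pattern>
  <action>
    <out>
      <!-- determiner: lemma and type from word 1, gender and number from the noun -->
      <lu>
        <clip pos="1" side="tl" part="lem"/>
        <clip pos="1" side="tl" part="type_mot"/>
        <lit-tag v="def"/>
        <clip pos="2" side="tl" part="genre"/>
        <clip pos="2" side="tl" part="nombre"/>
      </lu>
      <b />
      <!-- common noun: the case tag (<nom> or <acc>) is not copied -->
      <lu>
        <clip pos="2" side="tl" part="lem"/>
        <clip pos="2" side="tl" part="type_mot"/>
        <clip pos="2" side="tl" part="genre"/>
        <clip pos="2" side="tl" part="nombre"/>
      </lu>
    </out>
  </action>
</rule>
</pre>

For ''la tago'', the analysis that this rule aims at is ^le<det><def><m><sg>$ ^jour<n><m><sg>$.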


===== Note =====

If we send the output of the transfer to the generator, we do not get exactly what is needed:

{|class=wikitable
! French analysis !! Result of the generation !! What is needed
|-
| ^le<det><def><m><sg>$ ^jour<n><m><sg>$ || ~le jour || le jour
|}

The replacement of the determiner '''le/la''' by '''l'''' according to the first letter of the following word is not done during generation but just afterwards, in the post-generation step, which processes the words marked with a ~. Having noted this, we will not mention post-generation again on this page.


=== Adding a word in the target language text ===

Esperanto has no indefinite determiner. To express '''un''', '''une''' or '''des''', we simply do not put the definite determiner ''la'' before the common noun. A common noun standing alone in Esperanto must therefore be preceded by the appropriate indefinite determiner '''un''', '''une''' or '''des''' when it is translated into French.

Our second rule will make this transformation.

Let us examine what the lexical translation of the Esperanto analysis gives and compare it to the French analysis that we want to submit to the generator:

{|class=wikitable
! Esperanto analysis !! Esperanto analysis translated into French !! French analysis that we want to get
|-
| ^tago<n><sg><nom>$<br/>^tago<n><sg><acc>$
|}


Compared to the previous rule, instead of generating '''^le<det><def><''gender''><''number''>$''' we will generate '''^un<det><ind><''gender''><''number''>$'''. Everything else is unchanged.

To write the new rule, we already have everything we need in the '''def-cats''' and '''def-attrs''' sections. We just have to add the new rule to the '''rules''' section, which becomes:

<pre>
    </pattern>
    <action>
      ... (see the contents in the preceding paragraph)
    </action>
  </rule>
</pre>


In this new rule, we find for the first time the '''lit''' instruction, which generates a string, as opposed to '''lit-tag''', which wraps the generated string in '''< >''' so that it becomes a tag.

As the source-language text to be transferred contains only one word (the common noun mentioned in the pattern), we access its attributes with '''pos="1"''', whereas it was '''pos="2"''' in the first rule.

The four instructions needed to generate the analysis of the indefinite determiner have the following meaning:

{|class=wikitable
! width=280 | Instruction !! Meaning
|-
| <lit v="un"/> || Generate the lemma "un".
|-
| <lit-tag v="det.ind"/> || Generate a '''det''' tag followed by an '''ind''' tag, that is, the text '''<det><ind>''', which specifies that we generate an ''indefinite determiner''.
|-
| <clip pos="1" side="tl" part="genre"/> || Get the gender of the common noun.
|-
| <clip pos="1" side="tl" part="nombre"/> || Get the number of the common noun.
|-
|}

The instructions that generate the French translation of the common noun are the same as in the previous rule, except that '''pos="1"''' is now used.
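
Assembled from the instructions described in this section, the second rule could be sketched as follows. It is a reconstruction from the explanations on this page and assumes the '''nom_commun''' category and the '''type_mot''', '''genre''' and '''nombre''' attributes defined earlier:

<pre>
<rule>
  <pattern>
    <!-- a common noun standing alone -->
    <pattern-item n="nom_commun"/>
  </pattern>
  <action>
    <out>
      <!-- added word: indefinite determiner agreeing with the noun -->
      <lu>
        <lit v="un"/>
        <lit-tag v="det.ind"/>
        <clip pos="1" side="tl" part="genre"/>
        <clip pos="1" side="tl" part="nombre"/>
      </lu>
      <b />
      <!-- the common noun itself, now at position 1 -->
      <lu>
        <clip pos="1" side="tl" part="lem"/>
        <clip pos="1" side="tl" part="type_mot"/>
        <clip pos="1" side="tl" part="genre"/>
        <clip pos="1" side="tl" part="nombre"/>
      </lu>
    </out>
  </action>
</rule>
</pre>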


=== Intervertir deux mots ===
=== Interchange two words ===


Now we will see a rule to change the order of two words during a translation.
Nous allons voir à présent une règle pour changer l'ordre de deux mots lors d'une traduction.


En Esperanto, il est préconisé de mettre l'adjectif avant le nom mais ce n'est pas imposé. Le traducteur Apertium espagnol -> Esperanto conserve l'ordre des mots de la phrase espagnole alors que le traducteur Apertium French -> Esperanto met l'adjectif avant le nom.
In Esperanto, it is recommended to put the adjective before the noun but it is not mandatory. The Apertium Spanish -> Esperanto translator preserves the word order of the Spanish sentence whereas The Apertium French -> Esperanto translator puts the adjective before the noun.


En French, la plupart des adjectifs se placent après le nom qu'ils qualifient, mais certains adjectifs se placent avant.
In French, most of the adjectives are placed after the noun they qualify, but some adjectives are placed before.


La solution complète traiterait tous les cas possibles en Esperanto comme en French. Nous allons nous limiter au cas le plus fréquent en réalisant une règle qui a partir d'une forme "la" + adjectif + nom en Esperanto, fournit une traduction du type "le/la/les" + nom + adjectif en French.
A complete solution would handle every possible case in both Esperanto and French. We will limit ourselves to the most frequent case by writing a rule which, starting from the form "la" + adjective + noun in Esperanto, produces a translation of the form "le/la/les" + noun + adjective in French.


==== Rajout dans la section def-cats ====
==== To be added in the def-cats section ====


Dans cette section, nous allons rajouter une catégorie pour les adjectifs :
In this section, we will add a category for the adjectives:


<pre>
<pre>
Line 602: Line 600:
</pre>
</pre>


==== Rajout dans la section def-attrs ====
==== To be added in the def-attrs section ====


In the words type list (type_mot), we add adjectives:
Dans les types de mots, on rajoute les adjectifs :


<pre>
<pre>
Line 614: Line 612:
</pre>
</pre>


==== Rajout de la règle qui va intervertir l'adjectif et le nom ====
==== Adding the rule which will invert the adjective and the noun ====


<pre>
<pre>
Line 652: Line 650:
</pre>
</pre>


On constate dans cette règle qu'on génère d'abord le déterminant (pos = 1), puis le nom (pos = 3 dans le pattern) et enfin l'adjectif (pos = 2 dans le pattern). Pour intervertir deux mots, il a suffit de générer les unités lexicales '''<lu>...</lu>''' dans un ordre différent.
Note that this rule generates the determiner first (pos = 1), then the noun (pos = 3 in the pattern) and finally the adjective (pos = 2 in the pattern). To swap two words, we only needed to generate the lexical units '''<lu>...</lu>''' in a different order.


In this rule, the determiner and the adjective agree in gender and number with the noun.
Dans cette règle, le déterminant et l'adjectif s'accordent en genre et en nombre avec le nom.
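
As a minimal sketch of the reordering described above (not the exact rule from the apertium-eo-fr pair), the '''<out>''' block could look like this. It assumes the attribute names ''type_mot'', ''genre'' and ''nombre'' used elsewhere on this page; the literal lemma '''le''' and the tag '''det.def''' for the definite determiner are assumptions made for illustration:

<pre>
<out>
  <lu> <!-- determiner: position 1 in the pattern, agreeing with the noun (position 3) -->
    <lit v="le"/>            <!-- assumed lemma of the French definite determiner -->
    <lit-tag v="det.def"/>   <!-- assumed tags for a definite determiner -->
    <clip pos="3" side="tl" part="genre"/>
    <clip pos="3" side="tl" part="nombre"/>
  </lu>
  <b/>
  <lu> <!-- noun: position 3 in the pattern, generated second -->
    <clip pos="3" side="tl" part="lem"/>
    <clip pos="3" side="tl" part="type_mot"/>
    <clip pos="3" side="tl" part="genre"/>
    <clip pos="3" side="tl" part="nombre"/>
  </lu>
  <b/>
  <lu> <!-- adjective: position 2 in the pattern, generated last, agreeing with the noun -->
    <clip pos="2" side="tl" part="lem"/>
    <clip pos="2" side="tl" part="type_mot"/>
    <clip pos="3" side="tl" part="genre"/>
    <clip pos="3" side="tl" part="nombre"/>
  </lu>
</out>
</pre>

The gender and number of all three lexical units are taken from the noun (position 3), which is what makes the determiner and the adjective agree with it.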


=== Changer des attributs en fonction de conditions ===
=== Changing attributes according to conditions ===


Now we will examine a rule that translates a personal pronoun followed by a verb, applying the conjugation rules.
A présent, nous allons examiner une règle permettant de traduire un pronom personnel suivi d'un verbe en appliquant les règles de conjugaison.


==== Recherche des modifications à apporter ====
==== Searching modifications to be made ====


* En Esperanto, le verbe est invariant par rapport au pronom personnel qui le précède (ou plus généralement par rapport au sujet)
* In Esperanto, the verb does not vary with the personal pronoun that precedes it (or, more generally, with the subject).
* En French, le verbe s'accorde avec la personne et le nombre du pronom personnel (mais pas son genre)
* In French, the verb agrees with the person and the number of the personal pronoun (but not with its gender).


In addition, some French personal pronouns have no specific equivalent in Esperanto, which in this respect is like English:
De plus, certains pronoms personnel du French n'ont pas d'équivalent spécifique en Esperanto qui sur ce point est comme l'anglais :


* '''tu''' (2ème personne du singulier) et '''vous''' (2ème personne du pluriel) en French sont tous deux traduits par '''vi''' en Esperanto.
* '''tu''' (second person singular) and '''vous''' (second person plural) in French are both translated by '''vi''' in Esperanto.
* '''ils''' et '''elles''' (les formes masculines et féminines de la 3ème personne du pluriel) sont traduites par '''ili''' en Esperanto.
* '''ils''' and '''elles''' (masculine and feminine forms of the 3rd person plural) are translated by '''ili''' in Esperanto.


pour passer de l'Esperanto au French, on fera donc des choix :
To translate from Esperanto to French, we will then have to make choices:


* '''vi''' → '''vous''' 2ème personne du pluriel ou forme de politesse pour s'adresser à une seule personne
* '''vi''' → '''vous''' second person plural or polite form to speak to a single person
* '''ili''' → '''ils''' on choisit le masculin pour la 3ème personne du pluriel en French.
* '''ili''' → '''ils''' we choose the masculine for the 3rd person plural in French.


De même, l'Esperanto ne dispose que d'un temps pour le passé là où le French en a quatre. En plus, dans une analyse, les dictionnaires Esperanto et French n'utilisent pas la même abréviation pour le présent de l'indicatif. Il faudra donc changer tout ça lors de la traduction.
Similarly, Esperanto has only one tense for the past where French has four. In addition, in an analysis, Esperanto and French dictionaries do not use the same abbreviation for the present indicative. It will thus be necessary to change all that during the translation.


Nous allons voir ce que tout cela donne pour le verbe '''kanti''' → '''chanter''' conjugué au présent de l'indicatif.
We will see what all this gives for the verb '''kanti''' → '''chanter''' conjugated in the present indicative.


{|class=wikitable
{|class=wikitable
! Esperanto !! Analyse Esperanto !! Analyse Esperanto traduite || L'analyse qu'on voudrait !! French
! Esperanto !! Esperanto analyses !! Esperanto analyses translated || The analysis we would like to get !! French
|-
|-
| mi kantas || ^prpers<prn><subj><p1><mf><sg>$<br/> ^kanti<vbtr_ntr><pres>$
| mi kantas || ^prpers<prn><subj><p1><mf><sg>$<br/> ^kanti<vbtr_ntr><pres>$
Line 712: Line 710:
|}
|}


==== Écriture de la règle de transfer ====
==== Writing the transfer rule ====


===== Rajouts dans la section def-cats =====
===== To be added in the def-cats section =====


In this section, we will add a category for pronouns and a category for verbs:
Dans cette section, nous allons rajouter une catégorie pour les pronoms et une catégorie pour les verbes :


<pre>
<pre>
Line 732: Line 730:
</pre>
</pre>


Comme il existe en Esperanto plusieurs formes pour les verbes, on a mis plusieurs '''cat-item''' pour les énumérer toutes.
As there are several verb forms in Esperanto, we use several '''cat-item''' elements to list them all.


===== Rajouts dans la section def-attrs =====
===== To be added in the def-attrs section =====


En ce qui concerne les verbes, différents mots clés sont utilisés en Esperanto, alors qu'en French, presque tous les verbes sont classés vblex.
For verbs, different keywords are used in Esperanto, whereas in French almost all verbs are classified as vblex.


In the words type list (type_mot), we add verbs (several possibilities) and pronouns:
Dans les types de mots, on rajoute les verbes (plusieurs possibilités) et les pronoms :


<pre>
<pre>
<def-attr n="type_mot">
<def-attr n="type_mot">
.......... (ce qu'il y avait avant)
.......... (what there was before)
<attr-item tags="prn"/>
<attr-item tags="prn"/>
<attr-item tags="vblex"/>
<attr-item tags="vblex"/>
Line 751: Line 749:
</pre>
</pre>


On rajoute aussi les 2 catégories ''personne'' et ''temps'' pour la conjugaison des verbes:
We also add the two categories ''personne'' and ''temps'' for the conjugation of verbs:


<pre>
<pre>
Line 769: Line 767:
</pre>
</pre>


Before writing the rules section, some changes are needed for the verb tenses and for the gender and number of pronouns.
Avant d'écrire la section rules, certaines transformations sont nécessaires pour le temps des verbes et pour le genre et le nombre des pronoms.


===== Transformation du temps =====
===== Transformation for the tense =====


Pour cet example, on se limitera aux temps de l'indicatif.
For this example, we will limit ourselves to the indicative tenses.


En Esperanto, il y a 3 temps pour l'indicatif :
In Esperanto, there are 3 indicative tenses:


* le passé : past
* the past : past
* le présent : pres
* the present : pres
* le futur : fti
* the future : fti


En French, il y a 6 temps plus ou moins courants pour l'indicatif :
In French, there are 6 more or less common tenses for the indicative:


* l'imparfait :pii
* the ''imparfait'' (imperfect) : pii
* le passé simple : ifi
* the ''passé simple'' (simple past) : ifi
* le passé composé qu'il faudrait fabriquer avec le verbe avoir + le participe passé.
* the ''passé composé'' (compound past) that should be made with the verb ''avoir'' (to have) + the past participle.
* le plus que parfait (même problème que pour le passé composé)
* the ''plus que parfait'' (pluperfect) (same problem as for the passé composé)
* le présent : pri
* the present : pri
* le futur : fti
* the future : fti


Pour les verbes au futur, l'attribut '''fti''' peut être conservé sans changement
For verbs in the future tense, the attribute '''fti''' can be kept unchanged.


Pour les verbes au présent, il faudra remplacer l'attribut '''pres''' de l'Esperanto par '''pri'''.
For verbs in the present tense, the attribute '''pres''' used in Esperanto must be replaced by '''pri'''.


Pour les verbes au passé, le passé composé serait pas mal pour une traduction, mais moins facile à générer. On va pour cet example remplacer l'attribut '''past''' de l'Esperanto par '''pii''' (imparfait).
For verbs in the past tense, the ''passé composé'' would be a good translation, but it is harder to generate. For this example we will replace the '''past''' attribute used in Esperanto by '''pii''' (imparfait).


In algorithmic form, that makes the following conditional transformations:
Sous forme algorithmique, cela donne les transformations conditionnelles suivantes :


<pre>
<pre>
SI temps = "pres" ALORS
IF temps = "pres" THEN
temps <- "pri"
temps <- "pri"
SINON SI temps = "past" ALORS
ELSE IF temps = "past" THEN
temps <- "pii"
temps <- "pii"
FIN SI
END IF
</pre>
</pre>


===== Transformation des attributs du pronom =====
===== Transformation of the pronoun attributes =====


For the pronoun, we will make the following changes:
Pour le pronom, on fera les changements suivants :


<pre>
<pre>
SI personne = "p2" ALORS
IF personne = "p2" THEN
nombre <- "pl"
nombre <- "pl"
SINON SI (personne = "p3" ET nombre = "pl" ALORS
ELSE IF (personne = "p3" AND nombre = "pl") THEN
genre <- "m"
genre <- "m"
FIN SI
END IF
</pre>
</pre>


===== Section rules =====
===== rules section =====


The new rule has the following contents:
La nouvelle règle a le contenu suivant :


<pre>
<pre>
Line 858: Line 856:
</choose>
</choose>


<choose> <!-- cas particuliers de transfers des pronoms -->
<choose> <!-- special cases for pronouns transfers -->
<when> <!-- 2ème personne toujours au pluriel : vi -> vous -->
<when> <!-- 2nd person always plural : vi -> vous -->
<test>
<test>
<equal>
<equal>
Line 871: Line 869:
</let>
</let>
</when>
</when>
<when> <!-- 3ème personne du pluriel toujours au masculin : ili -> ils -->
<when> <!-- 3rd person plural always masculine : ili -> ils -->
<test>
<test>
<and>
<and>
Line 912: Line 910:
</pre>
</pre>


Pour la première fois, la partie '''action''' de la règle ne se limite pas à un bloc '''<out>...</out>''', mais commence par deux blocs '''choose''' ayant chacun la structure suivante :
For the first time, the '''action''' part of the rule is not limited to an '''<out>...</out>''' block, but starts with two '''choose''' blocks, each with the following structure:


<pre>
<pre>
Line 918: Line 916:
<when>
<when>
<test>
<test>
.... (une condition)
.... (a condition)
</test>
</test>
<let>
<let>
.... (action si cette condition est réalisée)
.... (action if this condition is true)
</let>
</let>
</when>
</when>
<when>
<when>
<test>
<test>
.... (condition alternative à la précédente)
.... (alternative to the previous condition)
</test>
</test>
<let>
<let>
.... (action si la condition alternative est réalisée)
.... (action if the alternative condition is true)
</let>
</let>
</when>
</when>
Line 935: Line 933:
</pre>
</pre>


Examinons en détail le premier bloc '''<when>...</when>'''
Let us examine in detail the first block '''<when>...</when>'''


<pre>
<pre>
Line 952: Line 950:
</pre>
</pre>


We start with the innermost tags, then work outwards to the enclosing tags.
Nous commençons par l'intérieur des balises, puis on remontera vers les balises englobantes.


{|class=wikitable
{|class=wikitable
! width=260 | Instruction !! Signification
! width=260 | Instruction !! Meaning
|-
|-
| <clip pos="2" side="sl" part="temps"/> || Récupère l'attribut "temps" du 2ème mot concerné par la règle (c'est à dire le verbe) du coté source language
| <clip pos="2" side="sl" part="temps"/> || Get the "temps" attribute of the 2nd word matched by the rule (that is, the verb) on the source language side
|-
|-
| <lit-tag v="pres"/> || Génère une balise '''pres'''
| <lit-tag v="pres"/> || Generate a '''pres''' tag
|-
|-
| <equal>...</equal> || Vérifie si'il y a égalité entre les 2 valeurs précédentes
| <equal>...</equal> || Check if the 2 preceding values are equal
|-
|-
| <test>...</test> || Décide si on doit exécuter le bloc d'instruction placé juste après.
| <test>...</test> || Decide whether the block of instructions that follows must be executed.
|-
|-
|}
|}


Ensuite, voici ce qui est fait lorsque la condition du test est vérifiée :
Then, here is what is done when the test condition is true :


{|class=wikitable
{|class=wikitable
! width=260 | Instruction !! Signification
! width=260 | Instruction !! Meaning
|-
|-
| <clip pos="2" side="tl" part="temps"/> || Récupère (ou accède à) l'attribut "temps" du 2ème mot concerné par la règle (c'est à dire le verbe) du coté target language
| <clip pos="2" side="tl" part="temps"/> || Access the "temps" attribute of the 2nd word matched by the rule (that is, the verb) on the target language side
|-
|-
| <lit-tag v="pri"/> || Génère une balise '''pri'''
| <lit-tag v="pri"/> || Generate a '''pri''' tag
|-
|-
| <let>...</let> || Semble correspondre à une affectation de la 2ème valeur dans la première
| <let>...</let> || Assign the second value to the first one
|-
|-
|}
|}


De la même manière, le deuxième bloc '''<when>...</when>'''
In the same way, the second block '''<when>...</when>'''


<pre>
<pre>
Line 997: Line 995:
</pre>
</pre>


teste si le temps du verbe correspond au passé ("past") et dans ce cas lui donne la valeur "pii" pour la target language.
tests whether the tense of the verb is "past" and in this case gives it the value "pii" for the target language.


Dans les instructions conditionnelles qui concernent le pronom, on trouve un bloc '''test''' plus compliqué :
Inside the conditional instructions for the pronoun, there is a more complicated '''test''' block:


<pre>
<pre>
Line 1,016: Line 1,014:
</pre>
</pre>


à l'intérieur du bloc '''<and>...</and>''', il y a deux blocs '''<equal>...</equal>''' (il pourrait y en avoir davantage) et la condition est vraie si des deux égalités sont vérifiées simultanément : dans le cas présent "p3" pour l'attribut ''personne'' '''et''' "pl" pour l'attribut ''nombre''.
Inside the '''<and>...</and>''' block, there are two '''<equal>...</equal>''' blocks (there could be more), and the condition is true only if both equalities hold at the same time: in this case "p3" for the attribute ''personne'' '''and''' "pl" for the attribute ''nombre''.


Dans d'autres règles, on pourrait aussi trouver des blocs '''<or>...</or>''' pour lesquels la condition est vraie si au moins l'une des conditions présentes dans le bloc l'est.
In other rules, we could also find '''<or>...</or>''' blocks for which the condition is true if at least one of the conditions inside the block is.
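
As an illustrative sketch (not taken from the apertium-eo-fr rules), a '''test''' block using '''<or>''' could check whether the verb is in the present or in the future. It reuses the ''temps'' attribute and the '''pres''' and '''fti''' tags described above; the choice of condition is only an example:

<pre>
</pre>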


De même, il existe des balises '''<not>''' et '''</not>''' pour prendre l'opposé d'une condition. Si deux choses qu'on compare doivent être différentes, on écrira :
In the same way, the '''<not>''' and '''</not>''' tags negate a condition. If the two things being compared must be different, we write:


<pre>
<pre>
Line 1,030: Line 1,028:
</pre>
</pre>


Pour terminer, on pourrait se demander si les deux blocs '''choose''' de la règle qu'on vient d'étudier pourrait être regroupés en un seul.
Finally, we could wonder whether the two '''choose''' blocks of the rule we just studied could be combined into one.


Un essai montre que non. Lorsqu'à l'intérieur d'un bloc '''<choose>...</choose>''' on trouve plusieurs blocs '''<when>...</when>''', le premier de ces blocs pour lequel la condition est réalisée voit les instructions du bloc '''<let>...</let>''' exécutées, et ensuite, les autres blocs '''<when>...</when>''' qui suivent ne sont pas traités. Les différents tests à l'intérieur des blocs '''<when>...</when>''' concernent des conditions exclusives que l'on traduit en langage algorithmique par ''SINON SI''. Il existe d'ailleurs la possibilité de mettre un bloc '''<otherwise>...</otherwise>''' pour préciser ce qui doit être fait lorsqu'aucune des conditions des différents blocs '''<when>...</when>''' n'est réalisée. Ce qui correspond en langage algorithmique au mot-clé ''SINON''.
A test shows that the answer is no. When a '''<choose>...</choose>''' block contains several '''<when>...</when>''' blocks, the '''<let>...</let>''' instructions of the first '''<when>''' block whose condition is true are executed, and the remaining '''<when>...</when>''' blocks are skipped. The tests inside the '''<when>...</when>''' blocks therefore express mutually exclusive conditions, which correspond to ''ELSE IF'' in algorithmic language. It is also possible to add an '''<otherwise>...</otherwise>''' block to specify what must be done when none of the '''<when>...</when>''' conditions is true. This corresponds to the ''ELSE'' keyword in algorithmic language.
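
The correspondence with ''IF'' / ''ELSE IF'' / ''ELSE'' can be sketched on the tense transformation. The two '''<when>''' blocks follow the instructions described above; the '''<otherwise>''' branch is an assumption added for illustration (it explicitly copies the source tense, which should leave a tense such as '''fti''' unchanged) and is not part of the original rule:

<pre>
<choose>
  <when> <!-- IF temps = "pres" -->
    
    <let>
      <clip pos="2" side="tl" part="temps"/>
      <lit-tag v="pri"/>
    </let>
  </when>
  <when> <!-- ELSE IF temps = "past": only evaluated if the first test failed -->
    <test>
      <equal>
        <clip pos="2" side="sl" part="temps"/>
        <lit-tag v="past"/>
      </equal>
    </test>
    <let>
      <clip pos="2" side="tl" part="temps"/>
      <lit-tag v="pii"/>
    </let>
  </when>
  <otherwise> <!-- ELSE (illustrative only): keep the source tense, e.g. fti -->
    <let>
      <clip pos="2" side="tl" part="temps"/>
      <clip pos="2" side="sl" part="temps"/>
    </let>
  </otherwise>
</choose>
</pre>

Because only one branch of a '''choose''' is ever executed, the pronoun tests cannot be added to this block as extra '''<when>''' blocks: a verb in the present tense would stop at the first branch and the pronoun would never be examined.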


The end of the rule:
La fin de la règle :


<pre>
<pre>
Line 1,059: Line 1,057:
</pre>
</pre>


does not present any new difficulty. We output two lexical units, each corresponding to the translation of a word, using the new attribute values we just modified.
ne présente pas de nouvelle difficulté de compréhension. On va envoyer en sortie deux unités lexicales correspondants chacune à la traduction of a word, et pour le faire, on va utiliser les nouvelles valeurs des attributs que l'on a modifiés.


=== Writing only once instructions common to several rules ===
=== N'écrire qu'une fois des traitements communs à plusieurs règles ===


After writing a rule for a personal pronoun followed by a verb, we will add two more: one for a noun (the subject of the sentence) followed by a verb, and one for a determiner, followed by a noun (subject), followed by a verb.
Après avoir écrit une règle pour un pronom personnel suivi d'un verbe, nous allons en rajouter 2 autres pour un nom (sujet dans la phrase) suivi d'un verbe et pour un article (déterminant), suivi d'un nom (sujet), puis d'un verbe.


Une première nouveauté est qu'on ne va pas se contenter de chercher des groupes de mots (déterminant, nom, verbe, adjectif, ...) mais qu'on rajoute une contrainte : le nom doit faire partie du sujet de la phrase. En Esperanto, un nom qui sert de sujet n'est pas terminé par la lettre ''n'' et dans son analyse, on trouvera la balise '''<nom>''' (nominatif) alors que pour un complément d'objet, on a la balise '''<acc>''' (accusatif).
A first novelty is that we no longer only look for word groups (determiner, noun, verb, adjective, ...) but also add a constraint: the noun must belong to the subject of the sentence. In Esperanto, a noun used as the subject does not end with the letter ''n'', and its analysis contains the '''<nom>''' (nominative) tag, whereas a direct object has the '''<acc>''' (accusative) tag.


Par ailleurs, les deux nouvelles règles ont un point commun avec la règle précédente : il faudra faire des transformations sur le temps du verbe qui ne s'écrit pas pareil dans tous les cas en Esperanto et en French. Mais cette transformation va être le même dans toutes les règles comprenant un verbe conjugué. Donc, autant ne l'écrire qu'à un seul endroit et l'utiliser autant de fois que nécessaire. Outre l'économie de code, un seul exemplaire sera plus facile à compléter pour rajouter les temps du conditionnel et du subjonctif ou n'importe quelle autre correction. En programmation, on utilise des ''fonctions'' pour définir des morceaux de codes utilisés à plusieurs endroits du programme. Pour les transfer rules, ce sont des ''macros''.
In addition, the two new rules have something in common with the previous rule: the tense of the verb must be changed, because it is not always written the same way in Esperanto and French. This change is the same in every rule that includes a conjugated verb, so it is better to write it in one place and use it as often as necessary. Besides saving code, a single copy will be easier to extend with the conditional and subjunctive tenses, or to correct. In programming, ''functions'' define pieces of code used in several places of a program. In transfer rules, these are ''macros''.


==== Définition d'un type de mot avec des attributs ====
==== Define a word type with attributes ====


Pour définir un nom possédant l'attribut '''<nom>''' dans ses balises, il suffit de rajouter une catégorie :
To define a noun having the attribute '''<nom>''' in its tags, we just have to add a category:


<pre>
<pre>
Line 1,079: Line 1,077:
</pre>
</pre>


La page [[Introduction aux transfer rules]] précise que le .* lorsqu'il n'est pas placé à la fin signifie "une seule balise". C'est la cas pour les analyses de la plupart des noms Esperanto qui n'ont pas de genre. Toutefois, il semble que cette définition fonctionne aussi avec 2 balises entre le '''n''' et le '''<nom>'''. Sinon, au pire, pour les noms possédant un genre (humains et animaux), on pourrait rajouter un deuxième '''cat-item''' :
The page [[A long introduction to transfer rules]] specifies that .*, when not placed at the end, means "only one tag". This is the case for the analyses of most Esperanto nouns, which have no gender. However, this definition also seems to work with 2 tags between the '''n''' and the '''<nom>'''. Otherwise, at worst, for nouns that have a gender (humans and animals), we could add a second '''cat-item''':


<pre>
<pre>
Line 1,085: Line 1,083:
</pre>
</pre>


to specify 2 intermediate tags.
pour spécifier 2 balises intermédiaires.


==== Écriture d'une macro ====
==== Writing a macro ====


Maintenant, nous allons mettre dans une macro les opérations nécessaires au transfer du temps d'un verbe. Comme c'est notre première macro, il va falloir créer la section '''def-macros''' (qui est une section facultative) avec le contenu suivant :
Now we will put the operations needed to transfer the tense of a verb inside a macro. As this is our first macro, we need to create the '''def-macros''' section (which is optional) with the following contents:


<pre>
<pre>
<section-def-macros>
<section-def-macros>
<def-macro n="set_temps" npar="1"> <!-- concordance des temps -->
<def-macro n="set_temps" npar="1"> <!-- tenses concordance -->
<choose>
<choose>
<when>
<when>
Line 1,124: Line 1,122:
</pre>
</pre>


La seule vrai nouveauté est l'instruction : '''<def-macro n="set_temps" npar="1">''' :
The only real novelty is the instruction '''<def-macro n="set_temps" npar="1">''':


Elle contient 2 informations :
It contains two pieces of information:


{|class=wikitable
{|class=wikitable
! Paramètre !! Signification
! Parameter !! Meaning
|-
|-
| n="set_temps" || le nom qu'on donne à la macro
| n="set_temps" || the name given to the macro
|-
|-
| npar="1" || le nombre de paramètres de la macro
| npar="1" || the number of parameters of the macro
|-
|-
|}
|}


Ensuite, le code est identique à celui qu'on avait écrit pour la règle pronom personnel + verbe, à part que dans cette règle, on précisait '''pos="2"''' (le verbe était le 2ème mot du pattern), alors qu'ici, on a '''pos="1"''' qui est le numéro du paramètre de la macro. Or cette macro n'a besoin que d'un paramètre de type verbe pour fonctionner.
Then, the code is identical to the one written for the personal pronoun + verb rule, except that in that rule we specified '''pos="2"''' (the verb was the 2nd word of the pattern), whereas here we have '''pos="1"''', which is the number of the macro parameter. This macro only needs one parameter, a verb, to work.


==== Règles de transfer utilisant la macro ====
==== Transfer Rules using the macro ====


Let us now see how the macro is used in the previous rule (modified) and in the two new rules:
Voyons donc comment est utilisée la macro dans la règle précédente (transformée) et les deux nouvelles règles :


<pre>
<pre>
Line 1,151: Line 1,149:


<action>
<action>
<choose> <!-- cas particuliers de transfers des pronoms -->
<choose> <!-- special cases for pronouns transfers -->
<when> <!-- 2ème personne toujours au pluriel : vi -> vous -->
<when> <!-- 2nd person always plural : vi -> vous -->
<test>
<test>
<equal>
<equal>
Line 1,164: Line 1,162:
</let>
</let>
</when>
</when>
<when> <!-- 3ème personne du pluriel toujours au masculin : ili -> ils -->
<when> <!-- 3rd person plural always masculine : ili -> ils -->
<test>
<test>
<and>
<and>
Line 1,285: Line 1,283:
</pre>
</pre>


In the first two rules, corresponding to the following patterns:
Dans les deux premières règles correspondant aux patterns suivant :


<pre>
<pre>
Line 1,294: Line 1,292:
</pre>
</pre>


and
et


<pre>
<pre>
Line 1,303: Line 1,301:
</pre>
</pre>


on appelle la macro ainsi :
we call the macro as follows:


<pre>
<pre>
Line 1,311: Line 1,309:
</pre>
</pre>


alors que pour la dernière règle correspondant au pattern :
whereas for the last rule corresponding to the pattern:


<pre>
<pre>
Line 1,321: Line 1,319:
</pre>
</pre>


l'appel de la macro devient :
the macro call becomes:


<pre>
<pre>
Line 1,329: Line 1,327:
</pre>
</pre>


Dans chacun des 3 cas, la valeur de '''pos''' de la balise '''with-param''' correspond à la position du verbe dans le pattern. En procédant ainsi, on va transmettre à la macro toutes les informations concernant le verbe dans la source language et la target language.
In each of the three cases, the value of '''pos''' in the '''with-param''' tag corresponds to the position of the verb in the pattern. In this way, we pass the macro all the information about the verb in both the source language and the target language.


Et si on voulait faire une macro avec plusieurs paramètres, il y aurait autant de balises '''with-param''' que de paramètres dans l'appel de cette nouvelle macro.
And if we wanted to write a macro with several parameters, the call to that macro would contain as many '''with-param''' tags as there are parameters.
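
As a sketch of such a call, assume a hypothetical two-parameter macro named ''accord_verbe'' (it does not exist in the pair; it would be declared with '''npar="2"''') that needs both the subject noun and the verb of a determiner + noun + verb pattern:

<pre>
<call-macro n="accord_verbe"> <!-- hypothetical macro, assumed to be declared with npar="2" -->
  <with-param pos="2"/> <!-- the noun: seen as pos="1" inside the macro -->
  <with-param pos="3"/> <!-- the verb: seen as pos="2" inside the macro -->
</call-macro>
</pre>

Inside the macro, '''pos="1"''' and '''pos="2"''' then refer to the first and second '''with-param''' respectively, in the same way that '''pos="1"''' designates the verb in ''set_temps''.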


Le reste des deux dernières transfer rules n'offre pas de difficulté particulière :
The rest of the last two transfer rules presents no particular difficulty:


* we generate the analysis of a determiner which agrees with the noun
* on génère l'analyse d'un déterminant accordé au nom
* then that of the noun
* puis celle du nom


as we did in the rules without a verb.
comme on le faisait dans les règles qui n'avaient pas de verbe.

Then, we generate the analysis of the verb, using the '''temps''' attribute updated in the macro. This verb is conjugated in the 3rd person, with the number (singular or plural) of the subject noun of the sentence.

=== Using variables ===

To finish, we will examine a rule which requires storing a value in a variable.

This rule will translate a personal pronoun, followed by the verb être (to be), followed by another verb in the past participle.

We already know how to process a pronoun followed by a verb; it was done in the paragraph [[Transfer_rules_examples#Changing_attributes_according_to_conditions|Changing attributes according to conditions]]. What remains is to make the past participle agree with the personal pronoun. But there is a problem:

* with the 1st and 2nd person, the personal pronoun must have the gender '''mf''' (masculine/feminine) to be generated,
* for the past participle, the allowed genders are '''m''' and '''f''' (masculine or feminine, but only one of these).

Consequently, we cannot always use the same tag for the gender of the personal pronoun and the gender of the past participle. The idea is to derive the gender of the past participle from that of the personal pronoun and to use a variable to store the result.

The gender of the past participle is computed as follows:

<pre>
IF gender of pronoun = "mf" THEN
  genre_pp <- "m"
ELSE
  genre_pp <- gender of pronoun
END IF
</pre>

The variable that stores the gender of the past participle is called ''genre_pp''. When the personal pronoun is in the 1st or 2nd person, a deep analysis would be needed to find (maybe in a preceding sentence) the right gender for the past participle. Apertium does not allow this kind of complex analysis, so we choose the masculine in this case. If instead the personal pronoun is in the 3rd person, we use its gender for the past participle.

The first thing to do is to declare the variable. For that, the '''def-vars''' section becomes:

<pre>
<section-def-vars>
  <def-var n="genre_pp"/>
</section-def-vars>
</pre>

We have not yet written any rule using the conjugated verb être (to be) or a past participle. We therefore need to complete the '''def-cats''' section by adding these two declarations:

<pre>
<def-cat n="etre_conj">
  <cat-item tags="vbser.pres"/>
  <cat-item tags="vbser.past"/>
  <cat-item tags="vbser.fti"/>
</def-cat>

<def-cat n="verbe_pp">
  <cat-item tags="vbser.pp.*"/>
  <cat-item tags="vblex.pp.*"/>
  <cat-item tags="vbtr.pp.*"/>
  <cat-item tags="vbntr.pp.*"/>
  <cat-item tags="vbtr_ntr.pp.*"/>
</def-cat>
</pre>

The rule doing the required work is the following :

<pre>
<rule>
  <pattern>
    <pattern-item n="prn"/>
    <pattern-item n="etre_conj"/>
    <pattern-item n="verbe_pp"/>
  </pattern>

  <action>
    <choose> <!-- special cases for pronoun transfer -->
      <when> <!-- 2nd person always plural : vi -> vous -->
        <test>
          <equal>
            <clip pos="1" side="sl" part="personne"/>
            <lit-tag v="p2"/>
          </equal>
        </test>
        <let>
          <clip pos="1" side="tl" part="nombre"/>
          <lit-tag v="pl"/>
        </let>
      </when>
      <when> <!-- 3rd person plural always masculine : ili -> ils -->
        <test>
          <and>
            <equal>
              <clip pos="1" side="sl" part="personne"/>
              <lit-tag v="p3"/>
            </equal>
            <equal>
              <clip pos="1" side="sl" part="nombre"/>
              <lit-tag v="pl"/>
            </equal>
          </and>
        </test>
        <let>
          <clip pos="1" side="tl" part="genre"/>
          <lit-tag v="m"/>
        </let>
      </when>
    </choose>

    <choose> <!-- if gender of the pronoun is mf, gender of the past participle will be m -->
      <when>
        <test>
          <equal>
            <clip pos="1" side="tl" part="genre"/>
            <lit-tag v="mf"/>
          </equal>
        </test>
        <let>
          <var n="genre_pp"/>
          <lit-tag v="m"/>
        </let>
      </when>
      <otherwise>
        <let>
          <var n="genre_pp"/>
          <clip pos="1" side="tl" part="genre"/>
        </let>
      </otherwise>
    </choose>

    <call-macro n="set_temps"> <!-- the verb être is the 2nd word of the pattern -->
      <with-param pos="2"/>
    </call-macro>

    <out>
      <lu> <!-- personal pronoun -->
        <clip pos="1" side="tl" part="lem"/>
        <clip pos="1" side="tl" part="type_mot"/>
        <clip pos="1" side="tl" part="personne"/>
        <clip pos="1" side="tl" part="genre"/>
        <clip pos="1" side="tl" part="nombre"/>
      </lu>
      <b />
      <lu> <!-- verb être, agreeing in person and number with the pronoun -->
        <clip pos="2" side="tl" part="lem"/>
        <lit-tag v="vbser"/>
        <clip pos="2" side="tl" part="temps"/>
        <clip pos="1" side="tl" part="personne"/>
        <clip pos="1" side="tl" part="nombre"/>
      </lu>
      <b />
      <lu> <!-- past participle, with the gender stored in genre_pp -->
        <clip pos="3" side="tl" part="lem"/>
        <clip pos="3" side="tl" part="type_mot"/>
        <lit-tag v="pp"/>
        <var n="genre_pp"/>
        <clip pos="1" side="tl" part="nombre"/>
      </lu>
    </out>
  </action>
</rule>
</pre>

The really new part of the rule is this one :

<pre>
<choose> <!-- if gender of the pronoun is mf, gender of the past participle will be m -->
  <when>
    <test>
      <equal>
        <clip pos="1" side="tl" part="genre"/>
        <lit-tag v="mf"/>
      </equal>
    </test>
    <let>
      <var n="genre_pp"/>
      <lit-tag v="m"/>
    </let>
  </when>
  <otherwise>
    <let>
      <var n="genre_pp"/>
      <clip pos="1" side="tl" part="genre"/>
    </let>
  </otherwise>
</choose>
</pre>

It contains two assignments to the variable '''genre_pp''':

<pre>
<let>
<var n="genre_pp"/>
<lit-tag v="m"/>
</let>
</pre>

which stores the tag '''<m>''' in '''genre_pp''', and

<pre>
<let>
<var n="genre_pp"/>
<clip pos="1" side="tl" part="genre"/>
</let>
</pre>

which stores the gender of the personal pronoun in '''genre_pp'''.

Note also that this conditional processing uses the tags '''<otherwise>...</otherwise>''' for the first time.

The last thing to do is to use the variable '''genre_pp''' to generate the lexical unit for the past participle :

<pre>
<lu>
<clip pos="3" side="tl" part="lem"/>
<clip pos="3" side="tl" part="type_mot"/>
<lit-tag v="pp"/>
<var n="genre_pp"/>
<clip pos="1" side="tl" part="nombre"/>
</lu>
</pre>

The same instruction:

<pre>
<var n="genre_pp"/>
</pre>

is used both to assign a value to the variable (as the first element of a '''<let>''') and to read the value it contains.

Next, we generate the analysis of the verb, using the '''temps''' attribute updated in the macro. This verb is conjugated in the 3rd person, with the number (singular or plural) of the subject noun of the sentence.


[[Category:Documentation in English]]
[[Category:Transfer]]

Latest revision as of 20:16, 26 June 2018

En français

This page supplements the page A long introduction to transfer rules. The examples are taken from the apertium-eo-fr pair. At the beginning of 2013, this pair was released for translating French to Esperanto, but the Esperanto → French direction had not been implemented by the initial developer. Another developer, a complete beginner at writing transfer rules, chose to do it. The examples given are the first rules written to translate a group of one, two or three Esperanto words into a group of two or three French words.

This page only covers writing the .t1x file, whose rules are used by the tool apertium-transfer. Writing the tags used for chunking in a 3-stage transfer is not covered here.

Different steps for a translation with apertium

Let us start by listing the different operations performed during a translation.

{| class="wikitable"
! Operation !! Role !! Concerned languages
|-
| Deformatting
| Marks the zones of the source text that must not be translated. For example, the HTML tags of a Web page must not be translated, only its text.
| The same software is used for every language pair. The format of the data to be translated determines which deformatter is used.
|-
| Analysis
| Each word of the source text is decomposed into a lemma followed by the type of the word and its attributes (gender, number, person and tense for a verb ...). For some words, several analyses are possible; in this case, they are all sent to the output.
| Valid for every language; it uses the morphological dictionary of the source language.
|-
| Disambiguation
| When there are several analyses for a word, this step keeps only one.
| Valid for every language; it uses a file with the .prob suffix.<br />For unambiguous languages such as Esperanto, this step is still necessary to strip the surface form of each analysed word (pre-formatting for the transfer step).
|-
| Pre-transfer
| Processes multiwords before the transfer step.
| All languages. Does not require a particular data file.
|-
| Transfer
| Transforms analyses from the source language into their translated version in the target language.
| Valid for every language; it uses the bilingual dictionary and the transfer file with the .t1x suffix.
|-
| Interchunk processing
| Allows processing of groups of words (the subject, a complement ...).<br />As indicated above, we will not deal with this step (nor with the following one).
| Used a priori to simplify the transfer step, it requires several tags to be added during the transfer step. It uses a file with the .t2x suffix, and possibly other files if several passes of this kind are done.
|-
| Postchunk
| End of the interchunk processing(s).
| Needed if one or more interchunk processing steps were done. It uses a file with the .t3x suffix.
|-
| Generation
| Generates the surface forms of the target-language words from the lemma + attributes decomposition obtained in the previous steps.
| Valid for every language; it uses the morphological dictionary of the target language.
|-
| Post-generation
| Allows spelling corrections between adjacent words when particular cases are not handled by the generation.
| Used for many target languages (including French), perhaps not for all.
|-
| Reformatting
| Puts the translated data back into the format of the source document.
| The same software is used for every language pair. There is a reformatter for each available deformatter, even if every reformatter does similar work.
|}

The page Preparing to use apertium-transfer-tools shows how a Spanish sentence is changed at every step of the process, leading finally to an English translation.

How to find what must be done

Basically, the transfer step starts from a disambiguated analysis of the source-language text and provides an equivalent in the target language. The generation step then performs the inverse of the analysis. This has a consequence: the data given to the generator must be exactly what a new analysis of the translated text would give in the target language. Otherwise, the generation will be only partial: some words will be written as lemmas, with a # at the beginning.

Example :

We want to translate in French the 3 Esperanto words :

la aŭtomata traduko

After analysis and disambiguation, we get :

^la<det><def><sp>$ ^aŭtomata<adj><sg><nom>$ ^traduko<n><sg><nom>$

A lexical transfer step (using only the bilingual dictionary) will give :

^le<det><def><sp>$ ^automatique<adj><sg><nom>$ ^traduction<n><f><sg><nom>$

The part of sentence we want to get in French is :

la traduction automatique

When analysing this part of sentence, we get :

^le<det><def><f><sg>$ ^traduction<n><f><sg>$ ^automatique<adj><mf><sg>$

which is the text we must give to the generator to get the desired translation.

So, during the structural transfer step, we will have to do the following changes :

Origin :

^le<det><def><sp>$ ^automatique<adj><sg><nom>$ ^traduction<n><f><sg><nom>$

Result :

^le<det><def><f><sg>$ ^traduction<n><f><sg>$ ^automatique<adj><mf><sg>$

This is what transfer rules are written for. Their goal is to add or remove tags in the descriptions of words, and possibly to change the order of certain words.

Structure of a .t1x file

The file containing the transfer rules has the suffix .t1x. It is made of several mandatory sections and can also contain optional sections. Each section must contain at least one element.

 <?xml version="1.0" encoding="UTF-8"?>
 <transfer>
    <section-def-cats>
       ..........
    </section-def-cats>

    <section-def-attrs>
       ..........
    </section-def-attrs>

    <section-def-vars>
       ..........
    </section-def-vars>

    <section-def-macros>
       ..........
    </section-def-macros>

    <section-rules>
       ..........
    </section-rules>
 </transfer>

def-cats section

The def-cats section is mandatory. It declares the categories of words that we will look for in order to apply a particular transfer rule. These can be simple words (a determiner, a noun, an adjective, a verb, ...) or slightly more complex things, such as a noun whose description contains the tag <nom> (nominative), meaning it is part of the subject of the sentence.

This section contains one or more elements with the following structure:

    <def-cat n="name_of_what_we_want_to_describe">
      <cat-item tags="its_description"/>
      .... (there can be one or more <cat-item .../> tags)
    </def-cat>
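
For the noun-with-<nom> case mentioned above, a minimal sketch could look as follows. The category name nom_sujet is hypothetical (it is not used elsewhere on this page); the tag sequences follow the Esperanto analyses shown earlier, such as ^traduko<n><sg><nom>$.

```xml
<!-- Hypothetical category: common nouns in the nominative only -->
<def-cat n="nom_sujet">
  <cat-item tags="n.sg.nom"/>  <!-- singular nominative noun -->
  <cat-item tags="n.pl.nom"/>  <!-- plural nominative noun -->
</def-cat>
```

Each tag sequence is listed explicitly here, which keeps the sketch independent of any wildcard behaviour.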

def-attrs section

The def-attrs section is mandatory. It groups, by function, the attribute tags declared in the sdefs section of a morphological dictionary. For example, this section groups together all the tags that describe the:

  • gender of a word
  • number of a word (singular, plural, ...)
  • person of a verb
  • tense of a verb
  • ...

This section contains one or more elements with the following structure:

    <def-attr n="name_of_a_list_of_attributes_with_a_common_rule">
      <attr-item tags="an_attribute_of_the_sdef_section_of_a_dictionary"/>
      .... (we have several tags <attr-item .../> as many as possible
            values for the attribute)
    </def-attr>

def-vars section

The def-vars section is mandatory and must contain at least one element with the syntax <def-var n="..."/>. It lists the global variables used in the transfer rules. The first rules described on this page do not need any of these variables.
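
A declaration is one empty element per variable. As a minimal sketch, assuming a rule that has to remember a gender between two instructions, the section could look like this. The name genre_pp is the one used by the past-participle rule on this page; whether the real apertium-eo-fr file declares it exactly this way is an assumption.

```xml
<section-def-vars>
  <!-- global variable holding the gender to give to a past participle -->
  <def-var n="genre_pp"/>
</section-def-vars>
```

Inside a rule, the variable is then assigned or read with <var n="genre_pp"/>.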

def-macros section

The def-macros section is optional. Nevertheless, it is very useful for writing shorter transfer files, because it avoids duplicating identical (or nearly identical) operations in several transfer rules.

This section contains one or more elements with the following structure:

    <def-macro n="name_of_the_macro" npar="number_of_parameters">
      .... (the code of the macro)
    </def-macro>
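
As a hedged sketch (not the macro of the apertium-eo-fr pair), a one-parameter macro might look like the following. The name exemple_temps is hypothetical, and the sketch assumes that a temps attribute listing pres and pri has been declared in the def-attrs section. Inside a macro, pos="1" refers to the first parameter passed by the caller, not to the first word of the pattern.

```xml
<def-macro n="exemple_temps" npar="1">
  <!-- if the source-language tense is <pres>, set the target-language tense to <pri> -->
  <choose>
    <when>
      <test>
        <equal>
          <clip pos="1" side="sl" part="temps"/>
          <lit-tag v="pres"/>
        </equal>
      </test>
      <let>
        <clip pos="1" side="tl" part="temps"/>
        <lit-tag v="pri"/>
      </let>
    </when>
  </choose>
</def-macro>
```

A rule would apply it to the second word of its pattern with <call-macro n="exemple_temps"><with-param pos="2"/></call-macro>.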

rules section

Finally, the rules section is mandatory. It is the longest section of the transfer file and the one that justifies its existence: it defines the operations to be performed to translate groups of words (or sometimes single words, as we will see).

This section contains one or more elements with the following structure:

    <rule>
      <pattern>
        <pattern-item n="name_defined_in_def-cat_corresponding_to_the_first_word_to_process"/>
        .... (as many tags <pattern-item ..../> as words we want to process together)
      </pattern>
      <action>
        .... (description of the transfer rule)
      </action>
    </rule>

Examples of transfer rules

Transferring two words and making them agree

We will start by translating into French the Esperanto determiner la followed by a common noun.

Search for modifications

In Esperanto, the definite determiner la is invariable, while in French it has three forms, le, la and les, according to the gender and number of the noun with which it agrees.

For the common noun, there are two forms in Esperanto, depending on whether it belongs to the subject or to the object complement of the sentence. In French, it is written the same way in both cases.

Examples :

{| class="wikitable"
! Esperanto !! Esperanto analyses !! French !! French analyses
|-
| la tago<br />la tagon
| ^la<det><def><sp>$ ^tago<n><sg><nom>$<br />^la<det><def><sp>$ ^tago<n><sg><acc>$
| le jour
| ^le<det><def><m><sg>$ ^jour<n><m><sg>$
|-
| la nokto<br />la nokton
| ^la<det><def><sp>$ ^nokto<n><sg><nom>$<br />^la<det><def><sp>$ ^nokto<n><sg><acc>$
| la nuit
| ^le<det><def><f><sg>$ ^nuit<n><f><sg>$
|-
| la tagoj<br />la tagojn
| ^la<det><def><sp>$ ^tago<n><pl><nom>$<br />^la<det><def><sp>$ ^tago<n><pl><acc>$
| les jours
| ^le<det><def><mf><pl>$ ^jour<n><m><pl>$
|-
| la noktoj<br />la noktojn
| ^la<det><def><sp>$ ^nokto<n><pl><nom>$<br />^la<det><def><sp>$ ^nokto<n><pl><acc>$
| les nuits
| ^le<det><def><mf><pl>$ ^nuit<n><f><pl>$
|}

Let us examine what the lexical translation of the Esperanto analysis gives and compare it to the French analysis we want to submit to the generator:

{| class="wikitable"
! Esperanto analyses !! Esperanto analyses translated into French !! French analyses we want to obtain
|-
| ^la<det><def><sp>$ ^tago<n><sg><nom>$<br />^la<det><def><sp>$ ^tago<n><sg><acc>$
| ^le<det><def><sp>$ ^jour<n><m><sg><nom>$<br />^le<det><def><sp>$ ^jour<n><m><sg><acc>$
| ^le<det><def><m><sg>$ ^jour<n><m><sg>$
|-
| ^la<det><def><sp>$ ^nokto<n><sg><nom>$<br />^la<det><def><sp>$ ^nokto<n><sg><acc>$
| ^le<det><def><sp>$ ^nuit<n><f><sg><nom>$<br />^le<det><def><sp>$ ^nuit<n><f><sg><acc>$
| ^le<det><def><f><sg>$ ^nuit<n><f><sg>$
|-
| ^la<det><def><sp>$ ^tago<n><pl><nom>$<br />^la<det><def><sp>$ ^tago<n><pl><acc>$
| ^le<det><def><sp>$ ^jour<n><m><pl><nom>$<br />^le<det><def><sp>$ ^jour<n><m><pl><acc>$
| ^le<det><def><m><pl>$ ^jour<n><m><pl>$
|-
| ^la<det><def><sp>$ ^nokto<n><pl><nom>$<br />^la<det><def><sp>$ ^nokto<n><pl><acc>$
| ^le<det><def><sp>$ ^nuit<n><f><pl><nom>$<br />^le<det><def><sp>$ ^nuit<n><f><pl><acc>$
| ^le<det><def><f><pl>$ ^nuit<n><f><pl>$
|}

We can note :

  • for the determiner, the lexical translation always gives ^le<det><def><sp>$ . The last tag <sp> (singular or plural) will have to be replaced by the gender and number tags of the common noun.
  • for the common noun, the lexical translation found (in the bilingual dictionary) the gender of the noun translated into French. The number attribute is carried over from the source language. But the attribute <nom> or <acc>, which is not needed in French, is also kept and can prevent the word from being generated. The transfer rule will therefore have to remove this attribute.

Writing the transfer rule

For this first rule, we start from an "empty" file with the .t1x suffix, having the structure described above.

As the def-macros section is optional and not used by the first transfer rules described on this page, we leave it out for now.

The def-vars section is mandatory. Although the first examples on this page never use it, we give it a minimal content so that the .t1x file can be compiled:

  <section-def-vars>
    <def-var n="aucune_variable"/>
  </section-def-vars>

The other sections may contain useful information for our first transfer rule.

def-cats section

In this section, we define 2 word categories:

  • determiners, named det, which are identified in the analysis by the tag <det> followed by anything.
  • common nouns, named nom_commun, which are identified in the analysis by the tag <n> followed by anything.

The def-cats section is written as follows:

  <section-def-cats>
    <def-cat n="det">
      <cat-item tags="det.*"/>
    </def-cat>

    <def-cat n="nom_commun">
      <cat-item tags="n.*"/>
    </def-cat>
  </section-def-cats>
  • the names of the word categories are in the attribute n of the <def-cat n="..."> tags
  • the descriptions of what must be found in the analysis to recognize the word category are in the attribute tags of the <cat-item tags="..."/> tags.

def-attrs section

Now we define the possible values of the various attributes of words:

  <section-def-attrs>
    <def-attr n="type_mot">
      <attr-item tags="n"/>
      <attr-item tags="det"/>
    </def-attr>

    <def-attr n="genre">
      <attr-item tags="m"/>
      <attr-item tags="f"/>
      <attr-item tags="mf"/>
    </def-attr>

    <def-attr n="nombre">
      <attr-item tags="sg"/>
      <attr-item tags="pl"/>
      <attr-item tags="sp"/>
    </def-attr>
  </section-def-attrs>
  • In the n attribute of the <def-attr n="..."> tags, we give a name to each characteristic of the words we want to process
  • for each of these characteristics, the <attr-item tags="..."/> tags indicate its possible values.

For the rule we want to write, we defined 3 characteristics:

  • type_mot (possibly mandatory; in any case, no alternative solution is documented). For now, the available types are:
    • n (common noun)
    • det (determiner)
    We will add others later when we write other rules.
  • genre, with the possible values
    • m (masculine)
    • f (feminine)
    • mf (masculine or feminine)
  • nombre, with the possible values
    • sg (singular)
    • pl (plural)
    • sp (singular or plural)

rules section

A rules section containing only the rule we want to write will contain:

  <section-rules>
    <rule>
      <pattern>
        <pattern-item n="det"/>
        <pattern-item n="nom_commun"/>
      </pattern>
      <action>
        <out>
          <lu>
            <clip pos="1" side="tl" part="lem"/>
            <clip pos="1" side="tl" part="type_mot"/>
            <lit-tag v="def"/>
            <clip pos="2" side="tl" part="genre"/>
            <clip pos="2" side="tl" part="nombre"/>
          </lu>
          <b />
          <lu>
            <clip pos="2" side="tl" part="lem"/>
            <clip pos="2" side="tl" part="type_mot"/>
            <clip pos="2" side="tl" part="genre"/>
            <clip pos="2" side="tl" part="nombre"/>
          </lu>
        </out>
      </action>
    </rule>
  </section-rules>

The rule is made of 2 sections :

      <pattern>
        <pattern-item n="det"/>
        <pattern-item n="nom_commun"/>
      </pattern>

In this part, we specify the successive categories of words that must be found in the analysis of the source text for the rule to apply. In this case, we must find a determiner followed by a common noun. The names given in the n attribute of the <pattern-item n="..."/> tags must all have been defined in the def-cats section; otherwise the rule could never be applied.

The most interesting part of the rule starts at the <action> tag. It has the following structure:

      <action>
        <out>
          <lu>
            ... (generation of the lexical unit for the first word)
          </lu>
          <b />
          <lu>
            ... (generation of the lexical unit for the second word)
          </lu>
        </out>
      </action>

In this rule, we only generate data that we send to the output. The contents of the <action> tag are therefore limited to the generation of the text given in the <out> tag.

We have to generate the analyses of 2 words in the target language. The analysis of each word is a lexical unit (<lu> tag), which appears on output between the characters ^...$, where the description of the lexical unit replaces the dots.

Between the two lexical units, we leave a space (<b /> tag); otherwise, the two generated words would be stuck together.

Let us examine how the lexical units are written:

The first tag, <clip pos="1" side="tl" part="lem"/>, has the following meaning, element by element:

{| class="wikitable"
! Part !! Meaning
|-
| clip
| This is a keyword which can be translated as "get"
|-
| pos="1"
| It is the number of the pattern-item in the list <pattern>...</pattern> of the rule. Here, pos="1" corresponds to the analysis of the determiner
|-
| side="tl"
| We get the information from the target language. To access the source language, we would write side="sl"
|-
| part="lem"
| This is a reserved keyword corresponding to the lemma.
|}

The third tag, <lit-tag v="def"/>, has the following meaning, element by element:

{| class="wikitable"
! Part !! Meaning
|-
| lit-tag
| This is a keyword which can be translated as "generate a tag"
|-
| v="def"
| Here we specify the contents of the tag. In this case, <def> will be generated.
|}

The 5 instructions needed to generate the analysis of the determiner have the following meaning:

{| class="wikitable"
! Instruction !! Meaning
|-
| <clip pos="1" side="tl" part="lem"/>
| Get the lemma of the first word of the pattern in the target language. It will always be the French article "le".
|-
| <clip pos="1" side="tl" part="type_mot"/>
| Get the type of the first word of the pattern in the target language. It will be det.
|-
| <lit-tag v="def"/>
| Generate a def tag, that is the text <def>, which specifies that the determiner is definite.
|-
| <clip pos="2" side="tl" part="genre"/>
| Get the gender of the second word of the pattern in the target language, that is the gender of the common noun.
|-
| <clip pos="2" side="tl" part="nombre"/>
| Get the number of the second word of the pattern in the target language, that is the number of the common noun.
|}

The 5 elements we obtained constitute the lexical unit <lu>...</lu> that is sent to the output by the <out>...</out> tag.

For the second lexical unit, corresponding to the translation of the common noun, every line contains pos="2" side="tl", meaning that we simply copy several tags of the common noun (2nd word of the rule).

Detailed explanation of the four instructions:

{| class="wikitable"
! Instruction !! Meaning
|-
| <clip pos="2" side="tl" part="lem"/>
| Get the lemma of the second word of the pattern in the target language (the common noun in French).
|-
| <clip pos="2" side="tl" part="type_mot"/>
| Get the type of the second word. That will be n.
|-
| <clip pos="2" side="tl" part="genre"/>
| Get the gender of the common noun.
|-
| <clip pos="2" side="tl" part="nombre"/>
| Get the number of the common noun.
|}

Note

If we send the result of the transfer to the generator, we do not get exactly what is needed:

{| class="wikitable"
! French analysis !! Result of the generation !! What is needed
|-
| ^le<det><def><m><sg>$ ^jour<n><m><sg>$
| ~le jour
| le jour
|-
| ^le<det><def><f><sg>$ ^nuit<n><f><sg>$
| ~la nuit
| la nuit
|-
| ^le<det><def><mf><pl>$ ^jour<n><m><pl>$
| ~les jours
| les jours
|-
| ^le<det><def><mf><pl>$ ^nuit<n><f><pl>$
| ~les nuits
| les nuits
|-
| ^le<det><def><m><sg>$ ^arbre<n><m><sg>$
| ~le arbre
| l'arbre
|-
| ^le<det><def><f><sg>$ ^histoire<n><f><sg>$
| ~la histoire
| l'histoire
|-
| ^le<det><def><m><pl>$ ^arbre<n><m><pl>$
| ~les arbres
| les arbres
|-
| ^le<det><def><f><pl>$ ^histoire<n><f><pl>$
| ~les histoires
| les histoires
|}

The replacement of the determiner le/la by l' according to the first letter of the following word is not done during generation but just after, during the post-generation step, which processes the words marked with a ~ . Having noted this, we will not mention post-generation again on this page.

Adding a word in the target language text

Esperanto does not have any indefinite determiner. To express what French renders with un, une or des, Esperanto simply omits the definite determiner la before the common noun. A common noun written alone in Esperanto must therefore be preceded by the correct indefinite determiner un, une or des when it is translated into French.

Our second rule will make this transformation.

Let us examine what the lexical translation of the Esperanto analysis gives and compare it to the French analysis we want to submit to the generator:

{| class="wikitable"
! Esperanto analysis !! Esperanto analysis translated to French !! French analysis that we want to get
|-
| ^tago<n><sg><nom>$<br />^tago<n><sg><acc>$
| ^jour<n><m><sg><nom>$<br />^jour<n><m><sg><acc>$
| ^un<det><ind><m><sg>$ ^jour<n><m><sg>$
|-
| ^nokto<n><sg><nom>$<br />^nokto<n><sg><acc>$
| ^nuit<n><f><sg><nom>$<br />^nuit<n><f><sg><acc>$
| ^un<det><ind><f><sg>$ ^nuit<n><f><sg>$
|-
| ^tago<n><pl><nom>$<br />^tago<n><pl><acc>$
| ^jour<n><m><pl><nom>$<br />^jour<n><m><pl><acc>$
| ^un<det><ind><m><pl>$ ^jour<n><m><pl>$
|-
| ^nokto<n><pl><nom>$<br />^nokto<n><pl><acc>$
| ^nuit<n><f><pl><nom>$<br />^nuit<n><f><pl><acc>$
| ^un<det><ind><f><pl>$ ^nuit<n><f><pl>$
|}

Compared to the previous rule, instead of generating ^le<det><def><gender><number>$ we will generate ^un<det><ind><gender><number>$. Everything else is unchanged.

To write the new rule, we already have everything we need in the def-cats and def-attrs sections. So we just have to add the new rule to the rules section, which becomes:

  <section-rules>
    <rule>
      <pattern>
        <pattern-item n="det"/>
        <pattern-item n="nom_commun"/>
      </pattern>
      <action>
        ... (see the contents in the preceding paragraph)
      </action>
    </rule>

    <rule>
      <pattern>
        <pattern-item n="nom_commun"/>
      </pattern>
      <action>
        <out>
          <lu>
            <lit v="un"/>
            <lit-tag v="det.ind"/>
            <clip pos="1" side="tl" part="genre"/>
            <clip pos="1" side="tl" part="nombre"/>
          </lu>
          <b />
          <lu>
            <clip pos="1" side="tl" part="lem"/>
            <clip pos="1" side="tl" part="type_mot"/>
            <clip pos="1" side="tl" part="genre"/>
            <clip pos="1" side="tl" part="nombre"/>
          </lu>
        </out>
      </action>
    </rule>
  </section-rules>

In this new rule, we meet for the first time the instruction lit, which generates a plain string, unlike lit-tag, which wraps the generated string in < > so that it becomes a tag.

As the source-language text to be transferred contains only one word (the common noun mentioned in the pattern), we access its attributes with pos="1", whereas it was pos="2" in the first rule.

The 4 instructions needed to generate the analysis of the indefinite determiner have the following meaning:

{| class="wikitable"
! Instruction !! Meaning
|-
| <lit v="un"/>
| Generate the lemma "un".
|-
| <lit-tag v="det.ind"/>
| Generate a det tag followed by an ind tag, that is the text <det><ind>, which specifies that we generate an indefinite determiner.
|-
| <clip pos="1" side="tl" part="genre"/>
| Get the gender of the common noun.
|-
| <clip pos="1" side="tl" part="nombre"/>
| Get the number of the common noun.
|}

The instructions that generate the French translation of the common noun are the same as in the previous rule, except that now pos="1".

Interchange two words

Now we will see a rule that changes the order of two words during translation.

In Esperanto, it is recommended to put the adjective before the noun, but it is not mandatory. The Apertium Spanish → Esperanto translator preserves the word order of the Spanish sentence, whereas the Apertium French → Esperanto translator puts the adjective before the noun.

In French, most adjectives are placed after the noun they qualify, but some are placed before.

A complete solution would process all the possible cases in Esperanto as well as in French. We limit ourselves to the most frequent case by writing a rule which, starting from a form "la" + adjective + noun in Esperanto, produces a translation of the form "le/la/les" + noun + adjective in French.

To be added in the def-cats section

In this section, we add a category for adjectives:

    <def-cat n="adj">
      <cat-item tags="adj.*"/>
    </def-cat>

To be added in the def-attrs section

In the list of word types (type_mot), we add adjectives:

    <def-attr n="type_mot">
      <attr-item tags="n"/>
      <attr-item tags="det"/>
      <attr-item tags="adj"/>
    </def-attr>

Adding the rule which will invert the adjective and the noun

    <rule>
      <pattern>
        <pattern-item n="det"/>
        <pattern-item n="adj"/>
        <pattern-item n="nom_commun"/>
      </pattern>

      <action>
        <out>
          <lu>
            <clip pos="1" side="tl" part="lem"/>
            <clip pos="1" side="tl" part="type_mot"/>
            <lit-tag v="def"/>
            <clip pos="3" side="tl" part="genre"/>
            <clip pos="3" side="tl" part="nombre"/>
          </lu>
          <b />
          <lu>
            <clip pos="3" side="tl" part="lem"/>
            <clip pos="3" side="tl" part="type_mot"/>
            <clip pos="3" side="tl" part="genre"/>
            <clip pos="3" side="tl" part="nombre"/>
          </lu>
          <b />
          <lu>
            <clip pos="2" side="tl" part="lem"/>
            <clip pos="2" side="tl" part="type_mot"/>
            <clip pos="3" side="tl" part="genre"/>
            <clip pos="3" side="tl" part="nombre"/>
          </lu>
        </out>
      </action>
    </rule>

Note that this rule generates first the determiner (pos = 1), then the noun (pos = 3 in the pattern) and finally the adjective (pos = 2 in the pattern). To swap two words, we only need to generate the lexical units <lu>...</lu> in a different order.

In this rule, the determiner and the adjective agree in gender and number with the noun.
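
The other Esperanto order, "la" + noun + adjective, is not matched by the rule above. A minimal sketch of a companion rule, assuming the same det, nom_commun and adj categories, simply keeps the source order. It is an illustration only, not a rule taken from the apertium-eo-fr pair.

```xml
<rule>
  <pattern>
    <pattern-item n="det"/>
    <pattern-item n="nom_commun"/>
    <pattern-item n="adj"/>
  </pattern>

  <action>
    <out>
      <lu> <!-- determiner, agreeing with the noun (pos 2) -->
        <clip pos="1" side="tl" part="lem"/>
        <clip pos="1" side="tl" part="type_mot"/>
        <lit-tag v="def"/>
        <clip pos="2" side="tl" part="genre"/>
        <clip pos="2" side="tl" part="nombre"/>
      </lu>
      <b />
      <lu> <!-- noun, kept in second position -->
        <clip pos="2" side="tl" part="lem"/>
        <clip pos="2" side="tl" part="type_mot"/>
        <clip pos="2" side="tl" part="genre"/>
        <clip pos="2" side="tl" part="nombre"/>
      </lu>
      <b />
      <lu> <!-- adjective, kept last and agreeing with the noun (pos 2) -->
        <clip pos="3" side="tl" part="lem"/>
        <clip pos="3" side="tl" part="type_mot"/>
        <clip pos="2" side="tl" part="genre"/>
        <clip pos="2" side="tl" part="nombre"/>
      </lu>
    </out>
  </action>
</rule>
```

Only the pos values differ from the swapping rule: here the lexical units are generated in the same order as the pattern.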

Changing attributes according to conditions

Now we will examine a rule that translates a personal pronoun followed by a verb, applying the conjugation rules.

Searching modifications to be made

  • In Esperanto, the verb does not vary with the personal pronoun that precedes it (or, more generally, with the subject).
  • In French, the verb agrees with the person and the number of the personal pronoun (but not with its gender).

In addition, some French personal pronouns have no specific equivalent in Esperanto, which in this respect is like English:

  • tu (second person singular) and vous (second person plural) in French are both translated by vi in Esperanto.
  • ils and elles (masculine and feminine forms of the 3rd person plural) are both translated by ili in Esperanto.

To translate from Esperanto to French, we therefore have to make choices:

  • vi → vous: second person plural, or polite form to address a single person
  • ili → ils: we choose the masculine for the 3rd person plural in French.

Similarly, Esperanto has only one past tense where French has four. In addition, the Esperanto and French dictionaries do not use the same abbreviation for the present indicative in their analyses. All this will have to be changed during the translation.

Let us see what all this gives for the verb kanti → chanter conjugated in the present indicative.

{| class="wikitable"
! Esperanto !! Esperanto analyses !! Esperanto analyses translated !! The analysis we would like to get !! French
|-
| mi kantas
| ^prpers<prn><subj><p1><mf><sg>$ ^kanti<vbtr_ntr><pres>$
| ^prpers<prn><p1><mf><sg>$ ^chanter<vblex><pres>$
| ^prpers<prn><p1><mf><sg>$ ^chanter<vblex><pri><p1><sg>$
| je chante
|-
| vi kantas
| ^prpers<prn><subj><p2><mf><sp>$ ^kanti<vbtr_ntr><pres>$
| ^prpers<prn><p2><mf><sp>$ ^chanter<vblex><pres>$
| ^prpers<prn><p2><mf><pl>$ ^chanter<vblex><pri><p2><pl>$
| tu chantes → vous chantez
|-
| li kantas
| ^prpers<prn><subj><p3><m><sg>$ ^kanti<vbtr_ntr><pres>$
| ^prpers<prn><p3><m><sg>$ ^chanter<vblex><pres>$
| ^prpers<prn><p3><m><sg>$ ^chanter<vblex><pri><p3><sg>$
| il chante
|-
| ŝi kantas
| ^prpers<prn><subj><p3><f><sg>$ ^kanti<vbtr_ntr><pres>$
| ^prpers<prn><p3><f><sg>$ ^chanter<vblex><pres>$
| ^prpers<prn><p3><f><sg>$ ^chanter<vblex><pri><p3><sg>$
| elle chante
|-
| ni kantas
| ^prpers<prn><subj><p1><mf><pl>$ ^kanti<vbtr_ntr><pres>$
| ^prpers<prn><p1><mf><pl>$ ^chanter<vblex><pres>$
| ^prpers<prn><p1><mf><pl>$ ^chanter<vblex><pri><p1><pl>$
| nous chantons
|-
| vi kantas
| ^prpers<prn><subj><p2><mf><sp>$ ^kanti<vbtr_ntr><pres>$
| ^prpers<prn><p2><mf><sp>$ ^chanter<vblex><pres>$
| ^prpers<prn><p2><mf><pl>$ ^chanter<vblex><pri><p2><pl>$
| vous chantez
|-
| ili kantas
| ^prpers<prn><subj><p3><mf><pl>$ ^kanti<vbtr_ntr><pres>$
| ^prpers<prn><p3><mf><pl>$ ^chanter<vblex><pres>$
| ^prpers<prn><p3><m><pl>$ ^chanter<vblex><pri><p3><pl>$
| ils chantent<br />(elles chantent)
|}

Writing the transfer rule

To be added in the def-cats section

In this section, we add a category for pronouns and a category for verbs:

    <def-cat n="prn">
      <cat-item tags="prn.*"/>
    </def-cat>

    <def-cat n="verbe">
      <cat-item tags="vbser.*"/>
      <cat-item tags="vblex.*"/>
      <cat-item tags="vbtr.*"/>
      <cat-item tags="vbntr.*"/>
      <cat-item tags="vbtr_ntr.*"/>
    </def-cat>

As the Esperanto dictionary uses many verb types, we use several cat-item elements to list them all.

To be added in the def-attrs section

Depending on the verb, different keywords are used in Esperanto, whereas in French almost all verbs are classified as vblex.

In the list of word types (type_mot), we add verbs (several possibilities) and pronouns:

    <def-attr n="type_mot">
      .......... (what there was before)
      <attr-item tags="prn"/>
      <attr-item tags="vblex"/>
      <attr-item tags="vbmod"/>
      <attr-item tags="vbser"/>
      <attr-item tags="vbhaver"/>
    </def-attr>

We also add the two categories personne and temps for the conjugation of verbs:

    <def-attr n="personne">
      <attr-item tags="p1"/>
      <attr-item tags="p2"/>
      <attr-item tags="p3"/>
    </def-attr>

    <def-attr n="temps">
      <attr-item tags="pres"/>
      <attr-item tags="past"/>
      <attr-item tags="pri"/>
      <attr-item tags="pii"/>
      <attr-item tags="fti"/>
    </def-attr>

Before writing the rules section, some changes are needed for the verb tenses and for the gender and number of pronouns.

Transformation for the tense

For this example, we limit ourselves to the indicative tenses.

In Esperanto, there are 3 indicative tenses:

  • the past : past
  • the present : pres
  • the future : fti

In French, there are 6 more or less common tenses for the indicative:

  • the imparfait (imperfect): pii
  • the passé simple (simple past): ifi
  • the passé composé (compound past), which has to be built with the verb avoir (to have) + the past participle
  • the plus-que-parfait (pluperfect), which raises the same problem as the passé composé
  • the present: pri
  • the future: fti

For verbs in the future, the attribute fti can be kept unchanged.

For verbs in the present, the attribute pres used in Esperanto must be replaced by pri.

For verbs in the past, the compound past would give a nicer translation, but it is harder to generate. For this example, we will replace the past attribute used in Esperanto by pii (imparfait).

In algorithmic form, that makes the following conditional transformations:

 IF temps = "pres" THEN
     temps <- "pri"
 ELSE IF temps = "past" THEN
     temps <- "pii"
 END IF

Transformation of the pronoun attributes

For the pronoun, we will do the following changes:

 IF personne = "p2" THEN
     nombre <- "pl"
 ELSE IF (personne = "p3" AND nombre = "pl") THEN
     genre <- "m"
 END IF

rules section

The new rule has the following contents:

    <rule>
      <pattern>
        <pattern-item n="prn"/>
        <pattern-item n="verbe"/>
      </pattern>

      <action>

        <choose>
          <when>
            <test>
               <equal>
                  <clip pos="2" side="sl" part="temps"/>
                  <lit-tag v="pres"/>
                </equal>
            </test>
            <let>
              <clip pos="2" side="tl" part="temps"/>
              <lit-tag v="pri"/>
            </let>
          </when>
          <when>
            <test>
               <equal>
                  <clip pos="2" side="sl" part="temps"/>
                  <lit-tag v="past"/>
                </equal>
            </test>
            <let>
              <clip pos="2" side="tl" part="temps"/>
              <lit-tag v="pii"/>
            </let>
          </when>
        </choose>

        <choose>    <!--  special cases for pronouns transfers  -->
          <when>    <!--  2nd person always plural : vi -> vous  -->
            <test>
               <equal>
                  <clip pos="1" side="sl" part="personne"/>
                  <lit-tag v="p2"/>
                </equal>
            </test>
            <let>
              <clip pos="1" side="tl" part="nombre"/>
              <lit-tag v="pl"/>
            </let>
          </when>
          <when>    <!--  3rd person plural always masculine : ili -> ils  -->
            <test>
               <and>
                 <equal>
                    <clip pos="1" side="sl" part="personne"/>
                    <lit-tag v="p3"/>
                  </equal>
                 <equal>
                    <clip pos="1" side="sl" part="nombre"/>
                    <lit-tag v="pl"/>
                  </equal>
                </and>
            </test>
            <let>
              <clip pos="1" side="tl" part="genre"/>
              <lit-tag v="m"/>
            </let>
          </when>
        </choose>

        <out>
          <lu>
            <clip pos="1" side="tl" part="lem"/>
            <clip pos="1" side="tl" part="type_mot"/>
            <clip pos="1" side="tl" part="personne"/>
            <clip pos="1" side="tl" part="genre"/>
            <clip pos="1" side="tl" part="nombre"/>
          </lu>
          <b />
          <lu>
            <clip pos="2" side="tl" part="lem"/>
            <clip pos="2" side="tl" part="type_mot"/>
            <clip pos="2" side="tl" part="temps"/>
            <clip pos="1" side="tl" part="personne"/>
            <clip pos="1" side="tl" part="nombre"/>
          </lu>
        </out>
      </action>
    </rule>

For the first time, the action part of the rule is not limited to an <out>...</out> block, but starts with two choose blocks, each having the following structure:

        <choose>
          <when>
            <test>
               .... (a condition)
            </test>
            <let>
               .... (action if this condition is true)
            </let>
          </when>
          <when>
            <test>
               .... (alternative to the previous condition)
            </test>
            <let>
               .... (action if the alternative condition is true)
            </let>
          </when>
        </choose>

Let us examine in detail the first block <when>...</when>

          <when>
            <test>
               <equal>
                  <clip pos="2" side="sl" part="temps"/>
                  <lit-tag v="pres"/>
                </equal>
            </test>
            <let>
              <clip pos="2" side="tl" part="temps"/>
              <lit-tag v="pri"/>
            </let>
          </when>

We start with the innermost tags, then work outwards to the enclosing tags.

Instruction | Meaning
<clip pos="2" side="sl" part="temps"/> | Get the "temps" attribute of the 2nd word matched by the rule (the verb) on the source language side
<lit-tag v="pres"/> | Generate a pres tag
<equal>...</equal> | Check whether the two preceding values are equal
<test>...</test> | Decide whether the block of instructions that follows must be executed

Then, here is what is done when the test condition is true:

Instruction | Meaning
<clip pos="2" side="tl" part="temps"/> | Access the "temps" attribute of the 2nd word matched by the rule (the verb) on the target language side
<lit-tag v="pri"/> | Generate a pri tag
<let>...</let> | Assign the second value to the first one

In the same way, the second <when>...</when> block

          <when>
            <test>
               <equal>
                  <clip pos="2" side="sl" part="temps"/>
                  <lit-tag v="past"/>
                </equal>
            </test>
            <let>
              <clip pos="2" side="tl" part="temps"/>
              <lit-tag v="pii"/>
            </let>
          </when>

tests whether the tense of the verb is "past" and in this case gives it the value "pii" for the target language.

Inside the conditional instructions for the pronoun, there is a more complicated test block:

            <test>
               <and>
                 <equal>
                    <clip pos="1" side="sl" part="personne"/>
                    <lit-tag v="p3"/>
                  </equal>
                 <equal>
                    <clip pos="1" side="sl" part="nombre"/>
                    <lit-tag v="pl"/>
                  </equal>
                </and>
            </test>

Inside the <and>...</and> block, there are two <equal>...</equal> blocks (there could be more), and the condition is true only if all the equalities hold at the same time: in this case "p3" for the attribute personne and "pl" for the attribute nombre.

In other rules, we could also find <or>...</or> blocks, for which the condition is true if at least one of the conditions inside the block is true.
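As an illustration only (this test is not part of the apertium-eo-fr rules shown here), a condition that is true when the verb is either in the present or in the past could be sketched like this:

            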

In the same way, there are <not> and </not> tags to take the opposite of a condition. If two things we compare must be different, we will write:

     <not>
       <equal>
          ......
       </equal>
     </not>
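
For instance, a test that is true when the pronoun is not in the 3rd person could be written as follows. It is only a sketch reusing the personne attribute defined above and it does not appear in the rules of this page:

            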

To finish, we could wonder whether the two choose blocks of the rule we just studied could be combined into a single one.

Trying it shows that the answer is no. When a <choose>...</choose> block contains several <when>...</when> blocks, only the first one whose condition is true has its <let>...</let> instructions executed; the following <when>...</when> blocks are then skipped. The tests inside the <when>...</when> blocks of one <choose> therefore behave as mutually exclusive alternatives, which we write as ELSE IF in algorithmic language. Here, the tense of the verb and the person of the pronoun are independent of each other, so both changes may be needed for the same pair of words, and each needs its own <choose> block. It is also possible to add an <otherwise>...</otherwise> block to specify what must be done when none of the <when>...</when> conditions is true. It corresponds to the ELSE keyword in algorithmic language.
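
To make the problem concrete, here is a shortened sketch of what a combined block would look like (only one tense case and one pronoun case are kept). It is shown as a counter-example and should not be used: for "vi" followed by a verb in the present, the first <when> is true, so the second one is never examined and the pronoun keeps its original number.

        <choose>
          <when>    <!--  tense: true for every verb in the present  -->
            
            <let>
              <clip pos="2" side="tl" part="temps"/>
              <lit-tag v="pri"/>
            </let>
          </when>
          <when>    <!--  pronoun: skipped whenever the previous when was true  -->
            <test>
               <equal>
                  <clip pos="1" side="sl" part="personne"/>
                  <lit-tag v="p2"/>
                </equal>
            </test>
            <let>
              <clip pos="1" side="tl" part="nombre"/>
              <lit-tag v="pl"/>
            </let>
          </when>
        </choose>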

The end of the rule:

       ......
        <out>
          <lu>
            <clip pos="1" side="tl" part="lem"/>
            <clip pos="1" side="tl" part="type_mot"/>
            <clip pos="1" side="tl" part="personne"/>
            <clip pos="1" side="tl" part="genre"/>
            <clip pos="1" side="tl" part="nombre"/>
          </lu>
          <b />
          <lu>
            <clip pos="2" side="tl" part="lem"/>
            <clip pos="2" side="tl" part="type_mot"/>
            <clip pos="2" side="tl" part="temps"/>
            <clip pos="1" side="tl" part="personne"/>
            <clip pos="1" side="tl" part="nombre"/>
          </lu>
        </out>
      </action>
    </rule>

does not present any new difficulty. We output two lexical units, each corresponding to the translation of one word, and to do this we use the new attribute values we have just set.

Writing only once instructions common to several rules

After writing a rule for a personal pronoun followed by a verb, we will add two others: one for a noun (subject of the sentence) followed by a verb, and one for a determiner, followed by a noun (subject), followed by a verb.

A first novelty is that we no longer only look for word groups (determiner, noun, verb, adjective, ...) but add a constraint: the noun must be the subject of the sentence. In Esperanto, a noun used as the subject does not end with the letter n, and its analysis contains the <nom> (nominative) tag, whereas a direct object has the <acc> (accusative) tag.

In addition, the two new rules have something in common with the previous rule: the tense of the verb has to be changed, because it is not written the same way in Esperanto and in French. This change is the same in every rule that includes a conjugated verb, so it is better to write it in one place and to use it as often as necessary. Besides saving code, a single copy will be easier to extend with the conditional and subjunctive tenses, or to correct. When programming, we use functions to define pieces of code used in several places of the program. For transfer rules, these are macros.

Define a word type with attributes

To define a noun having the attribute <nom> in its tags, we just have to add a category:

    <def-cat n="nom_sujet">
      <cat-item tags="n.*.nom"/>
    </def-cat>

The page A long introduction to transfer rules specifies that .*, when not placed at the end, means "exactly one tag". This fits the analysis of most Esperanto nouns, which do not have a gender. However, this definition also seems to work with 2 tags between the n and the <nom>. If it did not, for nouns having a gender (humans and animals), we could add a second cat-item:

      <cat-item tags="n.*.*.nom"/>

to specify 2 intermediate tags.
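
As a sketch, the category with both alternatives would then look as follows. Whether the second cat-item is really needed depends on how the .* wildcard is matched, which was not checked here:

    <def-cat n="nom_sujet">
      <cat-item tags="n.*.nom"/>      <!--  one tag between n and nom  -->
      <cat-item tags="n.*.*.nom"/>    <!--  two tags between n and nom  -->
    </def-cat>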

Writing a macro

Now, we will put into a macro the operations needed to transfer the tense of a verb. As it is our first macro, we have to create the def-macros section (which is an optional section) with the following contents:

  <section-def-macros>
    <def-macro n="set_temps" npar="1">    <!--  tenses concordance  -->
      <choose>
        <when>
          <test>
             <equal>
                <clip pos="1" side="sl" part="temps"/>
                <lit-tag v="pres"/>
              </equal>
          </test>
          <let>
            <clip pos="1" side="tl" part="temps"/>
            <lit-tag v="pri"/>
          </let>
        </when>
        <when>
          <test>
             <equal>
                <clip pos="1" side="sl" part="temps"/>
                <lit-tag v="past"/>
              </equal>
          </test>
          <let>
            <clip pos="1" side="tl" part="temps"/>
            <lit-tag v="pii"/>
          </let>
        </when>
      </choose>
    </def-macro>
  </section-def-macros>

The only real novelty is the instruction <def-macro n="set_temps" npar="1">.

It carries two pieces of information:

Parameter | Meaning
n="set_temps" | the name given to the macro
npar="1" | the number of parameters of the macro

The rest of the code is identical to the one written for the rule personal pronoun + verb, except that in that rule we specified pos="2" (the verb was the 2nd word of the pattern), whereas here we have pos="1", which is the number of the macro parameter. This macro needs only one parameter, a verb, to work.

Transfer Rules using the macro

Let us now see how the macro is used in the previous rule (modified) and in the two new rules:

    <rule>
      <pattern>
        <pattern-item n="prn"/>
        <pattern-item n="verbe"/>
      </pattern>

      <action>
        <choose>    <!--  special cases for pronouns transfers  -->
          <when>    <!--  2nd person always plural : vi -> vous  -->
            <test>
               <equal>
                  <clip pos="1" side="sl" part="personne"/>
                  <lit-tag v="p2"/>
                </equal>
            </test>
            <let>
              <clip pos="1" side="tl" part="nombre"/>
              <lit-tag v="pl"/>
            </let>
          </when>
          <when>    <!--  3rd person plural always masculine : ili -> ils  -->
            <test>
               <and>
                 <equal>
                    <clip pos="1" side="sl" part="personne"/>
                    <lit-tag v="p3"/>
                  </equal>
                 <equal>
                    <clip pos="1" side="sl" part="nombre"/>
                    <lit-tag v="pl"/>
                  </equal>
                </and>
            </test>
            <let>
              <clip pos="1" side="tl" part="genre"/>
              <lit-tag v="m"/>
            </let>
          </when>
        </choose>

        <call-macro n="set_temps">
          <with-param pos="2"/>
        </call-macro>

        <out>
          <lu>
            <clip pos="1" side="tl" part="lem"/>
            <clip pos="1" side="tl" part="type_mot"/>
            <clip pos="1" side="tl" part="personne"/>
            <clip pos="1" side="tl" part="genre"/>
            <clip pos="1" side="tl" part="nombre"/>
          </lu>
          <b />
          <lu>
            <clip pos="2" side="tl" part="lem"/>
            <clip pos="2" side="tl" part="type_mot"/>
            <clip pos="2" side="tl" part="temps"/>
            <clip pos="1" side="tl" part="personne"/>
            <clip pos="1" side="tl" part="nombre"/>
          </lu>
        </out>
      </action>
    </rule>

    <rule>
      <pattern>
        <pattern-item n="nom_sujet"/>
        <pattern-item n="verbe"/>
      </pattern>

      <action>
        <call-macro n="set_temps">
          <with-param pos="2"/>
        </call-macro>

        <out>
          <lu>
            <lit v="un"/>
            <lit-tag v="det.ind"/>
            <clip pos="1" side="tl" part="genre"/>
            <clip pos="1" side="tl" part="nombre"/>
          </lu>
          <b />
          <lu>
            <clip pos="1" side="tl" part="lem"/>
            <clip pos="1" side="tl" part="type_mot"/>
            <clip pos="1" side="tl" part="genre"/>
            <clip pos="1" side="tl" part="nombre"/>
          </lu>
          <b />
          <lu>
            <clip pos="2" side="tl" part="lem"/>
            <clip pos="2" side="tl" part="type_mot"/>
            <clip pos="2" side="tl" part="temps"/>
            <lit-tag v="p3"/>
            <clip pos="1" side="tl" part="nombre"/>
          </lu>
        </out>
      </action>
    </rule>

    <rule>
      <pattern>
        <pattern-item n="det"/>
        <pattern-item n="nom_sujet"/>
        <pattern-item n="verbe"/>
      </pattern>

      <action>
        <call-macro n="set_temps">
          <with-param pos="3"/>
        </call-macro>

        <out>
          <lu>
            <clip pos="1" side="tl" part="lem"/>
            <clip pos="1" side="tl" part="type_mot"/>
            <lit-tag v="def"/>
            <clip pos="2" side="tl" part="genre"/>
            <clip pos="2" side="tl" part="nombre"/>
          </lu>
          <b />
          <lu>
            <clip pos="2" side="tl" part="lem"/>
            <clip pos="2" side="tl" part="type_mot"/>
            <clip pos="2" side="tl" part="genre"/>
            <clip pos="2" side="tl" part="nombre"/>
          </lu>
          <b />
          <lu>
            <clip pos="3" side="tl" part="lem"/>
            <clip pos="3" side="tl" part="type_mot"/>
            <clip pos="3" side="tl" part="temps"/>
            <lit-tag v="p3"/>
            <clip pos="2" side="tl" part="nombre"/>
          </lu>
        </out>
      </action>
    </rule>

In the first two rules, corresponding to the following patterns:

      <pattern>
        <pattern-item n="prn"/>
        <pattern-item n="verbe"/>
      </pattern>

and

      <pattern>
        <pattern-item n="nom_sujet"/>
        <pattern-item n="verbe"/>
      </pattern>

we call the macro as follows:

        <call-macro n="set_temps">
          <with-param pos="2"/>
        </call-macro>

whereas for the last rule corresponding to the pattern:

      <pattern>
        <pattern-item n="det"/>
        <pattern-item n="nom_sujet"/>
        <pattern-item n="verbe"/>
      </pattern>

the macro call becomes:

        <call-macro n="set_temps">
          <with-param pos="3"/>
        </call-macro>

In each of the three cases, the value of pos in the with-param tag is the position of the verb in the pattern. In this way, we pass to the macro all the information about the verb, in both the source language and the target language.

If we wanted to write a macro with several parameters, the call would contain as many with-param tags as the macro has parameters.
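
As a sketch of what this could look like, here is a hypothetical macro with two parameters. The name copie_nombre is invented for this illustration and does not exist in apertium-eo-fr. It copies the target-language number of its first parameter to its second parameter:

    <def-macro n="copie_nombre" npar="2">    <!--  hypothetical macro, for illustration only  -->
      <let>
        <clip pos="2" side="tl" part="nombre"/>
        <clip pos="1" side="tl" part="nombre"/>
      </let>
    </def-macro>

In a rule whose pattern is a pronoun followed by a verb, the call would then contain two with-param tags, the first giving the position of the pronoun and the second the position of the verb:

        <call-macro n="copie_nombre">
          <with-param pos="1"/>
          <with-param pos="2"/>
        </call-macro>

Inside the macro, pos="1" and pos="2" refer to the first and second with-param of the call, not directly to positions in the pattern.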

The rest of the last two transfer rules does not present any particular difficulty:

  • we generate the analysis of a determiner which agrees with the noun
  • then that of the noun

as we did in the rules without a verb.

Then, we generate the analysis of the verb, using the temps attribute updated by the macro. This verb is conjugated in the 3rd person, with the number (singular or plural) of the subject noun.

Using variables

To finish, we will examine a rule that needs to store a value in a variable.

This rule translates a personal pronoun, followed by the verb être (to be), followed by another verb in the past participle.

We already know how to process a pronoun followed by a verb: it was done in the paragraph Changing attributes according to conditions. What remains is to make the past participle agree with the personal pronoun. But there is a problem:

  • in the 1st and 2nd person, the personal pronoun must have the gender mf (masculine/feminine) to be generated,
  • for the past participle, the allowed genders are m and f (masculine or feminine, but only one of them).

Consequently, we cannot always use the same tag for the gender of the personal pronoun and the gender of the past participle. The idea is to derive the gender of the past participle from that of the personal pronoun and to store the result in a variable.

Calculation of the gender of the past participle is the following:

 IF gender of pronoun = "mf" THEN
     genre_pp <- "m"
 ELSE
     genre_pp <- gender of pronoun
 END IF

The variable which stores the gender of the past participle is called genre_pp. When the personal pronoun is in the 1st or 2nd person, a deep analysis would be needed to find (maybe in a preceding sentence) the right gender for the agreement of the past participle. Apertium does not allow this kind of complex analysis, so we choose the masculine in this case. If, on the contrary, the personal pronoun is in the 3rd person, we use its gender for the past participle.

The first thing to do is to declare the variable. For that, the def-vars section becomes:

  <section-def-vars>
    <def-var n="genre_pp"/>
  </section-def-vars>

So far we have not written any rule using the conjugated verb être (to be) or a past participle. We therefore have to complete the def-cats section by adding the two following declarations:

    <def-cat n="etre_conj">
      <cat-item tags="vbser.pres"/>
      <cat-item tags="vbser.past"/>
      <cat-item tags="vbser.fti"/>
    </def-cat>

    <def-cat n="verbe_pp">
      <cat-item tags="vbser.pp.*"/>
      <cat-item tags="vblex.pp.*"/>
      <cat-item tags="vbtr.pp.*"/>
      <cat-item tags="vbntr.pp.*"/>
      <cat-item tags="vbtr_ntr.pp.*"/>
    </def-cat>

The rule doing the required work is the following:

    <rule>
      <pattern>
        <pattern-item n="prn"/>
        <pattern-item n="etre_conj"/>
        <pattern-item n="verbe_pp"/>
      </pattern>

      <action>
        <choose>    <!--  particular case for pronouns transfers  -->
          <when>    <!--  2nd person always plural : vi -> vous  -->
            <test>
               <equal>
                  <clip pos="1" side="sl" part="personne"/>
                  <lit-tag v="p2"/>
                </equal>
            </test>
            <let>
              <clip pos="1" side="tl" part="nombre"/>
              <lit-tag v="pl"/>
            </let>
          </when>
          <when>    <!--  3rd person plural always masculine : ili -> ils  -->
            <test>
               <and>
                 <equal>
                    <clip pos="1" side="sl" part="personne"/>
                    <lit-tag v="p3"/>
                  </equal>
                 <equal>
                    <clip pos="1" side="sl" part="nombre"/>
                    <lit-tag v="pl"/>
                  </equal>
                </and>
            </test>
            <let>
              <clip pos="1" side="tl" part="genre"/>
              <lit-tag v="m"/>
            </let>
          </when>
        </choose>

        <choose>    <!--  if gender of the pronoun is mf, gender of the past participle will be m  -->
          <when>
            <test>
               <equal>
                  <clip pos="1" side="tl" part="genre"/>
                  <lit-tag v="mf"/>
                </equal>
            </test>
            <let>
              <var n="genre_pp"/>
              <lit-tag v="m"/>
            </let>
          </when>
          <otherwise>
            <let>
              <var n="genre_pp"/>
              <clip pos="1" side="tl" part="genre"/>
            </let>
          </otherwise>
        </choose>

        <call-macro n="set_temps">
          <with-param pos="2"/>
        </call-macro>

        <out>
          <lu>
            <clip pos="1" side="tl" part="lem"/>
            <clip pos="1" side="tl" part="type_mot"/>
            <clip pos="1" side="tl" part="personne"/>
            <clip pos="1" side="tl" part="genre"/>
            <clip pos="1" side="tl" part="nombre"/>
          </lu>
          <b />
          <lu>
            <clip pos="2" side="tl" part="lem"/>
            <lit-tag v="vbser"/>
            <clip pos="2" side="tl" part="temps"/>
            <clip pos="1" side="tl" part="personne"/>
            <clip pos="1" side="tl" part="nombre"/>
          </lu>
          <b />
          <lu>
            <clip pos="3" side="tl" part="lem"/>
            <clip pos="3" side="tl" part="type_mot"/>
            <lit-tag v="pp"/>
            <var n="genre_pp"/>
            <clip pos="1" side="tl" part="nombre"/>
          </lu>
        </out>
      </action>
    </rule>

The really new part of the rule is this one:

    <choose>    <!--  if gender of the pronoun is mf, gender of the past participle will be m  -->
      <when>
        <test>
           <equal>
              <clip pos="1" side="tl" part="genre"/>
              <lit-tag v="mf"/>
            </equal>
        </test>
        <let>
          <var n="genre_pp"/>
          <lit-tag v="m"/>
        </let>
      </when>
      <otherwise>
        <let>
          <var n="genre_pp"/>
          <clip pos="1" side="tl" part="genre"/>
        </let>
      </otherwise>
    </choose>

It contains two assignments to the variable genre_pp:

        <let>
          <var n="genre_pp"/>
          <lit-tag v="m"/>
        </let>

which puts the tag <m> into genre_pp, and

        <let>
          <var n="genre_pp"/>
          <clip pos="1" side="tl" part="genre"/>
        </let>

which puts the gender of the personal pronoun into genre_pp.

We can also notice that this conditional processing uses the <otherwise>...</otherwise> tags for the first time.

The last thing to do is to use the variable genre_pp to generate the lexical unit for the past participle:

    <lu>
      <clip pos="3" side="tl" part="lem"/>
      <clip pos="3" side="tl" part="type_mot"/>
      <lit-tag v="pp"/>
      <var n="genre_pp"/>
      <clip pos="1" side="tl" part="nombre"/>
    </lu>

The same instruction:

      <var n="genre_pp"/>

is used both to set the variable (as the first element of a <let> block) and to read the value it contains.
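
The two roles can be placed side by side in a short sketch. These fragments are only illustrations and are not taken from the apertium-eo-fr rules: in the first one the variable receives a value, in the second one its value is read inside a test.

        <!--  writing: var is the first element of let  -->
        <let>
          <var n="genre_pp"/>
          <lit-tag v="f"/>
        </let>

        <!--  reading: var is used as a value to compare  -->
        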