Letter case handling

The same word can reach a lexical processing module written with different letter case. The most frequent patterns are:

  1. The whole word is in lower case.
    e.g. beer
  2. The whole word is in upper case.
    e.g. IBM
  3. The first letter is capitalised and the rest is in lower case (the typical pattern for proper nouns).
    e.g. Peter
  4. The word mixes upper and lower case in some other way.
    e.g. LaTeX
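
The four patterns above can be told apart mechanically. The following is a minimal Python sketch, not code taken from Apertium; the function name classify_case and the labels it returns are invented for illustration.

```python
def classify_case(word: str) -> str:
    """Return a label for the letter-case pattern of a word.

    The labels ("lower", "upper", "capitalised", "mixed") are
    hypothetical names for the four patterns listed above.
    """
    if word.islower():
        return "lower"        # e.g. beer
    if word.isupper():
        return "upper"        # e.g. IBM
    if word[:1].isupper() and word[1:].islower():
        return "capitalised"  # e.g. Peter
    return "mixed"            # e.g. LaTeX (also the empty string)


for example in ["beer", "IBM", "Peter", "LaTeX"]:
    print(example, classify_case(example))
```

Note that a single upper-case letter is reported as "upper" rather than "capitalised", because the upper-case test runs first.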

Words in the dictionary can likewise be written in any of these forms. The case in which a word is written in the dictionary is used to discard possible analyses of the input word, according to the following rules:

  1. If the input letter is upper case and the current analysis state has matching transitions in lower case, those transitions are followed.
  2. If the input letter is lower case and the current analysis state has no matching transitions in lower case, no transition is followed. A lower-case input letter therefore never matches an upper-case letter in the dictionary.
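
The two rules can be sketched on whole strings as follows. This is a simplified Python illustration, not the actual implementation: the real analyser walks a finite-state transducer letter by letter, whereas this sketch compares an input word with a single dictionary form. The names letter_matches and can_analyse are hypothetical.

```python
def letter_matches(input_ch: str, dict_ch: str) -> bool:
    """Can one input letter follow a transition labelled dict_ch?"""
    if input_ch == dict_ch:
        return True
    # Rule 1: an upper-case input letter may follow a lower-case transition.
    # Rule 2 follows implicitly: a lower-case input letter only matches
    # the identical lower-case letter, which the test above already covers.
    return input_ch.isupper() and input_ch.lower() == dict_ch


def can_analyse(surface: str, entry: str) -> bool:
    """True if every letter of the surface form matches the dictionary form."""
    return len(surface) == len(entry) and all(
        letter_matches(s, d) for s, d in zip(surface, entry)
    )


print(can_analyse("BEER", "beer"))    # upper-case input, lower-case entry
print(can_analyse("ibm", "IBM"))      # lower-case input, upper-case entry
print(can_analyse("PEter", "Peter"))  # extra capitals in the input
```

The sketch only decides whether an analysis is possible; it does not model which case the lemma receives in the output.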

Thanks to this policy, a surface form that is not capitalised cannot be analysed as a proper noun.

The case of an input word is preserved in the output of the translator unless it is explicitly changed. The case can be changed in the structural transfer module. This is useful, for example, when words are reordered or when a word is added before a capitalised word at the beginning of a sentence, as in the translation of the Catalan phrase "Vindran" into English: "They will come".
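
The effect of moving the capital letter onto an added word can be sketched as follows. This is a Python illustration of the idea only; it is not written in the transfer rule language, and the function name prepend_words is hypothetical. It assumes that the first translated word is not a proper noun, so that it is safe to lower-case it.

```python
def prepend_words(translated: list[str], added: list[str]) -> list[str]:
    """Insert `added` before `translated`, moving sentence-initial
    capitalisation from the old first word to the new first word."""
    if not translated or not added:
        return added + translated
    head = translated[0]
    # Capitalised means: first letter upper case, no other upper-case letters.
    is_capitalised = head[:1].isupper() and head[1:] == head[1:].lower()
    if not is_capitalised:
        return added + translated
    new_first = added[0][:1].upper() + added[0][1:]
    return [new_first] + added[1:] + [head.lower()] + translated[1:]


# "Vindran" would be translated word for word as "Come"; the added
# words take over the capital letter.
print(" ".join(prepend_words(["Come"], ["they", "will"])))  # They will come
```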

Examples

The table below assumes that the example words above are in the dictionary, written with the case in which they are shown.

An asterisk (*) marks an unanalysed (unknown) word.

Input   Dictionary   Output
beer    beer         ^beer/beer<n><sg>$
BEER    beer         ^BEER/BEER<n><sg>$
Beer    beer         ^Beer/Beer<n><sg>$
beeR    beer         ^beeR/beer<n><sg>$
IBM     IBM          ^IBM/IBM<noun><singular>$
ibm     IBM          ^ibm/*ibm$
Ibm     IBM          ^Ibm/*Ibm$
IBm     IBM          ^IBm/*IBm$
Peter   Peter        ^Peter/Peter<n><sg>$
peter   Peter        ^peter/*peter$
PEter   Peter        ^PEter/PEter<n><sg>$
PETER   Peter        ^PETER/PETER<n><sg>$
LaTeX   LaTeX        ^LaTeX/LaTeX<n><sg>$
LateX   LaTeX        ^LateX/*LateX$
Latex   LaTeX        ^Latex/*Latex$
latex   LaTeX        ^latex/*latex$
LATEX   LaTeX        ^LATEX/LATEX<n><sg>$