Letter case handling
The same word can reach a lexical processing module written with different letter case. The most frequent patterns are:
- The whole word is in lower case, e.g. beer.
- The whole word is in upper case, e.g. IBM.
- The first letter is capitalised and the rest is in lower case (typical of proper nouns), e.g. Peter.
- The word contains a mixture of cases, e.g. LaTeX.
The transductions in the dictionary can be written in any of these forms as well. The way a word is written in the dictionary is used to discard possible analyses of the input word, according to the following rules:
- If the input letter is upper case and the current analysis state has concordant transitions in lower case, those transductions are made.
- If the input letter is lower case and the current state has no concordant transitions in lower case, no transduction is made. In other words, a lower-case input letter never matches an upper-case letter in the dictionary.
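The two rules can be sketched in a few lines of Python. This is only an illustration of the matching policy described above, not the lttoolbox implementation: it compares whole strings letter by letter instead of walking a transducer, and the names `letters_concordant` and `can_analyse` are hypothetical.

```python
def letters_concordant(input_char: str, dict_char: str) -> bool:
    """Return True if an input letter may follow a dictionary letter."""
    # An identical letter always matches.
    if input_char == dict_char:
        return True
    # An upper-case input letter may also follow a lower-case transition.
    # A lower-case input letter never matches an upper-case transition.
    return input_char.isupper() and input_char.lower() == dict_char


def can_analyse(surface: str, dict_form: str) -> bool:
    """Return True if every input letter is concordant with the entry."""
    return len(surface) == len(dict_form) and all(
        letters_concordant(s, d) for s, d in zip(surface, dict_form)
    )


if __name__ == "__main__":
    for surface, entry in [("BEER", "beer"), ("peter", "Peter"), ("LATEX", "LaTeX")]:
        status = "analysed" if can_analyse(surface, entry) else "unknown"
        print(f"{surface} against {entry}: {status}")
```

Under these assumptions, BEER is accepted against the entry beer, while peter is rejected against the entry Peter because its first letter is lower case where the dictionary has an upper-case one.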
Thanks to this policy, a surface form that is not capitalised cannot be analysed as a proper noun.
The case of an input word is preserved in the output of the translator unless it is explicitly changed. The case can be changed in the structural transfer module. This is useful, for example, when words are reordered or when a word is added before a capitalised word at the beginning of a sentence, as in the translation of the Catalan phrase Vindran into English: They will come.
Examples
Take the example words above and a dictionary for which lt-expand produces the following output:

    beer:beer<n><sg>
    IBM:IBM<np><org><sg>
    Peter:Peter<np><ant><m><sg>
    LaTeX:LaTeX<np><al><sg>
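Each entry in that output pairs a surface form with its lexical form (lemma plus tags), separated by a colon. A minimal Python sketch for splitting such entries follows. It assumes only the simple `surface:lemma<tags>` shape shown here and does not handle other shapes that lt-expand may produce. `parse_expanded` is a hypothetical helper, not part of lttoolbox.

```python
import re

# A tag is any run of characters between angle brackets, e.g. <np>.
TAG = re.compile(r"<([^<>]+)>")


def parse_expanded(entry: str) -> tuple[str, str, list[str]]:
    """Split one 'surface:lemma<tag>...' entry into its parts."""
    # The first colon separates the surface form from the lexical form.
    surface, _, lexical = entry.partition(":")
    # The lemma is everything before the first tag.
    lemma = lexical.split("<", 1)[0]
    tags = TAG.findall(lexical)
    return surface, lemma, tags


if __name__ == "__main__":
    for entry in ["beer:beer<n><sg>", "IBM:IBM<np><org><sg>"]:
        print(parse_expanded(entry))
```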
The following table gives the analyses that would be output in regular case-handling mode.
Input | Dictionary | Output |
---|---|---|
beer | beer | ^beer/beer<n><sg>$ |
BEER | beer | ^BEER/BEER<n><sg>$ |
Beer | beer | ^Beer/Beer<n><sg>$ |
beeR | beer | ^beeR/beer<n><sg>$ |
BeeR | beer | ^BeeR/BEER<n><sg>$ |
BeEr | beer | ^BeEr/Beer<n><sg>$ |
IBM | IBM | ^IBM/IBM<np><org><sg>$ |
ibm | IBM | ^ibm/*ibm$ |
Ibm | IBM | ^Ibm/*Ibm$ |
IBm | IBM | ^IBm/*IBm$ |
Peter | Peter | ^Peter/Peter<np><ant><m><sg>$ |
peter | Peter | ^peter/*peter$ |
PEter | Peter | ^PEter/PEter<np><ant><m><sg>$ |
PETER | Peter | ^PETER/PETER<np><ant><m><sg>$ |
LaTeX | LaTeX | ^LaTeX/LaTeX<np><al><sg>$ |
LateX | LaTeX | ^LateX/*LateX$ |
Latex | LaTeX | ^Latex/*Latex$ |
latex | LaTeX | ^latex/*latex$ |
LATEX | LaTeX | ^LATEX/LATEX<np><al><sg>$ |