Difference between revisions of "User talk:Unhammer"

Revision as of 07:39, 14 August 2009

Welcome to the Apertium Wiki! - Francis Tyers 22:46, 21 March 2009 (UTC)

vislcg3 -w capitalisation option

The vislcg3 -w option already outputs this:

in: JEG/jeg<prn>, out: JEG/JEG<prn>  
in: JeG/jeg<prn>, out: JeG/JEG<prn> 
in: jeG/jeg<prn>, out: jeG/jeg<prn> 
in: Jeg/jeg<prn>, out: Jeg/Jeg<prn> 
in: jeg/jeg<prn>, out: jeg/jeg<prn>

But we can't just look at the first and last characters if the lemma is, e.g., an acronym; we have to look at the first lowercase character in the lemma (baseform):

  1. in: bcg-vaksine/BCG-vaksine<n><m><sg><ind> out: bcg-vaksine/BCG-vaksine
  2. in: BCG-vaksine/BCG-vaksine<n><m><sg><ind> out: bcg-vaksine/BCG-vaksine
  3. in: BCG-VAKSINE/BCG-vaksine<n><m><sg><ind> out: bcg-vaksine/BCG-VAKSINE
  4. in: Bcg-vaksine/BCG-vaksine<n><m><sg><ind> out: bcg-vaksine/BCG-vaksine
  5. in: Bcg-Vaksine/BCG-vaksine<n><m><sg><ind> out: bcg-vaksine/BCG-Vaksine

So in 3 above, the first lowercase character of the lemma is the 'v'. If the wordform character in _that_ position is uppercase and the final wordform character is too, we uppercase the lemma. If that character is uppercase while the final one is lowercase, as in 5 above, we capitalise.
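
The rule can be sketched in Python roughly as follows. This is only an illustration of the logic described here, not how vislcg3 implements it; the function names `first_lowercase_index` and `recase_lemma` are made up for the example. It assumes the wordform and the lemma line up character by character, and it leaves the lemma untouched when the lemma has no lowercase character or the wordform is too short. "Capitalise" is read as uppercasing the lemma character at the first-lowercase position, which is what example 5 suggests.

```python
def first_lowercase_index(lemma):
    """Return the index of the first lowercase character in the lemma, or None."""
    for i, ch in enumerate(lemma):
        if ch.islower():
            return i
    return None


def recase_lemma(wordform, lemma):
    """Hypothetical helper: copy the wordform's casing pattern onto the lemma."""
    i = first_lowercase_index(lemma)
    # Assumption: with no lowercase character in the lemma, or a wordform
    # too short to compare against, leave the lemma unchanged.
    if i is None or i >= len(wordform):
        return lemma
    # The wordform is lowercase at the deciding position: keep the lemma as is.
    if not wordform[i].isupper():
        return lemma
    # Deciding character and final character both uppercase: uppercase everything.
    if wordform[-1].isupper():
        return lemma.upper()
    # Deciding character uppercase, final character lowercase: capitalise.
    return lemma[:i] + lemma[i].upper() + lemma[i + 1:]


if __name__ == "__main__":
    examples = [
        ("bcg-vaksine", "BCG-vaksine"),
        ("BCG-vaksine", "BCG-vaksine"),
        ("BCG-VAKSINE", "BCG-vaksine"),
        ("Bcg-vaksine", "BCG-vaksine"),
        ("Bcg-Vaksine", "BCG-vaksine"),
        ("JeG", "jeg"),
    ]
    for wordform, lemma in examples:
        print(wordform, "->", recase_lemma(wordform, lemma))
```

For an ordinary lemma such as jeg, the first lowercase character is at position 0, so the sketch reduces to the plain first-and-last-character check.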