Difference between revisions of "User talk:Unhammer"
Welcome to the Apertium Wiki! - [[User:Francis Tyers|Francis Tyers]] 22:46, 21 March 2009 (UTC)
== vislcg3 -w capitalisation option ==
The vislcg3 -w option already outputs this:
<pre>
in: JEG/jeg<prn>, out: JEG/JEG<prn>
in: JeG/jeg<prn>, out: JeG/JEG<prn>
in: jeG/jeg<prn>, out: jeG/jeg<prn>
in: Jeg/jeg<prn>, out: Jeg/Jeg<prn>
in: jeg/jeg<prn>, out: jeg/jeg<prn>
</pre>
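To make the current behaviour concrete, here is a minimal Python sketch that reproduces the five lines above by looking only at the first and last characters of the surface form. The function name apply_w_capitalisation is hypothetical, and the logic is inferred from these examples, not taken from the vislcg3 source.

```python
def apply_w_capitalisation(surface: str, lemma: str) -> str:
    """Copy casing from the surface form onto the lemma, looking only at the
    first and last characters of the surface form.

    Hypothetical helper inferred from the examples above; not vislcg3 code."""
    if not surface or not lemma:
        return lemma
    first_upper = surface[0].isupper()
    last_upper = surface[-1].isupper()
    if first_upper and last_upper:
        # e.g. JEG or JeG: treat the whole word as upper case
        return lemma.upper()
    if first_upper:
        # e.g. Jeg: capitalise the first character of the lemma
        return lemma[0].upper() + lemma[1:]
    # First character is lower case (jeG, jeg): leave the lemma alone
    return lemma


if __name__ == "__main__":
    for surface in ["JEG", "JeG", "jeG", "Jeg", "jeg"]:
        out = apply_w_capitalisation(surface, "jeg")
        print(f"in: {surface}/jeg<prn>, out: {surface}/{out}<prn>")
```

With an acronym lemma this first/last check falls short: for Bcg-Vaksine/BCG-vaksine it returns BCG-vaksine, because capitalising a lemma whose first letter is already upper case changes nothing.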
But if the lemma is, e.g., an acronym, we can't just look at the first and last characters; we have to look at the position of the first '''lowercase character''' in the lemma (baseform):
<pre>
in: bcg-vaksine/BCG-vaksine<n><m><sg><ind>, out: bcg-vaksine/BCG-vaksine
in: BCG-vaksine/BCG-vaksine<n><m><sg><ind>, out: BCG-vaksine/BCG-vaksine
in: BCG-VAKSINE/BCG-vaksine<n><m><sg><ind>, out: BCG-VAKSINE/BCG-VAKSINE
in: Bcg-vaksine/BCG-vaksine<n><m><sg><ind>, out: Bcg-vaksine/BCG-vaksine
in: Bcg-Vaksine/BCG-vaksine<n><m><sg><ind>, out: Bcg-Vaksine/BCG-Vaksine
</pre>
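So here the first lowercase character in the lemma is the 'v'. If ''that'' character is uppercase in the surface form and the final character is too, we uppercase the whole lemma (third line: BCG-VAKSINE). If it is uppercase while the final character is lowercase, we capitalise just that character (last line: BCG-Vaksine).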
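A minimal Python sketch of the proposed rule, assuming the surface form and the lemma line up character by character. That holds for every example on this page, but not in general, since inflection can change a word's length or stem. The name capitalise_like_surface is hypothetical; this illustrates the proposal and is not vislcg3 code.

```python
def capitalise_like_surface(surface: str, lemma: str) -> str:
    """Proposed rule (sketch): decide the lemma's casing from the surface
    character at the position of the lemma's first lowercase character,
    plus the last character of the surface form.

    Assumes surface and lemma align by position (hypothetical helper)."""
    # Position of the first lowercase character in the lemma (baseform)
    i = next((k for k, ch in enumerate(lemma) if ch.islower()), None)
    if i is None or i >= len(surface):
        # Nothing lowercase to raise, or no surface character to compare with
        return lemma
    if surface[i].isupper():
        if surface[-1].isupper():
            # e.g. BCG-VAKSINE: the whole word is upper case
            return lemma.upper()
        # e.g. Bcg-Vaksine: capitalise just that character
        return lemma[:i] + lemma[i].upper() + lemma[i + 1:]
    # That character is lowercase in the surface form: keep the lemma as is
    return lemma


if __name__ == "__main__":
    lemma = "BCG-vaksine"
    for surface in ["bcg-vaksine", "BCG-vaksine", "BCG-VAKSINE", "Bcg-vaksine", "Bcg-Vaksine"]:
        print(f"in: {surface}/{lemma}, out: {surface}/{capitalise_like_surface(surface, lemma)}")
```

Applied to the lemma jeg, the same function also reproduces all five lines of the first table, since for an all-lowercase lemma the first lowercase character is simply the first character.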
Revision as of 07:38, 14 August 2009