Difference between revisions of "Unicode issues"
Revision as of 15:43, 13 June 2007
Some issues (potential and otherwise) with Unicode support.
Combining vs. pre-combined characters
When a character has an accent, there is sometimes more than one way of representing it: as a single pre-combined character, or as a base character followed by a combining character. The two forms have different byte sequences in UTF-8 but look identical to the user.
Pre-combined: á = U+00E1, UTF-8 0xC3 0xA1
Combining: á = U+0061 U+0301 (a + combining acute accent), UTF-8 0x61 0xCC 0x81
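A minimal Python sketch (the helper name utf8_hex is hypothetical, used only for illustration) showing that the two representations of á encode to different UTF-8 bytes, so a plain string comparison, such as a dictionary lookup would do, treats them as different words:

```python
def utf8_hex(s: str) -> str:
    """Return the UTF-8 bytes of s in the 0xNN notation used above."""
    return " ".join(f"0x{b:02X}" for b in s.encode("utf-8"))

precombined = "\u00e1"   # U+00E1 LATIN SMALL LETTER A WITH ACUTE
combining = "a\u0301"    # U+0061 followed by U+0301 COMBINING ACUTE ACCENT

print(utf8_hex(precombined))    # 0xC3 0xA1
print(utf8_hex(combining))      # 0x61 0xCC 0x81

# Same appearance, different code points, so the strings compare unequal.
print(precombined == combining)  # False
```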
The best approach is probably to standardise on one variant for analysis/generation (for example, always pre-combined or always combining), and then normalise all input coming into the analyser to that variant using a transliterator or something similar.
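As an illustration of the normalisation step, here is a sketch that uses Python's standard unicodedata module in place of a transliterator. It assumes the pre-combined variant (Unicode Normalization Form C, NFC) is the one chosen for analysis/generation; the function names normalise_input and normalise_lines are hypothetical:

```python
import unicodedata
from typing import Iterable, Iterator

def normalise_input(text: str, form: str = "NFC") -> str:
    """Convert text to a single Unicode normalisation form.

    "NFC" composes base letter + combining marks into pre-combined
    characters where possible; "NFD" decomposes them instead.
    """
    return unicodedata.normalize(form, text)

def normalise_lines(lines: Iterable[str], form: str = "NFC") -> Iterator[str]:
    """Hypothetical filter: normalise each line before it reaches the analyser."""
    for line in lines:
        yield normalise_input(line, form)

# Both spellings of "á" become the same string after NFC normalisation.
print(normalise_input("\u00e1") == normalise_input("a\u0301"))  # True
```

Whichever form is chosen, the dictionaries used for analysis/generation must be stored in that same form, otherwise normalised input will still fail to match.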