Difference between revisions of "Unicode issues"

Revision as of 15:43, 13 June 2007

Some issues (potential and otherwise) with Unicode support.

Combining vs. pre-combined characters

When a character has an accent, there is sometimes more than one way of representing it: as a single pre-combined character, or as a base character followed by a combining character. The two forms are different byte sequences in UTF-8, but look the same to the user.

UTF-8 0xC3 0xA1 vs. 0x61 0xCC 0x81
      á         vs.      á

The best approach is probably to standardise on one variant for analysis and generation, and to normalise all input coming into the analyser to that variant, using a transliterator or something similar.
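As a rough illustration of the normalisation step, the sketch below uses Python's standard `unicodedata` module rather than a transliterator. It assumes the pre-combined form (Unicode normalisation form NFC) is the variant standardised on; the function name `normalise_input` is hypothetical and not part of any existing analyser.

```python
import unicodedata

# The two representations of "a with acute accent".
precombined = "\u00e1"   # single code point U+00E1
combining = "a\u0301"    # 'a' followed by U+0301 COMBINING ACUTE ACCENT

# They are different strings with different UTF-8 encodings.
assert precombined != combining
assert precombined.encode("utf-8") == b"\xc3\xa1"
assert combining.encode("utf-8") == b"\x61\xcc\x81"


def normalise_input(text: str, form: str = "NFC") -> str:
    """Hypothetical helper: map all input to one variant before analysis.

    NFC prefers pre-combined characters; NFD would prefer combining ones.
    """
    return unicodedata.normalize(form, text)


# After normalisation both inputs are the same string.
assert normalise_input(precombined) == normalise_input(combining)
```

Choosing NFD instead would work equally well, as long as the dictionaries and the incoming text agree on the same form.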

Author of this revision: Francis Tyers.


Zero-width non-joiner (ZWNJ)
