Chapter 4. Coded Character Sets And Encodings in the World
4.4.3 Problems on Unicode
No standard is free from politics and compromise. Though the concept of a single unified CCS
for all characters in the world is very nice, Unicode had to consider compatibility with preceding
international and local standards. Moreover, contrary to the ideal concept, the Unicode people cared
too much about efficiency. IMHO, the surrogate pair is a mess caused by the shortage of 16-bit code
space. I will introduce a few problems with Unicode.
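To make the surrogate pair remark concrete, here is a minimal Python sketch of how a code point beyond the 16-bit range is split into two 16-bit units in UTF-16. The function name to_surrogate_pair is an assumption made for illustration only; the arithmetic follows the UTF-16 definition.

```python
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a code point above U+FFFF into a UTF-16 surrogate pair.

    The name of this helper is hypothetical; it is not part of any library.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF need a surrogate pair")
    offset = code_point - 0x10000          # 20 bits remain after subtracting
    high = 0xD800 + (offset >> 10)         # upper 10 bits go to the high surrogate
    low = 0xDC00 + (offset & 0x3FF)        # lower 10 bits go to the low surrogate
    return high, low


high, low = to_surrogate_pair(0x20000)
print(hex(high), hex(low))  # 0xd840 0xdc00
```

A character that does not fit into 16 bits thus costs two code units, and every program that handles UTF-16 has to be aware of this.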
Han Unification
This is the point on which Unicode is criticized most strongly by many Japanese people.
The region 0x4e00 - 0x9fff in UCS-2 is used for East Asian ideographs (Japanese Kanji, Chinese
Hanzi, and Korean Hanja). These four character sets contain similar characters. (There are four
because there are two sets of Chinese characters: simplified Chinese used in P. R. China and
traditional Chinese used in Taiwan.) To reduce the number of ideographs to be encoded (the region
for these characters can contain only 20992 characters, while the Taiwanese CNS 11643 standard
alone contains 48711 characters), these similar characters are assumed to be the same. This is
Han Unification.
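The figure of 20992 follows directly from the bounds of the region. A small Python sketch of the arithmetic, using only the numbers quoted above (the variable names are chosen here for illustration):

```python
# Bounds of the ideograph region as given in the text.
CJK_FIRST = 0x4E00
CJK_LAST = 0x9FFF

# Number of characters in CNS 11643, as quoted in the text.
CNS_11643_COUNT = 48711

block_size = CJK_LAST - CJK_FIRST + 1      # both ends are inclusive
shortfall = CNS_11643_COUNT - block_size   # characters that would not fit

print(block_size)  # 20992
print(shortfall)   # 27719
```

So the region is too small even for one national standard, let alone four character sets encoded separately.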
However, these characters are not exactly the same. If the fonts for these characters are made
from the Chinese forms, Japanese people will regard them as wrong characters, though they may be
able to read them. The Unicode people consider these unified characters to be the same character
with different glyphs.
An example of Han Unification is available at U+9AA8
(http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=9AA8). This is the Kanji character
for 'bone'. U+8FCE (http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=8FCE) is another
example, the Kanji character for 'welcome'. The part running from the left side to the bottom is
the 'run' radical. The 'run' radical is used in many Kanji, and all of them have the same problem.
U+76F4 (http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=76F4) is another example, the
Kanji character for 'straight'. I, a native Japanese speaker, cannot recognize the Chinese version
at all.
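The following Python sketch shows what unification means for a program: each of the three code points above is a single value inside the 0x4e00 - 0x9fff region, and neither its name nor its encoded bytes say whether a Japanese or a Chinese glyph is meant. The dictionary EXAMPLES and the function describe are names assumed here for illustration; unicodedata is the standard Python module.

```python
import unicodedata

# The three code points discussed in the text, keyed by their meaning.
EXAMPLES = {"bone": 0x9AA8, "welcome": 0x8FCE, "straight": 0x76F4}


def describe(code_point: int) -> dict:
    """Collect what a program can learn from the code point alone."""
    ch = chr(code_point)
    return {
        "in_region": 0x4E00 <= code_point <= 0x9FFF,
        "name": unicodedata.name(ch),       # language-neutral name
        "utf8": ch.encode("utf-8").hex(),   # identical bytes in every locale
    }


for meaning, cp in EXAMPLES.items():
    print(f"U+{cp:04X} ({meaning}): {describe(cp)}")
```

Nothing in this output distinguishes the languages, so the choice of glyph is left entirely to the font.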
Unicode font vendors will hesitate over which glyphs to choose for these characters: simplified
Chinese, traditional Chinese, Japanese, or Korean. One method is to supply four fonts: a simplified
Chinese version, a traditional Chinese version, a Japanese version, and a Korean version.
Commercial OS vendors can release localized versions of their OS - for example, the Japanese
version of MS Windows can include a Japanese version of the Unicode font (this is exactly what
they are doing). However, what should XFree86 or Debian do? I don't know... [7] [8]
[7] XFree86 4.0 includes Japanese and Korean versions of ISO 10646-1 fonts.
[8] I heard that Chinese and Korean people don't mind the glyphs of these characters. If this is
always true, Japanese glyphs should be the default glyphs for these problematic characters on
international systems such as Debian.
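The "four fonts" method could be sketched in Python as follows. This is only an illustration under assumptions: the font names are hypothetical placeholders, not real fonts, and pick_font is a helper invented here. The Japanese default follows the suggestion in the footnote above; it is not an established policy of any system.

```python
# Hypothetical font names, one per glyph tradition (placeholders only).
FONTS_BY_LANGUAGE = {
    "ja": "example-unicode-ja",
    "ko": "example-unicode-ko",
    "zh_CN": "example-unicode-zh-simplified",
    "zh_TW": "example-unicode-zh-traditional",
}

# Assumption: fall back to Japanese glyphs, as the footnote suggests.
DEFAULT_FONT = FONTS_BY_LANGUAGE["ja"]


def pick_font(locale_name: str) -> str:
    """Choose a font from a locale name such as 'ja_JP.UTF-8'."""
    base = locale_name.split(".")[0]      # drop the codeset, e.g. '.UTF-8'
    if base in FONTS_BY_LANGUAGE:         # exact match such as 'zh_TW'
        return FONTS_BY_LANGUAGE[base]
    language = base.split("_")[0]         # otherwise try the language alone
    return FONTS_BY_LANGUAGE.get(language, DEFAULT_FONT)


print(pick_font("zh_TW.UTF-8"))
print(pick_font("ja_JP.eucJP"))
print(pick_font("en_US.UTF-8"))
```

Such a table only moves the problem: text that mixes Japanese and Chinese in one document still cannot be displayed correctly from the locale alone.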





