Chapter 3. Important Concepts for Character Coding Systems
example, a citizen registration system), this is too heavy a theme for this document, because
there are no standardized or recommended methods for handling these ACR. You may have to build
the whole system yourself for such purposes. Good luck!
CCS in the Report is the same as what I described in this document. It has concrete examples: ASCII,
ISO 8859-{1,2,...,15}, JIS X 0201, JIS X 0208, JIS X 0212, KS X 1001, KS X 1002, GB 2312, Big5, CNS
11643, TIS 620, VISCII, TCVN 5712, UCS-2, UCS-4, and so on. Some of them are national standards,
some are international standards, and others are de facto standards.
CEF and CES in the Report both correspond to CES in this document. This document will not
distinguish these two, since I think this causes no inconvenience. An encoding with a significant CEF
doesn't have a significant CES (in the Report's meaning), and vice versa. Then why should we
have to distinguish these two? The only exception is the UTF-16 series. In the UTF-16 series, UTF-16 is
a CEF and UTF-16BE is a CES. This is the only case where we need a distinction between CEF and
CES.
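The UTF-16 case can be tried out with a short sketch. I use Python's built-in codecs here, which is only my choice for illustration and not something the Report prescribes; the codec names 'utf-16', 'utf-16-be' and 'utf-16-le' are Python's spellings.

```python
import codecs

text = "A"  # U+0041, a single 16-bit code unit 0x0041

# With an explicit byte order, the code unit has one fixed byte layout.
be = text.encode("utf-16-be")
le = text.encode("utf-16-le")

# Python's plain "utf-16" codec must pick a byte order when it serializes.
# It uses the machine's native order and prepends a byte order mark (BOM),
# so its output depends on the platform.
plain = text.encode("utf-16")

print(be.hex(" "))     # 00 41
print(le.hex(" "))     # 41 00
print(plain.hex(" "))  # BOM followed by one of the two layouts above
```

The code unit is the same in all three cases; only the byte serialization differs, and that is the part where a separate CES becomes visible.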
Now, CES is a concrete concept with concrete examples: ASCII, ISO 8859-{1,2,...,15}, EUC-JP,
EUC-KR, ISO-2022-JP, ISO-2022-JP-1, ISO-2022-JP-2, ISO-2022-CN, ISO-2022-CN-EXT, ISO-2022-KR,
ISO 2022, VISCII, UTF-7, UTF-8, UTF-16LE, UTF-16BE, and so on. These are encodings
themselves.
The most important concept in this section is the distinction between a coded character set and an
encoding. A coded character set is a component of an encoding. Text data are described in an encoding,
not in a coded character set.
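To make the distinction concrete, here is a small sketch that takes one character of the coded character set JIS X 0208 and writes it in three encodings that carry this CCS. The code point 0x242C for Hiragana 'GA' comes from the JIS X 0208 table; the use of Python and its codec names ('euc_jp', 'iso2022_jp', 'shift_jis') is my assumption for illustration.

```python
ga = "\u304c"  # Hiragana GA; its JIS X 0208 code point is 0x242C

# One character from one CCS, serialized by three different encodings.
encoded = {name: ga.encode(name) for name in ("euc_jp", "iso2022_jp", "shift_jis")}

for name, data in encoded.items():
    print(f"{name:12} {data.hex(' ')}")
# euc_jp       a4 ac
# iso2022_jp   1b 24 42 24 2c 1b 28 42
# shift_jis    82 aa
```

The character and its CCS are the same in every row, but the bytes differ. This is why text data must be labeled with an encoding and not only with a coded character set.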
3.2 Stateless and Stateful
To construct an encoding with two or more CCS, the CES has to supply a method to avoid collisions
between these CCS. There are two ways to do that. One is to give every character in all of the CCS
a unique code point. The other is to allow characters from different CCS to have the same
code point and to use a code such as an escape sequence to switch the SHIFT STATE, that is, to select
one character set.
An encoding with shift states is called STATEFUL and one without shift states is called
STATELESS.
Examples of stateful encodings are: ISO-2022-JP, ISO-2022-KR, ISO-2022-INT-1, ISO-2022-INT-2,
and so on.
For example, in ISO-2022-JP, the two bytes
0x24 0x2c
may mean the Japanese Hiragana character
'GA' or the two ASCII characters '$' and ',', depending on the shift state.
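The shift state behaviour can be observed directly. The sketch below assumes Python's 'iso2022_jp' codec; the escape sequences ESC $ B (select JIS X 0208) and ESC ( B (select ASCII) are the ones ISO-2022-JP defines, while the variable names are only mine.

```python
ESC_JISX0208 = b"\x1b$B"  # ESC $ B: switch the shift state to JIS X 0208
ESC_ASCII = b"\x1b(B"     # ESC ( B: switch the shift state back to ASCII

pair = b"\x24\x2c"

# ISO-2022-JP starts in the ASCII state, so the bytes read as '$' and ','.
as_ascii = pair.decode("iso2022_jp")

# After the escape sequence, the same two bytes form one JIS X 0208 character.
as_jisx0208 = (ESC_JISX0208 + pair + ESC_ASCII).decode("iso2022_jp")

print(repr(as_ascii))            # '$,'
print(len(as_jisx0208))          # 1
print(hex(ord(as_jisx0208)))     # 0x304c, Hiragana GA
```

A decoder therefore cannot interpret a byte pair from the middle of such a stream without knowing which escape sequence came before it.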





