Chapter 3. Important Concepts for Character Coding Systems
'fi'). In almost all cases, text data, which is intended to convey abstract ideas rather than
visual information, does not need to carry information about glyphs, since differences between
glyphs do not affect the meaning of the text. However, the distinction between different glyphs
for a single CJK ideogram is sometimes important for proper nouns such as names of persons and
places. So far, there is no standardized method for plain text to carry information about
glyphs. As a result, plain text cannot be used in some special fields, such as citizen
registration systems and serious DTP such as newspaper systems.
Encoding An encoding is a rule by which characters and texts are expressed as combinations of
bits or bytes so that computers can handle them. The terms character coding system, character
code, charset, and so on are used with the same meaning. Basically, an encoding deals with
characters, not glyphs. There are many official and de facto standard encodings, such as
ASCII, ISO-8859-{1,2,...,15}, ISO-2022-{JP, JP-1, JP-2, KR, CN, CN-EXT, INT-1, INT-2},
EUC-{JP, KR, CN, TW}, Johab, UHC, Shift-JIS, Big5, TIS 620, VISCII, VSCII, the so-called
'CodePages', UTF-7, UTF-8, UTF-16LE, UTF-16BE, KOI8-R, and so on. To construct an encoding, we
have to consider the following concepts. (Encoding = one or more CCS + one CES).
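As a rough illustration of an encoding as a rule that maps text to bytes, the following Python
sketch expresses the same two-character text under several of the encodings listed above. The
codec names are those of Python's standard library, and the helper name encode_all is made up
for this example; none of this comes from the standards themselves.

```python
# The same text expressed as bytes under several encodings.
# "\u3042" is HIRAGANA LETTER A; "A" is the ordinary Latin letter.
TEXT = "A\u3042"

# Python codec names (an assumption of this sketch: a standard CPython build,
# version 3.8 or later for bytes.hex() with a separator).
CODECS = ["euc_jp", "shift_jis", "iso2022_jp", "utf_8", "utf_16_le", "utf_16_be"]


def encode_all(text, codecs=CODECS):
    """Return a dict mapping each codec name to the encoded bytes as hex."""
    return {name: text.encode(name).hex(" ") for name in codecs}


if __name__ == "__main__":
    for name, hexbytes in encode_all(TEXT).items():
        print(f"{name:10} {hexbytes}")
    # For example, EUC-JP gives "41 a4 a2" and UTF-8 gives "41 e3 81 82".
```

The characters are the same in every row; only the byte sequences differ, which is why text
data is meaningless unless its encoding is known.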
Character Set A character set is a set of characters. It determines the range of characters
that an encoding can handle. In contrast to a coded character set, it is often called a
non-coded character set.
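The idea that a character set limits what an encoding can handle can be checked with a small
Python sketch. It uses Python's codecs as a stand-in for the underlying character sets, which
is only an approximation, and the function name in_repertoire is hypothetical.

```python
def in_repertoire(ch, codec):
    """Return True if the single character ch can be encoded with codec.

    The codec is used here as a proxy for the character set behind it.
    """
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    # "\u00e9" is LATIN SMALL LETTER E WITH ACUTE, "\u3042" is HIRAGANA LETTER A.
    for ch in ("A", "\u00e9", "\u3042"):
        for codec in ("ascii", "iso8859_1", "euc_jp"):
            print(repr(ch), codec, in_repertoire(ch, codec))
```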
Coded Character Set (CCS) Coded character set (CCS) is a term defined in RFC 2050
(http://www.faqs.org/rfcs/rfc2050.html) and means a character set in which every character is
assigned a unique number by some method. There are many national and international standards
for CCS. Many national CCS standards adopt a coding method that conforms to international
standards such as ISO 646 or ISO 2022. ASCII, BS 4730, JISX 0201 Roman, and so on are examples
of ISO 646 variants. All ISO 646 variants, ISO 8859-*, JISX 0208, JISX 0212, KSX 1001,
GB 2312, CNS 11643, CCCII, TIS 620, TCVN 5712, and so on are examples of ISO 2022-compliant
CCS. VISCII and Big5 are examples of non-ISO 2022-compliant CCS. UCS-2 and UCS-4 (ISO 10646)
are also examples of CCS.
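To make "unique numbers" concrete, the sketch below looks up the number of one character in two
different CCS. The UCS number comes from Python's ord(). The JISX 0208 number is recovered
indirectly, by encoding to EUC-JP and clearing the high bit of each byte; this shortcut and the
function names are assumptions of this example, not a lookup in the standard's own table.

```python
def ucs_code(ch):
    """Number of the character in UCS (ISO 10646 / Unicode)."""
    return ord(ch)


def jisx0208_code(ch):
    """Number of the character in JISX 0208, derived from its EUC-JP bytes.

    In EUC-JP a JISX 0208 character is two bytes, each in 0xA1..0xFE, and
    each byte is the corresponding JISX 0208 byte with the high bit set.
    """
    raw = ch.encode("euc_jp")
    if len(raw) != 2 or not all(0xA1 <= b <= 0xFE for b in raw):
        raise ValueError(f"{ch!r} is not a JISX 0208 character")
    return ((raw[0] & 0x7F) << 8) | (raw[1] & 0x7F)


if __name__ == "__main__":
    ch = "\u3042"  # HIRAGANA LETTER A
    print(hex(ucs_code(ch)))       # 0x3042
    print(hex(jisx0208_code(ch)))  # 0x2422
```

The same character has the number 0x3042 in one CCS and 0x2422 in the other, so a number is
meaningful only together with the CCS it belongs to.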
Character Encoding Scheme (CES) Character Encoding Scheme is also a term defined in RFC 2050
(http://www.faqs.org/rfcs/rfc2050.html). It refers to a method of constructing an encoding from
one or more CCS. This is important when two or more CCS are used to construct an encoding.
ISO 2022 is a method to construct an encoding from one or more ISO 2022-compliant CCS.
ISO 2022 is a very complex system, so subsets of it are usually used, such as EUC-JP (ASCII
and JISX 0208), ISO-2022-KR (ASCII and KSX 1001), and so on. CES is not important for
encodings with only one 8-bit CCS. The UTF series (UTF-8, UTF-16LE, UTF-16BE, and so on) can
be regarded as CES whose CCS is Unicode or ISO 10646.
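The role of a CES when two CCS share one byte stream can be sketched as follows. The function
below, whose name split_euc_jp is invented for this example, walks over EUC-JP bytes and reports
which CCS each character came from. It is a simplified sketch that handles only the ASCII and
JISX 0208 parts described above and rejects every other byte.

```python
def split_euc_jp(data):
    """Split EUC-JP bytes into (ccs_name, code) pairs.

    Simplified: bytes below 0x80 are ASCII; a pair of bytes in 0xA1..0xFE is
    one JISX 0208 character whose code is the pair with the high bits cleared.
    Anything else raises ValueError.
    """
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            out.append(("ASCII", b))
            i += 1
        elif (0xA1 <= b <= 0xFE and i + 1 < len(data)
              and 0xA1 <= data[i + 1] <= 0xFE):
            out.append(("JISX 0208", ((b & 0x7F) << 8) | (data[i + 1] & 0x7F)))
            i += 2
        else:
            raise ValueError(f"unsupported byte 0x{b:02x} at offset {i}")
    return out


if __name__ == "__main__":
    data = "A\u3042".encode("euc_jp")  # bytes 41 a4 a2
    for ccs, code in split_euc_jp(data):
        print(ccs, hex(code))
```

Here the high bit of each byte is what tells the two CCS apart; deciding that is the job of the
CES, not of either CCS.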
Some other terms related to character codes are also commonly used.
Character code is a widely used term meaning encoding. It is a primitive and crude term for the
way a computer handles characters by assigning numbers to them. For example, character





