Chapter 3. Important Concepts for Character Coding Systems
3.3 Multibyte encodings
Encodings are classified into multibyte encodings and non-multibyte encodings, according to the relationship between
the number of characters and the number of bytes in the encoding.
In a non-multibyte encoding, one character is always expressed by one byte. In a multibyte encoding, on the other hand,
one character may be expressed by one or more bytes. Note that the number of bytes per character is
not fixed even within a single encoding.
Examples of multibyte encodings are EUC-JP, EUC-KR, ISO-2022-JP, Shift-JIS, Big5, UHC, UTF-8,
and so on. Note that all of the UTF-* encodings are multibyte.
Examples of non-multibyte encodings are ISO-8859-1, ISO-8859-2, TIS-620, VISCII, and so on.
Note that even in a non-multibyte encoding, the number of characters and the number of bytes may differ
if the encoding is stateful.
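
The difference between the two classes can be checked with a short Python sketch. It is only an illustration: the codec names (utf-8, euc_jp, shift_jis, latin-1) are Python's names for the encodings, and the helper encoded_length is a hypothetical name introduced here, not something from this document.

```python
def encoded_length(ch: str, encoding: str) -> int:
    """Return the number of bytes used to encode one character."""
    return len(ch.encode(encoding))

HIRAGANA_A = "\u3042"   # HIRAGANA LETTER A
E_ACUTE = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE

# Multibyte encodings: the byte count varies within a single encoding.
for encoding in ("utf-8", "euc_jp", "shift_jis"):
    lengths = [encoded_length(ch, encoding) for ch in ("A", HIRAGANA_A)]
    print(encoding, lengths)

# Non-multibyte encoding: every character it can express takes one byte.
print("latin-1", [encoded_length(ch, "latin-1") for ch in ("A", E_ACUTE)])
```

In each of the three multibyte encodings the ASCII letter takes one byte while the Hiragana letter takes two or three, which is the "not fixed even within a single encoding" property described above.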
Ken Lunde's "CJKV Information Processing" [3] classifies encoding methods into the following
three categories:
  modal
  non-modal
  fixed-length
Modal corresponds to stateful in this document. The other two are stateless: non-modal is multibyte
and fixed-length is non-multibyte. However, I think that stateful/stateless and multibyte/non-multibyte
are independent concepts. [4]
3.4 Number of Bytes, Number of Characters, and Number of Columns
One ASCII character is always expressed by one byte and occupies one column on the console or in X
terminal emulators (with a fixed-width font for X). One must not make such an assumption in I18N
programming and must clearly distinguish between the number of bytes, the number of characters, and
the number of columns.
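
The three counts can be compared for a short string. The following Python sketch rests on two assumptions: the text is encoded as UTF-8, and the Unicode East Asian Width property (W and F meaning two columns) is a reasonable stand-in for what a terminal does. The helper column_width is a hypothetical name, and it ignores combining and control characters.

```python
import unicodedata

def column_width(text: str) -> int:
    """Estimate columns: 2 for East Asian Wide/Fullwidth characters, else 1."""
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )

# "abc" followed by three CJK ideographs
text = "abc\u65e5\u672c\u8a9e"

n_bytes = len(text.encode("utf-8"))  # 3 ASCII bytes + 3 ideographs * 3 bytes
n_chars = len(text)                  # counted in code points
n_columns = column_width(text)       # 3 narrow + 3 wide * 2 columns

print(n_bytes, n_chars, n_columns)
```

For this string the three numbers all differ, so code that allocates buffers, truncates text, or aligns output has to pick the count that matches its purpose.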
As for the relationship between characters and bytes, in multibyte encodings two or more bytes
may be needed to express one character. In stateful encodings, escape sequences take up bytes but do
not correspond to any character.
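
The effect of escape sequences on the byte count can be seen with ISO-2022-JP. This is a minimal sketch using Python's iso2022_jp codec; the variable names are illustrative only, and the exact escape sequences emitted are those chosen by that codec.

```python
ESC = b"\x1b"

text = "\u3042"  # one character: HIRAGANA LETTER A
encoded = text.encode("iso2022_jp")

# The codec switches to JIS X 0208 before the character and back to
# ASCII after it; each switch is a 3-byte escape sequence.
n_chars = len(text)
n_bytes = len(encoded)
n_escapes = encoded.count(ESC)

print(n_chars, n_bytes, n_escapes)

# Pure ASCII text needs no state change, so no escape sequences appear.
ascii_encoded = "ab".encode("iso2022_jp")
print(len(ascii_encoded), ascii_encoded.count(ESC))
```

Here a single character costs eight bytes, of which only two carry the character itself; the other six are the two escape sequences that change the state of the encoding.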
The number of columns is not defined in any standard. However, CJK ideograms,
Japanese Hiragana and Katakana, and Korean Hangul usually occupy two columns on the console or in X
terminal emulators. Note that 'Full-width forms' in the UCS-2 and UCS-4 coded character sets will occupy
[3] ISBN 1-56592-224-7, O'Reilly, 1999
[4] Though there is no existing encoding which is both stateful and non-multibyte.





