
Character Encodings/Code Tables/EBCDIC/Code page 930

From Wikibooks, open books for an open world

CCSID 930 (sometimes known as CP930 or code page 930) is one of several Japanese EBCDIC code pages created by IBM to represent Japanese text. It is commonly used on the IBM z/OS and IBM System i operating systems.

It encodes halfwidth Katakana, fullwidth Katakana, Hiragana and Kanji.

Technical detail


CCSID 930 uses a stateful EBCDIC encoding scheme in which halfwidth Katakana takes 1 byte and all other Japanese characters take 2 bytes.[1] The single-byte portion is CCSID 290, also known as EBCDIK (Extended Binary Coded Decimal Interchange Kana). The double-byte portion is code page 300 (CCSIDs 300, 16684, and 24876),[2][3][4][5][6] which is shared with CCSID 939.[7][8] If only halfwidth Katakana mixed with Latin characters is used, which was standard practice until the 1980s, CCSID 930 can be treated as a pure 8-bit encoding. When other Japanese or fullwidth characters are used, it is a multibyte encoding in which the Shift-Out (0x0E) and Shift-In (0x0F) bytes mark the start and end of each double-byte sequence.
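The shift mechanism described above can be illustrated with a minimal Python sketch. It only separates a byte string into single-byte and double-byte runs and does not map any byte to a character. The function name split_segments and the payload byte values in the usage note are hypothetical placeholders, not real CCSID 930 code points.

```python
SO = 0x0E  # Shift-Out: following bytes are double-byte characters
SI = 0x0F  # Shift-In: following bytes are single-byte characters


def split_segments(data: bytes):
    """Split mixed data into (mode, payload) runs; shift bytes are dropped.

    The stream is assumed to start in single-byte mode.
    """
    segments = []
    mode = "sbcs"
    run = bytearray()
    for b in data:
        if b in (SO, SI):
            # A shift byte closes the current run and selects the next mode.
            if run:
                segments.append((mode, bytes(run)))
                run = bytearray()
            mode = "dbcs" if b == SO else "sbcs"
        else:
            run.append(b)
    if run:
        segments.append((mode, bytes(run)))
    # Every double-byte run must consist of whole 2-byte characters.
    for seg_mode, payload in segments:
        if seg_mode == "dbcs" and len(payload) % 2:
            raise ValueError("odd number of bytes in double-byte run")
    return segments
```

For example, split_segments(bytes([0xC1, 0x0E, 0x45, 0x41, 0x0F, 0xC2])) yields one single-byte run, one double-byte run of two bytes, and a final single-byte run. Each run could then be handed to a converter for CCSID 290 or code page 300 respectively.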

The most recent version of CCSID 930 (CCSID 1390) supports JIS X 0213.[9]

It was invented by Alan Lloyd Jones at IBM Hursley Laboratories, UK.[citation needed]

Practical considerations


CCSID 930 and its encoding scheme have a number of idiosyncrasies that make working with it hard in practice (see also EBCDIC for idiosyncrasies of the EBCDIC standard).

  • Because of the Shift-In and Shift-Out codes, parsing a byte sequence from the middle is hard. Interpreting the bytes requires backing up until one of the shift bytes is encountered.
  • Although CCSID 930 allows text that mixes halfwidth and fullwidth characters, many database schemas strictly distinguish between columns containing only single-byte halfwidth Katakana and columns containing only double-byte fullwidth characters. This convention makes it easier for software developers to predict the text length that fits a given column size in bytes, and vice versa.
  • On the downside, for consistency, Latin text in such a fullwidth character column has to be entered as, or converted into, fullwidth alphabetic characters so that it is encoded as double-byte characters. This needs to be taken into account when doing database searches.
  • When database columns are implicitly defined as pure fullwidth character text, the Shift-In and Shift-Out codes are often omitted, which strictly speaking results in an incorrect encoding. When the shift codes are missing, CCSID 290 or CCSID 300/16684/24876 usually needs to be used instead for proper conversion to another character set, such as the more portable Unicode.
  • The encoding of the lowercase Latin letters a–z in CCSID 290/930 differs from their common encoding in EBCDIC. This means, for example, that a program that checks for the letter 'a' using its usual EBCDIC code would not recognize the letter 'a' in texts in this encoding. EBCDIC 298 does not have this problem.
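Two of the points above, backing up to the nearest shift byte and restoring shift codes that a pure double-byte column omits, can be sketched in Python as follows. This is an illustration under stated assumptions rather than a reference implementation: the names mode_at and wrap_dbcs are hypothetical, the stream is assumed to start in single-byte mode, and the values 0x0E and 0x0F are assumed never to occur inside a double-byte character.

```python
SO = 0x0E  # Shift-Out: start of a double-byte run
SI = 0x0F  # Shift-In: end of a double-byte run


def mode_at(data: bytes, offset: int) -> str:
    """Return 'dbcs' or 'sbcs' for the byte at offset by scanning backwards."""
    for i in range(offset - 1, -1, -1):
        if data[i] == SO:
            return "dbcs"
        if data[i] == SI:
            return "sbcs"
    # No shift byte found before offset: assume the initial single-byte mode.
    return "sbcs"


def wrap_dbcs(column: bytes) -> bytes:
    """Add the Shift-Out/Shift-In pair to a shift-less double-byte column."""
    if len(column) % 2:
        raise ValueError("double-byte data must have an even length")
    return bytes([SO]) + column + bytes([SI])
```

The backward scan costs time proportional to the distance to the previous shift byte, which is why random access into long mixed strings is awkward. In double-byte mode the caller still has to count bytes from the Shift-Out position to find out whether the offset falls on the first or the second byte of a character.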

References

  • Lunde, Ken. CJKV Information Processing. Sebastopol, Calif.: O'Reilly & Associates, 1998. ISBN 1-56592-224-7.
  1. International Components for Unicode (ICU), ibm-930_P120-1999.ucm, 2002-12-03
  2. "Code page 300 information document". Archived from the original on 2017-06-10.
  3. "CCSID 300 information document". Archived from the original on 2016-03-27.
  4. "CCSID 16684 information document". Archived from the original on 2016-03-27.
  5. "CCSID 24876 information document". Archived from the original on 2016-03-27.
  6. Code Page CPGID 00300 (PDF), IBM
  7. "IBM Globalization - Coded character set identifiers - CCSID 930". Archived from the original on December 1, 2014.
  8. "IBM Globalization - Coded character set identifiers - CCSID 939". Archived from the original on December 1, 2014.
  9. "CCSID 1390 information document". Archived from the original on 2016-03-27.