C Programming/C trigraph

Trigraphs

C was designed around the character set common on English-language systems, which includes such characters as {, }, [, ], and so on. Some character sets used for other languages, however, lack these or other characters that C requires. To solve this problem, the 1989 C standard (section 5.2.1.1) defined a set of trigraph sequences: three-character substitutes for these symbols that are recognized everywhere in the source text, including inside string literals, character constants, and comments. Replacing trigraph sequences with their single-character equivalents is part of the first translation phase specified in that standard (section 5.1.1.2). Trigraphs were removed from the language in C23.[1]

The following trigraph sequences exist, and no others. A question mark ? that does not begin one of the listed sequences is left unchanged.

Sequence Replacement
======== ===========
  ??=         #
  ??(         [
  ??/         \
  ??)         ]
  ??'         ^
  ??<         {
  ??!         |
  ??>         }
  ??-         ~

The effect of this is that statements such as

printf ("Eh???/n");

will, after the trigraph is replaced, be the equivalent of

printf ("Eh?\n");
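As a rough illustration of what this replacement step does, it can be sketched as an ordinary C function. The name replace_trigraphs and its two lookup strings are made up for this sketch; it is not how any particular compiler implements the first translation phase. The code itself avoids writing two adjacent question marks, so it compiles the same way whether or not the compiler has trigraph support enabled.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: copies src to dst, replacing each trigraph
 * sequence with its single-character equivalent.
 * dst must have room for at least strlen(src) + 1 bytes.
 */
static void replace_trigraphs(const char *src, char *dst)
{
    /* third[i] is the last character of a trigraph; repl[i] replaces it. */
    static const char third[] = "=(/)'<!>-";
    static const char repl[]  = "#[\\]^{|}~";

    assert(src != NULL && dst != NULL);

    while (*src != '\0') {
        const char *hit = NULL;

        /* Two question marks followed by one of the nine listed characters. */
        if (src[0] == '?' && src[1] == '?' && src[2] != '\0')
            hit = strchr(third, src[2]);

        if (hit != NULL) {
            *dst++ = repl[hit - third];
            src += 3;
        } else {
            /* Any other character, including a lone ?, is copied as is. */
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}
```

Given the seven characters E h ? ? ? / n, the function copies the first question mark unchanged, because it is followed by two more question marks rather than by a listed character. It then replaces the next three characters with a backslash, giving E h ? \ n, which matches the printf example above.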

To keep a sequence from being replaced inside a string literal or character constant (in practice the only places where replacement would change what the programmer intended), escape the second question mark:

printf ("Two question marks in a row: ?\?!\n");
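A small sketch can confirm that the escaped spelling still produces the intended characters. The names expected, escaped, split, and check_question_marks are hypothetical. The split literal is an alternative not mentioned above: adjacent string literals are concatenated in a later translation phase than trigraph replacement, so the two question marks never sit next to each other when trigraphs are processed.

```c
#include <assert.h>
#include <string.h>

/* The characters we want in the string: ? ? ! */
static const char expected[] = { '?', '?', '!', '\0' };

/* Escaping the second question mark, as in the text above. */
static const char escaped[] = "?\?!";

/* Assumed alternative: split the literal between the question marks. */
static const char split[] = "?" "?!";

/* Checks that both spellings hold exactly the three intended characters. */
static void check_question_marks(void)
{
    assert(strlen(escaped) == 3);
    assert(strcmp(escaped, expected) == 0);
    assert(strcmp(split, expected) == 0);
}
```

Both forms yield the same three-character string whether or not the compiler performs trigraph replacement.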

Digraphs are alternative spellings of some punctuators. They were introduced by Amendment 1 to the C standard (1995) and appear in section 6.4.6 of the 1999 C standard. They are equivalent to the following tokens except for their spelling:

Digraph Equivalent
======= ==========
   <:       [
   :>       ]
   <%       {
   %>       }
   %:       #
  %:%:      ##

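Unlike trigraphs, digraphs are recognized as tokens rather than replaced in the first translation phase, so they are left untouched inside string literals and character constants. They also keep their own spelling when stringized with the # operator as part of a macro replacement: stringizing <: yields "<:", not "[". In every other respect they behave exactly like the tokens they stand for.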
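The difference can be seen in a short sketch. The macro STR and the functions sum_with_digraphs and check_digraphs are hypothetical names chosen for this example, which assumes a compiler that accepts digraphs (any compiler conforming to the 1999 standard should).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper macro: turns its argument into a string literal. */
#define STR(x) #x

/* Declares, initializes, and indexes an array using digraphs only. */
static int sum_with_digraphs(void)
<%
    int values<:3:> = <% 1, 2, 3 %>;
    return values<:0:> + values<:1:> + values<:2:>;
%>

static void check_digraphs(void)
{
    /* As tokens, the digraphs behave exactly like [ ] { }. */
    assert(sum_with_digraphs() == 6);

    /* Stringizing keeps the spelling that was written. */
    assert(strcmp(STR(<:), "<:") == 0);
    assert(strcmp(STR([), "[") == 0);

    /* Inside a string literal a digraph is just two ordinary characters. */
    assert(strlen("<:") == 2);
}
```

The function body compiles as if it had been written with ordinary brackets and braces, while the stringized forms differ, which is the one observable distinction between a digraph and its equivalent token.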

References
