C++ Programming
The Preprocessor
The preprocessor is either a separate program invoked by the compiler or part of the compiler itself. It performs intermediate operations that modify the original source code and internal compiler options before the compiler tries to compile the resulting source code.
The instructions that the preprocessor parses are called directives and come in two forms: preprocessor and compiler directives. Preprocessor directives direct the preprocessor on how it should process the source code, and compiler directives direct the compiler on how it should modify internal compiler options. Directives are used to make writing source code easier (by making it more portable, for instance) and to make the source code more understandable. Before the introduction of modules in C++20, the #include directive was also the only valid way to make use of facilities (classes, functions, templates, etc.) provided by the C++ Standard Library.
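All directives start with '#' at the beginning of a line (optionally preceded by whitespace). The standard directives are: #define, #elif, #else, #endif, #error, #if, #ifdef, #ifndef, #include, #line, #pragma and #undef.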
Inclusion of Header Files (#include)
The #include directive allows a programmer to include contents of one file inside another file. This is commonly used to separate information needed by more than one part of a program into its own file so that it can be included again and again without having to re-type all the source code into each file.
C++ generally requires you to declare what will be used before using it. So, files called headers usually include declarations of what will be used in order for the compiler to successfully compile source code. This is further explained in the File Organization Section of the book. The standard library (the repository of code that is available with every standards-compliant C++ compiler) and 3rd party libraries make use of headers in order to allow the inclusion of the needed declarations in your source code, allowing you to make use of features or resources that are not part of the language itself.
The first lines in any source file should usually look something like this:
#include <iostream>
#include "other.h"
The above lines cause the contents of the files iostream and other.h to be included for use in your program. Usually this is implemented by just inserting into your program the contents of iostream and other.h. When angle brackets (<>) are used in the directive, the preprocessor is instructed to search for the specified file in a compiler-dependent location. When double quotation marks (" ") are used, the preprocessor is expected to search in some additional, usually user-defined, locations for the header file and to fall back to the standard include paths only if it is not found in those additional locations. Commonly when this form is used, the preprocessor will also search in the same directory as the file containing the #include directive.
The iostream header contains various declarations for input/output (I/O) using an abstraction of I/O mechanisms called streams. For example, there is an output stream object called std::cout (the "c" stands for "character", so "cout" means "character output") which is used to output text to the standard output, which usually displays the text on the computer screen.
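As a minimal sketch of the stream abstraction described above, the following code writes the same text either to std::cout or to an in-memory stream. The function names greet, greet_on_console and greeting_as_string are hypothetical and chosen only for this illustration.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Writes a greeting to any output stream (hypothetical helper).
inline void greet(std::ostream& out) {
    out << "Hello, world!" << '\n';
}

// Passing std::cout sends the text to the standard output.
inline void greet_on_console() {
    greet(std::cout);
}

// The same code can target an in-memory stream instead,
// because std::cout is just one output stream among others.
inline std::string greeting_as_string() {
    std::ostringstream buffer;
    greet(buffer);
    return buffer.str();
}
```

Calling greet_on_console() from main would send the greeting to the standard output, while greeting_as_string() returns the same characters as a std::string.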
The standard C++ header files are grouped into those of the Standard Template Library and those of the Standard C Library.
Everything inside C++'s standard library is kept in the std:: namespace.
Old compilers may include headers with a .h suffix (e.g. the non-standard <iostream.h> vs. the standard <iostream>) instead of the standard headers. These names were common before the standardization of C++ and some compilers still include these headers for backwards compatibility. Rather than using the std:: namespace, these older headers pollute the global namespace and may otherwise only implement the standard in a limited way.
Some vendors use the SGI STL headers. This was one of the earliest widely used implementations of the Standard Template Library.
Non-standard but somewhat common C++ libraries
- Streams based on FILE* from stdio.h.
- Precursor to iostream. Old stream library mostly included for backwards compatibility even with old compilers.
- Uses char* whereas sstream uses string. Prefer the standard library sstream.
#pragma
The pragma (pragmatic information) directive is part of the standard, but the meaning of any pragma directive depends on the software implementation of the standard that is used.
Pragma directives are used within the source program.
#pragma token(s)
You should check the software implementation of the C++ standard you intend to use for a list of the supported tokens.
For example, one of the most widely used preprocessor pragma directives, #pragma once, when placed at the beginning of a header file, indicates that the file where it resides will be skipped if included several times by the preprocessor.
Macros
The C++ preprocessor includes facilities for defining "macros", which roughly means the ability to replace a use of a named macro with one or more tokens. This has various uses from defining simple constants (though const is more often used for this in C++), conditional compilation, code generation and more -- macros are a powerful facility, but if used carelessly can also lead to code that is hard to read and harder to debug!
#define and #undef
The #define directive is used to define values or macros that are used by the preprocessor to manipulate the program source code before it is compiled:
#define USER_MAX (1000)
The #undef directive deletes a current macro definition:
#undef USER_MAX
It is an error to use #define to give an already defined macro a different definition, but it is not an error to use #undef on a macro name that is not currently defined. Therefore, if you need to override a previous macro definition, first #undef it, and then use #define to set the new definition.
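The override pattern just described might look like the following sketch. USER_MAX is the macro from the text; the second value (2000) and the constant names first_limit and second_limit are assumptions made only for this illustration.

```cpp
#define USER_MAX (1000)
constexpr int first_limit = USER_MAX;    // expands to (1000)

// Remove the old definition first, then supply the new one.
#undef USER_MAX
#define USER_MAX (2000)
constexpr int second_limit = USER_MAX;   // expands to (2000)

// Undefining a name that is no longer defined is harmless.
#undef USER_MAX
#undef USER_MAX
```

Note that first_limit keeps its value: the macro was expanded at the point of use, so redefining it later does not affect code that has already been processed.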
\ (line continuation)
If a statement or directive needs to be broken into more than one line, use the \ (backslash) symbol to "escape" the line ends. For example,
#define MULTIPLELINEMACRO \
        will use what you write here \
        and here etc...
is equivalent to
#define MULTIPLELINEMACRO will use what you write here and here etc...
because the preprocessor joins lines ending in a backslash ("\") to the line after them. That happens even before directives (such as #define) are processed, so it works for just about all purposes, not just for macro definitions. The backslash is sometimes said to act as an "escape" character for the newline, changing its interpretation.
In some (fairly rare) cases macros can be more readable when split across multiple lines. Good modern C++ code will use macros only sparingly, so the need for multi-line macro definitions will not arise often.
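As one possible illustration, here is a macro whose definition is split over several lines for readability. IN_RANGE is a hypothetical macro invented for this sketch; in real code an inline function would usually be the better choice.

```cpp
// Hypothetical multi-line macro: each backslash must be the very last
// character on its line, so that the preprocessor joins the lines.
#define IN_RANGE(value, low, high) \
    ( ((value) >= (low)) &&        \
      ((value) <= (high)) )

constexpr bool five_is_in_range   = IN_RANGE(5, 1, 10);
constexpr bool eleven_is_in_range = IN_RANGE(11, 1, 10);
```

After line joining, the preprocessor sees the whole definition as a single logical line, exactly as if it had been written without the backslashes.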
It is certainly possible to overuse this feature. It is quite legal but entirely indefensible, for example, to write
int ma\
in//ma/
()/*ma/
in/*/{}
That is an abuse of the feature though: while an escaped newline can appear in the middle of a token, there should never be any reason to use it there. Do not try to write code that looks like it belongs in the International Obfuscated C Code Contest.
Warning: there is one occasional "gotcha" with using escaped newlines: if there are any invisible characters after the backslash, the lines will not be joined, and there will almost certainly be an error message produced later on, though it might not be at all obvious what caused it.
Function-like Macros
Another feature of the #define directive is that it can take arguments, making it rather useful as a pseudo-function creator. Consider the following code:
#define ABSOLUTE_VALUE( x ) ( ((x) < 0) ? -(x) : (x) )
// ...
int x = -1;
while( ABSOLUTE_VALUE( x ) ) {
// ...
}
Notice that in the above example, the variable "x" is always within its own set of parentheses. This way, it will be evaluated in whole, before being compared to 0 or multiplied by -1. Also, the entire macro is surrounded by parentheses, to prevent it from being contaminated by other code. If you're not careful, you run the risk of having the compiler misinterpret your code.
Macros replace each occurrence of the macro parameter in the replacement text with the literal contents of the macro argument, without any validation. Badly written macros can result in code which will not compile or which creates hard-to-discover bugs. Because of side effects, it is considered a very bad idea to use function-like macros as described above. However, as with any rule, there may be cases where macros are the most efficient means to accomplish a particular goal.
int z = -10;
int y = ABSOLUTE_VALUE( z++ );
If ABSOLUTE_VALUE() were a real function, z would now have the value -9. Because z++ is a macro argument, however, it appears three times in the expansion and (in this case) is executed twice, setting z to -8 and y to 9. In similar cases it is very easy to write code which has "undefined behavior", meaning that what it does is completely unpredictable in the eyes of the C++ Standard.
// ABSOLUTE_VALUE( z++ ); expanded
( ((z++) < 0 ) ? -(z++) : (z++) );
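The difference can be observed with a small sketch that runs the same statement once through the macro and once through an ordinary function. The names absolute_value, Result, with_macro and with_function are hypothetical helpers introduced only for this comparison.

```cpp
#define ABSOLUTE_VALUE( x ) ( ((x) < 0) ? -(x) : (x) )

// A real function evaluates its argument exactly once.
inline int absolute_value(int x) { return x < 0 ? -x : x; }

struct Result { int y; int z; };

// Macro version: z++ runs in the comparison and again in the chosen branch.
inline Result with_macro() {
    int z = -10;
    int y = ABSOLUTE_VALUE( z++ );
    return Result{y, z};   // y == 9, z == -8
}

// Function version: z++ runs once, before the call.
inline Result with_function() {
    int z = -10;
    int y = absolute_value( z++ );
    return Result{y, z};   // y == 10, z == -9
}
```

The macro result differs in both y and z, which is exactly the kind of surprise that makes function-like macros risky.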
By contrast, the following program uses a macro correctly:
// An example of how to use a macro correctly
#include <iostream>
#define SLICES 8
#define PART(x) ( (x) / SLICES ) // Note the extra parentheses around x
int main() {
    int b = 10, c = 6;
    int a = PART(b + c);
    std::cout << a;
    return 0;
}
The result in "a" should be "2": b + c is passed to PART, which expands to ((b + c) / SLICES), and 16 / 8 is 2.
# and ##
The # and ## operators are used within #define macro definitions. Using # causes the macro argument that follows it to be converted into a string literal in quotes. For example:
#define as_string( s ) # s
will make the preprocessor turn
std::cout << as_string( Hello World! ) << std::endl;
into
std::cout << "Hello World!" << std::endl;
Using ## concatenates what's before the ## with what's after it; the result must be a well-formed preprocessing token. For example:
#define concatenate( x, y ) x ## y
...
int xy = 10;
...
will make the preprocessor turn
std::cout << concatenate( x, y ) << std::endl;
into
std::cout << xy << std::endl;
which will, of course, display 10 to standard output.
String literals cannot be concatenated using ##, but the good news is that this is not a problem: just writing two adjacent string literals is enough to make the preprocessor concatenate them.
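Both operators can be tried out with the macros from this section. The wrapper functions stringify_demo and concatenate_demo are hypothetical names used only to make the sketch easy to call.

```cpp
#include <string>

#define as_string( s ) # s
#define concatenate( x, y ) x ## y

// The argument is turned into the string literal "Hello World!".
inline std::string stringify_demo() {
    return as_string( Hello World! );
}

// The tokens x and y are pasted together into the identifier xy.
inline int concatenate_demo() {
    int xy = 10;
    return concatenate( x, y );
}
```

Here stringify_demo() returns the text Hello World! and concatenate_demo() returns 10, the value of the variable xy.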
The dangers of macros
To illustrate the dangers of macros, consider this naive macro
#define MAX(a,b) a>b?a:b
and the code
int i = MAX(2,3)+5;
int j = MAX(3,2)+5;
Take a look at this and consider what the value after execution might be. The statements are turned into
int i = 2>3?2:3+5;
int j = 3>2?3:2+5;
Thus, after execution i=8 and j=3 instead of the expected result of i=j=8! This is why you were cautioned to use an extra set of parentheses above, but even with these, the road is fraught with dangers. The alert reader might quickly realize that if a and b are expressions, the definition must parenthesize every use of a and b in the macro definition, like this:
#define MAX(a,b) ((a)>(b)?(a):(b))
This works, provided a,b have no side effects. Indeed,
i = 2;
j = 3;
k = MAX(i++, j++);
would result in k=4, i=3 and j=5. This would be highly surprising to anyone expecting MAX() to behave like a function.
So what is the correct solution? The solution is not to use a macro at all. A global, inline function, like this
inline int max(int a, int b) { return a>b?a:b; }
has none of the pitfalls above, but will not work with all types. A template (see below) takes care of this
template<typename T> inline const T& max(const T& a, const T& b) { return a>b?a:b; }
Indeed, this is (a variation of) the definition used in the standard library for std::max(), declared in the <algorithm> header. The standard library is included with all conforming C++ compilers, so the ideal solution is to use this.
std::max(3,4);
Another danger of working with macros is that they are excluded from type checking. In the case of the MAX macro, using it with string literals will not generate a compilation error, even though it compares pointers rather than the text.
MAX("hello","world")
It is then preferable to use an inline function, which will be type checked, permitting the compiler to generate a meaningful error message if the inline function is used as stated above.
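The precedence problem discussed in this section can be checked with a short sketch. The macros are renamed NAIVE_MAX and PAREN_MAX here (hypothetical names) so that both versions can coexist, and std_j is an illustrative helper.

```cpp
#include <algorithm>   // std::max

#define NAIVE_MAX(a,b) a>b?a:b              // the naive macro from the text
#define PAREN_MAX(a,b) ((a)>(b)?(a):(b))    // the fully parenthesized macro

constexpr int naive_i = NAIVE_MAX(2,3)+5;   // parsed as 2>3 ? 2 : (3+5), gives 8
constexpr int naive_j = NAIVE_MAX(3,2)+5;   // parsed as 3>2 ? 3 : (2+5), gives 3
constexpr int paren_j = PAREN_MAX(3,2)+5;   // (3) + 5, gives 8

// The standard library function behaves like any other function call.
inline int std_j() { return std::max(3,2)+5; }   // gives 8
```

Only the naive macro produces the surprising value 3; the parenthesized macro and std::max both give 8, but only std::max is also safe against arguments with side effects.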
String literal concatenation
One minor function of the preprocessor is joining strings together ("string literal concatenation"), turning code like
std::cout << "Hello " "World!\n";
into
std::cout << "Hello World!\n";
Apart from obscure uses, this is most often useful when writing long messages, as a normal C++ string literal is not allowed to span multiple lines in your source code (i.e., to contain a newline character inside it). The exception to this is the C++11 raw string literal, which can contain newlines, but does not interpret any escape characters. Using string literal concatenation also helps to keep program lines down to a reasonable length; we can write
function_name("This is a very long string literal, which would not fit "
              "onto a single line very nicely -- but with string literal "
              "concatenation, we can split it across multiple lines and "
              "the preprocessor will glue the pieces together");
Note that this joining happens before compilation; the compiler sees only one string literal here, and there's no work done at runtime, i.e., your program will not run any slower at all because of this joining together of strings.
Concatenation also applies to wide string literals (which are prefixed by an L):
L"this " L"and " L"that"
is converted by the preprocessor into
L"this and that".
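A small sketch can confirm that adjacent literals really end up as a single literal. The names message, wide_message and messages_match are hypothetical and exist only for this illustration.

```cpp
#include <cstring>

// Three adjacent literals are joined into one before compilation proper.
const char message[] = "This message is split "
                       "across several lines "
                       "of source code.";

// The joined array has the same size as the equivalent single literal.
static_assert(sizeof(message) ==
              sizeof("This message is split across several lines of source code."),
              "adjacent literals form one literal");

// The same rule applies to wide string literals.
const wchar_t wide_message[] = L"this " L"and " L"that";

// Compares the joined text with the single-literal spelling at run time.
inline bool messages_match() {
    return std::strcmp(message,
        "This message is split across several lines of source code.") == 0;
}
```

Because the size check is a static_assert, it is evaluated at compile time, which illustrates that no joining work is left for the running program.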
Conditional compilation
Conditional compilation is useful for two main purposes:
- To allow certain functionality to be enabled/disabled when compiling a program
- To allow functionality to be implemented in different ways, such as when compiling on different platforms
It is also used sometimes to temporarily "comment-out" code, though using a version control system is often a more effective way to do so.
- Syntax:
#if condition
  statement(s)
#elif condition2
  statement(s)
...
#elif condition
  statement(s)
#else
  statement(s)
#endif

#ifdef defined-value
  statement(s)
#else
  statement(s)
#endif

#ifndef defined-value
  statement(s)
#else
  statement(s)
#endif
#if
The #if directive allows compile-time conditional checking of preprocessor values such as created with #define. If condition is non-zero the preprocessor will include all statement(s) up to the #else, #elif or #endif directive in the output for processing. Otherwise if the #if condition was false, any #elif directives will be checked in order and the first condition which is true will have its statement(s) included in the output. Finally if the condition of the #if directive and any present #elif directives are all false the statement(s) of the #else directive will be included in the output if present; otherwise, nothing gets included.
The expression used after #if can include boolean and integral constants and arithmetic operations as well as macro names. The allowable expressions are a subset of the full range of C++ expressions (with one exception), but are sufficient for many purposes. The one extra operator available to #if is the defined operator, which can be used to test whether a macro of a given name is currently defined.
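The following sketch combines integer comparison, a logical operator and the defined operator in one #if chain. DEMO_LIB_VERSION and DEMO_LIB_EXPERIMENTAL are hypothetical macro names, and the version numbers are arbitrary.

```cpp
// Hypothetical version macro; 203 could stand for "version 2.03".
#define DEMO_LIB_VERSION 203

#if DEMO_LIB_VERSION >= 300
    constexpr int api_level = 3;
#elif DEMO_LIB_VERSION >= 200 && !defined(DEMO_LIB_EXPERIMENTAL)
    // Taken here: the version is at least 200 and the
    // experimental flag has not been defined.
    constexpr int api_level = 2;
#else
    constexpr int api_level = 1;
#endif
```

Only the branch whose condition holds reaches the compiler, so api_level is defined exactly once, with the value 2.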
#ifdef and #ifndef
The #ifdef and #ifndef directives are short forms of '#if defined(defined-value)' and '#if !defined(defined-value)' respectively. defined(identifier) is valid in any expression evaluated by the preprocessor, and returns true (in this context, equivalent to 1) if a preprocessor variable by the name identifier was defined with #define and false (in this context, equivalent to 0) otherwise. In fact, the parentheses are optional, and it is also valid to write defined identifier without them.
(Possibly the most common use of #ifndef is in creating "include guards" for header files, to ensure that the header files can safely be included multiple times. This is explained in the section on header files.)
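Besides include guards, #ifndef is often used to supply a default for a setting that the build may or may not provide. The sketch below assumes a hypothetical configuration macro named DEMO_BUFFER_SIZE, which could be predefined on the compiler command line (for example with the -D option of GCC or Clang).

```cpp
// Use a default only when the build has not supplied a value.
#ifndef DEMO_BUFFER_SIZE
#define DEMO_BUFFER_SIZE 1024
#endif

// After the block above, the macro is defined in every case.
#ifdef DEMO_BUFFER_SIZE
constexpr bool has_buffer_size = true;
#else
constexpr bool has_buffer_size = false;
#endif

constexpr int buffer_size = DEMO_BUFFER_SIZE;
```

When nothing is predefined, buffer_size is 1024; a build that passes its own value keeps that value, because the #ifndef block is then skipped.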
#endif
The #endif directive ends #if, #ifdef, #ifndef, #elif and #else directives.
- Example:
#if defined(__BSD__) || defined(__LINUX__)
#include <unistd.h>
#endif
This can be used, for example, to provide multiple platform support or to have one common set of source files for different program versions. Another use is the include guard, an alternative to the (non-standard) #pragma once.
- Example:
foo.hpp:
#ifndef FOO_HPP
#define FOO_HPP
// code here...
#endif // FOO_HPP
bar.hpp:
#include "foo.hpp"
// code here...
foo.cpp:
#include "foo.hpp"
#include "bar.hpp"
// code here
When we compile foo.cpp, only one copy of foo.hpp will be included, thanks to the include guard. When the preprocessor reads the line #include "foo.hpp", the content of foo.hpp is expanded. Since this is the first time foo.hpp is read (and assuming that the macro FOO_HPP has not been defined elsewhere), FOO_HPP is not yet defined, so the code is included normally. When the preprocessor reads the line #include "bar.hpp" in foo.cpp, the content of bar.hpp is expanded as usual, and the file foo.hpp is processed again. Owing to the earlier definition of FOO_HPP, no code from foo.hpp is inserted this time. This achieves our goal: the content of the file is not included more than once.
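The effect can be simulated in a single file by pasting the guarded content twice, which is roughly what the preprocessor produces after expanding both #include lines. The struct Foo and the function foo_value are hypothetical stand-ins for the "code here" placeholders.

```cpp
// First "inclusion" of foo.hpp: FOO_HPP is not defined yet,
// so the body is kept and FOO_HPP becomes defined.
#ifndef FOO_HPP
#define FOO_HPP
struct Foo { int value; };
#endif // FOO_HPP

// Second "inclusion" (via bar.hpp): FOO_HPP is already defined,
// so everything up to #endif is skipped. Without the guard, the
// second definition of Foo would be a redefinition error.
#ifndef FOO_HPP
#define FOO_HPP
struct Foo { int value; };
#endif // FOO_HPP

constexpr int foo_value() { return Foo{42}.value; }
```

The file compiles even though the definition of Foo appears twice in the text, because the compiler only ever sees the first copy.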
Compile-time warnings and errors
- Syntax:
#warning message
#error message
#error and #warning
The #error directive causes the compiler to stop and report the line number and the given message when it is encountered. The #warning directive causes the compiler to emit a warning with the line number and the given message when it is encountered; it was a common compiler extension before being standardized in C++23. These directives are mostly used for debugging.
- Example:
#if defined(__BSD__)
#warning Support for BSD is new and may not be stable yet
#endif
#if defined(__WIN95__)
#error Windows 95 is not supported
#endif
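Another plausible use of #error is a sanity check on configuration macros, so that a bad setting stops the build with a clear message. DEMO_MAX_USERS and its limits are hypothetical values chosen for this sketch.

```cpp
// Hypothetical configuration value.
#define DEMO_MAX_USERS 1000

// Stop the build early if the configuration makes no sense.
#if DEMO_MAX_USERS <= 0
#error DEMO_MAX_USERS must be a positive number
#endif

#if DEMO_MAX_USERS > 1000000
#error DEMO_MAX_USERS is unreasonably large
#endif

// Reached only when neither #error directive was triggered.
constexpr int max_users = DEMO_MAX_USERS;
```

With the value 1000 both conditions are false, so the #error lines are skipped and compilation continues normally.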
Source file names and line numbering macros
The current filename and line number where the preprocessing is being performed can be retrieved using the predefined macros __FILE__ and __LINE__. Line numbers are measured before any escaped newlines are removed. The current values of __FILE__ and __LINE__ can be overridden using the #line directive; it is very rarely appropriate to do this in hand-written code, but it can be useful for code generators which create C++ code based on other input files, so that (for example) error messages will refer back to the original input files rather than to the generated C++ code.
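A brief sketch of these macros and of the #line directive follows. The file name generated_from.input and the line number 500 are made up, as are the names line_a, line_b, line_c, file_c and file_name_overridden.

```cpp
#include <cstring>

constexpr int line_a = __LINE__;
constexpr int line_b = __LINE__;   // exactly one line below line_a

// Pretend that the following line is line 500 of another file
// (both values are hypothetical, as a code generator might emit them).
#line 500 "generated_from.input"
constexpr int line_c = __LINE__;   // this line now counts as line 500
constexpr const char* file_c = __FILE__;

// Checks that __FILE__ reports the overridden name.
inline bool file_name_overridden() {
    return std::strcmp(file_c, "generated_from.input") == 0;
}
```

After the #line directive, compiler diagnostics for the rest of the file would also refer to generated_from.input and to the renumbered lines.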