C++ Programming/Memory Management
Overview
Memory management is a large subject, and C++ offers a wide range of choices for how to manage memory (and other resources, but our focus will initially be on memory).
The good news is that modern C++ makes memory management straightforward in most cases, while providing comprehensive facilities for those who need to stray from the beaten path. We will cover both the high-level approaches (which are usually preferable), and also give details of lower-level aspects such as use of new/delete/new[]/delete[] which are usually best hidden inside classes implementing higher level patterns.
Garbage Collection and RAII
Garbage collection (GC) is a form of automatic dynamic memory management, available with different levels of automation, in which a component called the collector attempts to reclaim garbage (memory used by application objects that will never be accessed or mutated again). It is often regarded as an important feature of recent languages, especially those that forbid manual memory management, since manual memory management is error-prone and therefore demands considerable experience from programmers. Errors in memory management mostly show up as instabilities and crashes that are only noticed at runtime, which makes them hard to detect and correct.
The C++ standard does not require garbage collection, but it allows implementations to provide it as an extension, and some do (often based on the so-called Boehm collector). For instance, Sun's C++ compiler product includes the libgc library (a conservative garbage collector).
Unlike many high-level languages, C++ does not impose the use of garbage collection, and mainstream C++ idioms for memory management do not assume the use of conventional automated garbage collection. The most common approach to resource management in C++ is the strangely named idiom "RAII", which stands for "Resource Acquisition Is Initialization"; this idiom is covered in the RAII section of the book. The key idea behind RAII is that a resource, whether acquired at initialization time or not, is owned by an object, and that the object's destructor automatically releases that resource at an appropriate time. RAII therefore gives C++ deterministic cleanup of resources, and the same approaches that work for freeing memory can also be used to release other resources (file handles, mutexes, database connections, transactions, and many more).
In the absence of built-in garbage collection, RAII is a robust way to ensure that resources are not leaked even in code that might throw exceptions. It is arguably superior to the finally construct in Java and similar languages: when a class owns a resource, code that relies on finally requires every user of that class to wrap its uses in a try/finally block. In C++ the class provides a destructor, and users of that class don't need to do anything except ensure that the object is destroyed when they are finished with it (which normally takes no work, for example when the object is a local variable or a data member of another object).
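The following is a minimal sketch of the RAII idea described above, not a definitive pattern. The names `ScopedResource`, `use_and_throw` and `run_raii_demo` are hypothetical, and the "resource" is simply an entry in a log so that the order of events can be observed. It assumes a C++11 compiler (for `= delete`).

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical resource owner: records acquire/release events in a log.
class ScopedResource {
public:
    explicit ScopedResource(std::vector<std::string>& log) : log_(log) {
        log_.push_back("acquire");
    }
    ~ScopedResource() { log_.push_back("release"); }  // runs on every exit path

    // Copying would release the resource twice, so it is forbidden.
    ScopedResource(const ScopedResource&) = delete;
    ScopedResource& operator=(const ScopedResource&) = delete;

private:
    std::vector<std::string>& log_;
};

// Uses the resource and then fails; no try/finally is needed for cleanup.
void use_and_throw(std::vector<std::string>& log) {
    ScopedResource guard(log);
    log.push_back("use");
    throw std::runtime_error("something went wrong");
}

// Returns the sequence of events observed when use_and_throw fails.
std::vector<std::string> run_raii_demo() {
    std::vector<std::string> log;
    try {
        use_and_throw(log);
    } catch (const std::runtime_error&) {
        // Stack unwinding has already destroyed the guard at this point.
        assert(log.back() == "release");
        log.push_back("caught");
    }
    return log;
}
```

Calling `run_raii_demo()` yields the events acquire, use, release, caught, in that order: the destructor runs during stack unwinding, before the handler is entered.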
For common applications, the appropriate classes have already been written: many simple cases of memory management are covered by std::string and std::vector (along with the other standard containers such as std::map and std::list).
Memory Management Comparison to C
Many programmers coming to C++ from C are used to doing manual memory management, particularly for string manipulation.
Here's a simple comparison between a C program and a C++ program with similar functionality. Both examples omit error handling, which would be present in real code.
Firstly, the C code (using C99, but trivially changed to be C90-compatible):
#include <stdio.h>  // for puts, printf, getchar, EOF
#include <stdlib.h> // for malloc, realloc and free
char *getstr(int minlen, int inc) // minlen - Minimum length, inc - Increment of length
{
    int index;
    int ch;
    char *str = malloc(minlen);
    for (index = 0; (ch = getchar()) != EOF && ch != '\n'; index++)
    {
        if (index >= minlen - 1)
        {
            minlen += inc;
            str = realloc(str, minlen);
        }
        str[index] = (char)ch;
    }
    str[index] = 0; // mark end of string
    return str;
}
int main()
{
    char* name;
    puts("Please enter your full name: ");
    name = getstr(10, 10); // 10, 10 are arbitrary
    printf("Hello %s\n", name);
    free(name);
    return 0;
}
For comparison, the C++ code
#include <string>   // for std::string and std::getline
#include <iostream> // for std::cin and std::cout
int main() {
    std::string name;
    std::cout << "Please enter your full name: ";
    std::getline(std::cin, name);
    std::cout << "Hello " << name << '\n';
    return 0;
}
The C++ version is shorter. It contains no explicit code to work out how much memory to allocate, or to allocate and free that memory, and it does not need to know the implementation details of a function like getstr(); all of that is taken care of by the standard string class. The C++ version also traps failure of memory allocation, whereas the C version shown above needs additional checking on the results of malloc and realloc in order to be safe in low-memory situations.
Smart Pointers for Memory Management
While smart pointers have many more uses in C++ than simple memory management, they are often a useful way to manage the lifetimes of dynamically allocated objects.
A smart pointer type is defined as any class type that overloads operator->, operator*, or operator->*. One thing to note straight away is that "smart pointers" are, in a sense, not really pointers at all -- but overloading these operators allows a smart pointer to behave much like a built-in pointer, and much code can be written which works with both "real" pointers and smart pointers.
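As a small illustration of code that works with both "real" pointers and smart pointers, here is a sketch under a few assumptions: the names `read_twice` and `generic_pointer_demo` are hypothetical, and `std::unique_ptr` (C++11) stands in for whichever smart pointer type is in use.

```cpp
#include <cassert>
#include <memory>

// Works with anything that supports unary operator*:
// built-in pointers and smart pointers alike.
template <typename Ptr>
int read_twice(const Ptr& p) {
    return *p + *p;
}

int generic_pointer_demo() {
    int local = 21;
    int* raw = &local;                        // built-in pointer
    std::unique_ptr<int> smart(new int(21));  // smart pointer

    // The same template is instantiated for both pointer kinds.
    assert(read_twice(raw) == read_twice(smart));
    return read_twice(smart);
}
```

The template never names a pointer type; it relies only on the overloaded `operator*`, which is what lets a smart pointer be substituted for a raw one.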
std::auto_ptr
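The only smart pointer type included in the 2003 C++ Standard is std::auto_ptr. While it has certain uses, it is not the most elegant or capable of smart pointer designs. It was deprecated in C++11 in favor of std::unique_ptr and removed in C++17, so the examples in this section need a compiler running in an earlier language mode.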
std::auto_ptr provides the ability to:
- simulate the lifetime of a local variable or member variable for an object that is actually dynamically allocated
- transfer ownership of objects from one owner to another
- Simple auto_ptr example
#include <memory> // for std::auto_ptr
#include <iostream>
class Simple {
public:
    std::auto_ptr<int> theInt;
    Simple() : theInt(new int()) {
        *theInt = 3; // dereference like a normal pointer
    }
    int f() {
        return 42;
    }
    // when this class is destroyed, theInt will
    // automatically be freed
};
int main() {
    std::auto_ptr<Simple> simple(new Simple());
    // note that the following won't work:
    // std::auto_ptr<Simple> simple = new Simple();
    // as the constructor taking a raw pointer is explicit
    // access member functions like normal pointers
    std::cout << simple->f();
    // the Simple object is freed when simple goes out of scope
    return 0;
}
The = operator of auto_ptr does not behave like ordinary assignment. It transfers ownership from the right-hand side (rhs) auto_ptr to the left-hand side (lhs) auto_ptr. Afterwards the rhs holds a null pointer, and any object that the lhs previously owned is deallocated.
- For example
#include <memory>
#include <iostream>
int main() {
    std::auto_ptr<int> a(new int(3));
    // a.get() returns the raw pointer of a
    std::cout << "a loc: " << a.get() << '\n';
    std::cout << "a val: " << *a << '\n';
    std::auto_ptr<int> b;
    b = a; // now b points to the int, a is null
    std::cout << "b loc: " << b.get() << '\n';
    std::cout << "b val: " << *b << '\n';
    std::cout << "a loc: " << a.get() << '\n';
    return 0;
}
- Output (sample)
a loc: 0x3d5ef8
a val: 3
b loc: 0x3d5ef8
b val: 3
a loc: 0
Sometimes it may not be obvious that an object never gets deallocated. Consider the following example:
- Memory leak
#include <memory>
#include <iostream>
class Sample {
public:
    int value;
    Sample(): value(42) {
        std::cout << "The object is allocated.\n";
    }
    ~Sample() {
        std::cout << "The object is going to be deallocated.\n";
    }
};
int main() {
    // the object is allocated on the heap
    // but cannot be deallocated
    // since there's no pointer to it
    std::cout << (new Sample)->value << "\n";
    // destructor ~Sample is never called
}
- Output
The object is allocated.
42
The memory leak can be fixed using auto_ptr:
// the rest of the code stays the same
int main() {
    std::cout << (std::auto_ptr<Sample>(new Sample))->value << "\n";
}
Note that sometimes you can allocate the object on the stack instead, avoiding such difficulties altogether.
To sum up, the behavior of auto_ptr is useful when only one pointer should ever own a particular object, but the owner may change over time. If different behavior is desired, one of the Boost smart pointers is a better option.
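On compilers where std::auto_ptr is no longer available, the same single-owner behavior can be sketched with std::unique_ptr. This is an assumption on our part rather than something the auto_ptr examples themselves use: it requires C++11, the transfer must be spelled out with `std::move`, and the function name `transfer_ownership_demo` is hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Returns the value seen through the new owner after the transfer.
int transfer_ownership_demo() {
    std::unique_ptr<int> a(new int(3));
    int* address = a.get();         // raw pointer, kept for comparison only

    std::unique_ptr<int> b;
    b = std::move(a);               // explicit transfer: b owns the int now

    assert(a.get() == nullptr);     // a no longer points at anything
    assert(b.get() == address);     // same object, new owner
    return *b;                      // the int is freed when b is destroyed
}
```

Unlike auto_ptr, a plain `b = a;` does not compile here, so an ownership transfer cannot happen by accident.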
Boost Smart Pointers
The Boost C++ libraries include six different kinds of smart pointers which, along with std::auto_ptr, can be used in almost all memory management situations. Some of the Boost smart pointers were later adopted into the standard library: C++11 (known as C++0x while it was being drafted) includes std::shared_ptr and std::weak_ptr.
- The boost and std smart pointers
Pointer | Usage situation | Performance cost | Transfer of ownership | Sharing objects | Works with | Other |
---|---|---|---|---|---|---|
std::auto_ptr | An object can only be owned by one auto_ptr at a given time, though this owner may be changed | Nil | Yes | No | Single instance | Doesn't work with standard containers (std::vector etc.) |
boost::scoped_ptr | An object is assigned to a scoped_ptr, it can never be assigned to another pointer | Nil | No | No | Single instance | If used as a member of a class, must be assigned in the constructor. Also, doesn't work with standard containers (std::vector etc.) |
boost::shared_ptr | Many shared_ptrs may point to a single object, when all go out of scope, the object is destroyed | Yes, uses reference counting | Yes | Yes | Single instance | Works with standard containers |
boost::weak_ptr | Used with shared_ptrs to break possible cycles, which may result in memory leaks. To use, must be converted into a shared_ptr | Same as shared_ptr | Yes | Yes | Single instance | Only ever used in conjunction with shared_ptrs |
boost::scoped_array | Same as scoped_ptr, but works with arrays | Nil | No | No | Array of instances | See scoped_ptr |
boost::shared_array | Same as shared_ptr, but works with arrays | Yes, uses reference counting | Yes | Yes | Array of instances | See shared_ptr |
boost::intrusive_ptr | Used to create custom smart pointers for objects that have their own reference count | Depends on implementation | Yes | Yes | Single instance | In most cases, shared_ptr should be used instead of this |
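The sharing and cycle-breaking rows of the table can be sketched as follows. To keep the example free of external dependencies it uses `std::shared_ptr` and `std::weak_ptr` from C++11 rather than the Boost types named in the table; treating them as interchangeable for this purpose is an assumption, and `Node` and `shared_and_weak_demo` are hypothetical names.

```cpp
#include <cassert>
#include <memory>

struct Node {
    int value;
    explicit Node(int v) : value(v) {}
};

// Returns true if the weak reference expired once all owners were gone.
bool shared_and_weak_demo() {
    std::weak_ptr<Node> observer;
    {
        std::shared_ptr<Node> first = std::make_shared<Node>(7);
        std::shared_ptr<Node> second = first;   // two owners, one object
        assert(first.use_count() == 2);

        observer = first;                       // observes without owning
        assert(first.use_count() == 2);

        // A weak_ptr must be converted to a shared_ptr before use.
        if (std::shared_ptr<Node> locked = observer.lock()) {
            assert(locked->value == 7);
        }
    }   // both owners go out of scope here, so the Node is destroyed
    return observer.expired();
}
```

Because the weak pointer never contributes to the reference count, two objects that refer to each other through one shared and one weak pointer can still be destroyed.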
Creating Your Own Smart Pointer Type
One of the rationales for using smart pointers is to avoid leaking memory. To achieve this, we should avoid managing heap-based memory manually. So we have to find a container that automatically releases the memory when we no longer use it. The destructor of a class meets this requirement.
What we need to store in a basic smart pointer is, of course, the address of the allocated memory. For this, we can simply use a pointer. Let's say we are designing a smart pointer that manages memory holding an int.
class smt_ptr {
private:
    int* ptr;
};
To make sure that every user supplies an address when initializing this smart pointer, we give it a constructor that takes the target address as its argument. Because this is the only constructor, a "mere declaration" of the smart pointer without an address does not compile.
class smt_ptr {
public:
    explicit smt_ptr(int* pointer) : ptr(pointer) { }
private:
    int* ptr;
};
Now we have to make the class delete the pointer when an instance of the smart pointer is destroyed.
class smt_ptr {
public:
    explicit smt_ptr(int* pointer) : ptr(pointer) { }
    ~smt_ptr() { delete ptr; }
private:
    int* ptr;
};
We have to allow users to access the data stored in this smart pointer and make it more "pointer-like". For this, we may add a function that provides access to the raw pointer, and overload some operators, such as operator* and operator->, to make it behave like a real pointer.
class smt_ptr {
public:
    explicit smt_ptr(int* pointer) : ptr(pointer) { }
    ~smt_ptr() { delete ptr; }
    // These functions are declared const to indicate that
    // there is no modification to the data members.
    int* get() const { return ptr; }
    int* operator->() const { return ptr; }
    int& operator*() const { return *ptr; }
private:
    int* ptr;
};
The basic parts are now finished and the class is ready to use. However, to make this "homemade" smart pointer work with other data types and classes, we have to turn it into a class template.
template<typename T>
class smt_ptr {
public:
    explicit smt_ptr(T* pointer) : ptr(pointer) { }
    ~smt_ptr() { delete ptr; }
    T* get() const { return ptr; }
    T* operator->() const { return ptr; }
    T& operator*() const { return *ptr; }
private:
    T* ptr;
};
This implementation is very basic and has several serious problems. For example, copying this smart pointer leaves two objects holding the same address, so the memory is deleted twice. We do not discuss these problems in detail here.
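One possible way to address the double-deletion problem, sketched here under the assumption of a C++11 compiler, is to forbid copying and allow only moves. The copy and move members and the function `move_only_demo` are additions for illustration and are not part of the design developed above.

```cpp
#include <cassert>
#include <utility>

template<typename T>
class smt_ptr {
public:
    explicit smt_ptr(T* pointer) : ptr(pointer) { }
    ~smt_ptr() { delete ptr; }

    // Forbid copying: two copies would both delete the same object.
    smt_ptr(const smt_ptr&) = delete;
    smt_ptr& operator=(const smt_ptr&) = delete;

    // Allow moving: the source gives up the pointer and is left empty.
    smt_ptr(smt_ptr&& other) noexcept : ptr(other.ptr) { other.ptr = nullptr; }
    smt_ptr& operator=(smt_ptr&& other) noexcept {
        if (this != &other) {
            delete ptr;             // release what we currently own
            ptr = other.ptr;
            other.ptr = nullptr;
        }
        return *this;
    }

    T* get() const { return ptr; }
    T* operator->() const { return ptr; }
    T& operator*() const { return *ptr; }
private:
    T* ptr;
};

int move_only_demo() {
    smt_ptr<int> a(new int(5));
    smt_ptr<int> b(std::move(a));   // ownership moves from a to b
    assert(a.get() == nullptr);     // deleting a null pointer later is harmless
    return *b;
}
```

With these members in place, `smt_ptr<int> c = b;` is rejected at compile time, so exactly one object is ever responsible for the delete.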
Other Smart Pointers
Apart from auto_ptr, there are many other smart pointers, covering tasks that range from wrapping COM objects to providing automatic synchronization for multi-threaded access or transaction management for database interfaces.
A good repository for many of these is the Boost library; some smart pointers from Boost are included in the C++ Committee's "TR1", a collection of library components which integrate well with standard C++.
Manual Memory Management with new, delete etc.
Modern C++ code tends to use new quite rarely, and delete very rarely. One disadvantage of new is that it allocates memory from the heap, whereas local objects are allocated on the stack, and heap allocation is typically much slower than stack allocation. However, there are still times when dynamic allocation is appropriate, and a solid understanding of how these low-level facilities work helps in understanding what normally happens "under the hood". There are even times when new and delete are too high-level and we need to drop back to malloc and free, but those situations are rare exceptions indeed.
The basic idea of new and delete is simple: new creates an object of a given type and gives a pointer to it, and delete destroys an object created by new, given a pointer to it. The reason that new and delete exist in the language is that code often does not know when it is compiled exactly which objects it will need to create at runtime, or how many of them. Thus new and delete expressions allow for "dynamic" allocation of objects.
- Example
int main() {
    int * p = new int(3);
    int * q = p;
    delete q; // the same as delete p
    return 0;
}
Unfortunately it is hard to write a realistic example in a few lines of code; dynamic allocation is only justified when a simpler approach won't work, for example because an object needs to outlive a function's scope, or because it uses so much memory that we only want to create it on demand.
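As a rough sketch of the first case mentioned above (an object that must outlive the function that creates it), consider the following. The type `Document` and the functions `create_document` and `outlive_scope_demo` are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <string>

struct Document {
    std::string title;
    explicit Document(const std::string& t) : title(t) {}
};

// The Document must outlive this function, so it cannot be a local variable.
Document* create_document(const std::string& title) {
    return new Document(title);
}

std::string outlive_scope_demo() {
    Document* doc = create_document("notes");
    assert(doc != nullptr);         // plain new throws rather than returning null
    std::string title = doc->title; // still valid: the object lives on the heap
    delete doc;                     // the caller is responsible for cleanup
    return title;
}
```

The burden this places on the caller (remembering the matching delete) is exactly what the smart pointers described earlier are meant to remove.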
For those of you familiar with the C programming language, new is a kind of "type-aware" version of malloc: the type of the expression "new int" is "int*". Hence, whereas using malloc in C++ requires a cast, as in int * p = static_cast<int *>(malloc(sizeof *p)); no cast is required when using new. Because new is type-aware, it can also initialize the newly created objects, calling constructors if appropriate. The example above uses this ability to initialize the int created to have the value 3. Another enhancement of new and delete compared to malloc and free is that the C++ standard provides a standard way to change how new and delete allocate memory; in C this is normally achieved using a non-standard technique known as "interpositioning".
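One standard way to change how new and delete obtain memory is to give a class its own operator new and operator delete. The sketch below only counts allocations and then forwards to the global operators; the names `Tracked`, `live_allocations` and `custom_new_demo` are hypothetical, and a real replacement would usually do something more useful, such as drawing from a pool.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

struct Tracked {
    int payload;
    static int live_allocations;

    // Called by "new Tracked" instead of the global operator new.
    static void* operator new(std::size_t size) {
        ++live_allocations;
        return ::operator new(size);    // forward to the default allocator
    }

    // Called by "delete p" when p is a Tracked*.
    static void operator delete(void* p) noexcept {
        if (p) --live_allocations;
        ::operator delete(p);
    }
};

int Tracked::live_allocations = 0;

// Returns the number of live allocations while one object exists.
int custom_new_demo() {
    Tracked* t = new Tracked();
    int during = Tracked::live_allocations;
    delete t;
    assert(Tracked::live_allocations == 0);
    return during;
}
```

Code that creates and destroys `Tracked` objects does not change at all; only the class decides where its memory comes from.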
The basic new and delete operators are intended to allocate only a single object at a time; they are supplemented by new[] and delete[] for dynamically allocating entire arrays. Uses of new[] and delete[] are even rarer than uses of basic new and delete; usually a std::vector is a more convenient way to manage a dynamically allocated array.
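A short sketch of the std::vector alternative mentioned above, with the hypothetical function name `vector_sum_demo`: the array size is chosen at run time, yet no new[] or delete[] appears in the code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums 0 + 1 + ... + (count - 1) using a dynamically sized array.
int vector_sum_demo(std::size_t count) {
    std::vector<int> values(count);     // count elements, each initialized to 0
    assert(values.size() == count);

    for (std::size_t i = 0; i < values.size(); ++i) {
        values[i] = static_cast<int>(i);
    }

    int sum = 0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        sum += values[i];
    }
    return sum;
}   // the vector's destructor frees the array; there is no delete[] to forget
```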
Note that when you dynamically allocate an array of objects, you must write delete[] when freeing it, not plain delete. Compilers cannot usually give an error if you get this wrong; the behavior is undefined, and most likely your code will crash when you run it.
When a call to delete[] runs, it first retrieves information stored by new[] describing how many elements are present in the dynamically allocated array, and then calls the destructor for each element before deallocating the memory. The actual address of the memory block that was allocated may differ from the value returned by new[] to allow room to store the number of elements; this is one reason why accidentally mixing the array form of new[] with the single-element form of delete may lead to crashes.
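The per-element destructor calls described above can be made visible with a counter. This is only an illustrative sketch; `Counted` and `array_delete_demo` are hypothetical names.

```cpp
#include <cassert>

struct Counted {
    static int alive;
    Counted() { ++alive; }
    ~Counted() { --alive; }
};

int Counted::alive = 0;

// Returns how many objects were alive while the array existed.
int array_delete_demo() {
    Counted* items = new Counted[4];    // four constructor calls
    int during = Counted::alive;
    delete[] items;                     // four destructor calls, then deallocation
    assert(Counted::alive == 0);
    return during;
}
```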
The particularly astute reader might be wondering if it would be possible to eliminate the need to remember which of new/new[] and delete/delete[] to use, and make the compiler figure it out instead. The answer is that it would be perfectly possible, but doing so would add overhead to each single-object allocation (as delete would need to be able to work out whether the allocation was for a single object or an array), and a design principle behind C++ has been that you "don't pay for what you don't use", so the trade-off made is that single object allocations remain efficient, but users have to take care when using these low-level facilities.
Common Mistakes
Use of typedef
Read the following (buggy) code.
...
typedef char CStr[100];
...
void foo()
{
    ...
    char* a_string = new CStr;
    ...
    delete a_string;
    return;
}
The code above has undefined behavior; in practice it may leak resources or, in some cases, crash. It is a common mistake to release an array with plain delete rather than with "array delete" (i.e. delete[]). In this situation, the typedef gives the illusion that "a_string" points to a single object, when in fact new CStr allocates an array of 100 char. Because the memory was allocated with the array form of new, it must be released with delete[]. With plain delete, one possible outcome is that only part of the memory is released and the rest is leaked. The loss is small for an array of char, but when the array holds objects of a complex class, the destructors of the elements may never run and far more can be lost. The problem is also repeated every time foo() is called.
Thus, in the code above
delete a_string;
should be corrected to
delete[] a_string;
or, better still, a string class such as std::string should be used instead of a plain array hidden behind a typedef.
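Both fixes can be sketched side by side. The function names `fixed_foo` and `string_foo` are hypothetical, and the string contents are made up for the example; only the typedef comes from the buggy code above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

typedef char CStr[100];

// Fix 1: keep the typedef, but release the array with delete[].
std::size_t fixed_foo() {
    char* a_string = new CStr;          // really "new char[100]"
    std::strcpy(a_string, "hello");
    std::size_t length = std::strlen(a_string);
    delete[] a_string;                  // array form matches the array allocation
    return length;
}

// Fix 2: let std::string own the memory, so there is nothing to delete.
std::size_t string_foo() {
    std::string a_string = "hello";
    assert(a_string.size() == 5);
    return a_string.size();
}
```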