Pascal Programming/Pointers
The new data type presented in this chapter adds another layer of abstraction to your repertoire: Pointers are by far the most complicated data type. If you master them, you have got what it takes to tackle even the supreme discipline of assembly programming. So, let’s get started!
Indirection
[edit | edit source]In Pascal there are two kinds of variable types.
- So far we have been using static variables. They exist during the entire execution of a block, e. g. while a
program
is running or just during execution of a routine. - There is another kind called dynamic variables.[fn 1] They do not necessarily “exist” during the entire block. That means, there is no static memory allocated, but the used memory space varies each time the program runs.
While using static variables, the compiler[fn 2] already knows which memory chunk will be used in advance.[fn 3] Dynamic variables, however, are, hence their name, dynamic in that they will be occupying different, unpredictable memory segments.
Memory is referred to by addresses.
An address is, in CS, simply a number, an integer
value so to speak.[fn 4]
When you want to refer to a certain memory block, you use its address.
The pointer data type is a value that stores an address. This address can then be used to access the memory it is referring to. A pointer, however, is just that: It is pointing, but not making any statement as regards to “whom”, what variable this block of memory “belongs.”
Declaration
[edit | edit source]In Pascal, a pointer data type declaration starts with a ↑
(up arrow), or alternatively and more frequently the ^
(caret) character, followed by the name of a data type.
program pointerDemo(output);
type
charReference = ^char;
A variable of this pointer data type can point to a single char
value (and no other data type).
In Pascal all pointer data types have to indicate the data type of the value the pointer is referring to.
This is because a pointer alone is just an address:
An address merely points to the start of a memory block.
There is no statement with respect to this block’s size, its length.
The domain restriction, the specification of targeted value’s data type, tells the compiler “how large” a memory block will be, and in consequence how to properly read and write, how to access it.
Unlike any other data type, a pointer data type is the only data type that can use data types not declared yet. Below you will see a usage scenario, but let’s continue in the script.
Allocating memory
[edit | edit source]When you are declaring a variable in the var
‑section, you are declaring a static variable.
In the following code fragment c
is a static variable, thus its memory location is already known.
var
c: charReference;
begin
{ artificially stall the program without breakpoints }
readLn;
At this point, there is no memory space alloted to a char
value yet.
There is already space to store a pointer value, the address of a char
value, but we do not have any space available to put it, a char
value, anywhere.
In Pascal you will first need to invoke the procedure new
to get memory assigned to your program
.
New
takes one pointer variable as an argument and will reserve enough memory space to hold one value of the pointer’s domain.
new(c);
After this operation
- you occupy additional memory for, in this case, one
char
value theprogram
previously did not “own”, and c
, the pointer variable itself, will give us the address of this newly allocated memory.
As with all variables of any kind, the memory space we have acquired now is totally undefined (unintialized).
Dereferencing
[edit | edit source]To use the memory we just gained we will have to follow a pointer.
This is done by appending ↑
(or usually ^
) to the name of the pointer variable.
c^ := 'X';
writeLn(c^);
This action is called dereferencing.
The pointer is a (kind of) reference to the underlying char
value.
This char
value does not have a name, but you use the pointer to access it anyway.
On this dereferenced variable we can perform all operations permissible on the pointer’s domain data type.
I. e. here we are allowed to assign a char
value 'X'
to it, and then use it, for instance, in a writeLn
as demonstrated above.
Note that something like c := 'X'
will not work, because in this case c
simply refers to the pointer, the address storage.
- The expression
c
has the data typecharReference
. - The expression
c^
has the data typechar
.
In Pascal it is forbidden to directly assign addresses to pointers, other than by using new
.
For the special case of nil
, see below.
Releasing memory
[edit | edit source]After invoking new
the respective memory is exclusively reserved to your program
.
This memory management occurs outside of your program
.
It is a typical task of the respective OS.
To reverse the operation of new
, there is a dedicated procedure
“unreserving” memory: Dispose
.
readLn;
dispose(c);
readLn;
end.
Dispose
takes the name of a pointer variable, and releases previously with new
allocated memory.
After a dispose
you may not follow, dereference, the pointer anymore.
Nevertheless the pointer itself still stores the address where, in this case, the referenced char
value was.
Meanwhile, the “freed” memory may be used again for something or by someone else.
Lifetime
[edit | edit source]In Pascal, memory of dynamic variables remains reserved
- as long as it is accessible, that means at least one pointer must point to it, or
- until you specifically request to “unreserve” memory.
If a chunk of memory is rendered inaccessible by some operation, it is automatically released.
This can happen implicitly:
In the program
above the pointer variable c
is “gone” upon program
termination.
Because this variable is/was the only pointer (left) pointing to our previously reserved char
value, there is an automatic “invisible” dispose
.
Insofar, the explicit dispose
from our side was not necessary.
However, unfortunately not all compilers comply with this specification as laid out in the Pascal ISO standards.
For instance, Delphi, as well as the FPC (even in its {$mode ISO}
compatibility mode, as of version 3.2.0) will not issue an automatic dispose
.
There, an explicit dispose
is necessary.[fn 5]
Rest assured, using the GPC it is not necessary though; the GPC fully complies with the ISO standard 7185 level 1.
Note, that memory accessibility is transitive: This means that, for instance, a pointer pointing to a pointer pointing to the memory still satisfies the accessibility requirement.
Indication
[edit | edit source]The additional housekeeping of allocating and releasing memory may seem like quite a hassle, so when does that make sense?
- All variables declared in a
var
-section need to indicate their size in advance. For some applications, however, you do not know how much data you will need to store and process. Pointers are a means to overcome this limitation. Further below we will explore how. - Pointer values can be used to represent graphs, networks, of data, allowing you to put everything into relation with each other. This means you do not need to store the same datum multiple times. A pointer value is usually, with respect to its memory requirements, a comparatively small data type. Handling pointers trades lower memory space demand for increased complexity.
Furthermore, pointer values are frequently used to implement variable parameters of routines:
Due to its smaller size passing a single pointer value can be faster than passing, that means copying, for instance an entire array
.
This kind of use of pointers is completely transparent.
Pascal equips you with an adequate language construct;
you will learn more about variable parameters in the chapter on scopes.
Links
[edit | edit source]Nil pointers
[edit | edit source]All pointers can be assigned a literal value nil
.
The nil
pointer value represents the notion “not pointing anywhere in particular.”
Coincidentally, nil
is the only pointer value that could be used for a pointer literal.
const
nowhere = nil;
There is no other pointer value that you could possibly specify anywhere in your source code.
This also means you cannot explicitely compare any specific pointer value except nil
.
Note that nil
is fundamentally different to an unintialized variable.
You are allowed to read the value of a pointer that has been assigned the value nil
, but you are still forbidden to attempt reading the value of a variable that has not been assigned any value at all.
Attempting to dereference a pointer that currently possesses the value nil constitutes a fatal error.
|
Permissible operators
[edit | edit source]In the introduction we used the analogy comparing pointers to integer
values.
However, this is really just that.
Unlike integer
values, pointers are by no means “ordered”; they do not belong the class of ordinal data types.
There is no ord
, succ
, pred
defined for a pointer, but also ordering comparison operators like <
or >=
do not apply to pointers, not to mention any arithmetic operator is invalid in combination with a pointer value.
The only operators applicable to pointers are[fn 6]
=
, do two pointer values refer to the same address,<>
, do two pointer values refer to different addresses, and:=
, the assignment of a pointer value, eithernil
or the value of an already defined pointer variable of the same data type, to a pointer variable.
It may seem at first like quite a restriction, but it prevents you from doing potentially harmful, or even just stupid stuff.
Chicken or egg
[edit | edit source]Pointers are the only data type that can be declared using a data type yet to be declared.[fn 7]
This circumstance makes it possible to declare data types containing pointers, possibly to the data type being declared at hand or other yet to be declared data types.
This is possible because a pointer to foo
has the same memory requirements as a pointer to bar
or any other data type.
The domain restriction of a pointer is not (necessarily/explicitly) stored in the program
.
In the following code fragment numberListItem
is not yet declared, but you are still allowed declare a new pointer data type with it anyway:
program listDemo(input, output);
type
numberListItemReference = ^numberListItem;
numberListItem = record
value: real;
nextItemLocation: numberListItemReference;
end;
Yet you cannot reverse the order of the declarations of numberListItemReference
and numberListItem
;
the compiler cannot magically conclude nextItemLocation
is a pointer until it has actually seen/read the respective declaration.
Putting things together
[edit | edit source]Now we can use this data structure to dynamically store a series of numbers. Pay attention when to derefercene the pointer in the following code:
var
numberListStart: numberListItemReference;
begin
new(numberListStart);
readLn(numberListStart^.value);
new(numberListStart^.nextItemLocation);
readLn(numberListStart^.nextItemLocation^.value);
dispose(numberListStart^.nextItemLocation);
dispose(numberListStart);
end.
The entire program
contains one static variable.
Only the variable numberListStart
was declared by you.
During run-time, however, while the program
is running you will have at one point two additional real
values at your disposal.
Take notice of this example’s order of dispose
statements:
The supplied pointer variable must be valid, so a reverse order would not be possible in this specific case here.
Concededly, this example could have been better implemented by simply declaring two real
variables.
The true power of pointers becomes apparent when you are, unlike the above code, use pointers as a means of abstraction.
This chapter’s exercises will delve into that.
Routines
[edit | edit source]In particular, let’s first explore a special kind of pointers: Routine parameters, that is functional and procedural parameters, are parameters of routines that allow you to statically modify the routine’s behavior by virtually passing the address of another routine. Let’s see how this works.
Declaration and use
[edit | edit source]In the formal parameter list of a routine you can declare a parameter that looks just like a routine signature:
program routineParameter(output);
procedure fancyPrint(function f: integer);
begin
writeLn('❧ ', f:1, ' ☙')
end;
Inside the definition of fancyPrint
you can use the parameter f
as if it was a regular function
declared and defined before and outside of fancyPrint
.
However, at this point it is not known what function will be used.
The actual parameter f
is in fact a pointer.[fn 8]
We only know that this pointer’s “domain” is a, any function
without parameters and returning an integer
value, but this is already enough we need to know.
One routine fits it all
[edit | edit source]To call this kind of routine you will need to specify an appropriate routine designator that matches the signature as regards to order, number and data types of parameters and, if applicable, the returned value’s data type.
function getRandom: integer;
begin
{ chosen by fair dice roll: guaranteed to be random }
getRandom := 4
end;
function getAnswer: integer;
begin
{ the answer to the ultimate question of life, the universe and everything }
getAnswer := 42
end;
begin
fancyPrint(getRandom);
fancyPrint(getAnswer)
end.
To supply a routine parameter value to a routine, simply name a compatible routine. Note that in this case you never specify any parameters, because you are not making a call here, but the called routine will do so “on behalf” of you. Specifying the routine’s name, and thus passing its address, is sufficient to achieve that.
Standard routines such as writeStr (EP) or sin cannot be used that way,[fn 9] because they are an integral part of the language. There is no (singular) routine definition for them.
|
Caveats
[edit | edit source]As a beginner, pointers are difficult to tame. Without experience, you will frequently observe (for you) “unexpected” behaviors. Some pitfalls are presented here.
with
-clause
[edit | edit source]Special care must be taken when using pointers in conjunction with a with
-clause.
The expressions listed at the top of a with
-clause are evaluated once before executing any following statement.
During the entire with
-statement the expressions using the “short” notation will actually use an invisible transient value.
This speeds up execution, because the same value is not evaluated over and over again, but there is also a caveat in it.
Surprisingly, the long notation using an FQI can become invalid, while the short notation at first seems to be still valid.
The following program
demonstrates the issue:
program withDemo(output);
type
foo = record
magnitude: integer;
end;
fooReference = ^foo;
var
bar: fooReference;
begin
new(bar);
bar^.magnitude := 42;
with bar^ do
begin
dispose(bar);
bar := nil;
{ Here, bar^.magnitude would fail horribly, }
{ but you can still do the following: }
writeLn(magnitude);
end;
end.
When you compile and run this program
, you will
- notice that it prints anything but
42
, but - it should be rather astonishing that it still prints anything at all.
The writeLn(magnitude)
does actually use a “hidden (pointer) variable” and not bar
.
This variable’s value was evaluated one time at the top of the with
-clause.
The compiler does not (and cannot) complain that bar
meanwhile became invalid.
You are not making any assignments to the actually utilized hidden variable (i. e. it is still considered bearing a valid value), thus there is no reason for complaints.
Limits
[edit | edit source]This section primarily concerns users of Delphi and the FPC, as well as possibly some other compilers. Users of the GPC could skip this section, but understanding the theory is encouraged. |
Memory is not an infinite resource. This has some grave implications.
Most OSs try their best to fulfill the processes’ requests. Using a non-ISO-compliant compiler, the following program is doomed to fail though:
program oomDemo;
var
p: ^integer;
begin
while true do
begin
new(p);
end;
end.
program overwrites the previous pointer value, thus rendering the previously associated integer value inaccessible, the now inaccessible memory is still exclusively reserved to your program . Depending on OS internals and also the compiler used to compile your program , your computer will eventually freeze (become irresponsive to any input) or (a robust OS) will just kill your program (jargon for terminating it immediately without giving it any chance to fix the problem) and reclaim the once reserved but never released memory. |
There is no means to check whether any subsequent new
will exhaust the finite resource memory.
On multi-tasking OSs it is feasible that between the time you have queried the amount of free memory space and actually requesting additional memory, another program
running at the same time has acquired memory so there is none, or not enough left for you.
This kind of situation is known as time-of-check to time-of-use.
You need to simply in a make-or-break manner ask for more memory.
This issue is rather of theoretical concern for the scope of this textbook. A standard desktop computer manufactured in the 21st century or later will not run out of memory for any programming exercise given here. This is not supposed to mean you can waste memory. |
Do not hoard memory: To mitigate potential OOM conditions, it is generally sensible to dispose memory as soon as you are certain it will not be used anymore.
|
Tasks
[edit | edit source]program listDemo
so it accepts an unknown number of items. The program
should print the number of total items first and then a list of items.function readNumber: numberListItemReference;
var
result: numberListItemReference;
begin
new(result);
with result^ do
begin
readLn(value);
nextItemLocation := nil;
end;
readNumber := result;
end;
{ === MAIN ============================================================= }
var
numberListRoot: numberListItemReference;
currentNumberListItem: numberListItemReference;
numberListLength: integer;
begin
writeLn('Enter numbers and finish by abandoning input:');
{ input - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - }
numberListRoot := readNumber;
numberListLength := 1;
currentNumberListItem := numberListRoot;
while not EOF(input) do
begin
with currentNumberListItem^ do
begin
nextItemLocation := readNumber;
currentNumberListItem := nextItemLocation;
end;
numberListLength := numberListLength + 1;
end;
{ output - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - }
writeLn('You’ve entered ', numberListLength:1, ' numbers as follows:');
currentNumberListItem := numberListRoot;
while currentNumberListItem <> nil do
begin
with currentNumberListItem^ do
begin
writeLn(value);
currentNumberListItem := nextItemLocation;
end;
end;
{ release memory - - - - - - - - - - - - - - - - - - - - - - - - - - - }
currentNumberListItem := numberListRoot;
while currentNumberListItem <> nil do
begin
with currentNumberListItem^ do
begin
dispose(currentNumberListItem);
{ Note that at _this_ point, after dispose(…), writing
… := currentNumberListItem^.nextItemLocation
would be illegal! }
currentNumberListItem := nextItemLocation;
end;
end;
end.
procedure
that accepts a real function
and graphs its function values similar to this:
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
For that complete the following procedure
:
program graphPlots(output);
const
lineWidth = 80;
procedure plot(
function f(x: real): real;
xMinimum: real; xMaximum: real; xDelta: real;
yMinimum: real; yMaximum: real
);
{
this is the part you are supposed to implement
}
function wave(x: real): real;
begin
wave := sin(x);
end;
begin
plot(wave, 0.0, 6.283, 0.196, -1.0, 1.0);
end.
plot
could look like this:
procedure plot(
function f(x: real): real;
xMinimum: real; xMaximum: real; xDelta: real;
yMinimum: real; yMaximum: real
);
var
x: real;
y: real;
column: 0..lineWidth;
begin
x := xMinimum;
while x < xMaximum do
begin
y := f(x);
{ always reset `column` in lieu of doing that in an `else` branch }
column := 0;
{ is function value within window? }
if (y >= yMinimum) and (y <= yMaximum) then
begin
{ move everything toward zero }
y := y - yMinimum;
{ scale [yMinimum, yMaximum] range to [0..79] range }
y := y * (lineWidth - 1) / (yMaximum - yMinimum);
{ convert to integer }
column := round(y) + 1;
end;
The following use of write
/writeLn
is actually an EP extension. In Standard Pascal as laid out in ISO standard 7185 all format specifiers need to be positive integer
values. Extended Pascal also allows a zero value. While for printing integer
values the width specifier still indicates a minimum width, for char
and string
values it means the exact width. Thus the following can print a blank line when column
is zero, i. e. when the function value is outside of the window.
writeLn('*':column);
It should be an easy feat for you to adapt the writeLn
line should your compiler not support this EP extension.
x := x + xDelta;
end;
end;
plot
in such a generic way, i. e. accepting a functional parameter, you can reuse it for any other function you wish to.Notes:
- ↑ The Pascal ISO standards call this idea identified variables.
- ↑ For the sake of simplicity we say this was the compiler’s task. Usually it is rather a task of a linker, the link editor, that determines and substitutes specific addresses.
- ↑ Actually the compiler does not know which (physical) memory will be used, but another abstraction layer called virtual memory administerd by the OS permits us to think that way.
- ↑ This is an analogy for explanation purposes. The range of integer values does not necessarily correspond to permissible pointer values (i. e. addresses). For instance on x32 targets pointers have 32 significant bits, but an
integer
value occupies 64 bits. - ↑ Failing to release memory will probably go unnoticed. Your
program
will compile and run without a properdispose
. However, eventually the finite resource “memory” will be exhausted, a condition known as memory leak. If there is no sufficient memory available, anynew
will fail and terminate theprogram
immediately. - ↑ Some manuals call the
↑
/^
an “operator”. This language, however, is imprecise. The↑
does not alter the state of your program, do an operation, but merely instructs the compiler to treat an identifier differently than it would without the arrow’s presence. - ↑ The declaration of the pointer data type and the referenced data type must occur in the same scope, in the same block, in other words in one and the same
type
-section. - ↑ This is an implementation detail that is not specified by the ISO standards, although in fact most compilers will implement this as a pointer.
- ↑ Some compilers do not have this restriction, yet the ISO standards require an “activation”, which simply does not happen for standard routines.