In Free Pascal, a unit is a reusable module of code. After you have written a unit, you can use it easily from any other source file that you write.
For example, last week we saw how to implement a stack using an array. Let's put our array-based stack type into a unit. We create a file array_stack.pas that looks like this:
unit array_stack;   // unit name must match filename!

interface

type
  stack = array of integer;

procedure init(var s: stack);
procedure push(var s: stack; i: integer);
function pop(var s: stack): integer;
function isEmpty(s: stack): boolean;

implementation

procedure init(var s: stack);
begin
  setLength(s, 0);
end;

procedure push(var s: stack; i: integer);
begin
  setLength(s, length(s) + 1);
  s[high(s)] := i;
end;

function pop(var s: stack): integer;
var
  k: integer;
begin
  k := s[high(s)];
  setLength(s, length(s) - 1);
  exit(k);
end;

function isEmpty(s: stack): boolean;
begin
  exit(length(s) = 0);
end;

end.
As you can see above, a unit begins with a unit declaration at the top of the file specifying the name of the unit. It must be the same as the source file name without the '.pas' extension.
The interface section declares types, procedures and functions that the unit will export. Procedures and functions declared in this section must be implemented in the following implementation section.
A unit ends with the end keyword followed by a period.
Now suppose that we are writing a program abc.pas. It can use the array_stack unit we just wrote:
// abc.pas
uses array_stack;

var
  s: stack;
  i: integer;

begin
  init(s);
  for i := 1 to 10 do
    push(s, i);
  while not isEmpty(s) do
    writeln(pop(s));
end.
For this to work, you must place abc.pas and array_stack.pas in the same directory.
A program that uses a unit may call only the procedures and functions declared in the interface section. Any other procedures and functions in the implementation section are private to the unit.
As we learn about more data structures in the remainder of this course, you may wish to create units that implement these structures. Then you can easily use those structures in other programs that you write. Note that when you submit a program to ReCodEx you may upload multiple source files. So you can even use units in your ReCodEx programs; simply upload the necessary units along with your top-level program.
In Pascal and many other languages, a pointer is a special kind of value that points to another value in memory. In other words, a pointer is an indirect reference to a value.
We will use pointers to build various kinds of linked data structures: linked lists, binary trees, expression trees and so on. These structures will be useful for many purposes.
You can make a pointer to any kind of value, but in this course we will only use pointers to records. Here is a record type pos and a pointer variable:
type
  pos = record
    x, y: integer;
  end;

var
  p: ^pos;
The type ^pos means a pointer to a pos.
By the way, we usually pronounce ^ as "hat" since the symbol ^ looks something like a hat.
Initially the value of p is undefined. We can use the built-in procedure new to dynamically allocate a pos that p will point to:
begin new(p);
Now p points to a record of type pos. The fields x and y inside that record have undefined values since we haven't stored anything there yet. Let's set those values now:
p^.x := 4;
p^.y := 5;
The expression p^ means the value that p points to. p^ is the record that we dynamically allocated. p^.x is the field x inside that record.
Suppose that we have a second pointer variable q:
var q: ^pos;
We can now assign
q := p;
Now q points to the same record that p points to. An assignment between two pointers always makes them point to the same value.
We can set the x field through the pointer q:
q^.x := 6;
writeln(p^.x);   // writes 6
The change to x is also visible through the pointer p. That's because p and q are pointing to the same record.
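Putting the fragments above together, here is a minimal complete program showing two pointers that share one record. It is a sketch: the program name aliasing and the extra writeln for y are our own additions, and the final dispose frees the single record for both pointers at once.

```pascal
program aliasing;

type
  pos = record
    x, y: integer;
  end;

var
  p, q: ^pos;

begin
  new(p);          // allocate one record on the heap
  p^.x := 4;
  p^.y := 5;

  q := p;          // copy the pointer, not the record
  q^.x := 6;       // modify the shared record through q

  writeln(p^.x);   // writes 6: p sees the change made through q
  writeln(p^.y);   // writes 5: y was never changed

  dispose(p);      // frees the one record; q is now invalid as well
end.
```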
When we are finished using a dynamically allocated value, we can free it using the dispose procedure:
dispose(p);
This releases the record's memory so that it can be reused by later allocations. You must never use a value after you have disposed of it:
dispose(p);
p^.x := 7;   // BAD – may crash or have unpredictable effects
If two pointers point to the same value and you invoke dispose on one of them, a subsequent access through the other pointer is also invalid:
q := p;
dispose(p);
q^.x := 7;   // BAD – may crash or have unpredictable effects
That's because dispose frees the single object that both pointers point to.
You may give a pointer the special value nil:
p := nil;
nil is a pointer to nothing. This is actually a useful concept (sort of like the empty set, or the number 0), and we will use nil frequently in building linked data structures.
Any reference through nil will crash the program:
p := nil;
writeln(p^.x);   // CRASH
In this situation the program will die with a runtime error. (In many languages this is the dreaded null pointer exception.) When you write code that uses pointers, you must take care to ensure that a null pointer runtime error can never occur.
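One common way to make such an error impossible is to test for nil before every dereference. The sketch below uses a hypothetical helper xOrDefault that returns a caller-supplied default instead of dereferencing nil. It also declares a named pointer type ppos for the parameter, since a ^ type cannot appear directly in a parameter list.

```pascal
program nilGuard;

type
  pos = record
    x, y: integer;
  end;
  ppos = ^pos;   // named pointer type, usable in signatures

// Hypothetical helper: returns p^.x, or the default d when p is nil.
function xOrDefault(p: ppos; d: integer): integer;
begin
  if p = nil then
    exit(d);     // never dereference nil
  exit(p^.x);
end;

var
  p: ppos;

begin
  p := nil;
  writeln(xOrDefault(p, -1));   // writes -1
  new(p);
  p^.x := 4;
  p^.y := 5;
  writeln(xOrDefault(p, -1));   // writes 4
  dispose(p);
end.
```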
You can pass a pointer to a function, and a function can return a pointer. To do this, however, you must declare a name for the pointer type. This will not compile:
procedure abc(p: ^pos);           // COMPILER ERROR - ^ is not allowed in signature
…
function xyz(i: integer): ^pos;   // COMPILER ERROR - ^ is not allowed in signature
…
Instead, you need to do this:
type
  ppos = ^pos;   // a pointer to a pos

procedure abc(p: ppos);
…
function xyz(i: integer): ppos;
…
Here's a procedure that takes a pointer to a pos and increments each of the record's components:
procedure incr(p: ppos);
begin
  p^.x += 1;
  p^.y += 1;
end;
Here's a function that takes an integer i and returns a pointer to a dynamically allocated pos whose fields are both i:
function make(i: integer): ppos;
var
  p: ^pos;
begin
  new(p);
  p^.x := i;
  p^.y := i;
  exit(p);
end;
We could call this function like this:
var
  p: ^pos;
begin
  p := make(4);
  writeln(p^.x);   // writes 4
Like other function parameters, a parameter of pointer type is passed by value by default. That means that the function receives a local copy of the pointer. A change to the local copy will not be visible in the caller. For example:
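procedure abc(p: ppos);
begin
  p^.x := 4;   // changes the record that the caller's p also points to
  p := nil;    // changes only the local copy of p
end;

var
  p: ^pos;
begin
  new(p);
  p^.x := 3;
  p^.y := 3;
  abc(p);
  writeln(p^.x);   // writes 4: p^.x := 4 is visible, p := nil is not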
In the code above, the assignment 'p := nil' only affects the local copy of p inside abc, and not the outer variable p in the main begin/end block.
If you precede a parameter with the var keyword, it will be passed by reference, and a change to its value in the function will be seen in the caller. For example, let's modify the declaration of procedure abc above as follows:
procedure abc(var p: ppos);
Now the assignment 'p := nil' will affect the outer p.
And so the sequence
abc(p);
writeln(p^.x);
will now crash, because when the procedure returns, p is nil, and a reference through a nil pointer yields a runtime error.
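Passing a pointer by reference is handy when a procedure should change what the caller's pointer refers to. As a sketch, the hypothetical procedure disposeAndClear below frees a record and then sets the caller's pointer to nil, so that a later nil test succeeds instead of the program touching freed memory.

```pascal
program varPointer;

type
  pos = record
    x, y: integer;
  end;
  ppos = ^pos;

// Hypothetical helper: frees the record and sets the caller's pointer to nil.
// Because p is a var parameter, the assignment p := nil is seen by the caller.
procedure disposeAndClear(var p: ppos);
begin
  if p <> nil then
    dispose(p);
  p := nil;
end;

var
  p: ppos;

begin
  new(p);
  p^.x := 3;
  p^.y := 3;
  disposeAndClear(p);
  disposeAndClear(p);   // harmless: p is already nil, so nothing is freed
  if p = nil then
    writeln('p is nil')
  else
    writeln('p still points somewhere');
end.
```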
We can use pointers to build a useful data structure called a linked list, which looks like this:
Like an array, a linked list can hold a sequence of elements (integers in this case). But it performs quite differently from an array. We can access the jth element of an array in constant time for any j, but inserting or deleting an element at the beginning of an array or in the middle takes time O(N), where N is the length of the array. Conversely, accessing the jth element of a linked list takes time O(j), since we must walk along the list from its first node, but inserting or deleting a node at the beginning of the list, or right after a node we already have a pointer to, takes only O(1).
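To see why these operations are O(1), here is a sketch with two hypothetical procedures, insertAfter and deleteAfter, that work right after a node we already have a pointer to. Each changes only a constant number of pointers, no matter how long the list is. The program declares its own node type (an integer value plus a next pointer) and a small printing helper, printList, which is also our own.

```pascal
program insertDelete;

type
  pnode = ^node;
  node = record
    i: integer;
    next: pnode;
  end;

// Hypothetical helper: inserts a new node with value v right after node p.
procedure insertAfter(p: pnode; v: integer);
var
  n: pnode;
begin
  new(n);
  n^.i := v;
  n^.next := p^.next;   // new node points to p's old successor
  p^.next := n;         // p now points to the new node
end;

// Hypothetical helper: removes the node right after p (p^.next must not be nil).
procedure deleteAfter(p: pnode);
var
  n: pnode;
begin
  n := p^.next;
  p^.next := n^.next;   // unlink n from the list
  dispose(n);
end;

// Hypothetical helper: prints the list as v1 -> v2 -> ... -> nil.
procedure printList(p: pnode);
begin
  while p <> nil do
  begin
    write(p^.i, ' -> ');
    p := p^.next;
  end;
  writeln('nil');
end;

var
  head: pnode;

begin
  new(head);
  head^.i := 2;
  head^.next := nil;
  insertAfter(head, 7);   // 2 -> 7 -> nil
  insertAfter(head, 4);   // 2 -> 4 -> 7 -> nil
  printList(head);
  deleteAfter(head);      // removes the node holding 4
  printList(head);
end.
```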
An element of a linked list is called a node. A node contains one or more values, plus a pointer to the next node in the list. The first node of a linked list is called its head. The last node of a linked list is its tail. The tail's next pointer is always nil.
By the way, we will sometimes illustrate a linked list more compactly:
2 → 4 → 7 → nil
The two pictures above denote the same structure; the first is simply more detailed.
Here is a node type for a linked list that holds integers:
type
  pnode = ^node;   // a pointer to a node, for use in function signatures
  node = record
    i: integer;
    next: pnode;
  end;
We can build the 3-element linked list pictured above as follows:
var
  p, q, r: ^node;
begin
  new(r);
  r^.i := 7;
  r^.next := nil;
  new(q);
  q^.i := 4;
  q^.next := r;
  new(p);
  p^.i := 2;
  p^.next := q;
Now p points to the head of the list. In general we refer to a linked list using a pointer to its head.
We often want to iterate over all elements in a list. To do this, we start with a pointer p to the head of the list, and advance at each step like this:
p := p^.next;
Here is a function that iterates over a linked list of integers and computes the sum of all elements:
function sum(list: pnode): integer;
var
  p: ^node;
  s: integer = 0;
begin
  p := list;
  while p <> nil do
  begin
    s += p^.i;
    p := p^.next;
  end;
  exit(s);
end;
If p points to the 3-element list that we built above, then we can now call
writeln(sum(p));     // writes 13
Or we can call
writeln(sum(nil));   // writes 0
This last call works because nil is a linked list. It is the empty list, i.e. a list with 0 elements.
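Because nil is the empty list, a loop that walks a list needs no special case for it. The sketch below uses a hypothetical procedure printList that prints a list in the compact arrow notation (writing -> for the arrow). Called on nil, its loop body never runs and it prints just nil.

```pascal
program printDemo;

type
  pnode = ^node;
  node = record
    i: integer;
    next: pnode;
  end;

// Hypothetical helper: prints a list as v1 -> v2 -> ... -> nil.
procedure printList(list: pnode);
var
  p: pnode;
begin
  p := list;
  while p <> nil do
  begin
    write(p^.i, ' -> ');
    p := p^.next;
  end;
  writeln('nil');
end;

var
  p, q, r: pnode;

begin
  // build 2 -> 4 -> 7 -> nil, as in the text
  new(r);
  r^.i := 7;
  r^.next := nil;
  new(q);
  q^.i := 4;
  q^.next := r;
  new(p);
  p^.i := 2;
  p^.next := q;

  printList(p);     // writes 2 -> 4 -> 7 -> nil
  printList(nil);   // writes nil (the empty list)
end.
```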
Above we saw code that builds a fixed 3-element list. Of course, we usually want to build a list using a loop that works with any number of nodes.
One way to build a list is by prepending nodes. Recall that to prepend means to add at the beginning. For example, if we prepend the character 'p' to 'ear' the result is 'pear'.
Suppose that we want to build a list with the numbers 1 through 10 in order. We start with nil, which is the empty list. We allocate a node with the value 10 and prepend it to nil, yielding a list with one node. Now we allocate a node with value 9 and prepend it, yielding a list with the values 9 and 10. And so on.
Here is a function that builds a linked list of the integers 1 through n by prepending:
function sequence(n: integer): pnode;
var
  head, p: ^node;
  i: integer;
begin
  head := nil;
  for i := n downto 1 do
  begin
    new(p);
    p^.i := i;
    p^.next := head;   // prepend p to the list
    head := p;         // now p is the head of the list
  end;
  exit(head);
end;
Alternatively we can build a list by appending, which is only slightly harder. (Recall that to append means to add at the end.) To do this we need to keep two pointers: one to the head of the list and one to the current tail.
Here is a function that builds a linked list of the integers 1 through n by appending:
function sequence(n: integer): pnode;
var
  head, tail, p: ^node;
  i: integer;
begin
  head := nil;
  tail := nil;
  for i := 1 to n do
  begin
    new(p);
    p^.i := i;
    p^.next := nil;
    if head = nil then   // list is empty
    begin
      head := p;
      tail := p;
    end
    else
    begin
      tail^.next := p;   // append after tail
      tail := p;
    end;
  end;
  exit(head);
end;
We can use the same technique to build a linked list of values from other sources. For example, here is a very similar function that builds a linked list of values read from standard input until EOF:
function readList: pnode;
var
  head, tail, p: ^node;
  i: integer;
begin
  head := nil;
  tail := nil;
  while not seekEof do
  begin
    new(p);
    read(p^.i);
    p^.next := nil;
    if head = nil then   // list is empty
    begin
      head := p;
      tail := p;
    end
    else
    begin
      tail^.next := p;   // append after tail
      tail := p;
    end;
  end;
  exit(head);
end;
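Every node built by sequence or readList was allocated with new, so a program that is finished with a list may want to give all of that memory back. Here is a sketch using a hypothetical function freeList, which disposes the nodes one at a time and reports how many it freed. It saves each next pointer before calling dispose, since a node's fields must not be read after it is freed, and it takes the list as a var parameter so that the caller's pointer ends up nil.

```pascal
program freeDemo;

type
  pnode = ^node;
  node = record
    i: integer;
    next: pnode;
  end;

// Builds 1 -> 2 -> ... -> n by prepending (the sequence function above).
function sequence(n: integer): pnode;
var
  head, p: pnode;
  i: integer;
begin
  head := nil;
  for i := n downto 1 do
  begin
    new(p);
    p^.i := i;
    p^.next := head;
    head := p;
  end;
  exit(head);
end;

// Hypothetical helper: disposes every node, leaves list = nil,
// and returns the number of nodes freed.
function freeList(var list: pnode): integer;
var
  rest: pnode;
  count: integer;
begin
  count := 0;
  while list <> nil do
  begin
    rest := list^.next;   // save the link before the node is freed
    dispose(list);
    list := rest;
    count := count + 1;
  end;
  exit(count);
end;

var
  list: pnode;

begin
  list := sequence(5);
  writeln(freeList(list));   // writes 5
  if list = nil then
    writeln('list is now empty');
end.
```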
Let's write a function that takes a linked list and returns true if any two adjacent elements in the list are identical, i.e. have the same value.
function adjacentIdentical(p: pnode): boolean;
begin
  if p = nil then
    exit(false);
  while p^.next <> nil do   // while p doesn't point to the last node
  begin
    if p^.i = p^.next^.i then
      exit(true);
    p := p^.next;
  end;
  exit(false);
end;
Note the comparison
if p^.i = p^.next^.i then
This compares the value in the node that p points to with the value in the following node. We can analyze the last term p^.next^.i as follows:
  p^           is the node that p points to
  p^.next      is a pointer to the following node
  p^.next^     is the following node
  p^.next^.i   is the value in the following node
Note also the while condition
while p^.next <> nil do
This stops as soon as p points to the last node in the list. At that point we must stop: p^.next is nil, so if we attempted to access the following node's value via p^.next^.i we would get a runtime error.
Finally note the initial check for the empty list:
if p = nil then exit(false);
If this check were absent and the function were invoked on the empty list, then the following check
while p^.next <> nil do
would crash.
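To check the behavior described above, including the empty-list case, here is a small test program. The helper makeList, which builds a list from an open array by prepending, and the procedure report are hypothetical additions of ours; adjacentIdentical is the function from the text.

```pascal
program adjacentDemo;

type
  pnode = ^node;
  node = record
    i: integer;
    next: pnode;
  end;

// Hypothetical helper: builds a list holding the values of a, in order,
// by prepending from the last element to the first.
function makeList(const a: array of integer): pnode;
var
  head, p: pnode;
  k: integer;
begin
  head := nil;
  for k := high(a) downto 0 do
  begin
    new(p);
    p^.i := a[k];
    p^.next := head;
    head := p;
  end;
  exit(head);
end;

function adjacentIdentical(p: pnode): boolean;
begin
  if p = nil then
    exit(false);
  while p^.next <> nil do
  begin
    if p^.i = p^.next^.i then
      exit(true);
    p := p^.next;
  end;
  exit(false);
end;

// Hypothetical helper: prints a boolean as lowercase text.
procedure report(b: boolean);
begin
  if b then
    writeln('true')
  else
    writeln('false');
end;

begin
  report(adjacentIdentical(makeList([2, 4, 4, 7])));   // true
  report(adjacentIdentical(makeList([2, 4, 7, 4])));   // false
  report(adjacentIdentical(makeList([5])));            // false
  report(adjacentIdentical(nil));                      // false: the empty list
end.
```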
Many functions on linked lists can be written easily using recursion. Here is a recursive function to add all elements in a list:
function sum(p: pnode): integer;
begin
  if p = nil then
    exit(0);
  exit(p^.i + sum(p^.next));
end;
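Recursion also makes some traversals easy that would be awkward with a simple loop. As a sketch, the hypothetical procedure printReverse below prints a list's values from last to first by printing the rest of the list before the current node's value. The program also calls the recursive sum above on the same list.

```pascal
program reverseDemo;

type
  pnode = ^node;
  node = record
    i: integer;
    next: pnode;
  end;

// Builds 1 -> 2 -> ... -> n by prepending (the sequence function above).
function sequence(n: integer): pnode;
var
  head, p: pnode;
  i: integer;
begin
  head := nil;
  for i := n downto 1 do
  begin
    new(p);
    p^.i := i;
    p^.next := head;
    head := p;
  end;
  exit(head);
end;

// The recursive sum from the text.
function sum(p: pnode): integer;
begin
  if p = nil then
    exit(0);
  exit(p^.i + sum(p^.next));
end;

// Hypothetical helper: prints the values from the last node to the first.
procedure printReverse(p: pnode);
begin
  if p = nil then
    exit;                  // empty list: nothing to print
  printReverse(p^.next);   // first print the rest of the list...
  writeln(p^.i);           // ...then this node's value
end;

var
  list: pnode;

begin
  list := sequence(4);   // 1 -> 2 -> 3 -> 4 -> nil
  printReverse(list);    // writes 4, 3, 2, 1 on separate lines
  writeln(sum(list));    // writes 10
end.
```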
In the last lecture we learned about stacks, which are an abstract data type with these operations:
type stack = ...

procedure init(var s: stack);
procedure push(var s: stack; i: integer);
function pop(var s: stack): integer;
function isEmpty(s: stack): boolean;
We also saw how to implement a stack using an array.
Alternatively we can implement a stack using a linked list. To accomplish this, the type stack will simply be a pointer to a node:
type stack = ^node;
Now our stack operations are quite straightforward:
procedure init(var s: stack);
begin
  s := nil;
end;

procedure push(var s: stack; i: integer);
var
  n: ^node;
begin
  new(n);
  n^.i := i;
  n^.next := s;
  s := n;
end;

function pop(var s: stack): integer;
var
  i: integer;
  n: ^node;
begin
  i := s^.i;
  n := s;
  s := s^.next;
  dispose(n);
  exit(i);
end;

function isEmpty(s: stack): boolean;
begin
  isEmpty := (s = nil);
end;
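Here is a complete program assembled from the operations above, written as a sketch with pnode as a named pointer type, plus one extra operation of our own: a hypothetical peek that returns the top value without removing it. Like pop, peek assumes the stack is not empty.

```pascal
program linkedStack;

type
  pnode = ^node;
  node = record
    i: integer;
    next: pnode;
  end;
  stack = pnode;

procedure init(var s: stack);
begin
  s := nil;
end;

procedure push(var s: stack; i: integer);
var
  n: pnode;
begin
  new(n);
  n^.i := i;
  n^.next := s;   // new node sits on top of the old top
  s := n;
end;

function pop(var s: stack): integer;
var
  i: integer;
  n: pnode;
begin
  i := s^.i;
  n := s;
  s := s^.next;   // the second node becomes the top
  dispose(n);
  exit(i);
end;

// Hypothetical extra operation: returns the top value without removing it.
function peek(s: stack): integer;
begin
  exit(s^.i);
end;

function isEmpty(s: stack): boolean;
begin
  exit(s = nil);
end;

var
  s: stack;

begin
  init(s);
  push(s, 1);
  push(s, 2);
  push(s, 3);
  writeln(peek(s));    // writes 3; the stack is unchanged
  while not isEmpty(s) do
    writeln(pop(s));   // writes 3, 2, 1
end.
```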
Consider this code that uses a stack:
var
  s: stack;
  i: integer;
begin
  init(s);
  for i := 1 to 10 do
    push(s, i);
  while not isEmpty(s) do
    writeln(pop(s));
Notice that this code will produce the same result whether the stack is implemented as an array or as a linked list. This is a general feature of abstract data types: code that uses them will work correctly no matter how the type is implemented.
Different implementations of an abstract type may, however, have different performance characteristics. In the previous lecture we saw that if we implement a stack using a dynamic array, we can improve its performance by modifying our implementation to double the dynamic array's size each time we need to grow it. Even then, however, we found that the push operation took O(N) in the worst case, where N is the current stack size. Our linked list-based implementation performs differently: push always runs in O(1).
In a recent lecture we discussed how to compute the greatest common divisor of two integers, and learned about Euclid's algorithm, which can perform this computation efficiently. A related concept is the least common multiple of two positive integers a and b, which is the smallest positive integer that is divisible by both a and b. For example,
lcm(60, 90) = 180
How can we compute a least common multiple efficiently? Here is a useful fact: for all positive integers a and b,
a · b = gcd(a, b) · lcm(a, b)
And so
lcm(a, b) = a · b / gcd(a, b)
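As a quick check of the formula on the example above, take a = 60 and b = 90, whose gcd is 30:

```latex
\operatorname{lcm}(60, 90) = \frac{60 \cdot 90}{\gcd(60, 90)} = \frac{5400}{30} = 180,
\qquad \gcd(60, 90) \cdot \operatorname{lcm}(60, 90) = 30 \cdot 180 = 5400 = 60 \cdot 90.
```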
We can compute the gcd efficiently using Euclid's algorithm, so the formula above gives us an efficient way to compute the lcm. Here is a Pascal function to do that:
function lcm(a, b: integer): integer;
begin
  exit(a div gcd(a, b) * b);
end;
Alternatively we could write
exit(a * b div gcd(a, b));
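but the first version is better. It divides first, so its intermediate value a div gcd(a, b) is never larger than the final result, and it overflows only if the lcm itself does not fit in an integer. The second version computes a * b first, which can overflow even when the lcm would fit.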
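Here is a complete program for trying lcm. The gcd function below is one standard iterative form of Euclid's algorithm and is our assumption; the version from the earlier lecture may be written differently.

```pascal
program lcmDemo;

// Euclid's algorithm (an assumed iterative version).
function gcd(a, b: integer): integer;
var
  t: integer;
begin
  while b <> 0 do
  begin
    t := a mod b;
    a := b;
    b := t;
  end;
  exit(a);
end;

function lcm(a, b: integer): integer;
begin
  exit(a div gcd(a, b) * b);   // divide first to keep the intermediate value small
end;

begin
  writeln(gcd(60, 90));     // writes 30
  writeln(lcm(60, 90));     // writes 180
  writeln(lcm(4, 6));       // writes 12
  writeln(lcm(200, 300));   // writes 600
end.
```

In Free Pascal's default mode, integer is a 16-bit type whose largest value is 32767. The call lcm(200, 300) returns 600 without trouble, whereas computing 200 * 300 = 60000 first would exceed that range.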