Programming I, 2018-9
Lecture 10 – Notes

out parameters

Sometimes we'd like to return multiple values from a function. For example, suppose that we'd like to write a function firstAndLast that returns the first and last characters of a string. One way to do that is using a procedure with var parameters:

// Return the first and last characters of a string
procedure firstAndLast(s: string; var first: char; var last: char);
begin
  first := s[1];
  last := s[length(s)];
end;

This will work, but when we read the procedure's signature it's not clear that the parameters are used only for returning values, not for receiving values from the caller.

Alternatively we can write out in place of var. This makes our intentions clearer:

// Return the first and last characters of a string
procedure firstAndLast(s: string; out first: char; out last: char);

out is similar to var, but does not let the function receive a value from the caller. For example, in the following procedure we must use var, not out:

procedure add(var i: integer; k: integer);
begin
  i += k;
end;

Note that in Free Pascal out parameters are available only if you enable Delphi mode or ObjFPC mode.
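
As a minimal sketch of how a caller uses these parameters, here is a complete program, assuming Delphi mode is enabled with a {$mode delphi} directive. The program name outdemo and the variables a and b are only illustrative.

```pascal
{$mode delphi}
program outdemo;

// Return the first and last characters of a string
procedure firstAndLast(s: string; out first: char; out last: char);
begin
  first := s[1];
  last := s[length(s)];
end;

var
  a, b: char;

begin
  firstAndLast('hello', a, b);  // a and b need no initial values
  writeln(a, ' ', b);           // writes h o
end.
```

The caller does not initialize a or b before the call, which is exactly the situation that out is meant to describe.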

var parameters versus return values

Note that a var parameter can be an alternative to a function return value. Suppose that we'd like to write a function or procedure that takes an integer x and computes x^2 + 1. We could write this:

function square1(x: integer): integer;
begin
  exit(x * x + 1);
end;

Alternatively, we could write

procedure square1(var x: integer);
begin
  x := x * x + 1;
end;

Neither of these approaches is necessarily better. The right choice depends on how you want to use this function or procedure, and is to some degree a question of style.

Similarly, suppose that we want to write a function or procedure that prepends a value to a linked list. We could write it like this:

// Return a linked list containing _i_ prepended to _p_.
function prepend(p: pnode; i: integer): pnode;
var
  q: ^node;
begin
  new(q);
  q^.i := i;
  q^.next := p;
  exit(q);
end;

Or like this:

// Prepend _i_ to the list _p_.
procedure prepend(var p: pnode; i: integer);
var
  q: ^node;
begin
  new(q);
  q^.i := i;
  q^.next := p;
  p := q;
end;

Again, neither approach is necessarily better. The first version (function prepend) is a more functional style; the second (procedure prepend) is a more imperative (or, arguably, object-oriented) style.

Note, however, that when calling function prepend you must remember to use the function's return value! Consider this code:

var
  p: ^node;

begin
  p := nil;
  prepend(p, 3);
  prepend(p, 2);
  

If you are calling procedure prepend, this will work fine. But if you are calling function prepend, then p will still be nil after the calls above! You must instead write

p := nil;

p := prepend(p, 3);
p := prepend(p, 2);

passing array elements or fields to var parameters

Consider this procedure:

procedure add(var i: integer; k: integer);
begin
  i += k;
end;

We can of course pass a local variable to add:

var
  i: integer = 4;

begin
  add(i, 3);  // now i = 7

Note that we can also pass an array element:

var
  a: array[1..10] of integer;

begin
  
  a[2] := 6;
  add(a[2], 5);  // now a[2] = 11

Or even a field:

var
  p: ^node;

begin
  
  add(p^.i, 3);

It is sometimes convenient to pass fields to var parameters when we write functions recursively on linked lists (or trees). We will see examples of this soon.
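
As one possible illustration, here is a sketch of a recursive procedure that appends a value to the end of a linked list. The procedure name append and the test values are assumptions made for this example. The node type is declared with pnode first so that the field next has exactly the type of the var parameter.

```pascal
{$mode delphi}
program appenddemo;

type
  pnode = ^node;
  node = record
    i: integer;
    next: pnode;
  end;

// Append i to the end of the list p.
procedure append(var p: pnode; i: integer);
begin
  if p = nil then
    begin
      new(p);          // p refers to the caller's variable or field
      p^.i := i;
      p^.next := nil;
    end
  else
    append(p^.next, i);   // pass the field itself as the var parameter
end;

var
  list, q: pnode;

begin
  list := nil;
  append(list, 1);
  append(list, 2);
  append(list, 3);

  q := list;
  while q <> nil do
    begin
      write(q^.i, ' ');   // writes 1 2 3
      q := q^.next;
    end;
  writeln;
end.
```

Because p^.next is passed by reference, the recursive call that finds a nil pointer can overwrite it directly, so no special case is needed for the empty list.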

queues

In the last lecture we saw stacks, which are an abstract data type: we can implement a stack in various ways, such as using an array or using a linked list. Stacks are a last in first out (LIFO) data structure. As we saw before, the last value that we push to a stack will be the first one to be popped:

var
  s: stack;

begin
  init(s);
  push(s, 3);
  push(s, 4);
  push(s, 5);

  writeln(pop(s));  // writes 5

Queues are another important abstract data type. The interface to a queue looks like this:

type queue = ...

procedure init(var q: queue);
procedure enqueue(var q: queue; i: integer);
function dequeue(var q: queue): integer;
function isEmpty(q: queue): boolean;

The enqueue procedure adds a value to a queue, and the dequeue function removes a value from a queue and returns it.

Queues are a first in first out (FIFO) data structure: the first value added to a queue will be the first one to be removed. For example:

var
  q: queue;

begin
  init(q);
  enqueue(q, 4);
  enqueue(q, 77);
  enqueue(q, 12);

  writeln(dequeue(q));  // writes 4
  writeln(dequeue(q));  // writes 77

So a queue is like a set of values waiting in line. We say that the value that will next be removed is at the head of the queue, and the value that was last added is at the tail.

implementing a queue with an array

One way to implement a queue is using an array. In addition to the array itself, we keep integer variables head and tail that indicate the array positions of the first and next-to-be-added elements:

type
  queue = record
    a: array of integer;
    head: integer;  // index of the first item on the queue
    tail: integer;  // index of the next item to be added
  end;

For example, if head = 3 and tail = 6 then there are currently three elements in the queue, with values a[3], a[4] and a[5].

As we add elements to the queue, they may wrap past the end of the array back to the beginning. For example, if length(a) = 8, head = 6 and tail = 2 then there are four elements in the queue: a[6], a[7], a[0] and a[1].

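If the tail reaches the head, then the array is full. (Note that head = tail is also the state of an empty queue, so an implementation needs some way to tell the two cases apart, such as a separate count of elements.) We could decide that the array will never expand past its initial size, so the queue will have some fixed maximum size. With this implementation, the enqueue and dequeue operations will run in O(1). But the fixed size is a significant limitation.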

Alternatively, we can grow the array when necessary. When the queue fills the array, we allocate a new array with twice the size of the existing array. We then copy all elements from the old array to the new array. With this implementation, enqueue will run in O(1) on average, just as when we implemented a stack using an array. But in the worst case enqueue will take time O(N). The dequeue operation never changes the array size and always runs in O(1).
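
The growing variant might be sketched as follows. This is one possible implementation, not the only one: the count field, the helper grow, the type name intArray and the initial capacity of 4 are all assumptions added for this example. The count field is what distinguishes a full array from an empty queue.

```pascal
{$mode delphi}
program arrayqueue;

type
  intArray = array of integer;

  queue = record
    a: intArray;
    head: integer;   // index of the first item on the queue
    tail: integer;   // index of the next item to be added
    count: integer;  // number of items stored (assumed extra field)
  end;

procedure init(var q: queue);
begin
  setLength(q.a, 4);   // initial capacity: an arbitrary choice
  q.head := 0;
  q.tail := 0;
  q.count := 0;
end;

// Double the array size, copying the items so that the head moves to index 0.
procedure grow(var q: queue);
var
  b: intArray;
  k: integer;
begin
  setLength(b, 2 * length(q.a));
  for k := 0 to q.count - 1 do
    b[k] := q.a[(q.head + k) mod length(q.a)];
  q.head := 0;
  q.tail := q.count;
  q.a := b;
end;

procedure enqueue(var q: queue; i: integer);
begin
  if q.count = length(q.a) then grow(q);
  q.a[q.tail] := i;
  q.tail := (q.tail + 1) mod length(q.a);   // wrap past the end
  q.count := q.count + 1;
end;

function dequeue(var q: queue): integer;
begin
  dequeue := q.a[q.head];
  q.head := (q.head + 1) mod length(q.a);
  q.count := q.count - 1;
end;

function isEmpty(const q: queue): boolean;
begin
  exit(q.count = 0);
end;

var
  q: queue;
  k: integer;

begin
  init(q);
  for k := 1 to 3 do enqueue(q, k);
  writeln(dequeue(q));                // writes 1
  for k := 4 to 10 do enqueue(q, k);  // wraps around, then grows the array
  while not isEmpty(q) do
    write(dequeue(q), ' ');           // writes 2 3 4 5 6 7 8 9 10
  writeln;
end.
```

Copying the elements starting from head, rather than copying the array as it stands, keeps the queue contiguous in the new array even when it had wrapped around in the old one.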

implementing a queue with a linked list

We can easily implement a queue using a linked list. We must keep pointers to the first and last nodes of the list, which are the head and tail of the queue. The enqueue and dequeue operations will run in constant time, i.e. O(1).

type
  node = record
    i: integer;
    next: ^node;
  end;
  
  queue = record
    head: ^node;
    tail: ^node;
  end;

procedure init(var q: queue);
begin
  q.head := nil;
  q.tail := nil;
end;

procedure enqueue(var q: queue; i: integer);
var
  n: ^node;
begin
  new(n);
  n^.i := i;
  n^.next := nil;
  
  if q.head = nil then
    begin
      q.head := n;
      q.tail := n;
    end
  else
    begin
      q.tail^.next := n;  // append to tail
      q.tail := n;
    end;
end;

function dequeue(var q: queue): integer;
var
  p: ^node;
begin
  dequeue := q.head^.i;
  p := q.head;
  q.head := q.head^.next;
  dispose(p);
  if q.head = nil then q.tail := nil;
end;

function isEmpty(q: queue): boolean;
begin
  exit(q.head = nil);
end;

sets

Another important abstract data type is a set. Here is its interface. (We use the name dset, meaning "dynamic set", since in Free Pascal set is a reserved word with another meaning.)

type dset = ...

procedure init(var d: dset);
procedure add(var d: dset; i: integer);     // add if not already present
procedure remove(var d: dset; i: integer);  // remove if present
function contains(d: dset; i: integer): boolean;

A set is like a mathematical set, but can change over time as we add or remove values dynamically. A set cannot contain the same value twice: every value is either present in the set or it is not.

Just like our other abstract data types (stacks and queues), we can build a set of values of any type, but we use a set of integers as an illustrative example here.

How can we build a set? If we know that the integers to be stored are all in some fixed range, such as 0 … 999, then we can use an array of boolean:

type
  dset = array[0 .. 999] of boolean;

procedure init(var d: dset);
var
  i: integer;
begin
  for i := low(d) to high(d) do
    d[i] := false;
end;

procedure add(var d: dset; i: integer);
begin
  d[i] := true;
end;

procedure remove(var d: dset; i: integer);
begin
  d[i] := false;
end;

function contains(d: dset; i: integer): boolean;
begin
  exit(d[i]);
end;

This implementation is very efficient. But it is impractical when the values to be stored in the set are arbitrary integers: our array of booleans would need gigabytes of memory to have one element for every possible integer. And an array of boolean will not work at all when the set will hold values of some other type, such as strings.

Alternatively we could use an unsorted array of integers. With this implementation, add will append an element to the end of the array, doubling the array's size when necessary. This will take O(1) on average. remove will delete an element from the array, shifting all following elements one place to the left. This may take O(N), where N is the current array length. And contains will scan the array looking for a particular value, which will also take O(N) in the worst case.

Another possibility is a sorted array of integers. With this implementation, add will insert an element in the middle of an array, shifting all following elements one place to the right. Its running time will be O(N). remove will also take O(N), just like with the unsorted array implementation. contains can run in O(log N) since we can perform a binary search.
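
The sorted array approach might look roughly like this. It is only a sketch: the record layout, the helper position and the initial capacity are assumptions made for this example, and remove is omitted for brevity.

```pascal
{$mode delphi}
program sortedset;

type
  dset = record
    a: array of integer;  // the values, in ascending order, in a[0 .. n - 1]
    n: integer;           // number of values stored
  end;

procedure init(var d: dset);
begin
  setLength(d.a, 4);
  d.n := 0;
end;

// Binary search: return the index of the first element >= i, or d.n if none.
function position(const d: dset; i: integer): integer;
var
  l, r, m: integer;
begin
  l := 0;
  r := d.n;
  while l < r do
    begin
      m := (l + r) div 2;
      if d.a[m] < i then l := m + 1 else r := m;
    end;
  exit(l);
end;

function contains(const d: dset; i: integer): boolean;
var
  k: integer;
begin
  k := position(d, i);
  exit((k < d.n) and (d.a[k] = i));
end;

procedure add(var d: dset; i: integer);
var
  k, j: integer;
begin
  k := position(d, i);
  if (k < d.n) and (d.a[k] = i) then exit;   // already present
  if d.n = length(d.a) then setLength(d.a, 2 * length(d.a));
  for j := d.n - 1 downto k do               // shift right to make room
    d.a[j + 1] := d.a[j];
  d.a[k] := i;
  d.n := d.n + 1;
end;

var
  d: dset;

begin
  init(d);
  add(d, 5);
  add(d, 2);
  add(d, 9);
  add(d, 2);                // duplicate, ignored
  writeln(d.n);             // writes 3
  writeln(contains(d, 9));  // writes TRUE
  writeln(contains(d, 7));  // writes FALSE
end.
```

Here the binary search in position takes O(log N) steps, while the shifting loop in add is what makes insertion O(N).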

Here is a summary of these running times:


            unsorted array    sorted array
add         O(1) *            O(N)
remove      O(N)              O(N)
contains    O(N)              O(log N)

* = on average


Neither of these implementations will have adequate performance for many applications. We would like an implementation in which all of these operations have a fast running time, i.e. O(1) or O(log N). This motivates the study of our next subject, binary trees.

binary trees

A binary tree consists of a set of nodes. Each node contains a single value and may have 0, 1, or 2 children.

Here is a picture of a binary tree of integers. (Note that this is not a binary search tree, which is a special kind of binary tree that we will discuss later.)

[figure: a binary tree of integers with root 10]

In this tree, node 10 is the root node. 14 is the parent of 12 and 6. 12 is the left child of 14, and 6 is the right child of 14. 14 is an ancestor of 22, and 22 is a descendant of 14.

A node may have 0, 1, or 2 children. In this tree, node 15 has a right child but no left child.

The subtree rooted at 14 is the left subtree of node 10. The subtree rooted at 1 is the right subtree of node 10.

The nodes 12, 5, 22, 4, and 3 are leaves: they have no children. Nodes 10, 14, 1, 6, and 15 are internal nodes, which are nodes that have at least one child.

The depth of a node is its distance from the root. The root has depth 0. In this tree, node 15 has depth 2 and node 4 has depth 3. The height of a tree is the greatest depth of any node. This tree has height 3.

The tree with no nodes is called the empty tree.

Note that a binary tree may be asymmetric: the right side might not look at all like the left. In fact a binary tree can have any structure at all, as long as each node has 0, 1, or 2 children.

A binary tree is complete iff every internal node has 2 children and all leaves have the same depth. Here is a complete binary tree of height 3:

[figure: a complete binary tree of height 3]

A complete binary tree of height h has 2^h leaves, and has 2^0 + 2^1 + … + 2^(h-1) = 2^h – 1 interior nodes. So it has a total of 2^h + 2^h – 1 = 2^(h+1) – 1 nodes. In this tree there are 2^3 = 8 leaves and 2^3 – 1 = 7 interior nodes, a total of 2^4 – 1 = 15 nodes.

Conversely, if a complete binary tree has N nodes, then N = 2^(h+1) – 1, where h is the height of the tree. And so h = log2(N + 1) – 1 ≈ log2(N) – 1 = O(log N).

binary trees in Pascal

We can represent a binary tree in Pascal using nodes and pointers, similarly to how we represent linked lists. Here is a node type for a binary tree of integers:

type
  node = record
    i: integer;
    left: ^node;    // pointer to left child, or nil if none
    right: ^node;   // pointer to right child, or nil if none
  end;


  pnode = ^node;

We will generally refer to a tree using a pointer to its root. We use nil to represent the empty tree, just as we used nil for the empty linked list.

Let's see how to build a binary tree. Here is a small binary tree with just 3 nodes:

[figure: a binary tree with root 4, left child 7 and right child 5]

We can build this in Pascal as follows:

var
  p, q, r: ^node;

begin
  new(q);
  q^.i := 7;
  q^.left := nil;
  q^.right := nil;

  new(r);
  r^.i := 5;
  r^.left := nil;
  r^.right := nil;

  new(p);
  p^.i := 4;
  p^.left := q;
  p^.right := r;

To build larger trees, we will write functions that use loops or recursion.

Let's now write some functions that act on trees. Here is a function that computes the sum of all values in a binary tree:

function sum(p: pnode): integer;
begin
  if p = nil then exit(0);
  
  exit(p^.i + sum(p^.left) + sum(p^.right));
end;

It is much easier to write this function recursively than iteratively. Recursion is a natural fit for trees, since the pattern of recursive calls in a function like this one can mirror the tree structure. To put it differently, recursion explores a tree. This is a theme we will see many times again.

Now let's write a function to calculate the height of a tree. Once again, the height is the depth of the deepest node. This is not much harder:

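// Return the height of the tree p. The empty tree has height -1.
// (max comes from the math unit, so the program needs "uses math".)
function height(p: pnode): integer;
begin
  if p = nil then exit(-1);
  
  exit(1 + max(height(p^.left), height(p^.right)));
end;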
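
To try these functions out, here is a small self-contained sketch that rebuilds the 3-node tree from above and prints its sum and height. The helper make and the program name are hypothetical, introduced only for this example. The node type is declared with pnode first so that the fields and parameters share one pointer type.

```pascal
{$mode delphi}
program treedemo;

uses math;   // for max

type
  pnode = ^node;
  node = record
    i: integer;
    left: pnode;    // pointer to left child, or nil if none
    right: pnode;   // pointer to right child, or nil if none
  end;

// Hypothetical helper: allocate a node with the given value and children.
function make(i: integer; left, right: pnode): pnode;
var
  p: pnode;
begin
  new(p);
  p^.i := i;
  p^.left := left;
  p^.right := right;
  exit(p);
end;

function sum(p: pnode): integer;
begin
  if p = nil then exit(0);
  exit(p^.i + sum(p^.left) + sum(p^.right));
end;

function height(p: pnode): integer;
begin
  if p = nil then exit(-1);
  exit(1 + max(height(p^.left), height(p^.right)));
end;

var
  t: pnode;

begin
  t := make(4, make(7, nil, nil), make(5, nil, nil));
  writeln(sum(t));       // writes 16
  writeln(height(t));    // writes 1
  writeln(height(nil));  // writes -1
end.
```

Returning -1 for the empty tree is what makes a tree with a single node come out with height 0.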

binary search trees

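A binary search tree is a binary tree in which the values are ordered in a particular way that makes searching easy: for any node N with value v, all values in N's left subtree are less than v, and all values in N's right subtree are greater than v.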

Here is a binary search tree of integers:

[figure: a binary search tree of integers with root 10]

We can use a binary search tree to store a set supporting the add, remove, and contains operations that we described above.

contains

It is not difficult to find whether a binary search tree contains a given value k. We begin at the root. If the root's value is k, then we are done. Otherwise, we compare k to the root's value v. If k < v, we move to the left child; if k > v, we move to the right child. We proceed in this way until we have found k or until we hit a nil pointer, in which case k is not in the tree.
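
The walk described above could be written iteratively like this. This is a sketch only: the helper make and the sample tree are assumptions for this example, and the function works directly on a pnode rather than on a dset type.

```pascal
{$mode delphi}
program bstcontains;

type
  pnode = ^node;
  node = record
    i: integer;
    left: pnode;
    right: pnode;
  end;

// Hypothetical helper: allocate a node with the given value and children.
function make(i: integer; left, right: pnode): pnode;
var
  p: pnode;
begin
  new(p);
  p^.i := i;
  p^.left := left;
  p^.right := right;
  exit(p);
end;

// Return true if the binary search tree p contains k.
function contains(p: pnode; k: integer): boolean;
begin
  while p <> nil do
    if k = p^.i then exit(true)
    else if k < p^.i then p := p^.left
    else p := p^.right;
  exit(false);   // we hit a nil pointer, so k is absent
end;

var
  t: pnode;

begin
  // A small search tree: 10 at the root, 5 to its left, 20 to its right,
  // and 15 as the left child of 20.
  t := make(10, make(5, nil, nil), make(20, make(15, nil, nil), nil));
  writeln(contains(t, 15));  // writes TRUE
  writeln(contains(t, 7));   // writes FALSE
end.
```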

add

Inserting a value into a binary search tree is also pretty straightforward. Beginning at the root, we look for an insertion position, proceeding down the tree just as in the above algorithm for contains. When we reach an empty left or right child, we create a node there.
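
One way to sketch this insertion is a recursive procedure with a var parameter, so that the call which reaches an empty child can create the node in place. The procedure print and the sample values are assumptions added for this example.

```pascal
{$mode delphi}
program bstadd;

type
  pnode = ^node;
  node = record
    i: integer;
    left: pnode;
    right: pnode;
  end;

// Add k to the binary search tree p if it is not already present.
procedure add(var p: pnode; k: integer);
begin
  if p = nil then
    begin
      new(p);            // p is the caller's root, left or right pointer
      p^.i := k;
      p^.left := nil;
      p^.right := nil;
    end
  else if k < p^.i then add(p^.left, k)
  else if k > p^.i then add(p^.right, k);
  // if k = p^.i, the value is already present and there is nothing to do
end;

// Write the values in ascending order: left subtree, node, right subtree.
procedure print(p: pnode);
begin
  if p = nil then exit;
  print(p^.left);
  write(p^.i, ' ');
  print(p^.right);
end;

var
  t: pnode;

begin
  t := nil;
  add(t, 10);
  add(t, 20);
  add(t, 5);
  add(t, 15);
  add(t, 10);   // duplicate, ignored
  print(t);     // writes 5 10 15 20
  writeln;
end.
```

Printing the tree in left, node, right order visits the values in ascending order, which is a convenient check that the ordering property holds.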

remove

Deleting a value from a binary search tree is a bit trickier. It's not hard to find the node to delete: we just walk down the tree, just like when searching or inserting. Once we've found the node N we want to delete, there are several cases.

  1. If N is a leaf (it has no children), we can just remove it from the tree.

  2. If N has only a single child, we replace N with its child. For example, we can delete node 15 in the binary search tree above by replacing it with 18.

  3. If N has two children, then we will replace its value by the next highest value in the tree. To do this, we start at N's right child and follow left child pointers for as long as we can. This will take us to the smallest node in N's right subtree, which must be the next highest node in the tree after N. Call this node M. We can easily remove M from the right subtree: M has no left child, so we can remove it following either case 1 or case 2 above. Now we set N's value to the value that M had.

    As a concrete example, suppose that we want to delete the root node (with value 10) in the tree above. This node has two children. We start at its right child (20) and follow its left child pointer to 15. That's as far as we can go in following left child pointers, since 15 has no left child. So now we remove 15 (following case 2 above), and then replace 10 with 15 at the root.
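
The three cases above might be sketched as follows. This is an illustration under assumptions, not a definitive implementation: the helper removeMin, the procedure print and the sample tree (which includes an assumed left child 5 under the root) are not taken from the notes.

```pascal
{$mode delphi}
program bstremove;

type
  pnode = ^node;
  node = record
    i: integer;
    left: pnode;
    right: pnode;
  end;

procedure add(var p: pnode; k: integer);
begin
  if p = nil then
    begin
      new(p);
      p^.i := k;
      p^.left := nil;
      p^.right := nil;
    end
  else if k < p^.i then add(p^.left, k)
  else if k > p^.i then add(p^.right, k);
end;

// Remove the smallest node of the non-empty tree p and return its value.
function removeMin(var p: pnode): integer;
var
  q: pnode;
begin
  if p^.left <> nil then exit(removeMin(p^.left));
  removeMin := p^.i;
  q := p;
  p := p^.right;   // case 1 or 2: replace the node with its only child or nil
  dispose(q);
end;

// Remove k from the binary search tree p if it is present.
procedure remove(var p: pnode; k: integer);
var
  q: pnode;
begin
  if p = nil then exit;   // k is not in the tree
  if k < p^.i then remove(p^.left, k)
  else if k > p^.i then remove(p^.right, k)
  else if (p^.left <> nil) and (p^.right <> nil) then
    p^.i := removeMin(p^.right)   // case 3: two children
  else
    begin                         // cases 1 and 2: at most one child
      q := p;
      if p^.left <> nil then p := p^.left else p := p^.right;
      dispose(q);
    end;
end;

// Write the values in ascending order.
procedure print(p: pnode);
begin
  if p = nil then exit;
  print(p^.left);
  write(p^.i, ' ');
  print(p^.right);
end;

var
  t: pnode;

begin
  t := nil;
  add(t, 10); add(t, 5); add(t, 20); add(t, 15); add(t, 18);
  remove(t, 10);       // two children: 15 moves up to the root
  print(t); writeln;   // writes 5 15 18 20
  remove(t, 20);       // single child: 18 takes its place
  print(t); writeln;   // writes 5 15 18
  remove(t, 5);        // leaf
  print(t); writeln;   // writes 15 18
end.
```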

running time of binary search tree operations

It is not difficult to see that the add, remove and contains operations described above will all run in time O(h), where h is the height of a binary search tree. What is their running time as a function of N, the number of nodes in the tree?

First consider a complete binary search tree. As we saw above, if the tree has N nodes then its height is h = log2(N + 1) – 1 ≈ log2(N) – 1 = O(log N). So add, remove, and contains will all run in time O(log N).

Even if a tree is not complete, these operations will run in O(log N) time if the tree is not too tall given its number of nodes N – specifically if its height is O(log N). We call such a tree balanced.

Unfortunately not all binary trees are balanced. Suppose that we insert values into a binary search tree in ascending order:

var
  t: pnode = nil;
  i: integer;

begin
  for i := 1 to 1000 do
    add(t, i);

The tree will look like this:

[figure: a tree in which every node has only a right child: 1, then 2, then 3, and so on]

This tree is completely unbalanced. It basically looks like a linked list with an extra nil pointer at every node. add, remove and contains will all run in O(N) on this tree.
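
We can check this with a short experiment. The sketch below assumes a recursive add and a height function like the ones discussed earlier, repeated here so that the program is self-contained.

```pascal
{$mode delphi}
program degenerate;

uses math;   // for max

type
  pnode = ^node;
  node = record
    i: integer;
    left: pnode;
    right: pnode;
  end;

procedure add(var p: pnode; k: integer);
begin
  if p = nil then
    begin
      new(p);
      p^.i := k;
      p^.left := nil;
      p^.right := nil;
    end
  else if k < p^.i then add(p^.left, k)
  else if k > p^.i then add(p^.right, k);
end;

function height(p: pnode): integer;
begin
  if p = nil then exit(-1);
  exit(1 + max(height(p^.left), height(p^.right)));
end;

var
  t: pnode = nil;
  i: integer;

begin
  for i := 1 to 1000 do
    add(t, i);
  writeln(height(t));   // writes 999: every node has only a right child
end.
```

A complete tree with a similar number of nodes (1023) would have height 9, so the difference between O(N) and O(log N) is already large at this size.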

How can we avoid an unbalanced tree such as this one? There are basically two possible ways. First, if we insert values into a binary search tree in a random order, then the tree will almost certainly be balanced. We will not prove this fact here (you might see a proof in your Algorithms and Data Structures class next semester).

Unfortunately it is not always practical to insert in a random order – for example, we may be reading a stream of values from a network and may need to insert each value as we receive it. So alternatively we can use a more advanced data structure known as a self-balancing binary tree, which automatically balances itself as values are inserted. Two examples of such structures are red‑black trees and AVL trees. We will not study these in this course, but you will see them in Algorithms and Data Structures next semester, and we might visit them in Programming II as well. For now, you should just be aware that they exist.