Some of the material from class is also covered in the Introduction to Algorithms textbook:
2.1 Insertion sort
2.3 Designing algorithms (discusses merge sort)
10.1 Stacks and queues
12 Binary Search Trees
We will study a variety of sorting algorithms in this class. Most of these algorithms will work on sequences of any ordered data type: integers, reals, strings and so on.
Any particular sorting algorithm may or may not have the following desirable qualities:
in place – the algorithm can sort an array of values without requiring any extra storage
adaptive – the algorithm runs quickly on data that is already sorted
stable – the algorithm does not change the relative position of values that are equal
In the discussion of the algorithms below, we assume that arrays are indexed starting from 0, to match how Pascal indexes open arrays.
Bubble sort is a simple sorting algorithm that runs in O(n²) time, where n is the number of elements in the input array. It works by making a number of passes over the input. On each pass, it compares pairs of adjacent elements: first elements 0 and 1, then elements 1 and 2, and so on. After each comparison, it swaps the elements if they are out of order.
For example, consider bubble sorting this array:
The algorithm first compares 6 and 5, and swaps them because 5 < 6:
Now the algorithm compares 6 and 3, and swaps them because 3 < 6:
And so on. At the end of the first pass, the array looks like this:
Notice that the largest element (8) has moved to the last position. In general, the first pass of a bubble sort makes n – 1 comparisons, and always brings the largest element into the last position. So the second pass does not need to go so far: it makes only n – 2 comparisons, and brings the second-largest element into the second-to-last position. And so on. After n – 1 passes, the sort is complete and the array is in order.
Here is an animation of bubble sort in action on the above array.
Here is Pascal code implementing a bubble sort:
procedure swap(var a, b: integer);
var
  t: integer;
begin
  t := a;
  a := b;
  b := t;
end;

procedure bubbleSort(var a: array of integer);
var
  i, j: integer;
begin
  // after each pass, a[i + 1 ..] holds the largest elements in sorted order
  for i := length(a) - 2 downto 0 do
    for j := 0 to i do
      if a[j] > a[j + 1] then
        swap(a[j], a[j + 1]);
end;
The total number of comparisons performed is (n – 1) + (n – 2) + … + 1 = n(n – 1)/2 = O(n²).
Bubble sort works in place. It is stable. As implemented above it is not adaptive, but it is easy to modify the algorithm to be adaptive by stopping after any pass in which no elements are swapped.
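For example, here is a sketch of one such adaptive variant (the procedure name and the swapped flag are our own additions):

procedure adaptiveBubbleSort(var a: array of integer);
var
  i, j: integer;
  swapped: boolean;
begin
  for i := length(a) - 2 downto 0 do
    begin
      swapped := false;
      for j := 0 to i do
        if a[j] > a[j + 1] then
          begin
            swap(a[j], a[j + 1]);
            swapped := true;
          end;
      if not swapped then
        exit;   // no swaps on this pass, so the array is already sorted
    end;
end;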
Bubble sort is simple, but is usually a poor choice for a sorting algorithm, because even other O(n²) algorithms such as insertion sort (to be discussed next) are faster.
Insertion sort is another O(n²) sorting algorithm. It works by sorting a subarray that grows from left to right until it encompasses the entire array. When the sort begins on an unsorted array, the subarray consisting of only the first element a[0] is already trivially sorted. The sort first swaps the first two elements if they are out of order, so that the subarray a[0..1] is sorted. It then inserts a[2] into that subarray so that a[0..2] is sorted, and so on.
For example, consider an insertion sort on this array:
The sort first swaps the two elements, so that a[0..1] is sorted:
Now we must insert 3 into the subarray [5, 6]. It goes at the beginning:
And now we insert 1 into [3, 5, 6]. It also goes at the beginning:
Now we insert 8 into [1, 3, 5, 6]. It stays at the end, so the array does not change:
Now we insert 7 into [1, 3, 5, 6, 8]:
And so on. Here is an animation of insertion sort in action on the above array.
To insert a[i] into the sorted subarray a[0 .. (i - 1)], insertion sort first sets v = a[i], then walks backwards through the subarray, shifting elements forward by one position as it goes. When it sees an element that is less than or equal to v, it stops, and inserts v to the right of that element. At this point the entire subarray a[0 .. i] is sorted.
Here is a Pascal implementation of insertion sort:
procedure insertionSort(var a: array of integer);
var
  i, j, v: integer;
begin
  for i := 1 to length(a) - 1 do
    begin
      v := a[i];              // the value to insert into a[0 .. (i - 1)]
      j := i;
      while (j > 0) and (a[j - 1] > v) do
        begin
          a[j] := a[j - 1];   // shift elements forward by one position
          j := j - 1;
        end;
      a[j] := v;              // insert v into its final position
    end;
end;
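As a quick check, here is a minimal sketch of a test program (the program name and test values are our own; it assumes insertionSort is declared above the main block):

program sortTest;

var
  a: array of integer;
  i: integer;

{ insertionSort as defined above goes here }

begin
  setLength(a, 6);
  a[0] := 6; a[1] := 5; a[2] := 3; a[3] := 1; a[4] := 8; a[5] := 7;
  insertionSort(a);
  for i := 0 to high(a) do
    write(a[i], ' ');   // prints: 1 3 5 6 7 8
  writeln;
end.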
Insertion sort is naturally adaptive. If the input array is already sorted, then no elements are shifted or modified at all and the algorithm runs in time O(n). The worst case is when the input array is in reverse order. Then to insert each value we must shift all subarray elements, so the total number of shifts is 1 + 2 + … + (n – 1) = n(n – 1)/2 = O(n²). If the input array is ordered randomly, then on average we will shift half of the subarray elements on each iteration, so the time is still O(n²).
Insertion sort works in place and is stable. It generally outperforms bubble sort and other O(n2) sorting algorithms such as selection sort, so it is usually a good choice for a simple sorting algorithm when n is not large.
Merge sort is a sorting algorithm that is asymptotically faster than bubble sort and insertion sort: it runs in time O(n log n).
Merge sort has a simple recursive structure. To sort an array of n elements, it divides the array in two and recursively merge sorts each half. It then merges the two sorted subarrays into a single sorted array. This problem-solving approach is called divide and conquer.
Merging two sorted arrays is easy: we repeatedly take elements from the beginning of the arrays, taking the smallest available element at each step.
For example, consider merge sort’s operation on this array:
Merge sort splits the array into two halves:
It then sorts each half, recursively:
Finally, it merges these two sorted arrays back into a single sorted array:
Here is an animation of merge sort in action on the above array. Here is a diagram showing the operation of merge sort on an array of 7 elements.
Here is a Pascal implementation of merge sort:
function fetch(const a: array of integer; k: integer; default: integer): integer;
begin
  if (0 <= k) and (k <= high(a)) then
    exit(a[k])
  else
    exit(default);   // past the end of the array: return the default
end;

// Merge the sorted arrays (a) and (b) into the array (c).
procedure merge(a, b: array of integer; var c: array of integer);
var
  i, j, k: integer;
begin
  i := 0;
  j := 0;
  for k := 0 to high(c) do
    if fetch(a, i, MaxInt) <= fetch(b, j, MaxInt) then
      begin
        c[k] := a[i];   // take the smallest available element from a
        i := i + 1;
      end
    else
      begin
        c[k] := b[j];   // take the smallest available element from b
        j := j + 1;
      end;
end;

procedure mergeSort(var a: array of integer);
var
  mid: integer;
begin
  if length(a) < 2 then
    exit;   // an array of 0 or 1 elements is already sorted
  mid := length(a) div 2;
  mergeSort(a[0 .. mid - 1]);      // sort each half recursively
  mergeSort(a[mid .. high(a)]);
  merge(a[0 .. mid - 1], a[mid .. high(a)], a);
end;
There is an important subtlety in the code above. In the merge procedure, the array parameters a and b are not declared with the const or var keyword, so Pascal passes the arrays by value: it makes a copy of these arrays as it passes them to the procedure. This is necessary for the merge to work, because the sorted arrays cannot be merged in place. If we change the code so that a and b are const parameters, the arrays will be passed by reference and will not be copied. Then the merge will fail, because writing into the array c can clobber elements of the array a that we have not yet merged.
Even with the array copies, merge runs in time O(n), where n is the length of the array c. So the running time of mergeSort follows the recurrence
T(n) = 2 ⋅ T(n / 2) + O(n)
The solution to this recurrence is T(n) = O(n log n). One way to see this is using a recursion tree. We know that the O(n) term is at most kn for some constant k, so we can expand T(n) into three terms, which we draw in a tree:
Now we expand each node at the second level in the same way:

The first level of the tree has a single node kn. The total time at the second level is 2k(n/2) = kn. When we expand the tree again, the nodes at the third level will also add up to 4k(n/4) = kn. If we keep expanding the tree, it will eventually have log₂(n) levels, with T(1) at the leaves of the tree. So the total time will be kn log₂(n) = O(n log n).
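We can also unroll the recurrence algebraically; here is a sketch of the same calculation, keeping only the dominant terms:

T(n) ≤ kn + 2 ⋅ T(n / 2)
     ≤ kn + kn + 4 ⋅ T(n / 4)
     ≤ … ≤ kn ⋅ log₂(n) + n ⋅ T(1) = O(n log n)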
Merge sort is stable. It is not adaptive: its running time does not depend on the initial order of the input array. And it does not work in place: it requires extra storage for the array copies.
A stack is any data structure supporting the push and pop operations. push pushes a value onto a stack, and pop removes the value that was most recently pushed. This is like a stack of sheets of paper on a desk, where sheets can be added or removed at the top.

In other words, a stack is a last in first out data structure: the last element that was added is the first to be removed.
Here is an interface for a stack:
type
  stack = ...

procedure init(var s: stack);
procedure push(var s: stack; i: integer);
function pop(var s: stack): integer;
function isEmpty(s: stack): boolean;
It is possible to implement a stack using various data structures. Below, we show stack implementations using an array and a linked list.
Free Pascal lets you divide code into modules using units. A unit is defined in a Pascal source file that looks like this:
unit myUnit;

interface

type
  abc = string;

procedure honk(s: abc);
function add(i, j: integer): integer;

implementation

procedure honk(s: abc);
begin
  writeln('honk: ', s);
end;

function add(i, j: integer): integer;
begin
  add := i + j;
end;

end.
The unit declaration at the top of the file specifies the name of the unit. It must be the same as the source file name without the '.pas' extension.

The interface section declares types, procedures and functions that will be exported by the unit. Procedures and functions declared in this section must be implemented in the following implementation section.

A unit ends with the end keyword followed by a period.

A program that uses a unit may call only the procedures and functions declared in the interface section. Any other procedures and functions in the implementation section are private to the unit.
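For example, here is a minimal sketch of a program that uses this unit (the program name is our own):

program useMyUnit;

uses myUnit;   // import the declarations from myUnit's interface section

begin
  honk('hello');        // prints: honk: hello
  writeln(add(2, 3));   // prints: 5
end.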
Here is a Free Pascal unit that uses an array to implement the stack interface described above. The implementation is straightforward.
The init, pop and isEmpty operations run in time O(1). push runs in time O(1) on average, though individual push operations may take as long as O(n), where n is the current stack size.
{$r+}
unit stack_array;

interface

type
  stack = record
    a: array of integer;   // elements are stored in a[0 .. n - 1]
    n: integer;            // number of values currently on the stack
  end;

procedure init(var s: stack);
procedure push(var s: stack; i: integer);
function pop(var s: stack): integer;
function isEmpty(s: stack): boolean;

implementation

procedure init(var s: stack);
begin
  setLength(s.a, 1);
  s.n := 0;
end;

procedure push(var s: stack; i: integer);
begin
  if length(s.a) = s.n then
    setLength(s.a, length(s.a) * 2);   // array is full: double its size
  s.a[s.n] := i;
  s.n := s.n + 1;
end;

function pop(var s: stack): integer;
begin
  pop := s.a[s.n - 1];
  s.n := s.n - 1;
end;

function isEmpty(s: stack): boolean;
begin
  isEmpty := (s.n = 0);
end;

end.
Here is a unit that implements a stack using a linked list. Again, the implementation is straightforward.
With this implementation, init, push, pop and isEmpty all run in time O(1).
unit stack_linked;

interface

type
  node = record
    i: integer;
    next: ^node;
  end;
  stack = ^node;   // a stack is a pointer to the topmost node; nil if empty

procedure init(var s: stack);
procedure push(var s: stack; i: integer);
function pop(var s: stack): integer;
function isEmpty(s: stack): boolean;

implementation

procedure init(var s: stack);
begin
  s := nil;
end;

procedure push(var s: stack; i: integer);
var
  n: ^node;
begin
  new(n);         // allocate a new node and link it in at the top
  n^.i := i;
  n^.next := s;
  s := n;
end;

function pop(var s: stack): integer;
var
  n: ^node;
begin
  pop := s^.i;
  n := s;
  s := s^.next;   // unlink the topmost node
  dispose(n);
end;

function isEmpty(s: stack): boolean;
begin
  isEmpty := (s = nil);
end;

end.
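Either unit can be exercised with a small test program like this sketch (the program name and test values are our own):

program stackTest;

uses stack_linked;   // or stack_array: both units export the same interface

var
  s: stack;

begin
  init(s);
  push(s, 1);
  push(s, 2);
  push(s, 3);
  while not isEmpty(s) do
    writeln(pop(s));   // prints 3, then 2, then 1
end.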
A dynamic set is a set of objects which can grow or shrink over time. Each object in a dynamic set has an associated value called a key, and we can query the set for objects by key. The keys may be integers, real numbers, strings or other ordered values. The objects in a dynamic set may or may not have other values (called satellite data).
We will define a dynamic set as supporting the following operations:
find – look up an object by key
insert – add a new object
delete – remove the object with the given key
maximum – find the largest key in the set
minimum – find the smallest key in the set
Here is an interface for a dynamic set in which keys are integers and there is no satellite data.
// a dynamic set of integer keys
type
  dynSet = ...

function find(s: dynSet; key: integer): boolean;
procedure insert(s: dynSet; key: integer);
procedure delete(s: dynSet; key: integer);
function maximum(s: dynSet): integer;
function minimum(s: dynSet): integer;
A dynamic set in which the satellite data consists of a single value is called a dictionary. Here is an interface for a dictionary in which keys are integers and the associated values are strings:
// a dictionary mapping integers to strings
type
  dictionary = ...

function find(d: dictionary; key: integer): string;
procedure insert(d: dictionary; key: integer; value: string);
procedure delete(d: dictionary; key: integer);
function maximum(d: dictionary): integer;
function minimum(d: dictionary): integer;
A dynamic set might or might not allow duplicate keys, depending on the implementation.
As with stacks, we can implement dynamic sets using a variety of data structures. For example, we can build a dynamic set using an array or linked list. If we do so, the set operations will have the following time complexities:
|         | unsorted array | sorted array | unsorted linked list | sorted linked list |
|---------|----------------|--------------|----------------------|--------------------|
| search  | O(n)           | O(log n)     | O(n)                 | O(n)               |
| insert  | O(1) 1         | O(n)         | O(1)                 | O(n)               |
| delete  | O(n)           | O(n)         | O(n)                 | O(n)               |
| minimum | O(n)           | O(1)         | O(n)                 | O(1)               |
| maximum | O(n)           | O(1)         | O(n)                 | O(1) 2             |
1 – Inserting into an unsorted array (i.e. appending) runs in O(1) on average if we double the array size each time it needs to be reallocated, but individual operations may take O(n).
2 – We can find the maximum element in a sorted linked list in time O(1) only if we keep a pointer to the last element at all times.
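The O(log n) search time for a sorted array comes from binary search: compare the key with the middle element, then continue in whichever half could contain it. Here is a sketch (the function name and details are our own):

// Binary search in a sorted array: repeatedly halve the search range.
function search(const a: array of integer; key: integer): boolean;
var
  lo, hi, mid: integer;
begin
  lo := 0;
  hi := high(a);
  while lo <= hi do
    begin
      mid := (lo + hi) div 2;
      if a[mid] = key then
        exit(true);
      if a[mid] < key then
        lo := mid + 1    // key can only be in the right half
      else
        hi := mid - 1;   // key can only be in the left half
    end;
  search := false;
end;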
None of these data structures allow us to both insert and retrieve elements quickly. To make that possible, we will need to use more sophisticated data structures such as binary trees, described next.
A binary tree holds a set of values. A binary tree has zero or more nodes, each of which contains a single value. The tree with no nodes is called the empty tree. Any non-empty tree consists of a root node plus its left and right subtrees, which are also (possibly empty) binary trees.
Here is a picture of a binary tree:
In this tree, a is the root node. Node b is the parent of nodes d and e. Node d is the left child of b, and node e is b's right child. Node e has a left child but no right child. Node c has a right child but no left child.
The subtree rooted at b is the left subtree of node a.
The nodes d, f, h and i are leaves: they have no children. Nodes a, b, c, e and g are internal nodes, which are nodes that are not leaves.
The height of a tree is the length of the longest path from the root to any leaf; the height of this tree is 3.
We can view a binary tree as a special sort of directed acyclic graph.
A binary search tree is a tree of ordered values such as integers or strings in which, for any node N with value v,
all values in N's left subtree are less than v
all values in N's right subtree are greater than v
Here is a binary search tree of integers:
Finding a value in a binary search tree is straightforward. To find the value v, we begin at the root. Let r be the root node's value. If v = r, we are done. Otherwise, if v < r then we recursively search for v in the root's left subtree; if v > r then we search in the right subtree.
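Here is a sketch of this search in Pascal, using a hypothetical node type analogous to the linked-list node above (an empty tree is represented by nil):

type
  treeNode = record
    key: integer;
    left, right: ^treeNode;
  end;
  tree = ^treeNode;   // an empty tree is nil

// Search for the value v in the tree t.
function find(t: tree; v: integer): boolean;
begin
  if t = nil then
    exit(false);   // empty tree: v is not present
  if v = t^.key then
    exit(true);
  if v < t^.key then
    find := find(t^.left, v)    // search the left subtree
  else
    find := find(t^.right, v);  // search the right subtree
end;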
Inserting a value into a binary search tree is also straightforward. Beginning at the root, we look for an insertion position, proceeding down the tree just as in the above algorithm for finding a node. When we reach an empty left or right child, we create a node there.
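Here is a sketch of insertion using the same tree type. Passing the tree as a var parameter lets the recursion fill in an empty child pointer directly:

// Insert the value v into the tree t.
procedure insert(var t: tree; v: integer);
begin
  if t = nil then
    begin             // empty position found: create a node here
      new(t);
      t^.key := v;
      t^.left := nil;
      t^.right := nil;
    end
  else if v < t^.key then
    insert(t^.left, v)
  else
    insert(t^.right, v);
end;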
Deleting a value from a binary search tree is a little trickier. It's not hard to find the node to delete: we simply walk down the tree, just as when searching or inserting. Once we've found the node N we want to delete, there are three cases.
(a) If N is a leaf (it has no children), we can just remove it from the tree.

(b) If N has only a single child, we replace N with its child. For example, we can delete node 15 in the binary tree above by replacing it with 18.
(c) If N has two children, then we must replace it by the next highest node in the tree. To do this, we start at N's right child and follow left child pointers for as long as we can. This will take us to the smallest node in N's right subtree, which must be the next highest node in the tree after N. Call this node M. We must remove M from the right subtree, and fortunately this is easy: M has no left child, so we can remove it following either case (a) or (b) above. Now we splice M into the tree in place of N.
As a concrete example, suppose that we want to delete the root node (with value 10) in the tree above. This node has two children. We start at its right child (20) and follow its left child pointer to 15. That's as far as we can go in following left child pointers, since 15 has no left child. So now we remove 15 (following case (b) above), and then replace 10 with 15 at the root.
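Here is a sketch of deletion using the same tree type. Rather than splicing M into N's place, this sketch copies M's key into N and then removes M from the right subtree, a common equivalent variant (it assumes the tree holds no duplicate keys):

// Delete the value v from the tree t, if present.
procedure delete(var t: tree; v: integer);
var
  m: tree;
begin
  if t = nil then
    exit;                        // v is not in the tree
  if v < t^.key then
    delete(t^.left, v)
  else if v > t^.key then
    delete(t^.right, v)
  else if t^.left = nil then
    begin                        // cases (a) and (b): no left child
      m := t;
      t := t^.right;
      dispose(m);
    end
  else if t^.right = nil then
    begin                        // case (b): no right child
      m := t;
      t := t^.left;
      dispose(m);
    end
  else
    begin                        // case (c): two children
      m := t^.right;             // find the smallest node in the
      while m^.left <> nil do    // right subtree
        m := m^.left;
      t^.key := m^.key;          // copy its key into this node...
      delete(t^.right, m^.key);  // ...and remove it from the subtree
    end;
end;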
We have seen that finding, inserting and deleting all run in time O(h), where h is the height of a binary search tree. What is their running time as a function of n, the number of nodes in the tree?
Suppose that a binary search tree of height h is complete: that is, every internal node has two children and all leaves have the same depth. Then its first level has 1 node, its second level has 2 nodes, its third has 4 nodes and so on. Since a tree of height h has h + 1 levels, there are 2^h leaves, and the total number of nodes in the tree is

1 + 2 + 4 + … + 2^h = 2^(h+1) – 1 = n

So h = log₂(n + 1) – 1 = O(log n). Thus searching, inserting and deleting in this tree all run in time O(log n). This is the best-case running time of these operations.
Now suppose that a binary search tree of height h has only (h + 1) nodes, one at each tree level. This tree basically looks like a linked list with an extra nil pointer at every node. For this tree, h = n – 1 = O(n), so searching, inserting and deleting take time O(n). This is the worst-case running time. Unfortunately, if we insert values into a binary search tree in ascending or descending order then we will end up with a tree that looks like this.
On the other hand, if we insert values into a binary search tree in random order then it is possible to show that the tree will probably end up balanced, that is, that the expected value of the tree's height will be O(log n). So in this case tree operations will run with an expected time of O(log n).
If we want a firmer guarantee, we will need to use a more advanced data structure. In a future lecture we'll look at self-balancing trees, which guarantee that operations such as searching, inserting and deleting will always run in time O(log n).