Some of the material from class is also covered in the Introduction to Algorithms textbook:
2.1 Insertion sort
2.3 Designing algorithms (discusses merge sort)
10.1 Stacks and queues
12 Binary Search Trees
We will study a variety of sorting algorithms in this class. Most of these algorithms will work on sequences of any ordered data type: integers, reals, strings and so on.
Any particular sorting algorithm may or may not have the following desirable qualities:
in place – the algorithm can sort an array of values using only a constant amount of extra storage
adaptive – the algorithm runs quickly on data that is already sorted
stable – the algorithm does not change the relative position of values that are equal
In the discussion of the algorithms below, we assume that arrays are indexed starting from 0, to match how Pascal indexes open arrays.
Bubble sort is a simple sorting algorithm that runs in O(n^{2}) time, where n is the number of elements in the input array. It works by making a number of passes over the input. On each pass, it compares pairs of elements: first elements 0 and 1, then elements 1 and 2, and so on. After each comparison, it swaps the elements if they are out of order.
For example, consider bubble sorting this array:
The algorithm first compares 6 and 5, and swaps them because 5 < 6:
Now the algorithm compares 6 and 3, and swaps them because 3 < 6:
And so on. At the end of the first pass, the array looks like this:
Notice that the largest element (8) has moved to the last position. In general, the first pass of a bubble sort makes n – 1 comparisons, and always brings the largest element into the last position. So the second pass does not need to go so far: it makes only n – 2 comparisons, and brings the second-largest element into the second-to-last position. And so on. After n – 1 passes, the sort is complete and the array is in order.
Here is an animation of bubble sort in action on the above array.
Here is Pascal code implementing a bubble sort:
procedure swap(var a, b: integer);
var t: integer;
begin
  t := a;
  a := b;
  b := t;
end;

procedure bubbleSort(var a: array of integer);
var i, j: integer;
begin
  // each pass moves the largest element of a[0 .. i + 1] into a[i + 1]
  for i := length(a) - 2 downto 0 do
    for j := 0 to i do
      if a[j] > a[j + 1] then
        swap(a[j], a[j + 1]);
end;
The total number of comparisons performed is (n – 1) + (n – 2) + … + 1 = n(n – 1)/2 = O(n^{2}).
Bubble sort works in place. It is stable. As implemented above it is not adaptive, but it is easy to modify the algorithm to be adaptive by stopping after any pass in which no elements are swapped.
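To make the early-exit idea concrete, here is a minimal sketch, assuming Free Pascal in objfpc mode. The function name adaptiveBubbleSort and the fact that it returns the number of passes it made are our own illustrative additions, not part of the code above.

```pascal
{$mode objfpc}
program adaptiveBubble;

procedure swap(var a, b: integer);
var t: integer;
begin
  t := a;
  a := b;
  b := t;
end;

// Bubble sort that stops after the first pass in which nothing is swapped.
// Returns the number of passes made (an illustrative extra).
function adaptiveBubbleSort(var a: array of integer): integer;
var
  i, j: integer;
  swapped: boolean;
begin
  Result := 0;
  for i := length(a) - 2 downto 0 do
  begin
    swapped := false;
    Result := Result + 1;
    for j := 0 to i do
      if a[j] > a[j + 1] then
      begin
        swap(a[j], a[j + 1]);
        swapped := true;
      end;
    if not swapped then
      exit;  // no swaps: the array is already sorted
  end;
end;

var
  sorted: array[0..4] of integer = (1, 2, 3, 4, 5);
  mixed: array[0..4] of integer = (5, 1, 4, 2, 3);
  k: integer;
begin
  writeln('passes on sorted input: ', adaptiveBubbleSort(sorted));  // 1
  writeln('passes on mixed input:  ', adaptiveBubbleSort(mixed));   // 3
  for k := 0 to high(mixed) do
    write(mixed[k], ' ');
  writeln;
end.
```

On already sorted input the first pass makes n – 1 comparisons and no swaps, so the sort stops after one pass and runs in O(n) time.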
Bubble sort is simple, but is usually a poor choice for a sorting algorithm, because even other O(n^{2}) algorithms such as insertion sort (to be discussed next) are faster.
Insertion sort is another O(n^{2}) sorting algorithm. It works by sorting a subarray that grows from left to right until it encompasses the entire array. When insertion sort begins on an unsorted array, the subarray consisting of only the first element a[0] is already trivially sorted. The sort begins by swapping the first two elements if they are out of order, so that the subarray a[0..1] is sorted. It then inserts a[2] into that subarray so that a[0..2] is sorted, and so on.
For example, consider an insertion sort on this array:
The sort first swaps the two elements, so that a[0..1] is sorted:
Now we must insert 3 into the subarray [5, 6]. It goes at the beginning:
And now we insert 1 into [3, 5, 6]. It also goes at the beginning:
Now we insert 8 into [1, 3, 5, 6]. It stays at the end, so the array does not change:
Now we insert 7 into [1, 3, 5, 6, 8]:
And so on. Here is an animation of insertion sort in action on the above array.
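To insert a[i] into the sorted subarray a[0 .. (i – 1)], insertion sort first sets v = a[i], then walks backwards through the subarray, shifting elements forward by one position as it goes. When it sees an element that is less than or equal to v, or reaches the beginning of the array, it stops and stores v in the vacated position just to the right. At this point the entire subarray a[0 .. i] is sorted. Stopping at an element equal to v (rather than moving past it) is what makes the sort stable.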
Here is a Pascal implementation of insertion sort:
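procedure insertionSort(var a: array of integer);
var i, j, v: integer;
begin
  for i := 1 to length(a) - 1 do
  begin
    // insert v = a[i] into the sorted subarray a[0 .. i - 1]
    v := a[i];
    j := i;
    // shift larger elements one position to the right
    while (j > 0) and (a[j - 1] > v) do
    begin
      a[j] := a[j - 1];
      j := j - 1;
    end;
    a[j] := v;
  end;
end;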
Insertion sort is naturally adaptive. If the input array is already sorted, then no elements are shifted at all and the algorithm runs in time O(n). The worst case is when the input array is in reverse order. Then to insert each value we must shift all subarray elements, so the total number of shifts is 1 + 2 + … + (n – 1) = n(n – 1)/2 = O(n^{2}). If the input array is ordered randomly, then on average we will shift about half of the subarray elements on each iteration, so the time is still O(n^{2}).
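The best and worst cases above can be checked by counting shifts. In this sketch, insertionSortCount is a hypothetical instrumented copy of insertionSort that returns the number of times an element is moved one position to the right; otherwise it follows the code above.

```pascal
{$mode objfpc}
program insertionShifts;

// A copy of insertionSort that also counts shifts, i.e. executions of
// a[j] := a[j - 1]. The counter is an illustrative addition.
function insertionSortCount(var a: array of integer): integer;
var
  i, j, v: integer;
begin
  Result := 0;
  for i := 1 to length(a) - 1 do
  begin
    v := a[i];
    j := i;
    while (j > 0) and (a[j - 1] > v) do
    begin
      a[j] := a[j - 1];
      j := j - 1;
      Result := Result + 1;
    end;
    a[j] := v;
  end;
end;

var
  a: array of integer;
  k, n: integer;
begin
  n := 10;
  setLength(a, n);
  for k := 0 to n - 1 do
    a[k] := k;                 // already sorted: 0, 1, ..., 9
  writeln('sorted input:   ', insertionSortCount(a), ' shifts');  // 0
  for k := 0 to n - 1 do
    a[k] := n - k;             // reverse order: 10, 9, ..., 1
  writeln('reversed input: ', insertionSortCount(a), ' shifts');  // 45
end.
```

With n = 10, the reversed input needs 10 ⋅ 9 / 2 = 45 shifts, matching the formula n(n – 1)/2.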
Insertion sort works in place and is stable. It generally outperforms bubble sort and other O(n^{2}) sorting algorithms such as selection sort, so it is usually a good choice for a simple sorting algorithm when n is not large.
Merge sort is a sorting algorithm that is asymptotically faster than bubble sort and insertion sort: it runs in time O(n log n).
Merge sort has a simple recursive structure. To sort an array of n elements, it divides the array in two and recursively merge sorts each half. It then merges the two sorted subarrays into a single sorted array. This problem solving approach is called divide and conquer.
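Merging two sorted arrays is easy: we repeatedly compare the first remaining element of each array and move the smaller one to the output, until both arrays are exhausted. On a tie we take the element from the first array, which keeps the sort stable.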
For example, consider merge sort’s operation on this array:
Merge sort splits the array into two halves:
It then sorts each half, recursively:
Finally, it merges these two sorted arrays back into a single sorted array:
Here is an animation of merge sort in action on the above array. Here is a diagram showing the operation of merge sort on an array of 7 elements.
Here is a Pascal implementation of merge sort:
// Return a[k], or default if k is out of bounds.
function fetch(const a: array of integer; k: integer; default: integer): integer;
begin
  if (0 <= k) and (k <= high(a)) then
    exit(a[k])
  else
    exit(default);
end;

// Merge the sorted arrays (a) and (b) into the array (c).
// MaxInt acts as a sentinel for an exhausted (b); the test i <= high(a)
// prevents reading past the end of (a) even if (b) contains MaxInt.
procedure merge(a, b: array of integer; var c: array of integer);
var i, j, k: integer;
begin
  i := 0;
  j := 0;
  for k := 0 to high(c) do
    if (i <= high(a)) and (fetch(a, i, MaxInt) <= fetch(b, j, MaxInt)) then
    begin
      c[k] := a[i];
      i := i + 1;
    end
    else
    begin
      c[k] := b[j];
      j := j + 1;
    end;
end;

procedure mergeSort(var a: array of integer);
var mid: integer;
begin
  if length(a) < 2 then exit;
  mid := length(a) div 2;
  mergeSort(a[0 .. mid - 1]);
  mergeSort(a[mid .. high(a)]);
  merge(a[0 .. mid - 1], a[mid .. high(a)], a);
end;
There is an important subtlety in the code above. In the merge procedure, the array parameters a and b are not declared with the const or var keyword. So Pascal passes the arrays by value: it makes a copy of these arrays as it passes them to the procedure. This is necessary for the merge to work, because mergeSort passes two slices of its own array as a and b and the whole array as c, and the sorted halves cannot be merged in place. If we change the code so that a and b are const parameters, the arrays will be passed by reference and will not be copied. Then the merge will fail, because writing into the array c can clobber elements of the array a that we have not yet merged.
Even with the array copies, merge runs in time O(n), where n is the length of the array c. So the running time of mergeSort follows the recurrence
T(n) = 2 ⋅ T(n / 2) + O(n)
The solution to this recurrence is T(n) = O(n log n). One way to see this is using a recursion tree. The O(n) term is at most kn for some constant k, so we can expand T(n) into three terms which we draw in a tree:
Now we expand each node at the second level in the same way:

The first level of the tree has a single node kn. The total time at the second level is 2k(n/2) = kn. When we expand the tree again, the nodes at the third level will also add up to 4k(n/4) = kn. If we keep expanding the tree (assuming for simplicity that n is a power of 2), it will eventually have log_{2}(n) levels of this kind, each contributing kn, with n copies of T(1) at the leaves. So the total time will be kn log_{2}(n) + n ⋅ T(1) = O(n log n).
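As a numerical sanity check on the recursion-tree argument, the sketch below evaluates the recurrence exactly for powers of two. It assumes k = 1 and T(1) = 1, so the argument predicts T(n) = n log_{2}(n) + n: log_{2}(n) levels costing n each, plus n leaves costing 1 each.

```pascal
{$mode objfpc}
program mergeRecurrence;

// T(1) = 1, T(n) = 2 T(n / 2) + n, for n a power of two (assumes k = 1).
function T(n: int64): int64;
begin
  if n = 1 then
    Result := 1
  else
    Result := 2 * T(n div 2) + n;
end;

var
  n, lg: int64;   // lg = log2(n)
begin
  n := 1;
  lg := 0;
  while n <= 1024 do
  begin
    writeln('n = ', n:5, '  T(n) = ', T(n):6,
            '  n*log2(n) + n = ', n * lg + n:6);
    n := n * 2;
    lg := lg + 1;
  end;
end.
```

The last two columns agree on every row, for example T(1024) = 1024 ⋅ 10 + 1024 = 11264.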
Merge sort is stable. It is not adaptive: its running time does not depend on the initial order of the input array. It does not work in place: merging needs O(n) extra storage for the copies of the two halves.
A stack is any data structure supporting the push and pop operations. push pushes a value onto a stack, and pop removes and returns the value that was most recently pushed. This is like a stack of sheets of paper on a desk, where sheets can be added or removed at the top.

In other words, a stack is a last-in, first-out (LIFO) data structure: the last element that was added is the first to be removed.
Here is an interface for a stack:
type stack = ...

procedure init(var s: stack);
procedure push(var s: stack; i: integer);
function pop(var s: stack): integer;
function isEmpty(s: stack): boolean;
It is possible to implement a stack using various data structures. Below, we show stack implementations using an array and a linked list.
Free Pascal lets you divide code into modules using units. A unit is defined in a Pascal source file that looks like this:
unit myUnit;

interface

type abc = string;

procedure honk(s: abc);
function add(i, j: integer): integer;

implementation

procedure honk(s: abc);
begin
  writeln('honk: ', s);
end;

function add(i, j: integer): integer;
begin
  add := i + j;
end;

end.
The unit declaration at the top of the file specifies the name of the unit. It must be the same as the source file name without the '.pas' extension.

The interface section declares types, procedures and functions that will be exported by the unit. Procedures and functions declared in this section must be implemented in the following implementation section.

A unit ends with the end keyword followed by a period.

A program that uses a unit (by listing it in a uses clause, such as uses myUnit;) may call only the procedures and functions declared in the interface section. Any other procedures and functions in the implementation section are private to the unit.
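Here is a Free Pascal unit that uses an array to implement the stack interface described above. The implementation is straightforward.

The init, pop and isEmpty operations run in time O(1). push runs in amortized time O(1), though an individual push may take as long as O(n), where n is the current stack size. The expensive pushes are the ones that reallocate: since the array doubles in size whenever it fills up, n pushes cause reallocations that copy at most 1 + 2 + 4 + … < 2n elements in total.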
{$r+}
unit stack_array;

interface

type
  stack = record
    a: array of integer;   // storage; length(a) is the capacity
    n: integer;            // number of values on the stack
  end;

procedure init(var s: stack);
procedure push(var s: stack; i: integer);
function pop(var s: stack): integer;
function isEmpty(s: stack): boolean;

implementation

procedure init(var s: stack);
begin
  setLength(s.a, 1);
  s.n := 0;
end;

procedure push(var s: stack; i: integer);
begin
  if length(s.a) = s.n then            // full: double the capacity
    setLength(s.a, length(s.a) * 2);
  s.a[s.n] := i;
  s.n := s.n + 1;
end;

function pop(var s: stack): integer;
begin
  pop := s.a[s.n - 1];                 // {$r+} reports an error if empty
  s.n := s.n - 1;
end;

function isEmpty(s: stack): boolean;
begin
  isEmpty := (s.n = 0);
end;

end.
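To see why push is O(1) on average, this sketch models the doubling policy of stack_array and counts how many elements reallocations copy. The function copiesForPushes is hypothetical, and it assumes that every call to setLength copies all existing elements. That is an upper bound: the memory manager may sometimes grow the block in place.

```pascal
{$mode objfpc}
program doublingCost;

// Models push in stack_array: capacity starts at 1 and doubles when full.
// Returns the total number of element copies over n pushes, assuming each
// reallocation copies every element currently on the stack.
function copiesForPushes(n: integer): int64;
var
  capacity, size, k: integer;
begin
  Result := 0;
  capacity := 1;
  size := 0;
  for k := 1 to n do
  begin
    if size = capacity then
    begin
      Result := Result + size;   // copy the old contents to the new array
      capacity := capacity * 2;
    end;
    size := size + 1;
  end;
end;

var
  sizes: array[0..3] of integer = (10, 100, 1000, 1000000);
  k: integer;
begin
  for k := 0 to high(sizes) do
    writeln('n = ', sizes[k]:8, '  copies = ', copiesForPushes(sizes[k]):8,
            '  2n = ', 2 * int64(sizes[k]):8);
end.
```

For n = 1000 the reallocations happen at sizes 1, 2, 4, …, 512, copying 1023 elements in total, which is below 2n. So n pushes do O(n) work overall.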
Here is a unit that implements a stack using a linked list. Again, the implementation is straightforward.

With this implementation, init, push, pop and isEmpty all run in time O(1).
unit stack_linked;

interface

type
  node = record
    i: integer;
    next: ^node;
  end;
  stack = ^node;       // points to the top node, or nil if the stack is empty

procedure init(var s: stack);
procedure push(var s: stack; i: integer);
function pop(var s: stack): integer;
function isEmpty(s: stack): boolean;

implementation

procedure init(var s: stack);
begin
  s := nil;
end;

procedure push(var s: stack; i: integer);
var n: ^node;
begin
  new(n);
  n^.i := i;
  n^.next := s;        // the new node points to the old top
  s := n;
end;

function pop(var s: stack): integer;
var n: ^node;
begin
  pop := s^.i;
  n := s;
  s := s^.next;
  dispose(n);          // free the removed node
end;

function isEmpty(s: stack): boolean;
begin
  isEmpty := (s = nil);
end;

end.
A dynamic set is a set of objects which can grow or shrink over time. Each object in a dynamic set has an associated value called a key, and we can query the set for objects by key. The keys may be integers, real numbers, strings or other ordered values. The objects in a dynamic set may or may not have other values (called satellite data).
We will define a dynamic set as supporting the following operations:
find – look up an object by key
insert – add a new object
delete – remove the object with the given key
maximum – find the largest key in the set
minimum – find the smallest key in the set
Here is an interface for a dynamic set in which keys are integers and there is no satellite data.
// a dynamic set of integer keys
type dynSet = ...

function find(s: dynSet; key: integer): boolean;
procedure insert(s: dynSet; key: integer);
procedure delete(s: dynSet; key: integer);
function maximum(s: dynSet): integer;
function minimum(s: dynSet): integer;
A dynamic set in which the satellite data consists of a single value is called a dictionary. Here is an interface for a dictionary in which keys are integers and the associated values are strings:
// a dictionary mapping integers to strings
type dictionary = ...

function find(d: dictionary; key: integer): string;
procedure insert(d: dictionary; key: integer; value: string);
procedure delete(d: dictionary; key: integer);
function maximum(d: dictionary): integer;
function minimum(d: dictionary): integer;
A dynamic set might or might not allow duplicate keys, depending on the implementation.
Like stacks, we can implement dynamic sets using a variety of data structures. For example, we can build a dynamic set using an array or linked list. If we do so, the set operations will have time complexities as follows:

             unsorted array   sorted array   unsorted linked list   sorted linked list
search       O(n)             O(log n)       O(n)                   O(n)
insert       O(1) ^{1}        O(n)           O(1)                   O(n)
delete       O(n)             O(n)           O(n)                   O(n)
minimum      O(n)             O(1)           O(n)                   O(1)
maximum      O(n)             O(1)           O(n)                   O(1) ^{2}
^{1} – Inserting into an unsorted array (i.e. appending) runs in O(1) on average if we double the array size each time it needs to be reallocated, but individual operations may take O(n).
^{2} – We can find the maximum element in a sorted linked list in time O(1) only if we keep a pointer to the last element at all times.
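Here is a minimal sketch of the sorted-array column of the table, assuming integer keys with no duplicates. find uses binary search and so takes O(log n) time, while insert and delete must shift the larger keys and so take O(n) time. The record layout and the helper lowerBound are our own choices, and the set is passed as a var or const parameter rather than by value as in the interface above, since insert and delete modify it.

```pascal
{$mode objfpc}
program sortedArraySet;

type
  // Hypothetical representation: keys in ascending order in a[0 .. n - 1],
  // no duplicates; length(a) may exceed n.
  dynSet = record
    a: array of integer;
    n: integer;
  end;

// Binary search: index of the first key >= key, in the range 0 .. s.n.
function lowerBound(const s: dynSet; key: integer): integer;
var
  lo, hi, mid: integer;
begin
  lo := 0;
  hi := s.n;                        // invariant: the answer lies in lo .. hi
  while lo < hi do
  begin
    mid := (lo + hi) div 2;
    if s.a[mid] < key then
      lo := mid + 1
    else
      hi := mid;
  end;
  Result := lo;
end;

// O(log n)
function find(const s: dynSet; key: integer): boolean;
var p: integer;
begin
  p := lowerBound(s, key);
  Result := (p < s.n) and (s.a[p] = key);
end;

// O(n): shift larger keys one slot to the right
procedure insert(var s: dynSet; key: integer);
var p, k: integer;
begin
  p := lowerBound(s, key);
  if (p < s.n) and (s.a[p] = key) then
    exit;                           // already present
  if s.n = length(s.a) then
    setLength(s.a, 2 * s.n + 1);    // grow the storage
  for k := s.n downto p + 1 do
    s.a[k] := s.a[k - 1];
  s.a[p] := key;
  s.n := s.n + 1;
end;

// O(n): shift larger keys one slot to the left
procedure delete(var s: dynSet; key: integer);
var p, k: integer;
begin
  p := lowerBound(s, key);
  if (p >= s.n) or (s.a[p] <> key) then
    exit;                           // not present
  for k := p to s.n - 2 do
    s.a[k] := s.a[k + 1];
  s.n := s.n - 1;
end;

// O(1); both assume the set is nonempty
function minimum(const s: dynSet): integer;
begin
  Result := s.a[0];
end;

function maximum(const s: dynSet): integer;
begin
  Result := s.a[s.n - 1];
end;

var
  s: dynSet;
begin
  s.n := 0;                         // s.a starts out as an empty dynamic array
  insert(s, 30);
  insert(s, 10);
  insert(s, 20);
  insert(s, 10);                    // duplicate: ignored
  writeln(find(s, 20), ' ', find(s, 25));
  writeln(minimum(s), ' ', maximum(s));   // 10 30
  delete(s, 10);
  writeln(minimum(s), ' ', s.n);          // 20 2
end.
```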
None of these data structures allows us to both insert and retrieve elements quickly. To make that possible, we will need to use more sophisticated data structures such as binary search trees, described next.
A binary tree holds a set of values. A binary tree has zero or more nodes, each of which contains a single value. The tree with no nodes is called the empty tree. Any nonempty tree consists of a root node plus its left and right subtrees, which are also (possibly empty) binary trees.
Here is a picture of a binary tree:
In this tree, a is the root node. Node b is the parent of nodes d and e. Node d is the left child of b, and node e is b's right child. Node e has a left child but no right child. Node c has a right child but no left child.
The subtree rooted at b is the left subtree of node a.
The nodes d, f, h and i are leaves: they have no children. Nodes a, b, c, e and g are internal nodes, which are nodes that are not leaves.
The height of this tree is 3, defined as the length (number of edges) of the longest path from the root to any leaf.
We can view a binary tree as a special sort of directed acyclic graph.
A binary search tree is a binary tree of ordered values such as integers or strings in which, for any node N with value v,
all values in N's left subtree are less than v
all values in N's right subtree are greater than v
Here is a binary search tree of integers:
Finding a value in a binary search tree is straightforward. To find the value v, we begin at the root. Let r be the root node's value. If v = r, we are done. Otherwise, if v < r then we recursively search for v in the root's left subtree; if v > r then we search in the right subtree.
Inserting a value into a binary search tree is also straightforward. Beginning at the root, we look for an insertion position, proceeding down the tree just as in the above algorithm for finding a node. When we reach an empty left or right child, we create a node there.
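Here is a minimal recursive sketch of finding and inserting, assuming integer keys and the hypothetical node type below. Because the definition above requires strictly smaller keys on the left and strictly larger keys on the right, this insert silently ignores a key that is already present.

```pascal
{$mode objfpc}
program bstFindInsert;

type
  pnode = ^node;
  node = record
    key: integer;
    left, right: pnode;   // nil means an empty subtree
  end;

// Search for v starting at the root t.
function find(t: pnode; v: integer): boolean;
begin
  if t = nil then
    Result := false               // empty subtree: v is not in the tree
  else if v = t^.key then
    Result := true
  else if v < t^.key then
    Result := find(t^.left, v)
  else
    Result := find(t^.right, v);
end;

// Walk down as in find; when we reach an empty child, create a node there.
// A key that is already present is ignored.
procedure insert(var t: pnode; v: integer);
begin
  if t = nil then
  begin
    new(t);
    t^.key := v;
    t^.left := nil;
    t^.right := nil;
  end
  else if v < t^.key then
    insert(t^.left, v)
  else if v > t^.key then
    insert(t^.right, v);
end;

var
  root: pnode = nil;
  keys: array[0..5] of integer = (10, 5, 20, 15, 25, 18);
  k: integer;
begin
  for k := 0 to high(keys) do
    insert(root, keys[k]);
  writeln(find(root, 18), ' ', find(root, 7));
end.
```

The sample keys are arbitrary. They produce a tree whose root is 10, with 20 as its right child, 15 as 20's left child and 18 as 15's right child. Passing t as a var parameter lets insert overwrite the nil pointer in the parent directly.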
Deleting a value from a binary search tree is a little trickier. It's not hard to find the node to delete: we walk down the tree just as when searching or inserting. Once we've found the node N we want to delete, there are several cases.
(a) If N is a leaf (it has no children), we can just remove it from the tree.
(b) If N has only a single child, we replace N with its child. For example, we can delete node 15 in the binary tree above by replacing it with 18.
(c) If N has two children, then we must replace it by the next-highest node in the tree. To do this, we start at N's right child and follow left child pointers for as long as we can. This will take us to the smallest node in N's right subtree, which must be the next-highest node in the tree after N. Call this node M. We must remove M from the right subtree, and fortunately this is easy: M has no left child, so we can remove it following either case (a) or (b) above. Now we splice M into the tree in place of N.
As a concrete example, suppose that we want to delete the root node (with value 10) in the tree above. This node has two children. We start at its right child (20) and follow its left child pointer to 15. That’s as far as we can go in following left child pointers, since 15 has no left child. So now we remove 15 (following case (b) above), and then replace 10 with 15 at the root.
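The three deletion cases can be sketched as follows. This is one possible implementation, not the only one: the helper removeMin (a hypothetical name) detaches the smallest node M of N's right subtree, and the program repeats insert so that it is self-contained. The keys are chosen so that deleting 10 replaces it with 15, as in the example above.

```pascal
{$mode objfpc}{$H+}
program bstDelete;

uses SysUtils;

type
  pnode = ^node;
  node = record
    key: integer;
    left, right: pnode;
  end;

procedure insert(var t: pnode; v: integer);
begin
  if t = nil then
  begin
    new(t);
    t^.key := v;
    t^.left := nil;
    t^.right := nil;
  end
  else if v < t^.key then
    insert(t^.left, v)
  else if v > t^.key then
    insert(t^.right, v);
end;

// Detach and return the smallest node of the nonempty subtree t. That node
// has no left child, so it is replaced by its right child (case (a) or (b)).
function removeMin(var t: pnode): pnode;
begin
  if t^.left <> nil then
    Result := removeMin(t^.left)
  else
  begin
    Result := t;
    t := t^.right;
  end;
end;

procedure delete(var t: pnode; v: integer);
var
  n, m: pnode;
begin
  if t = nil then exit;                 // v is not in the tree
  if v < t^.key then
    delete(t^.left, v)
  else if v > t^.key then
    delete(t^.right, v)
  else
  begin
    n := t;
    if n^.left = nil then
      t := n^.right                     // case (a), or (b) with a right child
    else if n^.right = nil then
      t := n^.left                      // case (b) with a left child
    else
    begin
      m := removeMin(n^.right);         // case (c): M is N's successor
      m^.left := n^.left;               // splice M into N's place
      m^.right := n^.right;
      t := m;
    end;
    dispose(n);
  end;
end;

// In-order traversal: the keys in ascending order, each followed by a space.
function inorder(t: pnode): string;
begin
  if t = nil then
    Result := ''
  else
    Result := inorder(t^.left) + IntToStr(t^.key) + ' ' + inorder(t^.right);
end;

var
  root: pnode = nil;
  keys: array[0..5] of integer = (10, 5, 20, 15, 25, 18);
  k: integer;
begin
  for k := 0 to high(keys) do
    insert(root, keys[k]);
  delete(root, 10);         // two children: replaced by its successor 15
  writeln(root^.key);       // 15
  writeln(inorder(root));
end.
```

An in-order traversal of a binary search tree visits the keys in ascending order, so it is a convenient way to check that a deletion left the tree valid.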
We have seen that finding, inserting and deleting all run in time O(h), where h is the height of a binary search tree. What is their running time as a function of n, the number of nodes in the tree?
Suppose that a binary search tree of height h is complete: that is, every internal node has two children and all leaves have the same depth. Then its first level has 1 node, its second level has 2 nodes, its third has 4 nodes and so on, down to level h + 1, which holds the 2^{h} leaves. So the total number of nodes in the tree is

1 + 2 + 4 + … + 2^{h} = 2^{h+1} – 1 = n

So h = log_{2}(n + 1) – 1 = O(log n). Thus searching, inserting and deleting in this tree all run in time O(log n). This is the best-case running time of these operations.
Now suppose that a binary search tree of height h has only (h + 1) nodes, one at each tree level. This tree basically looks like a linked list with an extra nil pointer at every node. For this tree, h = n – 1 = O(n), so searching, inserting and deleting take time O(n). This is the worst-case running time. Unfortunately, if we insert values into a binary search tree in ascending or descending order then we will end up with a tree that looks like this.
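The following sketch illustrates the best and worst cases by building two trees from the keys 1 to 15. Inserting them in ascending order yields a chain of height 14 = n – 1, while one particular insertion order (chosen for illustration) yields a complete tree with 15 nodes on 4 levels, of height 3. The height function counts edges, as in the definition earlier, and assumes the empty tree has height -1.

```pascal
{$mode objfpc}
program bstHeight;

type
  pnode = ^node;
  node = record
    key: integer;
    left, right: pnode;
  end;

procedure insert(var t: pnode; v: integer);
begin
  if t = nil then
  begin
    new(t);
    t^.key := v;
    t^.left := nil;
    t^.right := nil;
  end
  else if v < t^.key then
    insert(t^.left, v)
  else if v > t^.key then
    insert(t^.right, v);
end;

// Number of edges on the longest root-to-leaf path; the empty tree is
// given height -1 so that a single node has height 0.
function height(t: pnode): integer;
var
  l, r: integer;
begin
  if t = nil then
    exit(-1);
  l := height(t^.left);
  r := height(t^.right);
  if l > r then
    Result := l + 1
  else
    Result := r + 1;
end;

const
  // One insertion order (chosen for illustration) that fills every level.
  balancedOrder: array[0..14] of integer =
    (8, 4, 12, 2, 6, 10, 14, 1, 3, 5, 7, 9, 11, 13, 15);

var
  ascending: pnode = nil;
  balanced: pnode = nil;
  k: integer;
begin
  for k := 1 to 15 do
    insert(ascending, k);               // every node gets only a right child
  for k := 0 to high(balancedOrder) do
    insert(balanced, balancedOrder[k]);
  writeln('ascending order: height ', height(ascending));   // 14
  writeln('balanced order:  height ', height(balanced));    // 3
end.
```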
On the other hand, if we insert values into a binary search tree in random order then it is possible to show that the tree will probably end up balanced, that is, that the expected value of the tree's height will be O(log n). So in this case tree operations will run with an expected time of O(log n).
If we want a firmer guarantee, we will need to use a more advanced data structure. In a future lecture we'll look at self-balancing trees, which guarantee that operations such as searching, inserting and deleting will always run in time O(log n).