Some of this week's topics are covered in Problem Solving with Algorithms:
And in Introduction to Algorithms:
12. Binary Search Trees
12.1 What is a binary search tree?
12.2 Querying a binary search tree
12.3 Insertion and deletion
Here are some additional notes.
We have studied both recursion and linked lists. Now we can combine these topics and write recursive functions on linked lists. (Actually recursion is a natural fit for linked lists, since linked lists are themselves recursive: the next pointer in each node points to another linked list.)
To begin, here is our linked list Node class again:
class Node:
    def __init__(self, val, next):
        self.val = val
        self.next = next
Here is a recursive function that computes the sum of all values in a linked list:
def listSum(n):
    if n == None:
        return 0
    return n.val + listSum(n.next)
We could write similar recursive functions to count the nodes in a linked list, find the greatest value in a linked list, and so on.
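For instance, here are sketches of two such functions. (The names listCount and listMax are ours, and the Node class is repeated so the example stands alone.)

```python
class Node:
    def __init__(self, val, next):
        self.val = val
        self.next = next

def listCount(n):
    # number of nodes: 0 for the empty list, else 1 plus the count of the rest
    if n == None:
        return 0
    return 1 + listCount(n.next)

def listMax(n):
    # greatest value in a non-empty list
    if n.next == None:
        return n.val
    return max(n.val, listMax(n.next))

n = Node(3, Node(1, Node(4, None)))   # the list 3 -> 1 -> 4
```

Both functions follow the same pattern as listSum: handle the base case, then recurse on n.next.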
Is it better to write linked list functions iteratively or recursively? Recursion does bring one danger: the call stack that Python uses to keep track of recursive calls only has a fixed size, and too many nested levels of recursion can lead to a stack overflow, which will terminate the program. Here is a simple experiment to determine how deeply we can recurse:
def recurse(n):
    print(n)
    recurse(n + 1)

>>> recurse(1)
1
2
3
…
995
Traceback (most recent call last):
  …
RecursionError: maximum recursion depth exceeded while calling a Python object
So (in this function at least) we can only recurse to about 1000 levels. That means that our recursive function listSum above can only handle lists with fewer than about 1000 nodes. That's a significant limitation.
Some compilers can automatically transform recursive code into iterative code that cannot overflow the call stack. But the default Python implementation isn't like that. Given that, it's probably best to write linked list code iteratively unless you are sure that the lists in your program will be short.
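For example, here is an iterative version of listSum that works on lists of any length (the Node class is repeated so the example stands alone):

```python
class Node:
    def __init__(self, val, next):
        self.val = val
        self.next = next

def listSum(n):
    # iterative version: walks the list with a loop, so the recursion
    # limit does not apply
    total = 0
    while n != None:
        total += n.val
        n = n.next
    return total

# build a list of 100,000 nodes, far beyond the recursion limit
head = None
for i in range(100000):
    head = Node(1, head)
```

Calling listSum(head) here succeeds, whereas the recursive version would overflow the call stack.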
We have already seen stacks and queues, two abstract data types. We've seen that it's possible to implement these abstract types using various concrete data structures. For example, we can implement a stack or queue either using an array or a linked list.
We'll now introduce two more abstract data types. The first is a set, which provides the following operations:

add(x) – add the value x to the set, if it is not already present
remove(x) – remove the value x from the set
contains(x) – report whether the value x is in the set
A set cannot contain the same value twice: every value is either present in the set or it is not.
The second is a map (or dictionary), which maps keys to values. It provides these operations:
A map cannot contain the same key twice. In other words, a map associates a key with exactly one value.
Both of these types will be quite familiar, because they have default implementations in Python that we have seen and used often.
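For instance, Python's built-in set and dict types provide exactly these operations:

```python
s = set()
s.add(3)
s.add(3)       # adding a duplicate has no effect: the set still has one element
s.remove(3)
s.add(7)

m = {}
m["one"] = 1   # associate the key "one" with the value 1
m["one"] = 2   # re-associating the key replaces the old value
```

After these statements, s contains only 7, and m maps "one" to 2.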
In this algorithms course, we will study various ways to implement sets and maps. We'd like to understand how we could build efficient sets and maps in Python if they were not already provided in the standard library.
A binary tree consists of a set of nodes. Each node contains a single value and may have 0, 1, or 2 children.
Here is a picture of a binary tree of integers. (Note that this is not a binary search tree, which is a special kind of binary tree that we will discuss later.)
In this tree, node 10 is the root node. 14 is the parent of 12 and 6. 12 is the left child of 14, and 6 is the right child of 14. 14 is an ancestor of 22, and 22 is a descendant of 14.
A node may have 0, 1, or 2 children. In this tree, node 15 has a right child but no left child.
The subtree rooted at 14 is the left subtree of node 10. The subtree rooted at 1 is the right subtree of node 10.
The nodes 12, 5, 22, 4, and 3 are leaves: they have no children. Nodes 10, 14, 1, 6, and 15 are internal nodes, which are nodes that have at least one child.
The depth of a node is its distance from the root. The root has depth 0. In this tree, node 15 has depth 2 and node 4 has depth 3. The height of a tree is the greatest depth of any node. This tree has height 3.
The tree with no nodes is called the empty tree.
Note that a binary tree may be asymmetric: the right side might not look at all like the left. In fact a binary tree can have any structure at all, as long as each node has 0, 1, or 2 children.
A binary tree is complete iff every internal node has 2 children and all leaves have the same depth. Here is a complete binary tree of height 3:
A complete binary tree of height h has 2^h leaves, and has 2^0 + 2^1 + … + 2^(h-1) = 2^h – 1 interior nodes. So it has a total of 2^h + 2^h – 1 = 2^(h+1) – 1 nodes. In this tree there are 2^3 = 8 leaves and 2^3 – 1 = 7 interior nodes, a total of 2^4 – 1 = 15 nodes.
Conversely, if a complete binary tree has N nodes, then N = 2^(h+1) – 1, where h is the height of the tree. And so h = log2(N + 1) – 1 ≈ log2(N) – 1 = O(log N).
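We can check the node-count formula with a short script that builds a complete tree of a given height and counts its nodes. (The helper names complete and count are ours, and the binary-tree Node class used below is defined here so the example stands alone.)

```python
class Node:
    def __init__(self, val, left, right):
        self.val = val
        self.left = left
        self.right = right

def complete(h):
    # build a complete binary tree of height h; a single node has height 0
    if h == 0:
        return Node(0, None, None)
    return Node(0, complete(h - 1), complete(h - 1))

def count(node):
    # total number of nodes in the tree
    if node == None:
        return 0
    return 1 + count(node.left) + count(node.right)

# a complete tree of height 3 should have 2^4 - 1 = 15 nodes
```

For any height h, count(complete(h)) equals 2^(h+1) – 1, as the formula predicts.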
We can represent a binary tree in Python using node objects, similarly to how we represent linked lists. Here is a node type for a binary tree of integers:
class Node:
    def __init__(self, val, left, right):
        self.val = val
        self.left = left    # left child, or None if absent
        self.right = right  # right child, or None if absent
We will generally refer to a tree using a reference to its root. We use None to represent the empty tree, just as we used None for the empty linked list. In all leaf nodes, left and right will be None.
Here is a small binary tree with just 3 nodes:
We can build this in Python as follows:
q = Node(7, None, None)
r = Node(5, None, None)
p = Node(4, q, r)
To build larger trees, we will write functions that use loops or recursion.
Here is a function that computes the sum of all values in a binary tree:
def treeSum(node):
    if node == None:
        return 0
    return node.val + treeSum(node.left) + treeSum(node.right)
It is much easier to write this function recursively than iteratively. Recursion is a natural fit for trees, since the pattern of recursive calls in a function like this one can mirror the tree structure.
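As another example in the same style, here is a sketch of a function (we call it treeHeight; the name is ours) that computes the height of a tree, matching the definition of height given earlier. The Node class is repeated so the example stands alone.

```python
class Node:
    def __init__(self, val, left, right):
        self.val = val
        self.left = left
        self.right = right

def treeHeight(node):
    # height = greatest depth of any node; we return -1 for the empty tree
    # so that a tree with a single node has height 0
    if node == None:
        return -1
    return 1 + max(treeHeight(node.left), treeHeight(node.right))

# the 3-node tree from earlier: 4 at the root, with children 7 and 5
p = Node(4, Node(7, None, None), Node(5, None, None))
```

The recursive case mirrors the structure of the tree: the height of a node is one more than the height of its taller subtree.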
A binary search tree is a binary tree in which the values are ordered in a particular way that makes searching easy: for any node N with value v,
all values in N's left subtree are less than v
all values in N's right subtree are greater than v
Here is a binary search tree of integers:
We can use a binary search tree to store a set supporting the add, remove, and contains operations that we described above. To do this, we'll write a TreeSet class that holds the current root of a binary tree:
class TreeSet:
    def __init__(self):
        self.root = None
It is not difficult to find whether a binary tree contains a given value k. We begin at the root. If the root's value is k, then we are done. Otherwise, we compare k to the root's value v. If k < v, we move to the left child; if k > v, we move to the right child. We proceed in this way until we have found k or until we hit None, in which case k is not in the tree. Here's how we can implement this in the TreeSet class:
def contains(self, x):
    n = self.root
    while n != None:
        if x == n.val:
            return True
        if x < n.val:
            n = n.left
        else:
            n = n.right
    return False
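As an aside, the search can also be written recursively, mirroring the recursive structure of the tree. Here is a sketch as a standalone function on nodes (the name treeContains is ours), with the Node class repeated so the example stands alone:

```python
class Node:
    def __init__(self, val, left, right):
        self.val = val
        self.left = left
        self.right = right

def treeContains(node, x):
    # search for x in the subtree rooted at node
    if node == None:
        return False      # we fell off the tree: x is not present
    if x == node.val:
        return True
    if x < node.val:
        return treeContains(node.left, x)
    return treeContains(node.right, x)

# a small binary search tree: 10 at the root, 5 on the left,
# 20 on the right with left child 15
root = Node(10, Node(5, None, None), Node(20, Node(15, None, None), None))
```

Note that the recursion depth equals the depth of the value found, so on an unbalanced tree this version is subject to the recursion limit discussed earlier, while the iterative version is not.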
Inserting a value into a binary search tree is also pretty straightforward. Beginning at the root, we look for an insertion position, proceeding down the tree just as in the above algorithm for contains. When we reach an empty left or right child, we create a node there. In the TreeSet class:
# add a value, or do nothing if already present
def add(self, x):
    n = self.root
    if not n:
        self.root = Node(x, None, None)
        return
    while n.val != x:
        if x < n.val:
            if n.left:
                n = n.left
            else:
                n.left = Node(x, None, None)
                break
        elif x > n.val:
            if n.right:
                n = n.right
            else:
                n.right = Node(x, None, None)
                break
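Putting these pieces together, here is how the class might be used (the Node and TreeSet definitions are repeated so the example stands alone; the inserted values are arbitrary):

```python
class Node:
    def __init__(self, val, left, right):
        self.val = val
        self.left = left
        self.right = right

class TreeSet:
    def __init__(self):
        self.root = None

    def contains(self, x):
        n = self.root
        while n != None:
            if x == n.val:
                return True
            if x < n.val:
                n = n.left
            else:
                n = n.right
        return False

    # add a value, or do nothing if already present
    def add(self, x):
        n = self.root
        if not n:
            self.root = Node(x, None, None)
            return
        while n.val != x:
            if x < n.val:
                if n.left:
                    n = n.left
                else:
                    n.left = Node(x, None, None)
                    break
            else:
                if n.right:
                    n = n.right
                else:
                    n.right = Node(x, None, None)
                    break

s = TreeSet()
for v in [10, 14, 1, 12, 20]:
    s.add(v)
```

Adding a value that is already present leaves the set unchanged, since the loop exits as soon as it finds a node with that value.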
Deleting a value from a binary search tree is a bit trickier. It's not hard to find the node to delete: we just walk down the tree, just like when searching or inserting. Once we've found the node N we want to delete, there are several cases.
(a) If N is a leaf (it has no children), we can just remove it from the tree.

(b) If N has only a single child, we replace N with its child. For example, we can delete node 15 in the binary tree above by replacing it with 18.

(c) If N has two children, then we will replace its value by the next highest value in the tree. To do this, we start at N's right child and follow left child pointers for as long as we can. This will take us to the smallest node in N's right subtree, which must be the next highest node in the tree after N. Call this node M. We can easily remove M from the right subtree: M has no left child, so we can remove it following either case (a) or (b) above. Now we set N's value to the value that M had.
As a concrete example, suppose that we want to delete the root node (with value 10) in the tree above. This node has two children. We start at its right child (20) and follow its left child pointer to 15. That's as far as we can go in following left child pointers, since 15 has no left child. So now we remove 15 (following case (b) above), and then replace 10 with 15 at the root.
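The successor-finding step from the walkthrough above can be sketched as a small helper (the name minNode is ours; the rest of deletion is left as the exercise below). The Node class is repeated so the example stands alone.

```python
class Node:
    def __init__(self, val, left, right):
        self.val = val
        self.left = left
        self.right = right

def minNode(node):
    # smallest node in a non-empty subtree: follow left child
    # pointers for as long as we can
    while node.left != None:
        node = node.left
    return node

# mirror the example above: a right subtree rooted at 20 whose left child is 15
right = Node(20, Node(15, None, None), None)
```

Here minNode(right) is the node with value 15, the successor of the root in the example.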
We won't give an implementation of this operation here, but writing this yourself is an excellent (and somewhat challenging) exercise.
It is not difficult to see that the add, remove and contains operations described above will all run in time O(h), where h is the height of a binary search tree. What is their running time as a function of N, the number of nodes in the tree?
First consider a complete binary search tree. As we saw above, if the tree has N nodes then its height is h = log2(N + 1) – 1 ≈ log2(N) – 1 = O(log N). So add, remove, and contains will all run in time O(log N).
Even if a tree is not complete, these operations will run in O(log N) time if the tree is not too tall given its number of nodes N – specifically, if its height is O(log N). We call such a tree balanced.
Unfortunately not all binary trees are balanced. Suppose that we insert values into a binary search tree in ascending order:
t = TreeSet()
for i in range(1, 1001):
    t.add(i)
The tree will look like this:
This tree is completely unbalanced. It basically looks like a linked list with an extra None pointer at every node. add, remove and contains will all run in O(N) time on this tree.
How can we avoid an unbalanced tree such as this one? There are two possible ways. First, if we insert values into a binary search tree in a random order, then the tree will almost certainly be balanced. We will not prove this fact here (you might see a proof in the Algorithms and Data Structures class next semester).
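We can at least observe this empirically with a quick (non-rigorous) experiment: insert 1000 values in random order and measure the resulting height. The definitions are repeated so the example stands alone, and the height function here is written iteratively with an explicit stack, since a recursive height function would itself overflow the call stack on the completely unbalanced tree.

```python
import random

class Node:
    def __init__(self, val, left, right):
        self.val = val
        self.left = left
        self.right = right

class TreeSet:
    def __init__(self):
        self.root = None

    # add a value, or do nothing if already present
    def add(self, x):
        n = self.root
        if not n:
            self.root = Node(x, None, None)
            return
        while n.val != x:
            if x < n.val:
                if n.left:
                    n = n.left
                else:
                    n.left = Node(x, None, None)
                    break
            else:
                if n.right:
                    n = n.right
                else:
                    n.right = Node(x, None, None)
                    break

def height(root):
    # iterative height computation: an explicit stack of (node, depth)
    # pairs avoids Python's recursion limit on tall trees
    best = -1
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        if node == None:
            continue
        best = max(best, depth)
        stack.append((node.left, depth + 1))
        stack.append((node.right, depth + 1))
    return best

# insert 1000 values in random order
vals = list(range(1, 1001))
random.shuffle(vals)
t = TreeSet()
for v in vals:
    t.add(v)

# for comparison: 200 values inserted in ascending order give height 199
u = TreeSet()
for i in range(1, 201):
    u.add(i)
```

With random insertion the height is almost always in the low twenties, close to the O(log N) ideal, while ascending insertion produces the worst possible height of N – 1.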
Unfortunately it is not always practical to insert in a random order – for example, we may be reading a stream of values from a network and may need to insert each value as we receive it. So alternatively we can use a more advanced data structure known as a self-balancing binary tree, which automatically balances itself as values are inserted. Two examples of such structures are red‑black trees and AVL trees. We will not study these in this course, but you will see them in Algorithms and Data Structures next semester. For now, you should just be aware that they exist.