Week 7: Notes

abstract data types

An abstract data type specifies a set of operations that an object may provide. In the next few lectures of this course, we will see various abstract data types including stacks, queues, sets, and dictionaries.

We will see that for most abstract data types there are different possible implementations using various data structures. We'll also see that the big-O running time of operations will often vary between implementations.

stacks

A stack is an abstract data type that provides two operations called 'push' and 'pop'. push(x) adds a value x to a stack, and pop() removes the value that was most recently pushed and returns it. This is like a stack of sheets of paper on a desk, where sheets can be added or removed at the top.

In other words, a stack is a last in first out (LIFO) data structure: the last element that was added is the first to be removed.

Often a stack will provide an additional operation is_empty() that returns true if it contains no elements.

Before we implement a stack, let's look at how one can be used:

# initially 's' is an empty stack
s.push(4)
s.push(8)
s.push(1)
  
while not s.is_empty() do
    print(pop(s))

This code will write

1
8
4

implementing a stack with an array

An easy way to implement a stack in Python is using a list, i.e. an array:

class ArrayStack:
    def __init__(self):
        self.a = []

    def is_empty(self):
        return len(self.a) == 0
  
    def push(self, x):
        self.a.append(x)
  
    def pop(self):
        assert not self.is_empty(), 'stack is empty'
        return self.a.pop()

The implementation is straightforward. The list contains all stack elements, with the top of the stack (i.e. the most recently pushed element) at the end of the list.

With this implementation, 'push' will run in O(1) on average, since that is the running time of append(). 'pop' will always run in O(1).

linked lists

Let's now study a broadly useful data structure that we might alternatively use to implement a stack, namely a linked list. A linked list looks like this:

An element of a linked list is called a node. A node contains one or more values, plus a pointer to the next node in the list. The first node of a linked list is called its head. The last node of a linked list is its tail. The tail always points to None (in Python, or its equivalent such as nil in other languages).

By the way, we will sometimes illustrate a linked list more compactly:

2 → 4 → 7 → None

The two pictures above denote the same structure; the first is simply more detailed.

Note that a Python list is not a linked list! A Python list is an array. :)

Here is a node type for a linked list in Python:

class Node:
    def __init__(self, val, next):
        self.val = val
        self.next = next

We can build the 3-element linked list pictured above as follows:

>>> r = Node(7, None)
>>> q = Node(4, r)
>>> p = Node(2, q)

Now p refers to the head of the list, and q and r refer to successive elements:

Through p, we can get to the values in the list:

>>> p.val
2
>>> p.next.val
4
>>> p.next.next.val
7

We can traverse a linked list using a loop that moves to the next list node on each iteration. Here's a function that takes a pointer to the head of a linked list, and returns its length:

def list_len(head):
    n = head
    count = 0
    while n != None:
        n = n.next      # move to the next node
        count += 1
    return count

By modifying the function only slightly, we can write a function that computes the sum of all values in a linked list:

def list_sum(head):
    n = head
    total = 0
    while n != None:
        total += n.val
        n = n.next      # move to the next node
    return total

Let's now write a function that takes an integer n and constructs a linked list containing the values from 1 to n. One easy way to do that is to build the list in reverse order, just like when we built a list above with the values 2, 4, and 7. On each step, we will prepend a node to the existing list.

def list_1_to_n():
    head = None
    for i in range(n, 0, -1):       # n, n - 1, ... 1
        p = Node(i, head)
        head = p
    return head

As this function runs, the local variable 'head' points to the beginning of the list that we've built so far. On each loop iteration, the function allocates a new node by calling the Node constructor function. As it does so, it makes the new node point to the existing list, by passing 'head' as the value for the 'next' attribute. After that, it modifies 'head' to point to the newly allocated node.

We can write many more functions that operate on linked lists, performing operations such as inserting nodes, deleting nodes, and so on. To get practice with this, we will solve various linked list exercises in our tutorials.

implementing a stack using a linked list

We can implement a stack using a linked list. Our class LinkedStack will have an attribute 'head' that points to the head of the list:

The stack operations are fairly straightforward:

class LinkedStack:
    def __init__(self):
        self.head = None
    
    def is_empty(self):
        return self.head == None
  
    def push(self, x):
        n = Node(x, self.head)

        self.head = n
  
    def pop(self):
        assert self.head != None, 'stack is empty'
        x = self.head.val
        self.head = self.head.next
        return x

Notice that the push() method allocates a node n that points to the previous head, then records n as the new list head. In other words, it prepends a new node to the list. This is similar to the activity in the list_1_to_n() function above.

If we implement a stack using an array (i.e. a Python list), the push operation will take O(1) on average but O(N) in the worst case, where N is the current stack size. Our linked list-based implementation performs differently: push always runs in O(1) (assuming that object allocations run in constant time).

queues

Queues are another important abstract data type. A queue provides two operations called 'enqueue' and 'dequeue'. enqueue(x) adds an element to the tail of a queue, and dequeue() removes the element at the head and returns it. A queue is something like people waiting in a line: you must join at the back of the line, and the person at the front of the line is served next.

In other words, queues are a first in first out (FIFO) data structure: the first value added to a queue will be the first one to be removed.

A queue can be used like this:

# initially 'q' is an empty queue

q.enqueue(4)
q.enqueue(77)
q.enqueue(12)

print(q.dequeue())  # writes 4

print(q.dequeue())  # writes 77

implementing a queue with an array

A naive implementation of a queue in Python will store elements in an array, i.e. a Python list. It might look like this:

class ArrayQueue:
    def __init__(self):
        self.a = []
    
    def is_empty(self):
        return len(self.a) == 0
    
    def enqueue(self, x):
        self.a.append(x)

    def dequeue(self):
        return self.a.pop(0)

With this implementation, is_empty() will run in O(1), and enqueue() will run in O(1) on average. However, dequeue() will run in O(N), because all array elements must shift leftward by one position when the first element is deleted. So this is a poor choice of implementation if N might be large.

implementing a queue using a linked list

We can implement a queue more efficiently using a linked list. To do so, we will keep pointers to both the head (first node) and tail (last node) in the list, which will allow us to enqueue or dequeue elements in O(1):

Here is an implementation:

class LinkedQueue:
    def __init__(self):
        self.head = self.tail = None
  
    def isEmpty(self):
        return self.head == None
  
    def enqueue(self, x):
        n = Node(x, None)
        if self.tail == None:
            self.head = self.tail = n
        else:
            self.tail.next = n
            self.tail = n
      
    def dequeue(self):
        assert self.head != None, 'queue is empty'
        x = self.head.val
        self.head = self.head.next
        if self.head == None:
            self.tail = None
        return x

sets

We have already seen stacks and queues, two abstract data types. We've seen that it's possible to implement these abstract types using various concrete data structures. For example, we can implement a stack or queue either using an array or a linked list.

Let's now introduce another abstract data type. A set provides the following operations:

s.add(value): Add a value to a set.
s.remove(value): Remove a value from a set.
s.contains(value): Test whether a value is present in a set, returning True or False.

A set cannot contain the same value twice: every value is either present in the set or it is not.

This type should be familiar, because it has a default implementation in Python that we have seen and used.

In this algorithms course, we'll study various ways to implement sets. We'd like to understand how we could build efficient sets in Python if they were not already provided in the standard library.

introduction to binary trees

A binary tree consists of a set of nodes. Each node contains a single value and may have 0, 1, or 2 children.

Here is a picture of a binary tree of integers. (Note that this is not a binary search tree, which is a special kind of binary tree that we will discuss later.)

tree

In this tree, node 10 is the root node. 14 is the parent of 12 and 6. 12 is the left child of 14, and 6 is the right child of 14. 14 is an ancestor of 22, and 22 is a descendant of 14.

A node may have 0, 1, or 2 children. In this tree, node 15 has a right child but no left child.

The subtree rooted at 14 is the left subtree of node 10. The subtree rooted at 1 is the right subtree of node 10.

The nodes 12, 5, 22, 4, and 3 are leaves: they have no children. Nodes 10, 14, 1, 6, and 15 are internal nodes, which are nodes that have at least one child.

The depth of a node is its distance from the root. The root has depth 0. In this tree, node 15 has depth 2 and node 4 has depth 3. The height of a tree is the greatest depth of any node. This tree has height 3.

The tree with no nodes is called the empty tree.

Note that a binary tree may be asymmetric: the right side might not look at all like the left. In fact a binary tree can have any structure at all, as long as each node has 0, 1, or 2 children.

A binary tree is complete iff every internal node has 2 children and all leaves have the same height. Here is a complete binary tree of height 3:

tree

A complete binary tree of height h has 2^h leaves, and has 2⁰ + 2¹ + … + 2^h-1 = 2^h – 1 interior nodes. So it has a total of 2^h+ 2^h– 1 = 2^h + 1– 1 nodes. In this tree there are 2³ = 8 leaves and 2³ – 1 = 7 interior nodes, a total of 2⁴ – 1 = 15 nodes.

Conversely if a complete binary tree has N nodes, then N = 2^{h + 1} – 1, where h is the height of the tree. And so h = log₂(N + 1) – 1 ≈ log₂(N) – 1 = O(log N).

binary trees in Python

We can represent a binary tree in Python using an object for each node, similarly to how we represent linked lists. Here is a Node class we can use for binary trees:

class Node:
    def __init__(self, val, left, right):
        self.val = val
        self.left = left      # left child, or None if absent
        self.right = right    # right child, or None if absent

We will generally refer to a tree using a pointer to its root node. We'll use None to represent the empty tree, just as we used None for the empty linked list. In all leaf nodes, left and right will be None.

Here is a small binary tree with just 3 nodes:

tree

We can build this tree in Python as follows:

l = Node(7, None, None)
r = Node(5, None, None)
root = Node(4, l, r)

Here's a recursive function that counts the number of nodes in a binary tree:

def size(n):
    if n == None:   # empty tree
        return 0

    return 1 + size(n.left) + size(n.right)

Let's try it on the tree we just built:

>>> size(root)
3

Recursion is a natural fit for trees, since the pattern of recursive calls in a function like this one can mirror the tree structure.

Here's a recursive function that computes the height of a binary tree:

def height(node):
    if node == None:
        return -1
    
    return max(height(node.left), height(node.right)) + 1

Let's try it:

>>> height(root)
1

The constant -1 in this function may seem surprising. Recall that the height of a binary tree is defined as the greatest depth of any node. If a binary tree has only a single node, then the depth of that node is 0, so the tree's height is 0. And so if a tree is empty, meaning that it has no nodes at all, then for consistency it must have a height of one less than that, or -1.

As another example, let's write a function that generates a complete binary tree of a given height h. Each node of the tree will contain a random value from 0 .. 99. Once more we will use recursion:

def complete(h):
    if h == -1:
        return None
    
    left = complete(h - 1)    # generate left subtree
    right = complete(h - 1)   # generate right subtree
    return Node(randrange(100), left, right)

Now we can easily generate a large tree:

>>> t = complete(10)
>>> size(t)
2047
>>> height(t)
10

binary search trees

A binary search tree is a binary tree in which the values are ordered in a particular way that makes searching easy. We began to discuss binary search trees in our lecture this week, but did not have time to cover them completely. We'll return to this topic in next week's lecture.