Some of this week's topics are covered in Problem Solving with Algorithms:
Here are some additional notes on these topics:
Starting from some node N, a breadth-first search first visits nodes adjacent to N, i.e. nodes of distance 1 from N. It then visits nodes of distance 2, and so on. In this way it can determine the shortest distance from N to every other node in the graph.
We can implement a breadth-first search using a queue. Just like with depth-first graph search, we must remember all nodes that we have visited to avoid walking in circles. We begin by adding the start node to the queue and marking it as visited. In a loop, we repeatedly remove nodes from the queue. Each time we remove a node, we mark all of its adjacent unvisited nodes as visited and add them to the queue. The algorithm terminates once the queue is empty, at which point we will have visited all reachable nodes.
The queue represents the frontier. When we remove a node from the queue, it moves to the explored set. Just like with depth-first graph search, the visited nodes are the frontier nodes and the nodes in the explored set.
As the algorithm runs, all nodes in the queue are at approximately the same distance from the start node. To be more precise, at every moment in time there is some value d such that all nodes in the queue are at distance d or (d + 1) from the start node.
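This invariant is easy to check empirically. Here is a sketch on a small hypothetical graph that asserts, on every iteration, that the distances of the queued nodes span at most two consecutive values:

```python
from collections import deque

# A small directed graph (illustrative, not from the notes).
g = {0: [1, 2], 1: [3], 2: [3, 4], 3: [5], 4: [5], 5: []}

dist = {0: 0}          # distance of each discovered node from the start
q = deque([0])
while q:
    ds = {dist[v] for v in q}
    assert max(ds) - min(ds) <= 1   # invariant: only d and d + 1 present
    v = q.pop()                     # dequeue
    for w in g[v]:
        if w not in dist:
            dist[w] = dist[v] + 1
            q.appendleft(w)         # enqueue
print(dist)   # shortest distances from node 0
```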
Let's revisit the Europe graph that we saw in the last lecture. Here is a breadth-first search in progress, starting from Austria:
To implement a breadth-first search in Python, we need a queue data structure. We can conveniently use the deque (double-ended queue) class found in the collections module. Since a deque is double-ended, we can use it as a queue in either of two ways: we can either

call d.appendleft() to enqueue and d.pop() to dequeue, or
call d.append() to enqueue and d.popleft() to dequeue.

All four of these operations run in O(1) time, so the two approaches are equally efficient, and we can choose one arbitrarily.
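A short sketch showing that both conventions behave identically as a FIFO queue:

```python
from collections import deque

# Convention 1: enqueue on the left, dequeue on the right.
q = deque()
q.appendleft('a')
q.appendleft('b')
first, second = q.pop(), q.pop()
print(first, second)     # a b

# Convention 2: enqueue on the right, dequeue on the left.
q = deque()
q.append('a')
q.append('b')
third, fourth = q.popleft(), q.popleft()
print(third, fourth)     # a b
```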
Here is a Python function bfs() that performs a breadth-first search. It takes a graph in adjacency-list representation, plus a start vertex:
```python
from collections import deque

# breadth-first search
def bfs(g, start):
    q = deque()
    q.appendleft(start)        # enqueue
    visited = { start }
    while q:
        node = q.pop()         # dequeue
        print('exploring ' + node)
        for n in g[node]:
            if n not in visited:
                visited.add(n)
                q.appendleft(n)   # enqueue
```
Note that we must mark nodes as visited when we add them to the queue, not when we remove them. (If we marked them as visited only when removing them, then our algorithm could add the same node to the queue more than once.)
Like a depth-first search, a breadth-first search does a constant amount of work for each vertex and edge, so it also runs in time O(V + E).
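As a quick check, here is a variant of the function above (the name bfs_order and the sample graph are illustrative) that returns the visit order instead of printing it:

```python
from collections import deque

# breadth-first search returning the order in which nodes are explored
def bfs_order(g, start):
    order = []
    q = deque()
    q.appendleft(start)        # enqueue
    visited = {start}
    while q:
        node = q.pop()         # dequeue
        order.append(node)
        for n in g[node]:
            if n not in visited:
                visited.add(n)
                q.appendleft(n)   # enqueue
    return order

g = {'a': ['b', 'c'], 'b': ['d'], 'c': ['d'], 'd': []}
print(bfs_order(g, 'a'))   # ['a', 'b', 'c', 'd']
```

Notice that both of a's neighbors are explored before d, which is at distance 2.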
Suppose that we replace the queue in our preceding breadth-first search function with a stack. The function will now perform a depth-first search!
```python
# iterative depth-first search
def dfs_iter(g, start):
    q = []
    q.append(start)            # push
    visited = { start }
    while q:
        node = q.pop()
        print('exploring ' + node)
        for n in g[node]:
            if n not in visited:
                visited.add(n)
                q.append(n)    # push
```
Specifically, this is a non-recursive depth-first search, or a depth-first search with an explicit stack.
This shows that there is a close relationship between stacks and depth-first search. Specifically, a stack is a LIFO (last in first out) data structure. And when we perform a depth-first search, the last frontier node we discover is the first that we will expand by following its edges. Similarly, a queue is a FIFO (first in first out) data structure, and in a breadth-first search the first frontier node we discover is the first that we will expand.
It is sometimes wise to implement a depth-first search non-recursively in a situation where the search may be very deep. This avoids running out of call stack space, whose size is fixed on most operating systems.
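To see the difference, here is a sketch (a variant of the iterative search above that returns the visited set) run on a path graph far deeper than Python's default recursion limit of about 1000 frames; a recursive search on this graph would raise RecursionError:

```python
# A path graph 0 -> 1 -> ... -> 99,999: every search path is 100,000
# vertices deep, far beyond the default recursion limit.
n = 100_000
g = {i: [] for i in range(n)}
for i in range(n - 1):
    g[i].append(i + 1)

# iterative depth-first search with an explicit stack; no recursion,
# so depth is limited only by available heap memory
def dfs_visited(g, start):
    stack = [start]
    visited = {start}
    while stack:
        node = stack.pop()
        for nb in g[node]:
            if nb not in visited:
                visited.add(nb)
                stack.append(nb)   # push
    return visited

print(len(dfs_visited(g, 0)))   # 100000: every vertex was reached
```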
Let's write a function that takes a graph and two vertex ids and returns an integer representing the length of the shortest path between the vertices. To do so, we will modify our breadth-first search implementation from above. Each queue element will now contain a pair (v, d), where v is a vertex ID and d is the length of the shortest path from the start vertex to v.
```python
# breadth-first search with distances
def bfs_dist(g, start, end):
    if start == end:
        return 0
    q = deque()
    q.appendleft((start, 0))   # enqueue
    visited = { start }
    while q:
        v, dist = q.pop()      # dequeue
        for w in g[v]:
            if w == end:
                return dist + 1
            if w not in visited:
                visited.add(w)
                q.appendleft((w, dist + 1))   # enqueue
    return -1   # no path found
```
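Here is the function applied to a small undirected graph (each edge appears in both adjacency lists); the definition is repeated so that the example is self-contained:

```python
from collections import deque

# breadth-first search with distances (as above)
def bfs_dist(g, start, end):
    if start == end:
        return 0
    q = deque()
    q.appendleft((start, 0))
    visited = { start }
    while q:
        v, dist = q.pop()
        for w in g[v]:
            if w == end:
                return dist + 1
            if w not in visited:
                visited.add(w)
                q.appendleft((w, dist + 1))
    return -1   # no path found

# a square a-b-d-c-a with a pendant vertex e attached to d
g = {
    'a': ['b', 'c'],
    'b': ['a', 'd'],
    'c': ['a', 'd'],
    'd': ['b', 'c', 'e'],
    'e': ['d'],
}
print(bfs_dist(g, 'a', 'e'))   # 3: e.g. a -> b -> d -> e
print(bfs_dist(g, 'a', 'z'))   # -1: no vertex 'z' is reachable
```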
Just like depth-first search, breadth-first search traverses a tree which spans the original graph. A breadth-first tree indicates a shortest path from the start node to every other node in the graph. (Note, however, that the shortest path between two graph nodes is not necessarily unique.) Here is a breadth-first tree for the Europe graph:
We will sometimes store a breadth-first tree as a directed graph in memory with the arrows pointing in the other direction, i.e. toward the start node. This is sometimes called an in-tree. In this representation each node points toward its predecessor in the breadth-first tree. Here is the previous breadth-first tree as an in-tree:
Let's modify the function in the previous section so that it prints the actual shortest path between a given pair of vertices. To do this, we need to store the breadth-first tree that is generated by our breadth-first search. We will store an in-tree as depicted above.
We will write a new class Node, representing a node of the breadth-first tree. Each Node will contain
a vertex ID
a depth, i.e. the length of the shortest path from the start vertex to this vertex
a reference to a parent Node
The queue will now hold Node objects.
Here is our implementation:
```python
class Node:
    def __init__(self, id, depth, parent):
        self.id = id
        self.depth = depth
        self.parent = parent

# breadth-first search finding shortest path
def _bfs_shortest(g, from_id, to_id):
    node = Node(from_id, 0, None)
    if from_id == to_id:
        return node
    q = deque()
    q.appendleft(node)         # enqueue
    visited = { from_id }
    while q:
        node = q.pop()         # dequeue
        print(f'{node.id} has distance {node.depth}')
        for id in g[node.id]:
            if id not in visited:
                visited.add(id)
                new = Node(id, node.depth + 1, node)
                if id == to_id:   # found destination
                    return new
                q.appendleft(new)   # enqueue
    return None

def bfs_shortest(g, from_id, to_id):
    n = _bfs_shortest(g, from_id, to_id)
    if n is None:
        print('no path')
        return
    l = []
    while n is not None:
        l.append(n.id)
        n = n.parent
    print(' -> '.join(l[::-1]))
```
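As an aside, the in-tree can also be stored as a plain dictionary mapping each vertex to its predecessor, avoiding the Node class entirely. Here is a sketch of that alternative; the name shortest_path and the sample graph are illustrative, not from the notes:

```python
from collections import deque

# Breadth-first search storing the in-tree as a dictionary:
# prev[v] is the vertex before v on a shortest path from start.
def shortest_path(g, start, end):
    prev = {start: None}
    q = deque([start])
    while q:
        v = q.pop()            # dequeue
        if v == end:
            break
        for w in g[v]:
            if w not in prev:
                prev[w] = v
                q.appendleft(w)   # enqueue
    if end not in prev:
        return None            # no path found
    path = []                  # walk the in-tree back to the start
    v = end
    while v is not None:
        path.append(v)
        v = prev[v]
    return path[::-1]

g = {'a': ['b', 'c'], 'b': ['a', 'd'], 'c': ['a', 'd'], 'd': ['b', 'c'], 'e': []}
print(shortest_path(g, 'a', 'd'))   # ['a', 'b', 'd']
print(shortest_path(g, 'a', 'e'))   # None
```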
We can use our graph search algorithms to solve many problems that have a graph-like structure, even if they are not explicitly problems about graphs.
For example, let's write a function to determine whether a maze is solvable. Specifically, we will write a function that takes a list of lists of booleans representing a rectangular maze, where walls are represented by array elements that are True. The function will return a boolean indicating whether there is any path from the upper-left corner (0, 0) to the lower-right corner of the maze. We will assume that in each step we may move up, down, left, or right, but not diagonally.
We could solve this problem using either a depth-first or breadth-first search. We will use a depth-first search, implemented recursively.
We could convert the input array to a graph in adjacency-list representation, where each square in the maze is a separate graph vertex. But that is unnecessary and would be wastefully inefficient. Instead, we can implement a depth-first search directly on the maze itself. At each step, instead of iterating over an adjacency list we will iterate over the neighbors of the current maze square, i.e. the four squares that are above, below, to the left of, and to the right of the square. Notice how the constant array Dirs lets us easily loop over the four compass directions.
```python
# We are given a list of lists of booleans representing a rectangular
# maze, where True represents a wall. Return True if there is any
# path from the upper-left to the lower-right corner.

Dirs = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def solvable(maze):
    width = len(maze)
    height = len(maze[0])
    visited = [[False] * height for _ in range(width)]

    def visit(x, y):
        if (0 <= x < width and 0 <= y < height
                and not maze[x][y] and not visited[x][y]):
            visited[x][y] = True
            if x == width - 1 and y == height - 1:
                return True   # found path
            for dx, dy in Dirs:
                if visit(x + dx, y + dy):
                    return True
        return False

    return visit(0, 0)
```
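To try the maze solver out, here it is applied to two small hand-made mazes (True = wall); the definition is repeated so the example is self-contained:

```python
# Maze solver (as above): recursive depth-first search on the grid.
Dirs = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def solvable(maze):
    width = len(maze)
    height = len(maze[0])
    visited = [[False] * height for _ in range(width)]

    def visit(x, y):
        if (0 <= x < width and 0 <= y < height
                and not maze[x][y] and not visited[x][y]):
            visited[x][y] = True
            if x == width - 1 and y == height - 1:
                return True   # found path
            for dx, dy in Dirs:
                if visit(x + dx, y + dy):
                    return True
        return False

    return visit(0, 0)

# a 3 x 3 maze with a clear path through the middle column
open_maze = [
    [False, False, True],
    [True,  False, True],
    [True,  False, False],
]
# a 2 x 2 maze whose lower-right corner is walled off
blocked = [
    [False, True],
    [True,  False],
]
print(solvable(open_maze))   # True
print(solvable(blocked))     # False
```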