Some of this week's topics are covered in Problem Solving with Algorithms:
Here are some additional notes on these topics:
Starting from some node N, a breadth-first search first visits nodes adjacent to N, i.e. nodes of distance 1 from N. It then visits nodes of distance 2, and so on. In this way it can determine the shortest distance from N to every other node in the graph.
We can implement a breadth-first search using a queue. Just like with depth-first graph search, we must remember all nodes that we have visited to avoid walking in circles. We begin by adding the start node to the queue and marking it as visited. In a loop, we repeatedly remove nodes from the queue. Each time we remove a node, we mark all of its adjacent unvisited nodes as visited and add them to the queue. The algorithm terminates once the queue is empty, at which point we will have visited all reachable nodes.
The queue represents the frontier. When we remove a node from the queue, it moves to the explored set. Just like with depth-first graph search, the visited nodes are the frontier nodes and the nodes in the explored set.
As the algorithm runs, all nodes in the queue are at approximately the same distance from the start node. To be more precise, at every moment in time there is some value d such that all nodes in the queue are at distance d or (d + 1) from the start node.
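This invariant is easy to check empirically. Here is a sketch on a small hypothetical graph that asserts, on every iteration, that the distances of the queued nodes span at most two consecutive values:

```python
from collections import deque

# A small directed graph (illustrative, not from the notes).
g = {0: [1, 2], 1: [3], 2: [3, 4], 3: [5], 4: [5], 5: []}

dist = {0: 0}          # distance of each discovered node from the start
q = deque([0])
while q:
    ds = {dist[v] for v in q}
    assert max(ds) - min(ds) <= 1   # invariant: only d and d + 1 present
    v = q.pop()                     # dequeue
    for w in g[v]:
        if w not in dist:
            dist[w] = dist[v] + 1
            q.appendleft(w)         # enqueue
print(dist)   # shortest distances from node 0
```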
Let's revisit the Europe graph that we saw in the last lecture. Here is a breadth-first search in progress, starting from Austria:
To implement a breadth-first search in Python, we need a queue data structure. We can conveniently use the deque (double-ended queue) class found in the collections module. Since a deque is double-ended, we can use it as a queue in either of two ways: we can either

call d.appendleft() to enqueue and d.pop() to dequeue, or
call d.append() to enqueue and d.popleft() to dequeue.

All four of these operations run in O(1) time, so the two approaches are equally efficient, and we can choose one arbitrarily.
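A short sketch showing that both conventions behave identically as a FIFO queue:

```python
from collections import deque

# Convention 1: enqueue on the left, dequeue on the right.
q = deque()
q.appendleft('a')
q.appendleft('b')
first, second = q.pop(), q.pop()
print(first, second)     # a b

# Convention 2: enqueue on the right, dequeue on the left.
q = deque()
q.append('a')
q.append('b')
third, fourth = q.popleft(), q.popleft()
print(third, fourth)     # a b
```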
Here is a Python function bfs() that performs a breadth-first search. It takes a graph in adjacency-list representation, plus a start vertex:
```python
from collections import deque

# breadth-first search
def bfs(g, start):
    q = deque()
    q.appendleft(start)        # enqueue
    visited = { start }
    while q:
        node = q.pop()         # dequeue
        print('exploring ' + node)
        for n in g[node]:
            if n not in visited:
                visited.add(n)
                q.appendleft(n)   # enqueue
```
Note that we must mark nodes as visited when we add them to the queue, not when we remove them. (If we marked them as visited only when removing them, then our algorithm could add the same node to the queue more than once.)
Like a depth-first search, a breadth-first search does a constant amount of work for each vertex and edge, so it also runs in time O(V + E).
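As a quick check, here is a variant of the function above (the name bfs_order and the sample graph are illustrative) that returns the visit order instead of printing it:

```python
from collections import deque

# breadth-first search returning the order in which nodes are explored
def bfs_order(g, start):
    order = []
    q = deque()
    q.appendleft(start)        # enqueue
    visited = {start}
    while q:
        node = q.pop()         # dequeue
        order.append(node)
        for n in g[node]:
            if n not in visited:
                visited.add(n)
                q.appendleft(n)   # enqueue
    return order

g = {'a': ['b', 'c'], 'b': ['d'], 'c': ['d'], 'd': []}
print(bfs_order(g, 'a'))   # ['a', 'b', 'c', 'd']
```

Notice that both of a's neighbors are explored before d, which is at distance 2.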
Suppose that we replace the queue in our preceding breadth-first search function with a stack. The function will now perform a depth-first search!
```python
# iterative depth-first search
def dfs_iter(g, start):
    q = []
    q.append(start)            # push
    visited = { start }
    while q:
        node = q.pop()
        print('exploring ' + node)
        for n in g[node]:
            if n not in visited:
                visited.add(n)
                q.append(n)    # push
```
Specifically, this is a non-recursive depth-first search, or a depth-first search with an explicit stack.
This shows that there is a close relationship between stacks and depth-first search. Specifically, a stack is a LIFO (last in first out) data structure. And when we perform a depth-first search, the last frontier node we discover is the first that we will expand by following its edges. Similarly, a queue is a FIFO (first in first out) data structure, and in a breadth-first search the first frontier node we discover is the first that we will expand.
It is sometimes wise to implement a depth-first search non-recursively in a situation where the search may be very deep. This avoids running out of call stack space, whose size is fixed on most operating systems.
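To see the difference, here is a sketch (a variant of the iterative search above that returns the visited set) run on a path graph far deeper than Python's default recursion limit of about 1000 frames; a recursive search on this graph would raise RecursionError:

```python
# A path graph 0 -> 1 -> ... -> 99,999: every search path is 100,000
# vertices deep, far beyond the default recursion limit.
n = 100_000
g = {i: [] for i in range(n)}
for i in range(n - 1):
    g[i].append(i + 1)

# iterative depth-first search with an explicit stack; no recursion,
# so depth is limited only by available heap memory
def dfs_visited(g, start):
    stack = [start]
    visited = {start}
    while stack:
        node = stack.pop()
        for nb in g[node]:
            if nb not in visited:
                visited.add(nb)
                stack.append(nb)   # push
    return visited

print(len(dfs_visited(g, 0)))   # 100000: every vertex was reached
```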
Let's write a function that takes a graph and two vertex ids and returns an integer representing the length of the shortest path between the vertices. To do so, we will modify our breadth-first search implementation from above. Each queue element will now contain a pair (v, d), where v is a vertex ID and d is the length of the shortest path from the start vertex to v.
```python
# breadth-first search with distances
def bfs_dist(g, start, end):
    if start == end:
        return 0
    q = deque()
    q.appendleft((start, 0))   # enqueue
    visited = { start }
    while q:
        v, dist = q.pop()      # dequeue
        for w in g[v]:
            if w == end:
                return dist + 1
            if w not in visited:
                visited.add(w)
                q.appendleft((w, dist + 1))   # enqueue
    return -1   # no path found
```
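Here is the function applied to a small undirected graph (each edge appears in both adjacency lists); the definition is repeated so that the example is self-contained:

```python
from collections import deque

# breadth-first search with distances (as above)
def bfs_dist(g, start, end):
    if start == end:
        return 0
    q = deque()
    q.appendleft((start, 0))
    visited = { start }
    while q:
        v, dist = q.pop()
        for w in g[v]:
            if w == end:
                return dist + 1
            if w not in visited:
                visited.add(w)
                q.appendleft((w, dist + 1))
    return -1   # no path found

# a square a-b-d-c-a with a pendant vertex e attached to d
g = {
    'a': ['b', 'c'],
    'b': ['a', 'd'],
    'c': ['a', 'd'],
    'd': ['b', 'c', 'e'],
    'e': ['d'],
}
print(bfs_dist(g, 'a', 'e'))   # 3: e.g. a -> b -> d -> e
print(bfs_dist(g, 'a', 'z'))   # -1: no vertex 'z' is reachable
```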
Just like depth-first search, breadth-first search traverses a tree which spans the original graph. A breadth-first tree indicates a shortest path from the start node to every other node in the graph. (Note, however, that the shortest path between two graph nodes is not necessarily unique.) Here is a breadth-first tree for the Europe graph:
We will sometimes store a breadth-first tree as a directed graph in memory with the arrows pointing in the other direction, i.e. toward the start node. This is sometimes called an in-tree. In this representation each node points toward its predecessor in the breadth-first tree. Here is the previous breadth-first tree as an in-tree:
Let's modify the function in the previous section so that it prints the actual shortest path between a given pair of vertices. To do this, we need to store the breadth-first tree that is generated by our breadth-first search. We will store an in-tree as depicted above.
We will write a new class Node, representing a node of the breadth-first tree. Each Node will contain
a vertex ID
a depth, i.e. the length of the shortest path from the start vertex to this vertex
a reference to a parent Node
The queue will now hold Node objects.
Here is our implementation:
```python
class Node:
    def __init__(self, id, depth, parent):
        self.id = id
        self.depth = depth
        self.parent = parent

# breadth-first search finding shortest path
def _bfs_shortest(g, from_id, to_id):
    node = Node(from_id, 0, None)
    if from_id == to_id:
        return node
    q = deque()
    q.appendleft(node)         # enqueue
    visited = { from_id }
    while q:
        node = q.pop()         # dequeue
        print(f'{node.id} has distance {node.depth}')
        for id in g[node.id]:
            if id not in visited:
                visited.add(id)
                new = Node(id, node.depth + 1, node)
                if id == to_id:   # found destination
                    return new
                q.appendleft(new)   # enqueue
    return None

def bfs_shortest(g, from_id, to_id):
    n = _bfs_shortest(g, from_id, to_id)
    if n is None:
        print('no path')
        return
    l = []
    while n is not None:
        l.append(n.id)
        n = n.parent
    print(' -> '.join(l[::-1]))
```
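As an aside, the in-tree can also be stored as a plain dictionary mapping each vertex to its predecessor, avoiding the Node class entirely. Here is a sketch of that alternative; the name shortest_path and the sample graph are illustrative, not from the notes:

```python
from collections import deque

# Breadth-first search storing the in-tree as a dictionary:
# prev[v] is the vertex before v on a shortest path from start.
def shortest_path(g, start, end):
    prev = {start: None}
    q = deque([start])
    while q:
        v = q.pop()            # dequeue
        if v == end:
            break
        for w in g[v]:
            if w not in prev:
                prev[w] = v
                q.appendleft(w)   # enqueue
    if end not in prev:
        return None            # no path found
    path = []                  # walk the in-tree back to the start
    v = end
    while v is not None:
        path.append(v)
        v = prev[v]
    return path[::-1]

g = {'a': ['b', 'c'], 'b': ['a', 'd'], 'c': ['a', 'd'], 'd': ['b', 'c'], 'e': []}
print(shortest_path(g, 'a', 'd'))   # ['a', 'b', 'd']
print(shortest_path(g, 'a', 'e'))   # None
```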
We can use our graph search algorithms to solve many problems that have a graph-like structure, even if they are not explicitly problems about graphs.
For example, let's write a function to determine whether a maze is solvable. Specifically, we will write a function that takes a list of lists of booleans representing a rectangular maze, where walls are represented by array elements that are True. The function will return a boolean indicating whether there is any path from the upper-left corner (0, 0) to the lower-right corner of the maze. We will assume that in each step we may move up, down, left, or right, but not diagonally.
We could solve this problem using either a depth-first or breadth-first search. We will use a depth-first search, implemented recursively.
We could convert the input array to a graph in adjacency-list representation, where each square in the maze is a separate graph vertex. But that is unnecessary and would be wastefully inefficient. Instead, we can implement a depth-first search directly on the maze itself. At each step, instead of iterating over an adjacency list we will iterate over the neighbors of the current maze square, i.e. the four squares that are above, below, to the left of, and to the right of the square. Notice how the constant array Dirs lets us easily loop over the four compass directions.
```python
# We are given a list of lists of booleans representing a rectangular
# maze, where True represents a wall. Return True if there is any
# path from the upper-left to the lower-right corner.

Dirs = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def solvable(maze):
    width = len(maze)
    height = len(maze[0])
    visited = [[False] * height for _ in range(width)]

    def visit(x, y):
        if (0 <= x < width and 0 <= y < height
                and not maze[x][y] and not visited[x][y]):
            visited[x][y] = True
            if x == width - 1 and y == height - 1:
                return True   # found path
            for dx, dy in Dirs:
                if visit(x + dx, y + dy):
                    return True
        return False

    return visit(0, 0)
```
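To try the maze solver out, here it is applied to two small hand-made mazes (True = wall); the definition is repeated so the example is self-contained:

```python
# Maze solver (as above): recursive depth-first search on the grid.
Dirs = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def solvable(maze):
    width = len(maze)
    height = len(maze[0])
    visited = [[False] * height for _ in range(width)]

    def visit(x, y):
        if (0 <= x < width and 0 <= y < height
                and not maze[x][y] and not visited[x][y]):
            visited[x][y] = True
            if x == width - 1 and y == height - 1:
                return True   # found path
            for dx, dy in Dirs:
                if visit(x + dx, y + dy):
                    return True
        return False

    return visit(0, 0)

# a 3 x 3 maze with a clear path through the middle column
open_maze = [
    [False, False, True],
    [True,  False, True],
    [True,  False, False],
]
# a 2 x 2 maze whose lower-right corner is walled off
blocked = [
    [False, True],
    [True,  False],
]
print(solvable(open_maze))   # True
print(solvable(blocked))     # False
```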