Week 5: Notes

foldl

The built-in higher-order predicate foldl can combine elements of a list using an arbitrary predicate. Let's see foldl in action:

add(I, J, K) :- I + J #= K.

?- foldl(add, [3, 4, 5], 0, S).
S = 12

The code above adds the members of the list [3, 4, 5], starting with the value 0. In other words, it computes ((0 + 3) + 4) + 5. This is called a left fold because it combines values from the left, just like the accumulators that we wrote in last week's lecture.

In general, foldl takes four arguments: a predicate of three arguments, a list, a starting value for the accumulator and a final value for the accumulator. If we write

foldl(P, [a, b, c], A, R)

then foldl will make a series of calls to P, with this effect:

P(a, A, A1)

P(b, A1, A2)

P(c, A2, R)

Here is an efficient implementation of reverse(), using foldl:

prepend(X, L, [X | L]).

reverse(L, M) :- same_length(L, M), foldl(prepend, L, [], M).

We can easily implement fold() using call(). The implementation will look similar to the accumulator pattern that we saw in the previous lecture:

foldl(_, [], A, A).

foldl(P, [X | L], A, R) :- call(P, X, A, A1), foldl(P, L, A1, R).

Another version of foldl takes five arguments: a predicate of four arguments, two lists, a starting value for the accumulator and a final value for the accumulator. If we write

foldl(P, [a, b, c], [d, e, f], A, R)

then foldl will make a series of calls to P, with this effect:

P(a, d, A, A1)

P(b, d, A1, A2)

P(c, f, A2, R)

In other words, each call to P receives two elements, one from each of the input lists, and uses them to compute the next accuulator value. For example, we can use this version of foldl to compute a dot product:

mul_add(X, Y, A, A1) :- A1 #= A + X * Y.

?- foldl(mul_add, [1, 2, 3], [4, 5, 6], 0, N).
N = 32

sorting

We may implement sorting algorithms such as insertion sort, merge sort or quicksort easily in Prolog.

For example, let's implement insertion sort. As a first step, here's a predicate that can insert a value into a sorted list:

insert(X, [], [X]).

insert(X, [Y | Z], [X, Y | Z]) :- X #=< Y.

insert(X, [Y | Z], [Y | L]) :- X #> Y, insert(X, Z, L).

Let's try it:

?- insert(15, [10, 12, 14, 16], L).
L = [10, 12, 14, 15, 16]

?- insert(A, [10, 12, 16], [10, 12, 14, 16]).
A = 14

Now we can write insertion sort recursively:

isort([], []).

isort([X | L], M) :- same_length([X | L], M), isort(L, LS), insert(X, LS, M).

It works both forwards and backwards:

?- isort([5, 2, 8, 1, 10, 3], L).
L = [1, 2, 3, 5, 8, 10]

?- isort(L, [10, 15, 20]).
L = [10, 15, 20] ;
L = [15, 10, 20] ;
L = [20, 10, 15] ;
L = [10, 20, 15] ;
L = [15, 20, 10] ;
L = [20, 15, 10].

?- isort(L, [10, 20, 15]).
false.

How long will our predicate take to sort N numbers? In the best case, the input list will already be sorted. Then insert() will run in O(1) on each iteration, so the total time will be O(N). In the worst (and expected) case, insert() will take O(M) to insert into a sorted list of M numbers, so the total running time will be O(N + (N - 1) + ... + 1) = O(N²). These are the same best-case and worst-case times that we would expect from an implementation of insertion sort in a procedural language.

functional data structures

In Prolog we can build common data structures such as stacks, queues, and binary trees. However, since Prolog has no mutable state, the interfaces to these structures will look a bit different than in procedural languages. Every operation on a data structure S, such as pushing a value onto a stack or enqueuing into a queue, will return a new version of the data structure, since S itself cannot be modified.

As we will see, the implementations of functional data structures must also sometimes be different than in procedural languages.

stacks

A functional stack is trivial to implement in Prolog, since a Prolog list itself can act as a stack.

% S is an empty stack.
empty_stack([]). 

% We can push X onto S to make the stack [X | S].
push(X, S, [X | S]).

% We can pop X from [X | S] to make the stack S.
pop(X, [X | S], S).

We can push N numbers onto an empty stack using recursion:

% S is the empty stack with the numbers 1 .. N pushed onto it.
push_to_n(0, []).
push_to_n(N, S) :- N #> 0, N1 #= N - 1, push_to_n(N1, S1), push(N, S1, S).

Or using foldl:

push_to_n2(N, S) :- numlist(1, N, NL), foldl(push, NL, [], S).

queues

We may naively implement a functional queue using a list. When we enqueue values, we append them to the end fo the list. We dequeue values by removing them from the front:

% empty_queue(Q) - Q is an empty queue
empty_queue([]).

% enqueue(X, Q, R) - we can enqueue X onto Q to make R
enqueue(X, Q, R) :- append(Q, [X], R).

% dequeue(X, Q, R) - we can dequeue X from Q to make R
dequeue(X, [X | R], R).

We may use foldl to enqueue a series of values:

?- numlist(1, 10, _L), foldl(enqueue, _L, [], Q).
Q = [1, 2, 3, 4, 5, 6, 7, 8, 9|...].

In fact we may even use foldl to dequeue several values at once:

?- numlist(1, 10, _L), foldl(enqueue, _L, [], _Q),
   length(M, 3), foldl(dequeue, M, _Q, _).
M = [1, 2, 3].

However our naive implementation is inefficient. enqueue() runs in O(N) when there are N values in the queue. Queueing N values successively onto an empty queue will take O(N²).

Alternatively, we may implement a functional queue using two lists, which we will call the front and back lists. For example, here is one such queue:

[a, b, c] / [f, e, d]

To enqueue a value, we prepend it to the back list. To dequeue a value, we remove it from the head of the front list. For example, we may enqueue 'g' and then dequeue 'a' and b' from the queue above, yielding this queue:

[c] / [g, f, e, d]

When dequeueing, if the front list is empty then we reverse the back list and move it to the front. For example, if we dequeue 'c' from the queue above, then the front list will become empty. The next time we dequeue, we will reverse the list [g, f, e, d] to form [d, e, f, g]. We will dequeue 'd', and the remaining items will now be at the front:

[e, f, g] / []

Here is a Prolog implementation:

empty_queue([] / []).

enqueue(X, F / B, F / [X | B]).

dequeue(X, [X | F] / B, F / B).
dequeue(X, [] / B, F / []) :- reverse(B, [X | F]).

With this implementation, enqueue() runs in O(1).

How will dequeue() perform? Suppose that we start with an empty queue, and enqueue and dequeue N values via some series of enqueue and dequeue operations. There will be some number of flips, i.e. operations in which we move values from the back to the front. Each flip of K values will take O(K), since reverse() runs in linear time. The total number of values flipped will be N, since each enqueued value will move to the front exactly once. So the total cost of all the flip operations will be O(N). And so the total time for all N dequeue operations will be O(N) (for removing values from the front list) + O(N) (for all the flip operations) = O(N). And so dequeue() will run in O(1) on average.

binary trees

It is straightforward to represent a binary tree in Prolog. We can represent an empty binary tree as the atom nil, and a non-empty tree as t(L, X, R), where X is the value in the root node and L and R are left and right subtrees. For example, consider this tree:

Here is a predicate defining its representation:

my_tree(T) :- T = t(t(nil, 2, nil),
                    5,
                    t(t(nil, 8, nil), 10, t(nil, 12, nil))).

It's straightforward to write a recursive predicate to count the nodes in a tree:

count(nil, 0).
count(t(L, _, R), N) :- N #>= 1, LN #>= 0, RN #>= 0, N #= LN + RN + 1,
                        count(L, LN), count(R, RN).

?- my_tree(_T), count(_T, N).
N = 5.

We may even generate all trees with a given number of nodes:

?- count(T, 3).
T = t(nil, _, t(nil, _, t(nil, _, nil))) ;
T = t(nil, _, t(t(nil, _, nil), _, nil)) ;
T = t(t(nil, _, nil), _, t(nil, _, nil)) ;
T = t(t(nil, _, t(nil, _, nil)), _, nil) ;
T = t(t(t(nil, _, nil), _, nil), _, nil)

Let's write a predicate maptree(P, T) that succeeds if a predicate P is true for every value in a tree:

maptree(_, nil).
maptree(P, t(L, X, R)) :- call(P, X), maptree(P, L), maptree(P, R).

?- count(T, 2), maptree(=(5), T).
T = t(nil, 5, t(nil, 5, nil)) ;
T = t(t(nil, 5, nil), 5, nil)

the cut

Normally Prolog generates all possible solutions to any query. In some situations, however, we may only want a single solution. The cut operator ! causes Prolog to commit to all choices that it has already made in the current query or predicate. After a cut, Prolog will not backtrack to look for more possible solutions. For example:

?- member(X, [2, 4, 6, 8, 10]), X #> 5, !.
X = 6.

?- append(L, M, [a, b, c]), !.
L = [],
M = [a, b, c].

Here are some more examples:

?- (X = 3; X = 4, !; X = 5).
X = 3 ;
X = 4.

?- (Z = 1; Z = 2), (X = 3; X = 4, !; X = 5).
Z = 1,
X = 3 ;
Z = 1,
X = 4.

Prolog will, however, continue to explore choices that appear to the right of the cut:

?- (X = 3; X = 4, !; X = 5), (Y = 10; Y = 20).
X = 3,
Y = 10 ;
X = 3,
Y = 20 ;
X = 4,
Y = 10 ;
X = 4,
Y = 20.

If a predicate is written using multiple rules, then the cut will affect it just as if it were written as a single rule using the ';' operator. In other words, after a cut no further rules will be considered. For example:

foo(X) :- X = 3.
foo(X) :- X = 4, !.
foo(X) :- X = 5.

?- foo(X).
X = 3 ;
X = 4.

?-

the cut and logical purity

Almost all the Prolog features and predicates that we have seen until now are part of the pure logical core of the language. For example, dif(), member(), append(), select(), maplist(), foldl(), and new-style arithmetic are all logically pure.

As we've noted before, any program written only using these pure logical features has some nice properties: predicates run in multiple directions, query results are sound and complete, and reordering goals will not change the results of a query (though it may affect termination).

The cut can be useful in various situations, however it is not logically pure. This means that a program that uses the cut, or features derived from the cut, will probably not have the properties above.

Let's look at a couple of examples. Consider this predicate:

kind(N, small) :- N #< 10.
kind(N, medium) :- N #>= 10, N #< 100.
kind(N, large) :- N #>= 100.

We might attempt to simplify it using the cut as follows:

kind2(N, small) :- N #< 10, !.
kind2(N, medium) :- N #< 100, !.
kind2(_, large).

With this change, any query of the form kind2(N, K) will still produce a correct value K if N is already instantiated and K is not:

?- kind2(50, K).
K = medium.

?- kind2(200, K).
K = large.

However, if N is uninstantiated then we may now receive answers that are incorrect:

?- kind2(N, large).
true.

The preceding answer claims that all N are big, but that is not true. Similarly, if K is already instantiated then we may also receive incorrect answers:

?- kind2(5, medium).
true.

Furthermore, the results of our query may now depend on the order in which we ask questions:

?- N = 50, K = medium, kind2(N, K).
N = 50,
K = medium.

?- kind2(N, K), N = 50, K = medium.
false.

As another example, suppose that we use the cut to write a version of member(X, L) that will succeed only once, even if X is in the list L multiple times:

mem1(X, [X | _]) :- !.
mem1(X, [_ | L]) :- mem1(X, L).

In the forward direction, it works as intended:

?- mem1(3, [2, 3, 3, 5, 4, 3]).
true.

?-

However in the reverse direction it now returns results that are incomplete:

?- mem1(3, L).
L = [3|_].

?-

As a result, once again we may get different results if we reorder goals in a query:

?- L = [2, 3, 4], mem1(3, L).
L = [2, 3, 4].

?- mem1(3, L), L = [2, 3, 4].
false.

In summary, the cut works well in a forward direction, i.e. at a moment when all variables are already instantiated. So, for example, it's reasonable to use a cut at the top level of a program that is searching for solutions in a state space, at a moment when a first solution is known and you don't want to search for more solutions. However, in situations where code may need to run multidirectionally, the cut is potentially dangerous and requires you to think about your program in a more procedural way.

negation

We may use the cut to implement negation. For example, let's write a predicate not_mem(X, L) that is true if X is not a member of L:

not_mem(X, L) :- member(X, L), !, false; true.

This predicate works as follows. If X is a member of L, then the predicate is false, and the cut prevents it from considering the other branch. Otherwise, the other branch runs and the predicate is true:

?- not_mem(3, [4, 5, 6]).
true.

?- not_mem(3, [4, 3, 6]).
false.

We may generalize this idea by writing a higher-order predicate 'not':

not(P) :- call(P), !, false; true.

?- not(member(3, [4, 5, 6])).
true.

In fact this predicate is built into Prolog, and has the name \+:

?- \+ member(3, [4, 5, 6]).
true.

Because Prolog's negation is defined using the cut, it will really only work in a forward direction. If you apply negation to a term in which all variables are already instantiated, it will behave as you expect. If variables are not instantiated, you may receive results that are unsound or incomplete. For example, consider this query:

?- \+ member(X, [2, 3, 4]).
false

This result is claiming that there are no values X which are not members of the list [2, 3, 4]. But the claim is false, since certainly such X do exist:

?- X = 10, \+ member(X, [2, 3, 4]).
X = 10.

As another example, suppose that we have the following predicates about restaurants:

good_food(ferdinand).
good_food(beseda).

expensive(beseda).

cheap(R) :- \+ expensive(R).

I'm looking for a restaurant that has good food and is inexpensive. This query lets me find one:

?- good_food(R), cheap(R).
R = ferdinand

However if I ask the questions in the other order, the query fails:

?- cheap(R), good_food(R).
false

if/then/else

We may also use the cut to implement an if/then/else operation. For example, here is a predicate max() which computes the maximum Z of two values X and Y:

max(X, Y, Z) :- X #>= Y, !, Z = X; Z = Y.

?- max(4, 10, Z).
Z = 10.

?- max(7, 3, Z).
Z = 7.

If the goal 'X #>= Y' succeeds, then Z = X and the cut prevents Prolog from considering the other choice. If 'X #>= Y' fails, then Z = Y.

Prolog includes an arrow operator '->' which is defined using the cut, and lets us write this if/then/else choice with an (arguably) nicer syntax:

max(X, Y, Z) :- X #>= Y -> Z = X; Z = Y.

Once again, because this operator is defined using the cut, it will really only work in a forward direction, and may give us unsound or incomplete results in a reverse direction. For example:

?- max(4, X, 10).
false

This result is claiming that there are no values X such that the maximum of 4 and X is 10. Of course, that is wrong.

graphs

There are various possible ways to represent a graph in Prolog. Consider this acylic directed graph:

We could represent the graph using a series of facts:

edge(a, b).
edge(a, e).
edge(b, c).
...

This is sometimes called a temporal representation, because a query such as edge(a, V) will return a series of solutions over time.

Alternatively, we could use an adjacency list representation:

graph1(G) :- G =
    [ a -> [b, e], b -> [c, f], c -> [],
      d -> [], e -> [d, f], f -> [c],
      g -> [d, h], h -> [f] ].

This is sometimes called a spatial representation, since the graph is represented using a data structure.

We can implement graph search algorithms on either sort of graph. Let's choose the adjacency-list representation. We can begin by writing a predicate that can tell whether an edge exists in a graph:

% True if G has an edge from V to W.
edge(G, V, W) :- member(V -> L, G), member(W, L).

depth-first search

Here is a predicate path() that can find a path between two vertices in a graph in adjacency-list representation.

% path(G, V, W, P): True if P is a path from V to W in graph G.

path(_, V, V, [V]).

path(G, V, W, [V | P]) :- edge(G, V, X), path(G, X, W, P).

This predicate is somewhat similar to the ancestor() predicate that we wrote in the first lecture. However, unlike that predicate it builds a path P from the start to the goal.

Let's use path() to find all paths from a to c in the graph above:

?- graph1(_G), path(_G, a, c, P).
P = [a, b, c] ;
P = [a, b, f, c] ;
P = [a, e, f, c]

The predicate will even work multidirectionally:

?- graph1(_G), path(_G, V, W, [a, e, f, c]).
V = a,
W = c

?- graph1(_G), path(_G, a, V, P).
V = a,
P = [a] ;
V = b,
P = [a, b] ;
V = c,
P = [a, b, c] ;
V = f,
P = [a, b, f]
...

Our undirected graph above was acyclic. Let's now add an edge from c to a to make the graph cyclic:

graph2(G) :- G =
    [ a -> [b, e], b -> [c, f], c -> [a],
      d -> [], e -> [d, f], f -> [c],
      g -> [d, h], h -> [f] ].

Let's try using path() to search this graph:

?- graph2(_G), path(_G, a, c, P).
P = [a, b, c] ;
P = [a, b, c, a, b, c] ;
P = [a, b, c, a, b, c, a, b, c]
...

?- path(_G, a, e, P)
<hang>

We see it fails to find a path from a to e. That's because the search walks around the cycle a-b-c in an infinite loop, so it will never consider the edge from a to e.

So how can we search in a graph (or state space) that may contain cycles? Let's look at three possible ways do that.

iterative deepening

Iterative deepening performs a series of depth-first searches, each of which has a maximum depth. The first depth-first search has depth limit 1, the second has limit 2, and so on. Iterative deepening will always find a shortest path to a goal. It will work fine even if graphs with cycles. Although it will explore paths around cycles, that will not cause it to go into an infinite loop since its search depth is always limited.

We can easily implement iterative deepening in Prolog using a simple trick, as follows:

% iter_deep(G, V, W, P): True if P is a path from V to W in G.
% Return the shortest path first.
iter_deep(G, V, W, P) :- length(P, _), path(G, V, W, P).

Howe does this work? The goal length(P, _) will generate uninstantiated paths of increasing lengths:

?- length(P, _).
P = [] ;
P = [_] ;
P = [_, _] ;
P = [_, _, _] ;
P = [_, _, _, _]
...

So iter_deep will make a series of calls to path, each of which looks for a path of a particular length.

Unlike path(), this predicate returns the shortest path from b to f as its first result:

?- graph2(_G), iter_deep(_G, b, f, P).
P = [b, f] ;
P = [b, c, a, e, f]
<hang>

Iterative deepening is sometimes a useful technique. However, this implementation will never terminate, since it will keep searching for longer and longer paths. Of course, we can use the cut if we want only a single solution:

% iter_deep(G, V, W, P): True if P is the shortest path from V to W in G.
iter_deep(G, V, W, P) :- length(P, _), path(G, V, W, P), !.

However, the predicate will still fail to terminate if there is no path V to W.

Another problem is that this predicate considers all possible paths of length (N - 1) before any path of length N. And unfortunately there may be exponentially many of these.

avoiding self-intersecting paths

As another possible approach, we can amend our path() predicate so that it avoids selecting any vertex that is already in the path that it has built so far. It might look like this:

not_mem(X, L) :- maplist(dif(X), L).

% extend(G, P, Goal, R): true if P can be extended to reach Goal, producing path R

extend(_, P, Goal, R) :- P = [Goal | _], reverse(P, R).

extend(G, [X | P], Goal, R) :- dif(X, Goal), edge(G, X, Y), not_mem(Y, [X | P]),
    extend(G, [Y, X | P], Goal, R).

% path(G, V, W, Path): True if Path is a path from V to W in graph G.

path2(G, V, W, Path) :- extend(G, [V], W, Path).

path2() can search successfully even in a cyclic graph:

?- graph2(_G), path2(_G, a, c, P).
P = [a, b, c] ;
P = [a, b, f, c] ;
P = [a, e, f, c]

?- graph2(_G), path2(_G, a, e, P).
P = [a, e]

This is a step forward, but path2() still has a couple of limitations. First, since it is a depth-first search it is not guaranteed to find the shortest path between two vertices:

?- graph2(_G), path2(_G, b, f, P).
P = [b, c, a, e, f]

Also, this predicate may be inefficient. Unlike our implementation of depth-first search in Introduction to Algorithms last year, this predicate does not use a visited set shared across the entire search. Instead, it only avoids self-intersecting paths. In some graphs this may be much less efficient than using a visited set, since there may be (exponentially) many paths that do not intersect themselves, and path2() may explore many of these.

breadth-first search

(We didn't have time to cover this topic in the lecture. It will not be required on the exam this year, but I'll include it in the notes for completeness.)

A breadth-first search with a visited set will visit each state only once, so it may be far more efficient than iterative deepening or a search that only avoids self-intersecting paths.

A breadth-first search needs a queue. For the moment, we will use a naive queue representation, i.e. a list, where we enqueue by appending to the list. We also need a visited set. For the moment, we will also use a naive visited set representation, i.e. a list.

When the breadth-first search reaches the goal, we would like to return a list of all states on the path from the start to the goal. For this reason, each element on the queue will not be a simple vertex V. Instead, it will be a list of vertices on a path from the start to V, in reverse order. Because this list is in reverse order, we can efficiently extend it by prepending any neighbor of V to it.

Here's one possible solution:

prepend_to(L, X, [X | L]).

% bfs(+Graph, +Queue, +Visited, +Goal, -Path)
% True if Path is a path of states leading from the start to Goal.

bfs(_, [[Goal | P1] | _], _, Goal, Path) :- reverse([Goal | P1], Path).

bfs(Graph, [[V | P1] | Q], Visited, Goal, Path) :-
    member(V -> Neighbors, Graph),
    subtract(Neighbors, Visited, Neighbors1), append(Neighbors1, Visited, Visited2),
    maplist(prepend_to([V | P1]), Neighbors1, Neighbors2), append(Q, Neighbors2, Q2),
    bfs(Graph, Q2, Visited2, Goal, Path).

% top-level entry point
bfs(Graph, Start, Goal, Path) :- bfs(Graph, [[Start]], [Start], Goal, Path).

Of course, we could improve this predicate by using more efficient data structures for the queue and visited set. For the queue, we could use the front/back data structure that we saw last week. For a visited set, SWI-Prolog includes libraries 'assoc' and 'rbtree' that implement efficient dictionaries using balanced binary trees. We could use one of these data structures, representing a set as a dictionary from keys to any fixed value.

searching in state spaces

We may use graph search algorithms to search in state spaces, which allows us to solve many sorts of problems and puzzles.

As a first trivial example, consider this problem. There are three coins on a table. Initially the first two coins are heads up, and the third is tails up:

H H T

We would like to make three coin flips and end up in a state where the three coins are either all heads up or all tails up. We want to write a Prolog predicate that can determine all possible solutions.

Let's represent a state via a list of three atoms, e.g. [h, h, t]. We can write a predicate next() that defines how we may move from one state to the next:

flip(h, t).
flip(t, h).

% next(S, T) - from state S we can make a move to state T.
next(S, T) :- select(X, S, Y, T), flip(X, Y).

For example:

?- next([h, h, t], T).
T = [t, h, t] ;
T = [h, t, t] ;
T = [h, h, h]

Now let's write a predicate path() that can find all paths from a state S to any final state:

final([h, h, h]).
final([t, t, t]).

% path(S, P): P is a path from S to a final state.
path(S, [S]) :- final(S).

path(S, [S | P]) :- next(S, U), path(U, P).

We want to make three coin flips, so any valid path will have 4 states since it includes both the initial and final states. We can now find all possible solutions:

?- length(P, 4), path([h, h, t], P).
P = [[h, h, t], [t, h, t], [h, h, t], [h, h, h]] ;
P = [[h, h, t], [t, h, t], [t, h, h], [h, h, h]] ;
P = [[h, h, t], [h, t, t], [h, h, t], [h, h, h]] ;
P = [[h, h, t], [h, t, t], [h, t, h], [h, h, h]] ;
P = [[h, h, t], [h, h, h], [t, h, h], [h, h, h]] ;
P = [[h, h, t], [h, h, h], [h, t, h], [h, h, h]] ;
P = [[h, h, t], [h, h, h], [h, h, t], [h, h, h]]

Let's try to find any solution at all from the starting state:

?- path([h, h, t], P).
<hang>

This query hangs because the depth-first search is considering an infinite path in which it flips the first coin at every step: [h, h, t], [t, h, t], [h, h, t], ... It will never find a solution.

Instead, we can use iterative deepening to find all solutions in increasing order of length:

?- length(P, _), path([h, h, t], P).
P = [[h, h, t], [h, h, h]] ;
P = [[h, h, t], [t, h, t], [t, t, t]] ;
P = [[h, h, t], [h, t, t], [t, t, t]] ;
P = [[h, h, t], [t, h, t], [h, h, t], [h, h, h]] ;
P = [[h, h, t], [t, h, t], [t, h, h], [h, h, h]] ;
P = [[h, h, t], [h, t, t], [h, h, t], [h, h, h]] ;
P = [[h, h, t], [h, t, t], [h, t, h], [h, h, h]]
...

Notice that some of these solutions (such as the fourth one listed) contain repeating states. To prevent that, we could either avoid self-intersecting paths or implement a breadth-first search with a visited set. We described both of these techniques above. The most efficient solution will be a breadth-first search that uses efficient data structures to hold the queue and visited set.