Week 5: Notes

functional data structures

In Prolog we can build common data structures such as stacks, queues, and binary trees. However, since Prolog has no mutable state, the interfaces to these structures will look a bit different than in procedural languages. Every operation on a data structure S, such as pushing a value onto a stack or enqueuing into a queue, will return a new version of the data structure, since S itself cannot be modified.

As we will see, the implementations of functional data structures must also sometimes be different than in procedural languages.

stacks

A functional stack is trivial to implement in Prolog, since a Prolog list itself can act as a stack.

% S is an empty stack.
empty_stack([]). 

% We can push X onto S to make the stack [X | S].
push(X, S, [X | S]).

% We can pop X from [X | S] to make the stack S.
pop(X, [X | S], S).

We can push N numbers onto an empty stack using recursion:

% S is the empty stack with the numbers 1 .. N pushed onto it.
push_to_n(0, S) :- empty_stack(S).
push_to_n(N, S) :- N #> 0, N1 #= N - 1, push_to_n(N1, S1), push(N, S1, S).

Or using foldl:

push_to_n2(N, S) :- empty_stack(E), numlist(1, N, NL),
                    foldl(push, NL, E, S).

queues

We may naively implement a functional queue using a list. When we enqueue values, we append them to the end fo the list. We dequeue values by removing them from the front:

% empty_queue(Q) - Q is an empty queue
empty_queue([]).

% enqueue(X, Q, R) - we can enqueue X onto Q to make R
enqueue(X, Q, R) :- append(Q, [X], R).

% dequeue(X, Q, R) - we can dequeue X from Q to make R
dequeue(X, [X | R], R).

However this naive implementation is inefficient. enqueue() runs in O(N) when there are N values in the queue. Queueing N values successively onto an empty queue will take O(N2).

Alternatively, we may implement a functional queue using two lists, which we will call the front and back lists. For example, here is one such queue:

[a, b, c] / [f, e, d]

To enqueue a value, we prepend it to the back list. To dequeue a value, we remove it from the head of the front list. For example, we may enqueue 'g' and then dequeue 'a' and b' from the queue above, yielding this queue:

[c] / [g, f, e, d]

When dequeueing, if the front list is empty then we reverse the back list and move it to the front. For example, if we dequeue 'c' from the queue above, then the front list will become empty. The next time we dequeue, we will reverse the list [g, f, e, d] to form [d, e, f, g]. We will dequeue 'd', and the remaining items will now be at the front:

[e, f, g] / []

Here is a Prolog implementation:

empty_queue([] / []).

enqueue(X, F / B, F / [X | B]).

dequeue(X, [X | F] / B, F / B).
dequeue(X, [] / B, F / []) :- reverse(B, [X | F]).

With this implementation, enqueue() runs in O(1).

How will dequeue() perform? Suppose that we start with an empty queue, and enqueue and dequeue N values via some series of enqueue and dequeue operations. There will be some number of flips, i.e. operations in which we move values from the back to the front. Each flip of K values will take O(K), since reverse() runs in linear time. The total number of values flipped will be N, since each enqueued value will move to the front exactly once. So the total cost of all the flip operations will be O(N). And so the total time for all N dequeue operations will be O(N) (for removing values from the front list) + O(N) (for all the flip operations) = O(N). And so dequeue() will run in O(1) on average.

binary trees

It is straightforward to represent a binary tree in Prolog. We can represent an empty binary tree as the atom 'nil', and a non-empty tree as t(L, X, R), where X is the value in the root node and L and R are left and right subtrees. For example, consider this tree:

Here is a predicate defining its representation:

my_tree(T) :- T = t(t(nil, 2, nil),
                    5,
                    t(t(nil, 8, nil), 10, t(nil, 12, nil))).

It's straightforward to write a recursive predicate to count the nodes in a tree:

count(nil, 0).
count(t(L, _, R), N) :- N #>= 1, LN #>= 0, RN #>= 0, N #= LN + RN + 1,
                        count(L, LN), count(R, RN).

?- my_tree(_T), count(_T, N).
N = 5.

We may even generate all trees with a given number of nodes:

?- count(T, 3).
T = t(nil, _, t(nil, _, t(nil, _, nil))) ;
T = t(nil, _, t(t(nil, _, nil), _, nil)) ;
T = t(t(nil, _, nil), _, t(nil, _, nil)) ;
T = t(t(nil, _, t(nil, _, nil)), _, nil) ;
T = t(t(t(nil, _, nil), _, nil), _, nil)

Let's write a predicate maptree(P, T) that succeeds if a predicate P is true for every value in a tree:

maptree(_, nil).
maptree(P, t(L, X, R)) :- call(P, X), maptree(P, L), maptree(P, R).

?- count(T, 2), maptree(=(5), T).
T = t(nil, 5, t(nil, 5, nil)) ;
T = t(t(nil, 5, nil), 5, nil)

Here is a predicate that flattens a tree into a list:

flatten(nil, []).
flatten(t(L, X, R), M) :- flatten(L, FL), flatten(R, FR),
                          append([FL, [X], FR], M).

Unfortunately it's inefficient. Suppose that the tree has N nodes, and looks like a linked list descending leftward:

Then we will make N calls to append(), each of which will append a single element to the list. The total time will be O(1 + 2 + 3 + … + N) = O(N2). Alternatively, suppose that the tree is a complete binary tree with N nodes, and let T(N) be the predicate's running time. Then T(N) = 2 T(N / 2) + O(N), so T(N) = O(N log N).

The tutorial solutions below include a better implementation of flatten() that will run in linear time.

binary search trees

Let's write some predicates that operate on binary search trees. As we recall, in a binary search tree, if a node has value X, then all values in its left subtree are less than X, and all values in the right subtree are greater than X.

Here is a predicate to test membership in a tree:

% mem_tree(X, T) is true if X is contained in the binary search tree T.

mem_tree(X, t(_, X, _)).
mem_tree(X, t(L, Y, R)) :- X #< Y, mem_tree(X, L).
mem_tree(X, t(L, Y, R)) :- X #> Y, mem_tree(X, R).

Inserting into a binary search tree is straightforward:

% insert(X, T, U) is true if we can insert X into the binary search tree T
% to produce the tree U.  If X is already present in T, the predicate
% succeeds but does not insert a duplicate value.

insert(X, nil, t(nil, X, nil)).
insert(X, t(L, X, R), t(L, X, R)).
insert(X, t(L, Y, R), t(L2, Y, R)) :- X #< Y, insert(X, L, L2).
insert(X, t(L, Y, R), t(L, Y, R2)) :- X #> Y, insert(X, R, R2).

Deletion is not quite as easy, and we will not implement it here.

Of course, a disadvantage of a simple binary search tree is that it may become unbalanced. As an alternative, the Prolog standard library includes an implementation of association lists, which are dictionaries implemented using balanced binary trees.

the cut

Normally Prolog generates all possible solutions to any query. In some situations, however, we may only want a single solution. The cut operator ! causes Prolog to commit to all choices that it has already made in the current query or predicate. After a cut, Prolog will not backtrack to look for more possible solutions. For example:

?- member(X, [2, 4, 6, 8, 10]), X #> 5, !.
X = 6.

?- append(L, M, [a, b, c]), !.
L = [],
M = [a, b, c].

?-

Here are some more examples:

?- (X = 3; X = 4, !; X = 5).
X = 3 ;
X = 4.

?- (Z = 1; Z = 2), (X = 3; X = 4, !; X = 5).
Z = 1,
X = 3 ;
Z = 1,
X = 4.

Prolog will, however, continue to explore choices that appear to the right of the cut:

?- (X = 3; X = 4, !; X = 5), (Y = 10; Y = 20).
X = 3,
Y = 10 ;
X = 3,
Y = 20 ;
X = 4,
Y = 10 ;
X = 4,
Y = 20.

If a predicate is written using multiple rules, then the cut will affect it just as if it were written as a single rule using the ';' operator. In other words, after a cut no further rules will be considered. For example:

foo(X) :- X = 3.
foo(X) :- X = 4, !.
foo(X) :- X = 5.

?- foo(X).
X = 3 ;
X = 4.

?-

the cut and logical purity

Almost all of the Prolog features and predicates that we have seen until now are part of the pure logical core of the language. For example, dif(), member(), append(), select(), maplist(), foldl(), and new-style arithmetic are all logically pure.

Any program written only using these pure logical features has some very nice properties:

The cut is useful in various situations, however it is not logically pure. This means that a program that uses the cut, or features derived from the cut, will probably not have the properties above.

Let's look at a couple of examples. Consider this predicate:

kind(N, small) :- N #< 10.
kind(N, medium) :- N #>= 10, N #< 100.
kind(N, large) :- N #>= 100.

We might attempt to simplify it using the cut as follows:

kind2(N, small) :- N #< 10, !.
kind2(N, medium) :- N #< 100, !.
kind2(_, large).

With this change, any query of the form kind2(N, K) will still produce a correct value K if N is already instantiated and K is not:

?- kind2(50, K).
K = medium.

?- kind2(200, K).
K = large.
 

However, if N is uninstantiated then we may now receive answers that are incorrect:

?- kind2(N, large).
true.

The preceding answer claims that all N are big, but that is not true. Similarly, if K is already instantiated then we may also receive incorrect answers:

?- kind2(5, medium).
true.

Furthermore, the results of our query may now depend on the order in which we ask questions:

?- N = 50, K = medium, kind2(N, K).
N = 50,
K = medium.

?- kind2(N, K), N = 50, K = medium.
false.

As another example, suppose that we use the cut to write a version of member(X, L) that will succeed only once, even if X is in the list L multiple times:

mem1(X, [X | _]) :- !.
mem1(X, [_ | L]) :- mem1(X, L).

In the forward direction, it works as intended:

?- mem1(3, [2, 3, 3, 5, 4, 3]).
true.

?-

However in the reverse direction it now returns results that are incomplete:

?- mem1(3, L).
L = [3|_].

?-

As a result, once again we may get different results if we reorder goals in a query:

?- L = [2, 3, 4], mem1(3, L).
L = [2, 3, 4].

?- mem1(3, L), L = [2, 3, 4].
false.

In summary, the cut works well in a forward direction, i.e. at a moment when all variables are already instantiated. So, for example, it's reasonable to use a cut at the top level of a program that is searching for solutions in a state space, at a moment when a first solution is known and you don't want to search for more solutions. However, in situations where code may need to run multidirectionally, the cut is potentially dangerous and requires you to think about your program in a more procedural way.

negation

We may use the cut to implement negation. For example, let's write a predicate not_mem(X, L) that is true if X is not a member of L:

not_mem(X, L) :- member(X, L), !, false; true.

This predicate works as follows. If X is a member of L, then the predicate is false, and the cut prevents it from considering the other branch. Otherwise, the other branch runs and the predicate is true:

?- not_mem(3, [4, 5, 6]).
true.

?- not_mem(3, [4, 3, 6]).
false.

We may generalize this idea by writing a higher-order predicate 'not':

not(P) :- call(P), !, false; true.

?- not(member(3, [4, 5, 6])).
true.

In fact this predicate is built into Prolog, and has the name \+:

?- \+ member(3, [4, 5, 6]).
true.

Because Prolog's negation is defined using the cut, it will really only work in a forward direction. If you apply negation to a term in which all variables are already instantiated, it will behave as you expect. If variables are not instantiated, you may receive results that are unsound or incomplete. For example, consider this query:

?- \+ member(X, [2, 3, 4]).
false

This result is claiming that there are no values X which are not members of the list [2, 3, 4]. But the claim is unsound, since certainly such X do exist:

?- X = 10, \+ member(X, [2, 3, 4]).
X = 10.

As another example, suppose that we have the following predicates about restaurants:

good_food(u_kohouta).
good_food(u_krocana).

expensive(u_krocana).

inexpensive(R) :- \+ expensive(R).

I'm looking for a restaurant that has good food and is inexpensive. This query lets me find one:

?- good_food(R), inexpensive(R).
R = u_kohouta 

However if I ask the questions in the other order, the query fails:

?- inexpensive(R), good_food(R).
false

if/then/else

We may also use the cut to implement an if/then/else operation. For example, here is a predicate max() which computes the maximum Z of two values X and Y:

max(X, Y, Z) :- X #>= Y, !, Z = X; Z = Y.

?- max(4, 10, Z).
Z = 10.

?- max(7, 3, Z).
Z = 7.

If the goal 'X #>= Y' succeeds, then Z = X and the cut prevents Prolog from considering the other choice. If 'X #>= Y' fails, then Z = Y.

Prolog includes an arrow operator '->' which is defined using the cut, and lets us write this if/then/else choice with an (arguably) nicer syntax:

max(X, Y, Z) :- X #>= Y -> Z = X; Z = Y.

Once again, because this operator is defined using the cut, it will really only work in a forward direction, and may give us unsound or incomplete results in a reverse direction. For example:

?- max(4, X, 10).
false

This result is claiming that there are no values X such that the maximum of 4 and X is 10. Of course, that is wrong.

state space searching, revisited

We have already seen that we may easily use iterative deepening to find shortest solutions to state space problems in Prolog. Iterative deepening performs a series of depth-first searches of increasing depth.

Unfortunately, in some state spaces iterative deepening may be extremely inefficient. For example, let's generalize the 3 coins problem that we saw before. In the N coins problem, we begin with N coins in a row on a table, all facing heads up. We'd like to flip over as few coins as possible to reach a state in which all coins are tails up. Of course, the solution is trivial: simply flip over each coin once. However, we will treat this as a state space search problem, in which we blindly try every possible action from each state in an attempt to reach the goal. Let's adapt our previous 3 coins code to this problem, still using iterative deepening:

flip(h, t).
flip(t, h).

% move(S, T) - true if we can move from state S to state T
move(S, T) :- select(X, S, Y, T), flip(X, Y).

% path(S, T, P) - true if P is a path of states from S to T.
path(S, S, [S]).
path(S, T, [S | P]) :- move(S, S1), path(S1, T, P).

solve(N, P) :- length(Start, N), maplist(=(h), Start),
               length(Goal, N), maplist(=(t), Goal),
               length(P, _), path(Start, Goal, P).

It works:

?- solve(4, P).
P = [[h, h, h, h], [t, h, h, h], [t, t, h, h], [t, t, t, h], [t, t, t, t]] .

However as N increases this predicate becomes quite slow. For example, solve(10, P) takes minutes to run, even though there are only 210 = 1024 states in the state space.

The problem is that our iterative deeping solution is exploring every possible sequence of moves, and there are a very large number of these. For example, if N = 10, then there are 1010 possible sequences of 10 coin flips, and only a small fraction of these will reach the goal state.

breadth-first search

To solve state space search problems such as the preceding, we need a different search strategy. Specifically, a breadth-first search with a visited set will visit each state only once, so it may be far more efficient than iterative deepening.

Let's first write a predicate that can perform a breadth-first search to find the shortest path between two vertices in a graph. The breadth-first search needs a queue. For the moment, we will use a naive queue representation, i.e. a list, where we enqueue by appending to the list. We also need a visited set. For the moment, we will also use a naive visited set representation, i.e. a list.

When the breadth-first search reaches the goal, we would like to return a list of all states on the path from the start to the goal. For this reason, each element on the queue will not be a simple vertex V. Instead, it will be a list of vertices on a path from the start to V, in reverse order. Because this list is in reverse order, we can efficiently extend it by prepending any neighbor of V to it.

Here is our implementation:

neighbors(G, V, N) :- member(V -> N, G).

prepend_to(L, X, [X | L]).

% bfs(+Graph, +Queue, +Visited, +Goal, -Path)
% True if Path is a path of states leading from the start to Goal.

bfs(_, [[Goal | P1] | _], _, Goal, Path) :- reverse([Goal | P1], Path).

bfs(Graph, [[V | P1] | Q], Visited, Goal, Path) :-
    neighbors(Graph, V, Neighbors),
    subtract(Neighbors, Visited, Neighbors1), append(Neighbors1, Visited, Visited2),
    maplist(prepend_to([V | P1]), Neighbors1, Neighbors2), append(Q, Neighbors2, Q2),
    bfs(Graph, Q2, Visited2, Goal, Path).

% top-level entry point
bfs(Graph, Start, Goal, Path) :- bfs(Graph, [[Start]], [Start], Goal, Path).

breadth-first search in a state space

We can adapt the breath-first search code above to perform a state space search. However, we will need to some way to generate a list of all neighboring states for any state. The built-in predicate findall() can help. Given a variable X and a goal that includes X, findall() will find all possible solutions for X and return them in a list. For example:

?- findall(X, permutation([a, b, c], X), L).
L = [[a, b, c], [a, c, b], [b, a, c], [b, c, a], [c, a, b], [c, b, a]].

Now suppose that our state transition function is defined by a predicate move(S, T), which succeeds if T is a successor state of S. Then we can call findall(X, move(S, X), L) to produce a list of succesor states of S.

Here is a solution to the N coins puzzle, using a breadth-first search:

flip(h, t).
flip(t, h).

% move(S, T) - true if we can move from state S to state T
move(S, T) :- select(X, S, Y, T), flip(X, Y).

prepend_to(L, X, [X | L]).

% bfs(+Queue, +Visited, +Goal, -Path)
% True if Path is a path of states leading from the start to Goal.

bfs([[Goal | P1] | _], _, Goal, Path) :- reverse([Goal | P1], Path).

bfs([[S | P1] | Q], Visited, Goal, Path) :-
    findall(X, move(S, X), Neighbors),
    subtract(Neighbors, Visited, Neighbors1), append(Neighbors1, Visited, Visited2),
    maplist(prepend_to([S | P1]), Neighbors1, Neighbors2),  append(Q, Neighbors2, Q2),
    bfs(Q2, Visited2, Goal, Path).

solve(N, P) :- length(Start, N), maplist(=(h), Start),
               length(Goal, N), maplist(=(t), Goal),
               bfs([[Start]], [Start], Goal, P).

Now we can find a solution reasonably quickly even with N = 10:

?- solve(10, P).
P = [[h, h, h, h, h, h, h, h|...], [t, h, h, h, h, h, h|...], [t, t, h, h, h, h|...], [t, t, t, h, h|...], [t, t, t, t|...], [t, t, t|...], [t, t|...], [t|...], [...|...]|...] 

However this breadth-first search implementation will still be too inefficient for larger state space searches, because we have used naive data structures for the queue and visited set. We will need to use more efficient data structures if we want to search spaces with many thousands of states.