Week 11: Notes

Our topics this week included

You can read about these topics in chapters 6 ("Higher Order Functions"), 8 ("Making Our Own Types and Typeclasses") and 9 ("Input and Output") in the Learn You a Haskell book.

Abstract data types in Haskell

In many programming languages we may define abstract data types such as stacks and queues. An abstract data type defines a set of operations that exist on a data type. For example, a stack has push and pop operations, and probably an isEmpty operation as well. We can then build concrete implementations of those data types using various data structures. For example, we can implement a stack either using an array, or a linked list.

In Haskell we can use a type class to define an abstract data type, and a type to build a particular concrete implementation of that type.

For example, here's a type class representing an abstract data type for a set of objects of type a:

class Set s where
    emptySet :: s a
    insert :: Ord a => a -> s a -> s a
    member :: Ord a => a -> s a -> Bool

In this type class, s represents a parameterized type. In other words, s has kind * -> *. You can read the definitions above as follows:

Here is a data type LSet that belongs to (i.e. implements) Set. It stores a set of objects naively, as a list:

data LSet a = LSet [a]

instance Set LSet where
    emptySet = LSet []
    insert x (LSet list) =
        if elem x list then LSet list else LSet (x : list)
    member x (LSet list) = elem x list

Of course, a type in Haskell may belong to multiple type classes. Let's also make LSet implement the Show type class, so that we can print out the elements in an LSet:

instance (Show a) => Show (LSet a) where
    show (LSet list) = "{" ++ unwords (map show list) ++ "}"

Notice that LSet belongs to Set, but (LSet a) belongs to Show. This difference is because, once again, the type class Set contains types of kind * -> * (i.e. parameterized types), but Show contains types of kind *, i.e. concrete types.

Also notice the type constaint (Show a) in this instance declaration. This means that an LSet of values of type a is showable only if a itself is showable. For example, Int belongs to Show, so if we have an LSet Int then we can print it out:

> LSet [2, 3, 4]
{2 3 4}

But the function type Int -> Int does not belong to Show, so an LSet (Int -> Int) cannot be printed:

> LSet [\x -> x]

<interactive>:23:1: error:
    • No instance for (Show (p0 -> p0)) arising from a use of ‘print’

Here is another implementation of a set of objects, namely a binary search tree. It also belongs to the Set type class:

data Tree a = Nil | Node (Tree a) a (Tree a)

instance Set Tree where
    emptySet = Nil
    
    insert x Nil = Node Nil x Nil
    insert x (Node left v right)
        | x < v = Node (insert x left) v right
        | x == v = Node left v right
        | x > v = Node left v (insert x right)
        
    member x Nil = False
    member x (Node left v right)
        | x < v = member x left
        | x == v = True
        | x > v = member x right

We can write functions that work with any type belonging to the Set type class. For example, here is a function that constructs a set with a single element:

set_of x = insert x emptySet

Notice the type of this function:

> :t set_of
set_of :: (Set s, Ord a) => a -> s a

Because set_of can return any type of Set, we cannot call it without any surrounding type context:

> set_of 3

<interactive>:27:1: error:
    • Ambiguous type variable ‘s0’ arising from a use of ‘print’
      prevents the constraint ‘(Show (s0 Integer))’ from being solved.

But we can use a type annotation to tell it what kind of set we want:

> set_of 3 :: LSet Int
{3}

functional queues

We didn't discuss this topic in the lecture or tutorial, however this information will be useful for one of the homework assignments this week.

Consider the problem of making a purely functional FIFO queue in Haskell. A naive implementation will hold all queue elements in a list. We can enqueue elements at the end of the list, and dequeue at the beginning:

data Queue a = Queue [a]

isEmpty :: Queue a -> Bool
isEmpty (Queue a) = null a

enqueue :: a -> Queue a -> Queue a
enqueue x (Queue xs) = Queue (xs ++ [x])

pop :: Queue a -> (a, Queue a)
pop (Queue []) = error "queue is empty"
pop (Queue (x : xs)) = (x, Queue xs)

Unfortunately this implementation is inefficient. The operation (xs ++ [x]) runs in O(N), where N is the length of xs, since it must generate a new list that is a copy of xs with x appended at the end. So in this implementation enqueue will run in O(N). Instead, we would like to be able to enqueue elements in O(1), at least on average.

In languages such as C++ or Java we can implement a more efficient queue either using a linked list, or a circular array. But both of those implementations rely on mutable state, which we do not have in Haskell.

So we must use another trick: we can implement an efficient functional queue using a pair of stacks. (You may remember that we saw this idea before a few weeks ago in Prolog.)

For example, such a queue (we will call it Q1) might look like this:

Q1: front: 6 : 2 : []
    back: 7 : 8 : 11 : []

In the picture above, the two stacks are called front and back. The numbers 6 and 7 are at the top of these stacks.

From the head to the tail, this queue contains the values 6, 2, 11, 8, 7. In other words, the values in the queue are the values in front, followed by the values in back in reverse order.

To enqueue a value, we simply push it onto the "back" stack., which takes O(1) . If we enqueue the number 12 into the queue above, we will get a new queue Q2 that looks like this:

Q2: front: 6 : 2 : []
    back: 12 : 7 : 8 : 11 : []

As always in a functional language, Q1 remains in memory unchanged. We can construct Q2 in O(1) because actually in memory the stack 12 : 7 : 8 : 11 : [] in Q2 contains the value 12 followed by a pointer to the stack 7 : 8 : 11 : [] in Q1, whose elements are shared between Q1 and Q2. The front stack is identical in both queues, and so it is shared in memory. No copy of either stack (which could take O(N)) is made during the enqueue operation.

How can we dequeue a value? If the front stack is non-empty, we can simply pop a value from it. For example, if we dequeue the value 6 from Q2, we get Q3:

Q3: front: 2 : []
    back: 12 : 7 : 8 : 11 : []

Now can dequeue the value 2, yielding Q4:

Q4: front: []
    back: 12 : 7 : 8 : 11 : []

But we want to dequeue another value, now what? The front stack is empty, and the next value that should be dequeued is 11, which is at the bottom of the back stack!

In this situation we reverse the back stack. The first element of the reversed stack is dequeued; the remaining elements move to the front. We call this action a "flip". The dequeue operation from Q4 yields the value 11 and a stack Q5, which looks like this:

Q5: front: 8 : 7 : 12 : []
    back: []

Successive dequeue operations will yield the values 8, 7, and 12.

We can see that a dequeue operation may take O(N). However if we enqueue and dequeue a series of N values in any order, the total time for all dequeue operations will be O(N), so dequeue runs in O(1) on average. Why is this? Each of the N elements will be flipped from the back stack to the front stack exactly once. A flip operation that moves K elements to the front stack will run in O(K). So since the flip operations will move a total of N elements, the total time to perform these operations will be O(N). And so the amortized overhead of flipping is O(1) per value that was enqueued and dequeued.

You will implement this efficient functional queue in a homework exercise this week.

You may now wonder: is it possible to implement a functional queue so that enqueue and dequeue operations always run in O(1), not only on average? Somewhat surprisingly, the answer is yes, however that requires a more advanced functional data structure that is beyond the scope of this course.