Our topics this week included
the $ operator
defining type classes
functors
'do'
input and output
You can read about these topics in chapters 6 ("Higher Order Functions"), 8 ("Making Our Own Types and Typeclasses") and 9 ("Input and Output") in the Learn You a Haskell book.
In many
programming languages we may define abstract
data types such
as stacks and queues. An abstract data type defines a set of
operations that exist on a data type. For example, a stack has push
and pop
operations, and probably an
isEmpty
operation
as well. We can then build concrete
implementations of
those data types using various data structures. For example, we can
implement a stack either using an array, or a linked list.
In Haskell we can use a type class to define an abstract data type, and a type to build a particular concrete implementation of that type.
For example, here's a type class representing an abstract data type for a set of objects of type a:
class Set s where emptySet :: s a insert :: Ord a => a -> s a -> s a member :: Ord a => a -> s a -> Bool
In this type class, s represents
a parameterized type. In
other words, s has kind *
-> *
. You
can read the definitions above as follows:
emptySet
is
a set of values of type a. Note that emptySet
is
not a function - it is a constant! But it is a polymorphic
constant,
meaning that it will have different values in different
implementations of the type class!
insert
takes
a value to insert (of type a) and a set of values of type a, and
returns a new set that is like the old set, but also contains the
inserted value. Because Haskell is a purely functional language,
insertion must work this way - it is simply not possible to modify
the previously existing set.
The type
constraint Ord a
exists
here because implementations of the type class may need to use
comparison operations (e.g. <, ==, >) to decide how
to perform the insertion operation, and only members of Ord
are guaranteed to have such
operations.
member
takes
a value to look for (of type a) and a set of values of type a, and
returns True
if
the value is present in the set.
Here is a data type LSet
that belongs
to (i.e. implements) Set
. It stores
a set of objects naively, as a list:
data LSet a = LSet [a] instance Set LSet where emptySet = LSet [] insert x (LSet list) = if elem x list then LSet list else LSet (x : list) member x (LSet list) = elem x list
Of course, a type in Haskell
may belong to multiple type classes. Let's also make LSet
implement the Show
type class, so that we can
print out the elements in an LSet
:
instance (Show a) => Show (LSet a) where show (LSet list) = "{" ++ unwords (map show list) ++ "}"
Notice that LSet
belongs to Set
, but (LSet
a)
belongs to Show
. This difference is because,
once again, the type class Set
contains types of kind *
-> *
(i.e. parameterized types), but Show
contains types of kind *
, i.e. concrete types.
Also notice the type constaint (Show a)
in this instance declaration. This means that an LSet
of
values of type a is showable only if a itself is showable. For
example, Int
belongs to Show
, so if we have
an LSet Int
then we can print it out:
> LSet [2, 3, 4] {2 3 4}
But the function type Int -> Int
does not belong to
Show, so an LSet (Int -> Int)
cannot be printed:
> LSet [\x -> x] <interactive>:23:1: error: • No instance for (Show (p0 -> p0)) arising from a use of ‘print’
Here is another
implementation of a set of objects, namely a binary
search tree. It
also belongs to the Set
type
class:
data Tree a = Nil | Node (Tree a) a (Tree a) instance Set Tree where emptySet = Nil insert x Nil = Node Nil x Nil insert x (Node left v right) | x < v = Node (insert x left) v right | x == v = Node left v right | x > v = Node left v (insert x right) member x Nil = False member x (Node left v right) | x < v = member x left | x == v = True | x > v = member x right
We can write functions that
work with any type belonging to the Set
type
class. For example, here is a function that constructs a set with a
single element:
set_of x = insert x emptySet
Notice the type of this function:
> :t set_of set_of :: (Set s, Ord a) => a -> s a
Because set_of
can return any type of Set
,
we cannot call it without any surrounding type context:
> set_of 3 <interactive>:27:1: error: • Ambiguous type variable ‘s0’ arising from a use of ‘print’ prevents the constraint ‘(Show (s0 Integer))’ from being solved.
But we can use a type annotation to tell it what kind of set we want:
> set_of 3 :: LSet Int {3}
We didn't discuss this topic in the lecture or tutorial, however this information will be useful for one of the homework assignments this week.
Consider the problem of making a purely functional FIFO queue in Haskell. A naive implementation will hold all queue elements in a list. We can enqueue elements at the end of the list, and dequeue at the beginning:
data Queue a = Queue [a] isEmpty :: Queue a -> Bool isEmpty (Queue a) = null a enqueue :: a -> Queue a -> Queue a enqueue x (Queue xs) = Queue (xs ++ [x]) pop :: Queue a -> (a, Queue a) pop (Queue []) = error "queue is empty" pop (Queue (x : xs)) = (x, Queue xs)
Unfortunately this implementation is
inefficient. The operation (xs ++ [x]
) runs in O(N),
where N is the length of xs, since it must generate a new list that
is a copy of xs with x appended at the end. So in this implementation
enqueue
will run in O(N). Instead, we would
like to be able to enqueue elements in O(1), at least on
average.
In languages such as C++ or Java we can implement a more efficient queue either using a linked list, or a circular array. But both of those implementations rely on mutable state, which we do not have in Haskell.
So we must use another trick: we can implement an efficient functional queue using a pair of stacks. (You may remember that we saw this idea before a few weeks ago in Prolog.)
For example, such a queue (we will call it Q1) might look like this:
Q1: front: 6 : 2 : [] back: 7 : 8 : 11 : []
In the picture above, the two stacks are called front
and back
. The
numbers 6 and 7 are at the top of these stacks.
From the head to the tail, this
queue contains the values 6, 2, 11, 8, 7. In other words, the values
in the queue are the values in front
,
followed by the values in back
in reverse
order.
To enqueue a value, we simply push it onto the "back" stack., which takes O(1) . If we enqueue the number 12 into the queue above, we will get a new queue Q2 that looks like this:
Q2: front: 6 : 2 : [] back: 12 : 7 : 8 : 11 : []
As always in a functional language, Q1 remains in memory unchanged.
We can construct Q2 in O(1) because actually in
memory the stack 12 : 7 : 8 : 11 : [] in Q2 contains the value 12
followed by a pointer to the stack 7 : 8 : 11 : [] in Q1, whose
elements are shared between Q1 and Q2. The front
stack is identical in both
queues, and so it is shared in memory. No copy of either stack (which
could take O(N)) is made during the enqueue
operation.
How can
we dequeue a value? If the front
stack
is non-empty, we can simply pop a value from it. For example, if we
dequeue the value 6 from Q2, we get Q3:
Q3: front: 2 : [] back: 12 : 7 : 8 : 11 : []
Now can dequeue the value 2, yielding Q4:
Q4: front: [] back: 12 : 7 : 8 : 11 : []
But we want to dequeue another value, now what? The front
stack is empty, and the next value that should be dequeued is 11,
which is at the bottom of the back
stack!
In this situation we reverse the back
stack. The first
element of the reversed stack is dequeued; the remaining
elements move to the front
. We call this
action a "flip". The
dequeue
operation
from Q4 yields the value 11 and a stack Q5, which looks like this:
Q5: front: 8 : 7 : 12 : [] back: []
Successive dequeue operations will yield the values 8, 7, and 12.
We can see that a dequeue
operation
may take O(N). However if we enqueue and dequeue a series of N values
in any order, the total time for all dequeue operations will
be O(N), so dequeue runs in O(1) on average. Why is
this? Each of the N elements will be flipped from the back stack to
the front stack exactly once. A flip operation that moves K elements
to the front stack will run in O(K). So since the flip operations
will move a total of N elements, the total time to perform these
operations will be O(N). And so the amortized overhead of flipping is
O(1) per value that was enqueued and dequeued.
You will implement this efficient functional queue in a homework exercise this week.
You may now wonder: is it possible to implement a functional queue so that enqueue and dequeue operations always run in O(1), not only on average? Somewhat surprisingly, the answer is yes, however that requires a more advanced functional data structure that is beyond the scope of this course.