Week 12: Notes

modules
random numbers
applicative functors

You can read about these topics in Learn You a Haskell, chapters 7, 9, and 11.

Note that the System.Random module is included in the random package, which is not included in a default Haskell installation. You can install it using the Cabal package manager:

$ cabal install --lib random

packages

Haskell modules are contained in installable units called packages. The default installation of Haskell includes a number of packages including

base (which contains the fundamental modules Prelude, Data.Int, Data.List and many others)
array (which contains Data.Array)
containers (which contains various collection types such as Data.Set and Data.Map)

and many others.

You can install additional packages using Cabal, a package manager that comes with Haskell. Cabal is somewhat similar to Python's package manager pip: it can download packages from the internet and install them on your system. For example, as mentioned above you can use Cabal to install the random package:

$ cabal install --lib random

The command ghc-pkg list will list packages that are installed on your system. Its output looks like this:

$ ghc-pkg list
/home/adam/.ghcup/ghc/9.2.7/lib/ghc-9.2.7/package.conf.d
    Cabal-3.6.3.0
    array-0.5.4.0
    base-4.16.4.0
    binary-0.8.9.0
    bytestring-0.11.4.0
    containers-0.6.5.1
    deepseq-1.4.6.1
    ...

However this list is not complete: it may not include packages that you installed using Cabal. Those will be listed in a package environment file on your machine. When you run ghci, it prints the path of this file:

$ ghci
Loaded package environment from /home/adam/.ghc/x86_64-linux-9.2.7/environments/default
GHCi, version 9.2.7: https://www.haskell.org/ghc/  :? for help
Loaded GHCi configuration from /home/adam/Dropbox/ubuntu/ghci
>

On my machine, this package environment file looks like this:

$ cat /home/adam/.ghc/x86_64-linux-9.2.7/environments/default
clear-package-db
global-package-db
package-db /home/adam/.cabal/store/ghc-9.2.7/package.db
package-id ghc-9.2.7
package-id bytestring-0.11.4.0
package-id unix-2.7.2.2
...
package-id directory-1.3.6.2
package-id text-1.2.5.0
package-id random-1.2.1.1-14950f0d57c55eed8fd0701e27e87527c3c7ef42acca95164df8337d2d526ad3
$

Notice the random package at the end of the listing.

random-access lists

In a procedural language, we can read or write the array element at a given index in O(1). Haskell's lists are linked lists, so retrieving or replacing an element at the Nth index of a list will take O(N). Last week we learned that Haskell has immutable arrays. We can retrieve an element of a Haskell array in O(1), but if we want to update a single value we must create a new array, which will take O(N) if the array has N elements.

In some situations we may want a sequence of elements in which we can update the element at a given index more quickly. Let's call this a random-access list. If the size of our sequence is fixed, we may implement a random-access list using a binary tree. We can construct the tree to be balanced, so we will be able to read or update elements in O(log N). As one possibility, each tree node could contain a pair (i, v), where i is an index and v is a value. However that is wasteful: the indices form a continuous range of integers, so we don't actually need to store them all in the tree. Instead, we may build a binary tree in which the internal nodes only contain pointers to children, and all values are stored in leaves. As we descend the tree to look for a value at a particular index, we can decide whether to go left or right at each step based on the index we are seeking.

For example, suppose that our sequence contains 7 elements, namely the characters of the string "sundial". We can store them in a tree that looks like this:

The tree has an asymmetrical shape because 7 is not a power of 2, but that is fine. In the above tree, for any node at the root of a subtree with n elements, the left subtree holds (n div 2) elements and the right subtree holds the remaining elements. For example, at the root node the left subtree has (7 div 2) = 3 elements, and the right subtree has the remaining 4 elements. So if we are looking for an element with an index that is less than 3, we can go left, otherwise right.

In the tutorial we wrote a partial implementation of this structure:

data Tree a = Leaf a | Node (Tree a) (Tree a)
type IMap a = (Int, Tree a)

mk_tree :: [a] -> Tree a
mk_tree [x] = Leaf x
mk_tree xs =
    let n = length xs `div` 2
        (xs1, xs2) = splitAt n xs
    in Node (mk_tree xs1) (mk_tree xs2)

-- Build a tree containing the given elements.
mk_imap :: [a] -> IMap a
mk_imap xs = (length xs, mk_tree xs)

-- Retrieve an element by index.
iget :: IMap a -> Int -> a
iget (1, Leaf x) 0 = x
iget (n, Node left right) k
    | k < n_left = iget (n_left, left) k
    | otherwise = iget (n - n_left, right) (k - n_left)
    where n_left = n `div` 2  -- # of elements in the left subtree

-- Set an element by index.
iset :: IMap a -> Int -> a -> IMap a
...

Haskell's Data.Sequence module (in the containers package) contains a type Seq a which is a more powerful random-access list. This type allows you to read or update elements by index in O(log N), concatenate or split sequences in O(log N), and perform various other operations efficiently as well.

sets and dictionaries

How can we build a purely functional set or dictionary data structure? For a dictionary, first suppose that the keys form a fixed set, and are of a comparable type. Then we can build a balanced binary tree in which each node contains a (key, value) pair. Since the keys are fixed, the tree's shape will not change and it will always be balanced. For a set, if we want to store a dynamically changing subset of values from a known fixed set, we can similarly use a dictionary that maps values in that fixed set to a boolean.

For example, suppose that you are implementing an algorithm on a graph whose vertices are named by strings. Since the graph will not change as the algorithm runs, its vertex names are a fixed set and you could use this approach to store a visited set of vertices, or a map from vertices to integers.

In the general case, when keys may be added dynamically and are not all known at the beginning, you can implement a set or dictionary using a self-balancing binary tree structure such as a red-black or AVL tree.

Haskell's containers package includes modules Data.Set and Data.Map which provide set and map types using balanced binary trees. There are also modules Data.IntSet and Data.IntMap which are more efficient when keys are integers.

Week 12: Notes

modules random numbers applicative functors

packages

random-access lists

sets and dictionaries

modules
random numbers
applicative functors