You can read about these topics in Learn You a Haskell, chapter 13 "For a Few Monads More".
Note that the discussion of writing monads in Learn You a Haskell is slightly out of date, since it was written at a time when the Monad type class was not a subclass of Applicative. But now it is, so if you want to implement a monad then you'll need to declare that your type is an instance both of Functor and of Applicative.
As an example of this, consider writing a monad for randomized computations. To see why this could be useful, suppose we want to write a function that generates a binary tree of a given depth in which each node holds a random value from 1 to 1,000,000. We might write
data Tree a = Nil | Node (Tree a) a (Tree a) deriving (Show) randomTree :: Int -> StdGen -> (Tree Int, StdGen) randomTree 0 gen = (Nil, gen) randomTree n gen = let (left, gen1) = randomTree (n - 1) gen (x, gen2) = randomR (1, 1_000_000) gen1 (right, gen3) = randomTree (n - 1) gen2 in (Node left x right, gen3)
Notice how we have to use a series of variables gen1, gen2, gen3 to keep track of the random number generator as we pull values out of it. That's a bit awkward.
To make this easier, we'd now like to write a
monad Rand a
representing a computation
that may use random numbers, and that returns a value of type a. With
that monad, we'll be able to rewrite the above code as follows:
randomTree :: Int -> Rand (Tree Int) randomTree 0 = return Nil randomTree n = do left <- randomTree (n - 1) x <- rand_r (1, 1_000_000) right <- randomTree (n - 1) return (Node left x right)
In this version of the code, we no longer have the variables gen1 ...
gen3; instead, the monad will automatically keep track of the random
number generator. However, when the function calls itself recursively
we now need to use the <-
operator,
since the function returns a monadic value (i.e. a Rand (Tree
Int)
) and we want the value inside it (i.e. a Tree
Int
).
To implement the monad, we will need to do the following:
Declare the type Rand.
Declare that it is an instance of Functor, Applicative, and Monad.
Write the function rand_r which returns a random number.
Write a function runRand that allows us to run a randomized computation.
Let's now go through these steps. A Rand
a
is a randomized computation that returns a value of type a,
so we can represent it as a function of type StdGen
-> (a, StdGen)
. In fact we can see a variation of this type
in the declaration of the first version of randomTree above:
randomTree :: Int -> StdGen -> (Tree Int, StdGen)
So we can write
data Rand a = Rand (StdGen -> (a, StdGen))
We must now implement the Functor, Applicative, and Monad type
classes. Implementing Functor and Applicative is easy, because the
Haskell library includes functions liftM
and ap
that can implement fmap
and (<*>
) automatically given an
implementation of the >>= operator:
instance Functor Rand where fmap = liftM instance Applicative Rand where pure x = Rand (\gen -> (x, gen)) (<*>) = ap
In the Monad type class we must implemement >>=, which is slightly trickier. Recall that >>= has this type:
(>>=) :: Monad m => m a -> (a -> m b) -> m b
The call (Rand r >>= f
) should
produce a function that does three things:
Run the randomized computation r, which will produce a value x.
Call (f x), which will produce another randomized computation r1.
Run the randomized computation r1.
Effectively our function will glue together two randomized computations by producing a function that runs them both and passes a random number generator between them.
Here is the implementation:
instance Monad Rand where return = pure Rand r >>= f = Rand $ \gen -> let (x, gen1) = r gen Rand r1 = f x in r1 gen1
Now let's write rand_r. As we can see in randomTree above, we want to be able to use it to generate a random number in a given range:
x <- rand_r (1, 1_000_000)
Its type will be
rand_r :: (Int, Int) -> Rand Int
since it uses randomness to produce an Int. Now, we know that a Rand
Int
holds a function of type StdGen ->
(Int, StdGen)
. And actually if we call the built-in function
randomR and give it a range, it will produce a function of exactly
the right type, since randomR has type
randomR :: RandomGen g => (r, r) -> g -> (r, g)
So we can just write
rand_r (lo, hi) = Rand (randomR (lo, hi))
Or, if we want to be clever:
rand_r = Rand . randomR
Finally, we can write a top-level function runRand that will run an randomized computation using the standard random number generator:
runRand :: Rand a -> IO a runRand (Rand r) = do gen <- newStdGen let (x, _) = r gen return x
Let's try it:
> runRand (randomTree 2) Node (Node Nil 55741 Nil) 75990 (Node Nil 411414 Nil)
Parsing is an important and useful task in writing many programs. Given a grammar, there are many ways that we can write a parser for it. For example, we can write a recursive-descent parser by hand, or can use a parser generator (such as the classic UNIX tool yacc) that reads a grammar and outputs code that will parse it.
One interesting way to write a parser is using parser combinators. With this approach, a parser is an object that can parse a particular grammatical item. A combinator is a function that takes one or more parsers as arguments, and returns a parser that combines or transforms the given parser(s) in some way. Using combinators, we can write a parser for a complex grammar by combining smaller parsers in a systematic way.
Haskell includes a parser combinator library
called Parsec. In Parsec, the Parser type is a monad, so you can
combine parsers using do
blocks and
operators such as <$>
. (Recall
that every monad is a functor, so functor operators such as <$>
work on monads.) Parsec first appeared around 2000, and has been
quite popular and influential - in fact, it has been ported to
various other programming languages.
It takes some time to learn Parsec. However it is a powerful tool, and in fact parser combinators are generally my preferred approach for writing any kind of parser.
The official Parsec documentation can be a bit difficult to approach at first. Our quick reference for the Haskell library includes some useful Parsec functions, and may be an easier starting point.
In Parsec, the type Parser a
is a parser that can parse and return a value of type a. Note that
any parser may succeed or fail. If it sees the text it expects, it
will succeed and return a value. If it does not, it will fail and
won't return anything.
To get started with Parsec, let's write a parser
that can parse a decimal integer. The built-in parser digit
has type Parser
Char, and will parse
and return an ASCII digit. The built-in combinator many1
:: Parser a -> Parser [a]
takes a parser p that produces a
value of type a. (many1 p
) produces a
parser that will run p for as many times as it succeeds, and will
return a list of all the returned values. p must succeed at least
once; otherwise many1 will fail. So we might write
int :: Parser String int = many1 digit
We can use the top-level function parse
to run a parser. It has this type:
parse :: Parser a -> String -> String -> Either ParseError a
The first string argument to parse
is a
filename to use in error reporting; we can just use the empty string
"". The second string argument is the text to parse. parse
returns an Either
, which is either a
Left
that holds an error, or a Right
that holds the value of a successful parse. Let's run our parser int
:
> parse int "" "33259" Right "33259"
OK, that wasn't too exciting - the parser just produced the same string it was given! Let's modify our parser so that it will return an Int. We might write
int :: Parser Int int = do s <- many1 digit return (read s)
That will work:
> parse int "" "33259" Right 33259
Or instead of the do
block above we may
use <$>
, which is equivalent:
int :: Parser Int int = read <$> many1 digit
Now let's write a parser that can parse arithmetic expressions such
as "2 * (3 + 4 * 5)
". Our
expressions will include integer constants plus the operators +
,
-
, *
, and
/
. We'd like *
and /
to have higher precedence than +
and -. All operators will be left-associative. We'll use this
datatype for representing expressions:
data Expr = Const Int | OpExpr Char Expr Expr deriving (Show)
As one possible approach, we could write a context-free grammar that
expresses our desired operator precedence, with separate non-terminal
symbols for each precedence level. Then we could translate that
grammar into Parsec code. However, there is an easier way. Parsec
includes a combinator called buildExpressionParser
that takes a table of operators with precedences plus a parser for
individual terms, and produces a parser that can parse entire
expressions. So let's use that.
Parsec includes this type for operator associativities:
data Assoc = AssocNone | AssocLeft | AssocRight
Each element of our operator table will have this type:
data Operator a = Infix (Parser (a -> a -> a)) Assoc | Prefix (Parser (a -> a)) | Postfix (Parser (a -> a))
(Note: This is a simplication; the actual Operator type constructor takes more arguments, which you can see in the official Parsec documentation. In our code here we will not give the operator table a type, which will let us avoid dealing with these extra arguments.)
Consider the operator *
.
In the operator table, we need to provide a parser that parses the
character '*
', and returns a function
that can combine two expressions. We could write it using do
:
parse_mul :: Parser (Expr -> Expr -> Expr) parse_mul = do char '*' return (OpExpr '*')
However it will be easier to use the built-in operator $>
,
which is equivalent:
parse_mul :: Parser (Expr -> Expr -> Expr) parse_mul = char '*' $> OpExpr '*'
Note that you'll need to import Data.Functor
to get $>
. This operator will work on
any functor type (including any monad, of course). You can think of
it as replacing the return value with something else:
> Just 4 $> 10 Just 10 > [2, 3, 4] $> 10 [10,10,10]
Instead of writing a version of parse_mul
for each operator, we'll write a function infix_op
that we can reuse. Here is our operator table:
infix_op c = Infix (sym c $> OpExpr c) AssocLeft table = [ [ infix_op '*', infix_op '/' ], [ infix_op '+', infix_op '-' ] ]
The table is a list of lists. Each sublist contains operators with the same precedence, and the sublists are in order of decreasing precedence.
Finally, let's present the complete parser. Here is the code:
num :: Parser Expr num = Const . read <$> many1 digit infix_op c = Infix (char c $> OpExpr c) AssocLeft table = [ [ infix_op '*', infix_op '/' ], [ infix_op '+', infix_op '-' ] ] term :: Parser Expr term = num <|> char '(' *> expr <* char ')' expr :: Parser Expr expr = buildExpressionParser table term
Above, the parser term
uses the <|>
operator, which represents a choice. Each term is either a constant
integer (represented by num
), or an
expression surrounded by parentheses. Notice the operators *>
and <*
, which we've used to combine
the parser expr
with the parsers for the
left and right parentheses. You can think of *>
and <*
as pointing to the parser
whose return value we want; the other return values will be ignored.
Note that
char '(' *> expr <* char ')'
is equivalent to
do char '(' e <- expr char ')' return e
In fact *> and <* will work with any Applicative (including monads, of course):
> Just 3 *> Just 4 Just 4 > Nothing *> Just 4 Nothing
Our parser works:
> parse expr "" "(2+3)*(3-2)" Right (OpExpr '*' (OpExpr '+' (Const 2) (Const 3)) (OpExpr '-' (Const 3) (Const 2)))
However notice that it will not accept whitespace:
> parse expr "" "(2 + 2)" Left (line 1, column 3): unexpected " " expecting digit, operator or ")"
So let's extend the parser to ignore whitespace. In theory we could
just first remove all whitespace from the input string before
parsing, but for many grammars that won't work; for example, if we
are parsing Haskell code then "f x
"
is certainly not the same as "fx
".
So let's use a more sophisticated approach. We'll modify our
low-level character parsers so that they will ignore any leading
spaces.
The built-in parser spaces
will skip 0 or more whitespace characters, and returns (). So we
might write
sym :: Char -> Parser Char sym c = spaces *> char c
and then call e.g. sym '*'
when we want
to parse a '*
' character.
Unfortunately that won't work. The problem is that
this parser may consume some whitespace even when it fails. For
example, suppose that the expression parser calls (sym
'*'
) to try to parse a '*
', but
actually the next characters in the input stream are a '
' and then a '+
'. Inside sym
,
the call to spaces
will consume the '
', and then the call to char '*'
will fail. So sym
will also fail, in a
state in which the space character '
'
has been consumed.
An important principle in Parsec is that the
choice p <|> q
will try q only if
p failed without consuming any input. This may seem surprising
as first, but actually it makes Parsec efficient because it means
that by default Parsec will never backtrack after successfully
matching any input character.
In fact, when parsing many programming languages we will never need to backtrack after successfully matching an input token (which may actually consist of several characters). Specifically, these are languages whose context-free grammars belong to the LL(1) or LR(1) classes, though this is a subject for a compilers class.
Furthermore, unrestricted backtracking may often lead to confusing error messages when parsing fails due to a syntax error in the input. So Parsec's approach generally improves the error messages that it produces.
Now, in some cases we may actually want Parsec to
backtrack after a successful match. For example, if sym
'*'
matches some whitespace but then fails, we certainly want
the expression parser to try other possible operators. To ask for
this behavior, we can use the built-in combinator try
.
try p
will succeed if p succeeds. If p
fails, then try p
will fail without
consuming any input. In other words, it will undo any input
consumption that occurred while processing p.
Using try
in this
way, here's an updated parser that will ignore whitespace in its
input:
sym :: Char -> Parser Char sym c = try (spaces *> char c) num :: Parser Expr num = Const . read <$> try (spaces *> many1 digit) infix_op c = Infix (char c $> OpExpr c) AssocLeft table = [ [ infix_op '*', infix_op '/' ], [ infix_op '+', infix_op '-' ] ] term :: Parser Expr term = num <|> sym '(' *> expr <* sym ')' expr :: Parser Expr expr = buildExpressionParser table term
Let's run it:
> parse expr "" "(2 + 3) * (3 - 2)" Right (OpExpr '*' (OpExpr '+' (Const 2) (Const 3)) (OpExpr '-' (Const 3) (Const 2)))