Haskell/Practical monads
In this chapter, we will present some very diverse examples of monads being used for practical tasks. Consider it as bonus material, and go through it at your own pace. You can always come back later if some parts (e.g. the final example about concurrency) seem too alien right now.
Parsing monads
[edit | edit source]This section is based on the "Parsing" chapter of Jonathan Tang's Write Yourself a Scheme in 48 Hours.
In the previous chapters, we saw how monads were used for IO, and started working more extensively with some of the more rudimentary monads like Maybe
, List
or State
. Now let us try something quintessentially "practical": writing a simple parser. Monads provide a clean way of embedding a domain specific parsing language directly into Haskell without the need for external tools or code generators. For a brief and accessible presentation of the subject, we suggest the paper Functional Pearls: Monadic Parsing in Haskell, by Graham Hutton and Erik Meijer. Right now, however, is time to get our hands dirty; and for that we will be using the Parsec library, version 3 or greater.
We need an extension for this code: FlexibleContexts. This allows us to write class constraints such as (Stream s u Char) =>
, where one of the type variables is defined instead of polymorphic.
{-# LANGUAGE FlexibleContexts #-}
Start by adding this line to the import section:
import Control.Monad import Control.Monad.Identity (Identity) import System.Environment (getArgs) import Text.Parsec hiding (spaces)
This makes the Parsec library functions and getArgs available to us, except the "spaces" function, whose name conflicts with a function that we'll be defining later. In addition, the Identity monad is made available so that we can use ParsecT on Identity.
Now, we'll define a parser that recognizes one of the symbols allowed in Scheme identifiers:
symbol :: Stream s m Char => ParsecT s u m Char symbol = oneOf "!#$%&|*+-/:<=>?@^_~"
This is another example of a monad: in this case, the "extra information" that is being hidden is all the info about position in the input stream, backtracking record, first and follow sets, etc. Parsec 3 uses a monad transformer to take care of all of that for us. We need only use the Parsec library function oneOf (see Text.Parsec.Char), and it'll recognize a single one of any of the characters in the string passed to it. And as you're about to see, you can compose primitive parsers into more sophisticated productions.
The type of the function is somewhat confusing. Stream s m Char defines a "stream" of Char's of type s, wrapped around monad m. Examples are of s would be String or ByteString. Accommodating both String and ByteString is the main reason for defining our function to be polymorphic around String. Parsec contains a type called Parser, but its not as polymorphic as we would normally like - it explicitly requires a stream type of String.
ParsecT defines a parser for a stream type s, state type u (we don't really need to use state, but its useful to define our functions to be polymorphic on state), inner monad m (usually Identity if we don't want to use it as a transformer) and result type Char, which is the "normal" type argument to Monads.
Let's define a function to call our parser and handle any possible errors:
readExpr :: Stream s Identity Char => s -> String readExpr input = case parse symbol "lisp" input of Left err -> "No match: " ++ show err Right val -> "Found value"
As you can see from the type signature, readExpr is a function (->) from a Stream (String or ByteString, most of the time) to a String. We name the parameter input
, and pass it, along with the symbol
action we defined above and the name of the parser ("lisp"), to the Parsec function parse.
Parse can return either the parsed value or an error, so we need to handle the error case. Following typical Haskell convention, Parsec returns an Either data type, using the Left constructor to indicate an error and the Right one for a normal value.
We use a case...of
construction to match the result of parse
against these alternatives. If we get a Left value (error), then we bind the error itself to err
and return "No match" with the string representation of the error. If we get a Right value, we bind it to val
, ignore it, and return the string "Found value".
The case...of
construction is an example of pattern matching, which we will see in much greater detail [evaluator1.html#primitiveval later on].
Finally, we need to change our main function to call readExpr and print out the result:
main :: IO ()
main = do args <- getArgs
putStrLn (readExpr (args !! 0))
To compile and run this, you need to specify "-package parsec -package mtl" on the command line, or else there will be link errors. For example:
debian:/home/jdtang/haskell_tutorial/code# ghc -package parsec -o simple_parser [../code/listing3.1.hs listing3.1.hs] debian:/home/jdtang/haskell_tutorial/code# ./simple_parser $ Found value debian:/home/jdtang/haskell_tutorial/code# ./simple_parser a No match: "lisp" (line 1, column 1): unexpected "a"
Whitespace
[edit | edit source]Next, we'll add a series of improvements to our parser that'll let it recognize progressively more complicated expressions. The current parser chokes if there's whitespace preceding our symbol:
debian:/home/jdtang/haskell_tutorial/code# ./simple_parser " %" No match: "lisp" (line 1, column 1): unexpected " "
Let's fix that, so that we ignore whitespace.
First, let's define a parser that recognizes any number of whitespace characters. Incidentally, this is why we included the "hiding (spaces)" clause when we imported Parsec: there's already a function "spaces" in that library, but it doesn't quite do what we want it to. (For that matter, there's also a parser called lexeme that does exactly what we want, but we'll ignore that for pedagogical purposes.)
spaces :: Stream s m Char => ParsecT s u m () spaces = skipMany1 space
Just as functions can be passed to functions, so can actions. Here we pass the Parser action space to the Parser action skipMany1, to get a Parser that will recognize one or more spaces.
Now, let's edit our parse function so that it uses this new parser:
readExpr input = case parse (spaces >> symbol) "lisp" input of
Left err -> "No match: " ++ show err
Right val -> "Found value"
We touched briefly on the >> ("then") operator in lesson 1, where we mentioned that it was used behind the scenes to combine the lines of a do-block. Here, we use it explicitly to combine our whitespace and symbol parsers. However, then has completely different semantics in the Parser and IO monads. In the Parser monad, then means "Attempt to match the first parser, then attempt to match the second with the remaining input, and fail if either fails." In general, then will have wildly different effects in different monads; it's intended as a general way to structure computations, and so needs to be general enough to accommodate all the different types of computations. Read the documentation for the monad to figure out precisely what it does.
Compile and run this code. Note that since we defined spaces in terms of skipMany1, it will no longer recognize a plain old single character. Instead you have to precede a symbol with some whitespace. We'll see how this is useful shortly:
debian:/home/jdtang/haskell_tutorial/code# ghc -package parsec -o simple_parser [../code/listing3.2.hs listing3.2.hs] debian:/home/jdtang/haskell_tutorial/code# ./simple_parser " %" Found value debian:/home/jdtang/haskell_tutorial/code# ./simple_parser % No match: "lisp" (line 1, column 1): unexpected "%" expecting space debian:/home/jdtang/haskell_tutorial/code# ./simple_parser " abc" No match: "lisp" (line 1, column 4): unexpected "a" expecting space
Return Values
[edit | edit source]Right now, the parser doesn't do much of anything - it just tells us whether a given string can be recognized or not. Generally, we want something more out of our parsers: we want them to convert the input into a data structure that we can traverse easily. In this section, we learn how to define a data type, and how to modify our parser so that it returns this data type.
First, we need to define a data type that can hold any Lisp value:
data LispVal = Atom String | List [LispVal] | DottedList [LispVal] LispVal | Number Integer | String String | Bool Bool
This is an example of an algebraic data type: it defines a set of possible values that a variable of type LispVal can hold. Each alternative (called a constructor and separated by |) contains a tag for the constructor along with the type of data that that constructor can hold. In this example, a LispVal can be:
- An Atom, which stores a String naming the atom
- A List, which stores a list of other LispVals (Haskell lists are denoted by brackets)
- A DottedList, representing the Scheme form (a b . c). This stores a list of all elements but the last, and then stores the last element as another field
- A Number, containing a Haskell Integer
- A String, containing a Haskell String
- A Bool, containing a Haskell boolean value
Constructors and types have different namespaces, so you can have both a constructor named String and a type named String. Both types and constructor tags always begin with capital letters.
Next, let's add a few more parsing functions to create values of these types. A string is a double quote mark, followed by any number of non-quote characters, followed by a closing quote mark:
parseString :: Stream s m Char => ParsecT s u m LispVal parseString = do char '"' x <- many (noneOf "\"") char '"' return $ String x
We're back to using the do-notation instead of the >> operator. This is because we'll be retrieving the value of our parse (returned by many (noneOf "\"")) and manipulating it, interleaving some other parse operations in the meantime. In general, use >> if the actions don't return a value, >>= if you'll be immediately passing that value into the next action, and do-notation otherwise.
Once we've finished the parse and have the Haskell String returned from many
, we apply the String constructor (from our LispVal data type) to turn it into a LispVal. Every constructor in an algebraic data type also acts like a function that turns its arguments into a value of its type. It also serves as a pattern that can be used in the left-hand side of a pattern-matching expression; we saw an example of this in [#symbols Lesson 3.1] when we matched our parser result against the two constructors in the Either data type.
We then apply the built-in function return to lift our LispVal into the Parser monad. Remember, each line of a do-block must have the same type, but the result of our String constructor is just a plain old LispVal. Return lets us wrap that up in a Parser action that consumes no input but returns it as the inner value. Thus, the whole parseString action will have type Parser LispVal.
The $ operator is infix function application: it's the same as if we'd written return (String x)
, but $ is right-associative, letting us eliminate some parentheses. Since $ is an operator, you can do anything with it that you'd normally do to a function: pass it around, partially apply it, etc. In this respect, it functions like the Lisp function apply.
Now let's move on to Scheme variables. An atom is a letter or symbol, followed by any number of letters, digits, or symbols:
parseAtom :: Stream s m Char => ParsecT s u m LispVal parseAtom = do first <- letter <|> symbol rest <- many (letter <|> digit <|> symbol) let atom = [first] ++ rest return $ case atom of "#t" -> Bool True "#f" -> Bool False otherwise -> Atom atom
Here, we introduce another Parsec combinator, the choice operator <|>. This tries the first parser, then if it fails, tries the second. If either succeeds, then it returns the value returned by that parser. The first parser must fail before it consumes any input: we'll see later how to implement backtracking.
Once we've read the first character and the rest of the atom, we need to put them together. The "let" statement defines a new variable "atom". We use the list concatenation operator ++ for this. Recall that first
is just a single character, so we convert it into a singleton list by putting brackets around it. If we'd wanted to create a list containing many elements, we need only separate them by commas.
Then we use a case statement to determine which LispVal to create and return, matching against the literal strings for true and false. The otherwise
alternative is a readability trick: it binds a variable named otherwise
, whose value we ignore, and then always returns the value of atom
.
Finally, we create one more parser, for numbers. This shows one more way of dealing with monadic values:
parseNumber :: Stream s m Char => ParsecT s u m LispVal parseNumber = liftM (Number . read) $ many1 digit
It's easiest to read this backwards, since both function application ($) and function composition (.) associate to the right. The parsec combinator many1 matches one or more of its argument, so here we're matching one or more digits. We'd like to construct a number LispVal from the resulting string, but we have a few type mismatches. First, we use the built-in function read to convert that string into a number. Then we pass the result to Number to get a LispVal. The function composition operator "." creates a function that applies its right argument and then passes the result to the left argument, so we use that to combine the two function applications.
Unfortunately, the result of many1 digit
is actually a Parser String, so our combined Number . read
still can't operate on it. We need a way to tell it to just operate on the value inside the monad, giving us back a Parser LispVal. The standard function liftM
does exactly that, so we apply liftM to our Number . read
function, and then apply the result of that to our Parser.
This style of programming - relying heavily on function composition, function application, and passing functions to functions - is very common in Haskell code. It often lets you express very complicated algorithms in a single line, breaking down intermediate steps into other functions that can be combined in various ways. Unfortunately, it means that you often have to read Haskell code from right-to-left and keep careful track of the types. We'll be seeing many more examples throughout the rest of the tutorial, so hopefully you'll get pretty comfortable with it.
Let's create a parser that accepts either a string, a number, or an atom:
parseExpr :: Stream s m Char => ParsecT s u m LispVal parseExpr = parseAtom <|> parseString <|> parseNumber
And edit readExpr so it calls our new parser:
readExpr :: String -> String
readExpr input = case parse parseExpr "lisp" input of
Left err -> "No match: " ++ show err
Right _ -> "Found value"
Compile and run this code, and you'll notice that it accepts any number, string, or symbol, but not other strings:
debian:/home/jdtang/haskell_tutorial/code# ghc -package parsec -o simple_parser [.../code/listing3.3.hs listing3.3.hs] debian:/home/jdtang/haskell_tutorial/code# ./simple_parser "\"this is a string\"" Found value debian:/home/jdtang/haskell_tutorial/code# ./simple_parser 25 Found value debian:/home/jdtang/haskell_tutorial/code# ./simple_parser symbol Found value debian:/home/jdtang/haskell_tutorial/code# ./simple_parser (symbol) bash: syntax error near unexpected token `symbol' debian:/home/jdtang/haskell_tutorial/code# ./simple_parser "(symbol)" No match: "lisp" (line 1, column 1): unexpected "(" expecting letter, "\"" or digit
Exercises |
---|
|
Recursive Parsers: Adding lists, dotted lists, and quoted datums
[edit | edit source]Next, we add a few more parser actions to our interpreter. Start with the parenthesized lists that make Lisp famous:
parseList :: Stream s m Char => ParsecT s u m LispVal parseList = liftM List $ sepBy parseExpr spaces
This works analogously to parseNumber, first parsing a series of expressions separated by whitespace (sepBy parseExpr spaces
) and then apply the List constructor to it within the Parser monad. Note too that we can pass parseExpr to sepBy, even though it's an action we wrote ourselves.
The dotted-list parser is somewhat more complex, but still uses only concepts that we're already familiar with:
parseDottedList :: Stream s m Char => ParsecT s u m LispVal parseDottedList = do head <- endBy parseExpr spaces tail <- char '.' >> spaces >> parseExpr return $ DottedList head tail
Note how we can sequence together a series of Parser actions with >> and then use the whole sequence on the right hand side of a do-statement. The expression char '.' >> spaces
returns a Parser ()
, then combining that with parseExpr gives a Parser LispVal, exactly the type we need for the do-block.
Next, let's add support for the single-quote syntactic sugar of Scheme:
parseQuoted :: Stream s m Char => ParsecT s u m LispVal parseQuoted = do char '\'' x <- parseExpr return $ List [Atom "quote", x]
Most of this is fairly familiar stuff: it reads a single quote character, reads an expression and binds it to x, and then returns (quote x)
, to use Scheme notation. The Atom constructor works like an ordinary function: you pass it the String you're encapsulating, and it gives you back a LispVal. You can do anything with this LispVal that you normally could, like put it in a list.
Finally, edit our definition of parseExpr to include our new parsers:
parseExpr :: Stream s m Char => ParsecT s u m LispVal
parseExpr = parseAtom
<|> parseString
<|> parseNumber
<|> parseQuoted
<|> do char '('
x <- (try parseList) <|> parseDottedList
char ')'
return x
This illustrates one last feature of Parsec: backtracking. parseList and parseDottedList recognize identical strings up to the dot; this breaks the requirement that a choice alternative may not consume any input before failing. The try combinator attempts to run the specified parser, but if it fails, it backs up to the previous state. This lets you use it in a choice alternative without interfering with the other alternative.
Compile and run this code:
debian:/home/jdtang/haskell_tutorial/code# ghc -package parsec -o simple_parser [../code/listing3.4.hs listing3.4.hs] debian:/home/jdtang/haskell_tutorial/code# ./simple_parser "(a test)" Found value debian:/home/jdtang/haskell_tutorial/code# ./simple_parser "(a (nested) test)" Found value debian:/home/jdtang/haskell_tutorial/code# ./simple_parser "(a (dotted . list) test)" Found value debian:/home/jdtang/haskell_tutorial/code# ./simple_parser "(a '(quoted (dotted . list)) test)" Found value debian:/home/jdtang/haskell_tutorial/code# ./simple_parser "(a '(imbalanced parens)" No match: "lisp" (line 1, column 24): unexpected end of input expecting space or ")"
Note that by referring to parseExpr within our parsers, we can nest them arbitrarily deep. Thus, we get a full Lisp reader with only a few definitions. That's the power of recursion.
Exercises |
---|
|
Stateful monads for concurrent applications
[edit | edit source]You're going to have to know about Monad transformers before you can do these things. Although the example came up because of Concurrency , if you realize a TVar is a mutable variable of some kind, why this example came up might make some sense to you.
This is a little trick that I find makes writing stateful concurrent applications easier, especially for network applications. Lets look at an imaginary stateful server.
Each currently connected client has a thread allowing the client to update the state.
The server also has a main logic thread which also transforms the state.
So you want to allow the client to update the state of the program
It's sometimes really simple and easy to expose the whole state of the program in a TVar, but I find this can get really messy, especially when the definition of the state changes!
Also it can be very annoying if you have to do anything conditional.
So to help tidy things up ( Say your state is called World )
Make a monad over state
[edit | edit source]First, make a monad over the World type
import Control.Monad.State.Lazy
-- heres yer monad -- it can liftIO too type WorldM = StateT World IO
data World = World { objects :: [ WorldObject ] }
Now you can write some accessors in WorldM
-- maybe you have a bunch of objects each with a unique id import Data.Unique import Data.Maybe import Prelude hiding ( id )
data WorldObject = WorldObject { id :: Unique }
-- check Control.Monad.State.Lazy if you are confused about get and put addObject :: WorldObject -> WorldM ( ) addObject wO = do wst <- get put $ World $ wO : ( objects wst )
-- presuming unique id getObject :: Unique -> WorldM ( Maybe WorldObject ) getObject id1 = do wst <- get return $ listToMaybe $ filter ( \ wO -> id wO == id1 ) ( objects wst )
now heres a type representing a change to the World
data WorldChange = NewObj WorldObject | UpdateObj WorldObject | -- use the unique ids as replace match RemoveObj Unique -- delete obj with named id
What it looks like all there's left to do is to
type ChangeBucket = TVar [ WorldChange ]
mainLoop :: ChangeBucket -> WorldM ( ) mainLoop cB = -- do some stuff -- it's probably fun -- using your cheeky wee WorldM accessors mainLoop cB -- recurse on the shared variable
Remember, your main thread is a transformer of World and IO so it can run 'atomically' and read the changeBucket.
Now, presuming you have a function that can incorporate a WorldChange into the existing WorldM your 'wait-for-client-input' thread can communicate with the main thread of the program, and it doesn't look too nasty.
Make the external changes to the state monadic themselves
[edit | edit source]However! Since all the state inside your main thread is now hidden from the rest of the program and you communicate through a one way channel --- data goes from the client to the server, but the mainLoop keeps its state a secret --- your client thread is never going to be able to make conditional choices about the environment - the client thread runs in IO but the main thread runs in WorldM.
So the REAL type of your shared variable is
type ChangeBucket = TVar [ WorldM ( Maybe WorldChange ) ]
This can be generated from the client-input thread, but you'll be able to include conditional statements inside the code, which is only evaluated against the state when it is run from your main thread
It all sounds a little random, but it's made my life a lot easier. Heres some real working code, based on this idea
- this takes commands from a client, and attempts change the object representing the client inside the game's state
- the output from this function is then written to a ChangeBucket ( using the ChangeBucket definition in this section, above ) and run inside the DState of the game's main loop.
( you might want to mentally substitute DState for WorldM )
-- cmd is a command generated from parsing network input mkChange :: Unique -> Command -> DState ( Maybe WorldChange ) mkChange oid cmd = do mp <- getObject oid -- this is maybe an object, as per the getObject definition earlier in the article -- enter the maybe monad return $ do p <- mp -- if its Nothing, the rest will be nothing case cmd of -- but it might be something Disconnect -> Just $ RemoveObject oid Move x y -> Just $ UpdateObject $ DO ( oid ) ( name p ) ( speed p ) ( netclient p ) ( pos p ) [ ( x , y ) ] ( method p )
A note and some more ideas.
[edit | edit source]Another design might just have
type ChangeBucket = TVar [ WorldM ( ) ]
And so just update the game world as they are run. I have other uses for the WorldM ( Maybe Change ) type.
So I conclude - All I have are my monads and my word so go use your monads imaginatively and write some computer games ;)
This section is a stub. You can help Haskell by expanding it. |