Write Yourself a Scheme in 48 Hours/Adding Variables and Assignment
Finally, we get to the good stuff: variables. A variable lets us save the result of an expression and refer to it later. In Scheme, a variable can also be reset to new values, so that its value changes as the program executes. This presents a complication for Haskell, because the execution model is built upon functions that return values, but never change them.
Nevertheless, there are several ways to simulate state in Haskell, all involving monads. The simplest is probably the state monad, which lets you hide arbitrary state within the monad and pass it around behind the scenes. You specify the state type as a parameter to the monad (e.g. if a function returns an integer but modifies a list of string pairs, it would have type State [(String, String)] Integer
), and access it via the get and put functions, usually within a do-block. You'd specify the initial state via the runState myStateAction initialList
, which returns a pair containing the return value and the final state.
Unfortunately, the state monad doesn't work well for us, because the type of data we need to store is fairly complex. For a simple top-level environment, we could get away with [(String, LispVal)]
, storing mappings from variable names to values. However, when we start dealing with function calls, these mappings become a stack of nested environments, arbitrarily deep. And when we add closures, environments might get saved in an arbitrary Function value, and passed around throughout the program. In fact, they might be saved in a variable and passed out of the runState
monad entirely, something we're not allowed to do.
Instead, we use a feature called state threads, letting Haskell manage the aggregate state for us. This lets us treat mutable variables as we would in any other programming language, using functions to get or set variables. There are two flavors of state threads: the ST
monad creates a stateful computation that can be executed as a unit, without the state escaping to the rest of the program. The IORef
module lets you use stateful variables within the IO monad. Since our state has to be interleaved with IO anyway (it persists between lines in the REPL, and we will eventually have IO functions within the language itself), we'll be using IORef
s.
We can start out by importing Data.IORef
and defining a type for our environments:
import Data.IORef
type Env = IORef [(String, IORef LispVal)]
This declares an Env
as an IORef
holding a list that maps String
s to mutable LispVal
s. We need IORef
s for both the list itself and for individual values because there are two ways that the program can mutate the environment. It might use set!
to change the value of an individual variable, a change visible to any function that shares that environment (Scheme allows nested scopes, so a variable in an outer scope is visible to all inner scopes). Or it might use define
to add a new variable, which should be visible on all subsequent statements.
Since IORef
s can only be used within the IO monad, we'll want a helper action to create an empty environment. We can't just use the empty list []
because all accesses to IORef
s must be sequenced, and so the type of our null environment is IO Env
instead of just plain Env
:
nullEnv :: IO Env
nullEnv = newIORef []
From here, things get a bit more complicated, because we'll be simultaneously dealing with two monads. Remember, we also need an Error
monad to handle things like unbound variables. The parts that need IO functionality and the parts that may throw exceptions are interleaved, so we can't just catch all the exceptions and return only normal values to the IO monad.
Haskell provides a mechanism known as monad transformers that lets you combine the functionality of multiple monads. We'll be using one of these, ExceptT
, which lets us layer error-handling functionality on top of the IO monad. Our first step is to create a type synonym for our combined monad:
type IOThrowsError = ExceptT LispError IO
Like ThrowsError
, IOThrowsError
is really a type constructor: we've left off the last argument, the return type of the function. However, ExceptT
takes one more argument than plain old Either
: we have to specify the type of monad that we're layering our error-handling functionality over. We've created a monad that may contain IO actions that throw a LispError
.
We have a mix of ThrowsError
and IOThrowsError
functions, but actions of different types cannot be contained within the same do-block, even if they provide essentially the same functionality. Haskell already provides a mechanism, lifting, to bring values of the lower type (IO
) into the combined monad. Unfortunately, there's no similar support to bring a value of the untransformed upper type into the combined monad, so we need to write it ourselves:
liftThrows :: ThrowsError a -> IOThrowsError a
liftThrows (Left err) = throwError err
liftThrows (Right val) = return val
This destructures the Either
type and either re-throws the error type or returns the ordinary value. Methods in typeclasses resolve based on the type of the expression, so throwError
and return
(members of MonadError
and Monad
, respectively) take on their IOThrowsError
definitions. Incidentally, the type signature provided here is not fully general: if we'd left it off, the compiler would have inferred liftThrows :: (MonadError m a) => Either e a -> m a
.
We'll also want a helper function to run the whole top-level IOThrowsError
action, returning an IO action. We can't escape from the IO monad, because a function that performs IO has an effect on the outside world, and you don't want that in a lazily-evaluated pure function. But you can run the error computation and catch the errors.
runIOThrows :: IOThrowsError String -> IO String
runIOThrows action = runExceptT (trapError action) >>= return . extractValue
This uses our previously-defined trapError
function to take any error values and convert them to their string representations, then runs the whole computation via runExceptT
. The result is passed into extractValue
and returned as a value in the IO monad.
Now we're ready to return to environment handling. We'll start with a function to determine if a given variable is already bound in the environment, necessary for proper handling of define
:
isBound :: Env -> String -> IO Bool
isBound envRef var = readIORef envRef >>= return . maybe False (const True) . lookup var
This first extracts the actual environment value from its IORef
via readIORef
. Then we pass it to lookup to search for the particular variable we're interested in. lookup returns a Maybe
value, so we return False
if that value was Nothing
and True
otherwise (we need to use the const
function because maybe
expects a function to perform on the result and not just a value). Finally, we use return to lift that value into the IO monad. Since we're just interested in a true/false value, we don't need to deal with the actual IORef
that lookup returns.
Next, we'll want to define a function to retrieve the current value of a variable:
getVar :: Env -> String -> IOThrowsError LispVal
getVar envRef var = do env <- liftIO $ readIORef envRef
maybe (throwError $ UnboundVar "Getting an unbound variable" var)
(liftIO . readIORef)
(lookup var env)
Like the previous function, this begins by retrieving the actual environment from the IORef
. However, getVar
uses the IOThrowsError
monad, because it also needs to do some error handling. As a result, we need to use the liftIO
function to lift the readIORef
action into the combined monad. Similarly, when we return the value, we use liftIO . readIORef
to generate an IOThrowsError
action that reads the returned IORef
. We don't need to use liftIO
to throw an error, however, because throwError
is a defined for the MonadError
typeclass, of which ExceptT
is an instance.
Now we create a function to set values:
setVar :: Env -> String -> LispVal -> IOThrowsError LispVal
setVar envRef var value = do env <- liftIO $ readIORef envRef
maybe (throwError $ UnboundVar "Setting an unbound variable" var)
(liftIO . (flip writeIORef value))
(lookup var env)
return value
Again, we first read the environment out of its IORef
and run a lookup on it. This time, however, we want to change the variable instead of just reading it. The writeIORef
action provides a means for this, but takes its arguments in the wrong order (ref -> value instead of value -> ref). So we use the built-in function flip
to switch the arguments of writeIORef
around, and then pass it the value. Finally, we return the value we just set, for convenience.
We'll want a function to handle the special behavior of define
, which sets a variable if already bound or creates a new one if not. Since we've already defined a function to set values, we can use it in the former case:
defineVar :: Env -> String -> LispVal -> IOThrowsError LispVal
defineVar envRef var value = do
alreadyDefined <- liftIO $ isBound envRef var
if alreadyDefined
then setVar envRef var value >> return value
else liftIO $ do
valueRef <- newIORef value
env <- readIORef envRef
writeIORef envRef ((var, valueRef) : env)
return value
It's the latter case that's interesting, where the variable is unbound. We create an IO action (via do-notation) that creates a new IORef
to hold the new variable, reads the current value of the environment, then writes a new list back to that variable consisting of the new (key, variable) pair added to the front of the list. Then we lift that whole do-block into the IOThrowsError
monad with liftIO
.
There's one more useful environment function: being able to bind a whole bunch of variables at once, as happens when a function is invoked. We might as well build that functionality now, though we won't be using it until the next section:
bindVars :: Env -> [(String, LispVal)] -> IO Env
bindVars envRef bindings = readIORef envRef >>= extendEnv bindings >>= newIORef
where extendEnv bindings env = liftM (++ env) (mapM addBinding bindings)
addBinding (var, value) = do ref <- newIORef value
return (var, ref)
This is perhaps more complicated than the other functions, since it uses a monadic pipeline (rather than do-notation) and a pair of helper functions to do the work. It's best to start with the helper functions. addBinding takes a variable name and value, creates an IORef
to hold the new variable, and then returns the name–value pair. extendEnv
calls addBinding
on each member of bindings (mapM
) to create a list of (String, IORef LispVal)
pairs, and then appends the current environment to the end of that (++ env)
. Finally, the whole function wires these functions up in a pipeline, starting by reading the existing environment out of its IORef
, then passing the result to extendEnv
, then returning a new IORef
with the extended environment.
Now that we have all our environment functions, we need to start using them in the evaluator. Since Haskell has no global variables, we'll have to thread the environment through the evaluator as a parameter. While we're at it, we might as well add the set!
and define
special forms.
eval :: Env -> LispVal -> IOThrowsError LispVal
eval env val@(String _) = return val
eval env val@(Number _) = return val
eval env val@(Bool _) = return val
eval env (Atom id) = getVar env id
eval env (List [Atom "quote", val]) = return val
eval env (List [Atom "if", pred, conseq, alt]) =
do result <- eval env pred
case result of
Bool False -> eval env alt
otherwise -> eval env conseq
eval env (List [Atom "set!", Atom var, form]) =
eval env form >>= setVar env var
eval env (List [Atom "define", Atom var, form]) =
eval env form >>= defineVar env var
eval env (List (Atom func : args)) = mapM (eval env) args >>= liftThrows . apply func
eval env badForm = throwError $ BadSpecialForm "Unrecognized special form" badForm
Since a single environment gets threaded through a whole interactive session, we need to change a few of our IO functions to take an environment.
evalAndPrint :: Env -> String -> IO ()
evalAndPrint env expr = evalString env expr >>= putStrLn
evalString :: Env -> String -> IO String
evalString env expr = runIOThrows $ liftM show $ (liftThrows $ readExpr expr) >>= eval env
We need the runIOThrows
in evalString because the type of the monad has changed from ThrowsError
to IOThrowsError
. Similarly, we need a liftThrows
to bring readExpr
into the IOThrowsError
monad.
Next, we initialize the environment with a null variable before starting the program:
runOne :: String -> IO ()
runOne expr = nullEnv >>= flip evalAndPrint expr
runRepl :: IO ()
runRepl = nullEnv >>= until_ (== "quit") (readPrompt "Lisp>>> ") . evalAndPrint
We've created an additional helper function runOne to handle the single-expression case, since it's now somewhat more involved than just running evalAndPrint
. The changes to runRepl
are a bit more subtle: notice how we added a function composition operator before evalAndPrint. That's because evalAndPrint
now takes an additional Env
parameter, fed from nullEnv
. The function composition tells until_
that instead of taking plain old evalAndPrint
as an action, it ought to apply it first to whatever's coming down the monadic pipeline, in this case the result of nullEnv
. Thus, the actual function that gets applied to each line of input is (evalAndPrint env)
, just as we want it.
Finally, we need to change our main function to call runOne
instead of evaluating evalAndPrint
directly:
main :: IO ()
main = do args <- getArgs
case length args of
0 -> runRepl
1 -> runOne $ args !! 0
otherwise -> putStrLn "Program takes only 0 or 1 argument"
And we can compile and test our program:
$ ghc -package parsec -o lisp [../code/listing8.hs listing8.hs] $ ./lisp Lisp>>> (define x 3) 3 Lisp>>> (+ x 2) 5 Lisp>>> (+ y 2) Getting an unbound variable: y Lisp>>> (define y 5) 5 Lisp>>> (+ x (- y 2)) 6 Lisp>>> (define str "A string") "A string" Lisp>>> (< str "The string") Invalid type: expected number, found "A string" Lisp>>> (string<? str "The string") #t