Jump to content

Write Yourself a Scheme in 48 Hours/Evaluation, Part 1

From Wikibooks, open books for an open world
Write Yourself a Scheme in 48 Hours
 ← Parsing Evaluation, Part 1 Error Checking and Exceptions → 

Beginning the Evaluator

[edit | edit source]

Currently, we've just been printing out whether or not we recognize the given program fragment. We're about to take the first steps towards a working Scheme interpreter: assigning values to program fragments. We'll be starting with baby steps, but fairly soon you'll be progressing to working with computations.

Let's start by telling Haskell how to print out a string representation of the various possible LispVals:

showVal :: LispVal -> String
showVal (String contents) = "\"" ++ contents ++ "\""
showVal (Atom name) = name
showVal (Number contents) = show contents
showVal (Bool True) = "#t"
showVal (Bool False) = "#f"

This is our first real introduction to pattern matching. Pattern matching is a way of destructuring an algebraic data type, selecting a code clause based on its constructor and then binding the components to variables. Any constructor can appear in a pattern; that pattern matches a value if the tag is the same as the value's tag and all subpatterns match their corresponding components. Patterns can be nested arbitrarily deep, with matching proceeding in an inside → outside, left → right order. The clauses of a function definition are tried in textual order, until one of the patterns matches. If this is confusing, you'll see some examples of deeply-nested patterns when we get further into the evaluator.

For now, you only need to know that each clause of the above definition matches one of the constructors of LispVal, and the right-hand side tells what to do for a value of that constructor.

The List and DottedList clauses work similarly, but we need to define a helper function unwordsList to convert the contained list into a string:

showVal (List contents) = "(" ++ unwordsList contents ++ ")"
showVal (DottedList head tail) = "(" ++ unwordsList head ++ " . " ++ showVal tail ++ ")"

The unwordsList function works like the Haskell Prelude's unwords function, which glues together a list of words with spaces. Since we're dealing with a list of LispVals instead of words, we define a function that first converts the LispVals into their string representations and then applies unwords to it:

unwordsList :: [LispVal] -> String
unwordsList = unwords . map showVal

Our definition of unwordsList doesn't include any arguments. This is an example of point-free style: writing definitions purely in terms of function composition and partial application, without regard to individual values or "points". Instead, we define it as the composition of a couple of built-in functions. First, we partially-apply map to showVal, which creates a function that takes a list of LispVals and returns a list of their string representations. Haskell functions are curried: this means that a function of two arguments, like map, is really a function that returns a function of one argument. As a result, if you supply only a single argument, you get back a function of one argument that you can pass around, compose, and apply later. In this case, we compose it with unwords: map showVal converts a list of LispVals to a list of their string representations, and then unwords joins the result together with spaces.

We used the function show above. This standard Haskell function lets you convert any type that's an instance of the class Show into a string. We'd like to be able to do the same with LispVal, so we make it into a member of the class Show, defining its show method as showVal:

instance Show LispVal where show = showVal

A full treatment of type classes is beyond the scope of this tutorial; you can find more information in other tutorials and the Haskell 2010 report.

Let's try things out by changing our readExpr function so it returns the string representation of the value actually parsed, instead of just "Found value":

readExpr input = case parse parseExpr "lisp" input of
    Left err -> "No match: " ++ show err
    Right val -> "Found " ++ show val

And compile and run…

$ ghc -package parsec -o parser listing4.1.hs
$ ./parser "(1 2 2)"
Found (1 2 2)
$ ./parser "'(1 3 (\"this\" \"one\"))"
Found (quote (1 3 ("this" "one")))

Beginnings of an evaluator: Primitives

[edit | edit source]

Now, we start with the beginnings of an evaluator. The purpose of an evaluator is to map some "code" data type into some "data" data type, the result of the evaluation. In Lisp, the data types for both code and data are the same, so our evaluator will return a LispVal. Other languages often have more complicated code structures, with a variety of syntactic forms.

Evaluating numbers, strings, booleans, and quoted lists is fairly simple: return the datum itself.

eval :: LispVal -> LispVal
eval val@(String _) = val
eval val@(Number _) = val
eval val@(Bool _) = val
eval (List [Atom "quote", val]) = val

This introduces a new type of pattern. The notation val@(String _) matches against any LispVal that's a string and then binds val to the whole LispVal, and not just the contents of the String constructor. The result has type LispVal instead of type String. The underbar is the "don't care" variable, matching any value yet not binding it to a variable. It can be used in any pattern, but is especially useful with @-patterns (where you bind the variable to the whole pattern) and with simple constructor-tests where you're just interested in the tag of the constructor.

The last clause is our first introduction to nested patterns. The type of data contained by List is [LispVal], a list of LispVals. We match that against the specific two-element list [Atom "quote", val], a list where the first element is the symbol "quote" and the second element can be anything. Then we return that second element.

Let's integrate eval into our existing code. Start by changing readExpr back so it returns the expression instead of a string representation of the expression:

readExpr :: String -> LispVal
readExpr input = case parse parseExpr "lisp" input of
    Left err -> String $ "No match: " ++ show err
    Right val -> val

And then change our main function to read an expression, evaluate it, convert it to a string, and print it out. Now that we know about the >>= monad sequencing operator and the function composition operator, let's use them to make this a bit more concise:

main :: IO ()
main = getArgs >>= print . eval . readExpr . head

Here, we take the result of the getArgs action (a list of strings) and pass it into the composition of:

  1. take the first value (head);
  2. parse it (readExpr);
  3. evaluate it (eval);
  4. convert it to a string and print it (print).

Compile and run the code the normal way:

$ ghc -package parsec -o eval listing4.2.hs
$ ./eval "'atom" 
atom
$ ./eval 2
2
$ ./eval "\"a string\""
"a string"
$ ./eval "(+ 2 2)"

Fail: listing6.hs:83: Non-exhaustive patterns in function eval

We still can't do all that much useful with the program (witness the failed (+ 2 2) call), but the basic skeleton is in place. Soon, we'll be extending it with some functions to make it useful.

Adding basic primitives

[edit | edit source]

Next, we'll improve our Scheme so we can use it as a simple calculator. It's still not yet a "programming language", but it's getting close.

Begin by adding a clause to eval to handle function application. Remember that all clauses of a function definition must be placed together and are evaluated in textual order, so this should go after the other eval clauses:

eval (List (Atom func : args)) = apply func $ map eval args

This is another nested pattern, but this time we match against the cons operator ":" instead of a literal list. Lists in Haskell are really syntactic sugar for a chain of cons applications and the empty list: [1, 2, 3, 4] = 1:(2:(3:(4:[]))). By pattern-matching against cons itself instead of a literal list, we're saying "give me the rest of the list" instead of "give me the second element of the list". For example, if we passed (+ 2 2) to eval, func would be bound to "+" and args would be bound to [Number 2, Number 2].

The rest of the clause consists of a couple of functions we've seen before and one we haven't defined yet. We have to recursively evaluate each argument, so we map eval over the args. This is what lets us write compound expressions like (+ 2 (- 3 1) (* 5 4)). Then we take the resulting list of evaluated arguments, and pass it and the original function to apply:

apply :: String -> [LispVal] -> LispVal
apply func args = maybe (Bool False) ($ args) $ lookup func primitives

The built-in function lookup looks up a key (its first argument) in a list of pairs. However, lookup will fail if no pair in the list contains the matching key. To express this, it returns an instance of the built-in type Maybe. We use the function maybe to specify what to do in case of either success or failure. If the function isn't found, we return a Bool False value, equivalent to #f (we'll add more robust error-checking later). If it is found, we apply it to the arguments using ($ args), an operator section of the function application operator.

Next, we define the list of primitives that we support:

primitives :: [(String, [LispVal] -> LispVal)]
primitives = [("+", numericBinop (+)),
              ("-", numericBinop (-)),
              ("*", numericBinop (*)),
              ("/", numericBinop div),
              ("mod", numericBinop mod),
              ("quotient", numericBinop quot),
              ("remainder", numericBinop rem)]

Look at the type of primitives. It is a list of pairs, just like lookup expects, but the values of the pairs are functions from [LispVal] to LispVal. In Haskell, you can easily store functions in other data structures, though the functions must all have the same type.

Also, the functions that we store are themselves the result of a function, numericBinop, which we haven't defined yet. This takes a primitive Haskell function (often an operator section) and wraps it with code to unpack an argument list, apply the function to it, and wrap the result up in our Number constructor.

numericBinop :: (Integer -> Integer -> Integer) -> [LispVal] -> LispVal
numericBinop op params = Number $ foldl1 op $ map unpackNum params

unpackNum :: LispVal -> Integer
unpackNum (Number n) = n
unpackNum (String n) = let parsed = reads n :: [(Integer, String)] in 
                           if null parsed 
                              then 0
                              else fst $ parsed !! 0
unpackNum (List [n]) = unpackNum n
unpackNum _ = 0

As with R5RS Scheme, we don't limit ourselves to only two arguments. Our numeric operations can work on a list of any length, so (+ 2 3 4) = 2 + 3 + 4, and (- 15 5 3 2) = 15 - 5 - 3 - 2. We use the built-in function foldl1 to do this. It essentially changes every cons operator in the list to the binary function we supply, op.

Unlike R5RS Scheme, we're implementing a form of weak typing. That means that if a value can be interpreted as a number (like the string "2"), we'll use it as one, even if it's tagged as a string. We do this by adding a couple extra clauses to unpackNum. If we're unpacking a string, attempt to parse it with Haskell's built-in reads function, which returns a list of pairs of (parsed value, remaining string).

For lists, we pattern-match against the one-element list and try to unpack that. Anything else falls through to the next case.

If we can't parse the number, for any reason, we'll return 0 for now. We'll fix this shortly so that it signals an error.

Compile and run this the normal way. Note how we get nested expressions "for free" because we call eval on each of the arguments of a function:

$ ghc -package parsec -o eval listing7.hs
$ ./eval "(+ 2 2)"
4
$ ./eval "(+ 2 (-4 1))"
2
$ ./eval "(+ 2 (- 4 1))"
5
$ ./eval "(- (+ 4 6 3) 3 5 2)"
3
Exercises
  1. Add primitives to perform the various type-testing functions of R5RS: symbol?, string?, number?, etc.
  2. Change unpackNum so that it always returns 0 if the value is not a number, even if it's a string or list that could be parsed as a number.
  3. Add the symbol-handling functions from R5RS. A symbol is what we've been calling an Atom in our data constructors


Write Yourself a Scheme in 48 Hours
 ← Parsing Evaluation, Part 1 Error Checking and Exceptions →