Artwork . Games

Write Yourself a Scheme in Elm: Parsing

November 9, 2016

Simple Parser

Let’s try writing a very simple parser using the elm-combine package. You can add it to the project with elm package install Bogdanp/elm-combine.

Start by adding these lines to the import section:

import Combine
import Combine.Char as Combine
import Combine.Num as Combine
import Combine.Infix exposing (..)
import String

The easiest parser to start with is numbers. Scheme has a whole tower of number types, but for now, we will only worry about floats.

Combine provides some built-in parsers: for example, upper and digit are functions that match upper case letters and single number digits respectively. We might be able to combine multiple digit parsers to provide a simple number parser, but the “Combine.Num” module already provides a parser (float) that matches both positive and negative float numbers of any length.

The type of the float parser is not a function that parses floats, but Parser Float. You will need to provide it, as well as an input string, to Combine.parse to run the parser.

Let’s take a look at the full type of parse:

Parser res -> String -> Result (List String) res, Context)

You can ignore the Context type for now, since it is used internally to track the current state of the parser, like where the parsed value is in the input string.

A Result is the result of a computation that may fail and is a common way to manage errors in Elm. If parsing succeeds with the float parser, the value will be Ok value where value has type Float, and if it fails the value will be Err messages where messages is a list of strings.

Let’s define a function to call parse and handle any possible errors:

readExpression : String -> String
readExpression input =
    case Combine.parse float input of
        (Ok value, _) ->
            "Found value: " ++ (toString value)
        (Err messages, _) ->
            "No match: " ++ (String.join "\n" messages)

We use a case...of construction to pattern match the result of parse against the possible alternatives. If we get an Ok value, then we bind it to value, convert it to a string, and append it to the message “Found value: “. If we get an Err value, we bind it to messages, convert the list to a string using the String.join function, and append it to “No match: “.

Finally, we need to change our main function to call readExpression and output the result:

main =
    text (readExpression "123.45")

When you reload the web page, there should be a message:

Found value: 123.45

Exercises

  1. Try changing the input string to something that is not a float, for instance “!” or “123”, and see what happens to the output.

Combine Parsers

What if you also wanted to be able to handle numbers that do not contain a decimal point? Combine comes with another numerical parser called int that does just that.

In order to combine the two parsers you can use the built-in choice function:

choice : List (Parser res) -> Parser res

It takes a list of parsers of the same type and will try each one in order until one of them succeeds. The types of float and int are not the same, though, so how can you combine them with choice?

Typically, to convert a Float value to an Int, all you would need to do is use the built-in toFloat function, but the type of int is Parser Int. You need a way of transforming the value inside the parser.

Combine provides a function called map that you can think of like List.map, which modifies each value in a list and returns a new list with the updated values; Combine.map modifies the value inside a parser, returning a new parser.

The resulting value is a parser that will recognize floats or integers and return a float:

numberParser : Parser Float
numberParser =
    Combine.choice [ Combine.float, Combine.map toFloat Combine.int ]

Expressions

There are many other types of Scheme expressions other than numbers. To combine them with choice, you will need a more expressive type than Float.

In Elm, a type formed by combining other types is an algebraic data type:

type Expression
    = ENumber Float
    | EBool Bool
    | EString String
    | EIdentifier String
    | EList (List Expression)

Each alternative (called a constructor and separated by |) contains a tag for the constructor along with the type of data that that constructor can hold. In this example, an Expression can be:

  1. ENumber, containing an Elm float.
  2. EBool, containing an Elm boolean.
  3. EString, containing an Elm string.
  4. EIdentifier, containing an Elm string.
  5. EList, containing a list of expressions.

You can see that constructors can be recursive, for instance, an EList can have a list of any ENumber, EBool, EString, EIdentifier, and even Elist. Since a Scheme program is just nested lists of expressions, our data type seems capable of representing arbitrary Scheme programs.

Constructors and types have different namespaces, so you could have both a constructor named String and a type named String. Preceding each of the constructors with an “E” just helps make it easier to talk about “Scheme strings” as opposed to “Elm strings”. Both types and constructors always begin with capital letters.

Now that you have a type for expressions let’s change numberParser to return an Expression instead of Float. Each constructor is actually a function that takes a value of its type and returns an Expression, for instance, we can think of ENumber as Float -> Expression.

You already know how to convert the value inside a parser using Combine.map, so the new parser should be easy to create. Combine also provides many operators for common functions that can make parsers more succinct to read and write. The operator for map is <$>, and therefore the new parser should look something like this:

numberParser : Parser Expression
numberParser =
    ENumber <$> Combine.choice [ Combine.float, toFloat <$> Combine.int ]

When we change the input string to “123.45”, the new output will have changed to Found value: ENumber 123.45.

Up Next: Testing