Generalized Algebraic Data Types I

Part IV of the series Fun with Functional C#.
May 9, 2016

This is the first of two articles on GADTs. This first part will be a general introduction to GADTs and their utility, while the second part will show how we can wrangle GADT behaviour out of C#.

The canonical GADT introduction involves a demonstration of the inadequacy of algebraic data types. But since this is written from a C# perspective, and C# doesn’t have GADTs, we’ll start with a brief introduction to vanilla ADTs.

Algebraic Data Types

Algebraic data types allow us a sort of type-level composition that’s more rigorous than what we have in C#. There are two ways to compose types in this algebra: products and sums, which are roughly analogous1 to products and sums over the integers.

Product types

Product types allow us to combine two or more types into one compound type. In Haskell, we can combine two types into a pair:

data Pair a b = Pair a b

Even though there are some important differences, at first this doesn’t seem all too different from what we might write in C#:

class Pair<A, B> {
  A a;
  B b;
}

We can now encode tuples of arbitrary size by nesting Pair constructors. For example:

triple :: Pair (Pair String Int) Int
triple = Pair (Pair "one" 2) 3

is equivalent to:

triple' :: (String, Int, Int)
triple' = ("one", 2, 3)

We’ll use the more compact tuple notation from now on.

Sum types

While a product type (a, b) is effectively a type that is both a and b, a sum type Sum a b is a type that can only be either a or b. In Haskell, we represent sum types like this:

data Either a b = Left a | Right b

An expression evaluator

We’re now ready for the canonical GADT introduction.

Let’s say we want to represent expressions for a simple calculator. We can do this with ADTs:

data Expr = Val Int
          | Add Expr Expr
          | Mult Expr Expr

We can now represent expressions like 6 * ((3 * 4) + (5 + 9)):

expr1 = Mult (Val 6) (Add (Mult (Val 3) (Val 4)) (Add (Val 5) (Val 9)))

Now we can very easily write an evaluator that will work for arbitrarily complex expressions:

eval :: Expr -> Int
eval (Val x) = x
eval (Add e1 e2) = eval e1 + eval e2
eval (Mult e1 e2) = eval e1 * eval e2
ghci> eval expr1
156

Because the evaluation and representation are separate, we can write multiple evaluators, like a pretty printer:

pretty :: Expr -> String
pretty (Val x) = show x
pretty (Add e1 e2) = "(" ++ pretty e1 ++ " + " ++ pretty e2 ++ ")"
pretty (Mult e1 e2) = "(" ++ pretty e1 ++ " * " ++ pretty e2 ++ ")"
ghci> pretty expr1
"(6 * ((3 * 4) + (5 + 9)))"

Extending the expression

The allure of this style is clear; we have a declarative representation of our data, and we can interpret it in various ways by simply defining a function from Expr -> a, for any a we choose.

Let’s extend our expression type, and see if we can maintain this elegant style.

data Expr' = IntVal Int
           | BoolVal Bool
           | AddInt Expr' Expr'
           | MultInt Expr' Expr'
           | GreaterThan Expr' Expr'

Now we have a greater than operation that can evaluate to a boolean, allowing us to represent expressions like 3 + (2 * 4) > 9 * (9 + 2), which should ideally evaluate to False.

How would we go about writing our evaluator? Well, we could either return an Int or a Bool, so our type should be Either Int Bool.

But hang on, we have MultInt Expr' Expr', which means MultInt accepts anything of type Expr', even BoolVal! We could write something like:

MultInt (GreaterThan (AddInt (IntVal 3) (IntVal 4)) (IntVal 2)) (IntVal 9))

This is effectively ((3 + 4) > 2) * 9, which simplifies to False * 9, which is clearly nonsensical. This is a valid value of Expr', however, so we need to represent the possibility that the evaluation can fail by wrapping the return type in a Maybe:

eval' :: Expr' -> Maybe (Either Int Bool)
eval' (AddInt e1 e2) = case (eval' e1, eval' e2) of
  (Just (Left i1), Just (Left i2)) -> Just (Left $ i1 + i2)
  _ -> Nothing
eval' (MultInt e1 e2) = case (eval' e1, eval' e2) of
  (Just (Left i1), Just (Left i2)) -> Just (Left $ i1 * i2)
  _ -> Nothing
eval' (GreaterThan e1 e2) = case (eval' e1, eval' e2) of
  (Just (Left i1), Just (Left i2)) -> Just (Right $ i1 > i2)
eval' (IntVal x) = Just (Left x)
eval' (BoolVal b) = Just (Right b)

This works as expected:

ghci> -- 3 > 2:
ghci> eval' $ GreaterThan (IntVal 3) (IntVal 2)
Just (Right True)

ghci> -- (3 + 5) > 2:
ghci> eval' $ GreaterThan (AddInt (IntVal 3) (IntVal 5)) (IntVal 2)
Just (Right False)

ghci> -- (5 > 2) * 6:
ghci> eval' $ MultInt (GreaterThan (IntVal 5) (IntVal 2)) (IntVal 6)
Nothing

But we’ve completely lost the elegance and obviousness of the initial implementation. And this is a relatively simple evaluator; a more complex evaluator will be almost entirely consumed with complex and error-prone logic for validating expressions.

A better solution?

A better solution would be to make invalid expressions fail to typecheck, and offload all this tedious logic to the compiler:

data BetterExpr t =
    IntVal Int
  | BoolVal Bool
  | MultInt (BetterExpr Int) (BetterExpr Int)
  | AddInt (BetterExpr Int) (BetterExpr Int)
  | GreaterThan (BetterExpr Int) (BetterExpr Int)

But these invalid expressions still typecheck:

ghci> :t GreaterThan (BoolVal True) (IntVal 3)
GreaterThan (BoolVal True) (IntVal 3) :: BetterExpr t

We can hack around this by defining our own value constructors that enforce the correct types, and only exporting those. That way, users of our module won’t be able to construct invalid types:

module Expression (BetterExpr, boolVal, intVal, greaterThan, multInt) where

intVal :: Int -> BetterExpr Int
intVal = IntVal

boolVal :: Bool -> BetterExpr Bool
boolVal = BoolVal

greaterThan :: BetterExpr Int -> BetterExpr Int -> BetterExpr Bool
greaterThan = GreaterThan

-- etc
ghci> greaterThan (boolVal True) (intVal 3)

"Couldn't match type ‘Bool’ with ‘Int’
    Expected type: BetterExpr Int
      Actual type: BetterExpr Bool"

Evaluating better

Let’s try to evaluate our BetterExpr:

evaluate :: BetterExpr t -> t
evaluate (IntVal i) = i
evaluate (BoolVal b) = b
-- etc

Unfortunately, this doesn’t compile. This is because the type parameter t in BetterExpr t isn’t at all related to the types mentioned in the constructors! For instance, we could define:

IntVal 3 :: BetterExpr Bool

This typechecks! We can think of IntVal and other value constructors as functions that return a BetterExpr t:

-- The value constructors of this type:
data BetterExpr t =
    IntVal Int
  | BoolVal Bool
  | MultInt (BetterExpr Int) (BetterExpr Int)
  | AddInt (BetterExpr Int) (BetterExpr Int)
  | GreaterThan (BetterExpr Int) (BetterExpr Int)

-- can be written as:

IntVal :: Int -> BetterExpr t
BoolVal :: Bool -> BetterExpr t
MultInt :: BetterExpr Int -> BetterExpr Int -> BetterExpr t
AddInt :: BetterExpr Int -> BetterExpr Int -> BetterExpr t
GreaterThan :: BetterExpr Int -> BetterExpr Int -> BetterExpr t

This syntax makes the problem clear: all value constructors underspecify their return type. This syntax also makes the solution obvious- just specify the return types! This is exactly what GADTs allow us to do.

GADTs

We need to turn on a language extension to use GADTs:

{-# LANGUAGE GADTs #-}

Now we can write type declarations exactly how we want:

data BetterExpr t where
  IntVal :: Int -> BetterExpr Int
  BoolVal :: Bool -> BetterExpr Bool
  MultInt :: BetterExpr Int -> BetterExpr Int -> BetterExpr Int
  AddInt :: BetterExpr Int -> BetterExpr Int -> BetterExpr Int
  GreaterThan :: BetterExpr Int -> BetterExpr Int -> BetterExpr Bool

Invalid expressions that previously typechecked are now rejected by the compiler:

ghci> MultInt (GreaterThan (IntVal 3) (IntVal 2)) (IntVal 2)
   "Couldn't match type ‘Bool’ with ‘Int’
    Expected type: BetterExpr Int
      Actual type: BetterExpr Bool"

and writing our evaluator is again extremely straightforward:

eval :: BetterExpr t -> t
eval (IntVal i) = i
eval (BoolVal b) = b
eval (MultInt e1 e2) = eval e1 * eval e2
eval (AddInt e1 e2) = eval e1 + eval e2
eval (GreaterThan e1 e2) = eval e1 > eval e2
ghci> eval $ MultInt (IntVal 3) (IntVal 2)
6

ghci> eval $ GreaterThan (IntVal 3) (IntVal 2)
True

That’s all for today! Next time we’ll implement GADTs in C#.


  1. Or more formally, they’re equivalent up to an isomorphism in that they both form a semiring, with types as the operands, product/ sum composition as the binary operator, and Unit/ Void as the respective identity elements.