Generalized Algebraic Data Types I
This is the first of two articles on GADTs. This first part will be a general introduction to GADTs and their utility, while the second part will show how we can wrangle GADT behaviour out of C#.
The canonical GADT introduction involves a demonstration of the inadequacy of algebraic data types. But since this is written from a C# perspective, and C# doesn’t have GADTs, we’ll start with a brief introduction to vanilla ADTs.
Algebraic Data Types
Algebraic data types allow us a sort of type-level composition that’s more rigorous than what we have in C#. There are two ways to compose types in this algebra: products and sums, which are roughly analogous1 to products and sums over the integers.
Product types
Product types allow us to combine two or more types into one compound type. In Haskell, we can combine two types into a pair:
data Pair a b = Pair a b
Even though there are some important differences, at first this doesn’t seem all too different from what we might write in C#:
class Pair<A, B> {
A a;
B b;
}
We can now encode tuples of arbitrary size by nesting Pair
constructors. For example:
triple :: Pair (Pair String Int) Int
triple = Pair (Pair "one" 2) 3
is equivalent to:
triple' :: (String, Int, Int)
triple' = ("one", 2, 3)
We’ll use the more compact tuple notation from now on.
Sum types
While a product type (a, b)
is effectively a type that is both a
and b
, a sum type Sum a b
is a type that can only be either a
or b
. In Haskell, we represent sum types like this:
data Either a b = Left a | Right b
An expression evaluator
We’re now ready for the canonical GADT introduction.
Let’s say we want to represent expressions for a simple calculator. We can do this with ADTs:
data Expr = Val Int
| Add Expr Expr
| Mult Expr Expr
We can now represent expressions like 6 * ((3 * 4) + (5 + 9))
:
expr1 = Mult (Val 6) (Add (Mult (Val 3) (Val 4)) (Add (Val 5) (Val 9)))
Now we can very easily write an evaluator that will work for arbitrarily complex expressions:
eval :: Expr -> Int
eval (Val x) = x
eval (Add e1 e2) = eval e1 + eval e2
eval (Mult e1 e2) = eval e1 * eval e2
ghci> eval expr1
156
Because the evaluation and representation are separate, we can write multiple evaluators, like a pretty printer:
pretty :: Expr -> String
pretty (Val x) = show x
pretty (Add e1 e2) = "(" ++ pretty e1 ++ " + " ++ pretty e2 ++ ")"
pretty (Mult e1 e2) = "(" ++ pretty e1 ++ " * " ++ pretty e2 ++ ")"
ghci> pretty expr1
"(6 * ((3 * 4) + (5 + 9)))"
Extending the expression
The allure of this style is clear; we have a declarative representation of our data, and we can interpret it in various ways by simply defining a function from Expr -> a
, for any a
we choose.
Let’s extend our expression type, and see if we can maintain this elegant style.
data Expr' = IntVal Int
| BoolVal Bool
| AddInt Expr' Expr'
| MultInt Expr' Expr'
| GreaterThan Expr' Expr'
Now we have a greater than operation that can evaluate to a boolean, allowing us to represent expressions like 3 + (2 * 4) > 9 * (9 + 2)
, which should ideally evaluate to False
.
How would we go about writing our evaluator? Well, we could either return an Int
or a Bool
, so our type should be Either Int Bool
.
But hang on, we have MultInt Expr' Expr'
, which means MultInt
accepts anything of type Expr'
, even BoolVal
! We could write something like:
MultInt (GreaterThan (AddInt (IntVal 3) (IntVal 4)) (IntVal 2)) (IntVal 9))
This is effectively ((3 + 4) > 2) * 9
, which simplifies to False * 9
, which is clearly nonsensical. This is a valid value of Expr'
, however, so we need to represent the possibility that the evaluation can fail by wrapping the return type in a Maybe
:
eval' :: Expr' -> Maybe (Either Int Bool)
eval' (AddInt e1 e2) = case (eval' e1, eval' e2) of
(Just (Left i1), Just (Left i2)) -> Just (Left $ i1 + i2)
_ -> Nothing
eval' (MultInt e1 e2) = case (eval' e1, eval' e2) of
(Just (Left i1), Just (Left i2)) -> Just (Left $ i1 * i2)
_ -> Nothing
eval' (GreaterThan e1 e2) = case (eval' e1, eval' e2) of
(Just (Left i1), Just (Left i2)) -> Just (Right $ i1 > i2)
eval' (IntVal x) = Just (Left x)
eval' (BoolVal b) = Just (Right b)
This works as expected:
ghci> -- 3 > 2:
ghci> eval' $ GreaterThan (IntVal 3) (IntVal 2)
Just (Right True)
ghci> -- (3 + 5) > 2:
ghci> eval' $ GreaterThan (AddInt (IntVal 3) (IntVal 5)) (IntVal 2)
Just (Right False)
ghci> -- (5 > 2) * 6:
ghci> eval' $ MultInt (GreaterThan (IntVal 5) (IntVal 2)) (IntVal 6)
Nothing
But we’ve completely lost the elegance and obviousness of the initial implementation. And this is a relatively simple evaluator; a more complex evaluator will be almost entirely consumed with complex and error-prone logic for validating expressions.
A better solution?
A better solution would be to make invalid expressions fail to typecheck, and offload all this tedious logic to the compiler:
data BetterExpr t =
IntVal Int
| BoolVal Bool
| MultInt (BetterExpr Int) (BetterExpr Int)
| AddInt (BetterExpr Int) (BetterExpr Int)
| GreaterThan (BetterExpr Int) (BetterExpr Int)
But these invalid expressions still typecheck:
ghci> :t GreaterThan (BoolVal True) (IntVal 3)
GreaterThan (BoolVal True) (IntVal 3) :: BetterExpr t
We can hack around this by defining our own value constructors that enforce the correct types, and only exporting those. That way, users of our module won’t be able to construct invalid types:
module Expression (BetterExpr, boolVal, intVal, greaterThan, multInt) where
intVal :: Int -> BetterExpr Int
intVal = IntVal
boolVal :: Bool -> BetterExpr Bool
boolVal = BoolVal
greaterThan :: BetterExpr Int -> BetterExpr Int -> BetterExpr Bool
greaterThan = GreaterThan
-- etc
ghci> greaterThan (boolVal True) (intVal 3)
"Couldn't match type ‘Bool’ with ‘Int’
Expected type: BetterExpr Int
Actual type: BetterExpr Bool"
Evaluating better
Let’s try to evaluate our BetterExpr
:
evaluate :: BetterExpr t -> t
evaluate (IntVal i) = i
evaluate (BoolVal b) = b
-- etc
Unfortunately, this doesn’t compile. This is because the type parameter t
in BetterExpr t
isn’t at all related to the types mentioned in the constructors! For instance, we could define:
IntVal 3 :: BetterExpr Bool
This typechecks! We can think of IntVal
and other value constructors as functions that return a BetterExpr t
:
-- The value constructors of this type:
data BetterExpr t =
IntVal Int
| BoolVal Bool
| MultInt (BetterExpr Int) (BetterExpr Int)
| AddInt (BetterExpr Int) (BetterExpr Int)
| GreaterThan (BetterExpr Int) (BetterExpr Int)
-- can be written as:
IntVal :: Int -> BetterExpr t
BoolVal :: Bool -> BetterExpr t
MultInt :: BetterExpr Int -> BetterExpr Int -> BetterExpr t
AddInt :: BetterExpr Int -> BetterExpr Int -> BetterExpr t
GreaterThan :: BetterExpr Int -> BetterExpr Int -> BetterExpr t
This syntax makes the problem clear: all value constructors underspecify their return type. This syntax also makes the solution obvious- just specify the return types! This is exactly what GADTs allow us to do.
GADTs
We need to turn on a language extension to use GADTs:
{-# LANGUAGE GADTs #-}
Now we can write type declarations exactly how we want:
data BetterExpr t where
IntVal :: Int -> BetterExpr Int
BoolVal :: Bool -> BetterExpr Bool
MultInt :: BetterExpr Int -> BetterExpr Int -> BetterExpr Int
AddInt :: BetterExpr Int -> BetterExpr Int -> BetterExpr Int
GreaterThan :: BetterExpr Int -> BetterExpr Int -> BetterExpr Bool
Invalid expressions that previously typechecked are now rejected by the compiler:
ghci> MultInt (GreaterThan (IntVal 3) (IntVal 2)) (IntVal 2)
"Couldn't match type ‘Bool’ with ‘Int’
Expected type: BetterExpr Int
Actual type: BetterExpr Bool"
and writing our evaluator is again extremely straightforward:
eval :: BetterExpr t -> t
eval (IntVal i) = i
eval (BoolVal b) = b
eval (MultInt e1 e2) = eval e1 * eval e2
eval (AddInt e1 e2) = eval e1 + eval e2
eval (GreaterThan e1 e2) = eval e1 > eval e2
ghci> eval $ MultInt (IntVal 3) (IntVal 2)
6
ghci> eval $ GreaterThan (IntVal 3) (IntVal 2)
True
That’s all for today! Next time we’ll implement GADTs in C#.
Or more formally, they’re equivalent up to an isomorphism in that they both form a semiring, with types as the operands, product/ sum composition as the binary operator, and
Unit
/Void
as the respective identity elements.↩