How does HaxlSharp work?

June 11, 2016

I wrote a C# version of Haxl called HaxlSharp. The original Haxl paper by Marlow et al. is brilliantly titled "There is no Fork: an Abstraction for Efficient, Concurrent, and Concise Data Access", and provides a great overview of Haxl. This post will focus more on the differences between the original Haskell implementation and my C# implementation.

What is Fetch<>?

In HaxlSharp, we use query syntax to combine functions that return Fetch<> objects. Let’s say we have these three Fetch<> objects/functions:

Fetch<string> a;
Fetch<int> b;
Func<string, Fetch<int>> c;

We can combine them like this:

from x in a
from y in b
from z in c(x)
select y + z;

Fetch<> is actually a free monad that collects lambda expression trees instead of lambda functions.

It divides these expression trees into groups of expressions that can be written applicatively, using a version of the ApplicativeDo algorithm that was recently merged into GHC.
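
To make the idea of collecting expression trees concrete, here is a minimal sketch of how C# query syntax can hand a library expression trees rather than compiled delegates. `Recorded<T>`, `RecordedExtensions`, and `DependencyCheck` are hypothetical names invented for this illustration; they are not HaxlSharp's actual types, and the real library does considerably more. The sketch only shows that a `SelectMany` taking `Expression<Func<...>>` parameters can tell whether a `from` clause refers to an earlier variable, which is the kind of information needed to decide what can be grouped applicatively.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Hypothetical stand-in for Fetch<T>: it only records the lambdas it is given.
public class Recorded<T>
{
    public List<LambdaExpression> Binds { get; } = new List<LambdaExpression>();
}

public static class RecordedExtensions
{
    // A query with two `from` clauses compiles to a call to SelectMany.
    // Because the parameters are Expression<...>, the compiler passes
    // expression trees instead of compiled delegates.
    public static Recorded<C> SelectMany<A, B, C>(
        this Recorded<A> source,
        Expression<Func<A, Recorded<B>>> bind,
        Expression<Func<A, B, C>> project)
    {
        var result = new Recorded<C>();
        result.Binds.AddRange(source.Binds);
        result.Binds.Add(bind);
        return result;
    }
}

public static class DependencyCheck
{
    private sealed class ParameterFinder : ExpressionVisitor
    {
        private readonly ParameterExpression target;
        public bool Found { get; private set; }
        public ParameterFinder(ParameterExpression target) { this.target = target; }

        protected override Expression VisitParameter(ParameterExpression node)
        {
            if (node == target) Found = true;
            return base.VisitParameter(node);
        }
    }

    // True when the body of the lambda mentions its first parameter.
    public static bool UsesParameter(LambdaExpression lambda)
    {
        var finder = new ParameterFinder(lambda.Parameters[0]);
        finder.Visit(lambda.Body);
        return finder.Found;
    }
}

public static class Program
{
    public static void Main()
    {
        var a = new Recorded<string>();
        var b = new Recorded<int>();
        Func<string, Recorded<int>> c = s => new Recorded<int>();

        // `b` does not mention x, so the two fetches are independent.
        var independent = from x in a
                          from y in b
                          select y;

        // `c(x)` mentions x, so it has to wait for `a`.
        var dependent = from x in a
                        from z in c(x)
                        select z;

        Console.WriteLine(DependencyCheck.UsesParameter(independent.Binds[0])); // False
        Console.WriteLine(DependencyCheck.UsesParameter(dependent.Binds[0]));   // True
    }
}
```

A compiled delegate would be opaque here; it is the expression tree that makes the variable references visible for inspection.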





What's wrong with async/await?

June 10, 2016

Async/await is great for writing sequential-looking code when you’re only waiting for a single asynchronous request at a time. But we often want to combine information from multiple data sources, like different calls on the same API, or multiple remote APIs.

The async/await abstraction breaks down in these situations. To illustrate, let’s say we have a blogging site, and a post’s metadata and content are retrieved using separate API calls. We could use async/await to fetch both these pieces of information:

public async Task<PostDetails> GetPostDetails(int postId)
{
    var postInfo = await FetchPostInfo(postId);
    var postContent = await FetchPostContent(postId);
    return new PostDetails(postInfo, postContent);
}

Here, we’re making two successive await calls, which means execution is suspended at the first request (FetchPostInfo), and the second request (FetchPostContent) only begins once the first has completed.

But FetchPostContent doesn’t require the result of FetchPostInfo, which means we could have started both these requests concurrently! Async/await lets us write asynchronous code in a nice, sequential-looking way, but doesn’t let us write concurrent code like this.
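
For comparison, one common manual workaround is to start both tasks before awaiting either and then wait on them together with `Task.WhenAll`. The sketch below assumes hypothetical stand-ins for `FetchPostInfo`, `FetchPostContent`, and `PostDetails` (simple delays returning strings), since the real signatures are not shown here. Note that it gives up the sequential look: the programmer has to spot the independence and restructure the code by hand.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical shape for PostDetails; the real type is not shown in the post.
public class PostDetails
{
    public string Info { get; }
    public string Content { get; }
    public PostDetails(string info, string content) { Info = info; Content = content; }
}

public static class PostApi
{
    // Hypothetical stand-ins for the two remote calls; each just waits briefly.
    public static async Task<string> FetchPostInfo(int postId)
    {
        await Task.Delay(100);
        return "info for post " + postId;
    }

    public static async Task<string> FetchPostContent(int postId)
    {
        await Task.Delay(100);
        return "content of post " + postId;
    }

    public static async Task<PostDetails> GetPostDetails(int postId)
    {
        // Start both requests before awaiting either one.
        Task<string> infoTask = FetchPostInfo(postId);
        Task<string> contentTask = FetchPostContent(postId);

        // Wait for both; the total wait is roughly that of the slower request.
        await Task.WhenAll(infoTask, contentTask);
        return new PostDetails(await infoTask, await contentTask);
    }
}

public static class Program
{
    public static async Task Main()
    {
        PostDetails details = await PostApi.GetPostDetails(1);
        Console.WriteLine(details.Info + " / " + details.Content);
    }
}
```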





Generalized Algebraic Data Types I

Part IV of the series Fun with Functional C#.
May 9, 2016

This is the first of two articles on GADTs. This first part will be a general introduction to GADTs and their utility, while the second part will show how we can wrangle GADT behaviour out of C#.

The canonical GADT introduction involves a demonstration of the inadequacy of algebraic data types. But since this is written from a C# perspective, and C# doesn’t have GADTs, we’ll start with a brief introduction to vanilla ADTs.

Algebraic Data Types

Algebraic data types allow us a sort of type-level composition that’s more rigorous than what we have in C#. There are two ways to compose types in this algebra: products and sums, which are roughly analogous to products and sums over the integers.

Product types

Product types allow us to combine two or more types into one compound type. In Haskell, we can combine two types into a pair:

data Pair a b = Pair a b
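
For readers coming from C#, a rough counterpart to this Haskell declaration might look like the sketch below. The class and property names are illustrative assumptions, not taken from the series.

```csharp
using System;

// A product type: a value of Pair<A, B> holds one A *and* one B.
public sealed class Pair<A, B>
{
    public A First { get; }
    public B Second { get; }
    public Pair(A first, B second) { First = first; Second = second; }
}

public static class Program
{
    public static void Main()
    {
        var p = new Pair<int, string>(1, "one");
        Console.WriteLine(p.First + ", " + p.Second); // prints "1, one"
    }
}
```

The name "product" reflects how the values multiply: `Pair<bool, bool>` has 2 × 2 = 4 possible values.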




Visualizing the Metropolis Algorithm

April 2, 2016

Let’s say you’re doing some sort of Bayesian analysis. You’ll have a prior P(θ) over your model parameters θ. You get some data D, and you want to update on this data to get the posterior P(θ|D), the updated distribution of the model parameters given D. Let’s wheel out Bayes’ theorem and work out how to calculate P(θ|D), which for convenience we’ll call π(θ):

$$\pi(\theta) = P(\theta|D) = \frac{P(D|\theta)\,P(\theta)}{P(D)}$$

If we can calculate P(D|θ) by using some sort of loss function, computing the numerator P(D|θ)P(θ) is relatively straightforward. And once we have the numerator, we’ve specified our posterior π(θ) up to the normalization constant P(D).

Computing the normalization constant is trickier. P(D) is the probability of seeing this data in the model, which means we have to integrate over all possible values of θ:

$$P(D) = \int_\Theta P(D|\theta)\,P(\theta)\,d\theta$$

In most cases, this won’t have a closed-form solution, and deterministic numerical integration can scale poorly with increasing dimensionality.

Monte Carlo integration

Let’s assume we can’t easily compute this integral; we can turn to Monte Carlo methods to estimate it instead. If we can directly draw from the posterior distribution π(θ), we can simply compute the density of π(θ) at a set of uniformly distributed values $\theta^{(1)}, \dots, \theta^{(N)}$ that cover a broad range of the parameter space for θ:

$$P(D) = \int_\Theta P(D|\theta)\,P(\theta)\,d\theta \approx \frac{1}{N} \sum_{i=1}^{N} P(D|\theta^{(i)})$$

By the law of large numbers, our estimate will converge to the true value as N goes to infinity. But this only works if we can directly draw from π(θ). Many Bayesian models use arbitrarily complex distributions over θ that we can’t easily sample.
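
As a small sketch of the averaging estimate above, consider a toy model that is an assumption of this example rather than part of the post: θ is the bias of a coin with a uniform prior on [0, 1], and D is one particular sequence of 10 flips containing 7 heads. With a uniform prior, uniformly distributed values of θ are draws from P(θ), and the exact answer is known (7!·3!/11! = 1/1320), so the estimate can be checked. The names `EvidenceEstimate`, `Likelihood`, and `Estimate` are hypothetical.

```csharp
using System;

public static class EvidenceEstimate
{
    // Likelihood P(D|theta) of one particular sequence of coin flips
    // containing `heads` heads out of `flips` flips.
    public static double Likelihood(double theta, int heads, int flips)
    {
        return Math.Pow(theta, heads) * Math.Pow(1.0 - theta, flips - heads);
    }

    // Average the likelihood over `samples` values of theta drawn uniformly
    // from [0, 1). Under the assumed uniform prior these are draws from P(theta).
    public static double Estimate(int heads, int flips, int samples, int seed)
    {
        var rng = new Random(seed);
        double sum = 0.0;
        for (int i = 0; i < samples; i++)
        {
            double theta = rng.NextDouble();
            sum += Likelihood(theta, heads, flips);
        }
        return sum / samples;
    }
}

public static class Program
{
    public static void Main()
    {
        double estimate = EvidenceEstimate.Estimate(7, 10, 200000, 42);
        double exact = 1.0 / 1320.0; // 7! * 3! / 11!
        Console.WriteLine("estimate = " + estimate + ", exact = " + exact);
    }
}
```

This works in one dimension, where uniform samples cover the space well; the same approach needs far more samples as the dimensionality of θ grows.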





Free monads in category theory

Part II of the series Free monads.
March 23, 2016

Forgetting how to multiply

It’s probably easiest to understand what a free monad is if we first understand forgetful functors.

In category theory, a functor maps between categories, mapping objects to objects and morphisms to morphisms in a way that preserves compositionality.

A forgetful functor is just a functor that discards some of the structure or properties of the input category.

For example, unital rings have objects (R, +, ·, 0, 1), where R is a set, and (+, ·) are binary operations with identity elements (0, 1) respectively.

Let’s denote the category of all unital rings and their homomorphisms by Ring, and the category of all non-unital rings and their homomorphisms by Rng. We can now define a forgetful functor I: Ring → Rng, which just drops the multiplicative identity.

Similarly, we can define another forgetful functor A: Rng → Ab, which maps from the category of rngs to the category of abelian groups. A discards the multiplicative binary operation, simply mapping all morphisms of multiplication to morphisms of addition.

Forgetting monoids

The forgetful functor A forgets ring multiplication. What happens if instead you forget addition? You get monoids! We can define monoids as the triple (S, ·, e), where S is a set, · is an associative binary operation, and e is the neutral element of that operation.

The forgetful functor M: Ring → Mon maps from the category of rings to the category of monoids, Mon, in which the objects are monoids, and the morphisms are monoid homomorphisms.

Monoid homomorphisms map between monoids in a way that preserves their monoidal properties. Given X, a monoid defined by (X, ∗, e), and Y, a monoid defined by (Y, ·, f), a function ϕ: X → Y from X to Y is a monoid homomorphism iff:

it preserves compositionality:

$$\phi(a * b) = \phi(a) \cdot \phi(b), \quad \forall a, b \in X$$

and maps the identity element: ϕ(e) = f
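
As a quick check of these two conditions, here is an illustrative example that is not from the original text. Take X to be strings under concatenation (written ++) with the empty string ε as identity, Y to be the natural numbers under addition with identity 0, and ϕ to be the length function len. Both conditions hold:

$$\operatorname{len}(a \mathbin{+\!\!+} b) = \operatorname{len}(a) + \operatorname{len}(b) \quad \forall a, b \in X, \qquad \operatorname{len}(\varepsilon) = 0$$

So len is a monoid homomorphism from strings to the natural numbers: it forgets which characters a string contains but respects how strings combine.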




