This will be a very non-traditional introduction to the ideas behind category theory. It will essentially be a slice through model theory (presented in a more programmer-friendly manner) with an unusual organization. Of course the end result will be ***SPOILER ALERT*** it was category theory all along. A secret decoder ring will be provided at the end. This approach is inspired by the notion of an internal logic/language and by Vaughn Pratt’s paper The Yoneda Lemma Without Category Theory.
I want to be very clear, though. This is not meant to be an analogy or an example or a guide for intuition. This is category theory. It is simply presented in a different manner.
Dan Doel pointed me at some draft lecture notes by Mike Shulman that seem very much in the spirit of this blog post (albeit aimed at an audience familiar with category theory): Categorical Logic from a Categorical Point of View. A theory in my sense corresponds to a category presentation (Definition 1.7.1) as defined by these lecture notes. Its oft-mentioned Appendix A will also look very familiar.
The first concept we’ll need is that of a theory. If you’ve ever implemented an interpreter for even the simplest language, then most of what follows modulo some terminological differences should be both familiar and very basic. If you are familiar with algebraic semantics, then that is exactly what is happening here only restricting to unary (but multi-sorted) algebraic theories.
For us, a theory,
A (raw) term in a theory is either a variable labelled by a sort,
data Term = Var Sort | Apply FunctionSymbol Term
Using GADTs, we could capture the sorting constraints as well:
data Term (s :: Sort) (t :: Sort) where
Var :: Term t t
Apply :: FunctionSymbol x t -> Term s x -> Term s t
An important operation on terms is substitution. Given a term
If
then . If
then .
Using the theory from before, we have:
As a shorthand, for arbitrary terms
Finally, equations2. An equation is a pair of terms with equal source and target,
for example,
Expanding the theory from before, we get the theory of isomorphisms,
In traditional model theory or universal algebra we tend to focus on multi-ary operations, i.e. function symbols that can take
multiple inputs. By restricting ourselves to only unary function symbols, we expose a duality. For every theory
Given two theories Either
type. Here we’ll write
The collection of equations for
The above is probably unreadable. If you work through it, you can show that every term of
To establish the equivalence between terms of
The above might seem a bit round about. An alternative approach would be to define the function symbols of
As a concrete and useful example, consider the theory
Nevertheless, for clarity I will typically write a term of a product theory as a pair of terms.
As a relatively easy exercise — easier than the above — you can formulate and define the disjoint sum of two theories
Related to these, we have the theory
Sometimes we’d like to talk about function symbols whose source is in one theory and target is in another. As a simple example, that we’ll explore in more depth later, we may want function symbols whose sources are in a product theory. This would let us consider terms with multiple inputs.
The natural way to achieve this is to simply make a new theory that contains sorts from both theories plus the new function symbols.
A collage,
Summarizing, we have
Here’s a somewhat silly example. Consider
More usefully, if a bit degenerately, every theory induces a collage in the following way. Given a theory
You can think of a bridging term in a collage as a sequence of function symbols partitioned into two parts by a bridge. Naturally,
we might consider partitioning into more than two parts by having more than one bridge. It’s easy to generalize the definition
of collage to combine an arbitrary number of theories, but I’ll take a different, probably worse, route. Given
collages
This composition is associative… almost. Furthermore, the collages generated by theories,
To talk about isomorphism of theories we need the notion of…
An interpretation of a theory gives meaning to the syntax of a theory. There are two nearly identical notions of
interpretation for us: interpretation (into sets) and interpretation into a theory. I’ll define them in parallel.
An interpretation (into a theory),
We extend the mapping to a mapping on terms via:
, i.e. the identity function, or, for interpretation into a theory, or, for interpretation into a theory,
and we require that for any equation of the theory,
An interpretation of the theory of isomorphisms produces a bijection between two specified sets. Spelling out a simple example where
plus the proof
As another simple example, we can interpret the theory of isomorphisms into itself slightly non-trivially.
As an (easy) exercise, you should define foldNat
. Something
very interesting happens when you consider what an interpretation of the collage generated by a theory,
Two theories are isomorphic if there exists interpretations
As a crucially important example, the set of terms,
We have, dually,
We can abstract from both parameters making
Via an abuse of notation, I’ll identify
The theories we’ve presented are (multi-sorted) universal algebra theories. Universal algebra allows us to specify a general notion of “homomorphism” that generalizes monoid homomorphism or group homomorphism or ring homomorphism or lattice homomorphism.
In universal algebra, the algebraic theory of groups consists of a single sort, a nullary operation,
Say
is a group homomorphism from the group to the group and then:
Using something more akin to our notation, it would look like:
The
So a homomorphism
Another way to look at homomorphisms is via collages. A homomorphism
The homomorphism law guarantees that it satisfies the equation on
Consider a homomorphism
Looking at this equation, the possibility of viewing it as a recursive “definition” leaps out
suggesting that the action of
We can easily establish that there’s a one-to-one correspondence between the
set of homomorphisms
which clearly satisfies the condition on homomorphisms by definition. It’s easy to verify
that
We can state something stronger. Given any homomorphism
Verifying that this is a homomorphism is straightforward:
and like any homomorphism of this form, as we’ve just established, it is completely determined by
it’s action on variables, namely
There is another result, dual in a different way, called the co-Yoneda lemma. It turns out it is a corollary of
the fact that for a collage
Note, that the Yoneda and co-Yoneda lemmas only apply to interpretations into sets as
The Yoneda lemma suggests that the interpretations
We call an interpretation,
As a rather liberating exercise, you should establish the following result
called parameterized representability. Assume we have theories
Let’s look at some examples.
As a not-so-special case of representability, we can consider
Define
for any function symbol and term, , of sort , note .
This equation states what the isomorphism also fairly directly states: there is exactly one term of sort
for any function symbol , note .
For the next example, I’ll leverage collages. Consider the collage
One aspect of note is regardless of whether
Define an interpretation
More generally, whenever we have the situation
Here’s a few of exercises using this. First, a moderately challenging one (until you catch the pattern):
spell out the details to the left adjoint to
Finite products start to lift us off the ground. So far the theories we’ve been working with have been extremely basic: a language with only unary functions, all terms being just a sequence of applications of function symbols. It shouldn’t be underestimated though. It’s more than enough to do monoid and group theory. A good amount of graph theory can be done with just this. And obviously we were able to establish several general results assuming only this structure. Nevertheless, while we can talk about specific groups, say, we can’t talk about the theory of groups. Finite products change this.
A theory with finite products allows us to talk about multi-ary function symbols and terms by
considering unary function symbols from products. This allows us to do all of universal
algebra. For example, the theory of groups,
A signature,
data SigTerm
= SigVar [Sort] Int
| Fst SigTerm
| Snd SigTerm
| Unit Sort
| Pair SigTerm SigTerm
| SigApply FunctionSymbol [SigTerm]
At this point, sorting (i.e. typing) the terms is no longer trivial, though it is still pretty straightforward.
Sorts are either
where where where where and
The point of a signature was to represent a theory so we can compile a term of a signature
into a term of a theory with finite products. The theory generated from a signature
where means the -fold application of
As you may have noticed, the generated theory will have an infinite number of sorts, an infinite number of function symbols, and an infinite number of equations no matter what the signature is — even an empty one! Having an infinite number of things isn’t a problem as long as we can algorithmically describe them and this is what the signature provides. Of course, if you’re a (typical) mathematician you nominally don’t care about an algorithmic description. Besides being compact, signatures present a nicer term language. The theories are like a core or assembly language. We could define a slightly nicer variation where we keep a context and manage named variables leading to terms-in-context like:
which is
for our current term language for signatures. Of course, compilation will be (slightly) trickier for the nicer language.
The benefit of having compiled the signature to a theory, in addition to being able to reuse the results we’ve established for theories, is we only need to define operations on the theory, which is simpler since we only need to deal with pairs and unary function symbols. One example of this is we’d like to extend our notion of interpretation to one that respects the structure of the signature, and we can do that by defining an interpretation of theories that respects finite products.
A finite product preserving interpretation (into a finite product theory),
where, for
With signatures, we can return to our theory, now signature, of groups.
or using the nicer syntax:
An actual group is then just a finite product preserving interpretation of (the theory generated by) this signature. All of universal algebra and much of abstract algebra can be formulated this way.
We can consider additionally assuming that our theory has exponentials. I left articulating exactly what that means as an exercise, but the upshot is we have the following two operations:
For any term
where and .
We can view these, together with the the product operations, as combinators, and it turns out we can compile the simply typed lambda calculus into the above theory. This is exactly what the Categorical Abstract Machine did. The “Caml” in “O’Caml” stands for “Categorical Abstract Machine Language”, though O’Caml no longer uses the CAM. Conversely, every term of the theory can be expressed as a simply typed lambda term. This means we can view the simply typed lambda calculus as just a different presentation of the theory.
At this point, this presentation of category theory starts to connect to the mainstream categorical literature on universal algebra, internal languages, sketches, and internal logic. This page gives a synopsis of the relationship between type theory and category theory. For some reason, it is unusual to talk about the internal language of a plain category, but that is exactly what we’ve done here.
I haven’t talked about finite limits or colimits beyond products and coproducts, nor have I
talked about even the infinitary versions of products and coproducts, let alone arbitrary
limits and colimits. These can be handled the same way as products and coproducts. Formulating
a language like signatures or the simply typed lambda calculus is a bit more complicated, but
not that hard. I may make a follow-up article covering this among other things. I also have
a side project (don’t hold your breath), that implements the internal language of a category
with finite limits. The result looks roughly like a simple version of an algebraic specification
language like the OBJ family. The RING
theory
described in the Maude manual
gives an idea of what it would look like. In fact, here’s an example of the current actual
syntax I’m using.3
theory Categories
type O
type A
given src : A -> O
given tgt : A -> O
given id : O -> A
satisfying o:O | src (id o) = o, tgt (id o) = o
given c : { f:A, g:A | src f = tgt g } -> A
satisfying (f, g):{ f:A, g:A | src f = tgt g }
| tgt (c (f, g)) = tgt f, src (c (f, g)) = src g
satisfying "left unit" (o, f):{ o:O, f:A | tgt f = o }
| c (id o, f) = f
satisfying "right unit" (o, f):{ o:O, f:A | src f = o }
| c (f, id o) = f
satisfying "associativity" (f, g, h):{ f:A, g:A, h:A | src f = tgt g, src g = tgt h }
| c (c (f, g), h) = c (f, c (g, h))
endtheory
It turns out this is a particularly interesting spot in the design space. The fact that the theory of theories with finite limits is itself a theory with finite limits has interesting consequences. It is still relatively weak though. For example, it’s not possible to describe the theory of fields in this language.
There are other directions one could go. For example, the internal logic of monoidal categories is (a fragment of) ordered linear logic. You can cross this bridge either way. You can look at different languages and consider what categorical structure is needed to support the features of the language, or you can add features to the category and see how that impacts the internal language. The relationship is similar to the source language and a core/intermediate language in a compiler, e.g. GHC Haskell and System Fc.
If you’ve looked at category theory at all, you can probably make most of the connections without me telling you.
The table below outlines the mapping, but there are some subtleties. First, as a somewhat technical detail,
my definition of a theory corresponds to a small category, i.e. a category which has a set of objects and
a set of arrows. For more programmer types, you should think of “set” as Set
in Agda, i.e. similar to the *
kind in Haskell. Usually “category” means “locally small category” which may have a proper class of objects
and between any two objects a set of arrows (though the union of all those sets may be a proper class). Again,
for programmers, the distinction between “class” and “set” is basically the difference between Set
and Set1
in Agda.4 To make my definition of theory closer to this, all
that is necessary is instead of having a set of function symbols, have a family of sets indexed by pairs of
objects. Here’s what a partial definition in Agda of the two scenarios would look like:
-- Small category (the definition I used)
record SmallCategory : Set1 where
field
: Set
objects : Set
arrows : arrows -> objects
src : arrows -> objects
tgt ...
-- Locally small category
record LocallySmallCategory : Set2 where
field
: Set1
objects : objects -> objects -> Set
hom ...
-- Different presentation of a small category
record SmallCategory' : Set1 where
field
: Set
objects : objects -> objects -> Set
hom ...
The benefit of the notion of locally small category is that Set
itself is a locally small category. The
distinction I was making between interpretations into theories and interpretations into Set was due to
the fact that Set wasn’t a theory. If I used a definition theory corresponding to a locally small
category, I could have combined the notions of interpretation by making Set a theory. The notion
of a small category, though, is still useful. Also, an interpretation into Set corresponds to the
usual notion of a model or semantics, while interpretations into other theories was a less emphasized
concept in traditional model theory and universal algebra.
A less technical and more significant difference is that my definition of a theory doesn’t correspond to a category, but rather to a presentation of a category, from which a category can be generated. The analog of arrows in a category is terms, not function symbols. This is a bit more natural route from the model theory/universal algebra/programming side. Similarly, having an explicit collection of equations, rather than just an equivalence relation on terms is part of the presentation of the category but not part of the category itself.
model theory | category theory |
---|---|
sort | object |
term | arrow |
function symbol | generating arrow |
theory | presentation of a (small) category |
collage | collage, cograph of a profunctor |
bridge | heteromorphism |
signature | presentation of a (small) category with finite products |
interpretation into sets, aka models | a functor into Set, a (co)presheaf |
interpretation into a theory | functor |
homomorphism | natural transformation |
simply typed lambda calculus (with products) | a cartesian closed category |
In some ways I’ve stopped just when things were about to get good. I may do a follow-up to elaborate on this good stuff. Some examples are: if I expand the definition so that Set becomes a “theory”, then interpretations also form such a “theory”, and these are often what we’re really interested in. The category of finite-product preserving interpretations of the theory of groups essentially is the category of groups. In fact, universal algebra is, in categorical terms, just the study of categories with finite products and finite-product preserving functors from them, particularly into Set. It’s easy to generalize this in many directions. It’s also easy to make very general definitions, like a general definition of a free algebraic structure. In general, we’re usually more interested in the interpretations of a theory than the theory itself.
While I often do advocate thinking in terms of internal languages of categories, I’m not sure that it is a preferable perspective for the very basics of category theory. Nevertheless, there are a few reasons for why I wrote this. First, this very syntactical approach is, I think, more accessible to someone coming from a programming background. From this view, a category is a very simple programming language. Adding structure to the category corresponds to adding features to this programming language. Interpretations are denotational semantics.
Another aspect about this presentation that is quite different is the use and emphasis on collages. Collages
correspond to profunctors, a crucially important and enabling concept that is rarely covered in categorical
introductions. The characterization of profunctors as collages in Vaughn Pratt’s paper (not using that name) was
one of the things I enjoyed about that paper and part of what prompted me to start writing this. In earlier
drafts of this article, I was unable to incorporate collages in a meaningful way as I was trying
to start from profunctors. This approach just didn’t add value. Collages just looked like a bizarre curio
and weren’t integrated into the narrative at all. For other reasons, though, I ended up revisiting the idea
of a heteromorphism. My (fairly superficial) opinion is that once you have the notion of functors and
natural transformations, adding the notion of heteromorphisms has a low power-to-weight ratio, though it does
make some things a bit nicer. Nevertheless, in thinking of how best to fit them into this context, it was
clear that collages provided the perfect mechanism (which isn’t a big surprise), and the result works rather
nicely. When I realized a fact that can be cryptically but compactly represented as
It’s actually better to organize this as a family of collections of function symbols indexed by pairs of sorts.↩︎
Instead of having equations that generate an equivalence relation on (raw) terms, we could simply require an equivalence relation on (raw) terms be directly provided.↩︎
Collaging is actually quite natural in this context. I already intend to support one theory importing another. A collage is just a theory that imports two others and then adds function symbols between them.↩︎
For programmers familiar with Agda, at least, if you haven’t made this connection, this might help you understand and appreciate what a “class” is versus a “set” and what “size issues” are, which is typically handled extremely vaguely in a lot of the literature.↩︎