Here’s a small example of why people find category theory interesting. Consider the
category, call it
Spelling things out,
Presheaves are extremely important in category theory, so categorists know a lot about them. It turns out
that they have a lot of structure (mostly because
But set that aside for now. This isn’t an article about graphs so let’s start generalizing away from them.
We’ll start with a baby step. You can probably guess how we’d go about making edge labeled graphs. We’d
add another object to e.s
, e.t
and
e.l_e
. Since e.s = e'.s
and e.t = e'.t
and e.l_e = e'.l_e
. This
is different from a normal abstract data type (in a purely functional language) where an abstract data type
with three fields would be isomorphic to a 3-tuple. The extra ability to differentiate them could be modeled
by having a fourth field that returned something that could only be compared for equality, kind of like
Data.Unique (ignoring the Ord
instance…) It may look like:
data Vertex -- abstract
vertexId :: Vertex -> Unique
data Label -- abstract
labelId :: Label -> Unique
data Edge -- abstract
edgeId :: Edge -> Unique
src :: Edge -> Vertex
tgt :: Edge -> Vertex
label :: Edge -> Label
Of course, if these are literally all the functions we have for these types (plus some constants otherwise we can’t make anything), then these abstract types might as well be records. Anything else they have is unobservable and thus superfluous.
data Vertex = Vertex { vertexId :: Unique }
data EdgeLabel = EdgeLabel { edgeLabelId :: Unique }
data Edge = Edge {
edgeId :: Unique,
src :: Vertex,
tgt :: Vertex,
edgeLabel :: EdgeLabel
}
This starts to suggest that we can just turn each object in our category
Something a bit more interesting happens if we want to label the vertices. We proceed as before: add another
object, call it
data VertexLabel = VertexLabel { vertexLabelId :: Unique }
data Vertex = Vertex {
vertexId :: Unique,
vertexLabel :: VertexLabel
}
data EdgeLabel = EdgeLabel { edgeLabelId :: Unique }
data Edge = Edge {
edgeId :: Unique,
src :: Vertex,
tgt :: Vertex,
edgeLabel :: EdgeLabel
-- With our earlier "rule" to add a field for each arrow
-- we should also have:
-- srcVertexLabel :: VertexLabel
-- tgtVertexLabel :: VertexLabel
-- but, by definition, these are:
-- vertexLabel . src and
-- vertexLabel . tgt respectively.
-- So we don't explicitly add a field for composite arrows.
}
The latter choice would look the same, except there would be an extra constraint that we can’t
capture in Haskell, namely vertexLabel . src == vertexLabel . tgt
.
If we stick to the abstract data type intuition and we have a cycle in the arrows in
Some of you have probably thought of another more appropriate intuition. A presheaf is a database.
The category which our presheaf is over, what we’ve been calling
For our earlier example:
CREATE VertexLabel (Id uniqueidentifier PRIMARY KEY);
CREATE Vertex (
Id uniqueidentifier PRIMARY KEY,
Label uniqueidentifier REFERENCES VertexLabel(Id)
);CREATE EdgeLabel (Id uniqueidentifier PRIMARY KEY);
CREATE Edge (
Id uniqueidentifier PRIMARY KEY,
REFERENCES VertexLabel(Id),
Src uniqueidentifier REFERENCES VertexLabel(Id),
Tgt uniqueidentifier Label uniqueidentifier REFERENCES EdgeLabel(Id)
-- Nothing stops us from having additional columns here and elsewhere
-- corresponding to the fact that our types were really abstract data
-- types in the Haskell, and to the fact that we can choose an arbitrary
-- set as long as it has at least this much structure. They won't be
-- preserved by "homomorphisms" though.
);
To be clear, each presheaf corresponds to a database with data. Inserting a row into a table is a
homomorphism of presheaves, but so is adding a (not preserved by homomorphism) column. The schema
above corresponds to the
If we had made that second choice before where we had only one arrow back that would lead to the following integrity constraint:
NOT EXISTS(SELECT *
FROM Edge e
JOIN Vertex src ON e.Src = src.Id
JOIN Vertex tgt ON e.Tgt = tgt.Id
WHERE src.Label <> tgt.Label)
It turns out this intuition is completely general. You can actually think of all of presheaf theory as a (slightly unusual) database theory. More objects just means more tables. More arrows means more foreign keys and potentially additional integrity constraints. It’s not exactly relational though. In fact, in some ways it’s a bit closer to SQL3 or object databases. You can turn this around too going back to the point at the beginning. Whether or not we can think of all presheaves like databases or all databases like presheaves, we can certainly think of this example, and many like it, as presheaves. This means for many “databases”, we can talk about doing dependently typed programming where our types are databases of the given shape. This also dramatically shifts the focus in database theory from databases to database migrations, i.e. homomorphisms of databases.
David Spivak is the one who has done the most on this, so I’ll refer you to his work. He also has a better story for schemas that are much closer to normal schemas. I’ll end as I began:
Here’s a small example of why people find category theory interesting.