Hedonistic Learning

Learning for the fun of it

Introduction

I want to talk about one of the many pretty areas of number theory. This involves the notion of an arithmetic function and related concepts. A few relatively simple concepts will allow us to produce a variety of useful functions and theorems. This provides only a glimpse of the start of the field of analytic number theory, though many of these techniques are used in other places as we’ll also start to see.

(See the end for a summary of identities and results.)

Prelude

As some notation, I’ll write |\mathbb N_+| for the set of positive naturals, and |\mathbb P| for the set of primes. |\mathbb N| will contain |0|. Slightly atypically, I’ll write |[n]| for the set of numbers from |1| to |n| inclusive, i.e. |a \in [n]| if and only if |1 \leq a \leq n|.

I find that the easiest way to see results in number theory is to view a positive natural number as a multiset of primes which is uniquely given by factorization. Coprime numbers are ones where these multisets are disjoint. Multiplication unions the multisets. The greatest common divisor is multiset intersection. |n| divides |m| if and only if |n| corresponds to a sub-multiset of |m|, in which case |m/n| corresponds to the multiset difference. The multiplicity of an element of a multiset is the number of occurrences. For a multiset |P|, |\mathrm{dom}(P)| is the set of elements of the multiset |P|, i.e. those with multiplicity greater than |0|. For a finite multiset |P|, |\vert P\vert| will be the sum of the multiplicities of the distinct elements, i.e. the number of elements (with duplicates) in the multiset.

We can represent a multiset of primes as a function |\mathbb P \to \mathbb N| which maps an element to its multiplicity. A finite multiset would then be such a function that is |0| at all but finitely many primes. Alternatively, we can represent the multiset as a partial function |\mathbb P \rightharpoonup \mathbb N_+|. It will be finite when it is defined for only finitely many primes. Equivalently, when it is a finite subset of |\mathbb P\times\mathbb N_+| (which is also a functional relation).

Unique factorization provides a bijection between finite multisets of primes and positive natural numbers. Given a finite multiset |P|, the corresponding positive natural number is |n_P = \prod_{(p, k) \in P} p^k|.

I will refer to this view often in the following.

Arithmetic Functions

An arithmetic function is just a function defined on the positive naturals. Usually, they’ll land in (not necessarily positive) natural numbers, but that isn’t required.

In most cases, we’ll be interested in the specific subclass of multiplicative arithmetic functions. An arithmetic function, |f|, is multiplicative if |f(1) = 1| and |f(ab) = f(a)f(b)| whenever |a| and |b| are coprime. We also have the notion of a completely multiplicative arithmetic function for which |f(ab) = f(a)f(b)| always. Obviously, completely multiplicative functions are multiplicative. Analogously, we also have a notion of (completely) additive where |f(ab) = f(a) + f(b)|. Warning: In other mathematical contexts, “additive” means |f(a+b)=f(a)+f(b)|. An obvious example of a completely additive function being the logarithm. Exponentiating an additive function will produce a multiplicative function.

For an additive function, |f|, we automatically get |f(1) = 0| since |f(1) = f(1\cdot 1) = f(1) + f(1)|.

Lemma: The product of two multiplicative functions |f| and |g| is multiplicative.
Proof: For |a| and |b| coprime, |f(ab)g(ab) = f(a)f(b)g(a)g(b) = f(a)g(a)f(b)g(b)|. |\square|

A parallel statement holds for completely multiplicative functions.

It’s also clear that a completely multiplicative function is entirely determined by its action on prime numbers. Since |p^n| is coprime to |q^n| whenever |p| and |q| are coprime, we see that a multiplicative function is entirely determined by its action on powers of primes. To this end, I’ll often define multiplicative/additive functions by their action on prime powers and completely multiplicative/additive functions by their action on primes.

Multiplicative functions aren’t closed under composition, but we do have that if |f| is completely multiplicative and |g| is multiplicative, then |f \circ g| is multiplicative when that composite makes sense.

Here are some examples. Not all of these will be used in the sequel.

  • The power function |({-})^z| for any |z|, not necessarily an integer, is completely multiplicative.
  • Choosing |z=0| in the previous, we see the constantly one function |\bar 1(n) = 1| is completely multiplicative.
  • The identity function is clearly completely multiplicative and is also the |z=1| case of the above.
  • The Kronecker delta function |\delta(n) = \begin{cases}1, & n = 0 \\ 0, & n \neq 0\end{cases}| is completely multiplicative. Often written |\varepsilon| in this context.
  • Define a multiplicative function via |\mu(p^n) = \begin{cases} -1, & n = 1 \\ 0, & n > 1\end{cases}| where |p| is prime. This is the Möbius function. More holistically, |\mu(n)| is |0| if |n| has any square factors, otherwise |\mu(n) = (-1)^k| where |k| is the number of (distinct) prime factors.
  • Define a completely multiplicative function via |\lambda(p) = -1|. |\lambda(n) = \pm 1| depending on whether there is an even or odd number of prime factors (including duplicates). This function is known as the Liouville function.
  • |\lambda(n) = (-1)^{\Omega(n)}| where |\Omega(n)| is the completely additive function which counts the number of prime factors of |n| including duplicates. |\Omega(n_P) = \vert P\vert|.
  • Define a multiplicative function via |\gamma(p^n) = -1|. |\gamma(n) = \pm 1| depending on whether there is an even or odd number of distinct prime factors.
  • |\gamma(n) = (-1)^{\omega(n)}| where |\omega(n)| is the additive function which counts the number of distinct prime factors of |n|. See Prime omega function. We also see that |\omega(n_P) = \vert\mathrm{dom}(P)\vert|.
  • The completely additive function for |q\in\mathbb P|, |\nu_q(p) = \begin{cases}1,&p=q\\0,&p\neq q\end{cases}| is the p-adic valuation.
  • It follows that the |p|-adic absolute value |\vert r\vert_p = p^{-\nu_p(r)}| is completely multiplicative. It can be characterized on naturals by |\vert p\vert_q = \begin{cases}p^{-1},&p=q\\1,&p\neq q\end{cases}|.
  • |\gcd({-}, k)| for a fixed |k| is multiplicative. Given any multiplicative function |f|, |f \circ \gcd({-},k)| is multiplicative. This essentially “restricts” |f| to only see the prime powers that divide |k|. Viewing the finite multiset of primes |P| as a function |\mathbb P\to\mathbb N|, |f(\gcd(p^n,n_P)) = \begin{cases}f(p^n),&n\leq P(p)\\f(p^{P(p)}),&n>P(p)\end{cases}|.
  • The multiplicative function characterized by |a(p^n) = p(n)| where |p(n)| is the partition function counts the number of abelian groups the given order. That this function is multiplicative is a consequence of the fundamental theorem of finite abelian groups.
  • The Jacobi symbol |\left(\frac{a}{n}\right)| where |a\in\mathbb Z| and |n| is an odd positive integer is a completely multiplicative function with either |a| or |n| fixed. When |n| is an odd prime, it reduces to the Legendre symbol. For |p| an odd prime, we have |(\frac{a}{p}) = a^{\frac{p-1}{2}} \pmod p|. This will always be in |\{-1, 0, 1\}| and can be alternately defined as |\left(\frac{a}{p}\right) = \begin{cases}0,&p\mid a\\1,&p\nmid a\text{ and }\exists x.x^2\equiv a\pmod p\\-1,&\not\exists x.x^2\equiv a\pmod p\end{cases}|. Therefore, |\left(\frac{a}{p}\right)=1| (|=0|) when |a| is a (trivial) quadratic residue mod |p|.
  • An interesting example which is not multiplicative nor additive is the arithmetic derivative. Let |p\in\mathbb P|. Define |\frac{\partial}{\partial p}(n)| via |\frac{\partial}{\partial p}(p) = 1|, |\frac{\partial}{\partial p}(q) = 0| for |q\neq p| and |q\in\mathbb P|, and |\frac{\partial}{\partial p}(nm) = \frac{\partial}{\partial p}(n)m + n\frac{\partial}{\partial p}(m)|. We then have |D_S = \sum_{p\in S}\frac{\partial}{\partial p}| for non-empty |S\subseteq\mathbb P| which satisfies the same product rule identity. This perspective views a natural number (or, more generally, a rational number) as a monomial in infinitely many variables labeled by prime numbers.
  • A Dirichlet character of modulus |m| is, by definition, a completely multiplicative function |\chi| satisfying |\chi(n + m) = \chi(n)| and |\chi(n)| is non-zero if and only if |n| is coprime to |m|. The Jacobi symbol |\left(\frac{({-})}{m}\right)| is a Dirichlet character of modulus |m|. |\bar 1| is the Dirichlet character of modulus |1|.

Dirichlet Series

Given an arithmetic function |f|, we define the Dirichlet series:

\[\mathcal D[f](s) = \sum_{n=1}^\infty \frac{f(n)}{n^s} = \sum_{n=1}^\infty f(n)n^{-s}\]

When |f| is a Dirichlet character, |\chi|, this is referred to as the (Dirichlet) |L|-series of the character, and the analytic continuation is the (Dirichlet) |L|-function and is written |L(s, \chi)|.

We’ll not focus much on when such a series converges. See this section of the above Wikipedia article for more details. Alternatively, we could talk about formal Dirichlet series. We can clearly see that if |s = 0|, then we get the sum |\sum_{n=1}^\infty f(n)| which clearly won’t converge for, say, |f = \bar 1|. We can say that if |f| is asymptotically bounded by |n^k| for some |k|, i.e. |f \in O(n^k)|, then the series will converge absolutely when the real part of |s| is greater than |k+1|. For |\bar 1|, it follows that |\mathcal D[\bar 1](x + iy)| is defined when |x > 1|. We can use analytic continuation to go beyond these limits.

See A Catalog of Interesting Dirichlet Series for a more reference-like listing. Beware differences in notation.

Dirichlet Convolution

Why is this interesting in this context? Let’s consider two arithmetic functions |f| and |g| and multiply their corresponding Dirichlet series. We’ll get:

\[\mathcal D[f](s)\mathcal D[g](s) = \sum_{n=1}^\infty h(n)n^{-s} = \mathcal D[h](s)\]

where now we need to figure out what |h(n)| is. But |h(n)| is going to be the sum of all the terms of the form |f(a)a^{-s}g(b)b^{-s} = f(a)g(b)(ab)^{-s}| where |ab = n|. We can thus write: \[h(n) = \sum_{ab=n} f(a)g(b) = \sum_{d\mid n} f(d)g(n/d)\] We’ll write this more compactly as |h = f \star g| which we’ll call Dirichlet convolution. We have thus shown a convolution theorem of the form \[\mathcal D[f]\mathcal D[g] = \mathcal D[f \star g]\]

The Kronecker delta serves as a unit to this operation which is reflected by |\mathcal D[\delta](s) = 1|.

In the same way we can view a sum of the form |\sum_{a+b=n}f(a)g(b)| that arises in “normal” convolution as a sum along the line |y = n - x|, we can view the sum |\sum_{ab=n}f(a)g(b)| as a sum along a hyperbola of the form |y = n/x|. For all of |\sum_{n=1}^\infty\sum_{k=1}^\infty f(n)g(k)|, |\sum_{n=1}^\infty\sum_{k=1}^n f(k)g(n-k)|, and |\sum_{n=1}^\infty\sum_{k\mid n}f(k)g(n/k)| we’re including |f(a)g(b)| for every |(a,b)\in\mathbb N_+\times\mathbb N_+| in the sum exactly once. The difference is whether we’re grouping the internal sum by rows, diagonals, or hyperbolas. This idea of summing hyperbolas can be expanded to a computational technique for sums of multiplicative functions called the Dirichlet hyperbola method.

Since we will primarily be interested in multiplicative functions, we should check that |f \star g| is a multiplicative function when |f| and |g| are.

Lemma: Assume |a| and |b| are coprime, and |f| and |g| are multiplicative. Then |(f \star g)(ab) = (f \star g)(a)(f \star g)(b)|.

Proof: Since |a| and |b| are coprime, they share no divisors besides |1|. This means every |d| such that |d \mid ab| factors as |d = d_a d_b| where |d_a \mid a| and |d_b \mid b|. More strongly, write |D_n = \{ d \in \mathbb N_+ \mid d \mid n\}|, then for any coprime pair of numbers |i| and |j|, we have |D_{ij} \cong D_i \times D_j| and that every pair |(d_i, d_j) \in D_i \times D_j| are coprime1. Thus,

\[\begin{flalign} (f \star g)(ab) & = \sum_{d \in D_{ab}} f(d)g((ab)/d) \tag{by definition} \\ & = \sum_{(d_a, d_b) \in D_a \times D_b} f(d_a d_b)g((ab)/(d_a d_b)) \tag{via the bijection} \\ & = \sum_{(d_a, d_b) \in D_a \times D_b} f(d_a)f(d_b)g(a/d_a)g(b/d_b) \tag{f and g are multiplicative} \\ & = \sum_{d_a \in D_a} \sum_{d_b \in D_b} f(d_a)f(d_b)g(a/d_a)g(b/d_b) \tag{sum over a Cartesian product} \\ & = \sum_{d_a \in D_a} f(d_a)g(a/d_a) \sum_{d_b \in D_b} f(d_b)g(b/d_b) \tag{undistributing} \\ & = \sum_{d_a \in D_a} f(d_a)g(a/d_a) (f \star g)(b) \tag{by definition} \\ & = (f \star g)(b) \sum_{d_a \in D_a} f(d_a)g(a/d_a) \tag{undistributing} \\ & = (f \star g)(b) (f \star g)(a) \tag{by definition} \\ & = (f \star g)(a) (f \star g)(b) \tag{commutativity of multiplication} \end{flalign}\] |\square|

It is not the case that the Dirichlet convolution of two completely multiplicative functions is completely multiplicative.

We can already start to do some interesting things with this. First, we see that |\mathcal D[\bar 1] = \zeta|, the Riemann zeta function. Now consider |(\bar 1 \star \bar 1)(n) = \sum_{k \mid n} 1 = d(n)|. |d(n)| is the divisor function which counts the number of divisors of |n|. We see that |\mathcal D[d](s) = \zeta(s)^2|. A simple but useful fact is |\zeta(s - z) = \mathcal D[(-)^z](s)|. This directly generalizes the result for |\mathcal D[\bar 1]| and also implies |\mathcal D[\operatorname{id}](s) = \zeta(s - 1)|.

Generalizing in a different way, we get the family of functions |\sigma_k = ({-})^k \star \bar 1|. |\sigma_k(n) = \sum_{d \mid n} d^k|. From the above, we see |\mathcal D[\sigma_k](s) = \zeta(s - k)\zeta(s)|.

Lemma: Given a completely multiplicative function |f|, we get |f(n)(g \star h)(n) = (fg \star fh)(n)|.
Proof: \[\begin{flalign} (fg \star fh)(n) & = \sum_{d \mid n} f(d)g(d)f(n/d)h(n/d) \\ & = \sum_{d \mid n} f(d)f(n/d)g(d)h(n/d) \\ & = \sum_{d \mid n} f(n)g(d)h(n/d) \\ & = f(n)\sum_{d \mid n} g(d)h(n/d) \\ & = f(n)(g \star h)(n) \end{flalign}\] |\square|

As a simple corollary, for a completely multiplicative |f|, |f \star f = f(\bar 1 \star \bar 1) = fd|.

Euler Product Formula

However, the true power of this is unlocked by the following theorem:

Theorem (Euler product formula): Given a multiplicative function |f| which doesn’t grow too fast, e.g. is |O(n^k)| for some |k > 0|, \[\mathcal D[f](s) = \sum_{n=1}^\infty f(n)n^{-s} = \prod_{p \in \mathbb P}\sum_{n=0}^\infty f(p^n)p^{-ns} = \prod_{p \in \mathbb P}\left(1 + \sum_{n=1}^\infty f(p^n)p^{-ns}\right) \] where the series converges.

Proof: The last equality is simply using the fact that |f(p^0)p^0 = f(1) = 1| because |f| is multiplicative. The idea for the main part is similar to how we derived Dirichlet convolution. When we start to distribute out the infinite product, each term will correspond to the product of selections of a term from each series. When all but finitely many of those selections select the |1| term, we get |\prod_{(p, k) \in P}f(p^k)(p^k)^{-s}| where |P| is some finite multiset of primes induced by those selections. Therefore, |\prod_{(p, k) \in P}f(p^k)(p^k)^{-s} = f(n_P)n_P^{-s}|. Thus, by unique factorization, |f(n)n^{-s}| for every positive natural occurs in the sum produced by distributing the right-hand side exactly once.

In the case where |P| is not a finite multiset, we’ll have \[ \frac{\prod_{(p, k) \in P}f(p^k)}{\left(\prod_{(p, k) \in P}p^k\right)^s}\]

The denominator of this expression goes to infinity when the real part of |s| is greater than |0|. As long as the numerator doesn’t grow faster than the denominator (perhaps after restricting the real part of |s| to be greater than some bound), then this product goes to |0|. Therefore, the only terms that remain are these corresponding to the Dirichlet series on the left-hand side. |\square|

If we assume |f| is completely multiplicative, we can further simplify Euler’s product formula via the usual sum of a geometric series, |\sum_{n=0}^\infty x^n = (1-x)^{-1}|, to:

\[ \sum_{n=1}^\infty f(n)n^{-s} = \prod_{p \in \mathbb P}\sum_{n=0}^\infty (f(p)p^{-s})^n = \prod_{p \in \mathbb P}(1 - f(p)p^{-s})^{-1} \]

Now let’s put this to work. The first thing we can see is |\zeta(s) = \mathcal D[\bar 1](s) = \prod_{p\in\mathbb P}(1 - p^{-s})^{-1}|. But this lets us write |1/\zeta(s) = \prod_{p\in\mathbb P}(1 - p^{-s})|. If we look for a multiplicative function that would produce the right-hand side, we see that it must send a prime |p| to |-1| and |p^n| for |n > 1| to |0|. In other words, it’s the Möbius function |\mu| we defined before. So |\mathcal D[\mu](s) = 1/\zeta(s)|.

Using |\mathcal D[d](s) = \zeta(s)^2|, we see that \[\begin{flalign} \zeta(s)^2 & = \prod_{p\in\mathbb P}\left(\sum_{n=0}^\infty p^{-ns}\right)^{-2} \\ & = \prod_{p\in\mathbb P}\left(\sum_{n=0}^\infty (n+1)p^{-ns}\right)^{-1} \\ & = \prod_{p\in\mathbb P}\left(\sum_{n=0}^\infty d(p^n)p^{-ns}\right)^{-1} \\ & = \mathcal D[d](s) \end{flalign}\] Therefore, |d(p^n) = n + 1|. This intuitively makes sense because the only divisors of |p^n| are |p^k| for |k = 0, \dots, n|, and for |a| and |b| coprime |d(ab) = \vert D_{ab} \vert = \vert D_a \times D_b\vert = \vert D_a\vert\vert D_b\vert = d(a)d(b)|.

Another result leveraging the theorem is given any multiplicative function |f|, we can define a new multiplicative function via |f^{[k]}(p^n) = \begin{cases}f(p^m), & km = n\textrm{ for }m\in\mathbb N \\ 0, & k \nmid n\end{cases}|.

Lemma: The operation just defined has the property that |\mathcal D[f^{[k]}](s) = \mathcal D[f](ks)|.
Proof: \[\begin{flalign} \mathcal D[f^{[k]}](s) & = \prod_{p \in \mathbb P}\sum_{n=0}^\infty f^{[k]}(p^n)p^{-ns} \\ & = \prod_{p \in \mathbb P}\sum_{n=0}^\infty f^{[k]}(p^{kn})p^{-nks} \\ & = \prod_{p \in \mathbb P}\sum_{n=0}^\infty f(p^n)p^{-nks} \\ & = \mathcal D[f](ks) \end{flalign}\] |\square|

Möbius Inversion

We can write a sum over some function, |f|, of the divisors of a given natural |n| as |(f \star \bar 1)(n) = \sum_{d \mid n} f(d)|. Call this |g(n)|. But then we have |\mathcal D[f \star \bar 1] = \mathcal D[f]\mathcal D[\bar 1] = \mathcal D[f]\zeta| and thus |\mathcal D[f] = \mathcal D[f]\zeta/\zeta = \mathcal D[(f \star \bar 1) \star \mu]|. Therefore, if we only have the sums |g(n) = \sum_{d \mid n} f(d)| for some unknown |f|, we can recover |f| via |f(n) = (g \star \mu)(n) = \sum_{d\mid n}g(d)\mu(n/d)|. This is Möbius inversion.

Formally:

\[g(n) = \sum_{d\mid n} f(d) \iff f(n) = \sum_{d \mid n} \mu(d)g(n/d)\]

As a simple example, we clearly have |\zeta(s)/\zeta(s) = 1 = \mathcal D[\delta](s)| so |\bar 1 \star \mu = \delta| or |\sum_{d \mid n}\mu(d) = 0| for |n > 1| and |1| when |n = 1|.

We also get generalized Möbius inversion via |\delta(n) = \delta(n)n^k = (\mu\star\bar 1)(n)n^k = (({-})^k\mu\star({-})^k)(n)|. Which is to say if |g(n) = \sum_{d\mid n}d^k f(n/d)| then |f(n) = \sum_{d\mid n} \mu(d)d^kg(n/d)|.

By considering logarithms, we also get a multiplicative form of (generalized) Möbius inversion: \[g(n) = \prod_{d\mid n}f(n/d)^{d^k} \iff f(n) = \prod_{d\mid n}g(n/d)^{\mu(d)d^k}\]

Theorem: As another guise of Möbius inversion, given any completely multiplicative function |h|, let |g(m) = \sum_{n=1}^\infty f(mh(n))|. Assuming these sums make sense, we can recover |f(k)| via |f(k) = \sum_{m=1}^\infty \mu(m)g(kh(m))|.

Proof: \[\begin{align} \sum_{m=1}^\infty \mu(m)g(kh(m)) & = \sum_{m=1}^\infty \mu(m)\sum_{n=1}^\infty f(kh(m)h(n)) \\ & = \sum_{N=1}^\infty \sum_{N=mn} \mu(m)f(kh(N)) \\ & = \sum_{N=1}^\infty f(kh(N)) \sum_{N=nm} \mu(m) \\ & = \sum_{N=1}^\infty f(kh(N)) (\mu\star\bar 1)(N) \\ & = \sum_{N=1}^\infty f(kh(N)) \delta(N) \\ & = f(k) \end{align}\] |\square|

This will often show up in the form of |r(x^{1/n})| or |r(x^{1/n})/n|, i.e. with |h(n)=n^{-1}| and |f_x(k) = r(x^k)| or |f_x(k) = kr(x^k)|. Typically, we’ll then be computing |f_x(1) = r(x)|.

Lambert Series

As a brief aside, it’s worth mentioning Lambert Series.

Given an arithmetic function |a|, these are series of the form: \[ \sum_{n=1}^\infty a(n) \frac{x^n}{1-x^n} = \sum_{n=1}^\infty a(n) \sum_{k=1}^\infty x^{kn} = \sum_{n=1}^\infty (a \star \bar 1)(n) x^n \]

This leads to: \[\sum_{n=1}^\infty \mu(n) \frac{x^n}{1-x^n} = x\] and: \[\sum_{n=1}^\infty \varphi(n) \frac{x^n}{1-x^n} = \frac{x}{(1-x)^2}\]

Inclusion-Exclusion

The Möbius and |\zeta| functions can be generalized to incidence algebras where this form is from the incidence algebra induced by the divisibility order2. A notable and relevant example of a Möbius functions for another, closely related, incidence algebra is when we consider the incidence algebra induced by finite multisets with the inclusion ordering. Let |T| be a finite multiset, we get |\mu(T) = \begin{cases}0,&T\text{ has repeated elements}\\(-1)^{\vert T\vert},&T\text{ is a set}\end{cases}|. Since we can view a natural number as a finite multiset of primes, and we can always relabel the elements of a finite multiset with distinct primes, this is equivalent to the Möbius function we’ve been using.

This leads to a nice and compact way of describing the principle of inclusion-exclusion. Let |A| and |S| be (finite) multisets with |S \subseteq A| and assume we have |f| and |g| defined on the set of sub-multisets of |A|. If \[g(A) = \sum_{S\subseteq A} f(S)\] then \[f(A) = \sum_{S\subseteq A}\mu(A\setminus S)g(S)\] and this is Möbius inversion for this notion of Möbius function. We can thus take a different perspective on Möbius inversion. If |P| is a finite multiset of primes, then \[g(n_P) = \sum_{Q\subseteq P}f(n_Q) \iff f(n_P) = \sum_{Q\subseteq P}\mu(P\setminus Q)g(n_Q)\] recalling that |Q\subseteq P \iff n_Q \mid n_P| and |n_{P\setminus Q} = n_P/n_Q| when |Q\subseteq P|.

We get traditional inclusion-exclusion by noting that |\mu(T)=(-1)^{\vert T\vert}| when |T| is a set, i.e. all elements have multiplicity at most |1|. Let |I| be a finite set and assume we have a family of finite sets, |\{T_i\}_{i\in I}|. Write |T = \bigcup_{i\in I}T_i| and define |\bigcap_{i\in\varnothing}T_i = T|.

Define \[f(J) = \left\vert\bigcap_{i\in I\setminus J}T_i\setminus\bigcup_{i \in J}T_i\right\vert\] for |J\subseteq I|. In particular, |f(I) = 0|. |f(J)| is then the number of elements shared by all |T_i| for |i\notin J| and no |T_j| for |j\in J|. Every |x \in \bigcup_{i\in I}T_i| is thus associated to exactly one such subset of |I|, namely |\{j\in I\mid x\notin T_j\}|. Formally, |x \in \bigcap_{i\in I\setminus J}T_i\setminus\bigcup_{i \in J}T_i \iff J = \{j\in I\mid x\notin T_j\}| so each |\bigcap_{i\in I\setminus J}T_i\setminus\bigcup_{i \in J}T_i| is disjoint and \[g(J) = \sum_{S\subseteq J}f(S) = \left\vert\bigcup_{S\subseteq J}\left(\bigcap_{i\in I\setminus S}T_i\setminus\bigcup_{i \in S}T_i\right)\right\vert = \left\vert\bigcap_{i\in I\setminus J}T_i\right\vert \] for |J \subseteq I|. In particular, |g(I) = \vert\bigcup_{i\in I}T_i\vert|.

By the Möbius inversion formula for finite sets, we thus have: \[f(J) = \sum_{S\subseteq J}(-1)^{\vert J\vert - \vert S\vert}g(S)\] which for |J = I| gives: \[ 0 = \sum_{J\subseteq I}(-1)^{\vert I\vert - \vert J\vert}\left\vert\bigcap_{i\in I\setminus J}T_i\right\vert = \left\vert\bigcup_{i\in I}T_i\right\vert + \sum_{J\subsetneq I}(-1)^{\vert I\vert - \vert J\vert}\left\vert\bigcap_{i\in I\setminus J}T_i\right\vert \] which is equivalent to the more usual form: \[\left\vert\bigcup_{i\in I}T_i\right\vert = \sum_{J\subsetneq I}(-1)^{\vert I\vert - \vert J\vert - 1}\left\vert\bigcap_{i\in I\setminus J}T_i\right\vert = \sum_{\varnothing\neq J\subseteq I}(-1)^{\vert J\vert + 1}\left\vert\bigcap_{i\in J}T_i\right\vert \]

|\varphi|

An obvious thing to explore is to apply Möbius inversion to various arithmetic functions. A fairly natural first start is applying Möbius inversion to the identity function. From the above results, we know that this unknown function |\varphi| will satisfy |\mathcal D[\varphi](s) = \zeta(s-1)/\zeta(s) = \mathcal D[\operatorname{id}\star\mu](s)|. We also immediately have the property that |n = \sum_{d \mid n}\varphi(d)|. Using Euler’s product formula we have: \[\begin{flalign} \zeta(s-1)/\zeta(s) & = \prod_{p \in \mathbb P} \frac{1 - p^{-s}}{1 - p^{-s+1}} \\ & = \prod_{p \in \mathbb P} \frac{1 - p^{-s}}{1 - pp^{-s}} \\ & = \prod_{p \in \mathbb P} (1 - p^{-s})\sum_{n=0}^\infty p^n p^{-ns} \\ & = \prod_{p \in \mathbb P} \left(\sum_{n=0}^\infty p^n p^{-ns}\right) - \left(\sum_{n=0}^\infty p^n p^{-s} p^{-ns}\right) \\ & = \prod_{p \in \mathbb P} \left(\sum_{n=0}^\infty p^n p^{-ns}\right) - \left(\sum_{n=0}^\infty p^n p^{-(n + 1)s}\right) \\ & = \prod_{p \in \mathbb P} \left(1 + \sum_{n=1}^\infty p^n p^{-ns}\right) - \left(\sum_{n=1}^\infty p^{n-1} p^{-ns}\right) \\ & = \prod_{p \in \mathbb P} \left(1 + \sum_{n=1}^\infty (p^n - p^{n-1}) p^{-ns}\right) \\ & = \prod_{p \in \mathbb P} \left(1 + \sum_{n=1}^\infty \varphi(p^n) p^{-ns}\right) \\ & = \mathcal D[\varphi](s) \end{flalign}\]

So |\varphi| is the multiplicative function defined by |\varphi(p^n) = p^n - p^{n-1}|. For |p^n|, we can see that this counts the number of positive integers less than or equal to |p^n| which are coprime to |p^n|. There are |p^n| positive integers less than or equal to |p^n|, and every |p|th one is a multiple of |p| so |p^n/p = p^{n-1}| are not coprime to |p^n|. All the remainder are coprime to |p^n| since they don’t have |p| in their prime factorizations and |p^n| only has |p| in its. We need to verify that this interpretation is multiplicative. To be clear, we know that |\varphi| is multiplicative and that this interpretation works for |p^n|. The question is whether |\varphi(n)| for general |n| meets the above description, i.e. whether the number of coprime numbers less than |n| is multiplicative.

Theorem: The number of coprime numbers less than |n| is multiplicative and is equal to |\varphi(n)|.

Proof: |\varphi = \mu\star\operatorname{id}|. We have:

\[\begin{flalign} \varphi(n_P) & = \sum_{d\mid n_P}\mu(d)\frac{n_P}{d} \\ & = \sum_{Q\subseteq P}\mu(Q)\frac{n_P}{n_Q} \\ & = \sum_{Q\subseteq \mathrm{dom}(P)}(-1)^{\vert Q\vert}\frac{n_P}{n_Q} \end{flalign}\]

We can see an inclusion-exclusion pattern. Specifically, let |C_k = \{ c \in [k] \mid \gcd(c, k) = 1\}| be the numbers less than or equal to |k| and coprime to |k|. Let |S_{k,m} = \{ c \in [k] \mid m \mid c\}|. We have |S_{k,a} \cap S_{k,b} = S_{k,\operatorname{lcm}(a,b)}|. Also, when |c \mid k|, then |\vert S_{k,c}\vert = k/c|. |C_{n_P} = [n_P] \setminus \bigcup_{p \in \mathrm{dom}(P)} S_{n_P,p}| because every number not coprime to |n_P| shares some prime factor with it. Applying inclusion-exclusion to the union yields \[\begin{align} \vert C_{n_P}\vert & = n_P - \sum_{\varnothing\neq Q\subseteq\mathrm{dom}(P)}(-1)^{\vert Q\vert+1}\left\vert \bigcap_{p\in Q}S_{n_P,p}\right\vert \\ & = n_P + \sum_{\varnothing\neq Q\subseteq\mathrm{dom}(P)}(-1)^{\vert Q\vert}\frac{n_P}{\prod_{p\in Q}p} \\ & = \sum_{Q\subseteq\mathrm{dom}(P)}(-1)^{\vert Q\vert}\frac{n_P}{n_Q} \end{align}\] |\square|

Many of you will already have recognized that this is Euler’s totient function.

Combinatorial Species

The book Combinatorial Species and Tree-Like Structures has many examples where Dirichlet convolutions and Möbius inversion come up. A combinatorial species is a functor |\operatorname{Core}(\mathbf{FinSet})\to\mathbf{FinSet}|. Any permutation on a finite set can be decomposed into a collection of cyclic permutations. Let |U| be a finite set of cardinality |n| and |\pi : U \cong U| a permutation of |U|. For any |u\in U|, there is a smallest |k\in\mathbb N_+| such that |\pi^k(u) = u| where |\pi^{k+1} = \pi \circ \pi^k| and |\pi^0 = \operatorname{id}|. The |k| elements |\mathcal O(u)=\{\pi^{i-1}(u)\mid i\in[k]\}| make up a cycle of length |k|, and |\pi| restricted to |U\setminus O(u)| is a permutation on this smaller set. We can just inductively pull out another cycle until we run out of elements. Write |\pi_k| for the number of cycles of length |k| in the permutation |\pi|. We clearly have |n = \sum_{k=1}^\infty k\pi_k| as every cycle has |k| elements in it.

Write |\operatorname{fix}\pi| for the number of fixed points of |\pi|, i.e. the cardinality of the set |\{u\in U\mid \pi(u) = u\}|. Clearly, every element that is fixed by |\pi^k| needs to be in a cycle whose length divides |k|. This leads to the equation:

\[ \operatorname{fix}\pi^k = \sum_{d\mid k} d\pi_d = ((d \mapsto d\pi_d) \star \bar 1)(k)\]

Since |F(\pi^k) = F(\pi)^k| for a combinatorial species |F|, Möbius inversion, as explicitly stated in Proposition 2.2.3 of Combinatorial Species and Tree-Like Structures, leads to:

\[k(F(\pi))_k = \sum_{d\mid k}\mu\left(\frac{k}{d}\right)\operatorname{fix}F(\pi^d) = (\mu\star(d\mapsto \operatorname{fix}F(\pi^d)))(k) \]

If we Dirichlet convolve both sides of this with |\operatorname{id}|, replacing |F(\pi)| with |\beta| as it doesn’t matter that this permutation comes from an action of a species, we get:

\[\sum_{d\mid m} d\beta_d(m/d) = m\sum_{d\mid m} \beta_d = (\varphi\star(d\mapsto \operatorname{fix}\beta^d))(m)\]

This is just using |\varphi = \operatorname{id}\star\mu|. If we choose |m| such that |\beta^m = \operatorname{id}|, then we get |\sum_{d\mid m} \beta_d = \sum_{k=1}^\infty \beta_k| because |\beta_k| will be |0| for all the |k| which don’t divide |m|. This makes the previous equation into equation 2.2 (34) in the book.

Since we know |n = \sum_{k=1}^\infty k\pi_k| for any permutation |\pi|, we also get: \[\vert F([n])\vert = \sum_{k=1}^\infty\sum_{d\mid k}\mu\left(\frac{k}{d}\right)\operatorname{fix}F(\pi^d) = \sum_{k=1}^\infty(\mu\star(d\mapsto\operatorname{fix}F(\pi^d)))(k)\]

These equations give us a way to compute some of these divisor sums by looking at the number fixed points and cycles of the action of species and vice versa. For example, 2.3 (49) is a series of Dirichlet convolutions connected to weighted species.

Example 12 from this book presents a nice and perhaps surprising identity. The core of it can be written as: \[\sum_{k=1}^\infty\ln(1-ax^k) = \sum_{k=1}^\infty\rho_k(a)\ln(1-x^k)\] where |\rho_k(a) = k^{-1}\sum_{d\mid k}\varphi(k/d)a^d|. We can rewrite this definition as the characterization |k\rho_k(a) = (\varphi\star a^{({-})})(k)|. Recalling that |\varphi = \mu \star \operatorname{id}| and |\ln(1-x) = -\sum_{n=1}^\infty x^n/n|, we get the following derivation:

Theorem: \[\sum_{k=1}^\infty\ln(1-ax^k) = \sum_{k=1}^\infty\rho_k(a)\ln(1-x^k)\] where |\rho_k(a) = k^{-1}\sum_{d\mid k}\varphi(k/d)a^d|.

Proof: \[\begin{flalign} \sum_{k=1}^\infty\ln(1-ax^k) & = -\sum_{k=1}^\infty\sum_{n=1}^\infty \frac{a^n x^{nk}}{n} \\ & = -\sum_{n=1}^\infty\sum_{k=1}^\infty \frac{a^n x^{nk}}{n} \\ & = -\sum_{N=1}^\infty\sum_{k\mid N} \frac{a^{N/k} x^N}{N/k} \tag{N=nk} \\ & = -\sum_{N=1}^\infty\frac{x^N}{N}\sum_{k\mid N} ka^{N/k} \\ & = -\sum_{N=1}^\infty\frac{x^N}{N}(\operatorname{id}\star a^{({-})})(N) \\ & = -\sum_{N=1}^\infty\frac{x^N}{N}(\delta\star\operatorname{id}\star a^{({-})})(N) \tag{the trick} \\ & = -\sum_{N=1}^\infty\frac{x^N}{N}(\bar 1\star\mu\star\operatorname{id}\star a^{({-})})(N) \\ & = -\sum_{N=1}^\infty\frac{x^N}{N}(\bar 1\star\varphi\star a^{({-})})(N) \\ & = -\sum_{N=1}^\infty\frac{x^N}{N}\sum_{k\mid N}(\varphi\star a^{({-})})(k) \\ & = -\sum_{N=1}^\infty\frac{x^N}{N}\sum_{k\mid N}k\rho_k(a) \\ & = -\sum_{n=1}^\infty\frac{x^{nk}}{n}\sum_{k=1}^\infty\rho_k(a) \tag{N=nk again} \\ & = \sum_{k=1}^\infty\rho_k(a) \ln(1-x^k) \end{flalign}\] |\square|

Derivative of Dirichlet series

We can easily compute the derivative of a Dirichlet series (assuming sufficiently strong convergence so we can push the differentiation into the sum):

\[\begin{flalign} \mathcal D[f]’(s) & = \frac{d}{ds}\sum_{n=1}^\infty f(n)n^{-s} \\ & = \sum_{n=1}^\infty f(n)\frac{d}{ds}n^{-s} \\ & = \sum_{n=1}^\infty f(n)\frac{d}{ds}e^{-s\ln n} \\ & = \sum_{n=1}^\infty -f(n)\ln n e^{-s\ln n} \\ & = -\sum_{n=1}^\infty f(n)\ln n n^{-s} \\ & = -\mathcal D[f\ln](s) \end{flalign}\]

This leads to the identity |\frac{d}{ds}\ln\mathcal D[f](s) = \mathcal D[f]’ (s)/\mathcal D[f](s) = -\mathcal D[f\ln \star \mu](s)|. For example, we have |-\zeta’(s)/\zeta(s) = \mathcal D[\ln \star \mu](s)|. Using the Euler product formula, we have |\ln\zeta(s) = -\sum_{p\in\mathbb P}\ln(1-p^{-s})|. Differentiating this gives \[\begin{flalign} \frac{d}{ds}\ln\zeta(s) & = -\sum_{p\in\mathbb P} p^{-s}\ln p/(1 - p^{-s}) \\ & = -\sum_{p\in\mathbb P} \sum_{k=1}^\infty \ln p (p^k)^{-s} \\ & = -\sum_{n=1}^\infty \Lambda(n) n^{-s} \\ & = -\mathcal D[\Lambda](s) \end{flalign}\] where |\Lambda(n) = \begin{cases}\ln p,&p\in\mathbb P\land\exists k\in\mathbb N_+.n=p^k \\ 0, & \text{otherwise}\end{cases}|. |\Lambda|, which is not a multiplicative nor an additive function, is known as the von Mangoldt function. Just to write it explicitly, the above implies |\Lambda = \ln \star \mu|, i.e. |\Lambda| is the Möbius inversion of |\ln|. This can be generalized for arbitrary completely multiplicative functions besides |\bar 1| to get |\mathcal D[f]’/\mathcal D[f] = \mathcal D[f\Lambda]|.

We now have multiple perspectives on |\Lambda| which is a kind of “indicator function” for prime powers.

Dirichlet Inverse

Let’s say we’re given an arithmetic function |f|, and we want to find an arithmetic function |g| such that |f \star g = \delta| which we’ll call the Dirichlet inverse of |f|. We immediately get |(f \star g)(1) = f(1)g(1) = 1 = \delta(1)|. So, supposing |f(1)\neq 1|, we can define |g(1) = 1/f(1)|. We then get a recurrence relation for all the remaining values of |g| via: \[0 = (f \star g)(n) = f(1)g(n) + \sum_{d \mid n, d\neq 1} f(d)g(n/d)\] for |n > 1|. Solving for |g(n)|, we have: \[g(n) = -f(1)^{-1}\sum_{d\mid n,d\neq 1}f(d)g(n/d)\] where the right-hand side only requires |g(k)| for |k < n|. If |f| is multiplicative, then |f(1) = 1| and the inverse of |f| exists.

If |f| is completely multiplicative, its Dirichlet inverse is |\mu f|. This follows easily from |f \star \mu f = (\bar 1 \star \mu)f = \delta f = \delta|. As an example, |({-})^z| is completely multiplicative so its inverse is |({-})^z\mu|. Since the inverse of a Dirichlet convolution is the convolution of the inverses, we get |\varphi^{-1}(n) = \sum_{d\mid n}d\mu(d)|. Not to be confused with |\varphi(n) = (\operatorname{id}\star\mu)(n) = \sum_{d\mid n} d\mu(n/d)|.

Less trivially, the inverse of a multiplicative function is also a multiplicative function. We can prove it by complete induction on |\mathbb N_+| using the formula for |g| from above.

Theorem: If |f\star g = \delta|, then |g| is multiplicative when |f| is.

Proof: Let |n = ab| where |a| and |b| are coprime. If |a| (or, symmetrically, |b|) is equal to |1|, then since |g(1) = 1/f(1) = 1|, we have |g(1n) = g(1)g(n) = g(n)|. Now assume neither |a| nor |b| are |1| and, as the induction hypothesis, assume that |g| is multiplicative on all numbers less than |n|. We have: \[\begin{flalign} g(ab) & = -\sum_{d\mid ab,d\neq 1}f(d)g(ab/d) \\ & = -\sum_{d_a \mid a}\sum_{d_b \mid b,d_a d_b \neq 1}f(d_ad_b)g(ab/(d_ad_b)) \\ & = -\sum_{d_a \mid a}\sum_{d_b \mid b,d_a d_b \neq 1}f(d_a)f(d_b)g(a/d_a)g(b/d_b)) \\ & = -\sum_{d_b \mid b,d_b \neq 1}f(d_b)g(a)g(b/d_b)) - \sum_{d_a \mid a,d_a \neq 1}\sum_{d_b \mid b}f(d_a)f(d_b)g(a/d_a)g(b/d_b)) \\ & = -g(a)\sum_{d \mid b,d \neq 1}f(d)g(b/d)) - \sum_{d_a \mid a,d_a \neq 1}f(d_a)g(a/d_a)\sum_{d_b \mid b}f(d_b)g(b/d_b)) \\ & = g(a)g(b) - \sum_{d_a \mid a,d_a \neq 1}f(d_a)g(a/d_a) (f \star g)(b) \\ & = g(a)g(b) - \delta(b)\sum_{d_a \mid a,d_a \neq 1}f(d_a)g(a/d_a) \\ & = g(a)g(b) \end{flalign}\] |\square|

Assuming |f| has a Dirichlet inverse, we also have: \[\mathcal D[f^{-1}](s) = \mathcal D[f](s)^{-1}\] immediately from the convolution theorem.

More Examples

Given a multiplicative function |f|:

\[\begin{align} \mathcal D[f(\gcd({-},n_P))](s) & = \zeta(s)\prod_{(p,k)\in P}(1 - p^{-s})\left(\sum_{n=0}^\infty f(p^{\min(k,n)})p^{-ns}\right) \\ & = \zeta(s)\prod_{(p,k)\in P}(1 - p^{-s})\left(\frac{f(p^k)p^{-(k+1)s}}{1 - p^{-s}} + \sum_{n=0}^k f(p^n)p^{-ns}\right) \end{align}\]

As an example, |\eta(s) = (1 - 2^{1-s})\zeta(s) = \mathcal D[f](s)| where |f(n) = \begin{cases}-1,&n=2\\1,&n\neq 2\end{cases}|.

Alternatively, |f(n) = \mu(\gcd(n, 2))| and we can apply the above formula to see: \[\begin{flalign} \mathcal D[\mu(\gcd({-},2))] & = \zeta(s)(1-2^{-s})\left(\frac{\mu(2)2^{-2s}}{1 - 2^{-s}} + \sum_{n=0}^1 \mu(2^n)2^{-ns}\right) \\ & = \zeta(s)(1-2^{-s})\left(\frac{-2^{-2s}}{1 - 2^{-s}} + 1 - 2^{-s}\right) \\ & = \zeta(s)(-2^{-2s} + (1 - 2^{-s})^2) \\ & = \zeta(s)(1 - 2^{1-s}) \end{flalign}\]

|\lambda| and |\gamma|

Recalling, |\lambda| is completely multiplicative and is characterized by |\lambda(p) = -1|.

We can show that |\mathcal D[\lambda](s) = \zeta(2s)/\zeta(s)| which is equivalent to saying |\bar 1^{(2)} \star \mu = \lambda| or |\lambda\star\bar 1 = \bar 1^{(2)}|.

\[\begin{flalign} \zeta(2s)/\zeta(s) & = \prod_{p\in\mathbb P} \frac{1-p^{-s}}{1-(p^{-s})^2} \\ & = \prod_{p\in\mathbb P} \frac{1-p^{-s}}{(1-p^{-s})(1+p^{-s})} \\ & = \prod_{p\in\mathbb P} (1 + p^{-s})^{-1} \\ & = \prod_{p\in\mathbb P} (1 - \lambda(p)p^{-s})^{-1} \\ & = \mathcal D[\lambda](s) \end{flalign}\]

We have |\lambda\mu = \vert\mu\vert = \mu\mu| is the inverse of |\lambda| so |\mathcal D[\vert\mu\vert](s) = \zeta(s)/\zeta(2s)|.

Recalling, |\gamma| is multiplicative and is characterized by |\gamma(p^n) = -1|.

\[\begin{flalign} \mathcal D[\gamma](s) & = \prod_{p \in \mathbb P}\left(1 + \sum_{n=1}^\infty \gamma(p^n)p^{-ns}\right) \\ & = \prod_{p \in \mathbb P}\left(1 - \sum_{n=1}^\infty p^{-ns}\right) \\ & = \prod_{p \in \mathbb P}\left(1 - \left(\sum_{n=0}^\infty p^{-ns} - 1\right)\right) \\ & = \prod_{p \in \mathbb P}\frac{2(1 - p^{-s}) - 1}{1 - p^{-s}} \\ & = \prod_{p \in \mathbb P}\frac{1 - 2p^{-s}}{1 - p^{-s}} \end{flalign}\]

This implies that |(\gamma\star\mu)(p^n) = \begin{cases}-2, & n=1 \\ 0, & n > 1 \end{cases}|.

Indicator Functions

Let |1_{\mathbb P}| be the indicator function for the primes. We have |\omega = 1_{\mathbb P}\star\bar 1| or |1_{\mathbb P} = \omega\star\mu|. Directly, |\mathcal D[1_{\mathbb P}](s) = \sum_{p\in\mathbb P}p^{-s}| so we have |\mathcal D[\omega](s)/\zeta(s) = \sum_{p\in\mathbb P} p^{-s}|.

Lemma: |\mathcal D[1_{\mathbb P}](s)=\sum_{n=1}^\infty \frac{\mu(n)}{n}\ln\zeta(ns)|
Proof: We proceed as follows: \[\begin{align} \sum_{n=1}^\infty \frac{\mu(n)}{n}\ln\zeta(ns) & = \sum_{n=1}^\infty \frac{\mu(n)}{n}\ln\left(\prod_{p\in\mathbb P}(1 - p^{-ns})^{-1}\right) \\ & = -\sum_{n=1}^\infty \frac{\mu(n)}{n}\sum_{p\in\mathbb P}\ln(1 - p^{-ns}) \\ & = \sum_{p\in\mathbb P}\sum_{n=1}^\infty \frac{\mu(n)}{n}\sum_{k=1}^\infty p^{-kns}/k \\ & = \sum_{p\in\mathbb P}\sum_{N=1}^\infty \sum_{N=kn} \frac{\mu(n)}{N}p^{-Ns} \\ & = \sum_{p\in\mathbb P}\sum_{N=1}^\infty \frac{p^{-Ns}}{N}\sum_{N=kn}\mu(n) \\ & = \sum_{p\in\mathbb P}\sum_{N=1}^\infty \frac{p^{-Ns}}{N}(\mu\star\bar 1)(N) \\ & = \sum_{p\in\mathbb P}\sum_{N=1}^\infty \frac{p^{-Ns}}{N}\delta(N) \\ & = \sum_{p\in\mathbb P} p^{-s} \\ & = \mathcal D[1_{\mathbb P}](s) \end{align}\] |\square|

Let |1_{\mathcal P}| be the indicator function for prime powers. |\Omega = 1_{\mathcal P}\star\bar 1| or |1_{\mathcal P} = \Omega\star\mu|. |\mathcal D[1_{\mathcal P}](s) = \sum_{p\in\mathbb P}(1 - p^{-s})^{-1}| so we have |\mathcal D[\Omega](s)/\zeta(s) = \sum_{p\in\mathbb P}(1 - p^{-s})^{-1}|.

Lemma: |\mathcal D[1_{\mathcal P}](s)=\sum_{n=1}^\infty \frac{\varphi(n)}{n}\ln\zeta(ns)|
Proof: This is quite similar to the previous proof. \[\begin{align} \sum_{n=1}^\infty \frac{\varphi(n)}{n}\ln\zeta(ns) & = \sum_{p\in\mathbb P}\sum_{N=1}^\infty \frac{p^{-Ns}}{N}\sum_{N=kn}\varphi(n) \\ & = \sum_{p\in\mathbb P}\sum_{N=1}^\infty \frac{p^{-Ns}}{N}(\varphi\star\bar 1)(N) \\ & = \sum_{p\in\mathbb P}\sum_{N=1}^\infty \frac{p^{-Ns}}{N} N \\ & = \sum_{p\in\mathbb P}\sum_{N=1}^\infty p^{-Ns} \\ & = \mathcal D[1_{\mathcal P}](s) \end{align}\] |\square|

Summatory Functions

One thing we’ve occasionally been taking for granted is that the operator |\mathcal D| is injective. That is, |\mathcal D[f] = \mathcal D[g]| if and only if |f = g|. To show this, we’ll use the fact that we can (usually) invert the Mellin transform which can be viewed roughly as a version of |\mathcal D| that operates on continuous functions.

Before talking about the Mellin transform, we’ll talk about summatory functions as this will ease our later discussion.

We will turn a sum into a continuous function via a zero-order hold, i.e. we will take the floor of the input. Thus |\sum_{n\leq x} f(n)| is constant on any interval of the form |[k,k+1)|. It then (potentially) has jump discontinuities at integer values. The beginning of the sum is at |n=1| so for all |x<1|, the sum up to |x| is |0|. We will need a slight tweak to better deal with these discontinuities. This will be indicated by a prime on the summation sign.

For non-integer values of |x|, we have: \[\sum_{n \leq x}’ f(n) = \sum_{n \leq x} f(n)\]

For |m| an integer, we have: \[ \sum_{n \leq m}’ f(n) = \frac{1}{2}\left(\sum_{n<m} f(n) + \sum_{n \leq m} f(n)\right) = \sum_{n\leq m} f(n) - f(m)/2 \]

This kind of thing should be familiar to those who’ve worked with things like Laplace transforms of discontinuous functions. (Not for no reason…)

One reason for introducing these summation functions is they are a little easier to work with. Arguably, we want something like |\frac{d}{dx}\sum_{n\leq x}f(n) = \sum_{n=1}^\infty f(n)\delta(n-x)|, but that means we end up with a bunch of distribution nonsense and even more improper integrals. The summation function may be discontinuous, but it at least has a finite value everywhere. Of course, another reason for introducing these functions is that they often are values we’re interested in.

Several important functions are these continuous “sums” of arithmetic functions of this form:

  • Mertens function: |M(x) = \sum_{n\leq x}’ \mu(n)|
  • Chebyshev function: |\vartheta(x) = \sum_{p\leq x, p\in\mathbb P}’ \ln p = \sum_{n\leq x} 1_{\mathbb P}(n)\ln n|
  • Second Chebyshev function: |\psi(x) = \sum_{n\leq x}’ \Lambda(n) = \sum_{n=1}^\infty \vartheta(x^{1/n})|
  • The prime-counting function: |\pi(x) = \sum_{n\leq x}’ 1_{\mathbb P}|
  • Riemann’s prime-power counting function: |\Pi_0(x) = \sum_{n\leq x} \frac{\Lambda(n)}{\ln n} = \sum_{n=1}^\infty \sum_{p^n\leq x,p\in\mathbb P}’ n^{-1} = \sum_{n=1}^\infty\pi(x^{1/n})n^{-1}|
  • |D(x) = \sum_{n\leq x}d(n)|

These are interesting in how they related to the prime-counting function.

Let’s consider the arithmetic function |\Lambda/\ln| whose Dirichlet series is |\ln\zeta|.

We have the summation function |\sum_{n\leq x}’ \Lambda(n)/\ln(n)|, but |\Lambda(n)| is |0| except when |n=p^k| for some |p\in\mathbb P| and |k\in\mathbb N_+|. Therefore, we have \[\begin{align} \sum_{n\leq x}’ \frac{\Lambda(n)}{\ln(n)} & = \sum_{k=1}^\infty\sum_{p^k\leq x, p\in\mathbb P}’ \frac{\Lambda(p^k)}{\ln(p^k)} \\ & = \sum_{k=1}^\infty\sum_{p^k\leq x, p\in\mathbb P}’ \frac{\ln(p)}{k\ln(p)} \\ & = \sum_{k=1}^\infty\sum_{p^k\leq x, p\in\mathbb P}’ \frac{1}{k} \\ & = \sum_{k=1}^\infty \frac{1}{k} \sum_{p^k\leq x, p\in\mathbb P}’ 1 \\ & = \sum_{k=1}^\infty \frac{1}{k} \sum_{p\leq x^{1/k}, p\in\mathbb P}’ 1 \\ & = \sum_{k=1}^\infty \frac{\pi(x^{1/k})}{k} \\ \end{align}\]

|\ln\zeta(s) = s\mathcal M[\Pi_0](-s)=\mathcal D[\Lambda/\ln](s)| where |\mathcal M| is the Mellin transform, and the connection to Dirichlet series is described in the following section.

Mellin Transform

The definition of the Mellin transform and its inverse are:

\[\mathcal M[f](s) = \int_0^\infty x^s\frac{f(x)}{x}dx\] \[\mathcal M^{-1}[\varphi](x) = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} x^{-s}\varphi(s)ds\]

The contour integral is intended to mean the vertical line with real part |c| traversed from negative to positive imaginary values. Modulo the opposite sign of |s| and the extra factor of |x|, this is quite similar to a continuous version of a Dirichlet series.

The Mellin transform is closely related to the two-sided Laplace transform.

\[\mathcal D[f](s) = s\mathcal M\left[x\mapsto \sum_{n\leq x}’ f(n)\right](-s)\]

Using Mellin transform properties, particularly the one for transforming the derivative, we can write the following.

\[\begin{align} \mathcal D[f](s) = s\mathcal M\left[x\mapsto \sum_{n\leq x}’ f(n)\right](-s) & \iff \mathcal D[f](1-s) = -(s-1)\mathcal M\left[x\mapsto \sum_{n\leq x}’ f(n)\right](s-1) \\ & \iff \mathcal D[f](1-s) = \mathcal M\left[x\mapsto \frac{d}{dx}\sum_{n\leq x}’ f(n)\right](s) \\ & \iff \mathcal D[f](1-s) = \int_0^\infty x^{s-1}\frac{d}{dx}\sum_{n\leq x}’ f(n)dx \\ & \iff \mathcal D[f](1-s) = \int_0^\infty x^{s-1}\sum_{n=1}^\infty f(n)\delta(x-n)dx \\ & \iff \mathcal D[f](1-s) = \sum_{n=1}^\infty f(n)n^{s-1} \\ & \iff \mathcal D[f](s) = \sum_{n=1}^\infty f(n)n^{-s} \end{align}\]

This leads to Perron’s formula

\[\begin{align} \sum_{n\leq x}’ f(n) & = \mathcal M^{-1}[s\mapsto -\mathcal D[f](-s)/s](x) \\ & = \frac{1}{2\pi i}\int_{-c-i\infty}^{-c+i\infty}\frac{\mathcal D[f](-s)}{-s} x^{-s} ds \\ & = -\frac{1}{2\pi i}\int_{c+i\infty}^{c-i\infty}\frac{\mathcal D[f](s)}{s} x^s ds \\ & = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty}\frac{\mathcal D[f](s)}{s} x^s ds \end{align}\]

for which we need to take the Cauchy principal value to get something defined. (See also Abel summation.)

There are side conditions on the convergence of |\mathcal D[f]| for these formulas to be justified. See the links.

Many of the operations we’ve described on Dirichlet series follow from Mellin transform properties. For example, we have |\mathcal M[f]’(s) = \mathcal M[f\ln](s)| generally.

Summary

Properties

Dirichlet Convolution

Dirichlet convolution is |(f\star g)(n) = \sum_{d\mid n} f(d)g(n/d) = \sum_{mk=n} f(m)g(k)|.

Dirichlet convolution forms a commutative ring with it as the multiplication, |\delta| as the multiplicative unit and the usual additive structure. This is to say that Dirichlet convolution is commutative, associative, unital, and bilinear.

For |f| completely multiplicative, |f(g\star h) = fg \star fh|.

Dirichlet Inverse

For any |f| such that |f(1)\neq 0|, there is a |g| such that |f\star g = \delta|. In particular, the set of multiplicative functions forms a subgroup of this multiplicative group, i.e. the Dirichlet convolution of multiplicative functions is multiplicative.

If |f(1) \neq 0|, then |f \star g = \delta| where |g| is defined by the following recurrence:

\[\begin{flalign} g(1) & = 1/f(1) \\ g(n) & = -f(1)^{-1}\sum_{d\mid n,d\neq 1}f(d)g(n/d) \end{flalign}\]

For a completely multiplicative |f|, its Dirichlet inverse is |\mu f|.

Convolution Theorem

\[\mathcal D[f](s)\mathcal D[g](s) = \mathcal D[f\star g](s)\]

Möbius Inversion

\[\delta = \bar 1 \star \mu\]

This means from a divisor sum |g(n)\sum_{d\mid n}f(d) = (f\star\bar 1)(n)| for each |n|, we can recover |f| via |g\star\mu = f\star\bar 1\star\mu = f|. Which is to say |f(n)=\sum_{d\mid n}g(d)\mu(n/d)|.

This can be generalized via |({-})^k\mu\star({-})^k = \delta|. In sums, this means when |g(n)=\sum_{d\mid n}d^k f(n/d)|, then |f(n)=\sum_{d\mid n}\mu(d)d^k g(n/d)|.

Let |h| be a completely multiplicative function. Given |g(m) = \sum_{n=1}^\infty f(mh(n))|, then |f(n) = \sum_{m=1}^\infty \mu(m)g(nh(m))|.

Using the Möbius function for finite multisets and their inclusion ordering, we can recast Möbius inversion of naturals as Möbius inversion of finite multisets (of primes) a la: \[n_P = \sum_{Q\subseteq P}\mu(P\setminus Q)n_Q = \sum_{Q\subseteq P}\mu(n_P/n_Q)n_Q = \sum_{d\mid n_P}\mu(n_P/d)d \]

As a nice result, we have: \[\sum_{n=1}^\infty\ln(1-ax^n) = \sum_{n=1}^\infty\rho_n(a)\ln(1-x^n)\] where |n\rho_n(a) = (\varphi \star a^{({-})})(n)|.

Dirichlet Series

\[\mathcal D[f](s) = \sum_{n=1}^\infty f(n)n^{-s}\]

\[\mathcal D[n\mapsto f(n)n^k](s) = \mathcal D[f](s - k)\]

\[\mathcal D[f^{-1}](s) = \mathcal D[f](s)^{-1}\] where the inverse on the left is the Dirichlet inverse.

\[\mathcal D[f]’(s) = -\mathcal D[f\ln](s)\]

For a completely multiplicative |f|, \[\mathcal D[f]’(s)/\mathcal D[f](s) = -\mathcal D[f\Lambda](s)\] and: \[\ln\mathcal D[f](s) = \mathcal D[f\Lambda/\ln](s)\]

Dirichlet series as a Mellin transform:

\[\mathcal D[f](s) = s\mathcal M\left[x\mapsto \sum_{n\leq x}’ f(n)\right](-s)\]

The corresponding inverse Mellin transform statement is called Perron’s Formula:

\[\sum_{n\leq x}’ f(n) = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty}\frac{\mathcal D[f](s)}{s} x^s ds\]

Euler Product Formula

Assuming |f| is multiplicative, we have:

\[\mathcal D[f](s) = \prod_{p \in \mathbb P}\sum_{n=0}^\infty f(p^n)p^{-ns} = \prod_{p \in \mathbb P}\left(1 + \sum_{n=1}^\infty f(p^n)p^{-ns}\right) \]

When |f| is completely multiplicative, this can be simplified to:

\[\mathcal D[f](s) = \prod_{p \in \mathbb P}(1 - f(p)p^{-s})^{-1} \]

Lambert Series

Given an arithmetic function |a|, these are series of the form: \[ \sum_{n=1}^\infty a(n) \frac{x^n}{1-x^n} = \sum_{n=1}^\infty (a \star \bar 1)(n) x^n \]

\[\sum_{n=1}^\infty \mu(n) \frac{x^n}{1-x^n} = x\]

\[\sum_{n=1}^\infty \varphi(n) \frac{x^n}{1-x^n} = \frac{x}{(1-x)^2}\]

Arithmetic function definitions

|f(p^n)=\cdots| implies a multiplicative/additive function, while |f(p)=\cdots| implies a completely multiplicative/additive function.

|p^z| for |z\in\mathbb C| is completely multiplicative. This includes the identity function (|z=1|) and |\bar 1| (|z=0|). For any multiplicative |f|, |f\circ \gcd({-},k)| is multiplicative.

|\ln| is completely additive.

Important but neither additive nor multiplicative are the indicator functions for primes |1_{\mathbb P}| and prime powers |1_{\mathcal P}|.

The following functions are (completely) multiplicative unless otherwise specified.

\[\begin{flalign} \delta(p) & = 0 \tag{Kronecker delta} \\ \bar 1(p) & = 1 = p^0 \\ \mu(p^n) & = \begin{cases}-1, & n = 1 \\ 0, & n > 1\end{cases} \tag{Möbius function} \\ \Omega(p) & = 1 \tag{additive} \\ \lambda(p) & = -1 = (-1)^{\Omega(p)} \tag{Liouville function} \\ \omega(p^n) & = 1 \tag{additive} \\ \gamma(p^n) & = -1 = (-1)^{\omega(p^n)} \\ a(p^n) & = p(n) \tag{p(n) is the partition function} \\ \varphi(p^n) & = p^n - p^{n-1} = p^n(1 - 1/p) = J_1(p^n) \tag{Euler totient function} \\ \sigma_k(p^n) & = \sum_{m=0}^n p^{km} = \sum_{d\mid p^n} d^k = \frac{p^{k(n+1)}-1}{p^k - 1} \tag{last only works for k>0} \\ d(p^n) & = n + 1 = \sigma_0 \\ f^{[k]}(p^n) & = \begin{cases}f(p^m),& km=n\\0,& k\nmid n\end{cases} \tag{f multiplicative} \\ \Lambda(n) & = \begin{cases}\ln p,&p\in\mathbb P\land\exists k\in\mathbb N_+.n=p^k \\ 0, & \text{otherwise}\end{cases} \tag{not multiplicative} \\ J_k(p^n) & = p^{kn} - p^{k(n-1)} = p^{kn}(1 - p^{-k}) \tag{Jordan totient function} \\ \psi_k(p^n) & = p^{kn} + p^{k(n-1)} = p^{kn}(1 + p^{-k}) = J_{2k}(p^n)/J_k(p^n) \tag{Dedekind psi function} \\ \end{flalign}\]

Dirichlet convolutions

\[\begin{flalign} \delta & = \bar 1 \star \mu \\ \varphi & = \operatorname{id}\star\mu \\ \sigma_z & = ({-})^z \star \bar 1 = \psi_z \star \bar 1^{(2)} \\ \sigma_1 & = \varphi \star d \\ d & = \sigma_0 = \bar 1 \star \bar 1 \\ f \star f & = fd \tag{f completely multiplicative} \\ f\Lambda & = f\ln \star f\mu = f\ln \star f^{-1} \tag{f completely multiplicative, Dirichlet inverse} \\ \lambda & = \bar 1^{(2)} \star \mu \\ \vert\mu\vert & = \lambda^{-1} = \mu\lambda \tag{Dirichlet inverse} \\ 2^\omega & = \vert\mu\vert \star \bar 1 \\ \psi_z & = ({-})^z \star \vert\mu\vert \\ \operatorname{fix} \pi^{(-)} & = \bar 1 \star (k \mapsto k\pi_k) \tag{for a permutation} \\ ({-})^k & = J_k \star \bar 1 \end{flalign}\]

More Dirichlet convolution identities are here, though many are trivial consequences of the earlier properties.

Dirichlet series

\[\begin{array}{l|ll} f(n) & \mathcal D[f](s) & \\ \hline \delta(n) & 1 & \\ \bar 1(n) & \zeta(s) & \\ n & \zeta(s-1) & \\ n^z & \zeta(s-z) & \\ \sigma_z(n) & \zeta(s-z)\zeta(s) & \\ \mu(n) & \zeta(s)^{-1} & \\ \vert\mu(n)\vert & \zeta(s)/\zeta(2s) & \\ \varphi(n) & \zeta(s-1)/\zeta(s) & \\ d(n) & \zeta(s)^2 & \\ \mu(\gcd(n, 2)) & \eta(s) = (1-2^{1-s})\zeta(s) & \\ \lambda(n) & \zeta(2s)/\zeta(s) \\ \gamma(n) & \prod_{p \in \mathbb P}\frac{1-2p^{-s}}{1-p^{-s}} & \\ f^{[k]}(n) & \mathcal D[f](ks) & \\ f(n)\ln n & -\mathcal D[f]’ (s) & f\text{ completely multiplicative}\\ \Lambda(n) & -\zeta’(s)/\zeta(s) & \\ \Lambda(n)/\ln(n) & \ln\zeta(s) & \\ 1_{\mathbb P}(n) & \sum_{n=1}^\infty \frac{\mu(n)}{n}\ln\zeta(ns) & \\ 1_{\mathcal P}(n) & \sum_{n=1}^\infty \frac{\varphi(n)}{n}\ln\zeta(ns) & \\ \psi_k(n) & \zeta(s)\zeta(s - k)/\zeta(2s) & \\ J_k(n) & \zeta(s - k)/\zeta(s) & \end{array}\]


  1. Viewing natural numbers as multisets, |D_n| is the set of all sub-multisets of |n|. The isomorphism described is then simply the fact that given any sub-multiset of the union of two disjoint multisets, we can sort the elements into their original multisets producing two sub-multisets of the disjoint multisets.↩︎

  2. Incidence algebras are a decategorification of the notion of a category algebra.↩︎

Introduction

Purely functional list concatenation, xs ++ ys in Haskell syntax, is well known to be linear time in the length of the first input and constant time in the length of the second, i.e. xs ++ ys is O(length xs). This leads to quadratic complexity if we have a bunch of left associated uses of concatenation.

The ancient trick to resolve this is to, instead of producing lists, produce list-to-list functions a la [a] -> [a] or ShowS = String -> String = [Char] -> [Char]. “Concatenation” of “lists” represented this way is just function composition which is a constant time operation. We can lift a list xs to this representation via the section (xs ++). This will still lead to O(length xs) amount of work to apply this function, but a composition of such functions applied to a list will always result in a fully right associated expression even if the function compositions aren’t right associated.

In the last several years, it has become popular to refer to this technique as “difference lists”. Often no justification is given for this name. When it is given, it is usually a reference to the idea of difference lists in logic programming. Unfortunately, other than both techniques giving rise to efficient concatenation, they have almost no similarities.

Read more...

Introduction

Classical First-Order Logic (Classical FOL) has an absolutely central place in traditional logic, model theory, and set theory. It is the foundation upon which ZF(C), which is itself often taken as the foundation of mathematics, is built. When classical FOL was being established there was a lot of study and debate around alternative options. There are a variety of philosophical and metatheoretic reasons supporting classical FOL as The Right Choice.

This all happened, however, well before category theory was even a twinkle in Mac Lane’s and Eilenberg’s eyes, and when type theory was taking its first stumbling steps.

My focus in this article is on what classical FOL looks like to a modern categorical logician. This can be neatly summarized as “classical FOL is the internal logic of a Boolean First-Order Hyperdoctrine.” Each of the three words in this term, “Boolean”, “First-Order”, and “Hyperdoctrine”, suggest a distinct axis in which to vary the (class of categorical models of the) logic. All of them have compelling categorical motivations to be varied.

Read more...

Introduction

In 1983, Mark Overmars described global rebuilding in The Design of Dynamic Data Structures. The problem it was aimed at solving was turning the amortized time complexity bounds of batched rebuilding into worst-case bounds. In batched rebuilding we perform a series of updates to a data structure which may cause the performance of operations to degrade, but occasionally we expensively rebuild the data structure back into an optimal arrangement. If the updates don’t degrade performance too much before we rebuild, then we can achieve our target time complexity bounds in an amortized sense. An update that doesn’t degrade performance too much is called a weak update.

Taking an example from Okasaki’s Purely Functional Data Structures, we can consider a binary search tree where deletions occur by simply marking the deleted nodes as deleted. Then, once about half the tree is marked as deleted, we rebuild the tree into a balanced binary search tree and clean out the nodes marked as deleted at that time. In this case, the deletions count as weak updates because leaving the deleted nodes in the tree even when it corresponds to up to half the tree can only mildly impact the time complexity of other operations. Specifically, assuming the tree was balanced at the start, then deleting half the nodes could only reduce the tree’s depth by about 1. On the other hand, naive inserts are not weak updates as they can quickly increase the tree’s depth.

The idea of global rebuilding is relatively straightforward, though how you would actually realize it in any particular example is not. The overall idea is simply that instead of waiting until the last moment and then rebuilding the data structure all at once, we’ll start the rebuild sooner and work at it incrementally as we perform other operations. If we update the new version faster than we update the original version, we’ll finish it by the time we would have wanted to perform a batched rebuild, and we can just switch to this new version.

More concretely, though still quite vaguely, global rebuilding involves, when a threshold is reached, rebuilding by creating a new “empty” version of the data structure called the shadow copy. The original version is the working copy. Work on rebuilding happens incrementally as operations are performed on the data structure. During this period, we service queries from the working copy and continue to update it as usual. Each update needs to make more progress on building the shadow copy than it worsens the working copy. For example, an insert should insert more nodes into the shadow copy than the working copy. Once the shadow copy is built, we may still have more work to do to incorporate changes that occurred after we started the rebuild. To this end, we can maintain a queue of update operations performed on the working copy since the start of a rebuild, and then apply these updates, also incrementally, to the shadow copy. Again, we need to apply the updates from the queue at a fast enough rate so that we will eventually catch up. Of course, all of this needs to happen fast enough so that 1) the working copy doesn’t get too degraded before the shadow copy is ready, and 2) we don’t end up needing to rebuild the shadow copy before it’s ready to do any work.

Read more...

Introduction

Morleyization is a fairly important operation in categorical logic for which it is hard to find readily accessible references to a statement and proof. Most refer to D1.5.13 of “Sketches of an Elephant” which is not an accessible text. 3.2.8 of “Accessible Categories” by Makkai and Paré is another reference, and “Accessible Categories” is more accessible but still a big ask for just a single theorem.

Here I reproduce the statement and proof from “Accessible Categories” albeit with some notational and conceptual adaptations as well as some commentary. This assumes some basic familiarity with the ideas and notions of traditional model theory, e.g. what structures, models, and |\vDash| are.

Read more...

Introduction

Andrej Bauer has a paper titled The pullback lemma in gory detail that goes over the proof of the pullback lemma in full detail. This is a basic result of category theory and most introductions leave it as an exercise. It is a good exercise, and you should prove it yourself before reading this article or Andrej Bauer’s.

Andrej Bauer’s proof is what most introductions are expecting you to produce. I very much like the representability perspective on category theory and like to see what proofs look like using this perspective.

So this is a proof of the pullback lemma from the perspective of representability.

Preliminaries

The key thing we need here is a characterization of pullbacks in terms of representability. To just jump to the end, we have for |f : A \to C| and |g : B \to C|, |A \times_{f,g} B| is the pullback of |f| and |g| if and only if it represents the functor \[\{(h, k) \in \mathrm{Hom}({-}, A) \times \mathrm{Hom}({-}, B) \mid f \circ h = g \circ k \}\]

That is to say we have the natural isomorphism \[ \mathrm{Hom}({-}, A \times_{f,g} B) \cong \{(h, k) \in \mathrm{Hom}({-}, A) \times \mathrm{Hom}({-}, B) \mid f \circ h = g \circ k \} \]

We’ll write the left to right direction of the isomorphism as |\langle u,v\rangle : U \to A \times_{f,g} B| where |u : U \to A| and |v : U \to B| and they satisfy |f \circ u = g \circ v|. Applying the isomorphism right to left on the identity arrow gives us two arrows |p_1 : A \times_{f,g} B \to A| and |p_2 : A \times_{f,g} B \to B| satisfying |p_1 \circ \langle u, v\rangle = u| and |p_2 \circ \langle u,v \rangle = v|. (Exercise: Show that this follows from being a natural isomorphism.)

One nice thing about representability is that it reduces categorical reasoning to set-theoretic reasoning that you are probably already used to, as we’ll see. You can connect this definition to a typical universal property based definition used in Andrej Bauer’s article. Here we’re taking it as the definition of the pullback.

Proof

The claim to be proven is if the right square in the below diagram is a pullback square, then the left square is a pullback square if and only if the whole rectangle is a pullback square. \[ \xymatrix { A \ar[d]_{q_1} \ar[r]^{q_2} & B \ar[d]_{p_1} \ar[r]^{p_2} & C \ar[d]^{h} \\ X \ar[r]_{f} & Y \ar[r]_{g} & Z }\]

Rewriting the diagram as equations, we have:

Theorem: If |f \circ q_1 = p_1 \circ q_2|, |g \circ p_1 = h \circ p_2|, and |(B, p_1, p_2)| is a pullback of |g| and |h|, then |(A, q_1, q_2)| is a pullback of |f| and |p_1| if and only if |(A, q_1, p_2 \circ q_2)| is a pullback of |g \circ f| and |h|.

Proof: If |(A, q_1, q_2)| was a pullback of |f| and |p_1| then we’d have the following.

\[\begin{align} \mathrm{Hom}({-}, A) & \cong \{(u_1, u_2) \in \mathrm{Hom}({-}, X)\times\mathrm{Hom}({-}, B) \mid f \circ u_1 = p_1 \circ u_2 \} \\ & \cong \{(u_1, (v_1, v_2)) \in \mathrm{Hom}({-}, X)\times\mathrm{Hom}({-}, Y)\times\mathrm{Hom}({-}, C) \mid f \circ u_1 = p_1 \circ \langle v_1, v_2\rangle \land g \circ v_1 = h \circ v_2 \} \\ & = \{(u_1, (v_1, v_2)) \in \mathrm{Hom}({-}, X)\times\mathrm{Hom}({-}, Y)\times\mathrm{Hom}({-}, C) \mid f \circ u_1 = v_1 \land g \circ v_1 = h \circ v_2 \} \\ & = \{(u_1, v_2) \in \mathrm{Hom}({-}, X)\times\mathrm{Hom}({-}, C) \mid g \circ f \circ u_1 = h \circ v_2 \} \end{align}\]

The second isomorphism is |B| being a pullback and |u_2| is an arrow into |B| so it’s necessarily of the form |\langle v_1, v_2\rangle|. The first equality is just |p_1 \circ \langle v_1, v_2\rangle = v_1| mentioned earlier. The second equality merely eliminates the use of |v_1| using the equation |f \circ u_1 = v_1|.

This overall natural isomorphism, however, is exactly what it means for |A| to be a pullback of |g \circ f| and |h|. We verify the projections are what we expect by pushing |id_A| through the isomorphism. By assumption, |u_1| and |u_2| will be |q_1| and |q_2| respectively in the first isomorphism. We see that |v_2 = p_2 \circ \langle v_1, v_2\rangle = p_2 \circ q_2|.

We simply run the isomorphism backwards to get the other direction of the if and only if. |\square|

The simplicity and compactness of this proof demonstrates why I like representability.

Introduction

It is not uncommon for universal quantification to be described as (potentially) infinite conjunction1. Quoting Wikipedia’s Quantifier_(logic) page (my emphasis):

For a finite domain of discourse |D = \{a_1,\dots,a_n\}|, the universal quantifier is equivalent to a logical conjunction of propositions with singular terms |a_i| (having the form |Pa_i| for monadic predicates).

The existential quantifier is equivalent to a logical disjunction of propositions having the same structure as before. For infinite domains of discourse, the equivalences are similar.

While there’s a small grain of truth to this, I think it is wrong and/or misleading far more often than it’s useful or correct. Indeed, it takes a bit of effort to even get a statement that makes sense at all. There’s a bit of conflation between syntax and semantics that’s required to have it naively make sense, unless you’re working (quite unusually) in an infinitary logic where it is typically outright false.

What harm does this confusion do? The most obvious harm is that this view does not generalize to non-classical logics. I’ll focus on constructive logics, in particular. Besides causing problems in these contexts, which maybe you think you don’t care about, it betrays a significant gap in understanding of what universal quantification actually is. Even in purely classical contexts, this confusion often manifests, e.g., in confusion about |\omega|-inconsistency.

So what is the difference between universal quantification and infinite conjunction? Well, the most obvious difference is that infinite conjunction is indexed by some (meta-theoretic) set that doesn’t have anything to do with the domain the universal quantifier quantifies over. However, even if these sets happened to coincide2 there are still differences between universal quantification and infinite conjunction. The key is that universal quantification requires the predicate being quantified over to hold uniformly, while infinite conjunction does not. It just so happens that for the standard set-theoretic semantics of classical first-order logic this “uniformity” constraint is degenerate. However, even for classical first-order logic, this notion of uniformity will be relevant.

Read more...

Introduction

The purpose of this article is to answer the question: what is the coproduct of two groups? The approach, however, will be somewhat absurd. Instead of simply presenting a construction and proving that it satisfies the appropriate universal property, I want to find the general answer and simply instantiate it for the case of groups.

Specifically, this will be a path through the theory of Lawvere theories and their models with the goal of motivating some of the theory around it in pursuit of the answer to this relatively simple question.

If you really just want to know the answer to the title question, then the construction is usually called the free product and is described on the linked Wikipedia page.

Read more...

Introduction

This is a brief article about the notions of preserving, reflecting, and creating limits and, by duality, colimits. Preservation is relatively intuitive, but the distinction between reflection and creation is subtle.

Preservation of Limits

A functor, |F|, preserves limits when it takes limiting cones to limiting cones. As often happens in category theory texts, the notation focuses on the objects. You’ll often see things like |F(X \times Y) \cong FX \times FY|, but implied is that one direction of this isomorphism is the canonical morphism |\langle F\pi_1, F\pi_2\rangle|. To put it yet another way, in this example we require |F(X \times Y)| to satisfy the universal property of a product with the projections |F\pi_1| and |F\pi_2|.

Other than that subtlety, preservation is fairly intuitive.

Reflection of Limits versus Creation of Limits

A functor, |F|, reflects limits when whenever the image of a cone is a limiting cone, then the original cone was a limiting cone. For products this would mean that if we had a wedge |A \stackrel{p}{\leftarrow} Z \stackrel{q}{\to} B|, and |FZ| was the product of |FA| and |FB| with projections |Fp| and |Fq|, then |Z| was the product of |A| and |B| with projections |p| and |q|.

A functor, |F|, creates limits when whenever the image of a diagram has a limit, then the diagram itself has a limit and |F| preserves the limiting cones. For products this would mean if |FX| and |FY| had a product, |FX \times FY|, then |X| and |Y| have a product and |F(X \times Y) \cong FX \times FY| via the canonical morphism.

Creation of limits implies reflection of limits since we can just ignore the apex of the cone. While creation is more powerful, often reflection is enough in practice as we usually have a candidate limit, i.e. a cone. Again, this is often not made too explicit.

Example

Consider the posets:

$$\xymatrix{ & & & c \\ X\ar@{}[r]|{\Large{=}} & a \ar[r] & b \ar[ur] \ar[dr] & \\ & & & d \save "1,2"."3,4"*+[F]\frm{} \restore } \qquad \xymatrix{ & & c \\ Y\ar@{}[r]|{\Large{=}} & b \ar[ur] \ar[dr] & \\ & & d \save "1,2"."3,3"*+[F]\frm{} \restore } \qquad \xymatrix{ & c \\ Z\ar@{}[r]|{\Large{=}} & \\ & d \save "1,2"."3,2"*+[F]\frm{} \restore }$$

Failure of reflection

Let |X=\{a, b, c, d\}| with |a \leq b \leq c| and |b \leq d| mapping to |Y=\{b, c, d\}| where |a \mapsto b|. Reflection fails because |a| maps to a meet but is not itself a meet.

Failure of creation

If we change the source to just |Z=\{c, d\}|, then creation fails because |c| and |d| have a meet in the image but not in the source. Reflection succeeds, though, because there are no non-trivial cones in the source, so every cone (trivially) gets mapped to a limit cone. It’s just that we don’t have any cones with both |c| and |d| in them.

In general, recasting reflection and creation of limits for posets gives us: Let |F: X \to Y| be a monotonic function. |F| reflects limits if every lower bound that |F| maps to a meet is already a meet. |F| creates limits if whenever |F[U]| has a meet for |U \subseteq X|, then |U| already had a meet and |F| sends the meet of |U| to the meet of |F[U]|.

tl;dr The notion of two sets overlapping is very common. Often it is expressed via |A \cap B \neq \varnothing|. Constructively, this is not the best definition as it does not imply |\exists x. x \in A \land x \in B|. Even classically, this second-class treatment of overlapping obscures important and useful connections. In particular, writing |U \between A| for “|U| overlaps |A|”, we have a De Morgan-like duality situation with |\between| being dual to |\subseteq|. Recognizing and exploiting this duality, in part by using more appropriate notation for “overlaps”, can lead to new concepts and connections.

Introduction

The most common way I’ve seen the statement “|A| overlaps |B|” formalized is |A \cap B \neq \varnothing|. To a constructivist, this definition isn’t very satisfying. In particular, this definition of overlaps does not allow us to constructively conclude that there exists an element contained in both |A| and |B|. That is, |A \cap B \neq \varnothing| does not imply |\exists x. x \in A \land x \in B| constructively.

As is usually the case, even if you are not philosophically a constructivist, taking a constructivist perspective can often lead to better definitions and easier to see connections. In this case, constructivism suggests the more positive statement |\exists x. x \in A \land x \in B| be the definition of “overlaps”. However, given that we now have two (constructively) non-equivalent definitions, it is better to introduce notation to abstract from the particular definition. In many cases, it makes sense to have a primitive notion of “overlaps”. Here I will use the notation |A \between B| which is the most common option I’ve seen.

Properties

We can more compactly write the quantifier-based definition as |\exists x \in A.x \in B| using a common set-theoretic abbreviation. This presentation suggests a perhaps surprising connection. If we swap the quantifier, we get |\forall x\in A.x \in B| which is commonly abbreviated |A \subseteq B|. This leads to a duality between |\subseteq| and |\between|, particularly in topological contexts. In particular, if we pick a containing set |X|, then |\neg(U \between A) \iff U \subseteq A^c| where the complement is relative to |X|, and |A| is assumed to be a subset of |X|. This is a De Morgan-like duality.

If we want to characterize these operations via an adjunction, or, more precisely, a Galois connection, we have a slight awkwardness arising from |\subseteq| and |\between| being binary predicates on sets. So, as a first step we’ll identify sets with predicates via, for a set |A|, |\underline A(x) \equiv x \in A|. In terms of predicates, the adjunctions we want are just a special case of the adjunctions characterizing the quantifiers.

\[\underline U(x) \land P \to \underline A(x) \iff P \to U \subseteq A\]

\[U \between B \to Q \iff \underline B(x) \to (\underline U(x) \to Q)\]

What we actually want is a formula of the form |U \between B \to Q \iff B \subseteq (\dots)|. To do this, we need an operation that will allow us to produce a set from a predicate. This is exactly what set comprehension does. For reasons that will become increasingly clear, we’ll assume that |A| and |B| are subsets of a set |X|. We will then consider quantification relative to |X|. The result we get is:

\[\{x \in U \mid P\} \subseteq A \iff \{x \in X \mid x \in U \land P\} \subseteq A \iff P \to U \subseteq A\]

\[U \between B \to Q \iff B \subseteq \{x \in X \mid x \in U \to Q\} \iff B \subseteq \{x \in U \mid \neg Q\}^c\]

The first and last equivalences require additionally assuming |U \subseteq X|. The last equivalence requires classical reasoning. You can already see motivation to limit to subsets of |X| here. First, set complementation, the |(-)^c|, only makes sense relative to some containing set. Next, if we choose |Q \equiv \top|, then the latter formulas state that no matter what |B| is it should be a subset of the expression that follows it. Without constraining to subsets of |X|, this would require a universal set which doesn’t exist in typical set theories.

Choosing |P| as |\top|, |Q| as |\bot|, and |B| as |A^c| leads to the familiar |\neg (U \between A^c) \iff U \subseteq A|, i.e. |U| is a subset of |A| if and only if it doesn’t overlap |A|’s complement.

Incidentally, characterizing |\subseteq| and |\between| in terms of Galois connections, i.e. adjunctions, immediately gives us some properties for free via continuity. We have |U \subseteq \bigcap_{i \in I}A_i \iff \forall i\in I.U \subseteq A_i| and |U \between \bigcup_{i \in I}A_i \iff \exists i \in I.U \between A_i|. This is relative to a containing set |X|, so |\bigcap_{i \in \varnothing}A_i = X|, and |U| and each |A_i| are assumed to be subsets of |X|.

Categorical Perspective

Below I’ll perform a categorical analysis of the situation. I’ll mostly be using categorical notation and perspectives to manipulate normal sets. That said, almost all of what I say will be able to be generalized immediately just by reinterpreting the symbols.

To make things a bit cleaner in the future, and to make it easier to apply these ideas beyond sets, I’ll introduce the concept of a Heyting algebra. A Heyting algebra is a partially ordered set |H| satisfying the following:

  1. |H| has two elements called |\top| and |\bot| satisfying for all |x| in |H|, |\bot \leq x \leq \top|.
  2. We have operations |\land| and |\lor| satisfying for all |x|, |y|, |z| in |H|, |x \leq y \land z| if and only |x \leq y| and |x \leq z|, and similarly for |\lor|, |x \lor y \leq z| if and only |x \leq z| and |y \leq z|.
  3. We have an operation |\to| satisfying for all |x|, |y|, and |z| in |H|, |x \land y \leq z| if and only if |x \leq y \to z|.

For those familiar with category theory, you might recognize this as simply the decategorification of the notion of a bicartesian closed category. We can define the pseudo-complement, |\neg x \equiv x \to \bot|.

Any Boolean algebra is an example of a Heyting algebra where we can define |x \to y| via |\neg x \lor y| where here |\neg| is taken as primitive. In particular, subsets of a given set ordered by inclusion form a Boolean algebra, and thus a Heyting algebra. The |\to| operation can also be characterized by |x \leq y \iff (x \to y) = \top|. This lets us immediately see that for subsets of |X|, |(A \to B) = \{x \in X \mid x \in A \to x \in B\}|. All this can be generalized to the subobjects in any Heyting category.

As the notation suggests, intuitionistic logic (and thus classical logic) is another example of a Heyting algebra.

We’ll write |\mathsf{Sub}(X)| for the partially ordered set of subsets of |X| ordered by inclusion. As mentioned above, this is (classically) a Boolean algebra and thus a Heyting algebra. Any function |f : X \to Y| gives a monotonic function |f^* : \mathsf{Sub}(Y) \to \mathsf{Sub}(X)|. Note the swap. |f^*(U) \equiv f^{-1}(U)|. (Alternatively, if we think of subsets in terms of characteristic functions, |f^*(U) \equiv U \circ f|.) Earlier, we needed a way to turn predicates into sets. In this case, we’ll go the other way and identify truth values with subsets of |1| where |1| stands for an arbitrary singleton set. That is, |\mathsf{Sub}(1)| is the poset of truth values. |1| being the terminal object of |\mathbf{Set}| induces the (unique) function |!_U : U \to 1| for any set |U|. This leads to the important monotonic function |!_U^* : \mathsf{Sub}(1) \to \mathsf{Sub}(U)|. This can be described as |!_U^*(P) = \{x \in U \mid P\}|. Note, |P| cannot contain |x| as a free variable. In particular |!_U^*(\bot) = \varnothing| and |!_U^*(\top) = U|. This monotonic function has left and right adjoints:

\[\exists_U \dashv {!_U^*} \dashv \forall_U : \mathsf{Sub}(U) \to \mathsf{Sub}(1)\]

|F \dashv G| for monotonic functions |F : X \to Y| and |G : Y \to X| means |\forall x \in X. \forall y \in Y.F(x) \leq_Y y \iff x \leq_X G(y)|.

|\exists_U(A) \equiv \exists x \in U. x \in A| and |\forall_U(A) \equiv \forall x \in U. x \in A|. It’s easily verified that each of these functions are monotonic.1

It seems like we should be done. These formulas are the formulas I originally gave for |\between| and |\subseteq| in terms of quantifiers. The problem here is that these functions are only defined for subsets of |U|. This is especially bad for interpreting |U \between A| as |\exists_U(A)| as it excludes most of the interesting cases where |U| partially overlaps |A|. What we need is a way to extend |\exists_U| / |\forall_U| beyond subsets of |U|. That is, we need a suitable monotonic function |\mathsf{Sub}(X) \to \mathsf{Sub}(U)|.

Assume |U \subseteq X| and that we have an inclusion |\iota_U : U \hookrightarrow X|. Then |\iota_U^* : \mathsf{Sub}(X) \to \mathsf{Sub}(U)| and |\iota_U^*(A) = U \cap A|. This will indeed allow us to define |\subseteq| and |\between| as |U \subseteq A \equiv \forall_U(\iota_U^*(A))| and |U \between A \equiv \exists_U(\iota_U^*(A))|. We have:

\[\iota_U[-] \dashv \iota_U^* \dashv U \to \iota_U[-] : \mathsf{Sub}(U) \to \mathsf{Sub}(X)\]

Here, |\iota_U[-]| is the direct image of |\iota_U|. This doesn’t really do anything in this case except witness that if |A \subseteq U| then |A \subseteq X| because |U \subseteq X|.2

We can recover the earlier adjunctions by simply using these two pairs of adjunctions. \[\begin{align} U \between B \to Q & \iff \exists_U(\iota_U^*(B)) \to Q \\ & \iff \iota_U^*(B) \subseteq {!}_U^*(Q) \\ & \iff B \subseteq U \to \iota_U[{!}_U^*(Q)] \\ & \iff B \subseteq \{x \in X \mid x \in U \to Q\} \end{align}\]

Here the |\iota_U[-]| is crucial so that we use the |\to| of |\mathsf{Sub}(X)| and not |\mathsf{Sub}(U)|.

\[\begin{align} P \to U \subseteq A & \iff P \to \forall_U(\iota_U^*(A)) \\ & \iff {!}_U^*(P) \subseteq \iota_U^*(A) \\ & \iff \iota_U[{!}_U^*(P)] \subseteq A \\ & \iff \{x \in X \mid x \in U \land P\} \subseteq A \end{align}\]

In this case, the |\iota_U[-]| is truly doing nothing because |\{x \in X \mid x \in U \land P\}| is the same as |\{x \in U \mid P\}|.

While we have |{!}_U^* \circ \exists_U \dashv {!}_U^* \circ \forall_U|, we see that the inclusion of |\iota_U^*| is what breaks the direct connection between |U \between A| and |U \subseteq A|.

Examples

As a first example, write |\mathsf{Int}A| for the interior of |A| and |\bar A| for the closure of |A| each with respect to some topology on a containing set |X|. One way to define |\mathsf{Int}A| is |x \in \mathsf{Int}A| if and only if there exists an open set containing |x| that’s a subset of |A|. Writing |\mathcal O(X)| for the set of open sets, we can express this definition in symbols: \[x \in \mathsf{Int}A \iff \exists U \in \mathcal O(X). x \in U \land U \subseteq A\] We have a “dual” notion: \[x \in \bar A \iff \forall U \in \mathcal O(X). x \in U \to U \between A\] That is, |x| is in the closure of |A| if and only if every open set containing |x| overlaps |A|.

As another example, here is a fairly unusual way of characterizing a compact subset |Q|. |Q| is compact if and only if |\{U \in \mathcal O(X) \mid Q \subseteq U\}| is open in |\mathcal O(X)| equipped with the Scott topology3. As before, this suggests a “dual” notion characterized by |\{U \in \mathcal O(X) \mid O \between U\}| being an open subset. A set |O| satisfying this is called overt. This concept is never mentioned in traditional presentations of point-set topology because every subset is overt. However, if we don’t require that arbitrary unions of open sets are open (and only require finite unions to be open) as happens in synthetic topology or if we aren’t working in a classical context then overtness becomes a meaningful concept.

One benefit of the intersection-based definition of overlaps is that it is straightforward to generalize to many sets overlapping, namely |\bigcap_{i\in I} A_i \neq \varnothing|. This is also readily expressible using quantifiers as: |\exists x.\forall i \in I. x \in A_i|. As before, having an explicit “universe” set also clarifies this. So, |\exists x \in X.\forall i \in I. x \in A_i| with |\forall i \in I. A_i \subseteq X| would be better. The connection of |\between| to |\subseteq| suggests instead of this fully symmetric presentation, it may still be worthwhile to single out a set producing |\exists x \in U.\forall i \in I. x \in A_i| where |U \subseteq X|. This can be read as “there is a point in |U| that touches/meets/overlaps every |A_i|”. If desired we could notate this as |U \between \bigcap_{i \in I}A_i|. Negating and complementing the |A_i| leads to the dual notion |\forall x \in U.\exists i \in I.x \in A_i| which is equivalent to |U \subseteq \bigcup_{i \in I}A_i|. This dual notion could be read as “the |A_i| (jointly) cover |U|” which is another common and important concept in mathematics.

Conclusion

Ultimately, the concept of two (or more) sets overlapping comes up quite often. The usual circumlocution, |A \cap B \neq \varnothing|, is both notationally and conceptually clumsy. Treating overlapping as a first-class notion via notation and formulating definitions in terms of it can reveal some common and important patterns.


  1. If one wanted to be super pedantic, I should technically write something like |\{\star \mid \exists x \in U. x \in A\}| where |1 = \{\star\}| because elements of |\mathsf{Sub}(1)| are subsets of |1|. Instead, we’ll conflate subsets of |1| and truth values.↩︎

  2. If we think of subobjects as (equivalence classes of) monomorphisms as is typical in category theory, then because |\iota_U| is itself a monomorphism, the direct image, |\iota_U[-]|, is simply post-composition by |\iota_U|, i.e. |\iota_U \circ {-}|.↩︎

  3. The Scott topology is the natural topology on the space of continuous functions |X \to \Sigma| where |\Sigma| is the Sierpinski space.↩︎