Hedonistic Learning

Learning for the fun of it

Introduction

I’ll start by rationalizing an example of “old” umbral calculus from Wikipedia. |\newcommand{pair}[2]{\langle{#1}\mid{#2}\rangle}| |\newcommand{bigpair}[2]{\left\langle{#1}\ \middle|\ {#2}\right\rangle}| |\newcommand{pseq}[2]{\{#1\}_{#2 \in \mathbb N}}| |\newcommand{ucomp}[2]{#1_n(\underline #2(x))}|

We know |(x+y)^n = \sum_{k=0}^n {n \choose k} x^{n-k} y^k|. We then “infer” that |B_n(x+y) = \sum_{k=0}^n {n \choose k} B_{n-k}(x) y^k| where |B_n(x)| are the Bernoulli polynomials. Actually, the “old” style would be even more dubious. You’d have a “rule” like representing |B_n(x+y)| as |(b+y)^n|, then expand like usual and replace |b^k = (b + 0)^k| with |B_k(x)|. The variables like |b| were the “shadow” or “umbral” variables.

We can rationalize this using techniques I’ll describe below. Let |\varepsilon_y| be the linear operator on polynomials satisfying |\varepsilon_y p(x) = p(x + y)|. Since |D_x[\varepsilon_y p(x)] = \varepsilon_y D_x p(x)| for all |y|, where |D_x| is differentiation by |x|, |\varepsilon_y| is induced from a formal power series. In particular, |\varepsilon_y = e^{yD_x}|.

Let |T| be the linear operator (the transfer or umbral operator) characterized by mapping |x^n| to |B_n(x)| representing a change of basis on the vector space of polynomials. We can apply |T| to the equation \[ \varepsilon_y(x^n) = (x+y)^n = \sum_{k=0}^n {n \choose k} x^{n-k} y^k \] to get via linearity \[ T(\varepsilon_y(x^n)) = T((x+y)^n) = \sum_{k=0}^n {n \choose k} T(x^{n-k}) y^k = \sum_{k=0}^n {n \choose k} B_{n-k}(x) y^k \]

The key property we then need is |T(\varepsilon_y(x^n)) = \varepsilon_y T(x^n) = B_n(x + y)| which can be reduced to |D_xT(x^n) = T(D_x x^n)|. This is just the statement that |D_x B_n(x) = nB_{n-1}(x) = nT(x^{n-1}) = T(nx^{n-1}) = T(D_x x^n)| using a well-known property of Bernoulli (and other) polynomials. In fact, this relation implies that |T| is itself induced by a formal power series and thus commutes with any other linear operator so induced.
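Before formalizing anything, here’s a quick sanity check of the transferred identity using exact rational arithmetic. This is just a spot check, not part of the development; the table of Bernoulli polynomials is hardcoded.

```python
from fractions import Fraction as F
from math import comb

# the first few Bernoulli polynomials, as coefficient lists in x
B = [
    [F(1)],
    [F(-1, 2), F(1)],
    [F(1, 6), F(-1), F(1)],
    [F(0), F(1, 2), F(-3, 2), F(1)],
    [F(-1, 30), F(0), F(1), F(-2), F(1)],
]

def ev(p, x):
    # evaluate a coefficient list at x by Horner's rule
    acc = F(0)
    for coeff in reversed(p):
        acc = acc * x + coeff
    return acc

def check(n, x, y):
    # B_n(x+y) == sum_k C(n,k) B_{n-k}(x) y^k ?
    lhs = ev(B[n], x + y)
    rhs = sum(comb(n, k) * ev(B[n - k], x) * y**k for k in range(n + 1))
    return lhs == rhs
```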

Ultimately, the only properties we needed for this were that we had a (linear) change of basis from the monomial basis to the polynomial sequence, which we’ll have for any polynomial sequence whose |n|th element has degree |n|, and that the change of basis commuted with the |D_x| operator. The latter is more stringent, but there are various ways we can expand the scope of the argument.

First, |D_x x^n = \frac{c_n}{c_{n-1}} x^{n-1}| with |c_n = n!| completely characterizes (with |D_x x^0 = 0|) the differentiation operation on polynomials. Choosing a different sequence |c| leads to different notions of “differentiation”. This will change |\varepsilon_y| and lead to different formulas, but they will be structurally similar.

In a different direction, we can ask for formal power series and polynomial sequences that relate to each other the way |D_x| and |x^n| do. We say that a polynomial sequence |s_n(x)| is Sheffer for a pair |(g(t), f(t))| with |\deg g = 0| and |\deg f = 1| when |\pair{g(t)f(t)^k}{s_n(x)} = c_n\delta_k^n|. This has |g(t)f(t)^k| taking the place of |t^k| and |s_n(x)| taking the place of |x^n|. This would let us transfer identities involving the |s_n(x)| to any linear operator that commutes with |g(t)f(t)|. While changing |c| changes our notion of “differentiation”, using a Sheffer sequence allows us to consider other “differential operators” using the same notion of “differentiation”.

This is based primarily on the work of Steven Roman and Gian-Carlo Rota. It closely follows The Theory of the Umbral Calculus I by Steven Roman (1982). See also The Umbral Calculus by Steven Roman (1984).

I’ll include proofs below to illustrate that each bit of reasoning is fairly straightforward.

Conventions

Fix a field |\mathbb K| of characteristic |0|. I’ll write |\mathscr F = \mathbb K[\![t]\!]| for the |\mathbb K|-algebra of univariate formal power series (with indeterminate |t|) and |P = \mathbb K[x]| for the |\mathbb K|-algebra of univariate polynomials (with indeterminate |x|). Further, |\mathrm{Hom}(X, Y)| will be the |\mathbb K|-vector space of |\mathbb K|-linear maps from a |\mathbb K|-vector space |X| to a |\mathbb K|-vector space |Y|. In particular, |X^* = \mathrm{Hom}(X,\mathbb K)| is the dual space of |X|.

The four main |\mathbb K|-vector spaces we’ll be focused on are |\mathscr F|, |P|, |P^*|, and |\mathrm{Hom}(P, P)|. The first two are additionally |\mathbb K|-algebras, and we’ll find that the third is as well and is, in fact, isomorphic to |\mathscr F| which is arguably a key enabling fact of umbral calculus. In particular, the |\mathbb K|-algebra structure induced on |P^*| via |\mathscr F| is called the umbral algebra.

Given the above, unsurprisingly, we’ll be talking a lot about formal power series and polynomials. To save a bit of space and typing for me, unless otherwise specified, if I say |f| is a formal power series or |f \in \mathscr F|, then that will also define the sequence |\pseq{f_n}{n}| such that |f(t) = \sum_{k=0}^\infty f_k t^k|. Generally, when I state something is a sequence it will be a function |\mathbb N \to X| for some |X| and the parameter will be written as a subscript. So the formal power series |f| also defines a sequence, also called |f|, from |\mathbb N \to \mathbb K|. (In this case, we could literally identify formal power series with these sequences.) Similarly, stating |p| is a polynomial or |p \in P| defines a sequence also called |p| such that |p(x) = \sum_{n=0}^{\deg p}p_n x^n| where |\deg p| is the degree of the polynomial, i.e. the largest value of |n| such that |p_n \neq 0|. These are also sequences |\mathbb N \to \mathbb K|, and we could identify polynomials with such sequences that are eventually always |0|. We also have the degree of a formal power series, written |\deg f|, which is the smallest |k| such that |f_k \neq 0|. Note that this is dual to the notion for polynomials. It’s clear that |\deg(fg) = \deg f + \deg g|.

Of course, sometimes I will explicitly state something like |f(t) = \sum_{k=0}^\infty a_k t^k| in which case the sequence |f| is not defined. Usually, this will arise with a formal power series expression, e.g. |(f \circ g)(t)| so there shouldn’t be any ambiguity. As is typical, I’ll often say “the formal power series |f(t)|” as opposed to “the formal power series |f|”. Finally, as has been illustrated, I’ll endeavor to use |k| as the indexing variable for formal power series and |n| for polynomials, but that won’t always be possible.

The Kronecker delta is written |\delta_k^n| and defined by |\delta_n^n = 1| and |\delta_k^n = 0| for |k \neq n|. This should typically be thought of as a way of forcing |n| and |k| to be equal, i.e. |f(n)\delta_k^n = f(k)\delta_k^n| and |\delta_{f(k)}^n = \delta_k^{f^{-1}(n)}| for an invertible function |f|.

Formal Power Series

I’ll assume familiarity with the basic algebra of formal power series. This lecture gives a nice in-depth and more technical overview, though it goes far beyond what we’ll need. I’ll recall a few results and fix the terminology and notation that will be used in this article, which largely follows Roman, though there are many variations in the literature.

Theorem (id:wewp): |f \in \mathscr F| has a multiplicative inverse, written |f^{-1}|, if and only if |\deg f = 0|.

Proof: (click to expand) |f(t)g(t) = \sum_{k=0}^\infty c_k t^k| where |c_k = \sum_{i+j=k} f_i g_j|. Clearly |c_0 = f_0 g_0|, so a multiplicative inverse to |f| can only exist if |f_0 \neq 0|, and for |g| to be |f^{-1}| we need |g_0 = 1/f_0|. A simple calculation shows that |g_k = -f_0^{-1} \sum_{n=1}^k f_n g_{k-n}| for |k > 0|, which gives a recurrence computing all the remaining coefficients of |f^{-1}|. |\square|
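The recurrence at the end of this proof is directly executable. A small sketch, representing a series by the list of its first |N| coefficients (that representation and the truncation order are my choices here, nothing canonical):

```python
from fractions import Fraction as F

def inverse(f, N):
    """First N coefficients of the multiplicative inverse of f, given f[0] != 0.
    A series is represented by the list of its low-order coefficients."""
    g = [F(0)] * N
    g[0] = 1 / F(f[0])
    for k in range(1, N):
        # 0 = sum_{i+j=k} f_i g_j for k > 0, solved for g_k
        g[k] = -g[0] * sum(F(f[n]) * g[k - n] for n in range(1, min(k, len(f) - 1) + 1))
    return g
```

For example, the inverse of |1 - t| comes out as the geometric series.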

Thus, |f| being invertible will be synonymous with |f| having degree |0|.

It’s worth noting that |f/g| can be defined even when |g| isn’t invertible, e.g. |t/t = 1|. If |\deg f \geq \deg g|, then we can cancel out common factors of |t| until the denominator is invertible.

Suppose |g : \mathbb N \to \mathscr F|, which we’ll write as |g_k(t) \in \mathscr F|, such that |\deg g_k \to \infty| as |k \to \infty|. Given any |a : \mathbb N \to \mathbb K|, the sum |\sum_{k=0}^\infty a_k g_k(t)| is a well-defined element of |\mathscr F|. In particular, if |\deg g > 0|, then |g_k(t) = g(t)^k| satisfies the conditions. If we have |\deg g_k = k|, which is the case when |g_k(t) = g(t)^k| for |g| with degree |1| for example, then |g| forms a pseudobasis for |\mathscr F| meaning for any |f \in \mathscr F|, there exists a unique sequence |a| such that |f(t) = \sum_{k=0}^\infty a_k g_k(t)|. A series |f| with |\deg f = 1| is called a delta series. Every delta series gives rise to a pseudobasis of |\mathscr F|.
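Extracting the coordinates |a_k| of a given |f| with respect to the pseudobasis |\pseq{g(t)^k}{k}| is a triangular process: |a_k| is determined by the |t^k| coefficient of |f| once the contributions of |a_0, \ldots, a_{k-1}| are subtracted off. A sketch with truncated series (the list representation is my choice for illustration):

```python
from fractions import Fraction as F

def mul(a, b, N):
    # product of coefficient lists, truncated to N coefficients
    c = [F(0)] * N
    for i in range(min(len(a), N)):
        for j in range(min(len(b), N - i)):
            c[i + j] += a[i] * b[j]
    return c

def pseudobasis_coords(f, g, N):
    """Coordinates a_k with f = sum_k a_k g^k for a delta series g (deg g = 1),
    computed up to order N."""
    a = [F(0)] * N
    r = [F(x) for x in f] + [F(0)] * (N - len(f))
    gk = [F(1)] + [F(0)] * (N - 1)  # g^k, starting with k = 0
    for k in range(N):
        a[k] = r[k] / gk[k]          # (g^k)_k = (g_1)^k != 0
        r = [ri - a[k] * gi for ri, gi in zip(r, gk)]
        gk = mul(gk, g, N)
    return a
```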

If |f, g \in \mathscr F| and |\deg g > 0|, then we can thus form the composition |f(g(t)) = \sum_{k=0}^\infty f_k g(t)^k|. It’s clear that |\deg(f\circ g) = \deg f\deg g|.

Theorem (id:cjme): A series, |f|, has a compositional inverse, written |\bar f|, meaning |f(\bar f(t)) = t = \bar f(f(t))|, if and only if |\deg f = 1|.

Proof: (click to expand) Suppose |g \in \mathscr F| such that |f(g(t)) = t|. By taking degrees, we immediately see that |f| (and |g|) need to be of degree exactly one to have any chance for |g| to be a compositional inverse to |f|. If |g(t)^k = \sum_{n=0}^\infty b_{k,n} t^n|, then clearly we need |f_1 b_{1,1} = 1|. By induction, we have |b_{k+1,n} = \sum_{i+j=n} b_{1,i}b_{k,j}| but note that |b_{k,n} = 0| for all |n < k| so this sum always has |k \leq j < n| and thus |1 \leq i \leq n-k|. This leads to \[\sum_{k=1}^n f_k b_{k,n} = 0 = f_1 b_{1,n} + \sum_{k=2}^n f_k b_{k,n}\] where the last sum only involves |b_{1,i}| for |i < n|. Alternatively, simply note that if |f(t) = th(t)| and |g(t) = tk(t)| then, |h| and |k| have degree |0|, and |t = f(g(t)) = t k(t)h(g(t))|. So |k(t) = h(g(t))^{-1} = h(tk(t))^{-1}|. |h| is invertible and |tk(t)| clearly has degree greater than |0| so the composition is well-defined. Unfolding this expression for |k(t)| shows that the |i|th coefficient only depends on earlier coefficients. |\square|
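The second argument in this proof is essentially an algorithm: the |t^n| coefficient of |f(g(t))| involves |g_n| only through the term |f_1 g_n|, so we can solve for the coefficients of |\bar f| one at a time. A sketch using truncated series:

```python
from fractions import Fraction as F

def mul(a, b, N):
    # product of coefficient lists, truncated to N coefficients
    c = [F(0)] * N
    for i in range(min(len(a), N)):
        for j in range(min(len(b), N - i)):
            c[i + j] += a[i] * b[j]
    return c

def compose(f, g, N):
    # f(g(t)) truncated to N coefficients; requires g[0] == 0
    out = [F(0)] * N
    gk = [F(1)] + [F(0)] * (N - 1)  # g^k, starting with k = 0
    for k in range(N):
        if k < len(f):
            for i in range(N):
                out[i] += f[k] * gk[i]
        gk = mul(gk, g, N)
    return out

def reversion(f, N):
    # compositional inverse of a delta series: f[0] == 0, f[1] != 0
    g = [F(0)] * N
    g[1] = 1 / F(f[1])
    for n in range(2, N):
        # the t^n coefficient of f(g(t)) is f_1 g_n + (terms in g_1..g_{n-1})
        g[n] = -compose(f, g, n + 1)[n] / f[1]
    return g
```

Here |t/(1-t)| and |t/(1+t)|, which are compositional inverses of each other, make a convenient test case.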

A useful result linking these two together is if |\deg f = 1|, then \[ 1 = t’ = [\bar f(f(t))]' = \bar f’(f(t))f’(t) \] In other words, |f’(t)^{-1} = \bar f’(f(t))|.

Linear Functionals

For |L \in P^*| and |p \in P|, we’ll write |\pair{L}{p(x)}| for the action of |L| on |p|. Any such |L| is uniquely defined by its values on |x^n| for all |n\in\mathbb N|.

If |c : \mathbb N \to \mathbb K\setminus \{0\}|, we can define for each |f \in \mathscr F| a linear functional which we’ll also write as |f| or |f(t)| via \[\pair{f(t)}{x^n} = c_n f_n\] Really, we should write something like |\pair{f(t)}{p(x)}_c| to indicate the dependence on |c|. This play on notation is unambiguous since |f(t) = g(t)| if and only if |\pair{f(t)}{x^n} = \pair{g(t)}{x^n}| for all |n|, i.e. |f| and |g| are equal as power series if and only if the induced linear functionals are equal.

Notable choices for |c| are:

  • |c_n = n!| is the most traditional case.
  • |c_n = 1|
  • |c_n = 1/{\lambda \choose n}| for |\lambda| not a nonnegative integer.

The definition of the linear functional induced by |f \in \mathscr F| implies that |\pair{t^k}{x^n} = c_n\delta_k^n|. This leads to \[\bigpair{\sum_{n=0}^\infty a_n t^n}{p(x)} = \sum_{n=0}^\infty a_n \pair{t^n}{p(x)} \] where the right-hand side is well-defined because only finitely many of the terms of the sum will be non-zero. (We can generalize to allow Laurent series with only finitely many negative powers on the left and Laurent series with only finitely many positive powers on the right.)
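Fixing representations (a functional as the coefficient function of its power series, a polynomial as its finite coefficient list, both my choices for illustration), the pairing is a one-line finite sum:

```python
from fractions import Fraction as F
from math import factorial

def pair(f, p, c):
    """<f(t)|p(x)> = sum_n c_n f_n p_n.
    f is a coefficient function (so infinite series are fine), p a finite
    coefficient list; the sum is finite because p is a polynomial."""
    return sum(c(n) * f(n) * p[n] for n in range(len(p)))
```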

We can articulate L’Hôpital’s rule with this notation as: if |\deg f \geq \deg g > 0|, then \[\pair{f(t)/g(t)}{x^0} = \pair{f’(t)/g’(t)}{x^0} \]

We can explicitly write the formal power series, |f_L|, corresponding to the linear functional, |L|, as \[f_L(t) = \sum_{k=0}^\infty \frac{\pair{L}{x^k}}{c_k}t^k\] It is trivial to verify that the linear functional induced by |f_L| is |L|. This gives an isomorphism |\mathscr F \cong P^*| as |\mathbb K|-vector spaces. However, the algebra structure on |\mathscr F| then induces an algebra structure on |P^*|. We can compute \[\pair{f(t)g(t)}{x^n} = \sum_{i+j=n}\frac{c_n}{c_i c_j}\pair{f(t)}{x^i}\pair{g(t)}{x^j}\]

We’ll call |L| a delta/invertible functional when it corresponds to a delta/invertible power series.

Properties

If |\deg f > \deg p| then |\pair{f(t)}{p(x)} = 0|.

If |\deg p_n(x) = n| and |\pair{f(t)}{p_n(x)} = 0| for all |n \in \mathbb N|, then |f(t) = 0|.

|\pair{f(at)}{p(x)} = \pair{f(t)}{p(ax)}| [Proof: Follows immediately from |\pair{a^n t^n}{x^n} = a^n c_n = \pair{t^n}{a^n x^n}|. |\square|]

|p(x) = \sum_{n=0}^\infty \frac{\pair{t^n}{p(x)}}{c_n}x^n|

If |\deg f_k = k| and |\pair{f_k(t)}{p(x)} = 0| for all |k \in \mathbb N|, then |p(x) = 0|.

Evaluation Functional

We always have the evaluation functional |\varepsilon_y| for |y \in \mathbb K| defined by \[\pair{\varepsilon_y(t)}{p(x)} = p(y)\] Note that this definition doesn’t depend on the choice of |c|. We quickly compute |\pair{\varepsilon_y(t)}{x^n} = y^n| so \[ \varepsilon_y(t) = \sum_{k=0}^\infty \frac{y^k}{c_k}t^k\]

When |c_n = n!|, then |\varepsilon_y(t) = e^{yt}|.
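For example, with |c_n = n!| the pairing against |\varepsilon_y(t) = e^{yt}| really does evaluate a polynomial at |y| (a sketch using the same coefficient-list representation as before):

```python
from fractions import Fraction as F
from math import factorial

def eps(y):
    # coefficients of eps_y(t) = sum_k (y^k/c_k) t^k with c_k = k!, i.e. e^{yt}
    return lambda k: F(y) ** k / factorial(k)

def pair(f, p):
    # <f(t)|p(x)> with c_n = n!
    return sum(factorial(n) * f(n) * p[n] for n in range(len(p)))
```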

Formal Derivative

The formal derivative of |f \in \mathscr F|, written |\partial_t f(t)|, is defined as \[\partial_t t^k = \begin{cases} \frac{c_k}{c_{k-1}}t^{k-1}, & k > 0 \\ 0, & k = 0 \end{cases}\] which leads to the key property |\pair{\partial_t f(t)}{p(x)} = \pair{f(t)}{xp(x)}|.

As an example, we immediately compute that |\partial_t\varepsilon_y(t) = y\varepsilon_y(t)|.

We will also use the ordinary derivative of formal power series which we’ll notate with |f’(t)|. The formal derivative and the ordinary derivative coincide when |c_n = n!| as suggested by the previous example.

Linear Operators

We’ve identified formal power series with linear functionals on |P|. Next, we want to identify them with linear operators on |P|. We’re clearly not going to get an isomorphism in this case as multiplication (i.e. composition) of linear operators doesn’t commute in general, while multiplication of formal power series does. Nevertheless, we will derive simple characterizations of which linear operators are of this form.

One of the most important properties we will want is the following adjointness property: \[ \pair{f(t)g(t)}{p(x)} = \pair{f(t)}{g(t)p(x)} \] where |g(t) p(x)| is the action of the linear operator induced by |g(t)| on |p(x)|. We can derive what the induced linear operator must be to satisfy this property.

\[\begin{align} c_n \delta_m^n & = \pair{t^m}{x^n} \\ & = \pair{t^{m-k} t^k}{x^n} \\ & = \pair{t^{m-k}}{t^k x^n} \\ & = \bigpair{t^{m-k}}{\frac{c_n}{c_{n-k}}x^{n-k}} \end{align}\] so |t^k x^n = \frac{c_n}{c_{n-k}} x^{n-k}| for |k \leq n| and |0| otherwise. Thus, \[f(t) x^n = \sum_{k=0}^n \frac{c_n}{c_{n-k}} f_k x^{n-k} \] which can be extended to all polynomials by linearity.

This should look familiar. |tx^n = \frac{c_n}{c_{n-1}}x^{n-1}| which is exactly the same formula as the one for |\partial_t| except this operates on polynomials while |\partial_t| operates on formal power series. In particular, when |c_n = n!|, |t| behaves exactly like the derivative of polynomials with respect to |x|, and we see that the formal power series pick out a special class of differential operators on polynomials.
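This action is easy to implement generically in |c|. In the sketch below (representation choices mine), applying |t| with |c_n = n!| differentiates, while applying |t| with |c_n = 1| acts as |x^{-1}|:

```python
from fractions import Fraction as F
from math import factorial

def apply_series(f, p, c):
    """Apply the operator f(t) to the polynomial p (coefficient list),
    using t^k x^n = (c_n/c_{n-k}) x^{n-k} for k <= n and 0 otherwise."""
    N = len(p)
    out = [F(0)] * N
    for n in range(N):
        for k in range(n + 1):
            out[n - k] += F(c(n), c(n - k)) * f(k) * p[n]
    return out

t = lambda k: F(int(k == 1))  # the series f(t) = t
```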

A simple calculation shows that |(t^j t^k) x^n = t^j (t^k x^n)| which lifts to a general associativity law: |(f(t) g(t)) p(x) = f(t) (g(t) p(x))|. The adjointness property also immediately implies that the induced linear operators commute.

As before, we will say delta/invertible operator when the linear operator is induced by a delta/invertible formal power series.

Define |Dx^n = nx^{n-1}|, |D^{-1}x^n = \frac{1}{n+1}x^{n+1}|, and |x^{-1} x^n = \begin{cases}x^{n-1}, & n > 0 \\ 0, & n = 0\end{cases}|. Then for various choices of |c|, |t| behaves as the following linear operators:

  • |c_n = n!| implies |t = D|
  • |c_n = 1| implies |t = x^{-1}|
  • |c_n = (n!)^{m+1}| implies |t = (Dx)^m D|
  • |c_n = 1/(-\lambda)_{(n)}| implies |t = -(\lambda + xD)^{-1} x^{-1}|
  • |c_n = 1/n!| implies |t = x^{-1} D^{-1} x^{-1}|
  • |c_n = 1/{-\lambda \choose n}| implies |t = -(\lambda + xD)^{-1} D|
  • |c_n = 2^{2n}(1+\alpha)^{(n)}/(1 + \alpha + \beta)^{(2n)}| implies |t = 4(1 + \alpha + \beta + 2xD)^{-1} (2 + \alpha + \beta + 2xD)^{-1} x^{-1} (\alpha + xD)|
  • |c_n = (1-q)^{-n} \prod_{k=1}^n (1-q^k)| implies |tp(x) = (p(qx) - p(x))/(qx - x)|

A linear operator |T| on |\mathscr F| is continuous if given a sequence of formal power series |\pseq{f_k}{k}| such that |\deg f_k \to \infty| as |k \to \infty|, we have |\deg T(f_k) \to \infty| as |k \to \infty|.

Theorem (id:fqys): If |T| is a continuous linear operator on |\mathscr F|, then \[ T\left(\sum_{k=0}^\infty a_k f_k(t)\right) = \sum_{k=0}^\infty a_k T(f_k(t)) \] for all sequences |\pseq{a_k}{k}| in |\mathbb K| and |\pseq{f_k}{k}| in |\mathscr F| for which |\deg f_k \to \infty| as |k \to \infty|.

Proof: (click to expand) By the assumptions, both |\pair{T\left(\sum_{k=0}^\infty a_k f_k(t)\right)}{x^n}| and |\pair{\sum_{k=0}^\infty a_k T(f_k(t))}{x^n}| involve only finitely many terms of the sum. That is, for every |n| there is some |N| such that for all |m > N|, |\pair{T\left(\sum_{k=0}^\infty a_k f_k(t)\right)}{x^n} = \pair{T\left(\sum_{k=0}^m a_k f_k(t)\right)}{x^n}| and |\pair{\sum_{k=0}^\infty a_k T(f_k(t))}{x^n} = \pair{\sum_{k=0}^m a_k T(f_k(t))}{x^n}|. But \[\begin{align} \bigpair{T\left(\sum_{k=0}^\infty a_k f_k(t)\right)}{x^n} & = \bigpair{T\left(\sum_{k=0}^m a_k f_k(t)\right)}{x^n} \\ & = \bigpair{\sum_{k=0}^m a_k T(f_k(t))}{x^n} \\ & = \bigpair{\sum_{k=0}^\infty a_k T(f_k(t))}{x^n} \end{align}\] so these two linear functionals agree on a pseudobasis and thus are the same which implies the formal power series are the same as well. |\square|

This can be cast as an instance of topological continuity, but I won’t describe that here.

Evaluation Operator

Unlike the linear functional case, the linear operator induced by the formal power series corresponding to the evaluation functional does depend on the choice of |c|. In general, we have: \[\varepsilon_y(t) x^n = \sum_{k=0}^n \frac{c_n}{c_{n-k} c_k} y^k x^{n-k} \]

For |c_n = n!|, |\varepsilon_y(t) p(x) = p(x + y)|. For |c_n = 1|, |\varepsilon_y(t) x^n = \frac{x^{n+1} - y^{n+1}}{x-y}|.
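As a spot check of the |c_n = n!| case, the general formula specializes to the binomial theorem, so applying |\varepsilon_y(t)| translates the coefficient list:

```python
from fractions import Fraction as F
from math import comb

def shift(p, y):
    # eps_y(t) p(x) for c_n = n!: eps_y(t) x^n = sum_k C(n,k) y^k x^{n-k} = (x+y)^n
    N = len(p)
    out = [F(0)] * N
    for n in range(N):
        for k in range(n + 1):
            out[n - k] += comb(n, k) * F(y) ** k * p[n]
    return out
```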

Characterizing Linear Operators Induced from Formal Power Series

Theorem (id:lgtb): Let |U| be a linear operator on |P|. There is an |f \in \mathscr F| such that |Up(x) = f(t) p(x)| for all |p \in P| if and only if |U| commutes with the operator |t|, i.e. |Utp(x) = tUp(x)| for all |p \in P|.

Proof: (click to expand) The only if direction is obvious. For the other direction, first note that |\deg Ux^n \leq n| because |t^k Ux^n = Ut^k x^n = 0| if |k > n| so |\pair{t^k}{Ux^n} = 0| for all |k > n|. Now define \[f(t) = \sum_{k=0}^\infty \frac{\pair{t^0}{Ux^k}}{c_k}t^k \] Then, \[\begin{align} f(t) x^n & = \sum_{k=0}^n \frac{\pair{t^0}{Ux^k}}{c_k}t^k x^n \\ & = \sum_{k=0}^n \frac{c_n}{c_k c_{n-k}} \pair{t^0}{Ux^k} x^{n-k} \\ & = \sum_{k=0}^n \frac{\pair{t^0}{Ut^{n-k} x^n}}{c_{n-k}} x^{n-k} \\ & = \sum_{k=0}^n \frac{\pair{t^0}{t^{n-k} Ux^n}}{c_{n-k}} x^{n-k} \\ & = \sum_{k=0}^n \frac{\pair{t^{n-k}}{Ux^n}}{c_{n-k}} x^{n-k} \\ & = \sum_{k=0}^n \frac{\pair{t^k}{Ux^n}}{c_k} x^k \\ & = Ux^n \end{align}\] The last equality relies on the degree of |Ux^n| being less than or equal to |n|. |\square|

Corollary (id:viti): A linear operator on |P| has the form |f(t)| for some |f \in \mathscr F| if and only if it commutes with any one fixed delta operator.

Proof: (click to expand) The sequence of powers of the formal power series associated with the specific delta operator form a pseudobasis which means we can write |t| as an infinite linear combination of them. Thus the linear operator commutes with |t| and we can apply the theorem. |\square|

Corollary (id:qllw): A linear operator on |P| has the form |f(t)| for some |f \in \mathscr F| if and only if it commutes with |\varepsilon_y(t)| for all |y \in \mathbb K|.

Proof: For |y \neq 0|, |\varepsilon_y(t) - c_0^{-1} t^0| is a delta operator, so the previous corollary applies. |\square|

Polynomial Sequences

When we say |\pseq{p_n(x)}{n}| is a (polynomial) sequence, that will always mean that |\deg p_n(x) = n|.

Theorem (id:cgsr): Let |f| be a delta series and |g| be an invertible series, then there is a unique polynomial sequence |\pseq{s_n(x)}{n}| such that \[\pair{g(t)f(t)^k}{s_n(x)} = c_n\delta_k^n \] holds for all |n,k \in \mathbb N|.

Proof: (click to expand)

Uniqueness follows easily by considering |\pair{g(t)f(t)^k}{s_n(x) - r_n(x)} = 0| where |\pseq{r_n(x)}{n}| is another sequence satisfying the same property.

For existence, we can just brute force it. If |g(t)f(t)^k = \sum_{i=k}^\infty b_{k,i} t^i| and we set |s_n(x) = \sum_{j=0}^n a_{n,j} x^j|, then we want to solve for the |a_{n,i}| induced by the following triangular system of linear equations: \[\begin{align} c_n\delta_k^n & = \pair{g(t)f(t)^k}{s_n(x)} \\ & = \bigpair{\sum_{i=k}^\infty b_{k,i} t^i}{\sum_{j=0}^n a_{n,j} x^j} \\ & = \bigpair{\sum_{i=k}^n b_{k,i} t^i}{\sum_{j=0}^n a_{n,j} x^j} \\ & = \sum_{i=k}^n \sum_{j=0}^n b_{k,i} a_{n,j} \pair{t^i}{x^j} \\ & = \sum_{i=k}^n c_i b_{k,i} a_{n,i} \end{align}\] |b_{k,k} \neq 0| since |\deg g(t)f(t)^k = k| and |c_n \neq 0| by assumption. Therefore, the diagonal entries of the triangular matrix corresponding to this system of linear equations are non-zero, and thus the matrix is invertible. |\square|

We’ll say |\pseq{s_n(x)}{n}| is the Sheffer sequence or is Sheffer for the pair |(g, f)|. When |g(t) = 1|, then we say that the corresponding Sheffer sequence is the associated sequence to |f|. When |f(t) = t|, then we say that the corresponding Sheffer sequence is the Appell sequence for |g|. Often I’ll use |\pseq{p_n(x)}{n}| for associated sequences, i.e. when |g(t) = 1|, and reserve |\pseq{s_n(x)}{n}| for the general case. The idea is that if |g(t)f(t)^k| takes the place of |t^k|, then |s_n(x)| takes the place of |x^n|. This is illustrated by the defining property and the following theorems.
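The existence proof is constructive. For the special case |f(t) = t| the back-substitution fits in a few lines, and taking |g(t) = (e^t - 1)/t| with |c_n = n!| recovers the Bernoulli polynomials from the Introduction (that this is the right pair for |B_n(x)| can be read off from their classical generating function together with the Generating Function theorem). A sketch, with my usual coefficient-list representation:

```python
from fractions import Fraction as F
from math import factorial

def appell(g, c, N):
    """Solve <g(t) t^k | s_n(x)> = c_n delta_k^n for s_0, ..., s_{N-1},
    i.e. the Sheffer case f(t) = t. g is a coefficient function with
    g(0) != 0; c is the reference sequence. Returns coefficient lists."""
    s = []
    for n in range(N):
        a = [F(0)] * (n + 1)
        for k in range(n, -1, -1):  # back-substitute the triangular system
            # b_{k,i} = coefficient of t^i in g(t)t^k = g_{i-k}
            rhs = (c(n) if k == n else 0) - sum(
                c(i) * g(i - k) * a[i] for i in range(k + 1, n + 1))
            a[k] = F(rhs) / (c(k) * g(0))
        s.append(a)
    return s
```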

Theorem (Expansion Theorem): Let |\pseq{s_n(x)}{n}| be Sheffer for |(g, f)|. Then for any |h \in \mathscr F|, \[ h(t) = \sum_{k=0}^\infty \frac{\pair{h(t)}{s_k(x)}}{c_k}g(t)f(t)^k \]

Proof: (click to expand) |\pseq{s_n(x)}{n}|, like any polynomial sequence, is a basis for |P|, so two linear functionals are equal if they agree on all |s_n(x)|. Clearly, \[\begin{align} \bigpair{\sum_{k=0}^\infty \frac{\pair{h(t)}{s_k(x)}}{c_k}g(t)f(t)^k}{s_n(x)} & = \bigpair{\sum_{k=0}^n \frac{\pair{h(t)}{s_k(x)}}{c_k}g(t)f(t)^k}{s_n(x)} \\ & = \sum_{k=0}^n \frac{\pair{h(t)}{s_k(x)}}{c_k}\pair{g(t)f(t)^k}{s_n(x)} \\ & = \pair{h(t)}{s_n(x)} \end{align}\] |\square|

Corollary (Polynomial Expansion Theorem): Let |\pseq{s_n(x)}{n}| be Sheffer for |(g, f)|. Then for any |p \in P|, \[ p(x) = \sum_{n=0}^\infty \frac{\pair{g(t)f(t)^n}{p(x)}}{c_n} s_n(x) \]

Proof: (click to expand) Choose |h = \varepsilon_y| in the Expansion Theorem, then apply it as a linear functional to |p|. |\square|

Theorem (Generating Function): |\pseq{s_n(x)}{n}| is Sheffer for |(g, f)| if and only if \[ \frac{1}{g(\bar f(t))} \varepsilon_y(\bar f(t)) = \sum_{k=0}^\infty \frac{s_k(y)}{c_k} t^k \] for all |y \in \mathbb K|.

Proof: (click to expand)

For the forward implication, using the Expansion Theorem we have: \[ \varepsilon_y(t) = \sum_{k=0}^\infty \frac{\pair{\varepsilon_y}{s_k(x)}}{c_k} g(t) f(t)^k = \sum_{k=0}^\infty \frac{s_k(y)}{c_k} g(t) f(t)^k \] Substituting |\bar f(t)| for |t| and dividing both sides by |g(\bar f(t))| gives the result.

For the reverse implication, if |\pseq{r_n(x)}{n}| is the Sheffer sequence for |(g, f)| then we immediately get \[ \sum_{k=0}^\infty \frac{r_k(y)}{c_k} t^k = \frac{1}{g(\bar f(t))} \varepsilon_y(\bar f(t)) = \sum_{k=0}^\infty \frac{s_k(y)}{c_k} t^k \] and applying both sides to |x^n| then gives |s_n(y) = r_n(y)| for all |y \in \mathbb K|. |\square|

Theorem (Conjugate Representation): |\pseq{s_n(x)}{n}| is Sheffer for |(g, f)| if and only if \[ s_n(x) = \sum_{k=0}^n \frac{\pair{g(\bar f(t))^{-1}\bar f(t)^k}{x^n}}{c_k} x^k \]

Proof: (click to expand) Applying |\varepsilon_y| to both sides gives \[\begin{align} s_n(y) & = \sum_{k=0}^n \frac{\pair{g(\bar f(t))^{-1}\bar f(t)^k}{x^n}}{c_k} y^k \\ & = \bigpair{\sum_{k=0}^n \frac{y^k}{c_k}g(\bar f(t))^{-1}\bar f(t)^k}{x^n} \\ & = \bigpair{\sum_{k=0}^\infty \frac{y^k}{c_k}g(\bar f(t))^{-1}\bar f(t)^k}{x^n} \\ & = \pair{g(\bar f(t))^{-1}\varepsilon_y(\bar f(t))}{x^n} \\ \end{align}\] but \[ s_n(y) = \bigpair{\sum_{k=0}^\infty \frac{s_k(y)}{c_k}t^k}{x^n} \] so we can apply the Generating Function theorem. |\square|

Theorem (Multiplication Theorem): Let |\pseq{s_n(x)}{n}| be Appell for |g|, then \[ s_n(\alpha x) = \alpha^n \frac{g(t)}{g(t/\alpha)} s_n(x) \] for |\alpha\neq 0|.

Proof: (click to expand) \[\begin{align} \pair{t^k}{g(t/\alpha) s_n(\alpha x)} & = \pair{t^k g(t/\alpha)}{s_n(\alpha x)} \\ & = \pair{(\alpha t)^k g(\alpha t/\alpha)}{s_n(x)} \\ & = \alpha^k \pair{g(t) t^k}{s_n(x)} \\ & = \alpha^k c_n \delta_k^n \\ & = \alpha^n c_n \delta_k^n \\ & = \alpha^n \pair{g(t)t^k}{s_n(x)} \\ & = \pair{t^k}{\alpha^n g(t) s_n(x)} \end{align}\] so |g(t/\alpha) s_n(\alpha x) = \alpha^n g(t) s_n(x)|. |\square|

Theorem (id:qqes): |\pseq{s_n(x)}{n}| is Sheffer for |(g, f)| if and only if |\pseq{g(t)s_n(x)}{n}| is the associated sequence for |f|.

Proof: Just apply adjointness to the definition. |\square|

Theorem (id:cutg): A sequence |\pseq{p_n(x)}{n}| is the associated sequence for |f| if and only if 1) |\pair{t^0}{p_n(x)} = c_0 \delta_n^0| for all |n \in \mathbb N|, and 2) |f(t) p_n(x) = \frac{c_n}{c_{n-1}}p_{n-1}(x)| for all |n > 0|.

Proof: (click to expand)

|f(t)^0 = t^0| implies the first condition. For the second condition, if |\pseq{p_n(x)}{n}| is associated to |f|, then for |k > 0| \[\begin{align} \bigpair{f(t)^{k-1}}{\frac{c_n}{c_{n-1}}p_{n-1}(x)} & = \frac{c_n}{c_{n-1}}\pair{f(t)^{k-1}}{p_{n-1}(x)} \\ & = c_n \delta_k^n \\ & = \pair{f(t)^k}{p_n(x)} \\ & = \pair{f(t)^{k-1}}{f(t)p_n(x)} \end{align}\] and |\pseq{f(t)^k}{k}| is a pseudobasis.

Conversely, assuming (1) and (2) hold, then \[\begin{align} \pair{f(t)^k}{p_n(x)} & = \pair{t^0}{f(t)^k p_n(x)} \\ & = \frac{c_n}{c_{n-k}} \pair{t^0}{p_{n-k}(x)} \tag{(2) k times} \\ & = \frac{c_n}{c_{n-k}} c_0 \delta_{n-k}^0 \tag{(1)} \\ & = c_n \delta_k^n \end{align}\] |\square|
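A classical instance of this theorem with |c_n = n!|: the falling factorials are associated to |f(t) = e^t - 1|, which acts as the forward difference |p(x) \mapsto p(x+1) - p(x)|. Both conditions are easy to spot-check at rational points:

```python
from fractions import Fraction as F

def falling(n, x):
    # p_n(x) = x(x-1)...(x-n+1); p_0(x) = 1
    acc = F(1)
    for i in range(n):
        acc *= x - i
    return acc
```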

Theorem (id:hvdt): A sequence |\pseq{s_n(x)}{n}| is Sheffer for |(g, f)| for some invertible |g| if and only if |f(t) s_n(x) = \frac{c_n}{c_{n-1}}s_{n-1}(x)| for all |n > 0|.

Proof: (click to expand)

For the forward implication, simply apply the previous theorem to |\pseq{g(t)s_n(x)}{n}| which is associated to |f|, then apply |g(t)^{-1}| to the resulting recurrence equation.

For the reverse implication, let |\pseq{p_n(x)}{n}| be the associated sequence for |f| and |U| be the linear operator defined by sending |s_n(x)| to |p_n(x)|. Then we have \[ Uf(t)s_n(x) = \frac{c_n}{c_{n-1}}Us_{n-1}(x) = \frac{c_n}{c_{n-1}}p_{n-1}(x) = f(t)p_n(x) = f(t)Us_n(x) \] Since |\pseq{s_n(x)}{n}| is a basis, we see that |U| commutes with a delta series and thus must be of the form |g(t)| for some |g| which is invertible because |U| preserves degree. Thus |\pseq{g(t)s_n(x)}{n}| is associated to |f| which is equivalent to |\pseq{s_n(x)}{n}| being Sheffer for |(g, f)|. |\square|

Corollary (id:tvdx): Let |\pseq{s_n(x)}{n}| be Sheffer for |(g, f)| and |\pseq{p_n(x)}{n}| be associated to |f|. Then \[ ts_n(x) = \sum_{k=0}^{n-1} \frac{c_n}{c_k c_{n-k}} \pair{t}{p_{n-k}(x)} s_k(x) \]

Proof: (click to expand) Start by expanding |ts_n(x)| via the Polynomial Expansion Theorem \[\begin{align} ts_n(x) & = \sum_{k=0}^\infty \frac{\pair{g(t)f(t)^k}{ts_n(x)}}{c_k} s_k(x) \\ & = \sum_{k=0}^{n-1} \frac{\pair{t}{g(t)f(t)^k s_n(x)}}{c_k} s_k(x) \\ & = \sum_{k=0}^{n-1} \frac{c_n}{c_{n-k} c_k} \pair{t}{g(t) s_{n-k}(x)} s_k(x) \tag{theorem (id:hvdt)} \\ & = \sum_{k=0}^{n-1} \frac{c_n}{c_{n-k} c_k} \pair{t}{p_{n-k}(x)} s_k(x) \tag{theorem (id:qqes)} \end{align}\] |\square|

Theorem (Sheffer Identity): A sequence |\pseq{s_n(x)}{n}| is Sheffer for |(g, f)| for some invertible |g| if and only if \[ \varepsilon_y(t) s_n(x) = \sum_{i+j=n} \frac{c_n}{c_i c_j} p_i(y) s_j(x) \] for all |y \in \mathbb K| where |\pseq{p_n(x)}{n}| is associated to |f|.

Proof: (click to expand)

First, we’ll establish that \[ \varepsilon_y(t) p_n(x) = \sum_{i+j=n} \frac{c_n}{c_i c_j} p_i(y) p_j(x) \]

Applying |f(t)^k| to both sides of the equation leads to \[\begin{align} \pair{f(t)^k}{\varepsilon_y(t) p_n(x)} & = \pair{\varepsilon_y(t)}{f(t)^k p_n(x)} \\ & = \frac{c_n}{c_{n-k}} \pair{\varepsilon_y(t)}{p_{n-k}(x)} \\ & = \frac{c_n}{c_{n-k}} p_{n-k}(y) \\ & = \sum_{i+j=n} \frac{c_n}{c_i c_{j-k}} p_i(y) \pair{t^0}{p_{j-k}(x)} \\ & = \sum_{i+j=n} \frac{c_n}{c_i c_j} p_i(y) \pair{t^0}{f(t)^k p_j(x)} \\ & = \sum_{i+j=n} \frac{c_n}{c_i c_j} p_i(y) \pair{f(t)^k}{p_j(x)} \\ & = \bigpair{f(t)^k}{\sum_{i+j=n} \frac{c_n}{c_i c_j} p_i(y) p_j(x)} \end{align}\] which is most easily read from outwards in.

Doing the same trick as the previous proof, we let |U| be a linear operator defined by sending |s_n(x)| to |p_n(x)|. If |\pseq{s_n(x)}{n}| is Sheffer for |(g, f)| for some |g|, we can choose |U = g(t)|, and applying |U| to both sides of the above equation gives the forward direction since |g(t)| commutes with |\varepsilon_y(t)|. Conversely, if we assume the equation, then we’ll see that |U| must commute with |\varepsilon_y(t)| which implies it’s of the form |g(t)|. |\square|

As an example, with |c_n = n!|, the Bernoulli polynomials |B_n(x)| are Sheffer for |(g(t), t)| where |g(t) = (e^t - 1)/t|, and the associated sequence of |f(t) = t| is |\pseq{x^n}{n}|. The Sheffer Identity is thus the one mentioned in the Introduction.

Theorem (id:jtbs): Let |\pseq{s_n(x)}{n}| be Sheffer for |(g, f)| and |\pseq{p_n(x)}{n}| be associated to |f|. For all |h, l \in \mathscr F|, \[ \pair{h(t)l(t)}{s_n(x)} = \sum_{i+j=n} \frac{c_n}{c_i c_j} \pair{h(t)}{p_i(x)} \pair{l(t)}{s_j(x)} \]

Proof: (click to expand) Use the Expansion Theorem on |h| with respect to |\pseq{p_n(x)}{n}| and |l| with respect to |\pseq{s_n(x)}{n}|. |h(t)l(t)| will then be \[ \sum_{i+j=k} \left(\frac{\pair{h(t)}{p_i(x)}}{c_i} f(t)^i\right) \left(\frac{\pair{l(t)}{s_j(x)}}{c_j} g(t)f(t)^j\right) = \left[\sum_{i+j=k} \frac{\pair{h(t)}{p_i(x)}\pair{l(t)}{s_j(x)}}{c_i c_j}\right] g(t)f(t)^k \] Applying this to |s_n(x)| gives the result. |\square|

See the paper for an interesting alternative proof of this result.

Recurrence Formulas

Given a linear operator |\mu| on |P|, the adjoint |\mu^*| is a linear operator on |\mathscr F| characterized by: \[ \pair{\mu^* f(t)}{p(x)} = \pair{f(t)}{\mu p(x)} \]

We can readily compute that: \[ \mu^* f(t) = \sum_{k=0}^\infty \frac{\pair{f(t)}{\mu x^k}}{c_k} t^k \]

Theorem (id:ahdu): The adjoint to a linear operator on |P| is continuous.

Proof: (click to expand) Let |\pseq{f_k(t)}{k}| be a sequence with |\deg f_k \to \infty| as |k \to \infty|. For each |n|, choose |K_n| such that |\deg f_k > \max_{i \leq n} \deg \mu x^i| for all |k \geq K_n|. Then |\pair{f_k(t)}{\mu x^m} = 0| for all |m \leq n| and |k \geq K_n|, so, using the formula above, |\deg \mu^* f_k > n| for all |k \geq K_n|. |\square|

In fact, a linear operator on |\mathscr F| is an adjoint to one on |P| if and only if it is continuous.

Theorem (id:fdui): If |T| is a continuous linear operator on |\mathscr F|, then there exists a linear operator |\mu| on |P| such that |T = \mu^*|.

Proof: (click to expand) Define |\mu x^n = \sum_{k=0}^\infty \frac{\pair{Tt^k}{x^n}}{c_k} x^k| which is well-defined because |T| being continuous means only finitely many |\pair{Tt^k}{x^n}| are non-zero. By construction, we see that |\pair{t^k}{\mu x^n} = \pair{Tt^k}{x^n}|. |\square|
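As a sanity check connecting this to the Formal Derivative section: when |\mu| is multiplication by |x|, the key property |\pair{\partial_t f(t)}{p(x)} = \pair{f(t)}{xp(x)}| says exactly that |\mu^* = \partial_t|. A sketch with |c_n = n!|, computing |\mu^*| from the displayed formula (for |c_n = n!| the formal derivative has coefficients |(k+1)f_{k+1}|):

```python
from fractions import Fraction as F
from math import factorial

c = lambda n: factorial(n)

def pair(f, p):
    # <f(t)|p(x)> = sum_n c_n f_n p_n, both given as coefficient lists
    return sum(c(n) * f[n] * p[n] for n in range(min(len(f), len(p))))

def adjoint(mu, f, N):
    # mu* f(t) = sum_k <f(t) | mu x^k>/c_k t^k, truncated to N coefficients
    xk = lambda k: [F(0)] * k + [F(1)]
    return [pair(f, mu(xk(k))) / c(k) for k in range(N)]

mul_x = lambda p: [F(0)] + list(p)  # the operator p(x) |-> x p(x)
```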

If |\pseq{p_n(x)}{n}| is associated to |f|, then the umbral shift |\theta_f| associated to |f| is the linear operator on |P| defined by \[ \theta_f p_n(x) = \frac{(n+1)c_n}{c_{n+1}} p_{n+1}(x) \] for all |n \in \mathbb N|. In the case where |p_n(x) = x^n| and |c_n = n!| the umbral shift is just multiplication by |x|. Since, famously, multiplication by |x| does not commute with differentiation, |Dx - xD = 1| as operators, the umbral shift isn’t induced as a linear operator by a formal power series. We’ll see that this is generally the case below.

A derivation on an algebra |A| is a linear operator |\partial| on |A| satisfying \[ \partial(ab) = (\partial a)b + a\partial b \] for all |a, b \in A|.

Lemma (id:lxnr): A continuous linear operator |\partial| on |\mathscr F| is a continuous derivation if and only if |\partial 1 = 0| and for any delta series |f|, |\partial f(t)^k = kf(t)^{k-1}g(t)| for all |k \in \mathbb N| for some |g|.

Proof: (click to expand) A continuous derivation satisfies |\partial h(t) = h'(t)\partial t| from which the result follows with |g(t) = f'(t)\partial t|. A continuous linear operator satisfying these laws satisfies |\partial h(t) = h'(t)f'(t)^{-1}g(t)| which we can see by expanding |h(t)| in terms of the pseudobasis |\pseq{f(t)^k}{k}| and using continuity to push the |\partial| into the sum. Given this \[\begin{align} \partial (h(t)k(t)) & = (h(t)k(t))'f'(t)^{-1}g(t) \\ & = k(t)h'(t)f'(t)^{-1}g(t) + h(t)k'(t)f'(t)^{-1}g(t) \\ & = (\partial h(t))k(t) + h(t)\partial k(t) \end{align}\] |\square|

Theorem (id:oqyq): An operator |\theta| on |P| is the umbral shift for the delta series |f| if and only if its adjoint |\theta^*| is a derivation on |\mathscr F| and \[ \theta^* f(t)^k = kf(t)^{k-1} \] for all |k \in \mathbb N|.

Proof: (click to expand)

If |\theta| is the umbral shift for |f| associated to |\pseq{p_n(x)}{n}|, then \[\begin{align} \pair{\theta^* f(t)^k}{p_n(x)} & = \pair{f(t)^k}{\theta p_n(x)} \\ & = \frac{(n+1)c_n}{c_{n+1}}\pair{f(t)^k}{p_{n+1}(x)} \\ & = (n+1)c_n \delta_k^{n+1} \\ & = kc_n \delta_k^{n+1} \\ & = kc_n \delta_{k-1}^n \\ & = \pair{kf(t)^{k-1}}{p_n(x)} \end{align}\] We of course have |\pair{\theta^* t^0}{p_n(x)} = \pair{t^0}{\theta p_n(x)} = \frac{(n+1)c_n}{c_{n+1}} \pair{t^0}{p_{n+1}(x)} = \frac{(n+1)c_n c_0}{c_{n+1}} \delta_0^{n+1} = 0| since |n+1| is never |0|. Since |\theta^*| is an adjoint, it’s continuous, and we can apply the previous lemma to conclude that it is a derivation.

If |\theta^*| is a derivation satisfying |\theta^* f(t)^k = kf(t)^{k-1}| then we can rearrange the equations of the first result to get: \[\begin{align} \pair{f(t)^k}{\theta p_n(x)} & = \pair{\theta^* f(t)^k}{p_n(x)} \\ & = \pair{kf(t)^{k-1}}{p_n(x)} \\ & = kc_n \delta_{k-1}^n \\ & = kc_n \delta_k^{n+1} \\ & = (n+1)c_n \delta_k^{n+1} \\ & = \frac{(n+1)c_n}{c_{n+1}}\pair{f(t)^k}{p_{n+1}(x)} \end{align}\] Since this holds for every |k \in \mathbb N|, |\theta p_n(x) = \frac{(n+1)c_n}{c_{n+1}} p_{n+1}(x)|, i.e. |\theta| is the umbral shift for |f|. |\square|

In the particular case where |f(t) = t|, the above states that |\theta_t^* t^k = kt^{k-1}|, i.e. |\theta_t^* f(t) = f'(t)|. (Not to be confused with |\partial_t| which is only the true derivative when |c_n = n!|.) We can easily compute that |\theta_t t = xD|. Notably, this does not depend on the choice of |c|.

Theorem (id:kscz): If |\pseq{s_n(x)}{n}| is Sheffer for |(g, f)|, then \[ \theta_t s_n(x) = \sum_{k=0}^{n+1} \left[\frac{c_n}{c_k c_{n-k}} \pair{g'(t)}{s_{n-k}(x)} + \frac{kc_n}{c_k c_{n-k+1}} \pair{g(t)f'(t)}{s_{n-k+1}(x)} \right] s_k(x) \]

Proof: (click to expand) Start with the Polynomial Expansion Theorem. \[\begin{align} \theta_t s_n(x) & = \sum_{k=0}^{n+1} \frac{\pair{g(t)f(t)^k}{\theta_t s_n(x)}}{c_k} s_k(x) \\ & = \sum_{k=0}^{n+1} \frac{\pair{\theta_t^*(g(t)f(t)^k)}{s_n(x)}}{c_k} s_k(x) \\ & = \sum_{k=0}^{n+1} \left[\frac{\pair{g'(t)f(t)^k + kg(t)f(t)^{k-1}f'(t)}{s_n(x)}}{c_k}\right] s_k(x) \\ & = \sum_{k=0}^{n+1} \left[\frac{\pair{g'(t)}{f(t)^k s_n(x)} + \pair{kg(t)f'(t)}{f(t)^{k-1} s_n(x)}}{c_k}\right] s_k(x) \\ & = \sum_{k=0}^{n+1} \left[\frac{c_n}{c_k c_{n-k}}\pair{g'(t)}{s_{n-k}(x)} + \frac{kc_n}{c_{n-k+1}c_k}\pair{g(t)f'(t)}{s_{n-k+1}(x)}\right] s_k(x) \end{align}\] |\square|

Lemma (id:golc): A surjective derivation on |\mathscr F| is continuous.

Proof: (click to expand) For any derivation |\partial|, we have |\partial 1 = \partial 1 + \partial 1| so |\partial 1 = 0|. We also have |\partial t^k = kt^{k-1}\partial t|. Since |\deg \partial t^0 = \infty| and |\deg \partial t^k = \deg \partial t + k - 1|, the only way to get something of degree |0| and have |\partial| be surjective is if |\deg \partial t = 0|. In general, if |\deg f = k| then |f(t) = t^k g(t)| where |\deg g = 0|. Therefore, |\deg \partial f(t) = \deg (t^k \partial g(t) + kt^{k-1}g(t)\partial t)| and the right-hand side has degree |k-1| implying |\partial| is continuous. |\square|

Theorem (id:gxkw): A surjective derivation on |\mathscr F| is adjoint to an umbral shift and vice versa.

Proof: (click to expand)

For the reverse direction, since |f| is a delta series in the previous theorem and |\theta^* f(t)^k = kf(t)^{k-1}|, |\pseq{\theta^* f(t)^k}{k}| is a pseudobasis and so |\theta^*| is surjective.

For the forward direction, if |\partial| is a surjective derivation, then it is an adjoint of a linear operator on |P| because it is continuous. Generally, we have |\partial f(t) = f'(t)\partial t| so we can solve |f'(t) = (\partial t)^{-1}| with |f(0) = 0|, i.e. |\deg f = 1|. Then |\partial f(t)^k = kf(t)^{k-1} f'(t)\partial t| and |f'(t) = (\partial t)^{-1}| so we end up with just |\partial f(t)^k = k f(t)^{k-1}| as desired. |\square|

Lemma (id:woxc): If |f| and |g| are delta series, then \[\theta_f^* = (\theta_f^* g(t))\theta_g^* \]

Proof: (click to expand) For any derivation |\partial|, we have |\partial a^k = (\partial a)ka^{k-1}| and |\theta_f^*| is a derivation. Therefore \[ \theta_f^* g(t)^k = (\theta_f^* g(t))kg(t)^{k-1} = (\theta_f^* g(t)) \theta_g^* g(t)^k \] and |g(t)^k| for |k\in \mathbb N| is a pseudobasis so this suffices to show the operators are equal. |\square|

Theorem (id:dxii): If |\theta_f| and |\theta_g| are umbral shifts, then \[ \theta_f = \theta_g \circ (\theta_g^* f(t))^{-1} \]

Proof: (click to expand) \[\begin{align} \pair{t^k}{\theta_f p(x)} & = \pair{\theta_f^* t^k}{p(x)} \\ & = \pair{(\theta_g^* f(t))^{-1}\theta_g^* t^k}{p(x)} \\ & = \pair{\theta_g^* t^k}{(\theta_g^* f(t))^{-1} p(x)} \\ & = \pair{t^k}{\theta_g (\theta_g^* f(t))^{-1} p(x)} \end{align}\] |\square|

Theorem (id:ouma): If |\pseq{p_n(x)}{n}| is associated to |f|, then \[ p_{n+1}(x) = \frac{c_{n+1}}{(n+1)c_n} \theta_t(f'(t))^{-1} p_n(x) \]

Proof: This is just the previous theorem applied to |p_n(x)| with |g(t) = t|. |\square|
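
To see the recurrence in action, here is a small sympy sketch in the classical case |c_n = n!| with |f(t) = e^t - 1| (both choices are assumptions made for illustration). Then |\theta_t| is multiplication by |x|, the operator |f'(t)^{-1} = e^{-t}| acts as the shift |p(x) \mapsto p(x-1)|, and the recurrence becomes |p_{n+1}(x) = x p_n(x-1)|, generating the falling factorials.

```python
# Theorem (id:ouma) specialized to c_n = n! and f(t) = e^t - 1:
# theta_t is multiplication by x, and f'(t)^{-1} = e^{-t} acts as the
# shift p(x) -> p(x - 1), so p_{n+1}(x) = x * p_n(x - 1).
from sympy import symbols, expand, ff, S

x = symbols('x')

p = S(1)  # p_0(x) = 1
for n in range(1, 6):
    p = expand(x * p.subs(x, x - 1))  # the recurrence
    assert p == expand(ff(x, n))      # falling factorial x(x-1)...(x-n+1)
```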

Lemma (id:xnvj): Let |\theta_f| be the umbral shift for |f|. Then \[ \theta_f^*(h(t)) = h(t) \theta_f - \theta_f h(t) \] for all |h \in \mathscr F|. The left-hand side is the linear operator on |P| induced by the formal power series that is the output of |\theta_f^*|.

Proof: (click to expand) \[\begin{align} \pair{t^k}{\theta_f^*(h(t))x^n + \theta_f h(t) x^n} & = \pair{t^k}{\theta_f^*(h(t))x^n} + \pair{t^k}{\theta_f h(t) x^n} \\ & = \pair{\theta_f^*(h(t)) t^k}{x^n} + \pair{h(t)\theta_f^*(t^k)}{x^n} \\ & = \pair{\theta_f^*(h(t)) t^k + h(t)\theta_f^*(t^k)}{x^n} \\ & = \pair{\theta_f^*(h(t) t^k)}{x^n} \\ & = \pair{h(t)t^k}{\theta_f x^n} \\ & = \pair{t^k}{h(t) \theta_f x^n} \end{align}\] |\square|

This lemma shows that no umbral shift has the form |g(t)| for a formal power series |g|: operators induced by formal power series commute with each other, so the lemma would force |\theta_f^*| to be identically zero, contradicting its surjectivity.

Theorem (id:omlq): Let |\pseq{s_n(x)}{n}| be Sheffer for |(g, f)|. Then if |\theta_f| is the umbral shift for |f|, \[ s_{n+1}(x) = \frac{c_{n+1}}{(n+1)c_n}(g(t)\theta_f^*(g(t)^{-1}) + \theta_f)s_n(x) \]

Proof: (click to expand) We have that |\pseq{g(t)s_n(x)}{n}| is associated to |f| leading to \[ g(t)s_{n+1}(x) = \frac{c_{n+1}}{(n+1)c_n} \theta_f g(t)s_n(x) \] The previous lemma leads to \[\begin{align} s_{n+1}(x) & = \frac{c_{n+1}}{(n+1)c_n} g(t)^{-1}\theta_f g(t)s_n(x) \\ & = \frac{c_{n+1}}{(n+1)c_n} g(t)^{-1}(g(t)\theta_f - \theta_f^*(g(t)))s_n(x) \\ & = \frac{c_{n+1}}{(n+1)c_n} (\theta_f - g(t)^{-1}\theta_f^*(g(t)))s_n(x) \\ & = \frac{c_{n+1}}{(n+1)c_n} (\theta_f + g(t)\theta_f^*(g(t)^{-1}))s_n(x) \end{align}\] where the final equality comes from \[ 0 = \theta_f^*(1) = \theta_f^*(g(t)g(t)^{-1}) = g(t)^{-1}\theta_f^*(g(t)) + g(t)\theta_f^*(g(t)^{-1}) \] |\square|

Theorem (id:kusb): Let |\pseq{s_n(x)}{n}| be Sheffer for |(g, f)|. \[ s_{n+1}(x) = \frac{c_{n+1}}{(n+1)c_n}\left(\theta_t - \frac{g'(t)}{g(t)}\right) \frac{1}{f'(t)} s_n(x) \]

Proof: (click to expand)

Starting from the previous theorem \[ s_{n+1}(x) = \frac{c_{n+1}}{(n+1)c_n}(g(t)\theta_f^*(g(t)^{-1}) + \theta_f)s_n(x) \]

Theorem (id:dxii) gives |\theta_f = \theta_t f'(t)^{-1}|. Since |\theta_f^* h(t) = h'(t)\theta_f^* t| for any |h|, we first note that using theorem (id:oqyq), |1 = \theta_f^* f(t) = f'(t) \theta_f^* t| or |\theta_f^* t = f'(t)^{-1}|. Then \[ g(t)\theta_f^*(g(t)^{-1}) = -\frac{g(t) g'(t)}{g(t)^2} \theta_f^* t = -\frac{g'(t)}{g(t)} \theta_f^* t = -\frac{g'(t)}{g(t)} f'(t)^{-1} \] |\square|

Theorem (id:vxhh): Let |\pseq{s_n(x)}{n}| be Sheffer for |(g, f)|. If \[ T = \left(\theta_t - \frac{g'(t)}{g(t)}\right)\frac{f(t)}{f'(t)} = \left(xD - \frac{tg'(t)}{g(t)}\right)\frac{f(t)}{tf'(t)} \] then \[ Ts_n(x) = ns_n(x) \] In other words, |s_n(x)| is an eigenfunction for |T| with eigenvalue |n|.

Proof: (click to expand) The equality for |T| just involves inserting a |t/t| in the middle and using |\theta_t t = xD|. \[ Ts_n(x) = \left(\theta_t - \frac{g'(t)}{g(t)}\right)\frac{1}{f'(t)}f(t)s_n(x) = \frac{c_n}{c_{n-1}}\left(\theta_t - \frac{g'(t)}{g(t)}\right)\frac{1}{f'(t)}s_{n-1}(x) = n s_n(x) \] where theorem (id:kusb) and theorem (id:hvdt) have been used. |\square|

Transfer Formulas

Theorem (Transfer Formula): If |\pseq{p_n(x)}{n}| is the associated sequence of |f|, then \[ p_n(x) = f'(t)\left(\frac{t}{f(t)}\right)^{n+1} x^n \] for all |n \in \mathbb N|.

Proof: (click to expand)

We verify that the right-hand side meets the conditions of theorem (id:cutg). Condition (2) is easily verified: \[\begin{align} f(t) p_n(x) & = f(t)f'(t)\left(\frac{t}{f(t)}\right)^{n+1} x^n \\ & = f'(t)\left(\frac{t}{f(t)}\right)^n t x^n \\ & = \frac{c_n}{c_{n-1}} f'(t)\left(\frac{t}{f(t)}\right)^n x^{n-1} \\ & = \frac{c_n}{c_{n-1}} p_{n-1}(x) \end{align}\]

For condition (1), we start with a small trick by writing |f’(t) = [t(f(t)/t)]’|. \[\begin{align} \bigpair{t^0}{f’(t)\left(\frac{t}{f(t)}\right)^{n+1} x^n} & = \bigpair{\left(t\frac{f(t)}{t}\right)'\left(\frac{t}{f(t)}\right)^{n+1}}{x^n} \\ & = \bigpair{\left(\frac{t}{f(t)}\right)^n + t\left(\frac{f(t)}{t}\right)'\left(\frac{t}{f(t)}\right)^{n+1}}{x^n} \tag{product rule} \\ & = \bigpair{\left(\frac{t}{f(t)}\right)^n}{x^n} + \bigpair{t\left(\frac{f(t)}{t}\right)'\left(\frac{t}{f(t)}\right)^{n+1}}{x^n} \end{align}\]

We proceed from there by cases. In the |n = 0| case, we have \[ \bigpair{t^0}{x^0} + \bigpair{t\left(\frac{f(t)}{t}\right)'\left(\frac{t}{f(t)}\right)}{x^0} = \pair{t^0}{x^0} = c_0 \]

For the |n > 0| case, to simplify the expressions we’ll show that \[ \bigpair{t\left(\frac{f(t)}{t}\right)'\left(\frac{t}{f(t)}\right)^{n+1}}{x^n} = -\bigpair{\left(\frac{t}{f(t)}\right)^n}{x^n} \] We proceed as follows \[\begin{align} \bigpair{t\left(\frac{f(t)}{t}\right)'\left(\frac{t}{f(t)}\right)^{n+1}}{x^n} & = \bigpair{\left(\frac{f(t)}{t}\right)'\left(\frac{f(t)}{t}\right)^{-n-1}}{tx^n} \\ & = -\frac{1}{n}\bigpair{\left[\left(\frac{f(t)}{t}\right)^{-n}\right]'}{tx^n} \\ & = -\frac{1}{n}\bigpair{\theta_t^*\left[\left(\frac{f(t)}{t}\right)^{-n}\right]}{tx^n} \\ & = -\frac{1}{n}\bigpair{\left(\frac{f(t)}{t}\right)^{-n}}{\theta_t tx^n} \\ & = -\frac{1}{n}\bigpair{\left(\frac{f(t)}{t}\right)^{-n}}{xDx^n} \\ & = -\bigpair{\left(\frac{f(t)}{t}\right)^{-n}}{x^n} \end{align}\] |\square|
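
The Transfer Formula can also be checked symbolically. Below is a sympy sketch in the classical case |c_n = n!|, where |t| acts as |d/dx|, using |f(t) = e^t - 1| (the forward difference), whose associated sequence is the falling factorials; both choices are assumptions made for the sake of a concrete example.

```python
# Transfer Formula check: p_n(x) = f'(t) (t/f(t))^{n+1} x^n with c_n = n!,
# where t^k x^n = (n!/(n-k)!) x^{n-k}, i.e. t acts as d/dx.
from sympy import symbols, exp, factorial, expand, ff

t, x = symbols('t x')

def transfer(f, n):
    # Taylor coefficients of f'(t) * (t/f(t))^(n+1) up to t^n
    g = (f.diff(t) * (t / f)**(n + 1)).series(t, 0, n + 1).removeO()
    return expand(sum(g.coeff(t, k) * factorial(n) / factorial(n - k)
                      * x**(n - k) for k in range(n + 1)))

f = exp(t) - 1  # forward-difference delta series
for n in range(5):
    assert transfer(f, n) == expand(ff(x, n))  # falling factorials
```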

Theorem (Transfer Formula, alternate form): If |\pseq{p_n(x)}{n}| is the associated sequence of |f|, then \[ p_n(x) = \frac{c_n}{nc_{n-1}} \theta_t \left(\frac{t}{f(t)}\right)^n x^{n-1} \] for all |n \geq 1|.

Proof: (click to expand) \[\begin{align} \bigpair{f(t)^k}{\frac{c_n}{nc_{n-1}} \theta_t \left(\frac{t}{f(t)}\right)^n x^{n-1}} & = \frac{c_n}{nc_{n-1}}\bigpair{\theta_t^*[f(t)^k]}{\left(\frac{t}{f(t)}\right)^n x^{n-1}} \\ & = \frac{kc_n}{nc_{n-1}}\bigpair{f(t)^{k-1}f'(t)}{\left(\frac{t}{f(t)}\right)^n x^{n-1}} \\ & = \frac{kc_n}{nc_{n-1}}\bigpair{f(t)^{k-1}}{f'(t)\left(\frac{t}{f(t)}\right)^n x^{n-1}} \\ & = \frac{k}{n}\bigpair{f(t)^{k-1}}{\frac{c_n}{c_{n-1}}p_{n-1}(x)} \tag{Transfer Formula} \\ & = \frac{k}{n}\pair{f(t)^{k-1}}{f(t)p_n(x)} \tag{theorem (id:cutg)} \\ & = \frac{k}{n}\pair{f(t)^k}{p_n(x)} \\ & = \frac{k}{n}c_n\delta_k^n \\ & = c_n\delta_k^n \\ & = \pair{f(t)^k}{p_n(x)} \end{align}\] |\square|

Corollary (id:hcem): Let |\pseq{p_n(x)}{n}| be associated to |f| and |\pseq{q_n(x)}{n}| be associated to |gf| with |g| invertible. Then \[ q_n(x) = \theta_t g(t)^{-n} \theta_t^{-1} p_n(x) \] where |\theta_t^{-1} x^{n+1} = (c_{n+1}/((n+1)c_n))x^n| and |\theta_t^{-1} 1 = 0|.

Proof: Just write |q_n(x)| using the alternate form of the Transfer Formula. |\square|

Theorem (id:jyhq): Let |\pseq{s_n(x)}{n}| be Sheffer for |(g, f)|, and let |h| and |l| be invertible series. Then the sequence |r_n(x) = h(t)l(t)^n s_n(x)| is Sheffer for \[ \left(\frac{[l(t)^{-1} f(t)]'}{f'(t)h(t)}l(t)g(t), l(t)^{-1} f(t) \right) \]

Proof: (click to expand) Apply |g(t)| to both sides getting \[ g(t)r_n(x) = h(t)l(t)^n g(t)s_n(x) \] and noting that |\pseq{g(t)s_n(x)}{n}| is associated to |f| and applying the Transfer Formula, we get: \[\begin{align} g(t)r_n(x) & = h(t)l(t)^n f'(t)\left(\frac{t}{f(t)}\right)^{n+1} x^n \\ & = h(t)\frac{f'(t)}{l(t)} l(t)^{n+1} \left(\frac{t}{f(t)}\right)^{n+1} x^n \\ & = h(t)\frac{f'(t)}{l(t)} \left(\frac{t}{l(t)^{-1} f(t)}\right)^{n+1} x^n \\ & = h(t)\frac{f'(t)}{l(t)} \frac{[l(t)^{-1} f(t)]'}{[l(t)^{-1} f(t)]'} \left(\frac{t}{l(t)^{-1} f(t)}\right)^{n+1} x^n \\ & = \frac{h(t)f'(t)}{[l(t)^{-1} f(t)]'l(t)} [l(t)^{-1} f(t)]' \left(\frac{t}{l(t)^{-1} f(t)}\right)^{n+1} x^n \\ \end{align} \] So we get \[ \frac{[l(t)^{-1} f(t)]'}{f'(t)h(t)} l(t)g(t)r_n(x) = [l(t)^{-1} f(t)]' \left(\frac{t}{l(t)^{-1} f(t)}\right)^{n+1} x^n \] which by the Transfer Formula means these are associated to |l(t)^{-1}f(t)| and thus |\pseq{r_n(x)}{n}| is Sheffer for \[ \left(\frac{[l(t)^{-1} f(t)]'}{f'(t)h(t)}l(t)g(t), l(t)^{-1} f(t) \right) \] |\square|

Umbral Composition and Transfer Operators

Let |\pseq{p_n(x)}{n}| be the associated sequence to |f|. The transfer or umbral2 operator for |\pseq{p_n(x)}{n}| (or |f|) is the linear operator |\lambda_f| on |P| defined by \[ \lambda_f x^n = p_n(x) \] This implies the adjoint operator is \[ \lambda_f^* g(t) = \sum_{k=0}^\infty \frac{\pair{g(t)}{p_k(x)}}{c_k} t^k \]

Lemma (id:oony): A |\mathbb K|-algebra homomorphism of |\mathscr F| is an automorphism if and only if it preserves degree.

Proof: (click to expand)

For the forward direction, assume |T| is an automorphism. Let |f(t) = T^{-1}(t)| with |\deg f = k| so |f(t) = t^kg(t)|. |k > 0| since |\mathbb K|-algebra homomorphisms send constants to constants but then |T(f(t)) = t = T(t)^k T(g(t))| and the degrees can only line up if |k = 1| and |\deg T(t) = 1|. |T| thus can’t reduce degree and by the same logic neither can |T^{-1}|, so |T| must preserve degree.

For the reverse direction, a degree preserving linear operator is continuous. We have that |\pseq{T(t)^k}{k}| is a pseudobasis, so for any |f \in \mathscr F|, we can write \[ f(t) = \sum_{k=0}^\infty a_k T(t)^k = T\left(\sum_{k=0}^\infty a_k t^k\right) \] so |g(t) = \sum_{k=0}^\infty a_k t^k| satisfies |f = T(g)| and for every |f| we can find such a |g| making |T| surjective. The uniqueness of the |\pseq{a_k}{k}| implies |T| is injective and thus bijective. For abstract nonsense reasons this is enough for it to be an automorphism. |\square|

Lemma (id:lfum): If |T| is a continuous |\mathbb K|-algebra homomorphism on |\mathscr F| and |f, g \in \mathscr F| with |\deg f > 0|, then |T(g(t)) = g(T(t))| and, more generally, |T(g(f(t))) = g(T(f(t)))|.

Proof: (click to expand) \[\begin{align} T(g(t)) & = T\left(\sum_{k=0}^\infty g_k t^k\right) \\ & = \sum_{k=0}^\infty g_k T(t)^k \\ & = g(T(t)) \end{align}\] Applying this to the series |g \circ f| gives |T(g(f(t))) = (g\circ f)(T(t)) = g(f(T(t))) = g(T(f(t)))| using |T(f(t)) = f(T(t))|. |\square|

Theorem (id:stji): A linear operator |\lambda| on |P| is the transfer operator for |f \in \mathscr F| if and only if its adjoint |\lambda^*| is a |\mathbb K|-algebra automorphism of |\mathscr F| for which |\lambda^* f(t) = t|. This makes |\lambda_f^*(g(t)) = g(\bar f(t))|.

Proof: (click to expand)

For the forward direction, if |\lambda| is a transfer operator for |f| then we immediately get \[ \lambda^* f(t) = \sum_{k=0}^\infty \frac{\pair{f(t)}{p_k(x)}}{c_k} t^k = \sum_{k=0}^\infty \delta_1^k t^k = t \] and \[\begin{align} \pair{\lambda^*(g(t)h(t))}{x^n} & = \pair{g(t)h(t)}{\lambda x^n} \\ & = \pair{g(t)h(t)}{p_n(x)} \\ & = \sum_{i+j=n} \frac{c_n}{c_i c_j} \pair{g(t)}{p_i(x)}\pair{h(t)}{p_j(x)} \tag{theorem (id:jtbs)} \\ & = \sum_{i+j=n} \frac{c_n}{c_i c_j} \pair{\lambda^* g(t)}{x^i}\pair{\lambda^* h(t)}{x^j} \\ & = \pair{(\lambda^* g(t))(\lambda^* h(t))}{x^n} \end{align}\]

For the reverse direction, \[\begin{align} \pair{f(t)^k}{\lambda x^n} & = \pair{\lambda^*(f(t)^k)}{x^n} \\ & = \pair{\lambda^*(f(t))^k}{x^n} \\ & = \pair{t^k}{x^n} \\ & = c_n \delta_k^n \end{align}\] so |\lambda x^n| has the characteristic property of |p_n(x)|. |\square|

Corollary (id:chiw): A continuous |\mathbb K|-algebra automorphism on |\mathscr F| is the adjoint of a transfer operator.

Proof: By theorem (id:fdui) a continuous linear operator is adjoint to some linear operator on |P|, and being an automorphism there is some |f \in \mathscr F| that gets mapped to |t| by the automorphism so the previous theorem applies. |\square|

Summarizing some results of this form: There’s a bijection between continuous linear operators on |\mathscr F| and linear operators on |P|. Further, there’s a bijection between continuous surjective derivations on |\mathscr F| and umbral shifts, and a bijection between continuous |\mathbb K|-algebra automorphisms on |\mathscr F| and transfer operators.

Corollary (id:ocfq): Transfer operators form a group with |(\lambda_f^*)^{-1} = \lambda_{\bar f}^*|, |\lambda_f^* \circ \lambda_g^* = \lambda_{f\circ g}^*|.

Proof: This readily follows from |\lambda_f^* g(t) = g(\bar f(t))|. |\square|

Theorem (id:ydci):

  1. A transfer operator maps associated sequences to associated sequences.
  2. If |\pseq{p_n(x)}{n}| and |\pseq{q_n(x)}{n}| are associated to |f| and |g| respectively, and |\lambda| is a linear operator which maps |p_n(x)| to |q_n(x)|, then |\lambda^* g(t) = f(t)|.
  3. Additionally, |\lambda| is a transfer operator.
Proof: (click to expand)

For all of the following let |\pseq{p_n(x)}{n}| and |\pseq{q_n(x)}{n}| be associated to |f| and |g| respectively and |\lambda_f|, |\lambda_g| be the transfer operators.

For 1, \[ \pair{(\lambda_g^*)^{-1}(g(t)^k)}{\lambda_g q_n(x)} = \pair{\lambda_g^*(\lambda_g^*)^{-1}(g(t)^k)}{q_n(x)} = \pair{g(t)^k}{q_n(x)} = c_n \delta_k^n \] so |\lambda_g q_n(x)| is associated to |(\lambda_g^*)^{-1}(g(t))|.

For 2, \[ \pair{\lambda^*(g(t))}{p_n(x)} = \pair{g(t)}{\lambda p_n(x)} = \pair{g(t)}{q_n(x)} = c_n \delta_1^n = \pair{f(t)}{p_n(x)} \]

For 3, since |\lambda| agrees with |\lambda_g \circ \lambda_f^{-1}| on the basis |\pseq{p_n(x)}{n}|, |\lambda^*| is a continuous |\mathbb K|-algebra automorphism. By 2 and lemma (id:lfum), |\lambda^*(\bar f(g(t))) = \bar f(\lambda^*(g(t))) = \bar f(f(t)) = t|, so theorem (id:stji) implies that |\lambda| is the transfer operator for |\bar f(g(t))|. |\square|

Let |\pseq{p_n(x)}{n}| and |\pseq{q_n(x)}{n}| be two polynomial sequences with |q_n(x) = \sum_{k=0}^n q_{n,k} x^k|. The umbral composition of |q| with |p| is written and defined as \[ \ucomp{q}{p} = \sum_{k=0}^n q_{n,k}p_k(x) \] If |\lambda| is the transfer operator for |\pseq{p_n(x)}{n}|, then we have \[ \ucomp{q}{p} = \lambda q_n(x) \]

Theorem (id:xvse): If |\pseq{p_n(x)}{n}| and |\pseq{q_n(x)}{n}| are associated to |f| and |g| respectively, then |\pseq{\ucomp{q}{p}}{n}| is associated to |g(f(t))|.

Proof: (click to expand) \[\begin{align} \pair{g(f(t))^k}{\ucomp{q}{p}} & = \pair{g(f(t))^k}{\lambda_f q_n(x)} \\ & = \pair{\lambda_f^*(g(f(t)))^k}{q_n(x)} \\ & = \pair{g(\lambda_f^*(f(t)))^k}{q_n(x)} \\ & = \pair{g(t)^k}{q_n(x)} \\ & = c_n \delta_k^n \\ \end{align}\] |\square|

Corollary (id:xajh): Umbral composition makes the set of associated sequences into a group.

Proof: This follows from the group structure of transfer operators. |\square|

A Sheffer operator is the linear operator |\mu_{g,f}| defined by |\mu_{g,f}x^n = s_n(x)| where |\pseq{s_n(x)}{n}| is Sheffer for |(g,f)|. By considering the associated sequence induced by a Sheffer sequence, i.e. |\pseq{g(t)s_n(x)}{n}|, we readily get |\mu_{g,f} = g(t)^{-1}\lambda_f| so Sheffer operators can be reduced to transfer operators.
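
As a concrete instance (a sketch under assumed choices): with |f(t) = t| the operator |\lambda_f| is the identity, and taking |g(t) = (e^t-1)/t| with |c_n = n!| makes |\mu_{g,f}| multiplication by the series |g(t)^{-1} = t/(e^t-1)|, which produces the Bernoulli polynomials from the introduction.

```python
# Sheffer operator mu_{g,t} = g(t)^{-1} for g(t) = (e^t - 1)/t, c_n = n!:
# applying the truncated series of t/(e^t - 1) as a differential operator
# to x^n yields the Bernoulli polynomial B_n(x).
from sympy import symbols, exp, expand, bernoulli

t, x = symbols('t x')

def apply_series(h, poly, order):
    # h(t) p(x) with t acting as d/dx (the c_n = n! case)
    coeffs = h.series(t, 0, order + 1).removeO()
    return expand(sum(coeffs.coeff(t, k) * poly.diff(x, k)
                      for k in range(order + 1)))

ginv = t / (exp(t) - 1)
for n in range(5):
    assert apply_series(ginv, x**n, n) == expand(bernoulli(n, x))
```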

Theorem (id:pnci): Let |\pseq{s_n(x)}{n}| be Sheffer for |(g, f)| and let |\pseq{r_n(x)}{n}| be Sheffer for |(h, l)|. Then |\ucomp{r}{s}| is Sheffer for the pair |(g(t)h(f(t)), l(f(t)))|.

Proof: (click to expand) \[\begin{align} \pair{g(t)h(f(t))l(f(t))^k}{\ucomp{r}{s}} & = \pair{g(t)h(f(t))l(f(t))^k}{\mu_{g,f} r_n(x)} \\ & = \pair{h(f(t))l(f(t))^k}{g(t)\mu_{g,f} r_n(x)} \\ & = \pair{h(f(t))l(f(t))^k}{\lambda_f r_n(x)} \\ & = \pair{\lambda_f^*(h(f(t))l(f(t))^k)}{r_n(x)} \\ & = \pair{h(\lambda_f^*(f(t)))l(\lambda_f^*(f(t)))^k}{r_n(x)} \\ & = \pair{h(t)l(t)^k}{r_n(x)} \\ & = c_n \delta_k^n \end{align}\] |\square|

Corollary (id:fony): \[ \pair{h(t)}{\mu_{g,f} q_n(x)} = \pair{\mu_{g,f}^*(h(t))}{q_n(x)} = \pair{g(\bar f(t))^{-1} h(\bar f(t))}{q_n(x)} \]

Proof: Immediate from definition of |\mu_{g,f}| and theorem (id:stji). |\square|

Given two polynomial sequences |\pseq{r_n(x)}{n}| and |\pseq{s_n(x)}{n}| related by \[ r_n(x) = \sum_{k=0}^n a_{n,k} s_k(x) \] the connection-constants problem is to determine the constants |a_{n,k}|. When |\pseq{r_n(x)}{n}| and |\pseq{s_n(x)}{n}| are Sheffer for given formal power series pairs, we can solve this problem as follows.

Theorem (id:hqtd): Let |\pseq{s_n(x)}{n}| be Sheffer for |(g, f)| and |\pseq{r_n(x)}{n}| be Sheffer for |(h, l)|. If \[ r_n(x) = \sum_{k=0}^n a_{n,k} s_k(x) \] then the sequence |t_n(x) = \sum_{k=0}^n a_{n,k} x^k| is Sheffer for \[ \left(\frac{h(\bar f(t))}{g(\bar f(t))}, l(\bar f(t)) \right) \]

Proof: (click to expand) Clearly, |r_n(x) = \ucomp{t}{s}|, so we just apply theorem (id:pnci) and solve for the |u, v \in \mathscr F| such that |\pseq{t_n(x)}{n}| is Sheffer for |(u, v)|. |\square|

Corollary (id:ixkb): Let |\pseq{p_n(x)}{n}| be associated to |f| and |\pseq{q_n(x)}{n}| be associated to |l| and \[ q_n(x) = \sum_{k=0}^n a_{n,k} p_k(x) \] then |t_n(x) = \sum_{k=0}^n a_{n,k} x^k| is associated to |l(\bar f(t))|.

Proof: Immediate from the previous theorem. |\square|

Transfer operators give us a concise proof of the Lagrange Inversion Formula for computing compositional inverses. The usual formula arises from |g(t) = t| below.

Corollary (Lagrange Inversion Formula): Let |f, g \in \mathscr F| with |\deg f = 1| and |c_n = n!|, then for |n > 0| \[ \pair{g(\bar f(t))}{x^n} = \bigpair{g'(t)\left(\frac{t}{f(t)}\right)^n}{x^{n-1}} \] Of course, |\pair{g(\bar f(t))}{x^0} = \pair{g(t)}{x^0}| since |\deg \bar f = 1|.

Proof: (click to expand)

First we characterize the action of |\lambda_f^*|. |\lambda_f^*(g(f(t))) = g(\lambda_f^*(f(t))) = g(t)| so |\lambda_f^*(g(t)) = g(\bar f(t))|.

We conclude with a use of the alternate form of the Transfer Formula with |\pseq{p_n(x)}{n}| being associated to |f|. \[\begin{align} \pair{g(\bar f(t))}{x^n} & = \pair{\lambda_f^*(g(t))}{x^n} \\ & = \pair{g(t)}{\lambda_f x^n} \\ & = \pair{g(t)}{p_n(x)} \\ & = \bigpair{g(t)}{\theta_t\left(\frac{t}{f(t)}\right)^n x^{n-1}} \tag{Transfer Formula, alternate form} \\ & = \bigpair{\theta_t^* g(t)\left(\frac{t}{f(t)}\right)^n}{x^{n-1}} \\ & = \bigpair{g'(t)\left(\frac{t}{f(t)}\right)^n}{x^{n-1}} \end{align}\] |\square|

Corollary (Lagrange Inversion Formula, alternate form): Let |f, g \in \mathscr F| with |\deg f = 1| and |c_n = n!|, then for |n > 0| \[ \pair{g(\bar f(t))}{x^n} = \bigpair{g(t)f'(t)\left(\frac{t}{f(t)}\right)^{n+1}}{x^n} \]

Proof: Do the same proof as the previous corollary just with the first form of the Transfer Formula. |\square|

Corollary (Lagrange Inversion Formula, Hermite form): Let |f, h \in \mathscr F| with |\deg f = 1| and |c_n = n!|, then for |n > 0| \[ \bigpair{\frac{th(\bar f(t))}{\bar f(t)f'(\bar f(t))}}{x^n} = \bigpair{h(t)\left(\frac{t}{f(t)}\right)^n}{x^n} \]

Proof: Apply the previous corollary with |g(t) = h(t)\frac{f(t)}{tf'(t)}|. |\square|
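
Here is a quick symbolic check of the first form, a sketch assuming |c_n = n!|, |f(t) = t/(1-t)| (chosen because its compositional inverse |\bar f(t) = t/(1+t)| is available in closed form), and an arbitrary test series |g|:

```python
# Lagrange Inversion check: <g(fbar(t)) | x^n> = <g'(t) (t/f(t))^n | x^{n-1}>
# where <h(t) | x^n> = n! [t^n] h(t) in the c_n = n! case.
from sympy import symbols, factorial

t = symbols('t')

def pair(h, n):
    return factorial(n) * h.series(t, 0, n + 1).removeO().coeff(t, n)

f = t / (1 - t)        # delta series with known compositional inverse
fbar = t / (1 + t)     # bar f
g = 1 + t + 3*t**2     # arbitrary test series

for n in range(1, 6):
    assert pair(g.subs(t, fbar), n) == pair(g.diff(t) * (t / f)**n, n - 1)
```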

Example: Chebyshev Polynomials

While the “classical” umbral calculus generally used |c_n = n!|, one interesting (orthogonal) polynomial sequence that benefits from the extra flexibility is Chebyshev polynomials where we’ll use |c_n = (-1)^n|. (This can be viewed as a special case of Gegenbauer polynomials which use |c_n = {-\lambda \choose n}^{-1}| which reduces to the Chebyshev case when |\lambda = 1|.)

The book “The Umbral Calculus” mentioned in the introduction primarily covers the “classical” case and has many examples of it. It covers the “non-classical” case as well, albeit somewhat as an afterthought.

With |c_n = (-1)^n|, we have |tx^n = \frac{c_n}{c_{n-1}}x^{n-1} = -x^{n-1}|, so |tp(x) = -x^{-1}p(x)| for any polynomial |p| with |p(0) = 0|.

|\pseq{T_n(x)}{n}| is Sheffer for |(g, f)| where |g(t) = (1-t^2)^{-2}| and |f(t) = \frac{\sqrt{1-t^2} - 1}{t} = \frac{-t}{1 + \sqrt{1 - t^2}}|. |\bar f(t) = \frac{-2t}{1+t^2}| and |f'(t) = \frac{-1}{\sqrt{1-t^2}(1 + \sqrt{1 - t^2})} = \frac{f(t)}{t\sqrt{1-t^2}}|.

Theorem (id:hvdt) gives |f(t)s_n(x) = \frac{c_n}{c_{n-1}}s_{n-1}(x)|, which combined with the power series identity |tf(t)^2 + 2f(t) + t = 0| gives the recurrence \[ \frac{c_n}{c_{n+1}}ts_{n+1}(x) + \frac{c_n}{c_{n-1}}ts_{n-1}(x) + 2s_n(x) = 0 \] for any Sheffer sequence with |f(t)| as its delta series. For the Chebyshev polynomials, this simplifies to \[ 2xT_n(x) + T_{n+1}(x) + T_{n-1}(x) = 0 \]

|\theta_t x^n = -(n+1)x^{n+1}| which we can compute from |\theta_t t = xD| which takes the form |\theta_t t x^n = -\theta_t x^{n-1} = nx^n|. This leads to |\theta_t = -x(1 + xD)|.
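
A one-line sympy check of this operator identity:

```python
# theta_t = -x(1 + xD) when c_n = (-1)^n: it should send x^n to
# ((n+1) c_n / c_{n+1}) x^{n+1} = -(n+1) x^{n+1}.
from sympy import symbols, expand

x = symbols('x')

def theta_t(p):
    return expand(-x * (p + x * p.diff(x)))

for n in range(6):
    assert theta_t(x**n) == -(n + 1) * x**(n + 1)
```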

From theorem (id:vxhh), we get |TT_n(x) = nT_n(x)| where

\[ T = \left(\theta_t - \frac{g'(t)}{g(t)}\right)\frac{f(t)}{f'(t)} = \left(xD - \frac{tg'(t)}{g(t)}\right)\frac{f(t)}{tf'(t)} \]

This leads to |nT_n(x) = (xD - (1-t^2)^{-1})\sqrt{1-t^2}T_n(x)|.

Summary

Defining property of |\pseq{s_n(x)}{n}| being Sheffer for |(g, f)| where |g| is an invertible series and |f| is a delta series: \[ \pair{g(t)f(t)^k}{s_n(x)} = \pair{t^k}{x^n} = c_n \delta_k^n \]

If |g(t) = 1|, then we say |\pseq{s_n(x)}{n}| is the associated sequence for |f|, and usually we’ll use |\pseq{p_n(x)}{n}| instead.

If |f(t) = t|, then we say |\pseq{s_n(x)}{n}| is the Appell sequence for |g|.

Formal power series as operators \[ t^k x^n = \frac{c_n}{c_{n-k}} x^{n-k} \] and generally, \[f(t) x^n = \sum_{k=0}^n \frac{c_n}{c_{n-k}} f_k x^{n-k} \]

Given a linear operator |\mu| on |P|, the adjoint |\mu^*| is a linear operator on |\mathscr F| characterized by: \[ \pair{\mu^* f(t)}{p(x)} = \pair{f(t)}{\mu p(x)} \]

Its adjoint expansion is: \[ \mu^* f(t) = \sum_{k=0}^\infty \frac{\pair{f(t)}{\mu x^k}}{c_k} t^k \]

This applies generally, but in particular to umbral shifts and transfer operators.

Theorem (id:ahdu): The adjoint to a linear operator on |P| is continuous.

Theorem (id:fdui): If |T| is a continuous linear operator on |\mathscr F|, then there exists a linear operator |\mu| on |P| such that |T = \mu^*|.

Theorem (Generating Function): |\pseq{s_n(x)}{n}| is Sheffer for |(g, f)| if and only if \[ \frac{1}{g(\bar f(t))} \varepsilon_y(\bar f(t)) = \sum_{k=0}^\infty \frac{s_k(y)}{c_k} t^k \] for all |y \in \mathbb K|.

Theorem (Sheffer Identity): A sequence |\pseq{s_n(x)}{n}| is Sheffer for |(g, f)| for some invertible |g| if and only if \[ \varepsilon_y(t) s_n(x) = \sum_{i+j=n} \frac{c_n}{c_i c_j} p_i(y) s_j(x) \] for all |y \in \mathbb K| where |\pseq{p_n(x)}{n}| is associated to |f|.

Theorem (id:cutg): A sequence |\pseq{p_n(x)}{n}| is the associated sequence for |f| if and only if 1) |\pair{t^0}{p_n(x)} = c_0 \delta_n^0| for all |n \in \mathbb N|, and 2) |f(t) p_n(x) = \frac{c_n}{c_{n-1}}p_{n-1}(x)| for all |n \in \mathbb N|.

Theorem (Transfer Formula): If |\pseq{p_n(x)}{n}| is the associated sequence of |f|, then \[ p_n(x) = f'(t)\left(\frac{t}{f(t)}\right)^{n+1} x^n \] for all |n \in \mathbb N|.

Theorem (Transfer Formula, alternate form): If |\pseq{p_n(x)}{n}| is the associated sequence of |f|, then \[ p_n(x) = \frac{c_n}{nc_{n-1}} \theta_t \left(\frac{t}{f(t)}\right)^n x^{n-1} \] for all |n \geq 1|.

Theorem (id:qqes): |\pseq{s_n(x)}{n}| is Sheffer for |(g, f)| if and only if |\pseq{g(t)s_n(x)}{n}| is the associated sequence for |f|.

Theorem (id:hvdt): A sequence |\pseq{s_n(x)}{n}| is Sheffer for |(g, f)| for some invertible |g| if and only if |f(t) s_n(x) = \frac{c_n}{c_{n-1}}s_{n-1}(x)|.

Theorem (id:omlq): Let |\pseq{s_n(x)}{n}| be Sheffer for |(g, f)|. Then if |\theta_f| is the umbral shift for |f|, \[ s_{n+1}(x) = \frac{c_{n+1}}{(n+1)c_n}(g(t)\theta_f^*(g(t)^{-1}) + \theta_f)s_n(x) \]

Theorem (Expansion Theorem): Let |\pseq{s_n(x)}{n}| be a Sheffer for |(g, f)|. Then for any |h \in \mathscr F|, \[ h(t) = \sum_{k=0}^\infty \frac{\pair{h(t)}{s_k(x)}}{c_k}g(t)f(t)^k \]

Corollary (Polynomial Expansion Theorem): Let |\pseq{s_n(x)}{n}| be a Sheffer for |(g, f)|. Then for a |p \in P|, \[ p(x) = \sum_{n=0}^\infty \frac{\pair{g(t)f(t)^n}{p(x)}}{c_n} s_n(x) \]

Theorem (id:stji): A linear operator |\lambda| on |P| is the transfer operator for |f \in \mathscr F| if and only if its adjoint |\lambda^*| is a |\mathbb K|-algebra automorphism of |\mathscr F| for which |\lambda^* f(t) = t|. This makes |\lambda_f^*(g(t)) = g(\bar f(t))|.

Theorem (id:pnci): Let |\pseq{s_n(x)}{n}| be Sheffer for |(g, f)| and let |\pseq{r_n(x)}{n}| be Sheffer for |(h, l)|. Then |\ucomp{r}{s}| is Sheffer for the pair |(g(t)h(f(t)), l(f(t)))|.

Corollary (id:fony): \[ \pair{h(t)}{\mu_{g,f} q_n(x)} = \pair{\mu_{g,f}^*(h(t))}{q_n(x)} = \pair{g(\bar f(t))^{-1} h(\bar f(t))}{q_n(x)} \]

Theorem (id:kscz): If |\pseq{s_n(x)}{n}| is Sheffer for |(g, f)|, then \[ \theta_t s_n(x) = \sum_{k=0}^{n+1} \left[\frac{c_n}{c_k c_{n-k}} \pair{g'(t)}{s_{n-k}(x)} + \frac{kc_n}{c_k c_{n-k+1}} \pair{g(t)f'(t)}{s_{n-k+1}(x)} \right] s_k(x) \]

Corollary (id:tvdx): \[ ts_n(x) = \sum_{k=0}^{n-1} \frac{c_n}{c_k c_{n-k}} \pair{t}{p_{n-k}(x)} s_k(x) \]

Theorem (Conjugate Representation): |\pseq{s_n(x)}{n}| is Sheffer for |(g, f)| if and only if \[ s_n(x) = \sum_{k=0}^n \frac{\pair{g(\bar f(t))^{-1}\bar f(t)^k}{x^n}}{c_k} x^k \]

Theorem (Multiplication Theorem): Let |\pseq{s_n(x)}{n}| be Appell for |g|, then \[ s_n(\alpha x) = \alpha^n \frac{g(t)}{g(t/\alpha)} s_n(x) \] for |\alpha\neq 0|.

Corollary (Lagrange Inversion Formula): Let |f, g \in \mathscr F| with |\deg f = 1| and |c_n = n!|, then for |n > 0| \[ \pair{g(\bar f(t))}{x^n} = \bigpair{g'(t)\left(\frac{t}{f(t)}\right)^n}{x^{n-1}} \] Of course, |\pair{g(\bar f(t))}{x^0} = \pair{g(t)}{x^0}| since |\deg \bar f = 1|.

Corollary (Lagrange Inversion Formula, alternate form): Let |f, g \in \mathscr F| with |\deg f = 1| and |c_n = n!|, then for |n > 0| \[ \pair{g(\bar f(t))}{x^n} = \bigpair{g(t)f'(t)\left(\frac{t}{f(t)}\right)^{n+1}}{x^n} \]

Corollary (Lagrange Inversion Formula, Hermite form): Let |f, h \in \mathscr F| with |\deg f = 1| and |c_n = n!|, then for |n > 0| \[ \bigpair{\frac{th(\bar f(t))}{\bar f(t)f'(\bar f(t))}}{x^n} = \bigpair{h(t)\left(\frac{t}{f(t)}\right)^n}{x^n} \]


  1. This isn’t a basis because we’re allowing countably infinite sums of the elements of the pseudobasis, whereas for a basis, even with infinite elements, we’d be claiming that every element is a finite sum of the basis elements.↩︎

  2. As far as I can tell, “umbral operator” is used in the “classical” |c_n = n!| case while “transfer operator” is more general. I’m sure people use “umbral operator” generally too, though.↩︎

Introduction

I want to talk about one of the many pretty areas of number theory. This involves the notion of an arithmetic function and related concepts. A few relatively simple concepts will allow us to produce a variety of useful functions and theorems. This provides only a glimpse of the start of the field of analytic number theory, though many of these techniques are used in other places as we’ll also start to see.

(See the end for a summary of identities and results.)

Prelude

As some notation, I’ll write |\mathbb N_+| for the set of positive naturals, and |\mathbb P| for the set of primes. |\mathbb N| will contain |0|. Slightly atypically, I’ll write |[n]| for the set of numbers from |1| to |n| inclusive, i.e. |a \in [n]| if and only if |1 \leq a \leq n|.

I find that the easiest way to see results in number theory is to view a positive natural number as a multiset of primes which is uniquely given by factorization. Coprime numbers are ones where these multisets are disjoint. Multiplication unions the multisets. The greatest common divisor is multiset intersection. |n| divides |m| if and only if |n| corresponds to a sub-multiset of |m|, in which case |m/n| corresponds to the multiset difference. The multiplicity of an element of a multiset is the number of occurrences. For a multiset |P|, |\mathrm{dom}(P)| is the set of elements of the multiset |P|, i.e. those with multiplicity greater than |0|. For a finite multiset |P|, |\vert P\vert| will be the sum of the multiplicities of the distinct elements, i.e. the number of elements (with duplicates) in the multiset.

We can represent a multiset of primes as a function |\mathbb P \to \mathbb N| which maps an element to its multiplicity. A finite multiset would then be such a function that is |0| at all but finitely many primes. Alternatively, we can represent the multiset as a partial function |\mathbb P \rightharpoonup \mathbb N_+|. It will be finite when it is defined for only finitely many primes. Equivalently, when it is a finite subset of |\mathbb P\times\mathbb N_+| (which is also a functional relation).

Unique factorization provides a bijection between finite multisets of primes and positive natural numbers. Given a finite multiset |P|, the corresponding positive natural number is |n_P = \prod_{(p, k) \in P} p^k|.
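This correspondence is easy to make concrete. A minimal Python sketch (the function names are mine; trial-division factorization is fine at this scale):

```python
def factor(n):
    """The finite multiset of primes of n, as a dict mapping prime -> multiplicity."""
    assert n >= 1
    P, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            P[p] = P.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        P[n] = P.get(n, 0) + 1
    return P

def unfactor(P):
    """The other direction of the bijection: n_P = product of p^k over (p, k) in P."""
    n = 1
    for p, k in P.items():
        n *= p ** k
    return n

# Unique factorization: the two maps are mutually inverse.
assert factor(360) == {2: 3, 3: 2, 5: 1}
assert all(unfactor(factor(n)) == n for n in range(1, 2000))
```

Multiplication is then multiset union (adding multiplicities) and |\gcd| is multiset intersection (taking minimums), matching the description above.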

I will refer to this view often in the following.

Arithmetic Functions

An arithmetic function is just a function defined on the positive naturals. Usually, they’ll land in (not necessarily positive) natural numbers, but that isn’t required.

In most cases, we’ll be interested in the specific subclass of multiplicative arithmetic functions. An arithmetic function, |f|, is multiplicative if |f(1) = 1| and |f(ab) = f(a)f(b)| whenever |a| and |b| are coprime. We also have the notion of a completely multiplicative arithmetic function for which |f(ab) = f(a)f(b)| always. Obviously, completely multiplicative functions are multiplicative. Analogously, we also have a notion of (completely) additive where |f(ab) = f(a) + f(b)|. Warning: in other mathematical contexts, “additive” means |f(a+b)=f(a)+f(b)|. An obvious example of a completely additive function is the logarithm. Exponentiating an additive function produces a multiplicative function.

For an additive function, |f|, we automatically get |f(1) = 0| since |f(1) = f(1\cdot 1) = f(1) + f(1)|.

Lemma: The product of two multiplicative functions |f| and |g| is multiplicative.
Proof: For |a| and |b| coprime, |f(ab)g(ab) = f(a)f(b)g(a)g(b) = f(a)g(a)f(b)g(b)|. |\square|

A parallel statement holds for completely multiplicative functions.

It’s also clear that a completely multiplicative function is entirely determined by its action on prime numbers. Since |p^n| and |q^m| are coprime whenever |p| and |q| are distinct primes, a multiplicative function is entirely determined by its action on powers of primes. To this end, I’ll often define multiplicative/additive functions by their action on prime powers and completely multiplicative/additive functions by their action on primes.

Multiplicative functions aren’t closed under composition, but we do have that if |f| is completely multiplicative and |g| is multiplicative, then |f \circ g| is multiplicative when that composite makes sense.

Here are some examples. Not all of these will be used in the sequel.

  • The power function |({-})^z| for any |z|, not necessarily an integer, is completely multiplicative.
  • Choosing |z=0| in the previous, we see the constantly one function |\bar 1(n) = 1| is completely multiplicative.
  • The identity function is clearly completely multiplicative and is also the |z=1| case of the above.
  • The Kronecker delta function |\delta(n) = \begin{cases}1, & n = 1 \\ 0, & n \neq 1\end{cases}| is completely multiplicative. Often written |\varepsilon| in this context.
  • Define a multiplicative function via |\mu(p^n) = \begin{cases} -1, & n = 1 \\ 0, & n > 1\end{cases}| where |p| is prime. This is the Möbius function. More holistically, |\mu(n)| is |0| if |n| has any square factors, otherwise |\mu(n) = (-1)^k| where |k| is the number of (distinct) prime factors.
  • Define a completely multiplicative function via |\lambda(p) = -1|. |\lambda(n) = \pm 1| depending on whether there is an even or odd number of prime factors (including duplicates). This function is known as the Liouville function.
  • |\lambda(n) = (-1)^{\Omega(n)}| where |\Omega(n)| is the completely additive function which counts the number of prime factors of |n| including duplicates. |\Omega(n_P) = \vert P\vert|.
  • Define a multiplicative function via |\gamma(p^n) = -1|. |\gamma(n) = \pm 1| depending on whether there is an even or odd number of distinct prime factors.
  • |\gamma(n) = (-1)^{\omega(n)}| where |\omega(n)| is the additive function which counts the number of distinct prime factors of |n|. See Prime omega function. We also see that |\omega(n_P) = \vert\mathrm{dom}(P)\vert|.
  • For |q\in\mathbb P|, the completely additive function |\nu_q(p) = \begin{cases}1,&p=q\\0,&p\neq q\end{cases}| is the |q|-adic valuation.
  • It follows that the |p|-adic absolute value |\vert r\vert_p = p^{-\nu_p(r)}| is completely multiplicative. It can be characterized on naturals by |\vert p\vert_q = \begin{cases}p^{-1},&p=q\\1,&p\neq q\end{cases}|.
  • |\gcd({-}, k)| for a fixed |k| is multiplicative. Given any multiplicative function |f|, |f \circ \gcd({-},k)| is multiplicative. This essentially “restricts” |f| to only see the prime powers that divide |k|. Viewing the finite multiset of primes |P| as a function |\mathbb P\to\mathbb N|, |f(\gcd(p^n,n_P)) = \begin{cases}f(p^n),&n\leq P(p)\\f(p^{P(p)}),&n>P(p)\end{cases}|.
  • The multiplicative function characterized by |a(p^n) = p(n)|, where |p(n)| is the partition function, counts the number of abelian groups of the given order. That this function is multiplicative is a consequence of the fundamental theorem of finite abelian groups.
  • The Jacobi symbol |\left(\frac{a}{n}\right)| where |a\in\mathbb Z| and |n| is an odd positive integer is a completely multiplicative function with either |a| or |n| fixed. When |n| is an odd prime, it reduces to the Legendre symbol. For |p| an odd prime, we have |(\frac{a}{p}) = a^{\frac{p-1}{2}} \pmod p|. This will always be in |\{-1, 0, 1\}| and can be alternately defined as |\left(\frac{a}{p}\right) = \begin{cases}0,&p\mid a\\1,&p\nmid a\text{ and }\exists x.x^2\equiv a\pmod p\\-1,&\not\exists x.x^2\equiv a\pmod p\end{cases}|. Therefore, |\left(\frac{a}{p}\right)=1| (|=0|) when |a| is a (trivial) quadratic residue mod |p|.
  • An interesting example which is not multiplicative nor additive is the arithmetic derivative. Let |p\in\mathbb P|. Define |\frac{\partial}{\partial p}(n)| via |\frac{\partial}{\partial p}(p) = 1|, |\frac{\partial}{\partial p}(q) = 0| for |q\neq p| and |q\in\mathbb P|, and |\frac{\partial}{\partial p}(nm) = \frac{\partial}{\partial p}(n)m + n\frac{\partial}{\partial p}(m)|. We then have |D_S = \sum_{p\in S}\frac{\partial}{\partial p}| for non-empty |S\subseteq\mathbb P| which satisfies the same product rule identity. This perspective views a natural number (or, more generally, a rational number) as a monomial in infinitely many variables labeled by prime numbers.
  • A Dirichlet character of modulus |m| is, by definition, a completely multiplicative function |\chi| satisfying |\chi(n + m) = \chi(n)| and |\chi(n)| is non-zero if and only if |n| is coprime to |m|. The Jacobi symbol |\left(\frac{({-})}{m}\right)| is a Dirichlet character of modulus |m|. |\bar 1| is the Dirichlet character of modulus |1|.
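Several of the functions listed above are easy to compute directly from the multiset-of-primes view. A small Python sketch (my own helper names, nothing standard):

```python
def factor(n):
    """Prime factorization as a dict mapping prime -> multiplicity."""
    P, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            P[p] = P.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        P[n] = P.get(n, 0) + 1
    return P

def big_omega(n):
    """Omega(n): number of prime factors counted with multiplicity."""
    return sum(factor(n).values())

def small_omega(n):
    """omega(n): number of distinct prime factors."""
    return len(factor(n))

def mobius(n):
    """mu(n): 0 if n has a square factor, else (-1)^(number of prime factors)."""
    P = factor(n)
    return 0 if any(k > 1 for k in P.values()) else (-1) ** len(P)

def liouville(n):
    """lambda(n) = (-1)^Omega(n)."""
    return (-1) ** big_omega(n)

assert [mobius(n) for n in range(1, 11)] == [1, -1, -1, 0, -1, 1, -1, 0, 0, 1]
assert liouville(12) == -1 and small_omega(12) == 2 and big_omega(12) == 3
```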

Dirichlet Series

Given an arithmetic function |f|, we define the Dirichlet series:

\[\mathcal D[f](s) = \sum_{n=1}^\infty \frac{f(n)}{n^s} = \sum_{n=1}^\infty f(n)n^{-s}\]

When |f| is a Dirichlet character, |\chi|, this is referred to as the (Dirichlet) |L|-series of the character, and the analytic continuation is the (Dirichlet) |L|-function and is written |L(s, \chi)|.

We’ll not focus much on when such a series converges. See this section of the above Wikipedia article for more details. Alternatively, we could talk about formal Dirichlet series. We can clearly see that if |s = 0|, then we get the sum |\sum_{n=1}^\infty f(n)| which clearly won’t converge for, say, |f = \bar 1|. We can say that if |f| is asymptotically bounded by |n^k| for some |k|, i.e. |f \in O(n^k)|, then the series will converge absolutely when the real part of |s| is greater than |k+1|. For |\bar 1|, it follows that |\mathcal D[\bar 1](x + iy)| is defined when |x > 1|. We can use analytic continuation to go beyond these limits.
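As a quick numerical sanity check of this convergence claim, we can truncate the series in Python and compare against the known value |\zeta(2) = \pi^2/6|:

```python
import math

def dirichlet_partial(f, s, N):
    """Partial sum of the Dirichlet series of f at s, truncated at N."""
    return sum(f(n) * n ** (-s) for n in range(1, N + 1))

# D[1-bar](2) = zeta(2) = pi^2/6; the tail beyond N is roughly 1/N.
approx = dirichlet_partial(lambda n: 1, 2.0, 100000)
assert abs(approx - math.pi ** 2 / 6) < 1e-4
```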

See A Catalog of Interesting Dirichlet Series for a more reference-like listing. Beware differences in notation.

Dirichlet Convolution

Why is this interesting in this context? Let’s consider two arithmetic functions |f| and |g| and multiply their corresponding Dirichlet series. We’ll get:

\[\mathcal D[f](s)\mathcal D[g](s) = \sum_{n=1}^\infty h(n)n^{-s} = \mathcal D[h](s)\]

where now we need to figure out what |h(n)| is. But |h(n)| is going to be the sum of all the terms of the form |f(a)a^{-s}g(b)b^{-s} = f(a)g(b)(ab)^{-s}| where |ab = n|. We can thus write: \[h(n) = \sum_{ab=n} f(a)g(b) = \sum_{d\mid n} f(d)g(n/d)\] We’ll write this more compactly as |h = f \star g| which we’ll call Dirichlet convolution. We have thus shown a convolution theorem of the form \[\mathcal D[f]\mathcal D[g] = \mathcal D[f \star g]\]

The Kronecker delta serves as a unit to this operation which is reflected by |\mathcal D[\delta](s) = 1|.
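The convolution sum transcribes directly into code; a sketch in Python, also checking the unit law for |\delta| and commutativity on small inputs:

```python
def dconv(f, g):
    """Dirichlet convolution: (f * g)(n) = sum over d | n of f(d) g(n/d)."""
    def h(n):
        return sum(f(d) * g(n // d) for d in range(1, n + 1) if n % d == 0)
    return h

one = lambda n: 1
delta = lambda n: 1 if n == 1 else 0
ident = lambda n: n

# delta is the unit, and the operation is commutative.
assert all(dconv(delta, one)(n) == one(n) for n in range(1, 100))
assert all(dconv(one, ident)(n) == dconv(ident, one)(n) for n in range(1, 100))
```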

In the same way we can view a sum of the form |\sum_{a+b=n}f(a)g(b)| that arises in “normal” convolution as a sum along the line |y = n - x|, we can view the sum |\sum_{ab=n}f(a)g(b)| as a sum along a hyperbola of the form |y = n/x|. For all of |\sum_{n=1}^\infty\sum_{k=1}^\infty f(n)g(k)|, |\sum_{n=1}^\infty\sum_{k=1}^n f(k)g(n-k)|, and |\sum_{n=1}^\infty\sum_{k\mid n}f(k)g(n/k)| we’re including |f(a)g(b)| for every |(a,b)\in\mathbb N_+\times\mathbb N_+| in the sum exactly once. The difference is whether we’re grouping the internal sum by rows, diagonals, or hyperbolas. This idea of summing hyperbolas can be expanded to a computational technique for sums of multiplicative functions called the Dirichlet hyperbola method.
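To make the hyperbola method concrete, here is a sketch of its standard special case (my example, not from the post): computing |\sum_{n\le N} d(n)|, the number of lattice points under the hyperbola |xy = N|, in |O(\sqrt N)| time using the |x \leftrightarrow y| symmetry:

```python
import math

def divisor_summatory(N):
    """Sum of d(n) for n <= N, i.e. the number of pairs (x, y) with xy <= N.
    Hyperbola method: count the x <= sqrt(N) and y <= sqrt(N) halves,
    then subtract the double-counted sqrt(N) x sqrt(N) square."""
    r = math.isqrt(N)
    return 2 * sum(N // x for x in range(1, r + 1)) - r * r

def divisor_summatory_naive(N):
    """O(N) check: sum over k of the number of multiples of k up to N."""
    return sum(N // k for k in range(1, N + 1))

assert all(divisor_summatory(N) == divisor_summatory_naive(N) for N in range(1, 500))
```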

Since we will primarily be interested in multiplicative functions, we should check that |f \star g| is a multiplicative function when |f| and |g| are.

Lemma: Assume |a| and |b| are coprime, and |f| and |g| are multiplicative. Then |(f \star g)(ab) = (f \star g)(a)(f \star g)(b)|.

Proof: Since |a| and |b| are coprime, they share no divisors besides |1|. This means every |d| such that |d \mid ab| factors as |d = d_a d_b| where |d_a \mid a| and |d_b \mid b|. More strongly, write |D_n = \{ d \in \mathbb N_+ \mid d \mid n\}|, then for any coprime pair of numbers |i| and |j|, we have |D_{ij} \cong D_i \times D_j| and every pair |(d_i, d_j) \in D_i \times D_j| is coprime1. Thus,

\[\begin{flalign} (f \star g)(ab) & = \sum_{d \in D_{ab}} f(d)g((ab)/d) \tag{by definition} \\ & = \sum_{(d_a, d_b) \in D_a \times D_b} f(d_a d_b)g((ab)/(d_a d_b)) \tag{via the bijection} \\ & = \sum_{(d_a, d_b) \in D_a \times D_b} f(d_a)f(d_b)g(a/d_a)g(b/d_b) \tag{f and g are multiplicative} \\ & = \sum_{d_a \in D_a} \sum_{d_b \in D_b} f(d_a)f(d_b)g(a/d_a)g(b/d_b) \tag{sum over a Cartesian product} \\ & = \sum_{d_a \in D_a} f(d_a)g(a/d_a) \sum_{d_b \in D_b} f(d_b)g(b/d_b) \tag{undistributing} \\ & = \sum_{d_a \in D_a} f(d_a)g(a/d_a) (f \star g)(b) \tag{by definition} \\ & = (f \star g)(b) \sum_{d_a \in D_a} f(d_a)g(a/d_a) \tag{undistributing} \\ & = (f \star g)(b) (f \star g)(a) \tag{by definition} \\ & = (f \star g)(a) (f \star g)(b) \tag{commutativity of multiplication} \end{flalign}\] |\square|

It is not the case that the Dirichlet convolution of two completely multiplicative functions is completely multiplicative.

We can already start to do some interesting things with this. First, we see that |\mathcal D[\bar 1] = \zeta|, the Riemann zeta function. Now consider |(\bar 1 \star \bar 1)(n) = \sum_{k \mid n} 1 = d(n)|. |d(n)| is the divisor function which counts the number of divisors of |n|. We see that |\mathcal D[d](s) = \zeta(s)^2|. A simple but useful fact is |\zeta(s - z) = \mathcal D[(-)^z](s)|. This directly generalizes the result for |\mathcal D[\bar 1]| and also implies |\mathcal D[\operatorname{id}](s) = \zeta(s - 1)|.

Generalizing in a different way, we get the family of functions |\sigma_k = ({-})^k \star \bar 1|. |\sigma_k(n) = \sum_{d \mid n} d^k|. From the above, we see |\mathcal D[\sigma_k](s) = \zeta(s - k)\zeta(s)|.

Lemma: Given a completely multiplicative function |f|, we get |f(n)(g \star h)(n) = (fg \star fh)(n)|.
Proof: \[\begin{flalign} (fg \star fh)(n) & = \sum_{d \mid n} f(d)g(d)f(n/d)h(n/d) \\ & = \sum_{d \mid n} f(d)f(n/d)g(d)h(n/d) \\ & = \sum_{d \mid n} f(n)g(d)h(n/d) \\ & = f(n)\sum_{d \mid n} g(d)h(n/d) \\ & = f(n)(g \star h)(n) \end{flalign}\] |\square|

As a simple corollary, for a completely multiplicative |f|, |f \star f = f(\bar 1 \star \bar 1) = fd|.

Euler Product Formula

However, the true power of this is unlocked by the following theorem:

Theorem (Euler product formula): Given a multiplicative function |f| which doesn’t grow too fast, e.g. is |O(n^k)| for some |k > 0|, \[\mathcal D[f](s) = \sum_{n=1}^\infty f(n)n^{-s} = \prod_{p \in \mathbb P}\sum_{n=0}^\infty f(p^n)p^{-ns} = \prod_{p \in \mathbb P}\left(1 + \sum_{n=1}^\infty f(p^n)p^{-ns}\right) \] where the series converges.

Proof: The last equality is simply using the fact that |f(p^0)p^0 = f(1) = 1| because |f| is multiplicative. The idea for the main part is similar to how we derived Dirichlet convolution. When we start to distribute out the infinite product, each term will correspond to the product of selections of a term from each series. When all but finitely many of those selections select the |1| term, we get |\prod_{(p, k) \in P}f(p^k)(p^k)^{-s}| where |P| is some finite multiset of primes induced by those selections. Therefore, |\prod_{(p, k) \in P}f(p^k)(p^k)^{-s} = f(n_P)n_P^{-s}|. Thus, by unique factorization, |f(n)n^{-s}| for every positive natural occurs in the sum produced by distributing the right-hand side exactly once.

In the case where |P| is not a finite multiset, we’ll have \[ \frac{\prod_{(p, k) \in P}f(p^k)}{\left(\prod_{(p, k) \in P}p^k\right)^s}\]

The denominator of this expression goes to infinity when the real part of |s| is greater than |0|. As long as the numerator doesn’t grow faster than the denominator (perhaps after restricting the real part of |s| to be greater than some bound), this product goes to |0|. Therefore, the only terms that remain are those corresponding to the Dirichlet series on the left-hand side. |\square|

If we assume |f| is completely multiplicative, we can further simplify Euler’s product formula via the usual sum of a geometric series, |\sum_{n=0}^\infty x^n = (1-x)^{-1}|, to:

\[ \sum_{n=1}^\infty f(n)n^{-s} = \prod_{p \in \mathbb P}\sum_{n=0}^\infty (f(p)p^{-s})^n = \prod_{p \in \mathbb P}(1 - f(p)p^{-s})^{-1} \]

Now let’s put this to work. The first thing we can see is |\zeta(s) = \mathcal D[\bar 1](s) = \prod_{p\in\mathbb P}(1 - p^{-s})^{-1}|. But this lets us write |1/\zeta(s) = \prod_{p\in\mathbb P}(1 - p^{-s})|. If we look for a multiplicative function that would produce the right-hand side, we see that it must send a prime |p| to |-1| and |p^n| for |n > 1| to |0|. In other words, it’s the Möbius function |\mu| we defined before. So |\mathcal D[\mu](s) = 1/\zeta(s)|.
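Numerically, |\mathcal D[\mu](2) = 1/\zeta(2) = 6/\pi^2 \approx 0.6079|, which we can check by truncation:

```python
import math

def mobius(n):
    """mu(n) computed from the factorization of n (trial division)."""
    P, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            P[p] = P.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        P[n] = P.get(n, 0) + 1
    return 0 if any(k > 1 for k in P.values()) else (-1) ** len(P)

approx = sum(mobius(n) / n ** 2 for n in range(1, 20001))
assert abs(approx - 6 / math.pi ** 2) < 1e-3
```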

Using |\mathcal D[d](s) = \zeta(s)^2|, we see that \[\begin{flalign} \zeta(s)^2 & = \prod_{p\in\mathbb P}\left(\sum_{n=0}^\infty p^{-ns}\right)^2 \\ & = \prod_{p\in\mathbb P}\sum_{n=0}^\infty (n+1)p^{-ns} \\ & = \prod_{p\in\mathbb P}\sum_{n=0}^\infty d(p^n)p^{-ns} \\ & = \mathcal D[d](s) \end{flalign}\] Therefore, |d(p^n) = n + 1|. This intuitively makes sense because the only divisors of |p^n| are |p^k| for |k = 0, \dots, n|, and for |a| and |b| coprime |d(ab) = \vert D_{ab} \vert = \vert D_a \times D_b\vert = \vert D_a\vert\vert D_b\vert = d(a)d(b)|.

Another result leveraging the theorem is given any multiplicative function |f|, we can define a new multiplicative function via |f^{[k]}(p^n) = \begin{cases}f(p^m), & km = n\textrm{ for }m\in\mathbb N \\ 0, & k \nmid n\end{cases}|.

Lemma: The operation just defined has the property that |\mathcal D[f^{[k]}](s) = \mathcal D[f](ks)|.
Proof: \[\begin{flalign} \mathcal D[f^{[k]}](s) & = \prod_{p \in \mathbb P}\sum_{n=0}^\infty f^{[k]}(p^n)p^{-ns} \\ & = \prod_{p \in \mathbb P}\sum_{n=0}^\infty f^{[k]}(p^{kn})p^{-nks} \\ & = \prod_{p \in \mathbb P}\sum_{n=0}^\infty f(p^n)p^{-nks} \\ & = \mathcal D[f](ks) \end{flalign}\] |\square|

Möbius Inversion

We can write a sum over some function, |f|, of the divisors of a given natural |n| as |(f \star \bar 1)(n) = \sum_{d \mid n} f(d)|. Call this |g(n)|. But then we have |\mathcal D[f \star \bar 1] = \mathcal D[f]\mathcal D[\bar 1] = \mathcal D[f]\zeta| and thus |\mathcal D[f] = \mathcal D[f]\zeta/\zeta = \mathcal D[(f \star \bar 1) \star \mu]|. Therefore, if we only have the sums |g(n) = \sum_{d \mid n} f(d)| for some unknown |f|, we can recover |f| via |f(n) = (g \star \mu)(n) = \sum_{d\mid n}g(d)\mu(n/d)|. This is Möbius inversion.

Formally:

\[g(n) = \sum_{d\mid n} f(d) \iff f(n) = \sum_{d \mid n} \mu(d)g(n/d)\]

As a simple example, we clearly have |\zeta(s)/\zeta(s) = 1 = \mathcal D[\delta](s)| so |\bar 1 \star \mu = \delta| or |\sum_{d \mid n}\mu(d) = 0| for |n > 1| and |1| when |n = 1|.
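Möbius inversion is easy to test on an arbitrary function: build |g = f \star \bar 1| by divisor sums, then recover |f| by convolving with |\mu|. A sketch:

```python
def mobius(n):
    """mu(n) via trial-division factorization."""
    P, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            P[p] = P.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        P[n] = P.get(n, 0) + 1
    return 0 if any(k > 1 for k in P.values()) else (-1) ** len(P)

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

f = lambda n: n * n + 3                                   # an arbitrary arithmetic function
g = lambda n: sum(f(d) for d in divisors(n))              # g = f * 1-bar
f_back = lambda n: sum(mobius(d) * g(n // d) for d in divisors(n))  # f = g * mu
assert all(f_back(n) == f(n) for n in range(1, 200))
```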

We also get generalized Möbius inversion via |\delta(n) = \delta(n)n^k = (\mu\star\bar 1)(n)n^k = (({-})^k\mu\star({-})^k)(n)|. That is, if |g(n) = \sum_{d\mid n}d^k f(n/d)|, then |f(n) = \sum_{d\mid n} \mu(d)d^kg(n/d)|.

By considering logarithms, we also get a multiplicative form of (generalized) Möbius inversion: \[g(n) = \prod_{d\mid n}f(n/d)^{d^k} \iff f(n) = \prod_{d\mid n}g(n/d)^{\mu(d)d^k}\]

Theorem: As another guise of Möbius inversion, given any completely multiplicative function |h|, let |g(m) = \sum_{n=1}^\infty f(mh(n))|. Assuming these sums make sense, we can recover |f(k)| via |f(k) = \sum_{m=1}^\infty \mu(m)g(kh(m))|.

Proof: \[\begin{align} \sum_{m=1}^\infty \mu(m)g(kh(m)) & = \sum_{m=1}^\infty \mu(m)\sum_{n=1}^\infty f(kh(m)h(n)) \\ & = \sum_{N=1}^\infty \sum_{N=mn} \mu(m)f(kh(N)) \\ & = \sum_{N=1}^\infty f(kh(N)) \sum_{N=nm} \mu(m) \\ & = \sum_{N=1}^\infty f(kh(N)) (\mu\star\bar 1)(N) \\ & = \sum_{N=1}^\infty f(kh(N)) \delta(N) \\ & = f(k) \end{align}\] |\square|

This will often show up in the form of |r(x^{1/n})| or |r(x^{1/n})/n|, i.e. with |h(n)=n^{-1}| and |f_x(k) = r(x^k)| or |f_x(k) = kr(x^k)|. Typically, we’ll then be computing |f_x(1) = r(x)|.

Lambert Series

As a brief aside, it’s worth mentioning Lambert Series.

Given an arithmetic function |a|, these are series of the form: \[ \sum_{n=1}^\infty a(n) \frac{x^n}{1-x^n} = \sum_{n=1}^\infty a(n) \sum_{k=1}^\infty x^{kn} = \sum_{n=1}^\infty (a \star \bar 1)(n) x^n \]

This leads to: \[\sum_{n=1}^\infty \mu(n) \frac{x^n}{1-x^n} = x\] and: \[\sum_{n=1}^\infty \varphi(n) \frac{x^n}{1-x^n} = \frac{x}{(1-x)^2}\]
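Both of these Lambert series identities can be checked numerically by truncation. (Here |\varphi| is Euler's totient, which is derived later in the post; I compute it from its product form.) A sketch:

```python
def factor(n):
    P, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            P[p] = P.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        P[n] = P.get(n, 0) + 1
    return P

def mobius(n):
    P = factor(n)
    return 0 if any(k > 1 for k in P.values()) else (-1) ** len(P)

def phi(n):
    """Euler's totient via phi(p^k) = p^k - p^(k-1)."""
    result = 1
    for p, k in factor(n).items():
        result *= p ** k - p ** (k - 1)
    return result

x = 0.5
lam_mu = sum(mobius(n) * x ** n / (1 - x ** n) for n in range(1, 80))
assert abs(lam_mu - x) < 1e-9
lam_phi = sum(phi(n) * x ** n / (1 - x ** n) for n in range(1, 120))
assert abs(lam_phi - x / (1 - x) ** 2) < 1e-9
```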

Inclusion-Exclusion

The Möbius and |\zeta| functions can be generalized to incidence algebras; the form above comes from the incidence algebra induced by the divisibility order2. A notable and relevant example of a Möbius function for another, closely related, incidence algebra arises when we consider the incidence algebra induced by finite multisets with the inclusion ordering. Let |T| be a finite multiset; then |\mu(T) = \begin{cases}0,&T\text{ has repeated elements}\\(-1)^{\vert T\vert},&T\text{ is a set}\end{cases}|. Since we can view a natural number as a finite multiset of primes, and we can always relabel the elements of a finite multiset with distinct primes, this is equivalent to the Möbius function we’ve been using.

This leads to a nice and compact way of describing the principle of inclusion-exclusion. Let |A| and |S| be (finite) multisets with |S \subseteq A| and assume we have |f| and |g| defined on the set of sub-multisets of |A|. If \[g(A) = \sum_{S\subseteq A} f(S)\] then \[f(A) = \sum_{S\subseteq A}\mu(A\setminus S)g(S)\] and this is Möbius inversion for this notion of Möbius function. We can thus take a different perspective on Möbius inversion. If |P| is a finite multiset of primes, then \[g(n_P) = \sum_{Q\subseteq P}f(n_Q) \iff f(n_P) = \sum_{Q\subseteq P}\mu(P\setminus Q)g(n_Q)\] recalling that |Q\subseteq P \iff n_Q \mid n_P| and |n_{P\setminus Q} = n_P/n_Q| when |Q\subseteq P|.

We get traditional inclusion-exclusion by noting that |\mu(T)=(-1)^{\vert T\vert}| when |T| is a set, i.e. all elements have multiplicity at most |1|. Let |I| be a finite set and assume we have a family of finite sets, |\{T_i\}_{i\in I}|. Write |T = \bigcup_{i\in I}T_i| and define |\bigcap_{i\in\varnothing}T_i = T|.

Define \[f(J) = \left\vert\bigcap_{i\in I\setminus J}T_i\setminus\bigcup_{i \in J}T_i\right\vert\] for |J\subseteq I|. In particular, |f(I) = 0|. |f(J)| is then the number of elements shared by all |T_i| for |i\notin J| and no |T_j| for |j\in J|. Every |x \in \bigcup_{i\in I}T_i| is thus associated to exactly one such subset of |I|, namely |\{j\in I\mid x\notin T_j\}|. Formally, |x \in \bigcap_{i\in I\setminus J}T_i\setminus\bigcup_{i \in J}T_i \iff J = \{j\in I\mid x\notin T_j\}| so each |\bigcap_{i\in I\setminus J}T_i\setminus\bigcup_{i \in J}T_i| is disjoint and \[g(J) = \sum_{S\subseteq J}f(S) = \left\vert\bigcup_{S\subseteq J}\left(\bigcap_{i\in I\setminus S}T_i\setminus\bigcup_{i \in S}T_i\right)\right\vert = \left\vert\bigcap_{i\in I\setminus J}T_i\right\vert \] for |J \subseteq I|. In particular, |g(I) = \vert\bigcup_{i\in I}T_i\vert|.

By the Möbius inversion formula for finite sets, we thus have: \[f(J) = \sum_{S\subseteq J}(-1)^{\vert J\vert - \vert S\vert}g(S)\] which for |J = I| gives: \[ 0 = \sum_{J\subseteq I}(-1)^{\vert I\vert - \vert J\vert}\left\vert\bigcap_{i\in I\setminus J}T_i\right\vert = \left\vert\bigcup_{i\in I}T_i\right\vert + \sum_{J\subsetneq I}(-1)^{\vert I\vert - \vert J\vert}\left\vert\bigcap_{i\in I\setminus J}T_i\right\vert \] which is equivalent to the more usual form: \[\left\vert\bigcup_{i\in I}T_i\right\vert = \sum_{J\subsetneq I}(-1)^{\vert I\vert - \vert J\vert - 1}\left\vert\bigcap_{i\in I\setminus J}T_i\right\vert = \sum_{\varnothing\neq J\subseteq I}(-1)^{\vert J\vert + 1}\left\vert\bigcap_{i\in J}T_i\right\vert \]
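The usual form is easy to verify directly on concrete sets, summing over all non-empty |J\subseteq I|:

```python
from itertools import combinations

def union_size_inclusion_exclusion(sets):
    """|T_1 u ... u T_m| = sum over non-empty J of (-1)^(|J|+1) |intersection over J|."""
    total = 0
    for r in range(1, len(sets) + 1):
        for J in combinations(sets, r):
            total += (-1) ** (r + 1) * len(set.intersection(*J))
    return total

# Multiples of 2, 3, and 5 below 1000.
T = [set(range(0, 1000, k)) for k in (2, 3, 5)]
assert union_size_inclusion_exclusion(T) == len(set.union(*T))
```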

|\varphi|

An obvious thing to explore is to apply Möbius inversion to various arithmetic functions. A fairly natural first start is applying Möbius inversion to the identity function. From the above results, we know that this unknown function |\varphi| will satisfy |\mathcal D[\varphi](s) = \zeta(s-1)/\zeta(s) = \mathcal D[\operatorname{id}\star\mu](s)|. We also immediately have the property that |n = \sum_{d \mid n}\varphi(d)|. Using Euler’s product formula we have: \[\begin{flalign} \zeta(s-1)/\zeta(s) & = \prod_{p \in \mathbb P} \frac{1 - p^{-s}}{1 - p^{-s+1}} \\ & = \prod_{p \in \mathbb P} \frac{1 - p^{-s}}{1 - pp^{-s}} \\ & = \prod_{p \in \mathbb P} (1 - p^{-s})\sum_{n=0}^\infty p^n p^{-ns} \\ & = \prod_{p \in \mathbb P} \left(\sum_{n=0}^\infty p^n p^{-ns}\right) - \left(\sum_{n=0}^\infty p^n p^{-s} p^{-ns}\right) \\ & = \prod_{p \in \mathbb P} \left(\sum_{n=0}^\infty p^n p^{-ns}\right) - \left(\sum_{n=0}^\infty p^n p^{-(n + 1)s}\right) \\ & = \prod_{p \in \mathbb P} \left(1 + \sum_{n=1}^\infty p^n p^{-ns}\right) - \left(\sum_{n=1}^\infty p^{n-1} p^{-ns}\right) \\ & = \prod_{p \in \mathbb P} \left(1 + \sum_{n=1}^\infty (p^n - p^{n-1}) p^{-ns}\right) \\ & = \prod_{p \in \mathbb P} \left(1 + \sum_{n=1}^\infty \varphi(p^n) p^{-ns}\right) \\ & = \mathcal D[\varphi](s) \end{flalign}\]

So |\varphi| is the multiplicative function defined by |\varphi(p^n) = p^n - p^{n-1}|. For |p^n|, we can see that this counts the number of positive integers less than or equal to |p^n| which are coprime to |p^n|. There are |p^n| positive integers less than or equal to |p^n|, and every |p|th one is a multiple of |p| so |p^n/p = p^{n-1}| are not coprime to |p^n|. All the remainder are coprime to |p^n| since they don’t have |p| in their prime factorizations and |p^n| only has |p| in its. We need to verify that this interpretation is multiplicative. To be clear, we know that |\varphi| is multiplicative and that this interpretation works for |p^n|. The question is whether |\varphi(n)| for general |n| meets the above description, i.e. whether the number of elements of |[n]| coprime to |n| is a multiplicative function of |n|.

Theorem: The number of elements of |[n]| coprime to |n| is a multiplicative function of |n| and is equal to |\varphi(n)|.

Proof: |\varphi = \mu\star\operatorname{id}|. We have:

\[\begin{flalign} \varphi(n_P) & = \sum_{d\mid n_P}\mu(d)\frac{n_P}{d} \\ & = \sum_{Q\subseteq P}\mu(Q)\frac{n_P}{n_Q} \\ & = \sum_{Q\subseteq \mathrm{dom}(P)}(-1)^{\vert Q\vert}\frac{n_P}{n_Q} \end{flalign}\]

We can see an inclusion-exclusion pattern. Specifically, let |C_k = \{ c \in [k] \mid \gcd(c, k) = 1\}| be the numbers less than or equal to |k| and coprime to |k|. Let |S_{k,m} = \{ c \in [k] \mid m \mid c\}|. We have |S_{k,a} \cap S_{k,b} = S_{k,\operatorname{lcm}(a,b)}|. Also, when |c \mid k|, then |\vert S_{k,c}\vert = k/c|. |C_{n_P} = [n_P] \setminus \bigcup_{p \in \mathrm{dom}(P)} S_{n_P,p}| because every number not coprime to |n_P| shares some prime factor with it. Applying inclusion-exclusion to the union yields \[\begin{align} \vert C_{n_P}\vert & = n_P - \sum_{\varnothing\neq Q\subseteq\mathrm{dom}(P)}(-1)^{\vert Q\vert+1}\left\vert \bigcap_{p\in Q}S_{n_P,p}\right\vert \\ & = n_P + \sum_{\varnothing\neq Q\subseteq\mathrm{dom}(P)}(-1)^{\vert Q\vert}\frac{n_P}{\prod_{p\in Q}p} \\ & = \sum_{Q\subseteq\mathrm{dom}(P)}(-1)^{\vert Q\vert}\frac{n_P}{n_Q} \end{align}\] |\square|

Many of you will already have recognized that this is Euler’s totient function.
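Putting the pieces together, |\varphi| can be computed three ways that should all agree: by counting coprime residues, by the multiplicative formula |\varphi(p^n) = p^n - p^{n-1}|, and as |\mu\star\operatorname{id}|. A sketch:

```python
import math

def factor(n):
    P, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            P[p] = P.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        P[n] = P.get(n, 0) + 1
    return P

def mobius(n):
    P = factor(n)
    return 0 if any(k > 1 for k in P.values()) else (-1) ** len(P)

def phi_count(n):
    """By definition: the number of a in [n] with gcd(a, n) = 1."""
    return sum(1 for a in range(1, n + 1) if math.gcd(a, n) == 1)

def phi_mult(n):
    """Multiplicative formula: phi(p^k) = p^k - p^(k-1)."""
    result = 1
    for p, k in factor(n).items():
        result *= p ** k - p ** (k - 1)
    return result

def phi_mobius(n):
    """Moebius inversion of the identity function: phi = mu * id."""
    return sum(mobius(d) * (n // d) for d in range(1, n + 1) if n % d == 0)

assert all(phi_count(n) == phi_mult(n) == phi_mobius(n) for n in range(1, 300))
```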

Combinatorial Species

The book Combinatorial Species and Tree-Like Structures has many examples where Dirichlet convolutions and Möbius inversion come up. A combinatorial species is a functor |\operatorname{Core}(\mathbf{FinSet})\to\mathbf{FinSet}|. Any permutation on a finite set can be decomposed into a collection of cyclic permutations. Let |U| be a finite set of cardinality |n| and |\pi : U \cong U| a permutation of |U|. For any |u\in U|, there is a smallest |k\in\mathbb N_+| such that |\pi^k(u) = u| where |\pi^{k+1} = \pi \circ \pi^k| and |\pi^0 = \operatorname{id}|. The |k| elements |\mathcal O(u)=\{\pi^{i-1}(u)\mid i\in[k]\}| make up a cycle of length |k|, and |\pi| restricted to |U\setminus \mathcal O(u)| is a permutation on this smaller set. We can just inductively pull out another cycle until we run out of elements. Write |\pi_k| for the number of cycles of length |k| in the permutation |\pi|. We clearly have |n = \sum_{k=1}^\infty k\pi_k| as every cycle of length |k| has |k| elements in it.

Write |\operatorname{fix}\pi| for the number of fixed points of |\pi|, i.e. the cardinality of the set |\{u\in U\mid \pi(u) = u\}|. Clearly, every element that is fixed by |\pi^k| needs to be in a cycle whose length divides |k|. This leads to the equation:

\[ \operatorname{fix}\pi^k = \sum_{d\mid k} d\pi_d = ((d \mapsto d\pi_d) \star \bar 1)(k)\]

Since |F(\pi^k) = F(\pi)^k| for a combinatorial species |F|, Möbius inversion, as explicitly stated in Proposition 2.2.3 of Combinatorial Species and Tree-Like Structures, leads to:

\[k(F(\pi))_k = \sum_{d\mid k}\mu\left(\frac{k}{d}\right)\operatorname{fix}F(\pi^d) = (\mu\star(d\mapsto \operatorname{fix}F(\pi^d)))(k) \]

If we Dirichlet convolve both sides of this with |\operatorname{id}|, replacing |F(\pi)| with |\beta| as it doesn’t matter that this permutation comes from an action of a species, we get:

\[\sum_{d\mid m} \frac{m}{d}\,d\beta_d = m\sum_{d\mid m} \beta_d = (\varphi\star(d\mapsto \operatorname{fix}\beta^d))(m)\]

This is just using |\varphi = \operatorname{id}\star\mu|. If we choose |m| such that |\beta^m = \operatorname{id}|, then we get |\sum_{d\mid m} \beta_d = \sum_{k=1}^\infty \beta_k| because |\beta_k| will be |0| for all the |k| which don’t divide |m|. This makes the previous equation into equation 2.2 (34) in the book.

Since we know |n = \sum_{k=1}^\infty k\pi_k| for any permutation |\pi|, we also get: \[\vert F([n])\vert = \sum_{k=1}^\infty\sum_{d\mid k}\mu\left(\frac{k}{d}\right)\operatorname{fix}F(\pi^d) = \sum_{k=1}^\infty(\mu\star(d\mapsto\operatorname{fix}F(\pi^d)))(k)\]

These equations give us a way to compute some of these divisor sums by looking at the number of fixed points and cycles of the action of species and vice versa. For example, 2.3 (49) is a series of Dirichlet convolutions connected to weighted species.

Example 12 from this book presents a nice and perhaps surprising identity. The core of it can be written as: \[\sum_{k=1}^\infty\ln(1-ax^k) = \sum_{k=1}^\infty\rho_k(a)\ln(1-x^k)\] where |\rho_k(a) = k^{-1}\sum_{d\mid k}\varphi(k/d)a^d|. We can rewrite this definition as the characterization |k\rho_k(a) = (\varphi\star a^{({-})})(k)|. Recalling that |\varphi = \mu \star \operatorname{id}| and |\ln(1-x) = -\sum_{n=1}^\infty x^n/n|, we get the following derivation:

Theorem: \[\sum_{k=1}^\infty\ln(1-ax^k) = \sum_{k=1}^\infty\rho_k(a)\ln(1-x^k)\] where |\rho_k(a) = k^{-1}\sum_{d\mid k}\varphi(k/d)a^d|.

Proof: \[\begin{flalign} \sum_{k=1}^\infty\ln(1-ax^k) & = -\sum_{k=1}^\infty\sum_{n=1}^\infty \frac{a^n x^{nk}}{n} \\ & = -\sum_{n=1}^\infty\sum_{k=1}^\infty \frac{a^n x^{nk}}{n} \\ & = -\sum_{N=1}^\infty\sum_{k\mid N} \frac{a^{N/k} x^N}{N/k} \tag{N=nk} \\ & = -\sum_{N=1}^\infty\frac{x^N}{N}\sum_{k\mid N} ka^{N/k} \\ & = -\sum_{N=1}^\infty\frac{x^N}{N}(\operatorname{id}\star a^{({-})})(N) \\ & = -\sum_{N=1}^\infty\frac{x^N}{N}(\delta\star\operatorname{id}\star a^{({-})})(N) \tag{the trick} \\ & = -\sum_{N=1}^\infty\frac{x^N}{N}(\bar 1\star\mu\star\operatorname{id}\star a^{({-})})(N) \\ & = -\sum_{N=1}^\infty\frac{x^N}{N}(\bar 1\star\varphi\star a^{({-})})(N) \\ & = -\sum_{N=1}^\infty\frac{x^N}{N}\sum_{k\mid N}(\varphi\star a^{({-})})(k) \\ & = -\sum_{N=1}^\infty\frac{x^N}{N}\sum_{k\mid N}k\rho_k(a) \\ & = -\sum_{k=1}^\infty\rho_k(a)\sum_{n=1}^\infty\frac{x^{nk}}{n} \tag{N=nk again} \\ & = \sum_{k=1}^\infty\rho_k(a) \ln(1-x^k) \end{flalign}\] |\square|
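The identity can also be spot-checked numerically; with |\vert x\vert < 1| and |\vert ax\vert < 1| both sides converge quickly, so the truncation tails are negligible. A sketch:

```python
import math

def factor(n):
    P, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            P[p] = P.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        P[n] = P.get(n, 0) + 1
    return P

def phi(n):
    """Euler's totient via phi(p^k) = p^k - p^(k-1)."""
    result = 1
    for p, k in factor(n).items():
        result *= p ** k - p ** (k - 1)
    return result

def rho(k, a):
    """rho_k(a) = (1/k) * sum over d | k of phi(k/d) a^d."""
    return sum(phi(k // d) * a ** d for d in range(1, k + 1) if k % d == 0) / k

a, x, K = 0.9, 0.3, 60
lhs = sum(math.log(1 - a * x ** k) for k in range(1, K + 1))
rhs = sum(rho(k, a) * math.log(1 - x ** k) for k in range(1, K + 1))
assert abs(lhs - rhs) < 1e-9
```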

Derivative of Dirichlet series

We can easily compute the derivative of a Dirichlet series (assuming sufficiently strong convergence so we can push the differentiation into the sum):

\[\begin{flalign} \mathcal D[f]'(s) & = \frac{d}{ds}\sum_{n=1}^\infty f(n)n^{-s} \\ & = \sum_{n=1}^\infty f(n)\frac{d}{ds}n^{-s} \\ & = \sum_{n=1}^\infty f(n)\frac{d}{ds}e^{-s\ln n} \\ & = \sum_{n=1}^\infty -f(n)\ln n e^{-s\ln n} \\ & = -\sum_{n=1}^\infty f(n)\ln n n^{-s} \\ & = -\mathcal D[f\ln](s) \end{flalign}\]

This leads to the identity |\frac{d}{ds}\ln\mathcal D[f](s) = \mathcal D[f]'(s)/\mathcal D[f](s) = -\mathcal D[f\ln](s)/\mathcal D[f](s)|. For example, |-\zeta'(s)/\zeta(s) = \mathcal D[\ln \star \mu](s)| since |1/\zeta = \mathcal D[\mu]|. Using the Euler product formula, we have |\ln\zeta(s) = -\sum_{p\in\mathbb P}\ln(1-p^{-s})|. Differentiating this gives \[\begin{flalign} \frac{d}{ds}\ln\zeta(s) & = -\sum_{p\in\mathbb P} p^{-s}\ln p/(1 - p^{-s}) \\ & = -\sum_{p\in\mathbb P} \sum_{k=1}^\infty \ln p (p^k)^{-s} \\ & = -\sum_{n=1}^\infty \Lambda(n) n^{-s} \\ & = -\mathcal D[\Lambda](s) \end{flalign}\] where |\Lambda(n) = \begin{cases}\ln p,&p\in\mathbb P\land\exists k\in\mathbb N_+.n=p^k \\ 0, & \text{otherwise}\end{cases}|. |\Lambda|, which is neither a multiplicative nor an additive function, is known as the von Mangoldt function. Just to write it explicitly, the above implies |\Lambda = \ln \star \mu|, i.e. |\Lambda| is the Möbius inversion of |\ln|. This generalizes from |\bar 1| to an arbitrary completely multiplicative function |f|, giving |-\mathcal D[f]'/\mathcal D[f] = \mathcal D[f\Lambda]|.

We now have multiple perspectives on |\Lambda| which is a kind of “indicator function” for prime powers.

Dirichlet Inverse

Let’s say we’re given an arithmetic function |f|, and we want to find an arithmetic function |g| such that |f \star g = \delta| which we’ll call the Dirichlet inverse of |f|. We immediately get |(f \star g)(1) = f(1)g(1) = 1 = \delta(1)|. So, supposing |f(1)\neq 0|, we can define |g(1) = 1/f(1)|. We then get a recurrence relation for all the remaining values of |g| via: \[0 = (f \star g)(n) = f(1)g(n) + \sum_{d \mid n, d\neq 1} f(d)g(n/d)\] for |n > 1|. Solving for |g(n)|, we have: \[g(n) = -f(1)^{-1}\sum_{d\mid n,d\neq 1}f(d)g(n/d)\] where the right-hand side only requires |g(k)| for |k < n|. If |f| is multiplicative, then |f(1) = 1| and the inverse of |f| exists.
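The recurrence translates directly into code. Here is a Python sketch (the function name and bounds are arbitrary); inverting |\bar 1| should, by Möbius inversion, tabulate |\mu|:

```python
def dirichlet_inverse(f, N):
    # Tabulate g with f ⋆ g = δ on 1..N using
    # g(1) = 1/f(1),  g(n) = -(1/f(1)) Σ_{d|n, d≠1} f(d) g(n/d)
    g = {1: 1 / f(1)}
    for n in range(2, N + 1):
        g[n] = -sum(f(d) * g[n // d]
                    for d in range(2, n + 1) if n % d == 0) / f(1)
    return g

one = lambda n: 1
g = dirichlet_inverse(one, 50)

# Check f ⋆ g = δ directly; here g tabulates the Möbius function.
for n in range(1, 51):
    total = sum(one(d) * g[n // d] for d in range(1, n + 1) if n % d == 0)
    assert abs(total - (1 if n == 1 else 0)) < 1e-9
```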

If |f| is completely multiplicative, its Dirichlet inverse is |\mu f|. This follows easily from |f \star \mu f = (\bar 1 \star \mu)f = \delta f = \delta|. As an example, |({-})^z| is completely multiplicative so its inverse is |({-})^z\mu|. Since the inverse of a Dirichlet convolution is the convolution of the inverses, we get |\varphi^{-1}(n) = \sum_{d\mid n}d\mu(d)|. This is not to be confused with |\varphi(n) = (\operatorname{id}\star\mu)(n) = \sum_{d\mid n} d\mu(n/d)|.
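We can check this formula for |\varphi^{-1}| by verifying |\varphi \star \varphi^{-1} = \delta| directly, with |\varphi| computed by brute-force counting. A Python sketch (bounds arbitrary):

```python
from math import gcd

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def mu(n):
    # Möbius function via trial-division factorization
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0
            result = -result
        d += 1
    if n > 1:
        result = -result
    return result

def phi(n):
    # Euler totient by direct count
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def phi_inv(n):
    # claimed Dirichlet inverse of φ
    return sum(d * mu(d) for d in divisors(n))

for n in range(1, 101):
    conv = sum(phi(d) * phi_inv(n // d) for d in divisors(n))
    assert conv == (1 if n == 1 else 0)
```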

Less trivially, the inverse of a multiplicative function is also a multiplicative function. We can prove it by complete induction on |\mathbb N_+| using the formula for |g| from above.

Theorem: If |f\star g = \delta|, then |g| is multiplicative when |f| is.

Proof: Let |n = ab| where |a| and |b| are coprime. If |a| (or, symmetrically, |b|) is equal to |1|, then since |g(1) = 1/f(1) = 1|, we have |g(1n) = g(1)g(n) = g(n)|. Now assume neither |a| nor |b| are |1| and, as the induction hypothesis, assume that |g| is multiplicative on all numbers less than |n|. We have: \[\begin{flalign} g(ab) & = -\sum_{d\mid ab,d\neq 1}f(d)g(ab/d) \\ & = -\sum_{d_a \mid a}\sum_{d_b \mid b,d_a d_b \neq 1}f(d_ad_b)g(ab/(d_ad_b)) \\ & = -\sum_{d_a \mid a}\sum_{d_b \mid b,d_a d_b \neq 1}f(d_a)f(d_b)g(a/d_a)g(b/d_b) \\ & = -\sum_{d_b \mid b,d_b \neq 1}f(d_b)g(a)g(b/d_b) - \sum_{d_a \mid a,d_a \neq 1}\sum_{d_b \mid b}f(d_a)f(d_b)g(a/d_a)g(b/d_b) \\ & = -g(a)\sum_{d \mid b,d \neq 1}f(d)g(b/d) - \sum_{d_a \mid a,d_a \neq 1}f(d_a)g(a/d_a)\sum_{d_b \mid b}f(d_b)g(b/d_b) \\ & = g(a)g(b) - \sum_{d_a \mid a,d_a \neq 1}f(d_a)g(a/d_a) (f \star g)(b) \\ & = g(a)g(b) - \delta(b)\sum_{d_a \mid a,d_a \neq 1}f(d_a)g(a/d_a) \\ & = g(a)g(b) \end{flalign}\] |\square|

Assuming |f| has a Dirichlet inverse, we also have: \[\mathcal D[f^{-1}](s) = \mathcal D[f](s)^{-1}\] immediately from the convolution theorem.

More Examples

Given a multiplicative function |f|:

\[\begin{align} \mathcal D[f(\gcd({-},n_P))](s) & = \zeta(s)\prod_{(p,k)\in P}(1 - p^{-s})\left(\sum_{n=0}^\infty f(p^{\min(k,n)})p^{-ns}\right) \\ & = \zeta(s)\prod_{(p,k)\in P}(1 - p^{-s})\left(\frac{f(p^k)p^{-(k+1)s}}{1 - p^{-s}} + \sum_{n=0}^k f(p^n)p^{-ns}\right) \end{align}\]

As an example, |\eta(s) = (1 - 2^{1-s})\zeta(s) = \mathcal D[f](s)| where |f(n) = \begin{cases}-1,&2\mid n\\1,&2\nmid n\end{cases}|.

Alternatively, |f(n) = \mu(\gcd(n, 2))| and we can apply the above formula to see: \[\begin{flalign} \mathcal D[\mu(\gcd({-},2))] & = \zeta(s)(1-2^{-s})\left(\frac{\mu(2)2^{-2s}}{1 - 2^{-s}} + \sum_{n=0}^1 \mu(2^n)2^{-ns}\right) \\ & = \zeta(s)(1-2^{-s})\left(\frac{-2^{-2s}}{1 - 2^{-s}} + 1 - 2^{-s}\right) \\ & = \zeta(s)(-2^{-2s} + (1 - 2^{-s})^2) \\ & = \zeta(s)(1 - 2^{1-s}) \end{flalign}\]

|\lambda| and |\gamma|

Recalling, |\lambda| is completely multiplicative and is characterized by |\lambda(p) = -1|.

We can show that |\mathcal D[\lambda](s) = \zeta(2s)/\zeta(s)| which is equivalent to saying |\bar 1^{(2)} \star \mu = \lambda| or |\lambda\star\bar 1 = \bar 1^{(2)}|.

\[\begin{flalign} \zeta(2s)/\zeta(s) & = \prod_{p\in\mathbb P} \frac{1-p^{-s}}{1-(p^{-s})^2} \\ & = \prod_{p\in\mathbb P} \frac{1-p^{-s}}{(1-p^{-s})(1+p^{-s})} \\ & = \prod_{p\in\mathbb P} (1 + p^{-s})^{-1} \\ & = \prod_{p\in\mathbb P} (1 - \lambda(p)p^{-s})^{-1} \\ & = \mathcal D[\lambda](s) \end{flalign}\]
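The equivalent convolution statement |\lambda\star\bar 1 = \bar 1^{(2)}| says that summing |\lambda| over the divisors of |n| yields |1| exactly when |n| is a perfect square and |0| otherwise. This is easy to check numerically; a Python sketch (the bound 200 is arbitrary):

```python
from math import isqrt

def big_omega(n):
    # Ω(n): number of prime factors counted with multiplicity
    count, d = 0, 2
    while d * d <= n:
        while n % d == 0:
            n //= d
            count += 1
        d += 1
    if n > 1:
        count += 1
    return count

def liouville(n):
    return (-1) ** big_omega(n)

for n in range(1, 201):
    divisor_sum = sum(liouville(d) for d in range(1, n + 1) if n % d == 0)
    assert divisor_sum == (1 if isqrt(n) ** 2 == n else 0)
```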

We have |\lambda\mu = \vert\mu\vert = \mu\mu| is the inverse of |\lambda| so |\mathcal D[\vert\mu\vert](s) = \zeta(s)/\zeta(2s)|.

Recalling, |\gamma| is multiplicative and is characterized by |\gamma(p^n) = -1|.

\[\begin{flalign} \mathcal D[\gamma](s) & = \prod_{p \in \mathbb P}\left(1 + \sum_{n=1}^\infty \gamma(p^n)p^{-ns}\right) \\ & = \prod_{p \in \mathbb P}\left(1 - \sum_{n=1}^\infty p^{-ns}\right) \\ & = \prod_{p \in \mathbb P}\left(1 - \left(\sum_{n=0}^\infty p^{-ns} - 1\right)\right) \\ & = \prod_{p \in \mathbb P}\frac{2(1 - p^{-s}) - 1}{1 - p^{-s}} \\ & = \prod_{p \in \mathbb P}\frac{1 - 2p^{-s}}{1 - p^{-s}} \end{flalign}\]

This implies that |(\gamma\star\mu)(p^n) = \begin{cases}-2, & n=1 \\ 0, & n > 1 \end{cases}|.
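Since |\gamma(n) = (-1)^{\omega(n)}|, we can confirm these coefficients by computing |\gamma\star\mu| at small prime powers directly. A Python sketch (the primes and exponents tested are arbitrary):

```python
def omega(n):
    # ω(n): number of distinct prime factors
    count, d = 0, 2
    while d * d <= n:
        if n % d == 0:
            count += 1
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        count += 1
    return count

def gamma(n):
    return (-1) ** omega(n)

def mu(n):
    # Möbius function via trial-division factorization
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0
            result = -result
        d += 1
    if n > 1:
        result = -result
    return result

# (γ ⋆ μ)(p^k) should be -2 for k = 1 and 0 for k > 1
for p in (2, 3, 5, 7):
    for k in range(1, 5):
        n = p ** k
        conv = sum(gamma(d) * mu(n // d) for d in range(1, n + 1) if n % d == 0)
        assert conv == (-2 if k == 1 else 0)
```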

Indicator Functions

Let |1_{\mathbb P}| be the indicator function for the primes. We have |\omega = 1_{\mathbb P}\star\bar 1| or |1_{\mathbb P} = \omega\star\mu|. Directly, |\mathcal D[1_{\mathbb P}](s) = \sum_{p\in\mathbb P}p^{-s}| so we have |\mathcal D[\omega](s)/\zeta(s) = \sum_{p\in\mathbb P} p^{-s}|.
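The relation |\omega = 1_{\mathbb P}\star\bar 1| just says that summing the prime indicator over the divisors of |n| counts the distinct prime factors of |n|. A quick Python check (helper names and the bound are arbitrary):

```python
def is_prime(n):
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def omega(n):
    # ω(n): number of distinct prime factors via factorization
    count, d = 0, 2
    while d * d <= n:
        if n % d == 0:
            count += 1
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        count += 1
    return count

# ω = 1_P ⋆ 1̄ : sum the prime indicator over divisors
for n in range(1, 201):
    assert omega(n) == sum(1 for d in range(1, n + 1)
                           if n % d == 0 and is_prime(d))
```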

Lemma: |\mathcal D[1_{\mathbb P}](s)=\sum_{n=1}^\infty \frac{\mu(n)}{n}\ln\zeta(ns)|
Proof: We proceed as follows: \[\begin{align} \sum_{n=1}^\infty \frac{\mu(n)}{n}\ln\zeta(ns) & = \sum_{n=1}^\infty \frac{\mu(n)}{n}\ln\left(\prod_{p\in\mathbb P}(1 - p^{-ns})^{-1}\right) \\ & = -\sum_{n=1}^\infty \frac{\mu(n)}{n}\sum_{p\in\mathbb P}\ln(1 - p^{-ns}) \\ & = \sum_{p\in\mathbb P}\sum_{n=1}^\infty \frac{\mu(n)}{n}\sum_{k=1}^\infty p^{-kns}/k \\ & = \sum_{p\in\mathbb P}\sum_{N=1}^\infty \sum_{N=kn} \frac{\mu(n)}{N}p^{-Ns} \\ & = \sum_{p\in\mathbb P}\sum_{N=1}^\infty \frac{p^{-Ns}}{N}\sum_{N=kn}\mu(n) \\ & = \sum_{p\in\mathbb P}\sum_{N=1}^\infty \frac{p^{-Ns}}{N}(\mu\star\bar 1)(N) \\ & = \sum_{p\in\mathbb P}\sum_{N=1}^\infty \frac{p^{-Ns}}{N}\delta(N) \\ & = \sum_{p\in\mathbb P} p^{-s} \\ & = \mathcal D[1_{\mathbb P}](s) \end{align}\] |\square|

Let |1_{\mathcal P}| be the indicator function for prime powers. We have |\Omega = 1_{\mathcal P}\star\bar 1| or |1_{\mathcal P} = \Omega\star\mu|. Directly, |\mathcal D[1_{\mathcal P}](s) = \sum_{p\in\mathbb P}\frac{p^{-s}}{1 - p^{-s}}| so we have |\mathcal D[\Omega](s)/\zeta(s) = \sum_{p\in\mathbb P}\frac{p^{-s}}{1 - p^{-s}}|.

Lemma: |\mathcal D[1_{\mathcal P}](s)=\sum_{n=1}^\infty \frac{\varphi(n)}{n}\ln\zeta(ns)|
Proof: This is quite similar to the previous proof. \[\begin{align} \sum_{n=1}^\infty \frac{\varphi(n)}{n}\ln\zeta(ns) & = \sum_{p\in\mathbb P}\sum_{N=1}^\infty \frac{p^{-Ns}}{N}\sum_{N=kn}\varphi(n) \\ & = \sum_{p\in\mathbb P}\sum_{N=1}^\infty \frac{p^{-Ns}}{N}(\varphi\star\bar 1)(N) \\ & = \sum_{p\in\mathbb P}\sum_{N=1}^\infty \frac{p^{-Ns}}{N} N \\ & = \sum_{p\in\mathbb P}\sum_{N=1}^\infty p^{-Ns} \\ & = \mathcal D[1_{\mathcal P}](s) \end{align}\] |\square|

Summatory Functions

One thing we’ve occasionally been taking for granted is that the operator |\mathcal D| is injective. That is, |\mathcal D[f] = \mathcal D[g]| if and only if |f = g|. To show this, we’ll use the fact that we can (usually) invert the Mellin transform which can be viewed roughly as a version of |\mathcal D| that operates on continuous functions.

Before talking about the Mellin transform, we’ll talk about summatory functions as this will ease our later discussion.

We will turn a sum into a continuous function via a zero-order hold, i.e. we will take the floor of the input. Thus |\sum_{n\leq x} f(n)| is constant on any interval of the form |[k,k+1)|. It then (potentially) has jump discontinuities at integer values. The beginning of the sum is at |n=1| so for all |x<1|, the sum up to |x| is |0|. We will need a slight tweak to better deal with these discontinuities. This will be indicated by a prime on the summation sign.

For non-integer values of |x|, we have: \[\sum_{n \leq x}’ f(n) = \sum_{n \leq x} f(n)\]

For |m| an integer, we have: \[ \sum_{n \leq m}’ f(n) = \frac{1}{2}\left(\sum_{n<m} f(n) + \sum_{n \leq m} f(n)\right) = \sum_{n\leq m} f(n) - f(m)/2 \]

This kind of thing should be familiar to those who’ve worked with things like Laplace transforms of discontinuous functions. (Not for no reason…)
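The primed sum can be implemented directly; the halving convention at integer arguments is the only subtlety. A Python sketch (the function name is mine):

```python
from math import floor

def primed_sum(f, x):
    # ∑'_{n≤x} f(n): ordinary partial sum, except that only half
    # of f(x) is counted when x is an integer
    m = floor(x)
    total = sum(f(n) for n in range(1, m + 1))
    if m >= 1 and x == m:
        total -= f(m) / 2
    return total

assert primed_sum(lambda n: 1, 3.5) == 3    # non-integer x: plain sum
assert primed_sum(lambda n: 1, 3) == 2.5    # integer x: half weight at 3
```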

One reason for introducing these summatory functions is that they are a little easier to work with. Arguably, we want something like |\frac{d}{dx}\sum_{n\leq x}f(n) = \sum_{n=1}^\infty f(n)\delta(n-x)|, but that means we end up with a bunch of distribution nonsense and even more improper integrals. The summatory function may be discontinuous, but it at least has a finite value everywhere. Of course, another reason for introducing these functions is that they are often the values we’re interested in.

Several important functions are these continuous “sums” of arithmetic functions of this form:

  • Mertens function: |M(x) = \sum_{n\leq x}’ \mu(n)|
  • Chebyshev function: |\vartheta(x) = \sum_{p\leq x, p\in\mathbb P}’ \ln p = \sum_{n\leq x}’ 1_{\mathbb P}(n)\ln n|
  • Second Chebyshev function: |\psi(x) = \sum_{n\leq x}’ \Lambda(n) = \sum_{n=1}^\infty \vartheta(x^{1/n})|
  • The prime-counting function: |\pi(x) = \sum_{n\leq x}’ 1_{\mathbb P}(n)|
  • [Riemann’s prime-power counting function](https://en.wikipedia.org/wiki/Prime_counting_function#Riemann’s_prime-power_counting_function): |\Pi_0(x) = \sum_{n\leq x}’ \frac{\Lambda(n)}{\ln n} = \sum_{n=1}^\infty \sum_{p^n\leq x,p\in\mathbb P}’ n^{-1} = \sum_{n=1}^\infty\pi(x^{1/n})n^{-1}|
  • |D(x) = \sum_{n\leq x}d(n)|

These are interesting in how they relate to the prime-counting function.

Let’s consider the arithmetic function |\Lambda/\ln| whose Dirichlet series is |\ln\zeta|.

We have the summatory function |\sum_{n\leq x}’ \Lambda(n)/\ln(n)|, but |\Lambda(n)| is |0| except when |n=p^k| for some |p\in\mathbb P| and |k\in\mathbb N_+|. Therefore, we have \[\begin{align} \sum_{n\leq x}’ \frac{\Lambda(n)}{\ln(n)} & = \sum_{k=1}^\infty\sum_{p^k\leq x, p\in\mathbb P}’ \frac{\Lambda(p^k)}{\ln(p^k)} \\ & = \sum_{k=1}^\infty\sum_{p^k\leq x, p\in\mathbb P}’ \frac{\ln(p)}{k\ln(p)} \\ & = \sum_{k=1}^\infty\sum_{p^k\leq x, p\in\mathbb P}’ \frac{1}{k} \\ & = \sum_{k=1}^\infty \frac{1}{k} \sum_{p^k\leq x, p\in\mathbb P}’ 1 \\ & = \sum_{k=1}^\infty \frac{1}{k} \sum_{p\leq x^{1/k}, p\in\mathbb P}’ 1 \\ & = \sum_{k=1}^\infty \frac{\pi(x^{1/k})}{k} \\ \end{align}\]
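We can test this identity numerically at a non-integer |x| (where the primed sums agree with the plain ones). A Python sketch with the arbitrary test value |x = 10.5|; the helper names are mine:

```python
from math import log, floor

def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def von_mangoldt(n):
    # ln p if n is a prime power p^k, else 0
    for p in range(2, n + 1):
        if n % p == 0:
            while n % p == 0:
                n //= p
            return log(p) if n == 1 else 0.0
    return 0.0

def prime_pi(x):
    return sum(1 for n in range(2, floor(x) + 1) if is_prime(n))

x = 10.5
lhs = sum(von_mangoldt(n) / log(n) for n in range(2, floor(x) + 1))

rhs, k = 0.0, 1
while x ** (1 / k) >= 2:  # π(x^{1/k}) = 0 once x^{1/k} < 2
    rhs += prime_pi(x ** (1 / k)) / k
    k += 1

assert abs(lhs - rhs) < 1e-9  # both equal Π₀(10.5) = 16/3
```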

|\ln\zeta(s) = s\mathcal M[\Pi_0](-s)=\mathcal D[\Lambda/\ln](s)| where |\mathcal M| is the Mellin transform, and the connection to Dirichlet series is described in the following section.

Mellin Transform

The definition of the Mellin transform and its inverse are:

\[\mathcal M[f](s) = \int_0^\infty x^s\frac{f(x)}{x}dx\] \[\mathcal M^{-1}[\varphi](x) = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} x^{-s}\varphi(s)ds\]

The contour integral is intended to mean the vertical line with real part |c| traversed from negative to positive imaginary values. Modulo the opposite sign of |s| and the extra factor of |x|, this is quite similar to a continuous version of a Dirichlet series.

The Mellin transform is closely related to the two-sided Laplace transform.

\[\mathcal D[f](s) = s\mathcal M\left[x\mapsto \sum_{n\leq x}’ f(n)\right](-s)\]

Using Mellin transform properties, particularly the one for transforming the derivative, we can write the following.

\[\begin{align} \mathcal D[f](s) = s\mathcal M\left[x\mapsto \sum_{n\leq x}’ f(n)\right](-s) & \iff \mathcal D[f](1-s) = -(s-1)\mathcal M\left[x\mapsto \sum_{n\leq x}’ f(n)\right](s-1) \\ & \iff \mathcal D[f](1-s) = \mathcal M\left[x\mapsto \frac{d}{dx}\sum_{n\leq x}’ f(n)\right](s) \\ & \iff \mathcal D[f](1-s) = \int_0^\infty x^{s-1}\frac{d}{dx}\sum_{n\leq x}’ f(n)dx \\ & \iff \mathcal D[f](1-s) = \int_0^\infty x^{s-1}\sum_{n=1}^\infty f(n)\delta(x-n)dx \\ & \iff \mathcal D[f](1-s) = \sum_{n=1}^\infty f(n)n^{s-1} \\ & \iff \mathcal D[f](s) = \sum_{n=1}^\infty f(n)n^{-s} \end{align}\]

This leads to Perron’s formula

\[\begin{align} \sum_{n\leq x}’ f(n) & = \mathcal M^{-1}[s\mapsto -\mathcal D[f](-s)/s](x) \\ & = \frac{1}{2\pi i}\int_{-c-i\infty}^{-c+i\infty}\frac{\mathcal D[f](-s)}{-s} x^{-s} ds \\ & = -\frac{1}{2\pi i}\int_{c+i\infty}^{c-i\infty}\frac{\mathcal D[f](s)}{s} x^s ds \\ & = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty}\frac{\mathcal D[f](s)}{s} x^s ds \end{align}\]

for which we need to take the Cauchy principal value to get something defined. (See also Abel summation.)

There are side conditions on the convergence of |\mathcal D[f]| for these formulas to be justified. See the links.

Many of the operations we’ve described on Dirichlet series follow from Mellin transform properties. For example, we have |\mathcal M[f]’(s) = \mathcal M[f\ln](s)| generally.

Summary

Properties

Dirichlet Convolution

Dirichlet convolution is |(f\star g)(n) = \sum_{d\mid n} f(d)g(n/d) = \sum_{mk=n} f(m)g(k)|.

Arithmetic functions form a commutative ring with Dirichlet convolution as the multiplication, |\delta| as the multiplicative unit, and pointwise addition as the additive structure. This is to say that Dirichlet convolution is commutative, associative, unital, and bilinear.

For |f| completely multiplicative, |f(g\star h) = fg \star fh|.

Dirichlet Inverse

For any |f| such that |f(1)\neq 0|, there is a |g| such that |f\star g = \delta|. In particular, the set of multiplicative functions forms a subgroup of this multiplicative group, i.e. the Dirichlet convolution of multiplicative functions is multiplicative.

If |f(1) \neq 0|, then |f \star g = \delta| where |g| is defined by the following recurrence:

\[\begin{flalign} g(1) & = 1/f(1) \\ g(n) & = -f(1)^{-1}\sum_{d\mid n,d\neq 1}f(d)g(n/d) \end{flalign}\]

For a completely multiplicative |f|, its Dirichlet inverse is |\mu f|.

Convolution Theorem

\[\mathcal D[f](s)\mathcal D[g](s) = \mathcal D[f\star g](s)\]

Möbius Inversion

\[\delta = \bar 1 \star \mu\]

This means that from a divisor sum |g(n)=\sum_{d\mid n}f(d) = (f\star\bar 1)(n)| for each |n|, we can recover |f| via |g\star\mu = f\star\bar 1\star\mu = f|, which is to say |f(n)=\sum_{d\mid n}g(d)\mu(n/d)|.
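For instance, from the divisor sum |\sum_{d\mid n}\varphi(d) = n| we can recover |\varphi| this way. A Python check (bounds arbitrary; |\varphi| is computed independently by direct count):

```python
from math import gcd

def mu(n):
    # Möbius function via trial-division factorization
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0
            result = -result
        d += 1
    if n > 1:
        result = -result
    return result

def phi(n):
    # Euler totient by direct count
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

# g(n) = Σ_{d|n} φ(d) = n, so Möbius inversion gives
# φ(n) = Σ_{d|n} g(d) μ(n/d) = Σ_{d|n} d μ(n/d)
for n in range(1, 101):
    assert phi(n) == sum(d * mu(n // d)
                         for d in range(1, n + 1) if n % d == 0)
```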

This can be generalized via |({-})^k\mu\star({-})^k = \delta|. In sums, this means when |g(n)=\sum_{d\mid n}d^k f(n/d)|, then |f(n)=\sum_{d\mid n}\mu(d)d^k g(n/d)|.

Let |h| be a completely multiplicative function. Given |g(m) = \sum_{n=1}^\infty f(mh(n))|, then |f(n) = \sum_{m=1}^\infty \mu(m)g(nh(m))|.

Using the Möbius function for finite multisets and their inclusion ordering, we can recast Möbius inversion of naturals as Möbius inversion of finite multisets (of primes) a la: \[n_P = \sum_{Q\subseteq P}\mu(P\setminus Q)n_Q = \sum_{Q\subseteq P}\mu(n_P/n_Q)n_Q = \sum_{d\mid n_P}\mu(n_P/d)d \]

As a nice result, we have: \[\sum_{n=1}^\infty\ln(1-ax^n) = \sum_{n=1}^\infty\rho_n(a)\ln(1-x^n)\] where |n\rho_n(a) = (\varphi \star a^{({-})})(n)|.

Dirichlet Series

\[\mathcal D[f](s) = \sum_{n=1}^\infty f(n)n^{-s}\]

\[\mathcal D[n\mapsto f(n)n^k](s) = \mathcal D[f](s - k)\]

\[\mathcal D[f^{-1}](s) = \mathcal D[f](s)^{-1}\] where the inverse on the left is the Dirichlet inverse.

\[\mathcal D[f]’(s) = -\mathcal D[f\ln](s)\]

For a completely multiplicative |f|, \[\mathcal D[f]’(s)/\mathcal D[f](s) = -\mathcal D[f\Lambda](s)\] and: \[\ln\mathcal D[f](s) = \mathcal D[f\Lambda/\ln](s)\]

Dirichlet series as a Mellin transform:

\[\mathcal D[f](s) = s\mathcal M\left[x\mapsto \sum_{n\leq x}’ f(n)\right](-s)\]

The corresponding inverse Mellin transform statement is called Perron’s Formula:

\[\sum_{n\leq x}’ f(n) = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty}\frac{\mathcal D[f](s)}{s} x^s ds\]

Euler Product Formula

Assuming |f| is multiplicative, we have:

\[\mathcal D[f](s) = \prod_{p \in \mathbb P}\sum_{n=0}^\infty f(p^n)p^{-ns} = \prod_{p \in \mathbb P}\left(1 + \sum_{n=1}^\infty f(p^n)p^{-ns}\right) \]

When |f| is completely multiplicative, this can be simplified to:

\[\mathcal D[f](s) = \prod_{p \in \mathbb P}(1 - f(p)p^{-s})^{-1} \]
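For |f = \bar 1| and |s = 3|, a truncated product over the primes up to 100 already agrees with a truncated sum for |\zeta(3)| to several decimal places. A Python sketch (all truncation points are arbitrary):

```python
def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

s = 3.0
# truncated Dirichlet series: ζ(3) ≈ Σ n^{-3}
dirichlet_sum = sum(n ** -s for n in range(1, 100001))
# truncated Euler product: ζ(3) ≈ Π (1 - p^{-3})^{-1}
euler_product = 1.0
for p in (n for n in range(2, 101) if is_prime(n)):
    euler_product *= 1 / (1 - p ** -s)

assert abs(dirichlet_sum - euler_product) < 1e-3
```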

Lambert Series

Given an arithmetic function |a|, these are series of the form: \[ \sum_{n=1}^\infty a(n) \frac{x^n}{1-x^n} = \sum_{n=1}^\infty (a \star \bar 1)(n) x^n \]

\[\sum_{n=1}^\infty \mu(n) \frac{x^n}{1-x^n} = x\]

\[\sum_{n=1}^\infty \varphi(n) \frac{x^n}{1-x^n} = \frac{x}{(1-x)^2}\]
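Both Lambert series identities are easy to check numerically by truncating the series. A Python sketch with the arbitrary choices |x = 0.3| and 80 terms, at which point the tails are negligible:

```python
from math import gcd

def mu(n):
    # Möbius function via trial-division factorization
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0
            result = -result
        d += 1
    if n > 1:
        result = -result
    return result

def phi(n):
    # Euler totient by direct count
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

x, N = 0.3, 80
mu_series = sum(mu(n) * x ** n / (1 - x ** n) for n in range(1, N + 1))
phi_series = sum(phi(n) * x ** n / (1 - x ** n) for n in range(1, N + 1))

assert abs(mu_series - x) < 1e-12
assert abs(phi_series - x / (1 - x) ** 2) < 1e-12
```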

Arithmetic function definitions

|f(p^n)=\cdots| implies a multiplicative/additive function, while |f(p)=\cdots| implies a completely multiplicative/additive function.

|p^z| for |z\in\mathbb C| is completely multiplicative. This includes the identity function (|z=1|) and |\bar 1| (|z=0|). For any multiplicative |f|, |f\circ \gcd({-},k)| is multiplicative.

|\ln| is completely additive.

Important but neither additive nor multiplicative are the indicator functions for primes |1_{\mathbb P}| and prime powers |1_{\mathcal P}|.

The following functions are (completely) multiplicative unless otherwise specified.

\[\begin{flalign} \delta(p) & = 0 \tag{Kronecker delta} \\ \bar 1(p) & = 1 = p^0 \\ \mu(p^n) & = \begin{cases}-1, & n = 1 \\ 0, & n > 1\end{cases} \tag{Möbius function} \\ \Omega(p) & = 1 \tag{additive} \\ \lambda(p) & = -1 = (-1)^{\Omega(p)} \tag{Liouville function} \\ \omega(p^n) & = 1 \tag{additive} \\ \gamma(p^n) & = -1 = (-1)^{\omega(p^n)} \\ a(p^n) & = p(n) \tag{p(n) is the partition function} \\ \varphi(p^n) & = p^n - p^{n-1} = p^n(1 - 1/p) = J_1(p^n) \tag{Euler totient function} \\ \sigma_k(p^n) & = \sum_{m=0}^n p^{km} = \sum_{d\mid p^n} d^k = \frac{p^{k(n+1)}-1}{p^k - 1} \tag{last only works for nonzero k} \\ d(p^n) & = n + 1 = \sigma_0(p^n) \\ f^{[k]}(p^n) & = \begin{cases}f(p^m),& km=n\\0,& k\nmid n\end{cases} \tag{f multiplicative} \\ \Lambda(n) & = \begin{cases}\ln p,&\exists p\in\mathbb P.\exists k\in\mathbb N_+.n=p^k \\ 0, & \text{otherwise}\end{cases} \tag{not multiplicative} \\ J_k(p^n) & = p^{kn} - p^{k(n-1)} = p^{kn}(1 - p^{-k}) \tag{Jordan totient function} \\ \psi_k(p^n) & = p^{kn} + p^{k(n-1)} = p^{kn}(1 + p^{-k}) = J_{2k}(p^n)/J_k(p^n) \tag{Dedekind psi function} \\ \end{flalign}\]

Dirichlet convolutions

\[\begin{flalign} \delta & = \bar 1 \star \mu \\ \varphi & = \operatorname{id}\star\mu \\ \sigma_z & = ({-})^z \star \bar 1 = \psi_z \star \bar 1^{(2)} \\ \sigma_1 & = \varphi \star d \\ d & = \sigma_0 = \bar 1 \star \bar 1 \\ f \star f & = fd \tag{f completely multiplicative} \\ f\Lambda & = f\ln \star f\mu = f\ln \star f^{-1} \tag{f completely multiplicative, Dirichlet inverse} \\ \lambda & = \bar 1^{(2)} \star \mu \\ \vert\mu\vert & = \lambda^{-1} = \mu\lambda \tag{Dirichlet inverse} \\ 2^\omega & = \vert\mu\vert \star \bar 1 \\ \psi_z & = ({-})^z \star \vert\mu\vert \\ \operatorname{fix} \pi^{(-)} & = \bar 1 \star (k \mapsto k\pi_k) \tag{for a permutation} \\ ({-})^k & = J_k \star \bar 1 \end{flalign}\]

More Dirichlet convolution identities are here, though many are trivial consequences of the earlier properties.

Dirichlet series

\[\begin{array}{l|ll} f(n) & \mathcal D[f](s) & \\ \hline \delta(n) & 1 & \\ \bar 1(n) & \zeta(s) & \\ n & \zeta(s-1) & \\ n^z & \zeta(s-z) & \\ \sigma_z(n) & \zeta(s-z)\zeta(s) & \\ \mu(n) & \zeta(s)^{-1} & \\ \vert\mu(n)\vert & \zeta(s)/\zeta(2s) & \\ \varphi(n) & \zeta(s-1)/\zeta(s) & \\ d(n) & \zeta(s)^2 & \\ \mu(\gcd(n, 2)) & \eta(s) = (1-2^{1-s})\zeta(s) & \\ \lambda(n) & \zeta(2s)/\zeta(s) & \\ \gamma(n) & \prod_{p \in \mathbb P}\frac{1-2p^{-s}}{1-p^{-s}} & \\ f^{[k]}(n) & \mathcal D[f](ks) & \\ f(n)\ln n & -\mathcal D[f]’ (s) & \\ \Lambda(n) & -\zeta’(s)/\zeta(s) & \\ \Lambda(n)/\ln(n) & \ln\zeta(s) & \\ 1_{\mathbb P}(n) & \sum_{m=1}^\infty \frac{\mu(m)}{m}\ln\zeta(ms) & \\ 1_{\mathcal P}(n) & \sum_{m=1}^\infty \frac{\varphi(m)}{m}\ln\zeta(ms) & \\ \psi_k(n) & \zeta(s)\zeta(s - k)/\zeta(2s) & \\ J_k(n) & \zeta(s - k)/\zeta(s) & \end{array}\]


  1. Viewing natural numbers as multisets, |D_n| is the set of all sub-multisets of |n|. The isomorphism described is then simply the fact that given any sub-multiset of the union of two disjoint multisets, we can sort the elements into their original multisets producing two sub-multisets of the disjoint multisets.↩︎

  2. Incidence algebras are a decategorification of the notion of a category algebra.↩︎

Introduction

Purely functional list concatenation, xs ++ ys in Haskell syntax, is well known to be linear time in the length of the first input and constant time in the length of the second, i.e. xs ++ ys is O(length xs). This leads to quadratic complexity if we have a bunch of left-associated uses of concatenation.

The ancient trick to resolve this is to, instead of producing lists, produce list-to-list functions a la [a] -> [a] or ShowS = String -> String = [Char] -> [Char]. “Concatenation” of “lists” represented this way is just function composition which is a constant-time operation. We can lift a list xs to this representation via the section (xs ++). This will still lead to O(length xs) amount of work to apply this function, but a composition of such functions applied to a list will always result in a fully right-associated expression even if the function compositions aren’t right-associated.

In the last several years, it has become popular to refer to this technique as “difference lists”. Often no justification is given for this name. When it is given, it is usually a reference to the idea of difference lists in logic programming. Unfortunately, other than both techniques giving rise to efficient concatenation, they have almost no similarities.

Read more...

Introduction

Classical First-Order Logic (Classical FOL) has an absolutely central place in traditional logic, model theory, and set theory. It is the foundation upon which ZF(C), which is itself often taken as the foundation of mathematics, is built. When classical FOL was being established there was a lot of study and debate around alternative options. There are a variety of philosophical and metatheoretic reasons supporting classical FOL as The Right Choice.

This all happened, however, well before category theory was even a twinkle in Mac Lane’s and Eilenberg’s eyes, and when type theory was taking its first stumbling steps.

My focus in this article is on what classical FOL looks like to a modern categorical logician. This can be neatly summarized as “classical FOL is the internal logic of a Boolean First-Order Hyperdoctrine.” Each of the three words in this term, “Boolean”, “First-Order”, and “Hyperdoctrine”, suggest a distinct axis in which to vary the (class of categorical models of the) logic. All of them have compelling categorical motivations to be varied.

Read more...

Introduction

In 1983, Mark Overmars described global rebuilding in The Design of Dynamic Data Structures. The problem it was aimed at solving was turning the amortized time complexity bounds of batched rebuilding into worst-case bounds. In batched rebuilding we perform a series of updates to a data structure which may cause the performance of operations to degrade, but occasionally we expensively rebuild the data structure back into an optimal arrangement. If the updates don’t degrade performance too much before we rebuild, then we can achieve our target time complexity bounds in an amortized sense. An update that doesn’t degrade performance too much is called a weak update.

Taking an example from Okasaki’s Purely Functional Data Structures, we can consider a binary search tree where deletions occur by simply marking the deleted nodes as deleted. Then, once about half the tree is marked as deleted, we rebuild the tree into a balanced binary search tree and clean out the nodes marked as deleted at that time. In this case, the deletions count as weak updates because leaving the deleted nodes in the tree even when it corresponds to up to half the tree can only mildly impact the time complexity of other operations. Specifically, assuming the tree was balanced at the start, then deleting half the nodes could only reduce the tree’s depth by about 1. On the other hand, naive inserts are not weak updates as they can quickly increase the tree’s depth.

The idea of global rebuilding is relatively straightforward, though how you would actually realize it in any particular example is not. The overall idea is simply that instead of waiting until the last moment and then rebuilding the data structure all at once, we’ll start the rebuild sooner and work at it incrementally as we perform other operations. If we update the new version faster than we update the original version, we’ll finish it by the time we would have wanted to perform a batched rebuild, and we can just switch to this new version.

More concretely, though still quite vaguely, global rebuilding involves, when a threshold is reached, rebuilding by creating a new “empty” version of the data structure called the shadow copy. The original version is the working copy. Work on rebuilding happens incrementally as operations are performed on the data structure. During this period, we service queries from the working copy and continue to update it as usual. Each update needs to make more progress on building the shadow copy than it worsens the working copy. For example, an insert should insert more nodes into the shadow copy than the working copy. Once the shadow copy is built, we may still have more work to do to incorporate changes that occurred after we started the rebuild. To this end, we can maintain a queue of update operations performed on the working copy since the start of a rebuild, and then apply these updates, also incrementally, to the shadow copy. Again, we need to apply the updates from the queue at a fast enough rate so that we will eventually catch up. Of course, all of this needs to happen fast enough so that 1) the working copy doesn’t get too degraded before the shadow copy is ready, and 2) we don’t end up needing to rebuild the shadow copy before it’s ready to do any work.

Read more...

Introduction

Morleyization is a fairly important operation in categorical logic for which it is hard to find readily accessible references to a statement and proof. Most refer to D1.5.13 of “Sketches of an Elephant” which is not an accessible text. 3.2.8 of “Accessible Categories” by Makkai and Paré is another reference, and “Accessible Categories” is more accessible but still a big ask for just a single theorem.

Here I reproduce the statement and proof from “Accessible Categories” albeit with some notational and conceptual adaptations as well as some commentary. This assumes some basic familiarity with the ideas and notions of traditional model theory, e.g. what structures, models, and |\vDash| are.

Read more...

Introduction

Andrej Bauer has a paper titled The pullback lemma in gory detail that goes over the proof of the pullback lemma in full detail. This is a basic result of category theory and most introductions leave it as an exercise. It is a good exercise, and you should prove it yourself before reading this article or Andrej Bauer’s.

Andrej Bauer’s proof is what most introductions are expecting you to produce. I very much like the representability perspective on category theory and like to see what proofs look like using this perspective.

So this is a proof of the pullback lemma from the perspective of representability.

Preliminaries

The key thing we need here is a characterization of pullbacks in terms of representability. To just jump to the end, we have for |f : A \to C| and |g : B \to C|, |A \times_{f,g} B| is the pullback of |f| and |g| if and only if it represents the functor \[\{(h, k) \in \mathrm{Hom}({-}, A) \times \mathrm{Hom}({-}, B) \mid f \circ h = g \circ k \}\]

That is to say we have the natural isomorphism \[ \mathrm{Hom}({-}, A \times_{f,g} B) \cong \{(h, k) \in \mathrm{Hom}({-}, A) \times \mathrm{Hom}({-}, B) \mid f \circ h = g \circ k \} \]

We’ll write the left to right direction of the isomorphism as |\langle u,v\rangle : U \to A \times_{f,g} B| where |u : U \to A| and |v : U \to B| and they satisfy |f \circ u = g \circ v|. Applying the isomorphism right to left on the identity arrow gives us two arrows |p_1 : A \times_{f,g} B \to A| and |p_2 : A \times_{f,g} B \to B| satisfying |p_1 \circ \langle u, v\rangle = u| and |p_2 \circ \langle u,v \rangle = v|. (Exercise: Show that this follows from being a natural isomorphism.)

One nice thing about representability is that it reduces categorical reasoning to set-theoretic reasoning that you are probably already used to, as we’ll see. You can connect this definition to a typical universal property based definition used in Andrej Bauer’s article. Here we’re taking it as the definition of the pullback.

Proof

The claim to be proven is if the right square in the below diagram is a pullback square, then the left square is a pullback square if and only if the whole rectangle is a pullback square. \[ \xymatrix { A \ar[d]_{q_1} \ar[r]^{q_2} & B \ar[d]_{p_1} \ar[r]^{p_2} & C \ar[d]^{h} \\ X \ar[r]_{f} & Y \ar[r]_{g} & Z }\]

Rewriting the diagram as equations, we have:

Theorem: If |f \circ q_1 = p_1 \circ q_2|, |g \circ p_1 = h \circ p_2|, and |(B, p_1, p_2)| is a pullback of |g| and |h|, then |(A, q_1, q_2)| is a pullback of |f| and |p_1| if and only if |(A, q_1, p_2 \circ q_2)| is a pullback of |g \circ f| and |h|.

Proof: If |(A, q_1, q_2)| was a pullback of |f| and |p_1| then we’d have the following.

\[\begin{align} \mathrm{Hom}({-}, A) & \cong \{(u_1, u_2) \in \mathrm{Hom}({-}, X)\times\mathrm{Hom}({-}, B) \mid f \circ u_1 = p_1 \circ u_2 \} \\ & \cong \{(u_1, (v_1, v_2)) \in \mathrm{Hom}({-}, X)\times\mathrm{Hom}({-}, Y)\times\mathrm{Hom}({-}, C) \mid f \circ u_1 = p_1 \circ \langle v_1, v_2\rangle \land g \circ v_1 = h \circ v_2 \} \\ & = \{(u_1, (v_1, v_2)) \in \mathrm{Hom}({-}, X)\times\mathrm{Hom}({-}, Y)\times\mathrm{Hom}({-}, C) \mid f \circ u_1 = v_1 \land g \circ v_1 = h \circ v_2 \} \\ & = \{(u_1, v_2) \in \mathrm{Hom}({-}, X)\times\mathrm{Hom}({-}, C) \mid g \circ f \circ u_1 = h \circ v_2 \} \end{align}\]

The second isomorphism is |B| being a pullback and |u_2| is an arrow into |B| so it’s necessarily of the form |\langle v_1, v_2\rangle|. The first equality is just |p_1 \circ \langle v_1, v_2\rangle = v_1| mentioned earlier. The second equality merely eliminates the use of |v_1| using the equation |f \circ u_1 = v_1|.

This overall natural isomorphism, however, is exactly what it means for |A| to be a pullback of |g \circ f| and |h|. We verify the projections are what we expect by pushing |id_A| through the isomorphism. By assumption, |u_1| and |u_2| will be |q_1| and |q_2| respectively in the first isomorphism. We see that |v_2 = p_2 \circ \langle v_1, v_2\rangle = p_2 \circ q_2|.

We simply run the isomorphism backwards to get the other direction of the if and only if. |\square|

The simplicity and compactness of this proof demonstrates why I like representability.

Introduction

It is not uncommon for universal quantification to be described as (potentially) infinite conjunction1. Quoting Wikipedia’s Quantifier_(logic) page (my emphasis):

For a finite domain of discourse |D = \{a_1,\dots,a_n\}|, the universal quantifier is equivalent to a logical conjunction of propositions with singular terms |a_i| (having the form |Pa_i| for monadic predicates).

The existential quantifier is equivalent to a logical disjunction of propositions having the same structure as before. For infinite domains of discourse, the equivalences are similar.

While there’s a small grain of truth to this, I think it is wrong and/or misleading far more often than it’s useful or correct. Indeed, it takes a bit of effort to even get a statement that makes sense at all. There’s a bit of conflation between syntax and semantics that’s required to have it naively make sense, unless you’re working (quite unusually) in an infinitary logic where it is typically outright false.

What harm does this confusion do? The most obvious harm is that this view does not generalize to non-classical logics. I’ll focus on constructive logics, in particular. Besides causing problems in these contexts, which maybe you think you don’t care about, it betrays a significant gap in understanding of what universal quantification actually is. Even in purely classical contexts, this confusion often manifests, e.g., in confusion about |\omega|-inconsistency.

So what is the difference between universal quantification and infinite conjunction? Well, the most obvious difference is that infinite conjunction is indexed by some (meta-theoretic) set that doesn’t have anything to do with the domain the universal quantifier quantifies over. However, even if these sets happened to coincide2, there are still differences between universal quantification and infinite conjunction. The key is that universal quantification requires the predicate being quantified over to hold uniformly, while infinite conjunction does not. It just so happens that for the standard set-theoretic semantics of classical first-order logic this “uniformity” constraint is degenerate. However, even for classical first-order logic, this notion of uniformity will be relevant.

Read more...

Introduction

The purpose of this article is to answer the question: what is the coproduct of two groups? The approach, however, will be somewhat absurd. Instead of simply presenting a construction and proving that it satisfies the appropriate universal property, I want to find the general answer and simply instantiate it for the case of groups.

Specifically, this will be a path through the theory of Lawvere theories and their models with the goal of motivating some of the theory around it in pursuit of the answer to this relatively simple question.

If you really just want to know the answer to the title question, then the construction is usually called the free product and is described on the linked Wikipedia page.

Read more...

Introduction

This is a brief article about the notions of preserving, reflecting, and creating limits and, by duality, colimits. Preservation is relatively intuitive, but the distinction between reflection and creation is subtle.

Preservation of Limits

A functor, |F|, preserves limits when it takes limiting cones to limiting cones. As often happens in category theory texts, the notation focuses on the objects. You’ll often see things like |F(X \times Y) \cong FX \times FY|, but implied is that one direction of this isomorphism is the canonical morphism |\langle F\pi_1, F\pi_2\rangle|. To put it yet another way, in this example we require |F(X \times Y)| to satisfy the universal property of a product with the projections |F\pi_1| and |F\pi_2|.
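As a worked instance (my example, not from the text), any representable functor |\mathrm{Hom}(W, -)| preserves products, and the isomorphism is in exactly the required direction, i.e. it is the canonical morphism:

```latex
% The comparison map is the canonical morphism
% \langle \mathrm{Hom}(W, \pi_1), \mathrm{Hom}(W, \pi_2) \rangle:
\[
  \mathrm{Hom}(W, X \times Y) \cong \mathrm{Hom}(W, X) \times \mathrm{Hom}(W, Y),
  \qquad f \mapsto (\pi_1 \circ f,\ \pi_2 \circ f)
\]
% Its inverse sends a pair (g, h) to the unique pairing
% \langle g, h \rangle given by the universal property of X \times Y.
```

That the comparison map is a bijection is precisely the universal property of the product |X \times Y| stated at the object |W|.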

Other than that subtlety, preservation is fairly intuitive.

Reflection of Limits versus Creation of Limits

A functor, |F|, reflects limits when whenever the image of a cone is a limiting cone, then the original cone was a limiting cone. For products this would mean that if we had a wedge |A \stackrel{p}{\leftarrow} Z \stackrel{q}{\to} B|, and |FZ| was the product of |FA| and |FB| with projections |Fp| and |Fq|, then |Z| was the product of |A| and |B| with projections |p| and |q|.

A functor, |F|, creates limits when whenever the image of a diagram has a limit, then the diagram itself has a limit and |F| preserves the limiting cones. For products this would mean if |FX| and |FY| had a product, |FX \times FY|, then |X| and |Y| have a product and |F(X \times Y) \cong FX \times FY| via the canonical morphism.

Creation of limits implies reflection of limits since we can just ignore the apex of the cone. While creation is more powerful, often reflection is enough in practice as we usually have a candidate limit, i.e. a cone. Again, this is often not made too explicit.

Example

Consider the posets:

$$\xymatrix{ & & & c \\ X\ar@{}[r]|{\Large{=}} & a \ar[r] & b \ar[ur] \ar[dr] & \\ & & & d \save "1,2"."3,4"*+[F]\frm{} \restore } \qquad \xymatrix{ & & c \\ Y\ar@{}[r]|{\Large{=}} & b \ar[ur] \ar[dr] & \\ & & d \save "1,2"."3,3"*+[F]\frm{} \restore } \qquad \xymatrix{ & c \\ Z\ar@{}[r]|{\Large{=}} & \\ & d \save "1,2"."3,2"*+[F]\frm{} \restore }$$

Failure of reflection

Let |X=\{a, b, c, d\}| with |a \leq b \leq c| and |b \leq d|, mapping to |Y=\{b, c, d\}| where |a \mapsto b| and |b|, |c|, |d| map to themselves. Reflection fails because |a| maps to |b|, which is the meet of |c| and |d| in |Y|, but |a| is not itself the meet of |c| and |d| in |X| (|b| is).

Failure of creation

If we change the source to just |Z=\{c, d\}|, then creation fails because |c| and |d| have a meet in the image but not in the source. Reflection still holds, though: since |Z| is discrete, there are no non-trivial cones in the source, so every cone (trivially) gets mapped to a limiting cone. In particular, there is no cone over both |c| and |d| at all, so there is nothing to reflect.

In general, recasting reflection and creation of limits for posets gives us: Let |F: X \to Y| be a monotonic function. |F| reflects limits if every lower bound that |F| maps to a meet is already a meet. |F| creates limits if whenever |F[U]| has a meet for |U \subseteq X|, then |U| already had a meet and |F| sends the meet of |U| to the meet of |F[U]|.
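Both failures can be checked mechanically. Here is a Python sketch (the `meet` helper and the encoding of the posets as explicit |\leq| relations are mine, not from the post):

```python
def lower_bounds(elems, leq, subset):
    """Elements below every member of `subset`."""
    return [x for x in elems if all(leq(x, s) for s in subset)]

def meet(elems, leq, subset):
    """Greatest lower bound of `subset`, or None if it doesn't exist."""
    lbs = lower_bounds(elems, leq, subset)
    greatest = [x for x in lbs if all(leq(y, x) for y in lbs)]
    return greatest[0] if greatest else None

# X = {a, b, c, d} with a <= b <= c and b <= d (reflexive closure included).
X = ['a', 'b', 'c', 'd']
X_pairs = {('a','a'), ('b','b'), ('c','c'), ('d','d'),
           ('a','b'), ('a','c'), ('a','d'), ('b','c'), ('b','d')}
leq_X = lambda x, y: (x, y) in X_pairs

# Y = {b, c, d} with b <= c and b <= d; F maps a |-> b and fixes b, c, d.
Y = ['b', 'c', 'd']
Y_pairs = {('b','b'), ('c','c'), ('d','d'), ('b','c'), ('b','d')}
leq_Y = lambda x, y: (x, y) in Y_pairs
F = {'a': 'b', 'b': 'b', 'c': 'c', 'd': 'd'}

# Failure of reflection: F('a') is the meet of {c, d} in Y,
# but 'a' is not the meet of {c, d} in X ('b' is).
assert F['a'] == meet(Y, leq_Y, ['c', 'd'])   # 'b'
assert meet(X, leq_X, ['c', 'd']) == 'b'      # not 'a'

# Failure of creation: in the discrete poset Z = {c, d}, the image
# {c, d} has a meet in Y, but {c, d} has no meet in Z itself.
Z = ['c', 'd']
leq_Z = lambda x, y: x == y
assert meet(Z, leq_Z, ['c', 'd']) is None
```

The encoding makes the distinction tangible: reflection fails at a cone that exists but isn't limiting, while creation fails because the required cone doesn't exist in the source at all.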