Complex-step differentiation is a simple and effective technique for numerically differentiating a(n analytic) function.
Discussing it is a neat combination of complex analysis, numerical analysis, and ring theory. We’ll see that it is very
closely connected to forward-mode automatic differentiation (FAD). For better or worse, while widely applicable, the scenarios
where complex-step differentiation is the best solution are a bit rare. To apply complex-step differentiation, you need
a version of your desired function that operates on complex numbers. If you have that, then you can apply complex-step
differentiation immediately. Otherwise, you need to adapt the function to complex arguments. This can be done essentially
automatically using the same techniques as automatic differentiation, but at that point you might as well use automatic
differentiation. Adapting the code to complex numbers or AD takes about the same amount of effort, however, the AD version
will be more efficient, more accurate, and easier to use.
Nevertheless, this serves as a simple example to illustrate several theoretical and practical ideas.
Numerical Differentiation
The problem we’re solving is given a function which is differentiable around a point ,
we’d like to compute its derivative at . In many cases, is real analytic
at the point meaning has a Taylor series which converges
to in some open interval containing .
The most obvious way of numerically differentiating is to approximate the limit in the definition of
the derivative, by simply choosing a small value for rather
than taking the limit. When is real analytic at , we can analyze the quality of this approximation by expanding
in a Taylor series at . This produces A slight tweak produces a better result with the
same number of evaluations of . Specifically, the Taylor series of at is equal to the
odd part the Taylor series of at . This leads to the Central Differences formula:
The following interactively illustrates this using the function
evaluated at 1.5.
The correct answer to digits is 1.518.600812734259759. The slider ranges from to .
If you play with the slider using the first example, you’ll see that the error decreases until around after which it starts
increasing until where it is off by more than . At the estimated derivative is which is, of course,
completely incorrect. Even at the error is on the order of which is much higher than the
double precision floating point machine epsilon of approximately .
There are two issues here. First, we have the issue that if , then for sufficiently small .
This happens when has a magnitude of around or more.
The second issue here is known as catastrophic cancellation. For simplicity, let’s say . (It’s actually about for the
first example.) Let’s further say for some small value of , . The value we care about is the
, but given limited precision, we might have , meaning we only
have three digits of precision for the value we care about. Indeed, as becomes smaller we’ll lose more and more
precision in our desired value until we lose all precision which happens when . It is generally
a bad idea numerically to calculate a small value by subtracting two larger values for this reason.
We would not have the first issue if as in the second and fourth examples ().
We would not have the second issue if as in the second and third examples ( with ).
We have neither issue in the second example of with .
This will become important later.
We have a dilemma. For the theory, we want as small a value of as possible without being zero. In practice, we start
losing precision as gets smaller, and generally larger values of are going to be less impacted by this.
Let’s set this aside for now and look at other ways of numerically computing the derivative in the hopes that
we can avoid this problem.
Cauchy’s Residue Theorem
If we talk about functions , the analogue of real analyticity
is holomorphicity or complex analyticity.
A complex function is holomorphic if it satisfies the Cauchy-Riemann equations.
(See the Appendix for more details about where the Cauchy-Riemann equations come from.)
A complex function is complex analytic if it has a Taylor series which converges to the function. It can be proven
that these two descriptions are equivalent, though this isn’t a trivial fact. We can also talk about functions that
are holomorphic or complex analytic on an open subset of and at a point by considering an open subset around
that point. The typical choice of open subset is some suitably small open disk in the complex plane about the point.
(Other common domains are ellipses, infinite strips, and areas bounded by Hankel contours
and variations such as sideways opening parabolas.)
A major fact about holomorphic functions is the Cauchy integral theorem.
If is a holomorphic function inside a (suitably nice) closed curve in the complex plane, then .
Again, will typically be chosen to be some circle. (Integrals like this in the complex plane are often called
contour integrals and the curves we’re integrating along are called contours.)
Things get really interesting when we generalize to meromorphic functions
which are complex functions that are holomorphic except at an isolated set of points. These take the form of poles
which are points such that , i.e. poles are where a function goes to infinity as, e.g., does at .
The generalization of Cauchy’s integral theorem is Cauchy’s Residue Theorem. This theorem
is surprising and is one of the most useful theorems in all of mathematics both theoretically and practically.
We’ll only need a common special case of it. Let be a holomorphic function, then is a meromorphic function
with a single pole of order at . If is a positively oriented, simple closed curve containing ,
then In this case,
is the residue of at . More generally, if there are multiple poles in the area bounded by , then we will
sum up their residues.
This formula provides us a means of calculating the -st Taylor coefficient of a complex analytic function at any point.
For our particular purposes, we’ll only need the case,
For the remainder of this section I want to give some examples of how Cauchy’s Residue Theorem is used both theoretically
and practically. This whole article will itself be another practical example of Cauchy’s Residue Theorem. This is not exhaustive
by any means.
To start illustrating some of the surprising properties of this theorem, we can take the case which states that we can evaluate
a holomorphic function at any point via where
is any contour which bounds an area containing . This leads to an interesting discreteness. Not only can we evaluate
a (holomorphic) function (or any of its derivatives) at a point via the values of the function on a contour, the only significant
constraint on that contour is that it bound an area containing the desired point. In other words, no matter how we deform the contour the
integral is constant except when we deform the contour so as not to bound an area containing the point being evaluated, at which point
the integral’s value is 1. It may seem odd to use an integral to evaluate
a function at a point, but it can be useful when there are numerical issues with evaluating the function near the desired point2. In fact, these results show that if we know the values of a holomorphic function on the boundary of a given open subset of
the complex plane, then we know the value of the function everywhere. In this sense, holomorphic functions (and analytic
functions in general) are extremely rigid.
This leads to the notion of analytic continuation
where we try to compute an analytic function beyond its overt domain of definition. This is the basis of most “sums of divergent series”.
For example, there is the first-year calculus fact that the sum of the infinite series is converging
on the interval . In fact, the proof of convergence only needs so we can readily generalize to
complex with , i.e. contained in the open unit disk. However, is a meromorphic function that is holomorphic
everywhere except for , therefore there is a unique analytic function defined everywhere except that agrees with
the infinite sum on the unit disk, namely itself. Choosing leads to the common example of “summing a divergent series”
with “” which really means “the value at of the unique complex analytic function which agrees
with this infinite series when it converges”.
Sticking with just evaluation, applying the Cauchy Residue theorem to quadrature, i.e. numerical integration, leads to an interesting
connection to a rational approximation problem. Say we want to compute , we can use the Cauchy
integral to evaluate leading to
A quadrature formula looks like . The sum
can be written as a Cauchy integral of . We thus have
as the error of
the approximation. The sum is a rational function (in partial fraction form) and thus the error is minimized by points ()
and weights () that lead to better rational approximations of 3.
The ability to calculate coefficients of the Taylor series of a holomorphic function is, by itself, already a valuable
tool in both theory and practice. In particular, the coefficients of a generating function
or a Z-transform can be computed with Cauchy integrals. This has applications
in probability theory, statistics, finance, combinatorics, recurrences, differential equations, and signal processing.
Indeed, when and is the unit circle, then the Cauchy integral is a component of the Fourier series
of . Approximating these integrals with the Trapezoid Rule (which we’ll discuss in a bit) produces the Discrete Fourier Transform.
Let be a polynomial and, for simplicity, assume all its zeroes are of multiplicity one. Then is a meromorphic function
that’s holomorphic everywhere except for the roots of . The Cauchy integral
counts the number of roots of contained in the area bounded by . If we know there is only one root of within
the area bounded by , then we can compute that root with .
A better approach is to use the formulas .
Similar ideas can be used to adapt this to counting and finding multiple roots. See
Numerical Algorithms based on Analytic Function Values at Roots of Unity by Austin, Kravanja, and
Trefethen (2014) which is a good survey in general.
Another very common use of Cauchy’s Residue Theorem is to sum (convergent) infinite series. has a zero at
for each integer and a non-zero derivative at those points. In fact, the derivative is . Alternatively,
we could use which has a zero at for each integer but has derivative at those points.
Therefore, has a (first-order) pole at for each integer
with residue . In particular, if is a holomorphic function (at least near the real axis), then the value of the
Cauchy integral of along a Hankel contour will be . Along an infinite strip
around the real axis we’d get . As an example, we can consider the
famous sum, . It can be shown that if is a meromorphic
function whose poles are not located at integers and is bounded for sufficiently large , then . We thus have
that where are the poles of .
In particular, has (first-order) poles at . This gives us simply
where I’ve used and the fact that is an odd function. Exploiting the symmetry of the sum
gives us By expanding
in a Laurent series, we see that the limit of the right-hand side as approaches is . While contour
integration is quite effective for coming up with analytic solutions to infinite sums, numerically integrating the contour
integrals is also highly effective as illustrated in Talbot quadratures and rational approximations
by Trefethen, Weideman, and Schmelzer (2006), for example.
Computing the Integrals
We’ve seen in the previous section that .
This doesn’t much help us if we don’t have a way to compute the integrals. From this point forward, fix
as a circle of radius centered on .
Before that, let’s consider numerical integration in general. Say we want to integrate the real function from to ,
i.e. we want to calculate . The most obvious way to go about it is to approximate the Riemann
sums that define the (Riemann) integral. This would produce a formula like
corresponding to summing the areas of rectangles
whose left points are the values of . As before with central differences, relatively minor tweaks will give better approximations. In particular,
we get the two roughly equivalent approximations of the Midpoint Rule where we take the midpoint rather than the
left or right point, and the Trapezoid Rule where we average the left and right Riemann
sums. While both of these perform substantially better than the left/right Riemann sums, they are still rather basic
quadrature rules; the error decreases as .
Something special happens when is a periodic function. First, the Trapezoid rule reduces to
. More importantly, the Midpoint rule and the Trapezoid rule both start converging
geometrically rather than quadratically. Furthermore, for the particular case we’re interested in, namely integrating analytic
functions along a circle in the complex plane, these quadrature rules are optimal. Let be the -th root of unity.
The Trapezoid rule corresponds to sum the values of at the even powers of scaled by the radius and translated
by , and the Midpoint rule corresponds to the sum of the odd powers.
We now have two parameters for approximating a Cauchy integral via the Trapezoid or Midpoint rules: the radius and the
number of points .
Complex-Step Differentiation corresponds to approximating the Cauchy integral for the derivative using the extreme case of
the Midpoint rule with and very small radii (i.e. values of ). Meanwhile, Central Differences corresponds to the extreme case
of using the Trapezoid rule with and very small radii. To spell this out a bit more, we perform the substitution
which leads to and
Applying the Trapezoid rule to the right hand side of this corresponds to picking , while applying the
Midpoint rule corresponds to picking . for , and
for . For the Trapezoid rule, this leads to which is
Central Differences. For the Midpoint rule, this leads to This
is Complex-Step Differentiation when is real.
Complex-Step Differentiation
As just calculated, Complex-Step Differentiation computes the derivative at the real number via the formula:
Another perspective on this formula is that it is just the
Central Differences formula along the imaginary axis instead of the real axis.
When is complex analytic and real-valued on real arguments, then we have
where is the complex conjugate of , i.e. it maps to
or to . This leads to
. This lets us simplify
Complex-Step Differentiation to .
Here is the earlier interactive example but now using Complex-Step Differentiation. As decreases in magnitude, the error
steadily decreases until there is no error at all.
: 1e-11 1.5: 18.60081273425976
error: 0
This formula using avoids catastrophic cancellation simply by not doing a subtraction. However, it turns out
for real (which is necessary to derive the simplified formula), there isn’t a problem either way. Using the first form of
the Complex-Step Differentiation formula is also numerically stable. The key here is that the imaginary part of and are
both and so we don’t get catastrophic cancellation for the same reason we wouldn’t get it with Central Differences if .
This suggests that if we wanted to evaluate at some non-zero point on the imaginary axis, Complex-Step Differentiation would
perform poorly while Central Differences would perform well. Further, if we wanted to evaluate at some point not on either
the real or imaginary axes, neither approach would perform well. In this case, choosing different values for and the radius
would be necessary4.
A third perspective on Complex-Step Differentiation comes when we think about which value of should we use. The smaller
is, the smaller we’d want to be. Unlike Central Differences, there is little stopping us from having be very small and
values like are typical. In fact, around in double precision floating point arithmetic, gets the
theoretically useful property that due to underflow. In this case, behaves like where
. This is the defining property of the ring of dual numbers.
Dual numbers are exactly what are used in forward-mode automatic differentiation.
Forward-Mode Automatic Differentiation
The ring of dual numbers has numbers of the form where . This behaves just like
the complex numbers except that instead of we have . The utility of dual numbers for
our purposes can be seen by expanding in a Taylor series about . We get
. All higher power terms of the Taylor series are zero because .
We can thus get the derivative of simply by computing and then looking at the coefficient of
in the result.
In this example there is no interactivity as we are not estimating the derivative in the AD case but instead calculating it in parallel.
There is no parameter.
1.5: 18.60081273425976
error: 0
As the end of the previous section indicated, Complex-Step Differentiation approximates this (often exactly) by using as .
Nevertheless, this is not ideal. Often the complex versions of a function will be more costly than their dual number counterparts.
For example, involves four real multiplications and two additions.
involves three real multiplications and one addition on
the other hand.
References
Using Complex Variables to Estimate Derivatives of Real Functions by Squire and Trapp (1998)
is the first(?) published paper specifically about the idea of complex-step differentiation. It’s a three page paper and the authors
are not claiming any originality but just demonstrating the effectiveness of ideas from the ’60s that the authors found to be underappreciated.
The Complex-Step Derivative Approximation by Martins, Sturdza, and Alonso (2003) does
a much deeper dive into the theory behind complex-step differentiation and its connections to automatic differentiation.
You may have noticed the name “Trefethen” in many of the papers cited. Nick Trefethen and his collaborators have been doing amazing
work for the past couple of decades, most notably in the Chebfun project. Looking at
Trefethen’s book Approximation Theory and Approximation Practice (and lectures)
recently reintroduced me to Trefethen’s work. This particular article was
prompted by a footnote in the paper The Exponentially Convergent Trapezoidal Rule which I highly
recommend. In fact, I highly recommend Chebfun as well as nearly all of Trefethen’s work. It is routinely compelling, interesting, and
well presented.
Appendix
Using the language of Geometric Calculus, we can write a very general form of the Fundamental
Theorem of Calculus. Namely, where
is an -dimensional manifold. Here is a multivector-valued vector function. If and ,
then this would produce a formula very similar to the Cauchy integral formula.
Writing , the Cauchy-Riemann equations are
and . However, leads to the slightly
different equations and .
The resolution of this discrepancy is found by recognizing that we don’t want to be a vector-valued vector function but rather
a spinor-valued spinor function. It is most natural to identify complex numbers with the even subalgebra of the 2D geometric
algebra. If and are the two orthonormal basis vectors of the 2D space, then the pseudoscalar
satisfies . For the 2D case, a spinor
is a multivector of the form .
We can generalize the vector derivative, , to a multivector derivative where is a multivector variable
by using the generic formula for the directional derivative in a linear space and then defining to be a linear
combination of directional derivatives. Given any -linear space and an element , we can
define the directional derivative of in the direction via
. In our case,
we have the basis vectors though we only care about the even subalgebra
corresponding to the basis vectors . Define
and assuming is a spinor-valued function5.
We can then define . We now have is
the equivalent to the Cauchy-Riemann equations where is now a spinor-valued spinor function, i.e. a function of .
In terms of the general theory of partial differential equations, we are saying that is
a Green’s function for . We can then understand everything that is happening
here in terms of general results. In particular, it is the two-dimensional case of the results described in
Multivector Functions by Hestenes.↩︎