Assuming we can find a nilpotent object \( \epsilon\ne 0 \) such that \(\epsilon^2=0\),
one could then reduce the above expression exactly to
\[
f(x+\epsilon) = f(x) + f'(x)\epsilon.
\]
How is this useful?
Consider as an example the quadratic function \(f(x)=x^2\),
and assume we are interested in knowing the value
of its derivative \(f'(p)\) in \(p\).
In this case we have:
Thus \(f'(p) = 2p\).
Numbers of the form \(a+b\epsilon\) are known as
dual numbers
and accept a representation in the space of \(2\times 2\) matrices.
In particular any dual number \(a+b\epsilon\) has representation
satisfies the desire property \(\epsilon\ne 0\) and \(\epsilon^2=0\).
More generally, higher \(n\)-th order derivatives can be computed as well.
One needs however to consider a more general class of hyper-numbers
of the form \(a_0 + a_1\epsilon + ... + a_{n}\epsilon^{n}\) with
\(\epsilon^{k}\ne 0\) for any \(k \le n\) and \(\epsilon^{n+1}=0\).
Once again, these numbers accept a representation in the space of
\((n+1)\times (n+1)\) matrices \(\mathbb{R}^{(n+1)\times (n+1)}\).
In particular the elements of matrix \(\epsilon\) are given by
In light of this, it should become apparent the appeal of automatic differentiation
for exact numerical computation of a function's derivative, without the
need for symbolic manipulation of algebraic expressions.
Let us therefore write a Python class Dual implementing
the algebras we have introduced above. One can represent the number
\(a_0 + a_1\epsilon +...+a_{n-1}\epsilon^{n-1}\) as the collection of \(n\)
parameters \(a_i\). In the following we shall refer to a hyper-number
such that \(\epsilon^n=0\) as a number of order \(n\).
First of all, let us start by writing a function _generate_epsilon
to generate the matrix representations of each \(\epsilon^i\):
import numpy as np
import scipy.sparse as sp
from functools import lru_cache
@lru_cache()
def _generate_epsilon(n=None, diff_order=1):
row = np.arange(n - diff_order)
col = row + diff_order
data = np.ones(n - diff_order)
return sp.coo_matrix((data, (row, col)), shape=(n, n)).toarray()
We can therefore start writing our Dual class
which will store the number's parameters \(a_i\)
in the attribute _params, and the associated matrices
\(\epsilon^i\) in eps_all:
class Dual:
def __init__(self, *args):
if not args:
args = [0]
self._params = args
self.n = len(args)
self.eps_all = [_generate_epsilon(self.n, k) for k in range(1, self.n)]
We will therefore be able to generate an instance of our number
by passing the corresponding parameters as arguments.
As an example, the number \(1+2\epsilon+3\epsilon^2\) will be instantiated as
Dual(1,2,3).
In order to return a formatted representation of this object we can also
override the magic method __repr__
as
class Dual:
def __init__(self, *args): ...
def __repr__(self):
SUPERSCRIPTS = ["⁰", "¹", "²", "³", "⁴",
"⁵", "⁶", "⁷", "⁸", "⁹"]
repr = f'{self._params[0]}'
for k, p in enumerate(self._params[1:]):
repr += f" + {p}ε{''.join([SUPERSCRIPTS[int(q)]
for q in list(str(k + 1))])
if k > 0 else ''}"
return repr
so that, as an example, the code Dual(1, 2, 3) will return
1 + 2ε + 3ε². More importantly however,
we would like to access the full matrix representation of the number,
as well as to be able to create a new instance from a matrix representation.
This can be achieved by implementing the methods matrix_repr
and from_matrix:
class Dual:
def __init__(self, *args): ...
def __repr__(self): ...
def matrix_repr(self):
return sum(p*ε for p, ε in zip(self._params[1:], self.eps_all)) + self._params[0]*np.eye(self.n)
def from_matrix(self, M):
self._params, self.n = M[0], len(M[0])
self.eps_all = [_generate_epsilon(self.n, k) for k in range(1, self.n)]
return self
Now Dual(1,2,3).matrix_repr() will return
array([[1., 2., 3.],
[0., 1., 2.],
[0., 0., 1.]])
and an instance can also be created from a numpy.array
or numpy.matrix (say M) as Dual().from_matrix(M).
Let us now turn to implementing the algebra.
Addition and subtraction operations are trivial:
class Dual:
def __init__(self, *args): ...
def __repr__(self): ...
def matrix_repr(self): ...
def from_matrix(self, M): ...
def __add__(self, other):
if isinstance(other, Dual):
return Dual(*[p1 + p2 for p1, p2 in zip(self._params, other._params)])
else:
return Dual(*([self._params[0] + other] + list(self._params[1:])))
def __radd__(self, other):
return self + other
def __neg__(self):
return Dual(*[-p for p in self._params])
def __sub__(self, other):
return self.__add__(other.__neg__())
def __rsub__(self, other):
return -self + other
Multiplication and division are instead slightly more involved.
One straightforward option of course could be to let a matrix algebra package
handle the operations, for example by defining multiplications as
and similarly for division. However this is slow, and we can do better.
In particular recalling the nilpotence of all \(\epsilon^i\) we might
realise that multiplying a number by \(\epsilon\) simply amounts to
shifting the vector parameter to the right.
As an example, the order-\(3\) number \((1+2\epsilon+3\epsilon^2)\epsilon=1\epsilon+2\epsilon^2\)
so that the associated parameters vector transforms as \((1,2,3)\rightarrow(0,1,2)\).
Similarly, multiplying by \(\epsilon^2\) would entail the transformation of the
parameters vector \((1,2,3)\rightarrow(0,0,1)\), and so forth.
Consequently, multiplication is easily and efficiently implemented as
class Dual:
def __init__(self, *args): ...
def __repr__(self): ...
def matrix_repr(self): ...
def from_matrix(self, M): ...
def __add__(self, other): ...
def __radd__(self, other): ...
def __neg__(self): ...
def __sub__(self, other): ...
def __rsub__(self, other):...
def _mul_epsilon(self, k=1):
if k==0: return self
return Dual(*([0]*k + [p for p in self._params[:-k]]))
def __mul__(self, other):
if isinstance(other, Dual):
return sum(self._mul_epsilon(k)*p2 for k, p2 in enumerate(other._params))
else:
return Dual(*[p * other for p in self._params])
def __rmul__(self, other):
return self * other
without the need for matrix algebra.
Division instead reduces to computing the inverse of an object of the class
Dual. Once again, one could think of making use of
matrix algebra, which would remind us that the inverse of a matrix might not
exist in the general case. However, the reader might realise we are dealing
with square upper triangular matrices, which makes our task much easier.
In any case, once again for the sake of efficiency,
we shall try to avoid using matrix algebra.
Here the crucial point, is to notice that our numbers accept matrix representation
of the form \(a_0\mathbb{1} + D\), where the matrix \(D = \sum_i^{n-1} a_i\epsilon^i\)
is nilpotent, that is \(D^n=0\).
The inverse can therefore be written as the truncated geometric sum
Thus, introducing the power method __pow__,
one can efficiently compute an inverse and implement
the division operation between two numbers \(d_1,\ d_2\) as \(d_1 d_2^{-1}\):
class Dual:
def __init__(self, *args): ...
def __repr__(self): ...
def matrix_repr(self): ...
def from_matrix(self, M): ...
def __add__(self, other): ...
def __radd__(self, other): ...
def __neg__(self): ...
def __sub__(self, other): ...
def __rsub__(self, other):...
def _mul_epsilon(self, k=1): ...
def __mul__(self, other): ...
def __rmul__(self, other): ...
def __pow__(self, power):
if power<0:
return self.__pow__(-power).inverse()
elif power==0:
return Dual(*([1]+[0]*(self.n-1)))
elif power%2==0:
return (self*self).__pow__(power/2)
else:
return self * (self).__pow__(power-1)
def inverse(self):
a = self._params[0]
D = Dual(*([0]+list(self._params[1:])))
return sum((-D / a)**k for k in range(self.n)) / a
def __truediv__(self, other):
if isinstance(other, Dual):
return self * other.inverse()
else:
return Dual(*[p / other for p in self._params])
def __rtruediv__(self, other):
return self.inverse() * other
Finally, noticing that a number
\(
a_0+a_1\epsilon+\ ...\ +a_n\epsilon^n
\)
transforms under any function \(f\) as
We have now all we need to attempt to auto-differentiate a function.
Let us take as an example the function
\[
f(x) = \frac{\sin(x)^2}{(x^2-x+1)},
\]
with derivatives
we can automatically obtain the first three derivatives in a given interval
x, recalling that the coefficient of \(\epsilon^k\) is \(f^{(k)}(x)/k!\). Thus:
AD error: 2.220446049250313e-16
FD error: 0.020607168256352445
Fig.1: Plots of the function \(f(x)\) and its first three derivatives
computed with automatic differentiation.
for k in range(4):
plt.plot(x, Dual_f._params[k]*np.math.factorial(k))
plt.show()
In Python one finds a number of pre-packaged libraries for automatic differentiation
such as JAX, or within broader ML
libraries such as Aesara (ex-Theano), TensorFlow, and PyTorch.