Assuming we can find a nilpotent object $\epsilon \neq 0$ such that $\epsilon^2 = 0$,
one could then reduce the above expression exactly to
$$f(x + \epsilon) = f(x) + \epsilon f'(x).$$
How is this useful?
Consider as an example the quadratic function $f(x) = x^2$,
and assume we are interested in knowing the value
of its derivative at a point $x$.
In this case we have:
$$f(x + \epsilon) = (x + \epsilon)^2 = x^2 + 2x\epsilon + \epsilon^2 = x^2 + 2x\epsilon.$$
Thus $f'(x) = 2x$.
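As a quick sanity check of this rule, a dual number can be modelled as a plain pair (value, derivative) with the product rule $(a + b\epsilon)(c + d\epsilon) = ac + (ad + bc)\epsilon$. This is only an illustrative sketch (the helper name dmul is made up, not part of the class developed below):

```python
def dmul(u, v):
    # (a + b·eps)(c + d·eps) = ac + (ad + bc)·eps, since eps² = 0
    a, b = u
    c, d = v
    return (a * c, a * d + b * c)

# evaluate f(x) = x² at x + eps, i.e. at the pair (x, 1)
x = 3.0
fx, dfx = dmul((x, 1.0), (x, 1.0))
# fx = 9.0 is f(3), dfx = 6.0 is f'(3) = 2x
```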
Numbers of the form $a + b\epsilon$ are known as
dual numbers
and accept a representation in the space of $2 \times 2$ matrices.
In particular, any dual number has representation $a\mathbb{1} + b\epsilon$,
where
$$\epsilon = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$$
satisfies the desired property $\epsilon^2 = 0$ with $\epsilon \neq 0$, and $\mathbb{1}$ is the identity matrix.
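To see the representation at work numerically, one can build the matrix $x\mathbb{1} + \epsilon$ with NumPy and square it: the derivative of $x^2$ appears in the upper-right entry. A small illustrative snippet, taking x = 3 for concreteness:

```python
import numpy as np

eps = np.array([[0., 1.],
                [0., 0.]])      # nilpotent: eps @ eps is the zero matrix

x = 3.0
M = x * np.eye(2) + eps         # matrix representation of x + ε

M2 = M @ M                      # f(M) for f(x) = x²
# M2 = [[9., 6.], [0., 9.]]: f(3) = 9 on the diagonal, f'(3) = 6 above it
```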
More generally, higher $n$-th order derivatives can be computed as well.
One needs however to consider a more general class of hyper-numbers
of the form $a_0 + a_1\epsilon + a_2\epsilon^2 + \dots + a_{n-1}\epsilon^{n-1}$, with
$\epsilon^n = 0$ and $\epsilon^k \neq 0$
for any $0 < k < n$.
Once again, these numbers accept a representation in the space of
$n \times n$ matrices.
In particular the elements of the matrix $\epsilon^k$ are given by $(\epsilon^k)_{ij} = \delta_{i+k,\,j}$,
so that for example, for $n = 3$ one has
$$\epsilon = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}, \qquad
\epsilon^2 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},$$
and so forth at higher orders.
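These matrices are easy to check numerically: NumPy's np.eye with an offset produces exactly this shifted-diagonal pattern, and powers of $\epsilon$ push the band of ones further up until it falls off the matrix. A small sketch for n = 4:

```python
import numpy as np

n = 4
eps = np.eye(n, k=1)                    # ones on the first superdiagonal

eps2 = eps @ eps                        # ones on the second superdiagonal
eps4 = np.linalg.matrix_power(eps, n)   # ε⁴ = 0: the band has fallen off
```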
A Python implementation
In light of this, the appeal of automatic differentiation should become apparent:
it allows exact numerical computation of a function's derivatives, without the
need for symbolic manipulation of algebraic expressions.
Let us therefore write a Python class Dual implementing
the algebras we have introduced above. One can represent the number
$a_0 + a_1\epsilon + \dots + a_{n-1}\epsilon^{n-1}$ as the collection of
parameters $(a_0, a_1, \dots, a_{n-1})$. In the following we shall refer to a hyper-number
such that $\epsilon^n = 0$ as a number of order $n$.
First of all, let us start by writing a function _generate_epsilon
to generate the matrix representations of each $\epsilon^k$:
import numpy as np
import scipy.sparse as sp
from functools import lru_cache

@lru_cache()
def _generate_epsilon(n=None, diff_order=1):
    row = np.arange(n - diff_order)
    col = row + diff_order
    data = np.ones(n - diff_order)
    return sp.coo_matrix((data, (row, col)), shape=(n, n)).toarray()
We can therefore start writing our Dual class,
which will store the number's parameters $(a_0, \dots, a_{n-1})$
in the attribute _params, and the associated matrices
$\epsilon, \epsilon^2, \dots, \epsilon^{n-1}$ in eps_all.
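The constructor's listing does not appear here; a minimal sketch consistent with the methods that follow could look like this (a dense np.eye stand-in replaces the sparse-based _generate_epsilon so the snippet is self-contained — in the actual class the helper defined above would be used):

```python
import numpy as np

def _generate_epsilon(n, diff_order=1):
    # dense stand-in for the sparse helper defined earlier
    return np.eye(n, k=diff_order)

class Dual:
    def __init__(self, *args):
        # hypothetical reconstruction of the missing constructor
        self._params = list(args)    # (a0, a1, ..., a_{n-1})
        self.n = len(args)           # order of the number
        self.eps_all = [_generate_epsilon(self.n, k) for k in range(1, self.n)]
```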
We will therefore be able to generate an instance of our number
by passing the corresponding parameters as arguments.
As an example, the number $1 + 2\epsilon + 3\epsilon^2$ will be instantiated as
Dual(1, 2, 3).
In order to return a formatted representation of this object we can also
override the magic method __repr__
as
class Dual:
    def __init__(self, *args): ...

    def __repr__(self):
        SUPERSCRIPTS = ["⁰", "¹", "²", "³", "⁴", "⁵", "⁶", "⁷", "⁸", "⁹"]
        repr = f"{self._params[0]}"
        for k, p in enumerate(self._params[1:]):
            repr += f" + {p}ε{''.join([SUPERSCRIPTS[int(q)] for q in str(k + 1)]) if k > 0 else ''}"
        return repr
so that, as an example, the code Dual(1, 2, 3) will return
1 + 2ε + 3ε². More importantly however,
we would like to access the full matrix representation of the number,
as well as to be able to create a new instance from a matrix representation.
This can be achieved by implementing the methods matrix_repr
and from_matrix:
class Dual:
    def __init__(self, *args): ...
    def __repr__(self): ...

    def matrix_repr(self):
        return sum(p * ε for p, ε in zip(self._params[1:], self.eps_all)) \
            + self._params[0] * np.eye(self.n)

    def from_matrix(self, M):
        self._params, self.n = M[0], len(M[0])
        self.eps_all = [_generate_epsilon(self.n, k) for k in range(1, self.n)]
        return self
Now Dual(1,2,3).matrix_repr() will return
array([[1., 2., 3.],
       [0., 1., 2.],
       [0., 0., 1.]])
and an instance can also be created from a numpy.array
or numpy.matrix (say M) as Dual().from_matrix(M).
Let us now turn to implementing the algebra.
Addition and subtraction operations are trivial:
class Dual:
    def __init__(self, *args): ...
    def __repr__(self): ...
    def matrix_repr(self): ...
    def from_matrix(self, M): ...

    def __add__(self, other):
        if isinstance(other, Dual):
            return Dual(*[p1 + p2 for p1, p2 in zip(self._params, other._params)])
        else:
            return Dual(*([self._params[0] + other] + list(self._params[1:])))

    def __radd__(self, other):
        return self + other

    def __neg__(self):
        return Dual(*[-p for p in self._params])

    def __sub__(self, other):
        return self.__add__(other.__neg__())

    def __rsub__(self, other):
        return -self + other
Multiplication and division are instead slightly more involved.
One straightforward option of course could be to let a matrix algebra package
handle the operations, for example by defining multiplication as
Dual().from_matrix(self.matrix_repr() @ other.matrix_repr()),
and similarly for division. However this is slow, and we can do better.
In particular, recalling the nilpotence of all the $\epsilon^k$, we might
realise that multiplying a number by $\epsilon$ simply amounts to
shifting the parameter vector to the right.
As an example, for the order-3 number $a_0 + a_1\epsilon + a_2\epsilon^2$ one has
$$(a_0 + a_1\epsilon + a_2\epsilon^2)\,\epsilon = a_0\epsilon + a_1\epsilon^2,$$
so that the associated parameter vector transforms as $(a_0, a_1, a_2) \to (0, a_0, a_1)$.
Similarly, multiplying by $\epsilon^2$ would entail the transformation of the
parameter vector $(a_0, a_1, a_2) \to (0, 0, a_0)$, and so forth.
Consequently, multiplication is easily and efficiently implemented as
class Dual:
    def __init__(self, *args): ...
    def __repr__(self): ...
    def matrix_repr(self): ...
    def from_matrix(self, M): ...
    def __add__(self, other): ...
    def __radd__(self, other): ...
    def __neg__(self): ...
    def __sub__(self, other): ...
    def __rsub__(self, other): ...

    def _mul_epsilon(self, k=1):
        if k == 0:
            return self
        return Dual(*([0] * k + [p for p in self._params[:-k]]))

    def __mul__(self, other):
        if isinstance(other, Dual):
            return sum(self._mul_epsilon(k) * p2 for k, p2 in enumerate(other._params))
        else:
            return Dual(*[p * other for p in self._params])

    def __rmul__(self, other):
        return self * other
without the need for matrix algebra.
Division instead reduces to computing the inverse of an object of the class
Dual. Once again, one could think of making use of
matrix algebra, which would remind us that the inverse of a matrix might not
exist in the general case. However, the reader might realise we are dealing
with square upper triangular matrices, which makes our task much easier.
In any case, once again for the sake of efficiency,
we shall try to avoid using matrix algebra.
Here the crucial point is to notice that our numbers accept a matrix representation
of the form $a\mathbb{1} + D$, where the matrix $D$
is nilpotent, that is $D^n = 0$.
The inverse can therefore be written as the truncated geometric sum
$$(a\mathbb{1} + D)^{-1} = \frac{1}{a}\sum_{k=0}^{n-1}\left(-\frac{D}{a}\right)^k
= \frac{1}{a}\left(\mathbb{1} - \frac{D}{a} + \frac{D^2}{a^2} - \dots\right).$$
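The truncated sum can be verified directly on the matrix representation; in this sketch the value of a and the entries of D are arbitrary choices:

```python
import numpy as np

n = 3
a = 2.0
D = np.eye(n, k=1) + 3 * np.eye(n, k=2)   # strictly upper triangular, so D³ = 0
M = a * np.eye(n) + D

# truncated geometric sum: (a·1 + D)⁻¹ = (1/a) Σ_{k<n} (-D/a)^k
M_inv = sum(np.linalg.matrix_power(-D / a, k) for k in range(n)) / a
# M @ M_inv recovers the identity exactly
```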
Thus, introducing the power method __pow__,
one can efficiently compute an inverse and implement
the division operation between two numbers as:
class Dual:
    def __init__(self, *args): ...
    def __repr__(self): ...
    def matrix_repr(self): ...
    def from_matrix(self, M): ...
    def __add__(self, other): ...
    def __radd__(self, other): ...
    def __neg__(self): ...
    def __sub__(self, other): ...
    def __rsub__(self, other): ...
    def _mul_epsilon(self, k=1): ...
    def __mul__(self, other): ...
    def __rmul__(self, other): ...

    def __pow__(self, power):
        if power < 0:
            return self.__pow__(-power).inverse()
        elif power == 0:
            return Dual(*([1] + [0] * (self.n - 1)))
        elif power % 2 == 0:
            return (self * self).__pow__(power // 2)
        else:
            return self * self.__pow__(power - 1)

    def inverse(self):
        a = self._params[0]
        D = Dual(*([0] + list(self._params[1:])))
        return sum((-D / a) ** k for k in range(self.n)) / a

    def __truediv__(self, other):
        if isinstance(other, Dual):
            return self * other.inverse()
        else:
            return Dual(*[p / other for p in self._params])

    def __rtruediv__(self, other):
        return self.inverse() * other
Finally, noticing that a number $a\mathbb{1} + D$
transforms under any analytic function $f$ as
$$f(a\mathbb{1} + D) = \sum_{k=0}^{n-1} \frac{f^{(k)}(a)}{k!}\, D^k,$$
other more complex functions, such as sin,
cos, and exp, can be easily implemented as well.
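The listing for these methods is not shown here; as an illustrative, self-contained sketch, the formula above can be applied to a raw parameter vector through the matrix representation. The helper dual_sin is hypothetical — in the class it would be a method returning a new Dual built from the resulting parameters:

```python
import numpy as np
from math import factorial

def dual_sin(params):
    # f(a·1 + D) = Σ_{k<n} f^(k)(a) D^k / k!,  here with f = sin
    n = len(params)
    a = params[0]
    # D: the strictly upper triangular part of the matrix representation
    D = np.zeros((n, n))
    for k, p in enumerate(params[1:], start=1):
        D += p * np.eye(n, k=k)
    cycle = [np.sin(a), np.cos(a), -np.sin(a), -np.cos(a)]  # derivatives of sin
    S = sum(cycle[k % 4] * np.linalg.matrix_power(D, k) / factorial(k)
            for k in range(n))
    return list(S[0])    # the first row carries the parameters of the result
```

For params = (a, 1, 0), i.e. the number $a + \epsilon$, this returns the Taylor coefficients $(\sin a, \cos a, -\sin a / 2)$ of sin around $a$.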
We have now all we need to attempt to auto-differentiate a function.
Let us take as an example the function
$$f(x) = \frac{\sin^2 x}{x^2 - x + 1},$$
with first derivative
$$f'(x) = \frac{\sin x \left( 2(x^2 - x + 1)\cos x + (1 - 2x)\sin x \right)}{(x^2 - x + 1)^2}.$$
Therefore, defining the function

def f(x):
    return x.sin() ** 2 / (x ** 2 - x + 1)
we can automatically obtain the first three derivatives on a given interval
x, recalling that the coefficient of $\epsilon^k$ is $f^{(k)}(x)/k!$. Thus:
x = np.linspace(0, 2, 100)
f_Dual = f(Dual(x, 1, 0, 0))   # evaluate f at x + ε
A comparison with the analytically derived first derivative
reveals an error of the order of machine precision:

df = (np.sin(x) * (2 * (x ** 2 - x + 1) * np.cos(x) + (1 - 2 * x) * np.sin(x))) / (x ** 2 - x + 1) ** 2
print('AD error: ', np.abs(df - f_Dual._params[1]).max())
# finite-difference comparison on a plain NumPy evaluation of f
print('FD error: ', np.abs(df - np.gradient(np.sin(x) ** 2 / (x ** 2 - x + 1), x)).max())

AD error: 2.220446049250313e-16
FD error: 0.020607168256352445
Fig.1: Plots of the function and its first three derivatives
computed with automatic differentiation.
import matplotlib.pyplot as plt
from math import factorial

for k in range(4):
    plt.plot(x, f_Dual._params[k] * factorial(k))
plt.show()
In Python one finds a number of pre-packaged libraries for automatic differentiation,
such as JAX, or within broader ML
libraries such as Aesara (ex-Theano), TensorFlow, and PyTorch.