LUCA MINGARELLI

Automatic Differentiation

Consider the Taylor expansion of a function f:
f(x+\epsilon) = f(x) + f'(x)\,\epsilon + \frac{1}{2!}\,f''(x)\,\epsilon^2 + O(\epsilon^3).
Assuming we can find a nilpotent object ϵ ≠ 0 such that ϵ² = 0, one could then reduce the above expression exactly to
f(x+\epsilon) = f(x) + f'(x)\,\epsilon.
How is this useful? Consider as an example the quadratic function f(x) = x², and assume we are interested in knowing the value of its derivative f′(p) at a point p. In this case we have:
f(p+\epsilon) = (p+\epsilon)^2 = p^2 + 2p\epsilon + \epsilon^2 = p^2 + 2p\epsilon = f(p) + f'(p)\,\epsilon.
Thus f′(p) = 2p.
Numbers of the form a + bϵ are known as dual numbers and accept a representation in the space of 2×2 matrices. In particular, any dual number a + bϵ has representation
a\,\mathbb{1} + b\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix},
\qquad \text{where} \quad
\epsilon = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}
satisfies the desired properties ϵ ≠ 0 and ϵ² = 0.
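As a quick numerical check of this representation (a minimal sketch using NumPy, separate from the implementation developed below), squaring the matrix p𝟙 + ϵ for f(x) = x² reproduces f(p) on the diagonal and f′(p) = 2p in the upper-right entry:
import numpy as np

eps = np.array([[0., 1.],
                [0., 0.]])       # matrix representation of ε: nonzero, yet ε² = 0
p = 3.0
M = p * np.eye(2) + eps          # the dual number p + ε

print(eps @ eps)                 # [[0. 0.], [0. 0.]]   ->  ε² = 0
print(M @ M)                     # [[9. 6.], [0. 9.]]   ->  f(p) = 9 on the diagonal, f'(p) = 2p = 6 above it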
More generally, higher n-th order derivatives can be computed as well. One needs however to consider a more general class of hyper-numbers of the form a₀ + a₁ϵ + ... + aₙϵⁿ, with ϵᵏ ≠ 0 for any k ≤ n and ϵⁿ⁺¹ = 0. Once again, these numbers accept a representation in the space of (n+1)×(n+1) matrices, \mathbb{R}^{(n+1)\times(n+1)}. In particular, the elements of the matrix ϵ are given by
\epsilon_{i,j} = \begin{cases} 1, & \text{if } j = i+1 \\ 0, & \text{otherwise} \end{cases}
so that, for example, for n = 2 one has
\epsilon = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}, \qquad
\epsilon^2 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \qquad
\epsilon^3 = 0,
for n = 3
\epsilon = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}, \qquad
\epsilon^2 = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}, \qquad
\epsilon^3 = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}, \qquad
\epsilon^4 = 0,
and so forth at higher orders.
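Before turning to an implementation, these generators are easy to build and check numerically, for instance with NumPy's np.eye and its diagonal offset k (a small standalone sketch):
import numpy as np

# Nilpotent generator for n = 3: ones on the first superdiagonal of a 4x4 matrix
eps = np.eye(4, k=1)

print(np.linalg.matrix_power(eps, 2))   # ones move to the second superdiagonal
print(np.linalg.matrix_power(eps, 3))   # a single one left in the top-right corner
print(np.linalg.matrix_power(eps, 4))   # the zero matrix: ε⁴ = 0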

A Python implementation

In light of this, the appeal of automatic differentiation should become apparent: it allows the exact numerical computation of a function's derivatives, without the need for symbolic manipulation of algebraic expressions.
Let us therefore write a Python class Dual implementing the algebras we have introduced above. One can represent the number a₀ + a₁ϵ + ... + aₙ₋₁ϵⁿ⁻¹ as the collection of n parameters aᵢ. In the following we shall refer to a hyper-number such that ϵⁿ = 0 as a number of order n. Let us start by writing a function _generate_epsilon to generate the matrix representations of each ϵⁱ:
import math
import numpy as np
import scipy.sparse as sp
from functools import lru_cache

@lru_cache()
def _generate_epsilon(n, diff_order=1):
    # n x n matrix representation of ε^diff_order:
    # ones on the diff_order-th superdiagonal, zeros elsewhere
    row = np.arange(n - diff_order)
    col = row + diff_order
    data = np.ones(n - diff_order)
    return sp.coo_matrix((data, (row, col)), shape=(n, n)).toarray()
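As a quick check, this reproduces the 3×3 matrices written out earlier:
print(_generate_epsilon(3, 1))
# [[0. 1. 0.]
#  [0. 0. 1.]
#  [0. 0. 0.]]
print(_generate_epsilon(3, 2))
# [[0. 0. 1.]
#  [0. 0. 0.]
#  [0. 0. 0.]]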
We can therefore start writing our Dual class, which will store the number's parameters aᵢ in the attribute _params, and the associated matrices ϵⁱ in eps_all:
class Dual:
    def __init__(self, *args):
        if not args:
            args = [0]
        self._params = args
        self.n = len(args)
        self.eps_all = [_generate_epsilon(self.n, k) for k in range(1, self.n)]
We will therefore be able to generate an instance of our number by passing the corresponding parameters as arguments. As an example, the number 1 + 2ϵ + 3ϵ² will be instantiated as Dual(1,2,3). In order to return a formatted representation of this object we can also override the magic method __repr__ as
class Dual:
    def __init__(self, *args): ...

    def __repr__(self):
        SUPERSCRIPTS = ["⁰", "¹", "²", "³", "⁴",
                        "⁵", "⁶", "⁷", "⁸", "⁹"]
        out = f'{self._params[0]}'
        for k, p in enumerate(self._params[1:]):
            exponent = (''.join(SUPERSCRIPTS[int(q)] for q in str(k + 1))
                        if k > 0 else '')
            out += f" + {p}ε{exponent}"
        return out
so that, as an example, the code Dual(1, 2, 3) will return 1 + 2ε + 3ε². More importantly however, we would like to access the full matrix representation of the number, as well as to be able to create a new instance from a matrix representation. This can be achieved by implementing the methods matrix_repr and from_matrix:
class Dual:
    def __init__(self, *args): ...
    def __repr__(self): ...

    def matrix_repr(self):
        # Full n x n matrix representation: a0·I + a1·ε + a2·ε² + ...
        return sum(p*ε for p, ε in zip(self._params[1:], self.eps_all)) + self._params[0]*np.eye(self.n)

    def from_matrix(self, M):
        # Read the parameters off the first row of the (upper-triangular Toeplitz) matrix
        M = np.asarray(M)
        self._params, self.n = M[0], M.shape[0]
        self.eps_all = [_generate_epsilon(self.n, k) for k in range(1, self.n)]
        return self
Now Dual(1,2,3).matrix_repr() will return
array([[1., 2., 3.],
       [0., 1., 2.],
       [0., 0., 1.]])
and an instance can also be created from a numpy.array or numpy.matrix (say M) as Dual().from_matrix(M).
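For example, assuming the methods above are collected into a single class definition, a round trip through the matrix representation gives back the same number:
d = Dual(1, 2, 3)
M = d.matrix_repr()
print(Dual().from_matrix(M))     # 1.0 + 2.0ε + 3.0ε²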
Let us now turn to implementing the algebra. Addition and subtraction operations are trivial:
class Dual:
    def __init__(self, *args): ...
    def __repr__(self): ...
    def matrix_repr(self): ...
    def from_matrix(self, M): ...

    def __add__(self, other):
        if isinstance(other, Dual):
            return Dual(*[p1 + p2 for p1, p2 in zip(self._params, other._params)])
        else:
            return Dual(*([self._params[0] + other] + list(self._params[1:])))


    def __radd__(self, other):
        return self + other

    def __neg__(self):
        return Dual(*[-p for p in self._params])

    def __sub__(self, other):
        return self.__add__(other.__neg__())

    def __rsub__(self, other):
        return -self + other
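For instance (again assuming the methods seen so far live in one class definition):
d1, d2 = Dual(1, 2, 3), Dual(4, 5, 6)
print(d1 + d2)      # 5 + 7ε + 9ε²
print(d1 - d2)      # -3 + -3ε + -3ε²
print(10 + d1)      # 11 + 2ε + 3ε²   (scalar addition via __radd__)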
Multiplication and division are instead slightly more involved. One straightforward option of course could be to let a matrix algebra package handle the operations, for example by defining multiplication as
def slow_multiplication(D1, D2):
    prod = D1.matrix_repr() @ D2.matrix_repr()
    return Dual().from_matrix(prod)
and similarly for division. However, this is slow, and we can do better. In particular, recalling the nilpotence of all ϵⁱ, we might realise that multiplying a number by ϵ simply amounts to shifting its parameter vector to the right. As an example, for the order-3 number (1 + 2ϵ + 3ϵ²)ϵ = ϵ + 2ϵ², so that the associated parameter vector transforms as (1, 2, 3) → (0, 1, 2). Similarly, multiplying by ϵ² entails the transformation (1, 2, 3) → (0, 0, 1), and so forth. Consequently, multiplication is easily and efficiently implemented as follows, without the need for any matrix algebra:
class Dual:
    def __init__(self, *args): ...
    def __repr__(self): ...
    def matrix_repr(self): ...
    def from_matrix(self, M): ...
    def __add__(self, other): ...
    def __radd__(self, other): ...
    def __neg__(self): ...
    def __sub__(self, other): ...
    def __rsub__(self, other):...

    def _mul_epsilon(self, k=1):
        if k==0: return self
        return Dual(*([0]*k + [p for p in self._params[:-k]]))

    def __mul__(self, other):
        if isinstance(other, Dual):
            return sum(self._mul_epsilon(k)*p2 for k, p2 in enumerate(other._params))
        else:
            return Dual(*[p * other for p in self._params])

    def __rmul__(self, other):
        return self * other
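A quick sanity check (same assumption of a fully assembled class) confirms that the shift-based product matches the matrix product:
d1, d2 = Dual(1, 2, 3), Dual(4, 5, 6)
print(d1 * d2)                        # 4 + 13ε + 28ε²
print(slow_multiplication(d1, d2))    # 4.0 + 13.0ε + 28.0ε²   (same result via matrix algebra)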
Division instead reduces to computing the inverse of an object of the class Dual. Once again, one could think of making use of matrix algebra, which would remind us that the inverse of a matrix might not exist in the general case. However, the reader might realise we are dealing with square upper-triangular matrices, which makes our task much easier. In any case, once again for the sake of efficiency, we shall try to avoid using matrix algebra. Here the crucial point is to notice that our numbers accept a matrix representation of the form a₀𝟙 + D, where the matrix D = a₁ϵ + ... + aₙ₋₁ϵⁿ⁻¹ is nilpotent, that is Dⁿ = 0. The inverse can therefore be written as the truncated geometric sum
(a_0\mathbb{1} + D)^{-1} = \frac{1}{a_0}\left(\mathbb{1} - \frac{D}{a_0} + \left(\frac{D}{a_0}\right)^{\!2} - \dots + \left(-\frac{D}{a_0}\right)^{\!n-1}\right).
Thus, introducing the power method __pow__, one can efficiently compute an inverse and implement the division operation between two numbers d₁, d₂ as d₁d₂⁻¹:
class Dual:
    def __init__(self, *args): ...
    def __repr__(self): ...
    def matrix_repr(self): ...
    def from_matrix(self, M): ...
    def __add__(self, other): ...
    def __radd__(self, other): ...
    def __neg__(self): ...
    def __sub__(self, other): ...
    def __rsub__(self, other):...
    def _mul_epsilon(self, k=1): ...
    def __mul__(self, other): ...
    def __rmul__(self, other): ...

    def __pow__(self, power):
        # Exponentiation by squaring; negative powers go through the inverse
        if power < 0:
            return self.__pow__(-power).inverse()
        elif power == 0:
            return Dual(*([1] + [0]*(self.n - 1)))
        elif power % 2 == 0:
            return (self*self).__pow__(power // 2)
        else:
            return self * self.__pow__(power - 1)

    def inverse(self):
        # Truncated geometric series: (a·1 + D)⁻¹ = (1/a)·Σₖ (−D/a)ᵏ, with D nilpotent
        a = self._params[0]
        D = Dual(*([0] + list(self._params[1:])))
        return sum((-D / a)**k for k in range(self.n)) / a

    def __truediv__(self, other):
        if isinstance(other, Dual):
            return self * other.inverse()
        else:
            return Dual(*[p / other for p in self._params])

    def __rtruediv__(self, other):
        return self.inverse() * other
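As a check, the series-based inverse agrees with the matrix inverse of the representation, and a number multiplied by its inverse returns the multiplicative identity:
d = Dual(2., 1., 3.)
print(d * d.inverse())                                   # 1.0 + 0.0ε + 0.0ε²
print(np.allclose(d.inverse().matrix_repr(),
                  np.linalg.inv(d.matrix_repr())))       # True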
Finally, noticing that a number a₀ + a₁ϵ + ... + aₙϵⁿ transforms under any function f as
f(a_0 + a_1\epsilon + \dots + a_n\epsilon^n) = f(a_0) + f'(a_0)\,a_1\epsilon + \dots + \frac{f^{(n)}(a_0)}{n!}\,a_n\epsilon^n,
other more complex functions, such as sin, cos, and exp, can be easily implemented as:
class Dual:
    def __init__(self, *args): ...
    def __repr__(self): ...
    def matrix_repr(self): ...
    def from_matrix(self, M): ...
    def __add__(self, other): ...
    def __radd__(self, other): ...
    def __neg__(self): ...
    def __sub__(self, other): ...
    def __rsub__(self, other):...
    def _mul_epsilon(self, k=1): ...
    def __mul__(self, other): ...
    def __rmul__(self, other): ...
    def __pow__(self, power): ...
    def inverse(self): ...
    def __truediv__(self, other): ...
    def __rtruediv__(self, other): ...

    def exp(self):
        # k-th parameter of exp: exp(a0) · a_k / k!
        return Dual(*([np.exp(self._params[0])] +
                      [np.exp(self._params[0]) * p / math.factorial(n + 1)
                       for n, p in enumerate(self._params[1:])]))

    def sin(self):
        # Successive derivatives of sin cycle through cos, -sin, -cos, sin
        f_list = [np.cos, lambda k: -np.sin(k), lambda k: -np.cos(k), np.sin]
        return Dual(*([np.sin(self._params[0])] +
                      [f(self._params[0]) * p / math.factorial(n + 1)
                       for n, (p, f) in enumerate(zip(self._params[1:],
                                                      f_list * self.n))]))

    def cos(self):
        return (np.pi/2 - self).sin()
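A minimal check of these rules at an arbitrary point x0 (chosen here as 0.7), assuming the methods above have been assembled into the full class:
x0 = 0.7
d = Dual(x0, 1., 1.)                                   # a number of order 3: x0 + ε + ε²

print(np.isclose(d.exp()._params[1], np.exp(x0)))      # ε-coefficient of exp:  (eˣ)'  = eˣ
print(np.isclose(d.exp()._params[2], np.exp(x0)/2))    # ε²-coefficient of exp: (eˣ)''/2!
print(np.isclose(d.sin()._params[1], np.cos(x0)))      # ε-coefficient of sin:  cos(x0)
print(np.isclose(d.cos()._params[1], -np.sin(x0)))     # ε-coefficient of cos:  -sin(x0)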
We have now all we need to attempt to auto-differentiate a function. Let us take as an example the function f(x) = sin²(x)/(x² − x + 1), with derivatives
f'(x) = \frac{\sin^2(x)\,(1-2x)}{(x^2-x+1)^2} + \frac{2\sin(x)\cos(x)}{x^2-x+1},
f''(x) = \frac{(6x^2-6x)\sin^2(x)}{(x^2-x+1)^3} - \frac{(4x-2)\sin(2x)}{(x^2-x+1)^2} + \frac{2\cos(2x)}{x^2-x+1},
f'''(x) = -\frac{6(2x-1)^3\sin^2(x)}{(x^2-x+1)^4} + \frac{12(2x-1)^2\sin(x)\cos(x) + (24x-12)\sin^2(x)}{(x^2-x+1)^3} + \frac{(6-12x)\left(\cos^2(x)-\sin^2(x)\right) - 12\sin(x)\cos(x)}{(x^2-x+1)^2} - \frac{8\sin(x)\cos(x)}{x^2-x+1},
and so on for higher orders.
Therefore, defining the function
def f(x):
    return x.sin()**2 / (x**2 - x + 1)
we can automatically obtain the first three derivatives over a grid of points x, recalling that the coefficient of ϵᵏ is f⁽ᵏ⁾(x)/k!. Thus:
x = np.linspace(0, 2, 100)
f_Dual = f(Dual(x, 1, 1, 1))
A comparison with the analytically derived first derivative reveals an error on the order of machine precision, while a naive finite-difference estimate obtained with np.gradient is many orders of magnitude less accurate:
df = (np.sin(x) * (2*(x**2 - x + 1)*np.cos(x) + (1 - 2*x)*np.sin(x))) / (x**2 - x + 1)**2
print('AD error: ', np.abs(df - f_Dual._params[1]).max())
print('FD error: ', np.abs(df - np.gradient(f_Dual._params[0], x)).max())   # finite differences on the same grid
AD error:  2.220446049250313e-16
FD error:  0.020607168256352445
Fig.1: Plots of the function f(x) and its first three derivatives computed with automatic differentiation.
import matplotlib.pyplot as plt

# Recover f(x), f'(x), f''(x), f'''(x) from the ε-coefficients (multiply by k!)
for k in range(4):
    plt.plot(x, f_Dual._params[k] * math.factorial(k))
plt.show()
In Python one finds a number of pre-packaged libraries for automatic differentiation, such as JAX, as well as implementations within broader machine-learning libraries such as Aesara (formerly Theano), TensorFlow, and PyTorch.
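As a point of comparison, the first derivative of the same function can be obtained with JAX (a brief sketch assuming JAX is installed; jax.grad differentiates a scalar function and jax.vmap maps it over the grid):
import jax
import jax.numpy as jnp

def f_jax(x):
    return jnp.sin(x)**2 / (x**2 - x + 1)

x_grid = jnp.linspace(0., 2., 100)
df_jax = jax.vmap(jax.grad(f_jax))(x_grid)    # first derivative evaluated on the grid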

