Copulae — Concepts and examples

Definitions and basic properties

Definition 1 — Copula
A \(d\)-dimensional copula is a multivariate cumulative distribution function \(C:[0,1]^d\rightarrow[0,1]\) with marginals which are uniformly distributed on the unit interval: \begin{equation} C(u_1, ..., u_d)=\mathbb{P}(U_1\le u_1, ..., U_d\le u_d), \end{equation} where \(U_j\sim\mathcal{U}[0,1], \ \forall j\).

Definition 2 — Copula density
Given a differentiable copula \(C\) the copula density \(c\) is defined by: \begin{equation} c(u_1, ..., u_d) = \frac{\partial^dC(u_1, ..., u_d)}{\partial u_1...\partial u_d}. \label{eq:copula_density} \end{equation}
It follows that the univariate marginals of \(C\) satisfy \(C(1, ..., 1, u_i, 1, ..., 1)=u_i\). Moreover, the copula function \(C\) is isotonic, that is, \(u_j\le v_j\ \forall j \implies C(u_1, ..., u_d)\le C(v_1, ..., v_d)\); in particular, \(C\) is non-decreasing in each component. Finally, notice that (under mild regularity conditions) one has the following proposition.

Proposition 3
Given a \(d\)-dimensional copula \(C\) one obtains the conditional cumulative distribution function \(F(\mathbf{u}|U_j=u_j)=\mathbb{P}(U_1\le u_1, ..., U_d\le u_d\ |\ U_j=u_j) \) as \begin{equation} F(\mathbf{u}|U_j=u_j)=\frac{\partial}{\partial u_j} C(u_1, ..., u_d). \label{eq:conditional_copula} \end{equation}

In order to better understand the significance of Definition 1, let us consider a two-dimensional random vector \((X_1, X_2)\) with continuous marginals \(X_j\sim F_j\) and joint CDF \(F\). By the probability integral transform, the variables \(U_j=F_j(X_j)\) are uniform on the unit interval, so that one can write \begin{equation} F(x_1, x_2)=C(F_1(x_1), F_2(x_2)). \label{eq1} \end{equation} Therefore, a copula is the recipe needed to link the marginals to the joint distribution.
Exercise 1
Show that \(U_j=F_j(X_j)\) implies \(U_j\sim\mathcal{U}[0,1]\).
Start by considering the CDF \(F_{U_j}\) of \(U_j\), assuming \(F_j\) is continuous and strictly increasing so that \(F_j^{-1}\) exists. Then: \begin{align*} F_{U_j}(u_j) &= \mathbb{P}(U_j\le u_j) \newline &= \mathbb{P}(F_j(X_j)\le u_j) \newline &= \mathbb{P}(X_j\le F_j^{-1}(u_j)) \newline &= F_j(F_j^{-1}(u_j)) = u_j. \end{align*} Since \(F_{U_j}(u_j) = u_j\) for \(u_j\in[0,1]\), this is the CDF of a uniform distribution on the unit interval. Therefore \(U_j\sim\mathcal{U}[0,1]\).
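The result can also be checked numerically. The following minimal sketch (the Exponential(1) marginal is an arbitrary choice) applies the probability integral transform to simulated data and compares a few empirical quantiles with those of \(\mathcal{U}[0,1]\):

import numpy as np
import scipy.stats as st

np.random.seed(0)

# draw from an arbitrary continuous marginal, here an Exponential(1)
x = st.expon.rvs(size=100_000)

# probability integral transform: U = F(X) should be uniform on [0, 1]
u = st.expon.cdf(x)

print(np.quantile(u, [0.1, 0.5, 0.9]))   # approximately [0.1, 0.5, 0.9]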

Exercise 2
Show that equation \eqref{eq1} holds.
Notice that \begin{align*} F(x_1, x_2) &= \mathbb{P}(X_1 \leq x_1, X_2 \leq x_2) \newline &= \mathbb{P}(U_1 \leq F_1(x_1), U_2 \leq F_2(x_2)) \newline &= C(F_1(x_1), F_2(x_2)). \end{align*}

Exercise 3
Consider two uniform random variables \(U_1\) and \(U_2\) with known differentiable copula \(C\). Assuming \(U_1=u_1\) is observed, prove \eqref{eq:conditional_copula} in Proposition 3 for this \(d=2\) case.
We have: \begin{align*} \mathbb{P}(U_2\le u_2\ |\ U_1=u_1) &= \lim_{\epsilon\rightarrow 0}\frac{\mathbb{P}(U_2\le u_2,\ U_1\in\ ]u_1-\epsilon, u_1+\epsilon])}{\mathbb{P}(U_1\in\ ]u_1-\epsilon, u_1+\epsilon])} \newline &= \lim_{\epsilon\rightarrow 0}\frac{C(u_1+\epsilon,u_2)-C(u_1-\epsilon,u_2)}{2\epsilon} \newline &= \frac{\partial}{\partial u_1}C(u_1, u_2). \end{align*}



Equation \eqref{eq1} can be formalised into the following theorem.

Theorem 4 — Sklar
Given a \(d\)-dimensional cumulative distribution function \(F\), with marginals \(F_j\), there exists a copula \(C\) such that \begin{equation} F(x_1, ..., x_d) = C(F_1(x_1), ..., F_d(x_d)), \label{eq:sklar} \end{equation} for all \(x_j\in\mathbb{R}\); if the marginals are continuous, this copula is unique. Conversely, given a copula \(C\) and a set of \(d\) univariate cumulative distribution functions \(F_1, ..., F_d\), equation \eqref{eq:sklar} uniquely defines a joint multivariate cumulative distribution function \(F\) with marginals \(F_1, ..., F_d\).

One has the following corollaries.

Corollary 4.1
From Sklar's theorem it follows that given a joint cumulative distribution function \(F\) and its marginals, the copula is defined by \begin{equation} C(u_1, ..., u_d) = F\left(F^{-1}_1(u_1), ..., F^{-1}_d(u_d)\right), \end{equation} where \(F^{-1}_j(u_j) = \inf\{x: F_j(x)\ge u_j\}\).

Corollary 4.2
From Sklar's theorem it follows that given the marginal cumulative distribution functions \(F_i\) and the associated marginal probability density functions \(f_i\), the joint probability density function \(f\) can be written as \begin{equation} f(x_1, ..., x_d) = c\left(F_1(x_1), ..., F_d(x_d)\right)\prod_{i=1}^df_i(x_i), \label{eq:density_from_sklar} \end{equation} where \(c\) is the copula density defined in \eqref{eq:copula_density}.
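As an illustration of Corollary 4.2, the following sketch checks numerically that, for a bivariate normal vector, the gaussian copula density evaluated at \((F_1(x_1), F_2(x_2))\) times the two standard normal marginal densities reproduces the joint density. The value of \(\rho\) and the evaluation point are arbitrary choices, and the well-known closed form of the bivariate gaussian copula density is used:

import numpy as np
import scipy.stats as st

rho = 0.5                          # arbitrary correlation
biv = st.multivariate_normal(mean=[0, 0], cov=[[1, rho], [rho, 1]])

def gauss_copula_density(u1, u2):
    # closed-form density of the bivariate gaussian copula with correlation rho
    z1, z2 = st.norm.ppf(u1), st.norm.ppf(u2)
    q = rho**2 * (z1**2 + z2**2) - 2 * rho * z1 * z2
    return np.exp(-q / (2 * (1 - rho**2))) / np.sqrt(1 - rho**2)

x1, x2 = 0.3, -1.2                 # arbitrary evaluation point
lhs = biv.pdf([x1, x2])                                         # joint density f(x1, x2)
rhs = (gauss_copula_density(st.norm.cdf(x1), st.norm.cdf(x2))   # c(F1(x1), F2(x2)) ...
       * st.norm.pdf(x1) * st.norm.pdf(x2))                     # ... times f1(x1) f2(x2)
print(lhs, rhs)                    # the two values coincide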

Fundamental copulae

Let us now consider three fundamental examples corresponding to the cases of perfect dependence and independence. As we shall discuss, these are extreme cases which provide bounds to any other copula, and constitute therefore preliminary definitions for some results which will be presented later. First we consider the case of independent random variables.

Definition 5 — Independence copula
The joint distribution of \(d\) independent random variables is defined by the independence copula \begin{equation} C_{\text{ind}}(u_1, ..., u_d) = \prod_{i=1}^d u_i. \label{eq:independence_copula} \end{equation}
When all the random variables considered are deterministically related \(X_j = T_{ji}(X_i)\) for some set of strictly increasing transformations \(T_{ji}=T_{ij}^{-1}\), one can derive a relation between each pair of cumulative distribution functions as
\[F_i(x) = \mathbb{P}(X_i\le x) = \mathbb{P}(T_{ji}(X_i)\le T_{ji}(x)) = \mathbb{P}(X_j\le T_{ji}(x)) = F_j(T_{ji}(x)),\]
which for all \(i,j\) implies \[U_i = F_i(X_i) = F_j(T_{ji}(X_i)) = F_j(X_j) = U_j.\] Similarly, consider two random variables such that \(X_2=D(X_1)\); for any strictly decreasing transformation \(D\) one has
\[F_1(x) = \mathbb{P}(X_1 \leq x) = \mathbb{P}(D(X_1) \geq D(x)) = \mathbb{P}(X_2 \geq D(x)) = 1 - F_2(D(x)),\]
which implies
\[U_1 = F_1(X_1) = 1-F_2(D(X_1)) = 1-F_2(X_2) = 1-U_2.\]
This motivates the definition of the following comonotonicity and counter-monotonicity copulae.

Definition 6 — Comonotonicity copula
The copula associated with \(d\) equal uniformly distributed variables \(U_i=U_j,\ \forall\ i,j\), is called a comonotonicity copula and takes the form \begin{equation} C_{\text{co}}(u_1, ..., u_d) = \min\{u_1, ..., u_d\}. \label{eq:comonotonicity_copula} \end{equation}
Definition 7 — Counter-monotonicity function
The copula associated with \(d=2\) uniformly distributed random variables \(U_2=1-U_1\) is called a counter-monotonicity copula and takes the form \begin{equation} W_{\text{counter}}(u_1, u_2) = \max\{u_1 + u_2 -1, 0\}. \label{eq:counter_monotonicity_copula} \end{equation} A generalisation of \eqref{eq:counter_monotonicity_copula} to arbitrary \(d\) is given by the counter-monotonicity function \begin{equation} W_{\text{counter}}(u_1, ..., u_d) = \max\{\sum_{i=1}^d u_i - d + 1, 0\}; \label{eq:counter_monotonicity_function} \end{equation} notice however that \eqref{eq:counter_monotonicity_function} is not a copula for \(d>2\).


Exercise 4
Show that given two independent random variables \(X_1\) and \(X_2\) the associated joint multivariate cumulative distribution function indeed takes the form \eqref{eq:independence_copula} from Definition 5.
The joint multivariate cumulative distribution function is \(F(x_1, x_2) = C(F_1(x_1), F_2(x_2))\). The independence of the two random variables \(X_1\perp\!\!\!\perp X_2\) implies that the associated uniform random variables \(U_j=F_j(X_j)\) are also independent: \(U_1 \perp\!\!\!\perp U_2\). Therefore, one has: \begin{align*} C(u_1, u_2) &= \mathbb{P}(U_1 \leq u_1, U_2 \leq u_2) \newline &= \mathbb{P}(U_1 \leq u_1) \mathbb{P}(U_2 \leq u_2) \newline &= u_1 u_2. \end{align*} Hence, \(F(x_1, x_2) = F_1(x_1)F_2(x_2)\).

Exercise 5
Show that Definition 6 is indeed the copula associated with comonotonic random variables \(U_1=U_2\) for the case \(d=2\).
Notice that \begin{align*} C(u_1, u_2) &= \mathbb{P}(U_1 \leq u_1, U_2 \leq u_2) \newline &= \mathbb{P}(U_1 \leq u_1, U_1 \leq u_2) \newline &= \mathbb{P}(U_1 \leq \min \{u_1, u_2\}) \newline &= \min \{u_1, u_2\}. \end{align*}

Exercise 6
Show that \eqref{eq:counter_monotonicity_copula} in Definition 7 is indeed the copula associated with two random variables related by \(U_2=1-U_1\).
Notice that \begin{align*} C(u_1, u_2) &= \mathbb{P}(U_1 \leq u_1, U_2 \leq u_2) \newline &= \mathbb{P}(U_1 \leq u_1, 1-U_1 \leq u_2) \newline &= \mathbb{P}(1 - u_2 \leq U_1 \leq u_1) \newline &= \max\{u_1 - (1 - u_2), 0\} \newline &= \max \{u_1 + u_2 - 1, 0\}. \end{align*}


As mentioned at the beginning of this section, these are extreme cases which are fundamental as they provide bounds to any other copula. This concept is formalised by the following theorem.

Theorem 8 — Fréchet-Hoeffding bounds
Given a copula \(C(\mathbf{u}) = C(u_1,..., u_d)\) one always has \begin{equation} W_{\text{counter}}(\mathbf{u})\le C(\mathbf{u}) \le C_{\text{co}}(\mathbf{u}). \label{eq:Frechet_Hoeffding_bounds} \end{equation}
Fig.1: The bivariate countermonotonicity, independence, and comonotonicity copulae. Every other copula lies between the countermonotonicity and comonotonicity bounds.
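A minimal numerical sanity check of \eqref{eq:Frechet_Hoeffding_bounds} for \(d=2\), using the Frank copula (introduced in the next section) with an arbitrary \(\theta=3\):

import numpy as np

def W(u1, u2):                   # countermonotonicity copula (lower bound for d = 2)
    return np.maximum(u1 + u2 - 1, 0)

def M(u1, u2):                   # comonotonicity copula (upper bound)
    return np.minimum(u1, u2)

def frank(u1, u2, theta=3.0):    # an arbitrary copula on which to test the bounds
    return -np.log1p(np.expm1(-theta*u1) * np.expm1(-theta*u2) / np.expm1(-theta)) / theta

u = np.linspace(0.01, 0.99, 99)
U1, U2 = np.meshgrid(u, u)
C = frank(U1, U2)
print(np.all(W(U1, U2) <= C) and np.all(C <= M(U1, U2)))   # True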
Finally, let us consider the following proposition.

Proposition 9
Consider a random vector \(\mathbf{X}\) with copula \(C\), and let \(T_1, ..., T_d\) be a set of \(d\) strictly increasing transformations. Then, the copula of the transformed vector \((T_1(X_1), ..., T_d(X_d))\) is also \(C\).


Parametric families of copulae

The following table provides some examples of notable parametric families.

Family CDF Parameter
Elliptical copulae
Gaussian \(\Phi_2(\Phi^{-1}(u_1),\ \Phi^{-1}(u_2);\ \rho)\) \(\rho \in [-1,1] \)
Student-t \(T_{2,\nu}\left(T_{1,\nu}^{-1}(u_1),\ T_{1,\nu}^{-1}(u_2);\ \rho\right)\) \(\rho \in [-1,1],\ \nu>0 \)
Archimedean copulae
Clayton \( \max\left\{u_1^{-\theta}+u_2^{-\theta} - 1,\ 0\right\}^{-1/\theta} \) \(\theta \in [-1, \infty[ \backslash \{0\} \)
Gumbel \( \exp \left( -\left( (-\log u_1)^{\theta} + (-\log u_2)^{\theta} \right)^{1/\theta} \right) \) \(\theta \in [1, \infty[ \)
Joe \( 1 - \left( (1-u_1)^{\theta} + (1-u_2)^{\theta} - (1-u_1)^{\theta} (1-u_2)^{\theta} \right)^{1/\theta} \) \(\theta \in [1, \infty[ \)
Frank \(-\theta^{-1} \log \left( \frac{1 - e^{-\theta} - (1 - e^{-\theta u_1}) (1 - e^{-\theta u_2})} {1-e^{-\theta}} \right) \) \(\theta\in\mathbb{R}\backslash \{0\}\)

We can recognise two main classes of copulae: elliptical and Archimedean. Elliptical copulae are implicit copulae arising from elliptical distributions (such as the normal, Student-t, and mixtures of normal distributions) via Sklar's theorem. Elliptical copulae have the advantage of allowing for flexible modelling of pairwise dependence, are simple to sample from, and their densities are usually known explicitly. On the other hand, the copula itself is not available in explicit form, and it is characterised by radial symmetry, which implies identical lower and upper tail behaviour, as we shall discuss.

Archimedean copulae instead are explicit copulae which can be written in terms of a continuous, strictly decreasing generator \(\psi:[0,1]\rightarrow [0,\infty]\) such that \(\psi(1)=0\). The associated Archimedean copula is then written as \begin{equation} C(u_1, ..., u_d) = \psi^{[-1]}\left(\sum_{i=1}^d\psi(u_i)\right), \end{equation} with the pseudo-inverse \(\psi^{[-1]}(x)=\psi^{-1}(x)\mathbb{1}_{x\le\psi(0)}\) (equal to \(0\) for \(x>\psi(0)\)). As an example, the generators of the Archimedean copulae in the table above are: \begin{align*} \psi_{\text{Clayton}}(x) &= \theta^{-1}(x^{-\theta}-1),\newline \psi_{\text{Gumbel}}(x) &= (-\log x)^\theta,\newline \psi_{\text{Joe}}(x) &= -\log(1-(1-x)^\theta),\newline \psi_{\text{Frank}}(x) &= -\log\frac{e^{-\theta x}-1}{e^{-\theta}-1}. \end{align*} The explicit nature of Archimedean copulae makes them particularly suitable for closed-form calculations, allowing many of their properties to be computed explicitly in terms of the generator \(\psi\). Moreover, they are not restricted to radial symmetry.
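As a minimal illustration of the generator mechanism, the following sketch rebuilds the bivariate Clayton copula from its generator and pseudo-inverse and compares it with the closed form given in the table above (\(\theta=2\) and the evaluation point are arbitrary choices; for \(\theta>0\) the generator is strict, so the pseudo-inverse coincides with the ordinary inverse):

import numpy as np

theta = 2.0                                  # arbitrary Clayton parameter, theta > 0

def psi(u):                                  # Clayton generator
    return (u**(-theta) - 1) / theta

def psi_inv(x):                              # pseudo-inverse (exact inverse for theta > 0)
    return (1 + theta * x) ** (-1/theta)

def clayton_from_generator(u1, u2):
    return psi_inv(psi(u1) + psi(u2))

def clayton_closed_form(u1, u2):
    return np.maximum(u1**(-theta) + u2**(-theta) - 1, 0) ** (-1/theta)

u1, u2 = 0.3, 0.7                            # arbitrary evaluation point
print(clayton_from_generator(u1, u2), clayton_closed_form(u1, u2))   # identical values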
Fig.2: Bivariate probability density functions with normal marginals and selected copulae.

import numpy as np, matplotlib.pyplot as plt, seaborn as sns
import scipy.stats as st
sns.set()

n = 1000
x1 = x2 = np.linspace(-2.7, 2.7, n)
dx = x1[1] - x1[0]  # grid spacing, used below to normalise the finite differences
u1, u2 = st.norm.cdf(x1), st.norm.cdf(x2)

# INDEPENDENCE COPULA
def Ind_cop(u1, u2):
    U1, U2 = np.meshgrid(u1, u2)
    return U1 * U2

# GAUSSIAN COPULA
def Gauss_cop(u1, u2, rho=0.5):
    """Rho in [0, 1]"""
    iU1, iU2 = np.meshgrid(st.norm.ppf(u1), st.norm.ppf(u2))
    jnt = st.multivariate_normal.cdf(np.array([iU1, iU2]).T,
                                     mean=[0, 0], cov=[[1, rho], [rho, 1]])
    return jnt

# CLAYTON COPULA
def Clayton_cop(u1, u2, theta=1.):
    """Theta in [-1, inf[ \ {0}"""
    U1, U2 = np.meshgrid(u1, u2)
    jnt = (U1**(-theta) + U2**(-theta) - 1).clip(min=0) ** (-1/theta)
    return jnt

# GUMBEL COPULA
def Gumbel_cop(u1, u2, theta=1.):
    """Theta in [1, inf["""
    U1, U2 = np.meshgrid(u1, u2)
    jnt = np.exp(-((-np.log(U1))**theta
                   +(-np.log(U2))**theta)**(1/theta))
    return jnt

# JOE COPULA
def Joe_cop(u1, u2, theta=1.):
    """Theta in [1, inf["""
    U1, U2 = np.meshgrid(u1, u2)
    jnt = 1 - ((1-U1)**theta + (1-U2)**theta - ((1-U1)*(1-U2))**theta) ** (1/theta)
    return jnt

# FRANK COPULA
def Frank_cop(u1, u2, theta=1.):
    """Theta in [-inf, inf[ \ {0}"""
    U1, U2 = np.meshgrid(u1, u2)
    jnt = -(1/theta) * np.log(1+((np.exp(-theta*U1) - 1)*(np.exp(-theta*U2) - 1))/(np.exp(-theta) - 1))
    return jnt

J = Clayton_cop(u1, u2, theta=5)
# second-order finite differences of the CDF approximate the joint density on the x-grid
dJ = (np.diff(np.diff(J).T).T / dx**2).clip(min=1e-10)
plt.contour(x1[1:], x2[1:], dJ, 10)
plt.show()


Exercise 7
Given a vector \((X_1, X_2)\) from the standard bivariate normal distribution with correlation \(\rho\), derive the gaussian copula presented in the table above.
The associated copula is given by \begin{align*} C(u_1, u_2) &= \mathbb{P}(U_1 \leq u_1, U_2 \leq u_2) \newline &= \mathbb{P}(X_1 \leq \Phi^{-1}(u_1), X_2 \leq \Phi^{-1}(u_2)) \newline &= \Phi_2(\Phi^{-1}(u_1), \Phi^{-1}(u_2); \rho). \end{align*}

Exercise 8
Show that the limiting copula of the Clayton copula \(C_\theta^{\text{Clayton}}(u_1, u_2)\) for \(\theta\rightarrow\infty\) is the comonotonicity copula \(C_{\text{co}}\) and for \(\theta\rightarrow -1\) is the countermonotonicity copula \(W_{\text{counter}}\). Moreover show that for \(\theta\rightarrow 0\) Clayton's copula tends to the independence copula.
The case \(\theta\rightarrow -1\) is trivial.

Let's now consider the case \(\theta\rightarrow \infty\). Assume \(u_1 \le u_2 \) in \([0,1]\). Then \[ u_1^{-\theta} \le \max\left\{u_1^{-\theta}+u_2^{-\theta} - 1,\ 0\right\} \le 2u_1^{-\theta}. \] This in turn implies \[ 2^{-1/\theta}u_1\le C_\theta^{\text{Clayton}}(u_1, u_2) \le u_1. \] Hence, one has \[ \lim_{\theta\rightarrow \infty}C_\theta^{\text{Clayton}}(u_1, u_2) = \min\{u_1, u_2\} = C_{\text{co}}(u_1, u_2). \]

Let's now consider the case \(\theta\rightarrow 0\). For \(u_1, u_2 \in\ ]0,1]\) one has \[ u_j^{-\theta} = e^{-\theta\log u_j} = 1-\theta\log u_j +o(\theta) \quad \text{as } \theta\rightarrow 0. \] Hence \begin{align*} \log C_\theta^{\text{Clayton}}(u_1, u_2) &= -\frac{1}{\theta}\log(1-\theta\log(u_1u_2) + o(\theta)) \newline &\xrightarrow[]{\theta\rightarrow 0}\log(u_1u_2), \end{align*} which is the logarithm of the independence copula.
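These limits can also be checked numerically; a rough sketch (the evaluation point and parameter values are arbitrary):

import numpy as np

def clayton(u1, u2, theta):
    return np.maximum(u1**(-theta) + u2**(-theta) - 1, 0) ** (-1/theta)

u1, u2 = 0.3, 0.7
print(clayton(u1, u2, theta=100))      # ~ 0.3,  i.e. min(u1, u2): comonotonicity
print(clayton(u1, u2, theta=1e-6))     # ~ 0.21, i.e. u1 * u2: independence
print(clayton(u1, u2, theta=-0.999))   # ~ 0.0,  i.e. max(u1 + u2 - 1, 0): countermonotonicity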


Simulating from Copulae

Let us now consider the essential recipes for drawing uniform random vectors \((U_1, ..., U_d)\) with a desired copula and uniform marginals, which can be useful e.g. to perform Monte Carlo simulations. Clearly, arbitrary marginals can be obtained via the quantile transformation as dictated by the probability integral transform discussed earlier.

Implicit copulae

For implicit copulae derived from a multivariate distribution such as the Gaussian and Student-t the procedure is quite straightforward.

Algorithm 10 — Gaussian Copula
  1. Given a correlation matrix \(\Pi\) perform a Cholesky decomposition, that is, find a lower-triangular \(\Lambda\) such that \(\ \Pi = \Lambda\Lambda^T\);
  2. Generate iid standard normals \(\mathbf{G}= (G_1, ..., G_d)^T\);
  3. Compute \(\ \mathbf{X} = \Lambda\mathbf{G}\);
  4. Return the desired vector of \(\ U_i = \Phi(X_i)\), with \(\Phi\) the standard normal cumulative distribution function.
Algorithm 11 — Student-t Copula
  1. Given a correlation matrix \(\ \Pi\), generate a normal vector \(\mathbf{X}\) performing steps (1) to (3) from Algorithm 10;
  2. Generate \(\xi=\sum_{i=1}^\nu Y_i^2\), with \(Y_i \stackrel{\text{iid}}{\sim} \mathcal{N}(0,1)\) independent of \(\mathbf{X}\), so that \(\xi\sim\chi_\nu^2\);
  3. Return the desired vector \(\ U_i = t_\nu(X_i/\sqrt{\xi/\nu})\), with \(t_\nu\) the univariate Student-t cumulative distribution function.
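A minimal NumPy sketch of both algorithms follows; the correlation matrix, sample size and \(\nu\) are arbitrary choices, and \(\xi\) is drawn directly from a \(\chi^2_\nu\) distribution, which is equivalent to summing \(\nu\) squared standard normals:

import numpy as np
import scipy.stats as st

np.random.seed(0)
N, nu = 100_000, 4
Pi = np.array([[1.0, 0.75],
               [0.75, 1.0]])                     # correlation matrix

# Algorithm 10 -- Gaussian copula
L = np.linalg.cholesky(Pi)                       # lower-triangular factor, Pi = L L^T
G = np.random.standard_normal(size=(2, N))       # iid standard normals
X = L @ G                                        # correlated normals
U_gauss = st.norm.cdf(X)                         # uniforms with a Gaussian copula

# Algorithm 11 -- Student-t copula
xi = st.chi2.rvs(df=nu, size=N)                  # chi-squared mixing variable, one per sample
U_t = st.t.cdf(X / np.sqrt(xi / nu), df=nu)      # uniforms with a Student-t copula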

Explicit copulae

The Archimedean copulae we have considered are actually part of a class of copulae known as Laplace Transform Archimedean copulae (LT-Archimedean for short), for which the pseudo-inverse of the generator \(\psi^{[-1]}\) can be represented as a Laplace transform of some distribution function \(G\): \begin{equation} \psi^{[-1]}(x) = \int_0^\infty e^{-tx}\text{d}G(t). \label{eq:LT-A} \end{equation} More specifically, one can show [4] that the completely monotone pseudo-inverses are exactly the Laplace-Stieltjes transforms of distribution functions on \(\mathbb{R}^+\). With this in mind, we are now ready to consider an algorithm to sample from such copulae.

Algorithm 12 — LT-Archimedean Copulae
Given an LT-Archimedean copula with generator \(\psi\),
  1. Find a CDF \(G\) which satisfies \eqref{eq:LT-A};
  2. Generate a variate \(V\sim G\);
  3. Generate uniform iid \(X_1, ..., X_d\);
  4. Return the vector with elements \(U_i = \psi^{[-1]}\left(-\frac{\log X_i}{V}\right)\).
See [1] and [5] for more details.
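As an illustration of Algorithm 12, for the Frank copula with \(\theta>0\) one can take \(G\) to be the logarithmic (log-series) distribution with parameter \(1-e^{-\theta}\), whose Laplace transform is the Frank pseudo-inverse \(\psi^{[-1]}(t)=-\theta^{-1}\log\left(1-(1-e^{-\theta})e^{-t}\right)\) (see [1], [5]). A minimal sketch, with an arbitrary \(\theta=4\):

import numpy as np
import scipy.stats as st

np.random.seed(0)
N, d, theta = 100_000, 2, 4.0

# steps 1-2: V ~ G, here the log-series distribution with p = 1 - exp(-theta)
V = st.logser.rvs(p=1 - np.exp(-theta), size=N)

# step 3: iid uniforms
X = st.uniform.rvs(size=(d, N))

# step 4: U_i = psi^[-1](-log(X_i) / V), using exp(log(X_i) / V) = X_i**(1/V)
U = -np.log1p(-(1 - np.exp(-theta)) * X**(1 / V)) / theta   # shape (d, N), Frank-coupled uniforms

The Clayton case, with \(G\) a Gamma distribution, is used in Example 3 below.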
For the special case of \(d=2\) one has however a simpler approach.

Algorithm 13 — 2D Explicit Copulae
Consider the conditional distribution \(F_{2|1}(u_2|u_1)=\mathbb{P}(U_2\le u_2\ |\ U_1=u_1)\). Recalling \eqref{eq:conditional_copula} from Proposition 3, this is equivalent to \begin{equation*} F_{2|1}(u_2|u_1) = c_{1}(u_2|u_1) = \frac{\partial}{\partial u_1} C(u_1, u_2). \end{equation*} Therefore one obtains the desired \((U_1,\ U_2)\) via the following algorithm (see also the sketch below):
  1. Sample \(u_1, t \overset{\text{iid}}{\sim} \mathcal{U}[0,1]\);
  2. Compute \(u_2 = c_{1}^{[-1]}(t|u_1)\), i.e. invert the conditional cumulative distribution function for fixed \(u_1\).
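For the Clayton copula the conditional distribution can be inverted in closed form, which gives the following minimal sketch of Algorithm 13 (\(\theta=2\) is an arbitrary choice; Kendall's tau, which for the Clayton copula equals \(\theta/(\theta+2)\), is used as a quick check):

import numpy as np
import scipy.stats as st

np.random.seed(0)
N, theta = 100_000, 2.0

# step 1: two independent uniforms
u1, t = np.random.uniform(size=(2, N))

# step 2: invert t = dC/du1 of the Clayton copula with respect to u2
u2 = (1 + u1**(-theta) * (t**(-theta / (1 + theta)) - 1)) ** (-1/theta)

# quick check: Kendall's tau should be close to theta / (theta + 2) = 0.5
tau, _ = st.kendalltau(u1[:5_000], u2[:5_000])
print(tau)   # ~ 0.5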


Example 1: gaussian copula and gaussian marginals
The simplest example is sampling from a gaussian joint distribution, since multivariate normal sampling is readily available in most libraries. Fig.3 below presents an example where the marginals follow standard normal distributions and the gaussian copula has linear correlation \(\rho=0.75\).
Fig.3: Joint distribution with gaussian copula and standard normal marginals.

import pandas as pd, matplotlib.pyplot as plt, seaborn as sns
import scipy.stats as st
N = 1_000_000
mvnorm = st.multivariate_normal(mean=[0, 0], cov=[[1., 0.75],
                                                  [0.75, 1.]])
X = pd.DataFrame(mvnorm.rvs(N), columns=['X1', 'X2'])

jnt_kw = dict(height=5, ratio=5, space=0,
              color="tab:blue", alpha=0)

h = (sns.jointplot(data=X, x='X1', y='X2', marginal_kws=dict(bins=30),
                   xlim=(-3,3), ylim=(-3,3), kind='hex', gridsize=2, **jnt_kw)
        .plot_joint(sns.kdeplot, cmap='viridis'))
plt.show()
Example 2: gaussian copula and non-gaussian marginals
In order to produce samples from a joint distribution with gaussian copula but non-gaussian marginals, we first need to obtain the copula's uniform marginals \(U_i = \Phi(X_i)\).
Fig.4: Gaussian copula with its uniform marginals.

import pandas as pd, matplotlib.pyplot as plt, seaborn as sns
import scipy.stats as st
N = 1_000_000
mvnorm = st.multivariate_normal(mean=[0, 0], cov=[[1., 0.75],
                                                  [0.75, 1.]])
X = pd.DataFrame(mvnorm.rvs(N), columns=['X1', 'X2'])

jnt_kw = dict(height=5, ratio=5, space=0,
              color="tab:blue", alpha=0)

U = pd.DataFrame(st.norm().cdf(X), columns=['U1', 'U2'])

h = (sns.jointplot(data=U, x='U1', y='U2', marginal_kws=dict(bins=30),
                   xlim=(0,1), ylim=(0,1),kind='hex', gridsize=2, **jnt_kw)
        .plot_joint(plt.hexbin, gridsize=30, cmap='Blues'))
plt.show()


Finally, the copula's uniforms can be transformed to the desired marginals. Take as an example \(\widetilde{X}_1\sim \text{Beta}(\alpha=3, \beta=10)\) and \(\widetilde{X}_2\sim \text{Gumbel}(\mu=0, \beta=1)\): Figure 5 below plots the joint density with these marginals and a gaussian copula with \(\rho=0.75\), while Figure 6 plots the same two marginals with the independence copula.
Fig.5: Gaussian copula with \(\text{Beta}(\alpha=3, \beta=10)\) and \(\text{Gumbel}(\mu=0,\beta=1)\) marginals.
Fig.6: Independence copula with \(\text{Beta}(\alpha=3, \beta=10)\) and \(\text{Gumbel}(\mu=0,\beta=1)\) marginals.

import pandas as pd, matplotlib.pyplot as plt, seaborn as sns
import scipy.stats as st
N = 1_000_000
mvnorm = st.multivariate_normal(mean=[0, 0], cov=[[1., 0.75],
                                                  [0.75, 1.]])
X = pd.DataFrame(mvnorm.rvs(N), columns=['X1', 'X2'])

jnt_kw = dict(height=5, ratio=5, space=0,
              color="tab:blue", alpha=0)

U = pd.DataFrame(st.norm().cdf(X), columns=['U1', 'U2'])

# With Gaussian Copula
x1_t = st.beta(a=3, b=10).ppf(U.U1)
x2_t = st.gumbel_l().ppf(U.U2)
X_trans = pd.DataFrame([x1_t, x2_t], index=['XT1', 'XT2']).T

h = (sns.jointplot(data=X_trans, x='XT1', y='XT2', kind='hex', gridsize=2,
                   marginal_kws=dict(bins=30),
                   xlim=(0, 0.6), ylim=(-5, 2.0),
                   **jnt_kw)
        .plot_joint(sns.kdeplot, cmap='viridis'))
plt.show()

# With Independence Copula
x1, x2 = st.beta(a=3, b=10).rvs(N), st.gumbel_l().rvs(N)
h = (sns.jointplot(x=x1, y=x2, kind='hex', marginal_kws=dict(bins=30),
                   xlim=(0,0.6), ylim=(-5,2.0),
                   **jnt_kw)
        .plot_joint(sns.kdeplot, cmap='viridis'))
plt.show()
Example 3: non-gaussian copula and non-gaussian marginals
Finally, let us repeat the previous example, replacing the gaussian copula with e.g. a Clayton copula. One can show [5] that the correct cumulative distribution function \(G\) in this case is that of a Gamma distribution \(\Gamma(\theta^{-1},1)\).
Fig.7: Clayton copula with \(\theta=2\).
Fig.8: Clayton copula with \(\text{Beta}(\alpha=3, \beta=10)\) and \(\text{Gumbel}(\mu=0,\beta=1)\) marginals.

import pandas as pd, numpy as np, matplotlib.pyplot as plt, seaborn as sns
import scipy.stats as st

jnt_kw = dict(height=5, ratio=5, space=0,
              color="tab:blue", alpha=0)

N = 1_000_000
θ = 2
def clayton(θ, N, d=2):
    """Sample N points from the d-dimensional Clayton copula
    via Algorithm 12 with G = Gamma(1/θ, 1)."""
    Γ = st.gamma.rvs(a=1/θ, scale=1, size=N)     # V ~ Gamma(1/θ, 1)
    X = st.uniform.rvs(size=(d, N))              # iid uniforms
    return (1 - np.log(X) / Γ) ** (-1/θ)         # ψ^[-1](-log X / V)

U1, U2 = clayton(θ=θ, N=N, d=2)
U = pd.DataFrame([U1, U2], index=['U1', 'U2']).T

h = (sns.jointplot(data=U, x='U1', y='U2', marginal_kws=dict(bins=30),
                   xlim=(0,1), ylim=(0,1),
                   kind='hex', gridsize=2, **jnt_kw)
        .plot_joint(plt.hexbin, gridsize=30, cmap='Blues'))
plt.show()


x1_t = st.beta(a=3, b=10).ppf(U.U1)
x2_t = st.gumbel_l().ppf(U.U2)
X_trans = pd.DataFrame([x1_t, x2_t], index=['XT1', 'XT2']).T

h = (sns.jointplot(data=X_trans, x='XT1', y='XT2', kind='hex', gridsize=2,
                   marginal_kws=dict(bins=30),
                   xlim=(0, 0.6), ylim=(-6, 2.0),
                   ** jnt_kw)
        .plot_joint(sns.kdeplot, cmap='viridis'))
plt.show()


In Python, the package OpenTURNS provides a fast implementation of copulae. As an example, one can perform sampling from the Clayton copula we have considered so far simply as follows:
import openturns as ot, numpy as np

X = ot.ComposedDistribution([ot.Uniform(0, 1),
                             ot.Uniform(0, 1)],
                            ot.ClaytonCopula(2))

U1, U2 = np.array(X.getSample(1_000_000)).T
As usual the two uniform marginals can now be transformed into whichever marginals one might desire.

Estimation: Fitting Copulae to Data

The task of fitting a copula to data is a rather challenging one. Generally speaking, one approach could follow a Maximum Likelihood Estimation principle. For example, given the random vector \(\mathbf{X}=(X_1, ..., X_d)\), and assuming parametric models for the copula density \(c_{\mathbf{X}}(\cdot|\mathbf{\theta}_C)\) and for the marginal cumulative distribution functions \(F_1(\cdot|\mathbf{\theta}_1), ..., F_d(\cdot|\mathbf{\theta}_d)\), one can write an expression for the density of \(\mathbf{X}\) as in \eqref{eq:density_from_sklar}: \[ f_\mathbf{X}(\mathbf{x})=c_{\mathbf{X}}(F_{X_1}(x_1), ..., F_{X_d}(x_d))\prod_{i=1}^d f_{X_i}(x_i). \] Given a vector of \(n\) observations \(\mathbf{D}_{1:n}=(\mathbf{D}_1, ..., \mathbf{D}_n)\) one can then attempt to maximise the log-likelihood
\begin{align*} \log L(\mathbf{\theta}_1, ..., \mathbf{\theta}_d, \mathbf{\theta}_C) &= \log\prod_{i=1}^n f_{\mathbf{X}}(\mathbf{D}_i) \newline &= \sum_{i=1}^n\Big( \sum_{j=1}^d\log f_\mathbf{X_j}(D_{i,j}|\mathbf{\theta}_j) \newline &\phantom{=\sum_{i=1}^n\Big(l} +\log c_{\mathbf{X}}\Big(F_{X_1}(D_{i,1}|\mathbf{\theta}_1),..., F_{X_d}(D_{i,d}|\mathbf{\theta}_d)\Big|\mathbf{\theta}_C\Big) \Big), \end{align*}
so as to obtain the ML estimators \(\hat{\mathbf{\theta}}_1, ..., \hat{\mathbf{\theta}}_d, \hat{\mathbf{\theta}}_C\). The problem with this approach is the potentially very large number of parameters to be estimated, which complicates the optimisation. Moreover, the misspecification of a single univariate distribution \(F_{X_i}(\cdot|\mathbf{\theta}_i)\) can introduce biases for all other marginals as well as for the copula.
One alternative is provided by pseudo-MLE, whereby one splits the estimation into a two-step procedure, estimating the marginals first and the copula in a second stage. In particular, one starts by estimating the marginal cumulative distribution functions \(F_{X_i}(\cdot|\mathbf{\theta}_i)\), yielding the estimators \(\hat{F}_{X_i}\), either using a parametric model to obtain a univariate ML-estimator for \(\hat{\mathbf{\theta}}_i\), or even considering the empirical cumulative distribution function. Then, one estimates the copula in a second step, maximising the copula part of the log-likelihood
\[ \log L(\mathbf{\theta}_C) = \sum_{i=1}^n \log\left(c_\mathbf{X}\big(\hat{F}_{X_1}(D_{i,1}), ..., \hat{F}_{X_d}(D_{i,d})\ \big|\ \mathbf{\theta}_C\big)\right). \]
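A minimal pseudo-MLE sketch for the bivariate Clayton copula: the data are simulated with an arbitrary \(\theta=2\), the marginals are estimated with the empirical cumulative distribution functions (a rank transform), and the Clayton copula log-likelihood is then maximised with scipy:

import numpy as np
import scipy.stats as st
from scipy.optimize import minimize_scalar

np.random.seed(0)
N, theta_true = 10_000, 2.0

# simulate Clayton-coupled data (Marshall-Olkin recipe, see Example 3 above)
V = st.gamma.rvs(a=1/theta_true, scale=1, size=N)
X = st.uniform.rvs(size=(2, N))
D = (1 - np.log(X) / V) ** (-1/theta_true)

# step 1: pseudo-observations from the empirical CDFs (rank transform)
U1, U2 = st.rankdata(D, axis=1) / (N + 1)

def neg_loglik(theta):
    # negative log-likelihood of the Clayton copula density, theta > 0
    A = U1**(-theta) + U2**(-theta) - 1
    logc = (np.log(1 + theta) - (theta + 1) * (np.log(U1) + np.log(U2))
            - (2 + 1/theta) * np.log(A))
    return -logc.sum()

# step 2: maximise over the copula parameter only
res = minimize_scalar(neg_loglik, bounds=(0.01, 20), method='bounded')
print(res.x)   # ~ 2, close to the true theta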

Yet another alternative is moment-matching.

Proposition 14
For a random vector \(\mathbf{X}=(X_1, ..., X_d)\) with a Gaussian or Student-t copula with correlation matrix \(\Omega\), the following holds: \begin{equation} \rho_\tau(X_i, X_j) = \frac{2}{\pi}\arcsin(\Omega_{ij}), \end{equation} where \(\rho_\tau\) denotes Kendall's tau.
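Inverting the relation gives a simple moment-matching estimator \(\hat{\Omega}_{ij}=\sin(\pi\hat{\rho}_\tau/2)\); a minimal sketch on simulated gaussian data with an arbitrary \(\rho=0.75\):

import numpy as np
import scipy.stats as st

np.random.seed(0)
rho_true = 0.75
X = st.multivariate_normal(mean=[0, 0], cov=[[1, rho_true],
                                             [rho_true, 1]]).rvs(10_000)

# estimate Kendall's tau and invert Proposition 14
tau, _ = st.kendalltau(X[:, 0], X[:, 1])
print(np.sin(np.pi * tau / 2))   # ~ 0.75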

A number of non-parametric methods exist as well, such as the empirical Bernstein copula [6] and kernel smoothing estimators. The Python package OpenTURNS provides utilities to fit these non-parametric copula models.

Estimation in practice: parametric estimates

Practically, Python has a number of packages (although still in their infancy) which implement similar methods. Popular ones are Copulas, Copulae and OpenTURNS. Let's consider a few examples with OpenTURNS. The simplest case is that of estimating the gaussian copula associated with a multivariate normal distribution, which effectively boils down to estimating the correlation matrix. The code
import openturns as ot, numpy as np, scipy.stats as st

np.random.seed(seed=0)
mvnorm = st.multivariate_normal(mean=[0, 0], cov=[[1., 0.75],
                                                  [0.75, 1.]])
X = mvnorm.rvs(10_000)

dist = ot.NormalCopulaFactory().build(X)
print(dist)
returns
      NormalCopula(R = [[ 1        0.754492 ]
                        [ 0.754492 1        ]])
The parameters can then be extracted as
print(dist.getParameter())
which returns
      [0.754492]
Similarly, for data with gaussian copula but non-gaussian marginals:
import openturns as ot, numpy as np, scipy.stats as st

np.random.seed(seed=0)
mvnorm = st.multivariate_normal(mean=[0, 0], cov=[[1., 0.75],
                                                  [0.75, 1.]])
X = mvnorm.rvs(10_000)
U = st.norm().cdf(X)
X_trans = np.array([st.beta(a=3, b=10).ppf(U[:,0]),
                    st.gumbel_l().ppf(U[:,1])]).T

dist = ot.NormalCopulaFactory().build(X_trans)
print(dist)
one obtains
      NormalCopula(R = [[ 1        0.754492 ]
                        [ 0.754492 1        ]])
Notice that having different non-gaussian marginals does not affect the estimates of the parameters.
Fig.9: Convergence of the estimate of the linear correlation parameter \(\rho\) as the sample size \(N\) increases. The shaded area represents the one standard deviation error around the mean estimate.
Finally, consider the following example for the non-gaussian Clayton copula with \(\theta=2\) and \(\text{Beta}(\alpha=3, \beta=10)\) and \(\text{Gumbel}(\mu=0,\beta=1)\) marginals: the code
import openturns as ot, numpy as np, scipy.stats as st

J = ot.ComposedDistribution([ot.Uniform(0, 1),
                             ot.Uniform(0, 1)],
                            ot.ClaytonCopula(2))
U1, U2 = np.array(J.getSample(10_000)).T

X_trans = np.array([st.beta(a=3, b=10).ppf(U1),
                    st.gumbel_l().ppf(U2)]).T

dist = ot.ClaytonCopulaFactory().build(X_trans)
print(dist)
returns
      ClaytonCopula(theta = 1.98901)
    

Estimation in practice: non-parametric estimates

As briefly mentioned, OpenTURNS also allows us to estimate copulae non-parametrically. Let us start by sampling from a bivariate normal distribution with linear correlation coefficient \(\rho=0.75\):
import openturns as ot, numpy as np, scipy.stats as st

N = 1_000
mvnorm = st.multivariate_normal(mean=[0, 0], cov=[[1., 0.75],
                                                  [0.75, 1.]])
X = mvnorm.rvs(N)
Now, one can extract a Bernstein non-parametric estimate simply as
Bs_Est = ot.EmpiricalBernsteinCopula(X, 20, False)
where the second argument is the number of bins into which each dimension of the unit cube \([0,1]^d\) is divided in order to estimate the empirical copula. Similarly, one can obtain a Kernel Smoothing estimate:
KS_Est = ot.KernelSmoothing().build(X).getCopula()
It is then possible to obtain a new sample of N points from the estimated copulae Bs_Est and KS_Est as Bs_Est.getSample(N) and KS_Est.getSample(N) respectively.
Fig.10: Left: the gaussian copula density with \(\rho=0.75\). Center: Bernstein non-parametric copula density estimate for a \(N=1000\) sample from a multivariate normal distribution with \(\rho=0.75\). Right: Kernel Smoothing non-parametric copula density estimate for a \(N=1000\) sample from a multivariate normal distribution with \(\rho=0.75\).

References

[1] "Quantitative Risk Management: Concepts, Techniques and Tools", Alexander J. McNeil, Paul Embrechts, and Rüdiger Frey, 2005
[2] "An Introduction to Copulas", Roger B. Nelsen, 2006
[3] "Correlation and Dependence in Risk Management: Properties and Pitfalls", Paul Embrechts, Alexander McNeil, and Daniel Straumann, 1999
[4] "An Introduction to Probability Theory and Its Applications", W. Feller, 1971, volume 2, 2nd edition; John Wiley & Sons; p. 439
[5] “Families of Multivariate Distributions.”, A.W. Marshall, I. Olkin, 1988, Journal of the American Statistical Association, 83, 834–841
[6] “The Bernstein copula and its applications to modeling and approximations of multivariate distributions.”, Sancetta, A. and S. Satchell, 2004, Econometric Theory 20(3), 535–562
