For convenience, one often desires to summarise
the dependency structure of a multivariate distribution
with a single scalar metric. The most common such measure is
linear correlation. However, this metric is known to suffer from
severe pitfalls in many common situations, and better measures should be considered.
Linear correlation
The first measure of dependency we encounter is Pearson's coefficient of
linear correlation:
\begin{equation}
\rho = \rho(X_1, X_2) = \frac{\text{cov}(X_1, X_2)}{\sqrt{\text{var}(X_1)\text{var}(X_2)}}.
\label{eq:lin_corr}
\end{equation}
While very useful in linear or approximately linear scenarios, this metric fails
to capture fundamental properties of more complex and realistic distributions.
In addition, thinking in terms of linear correlation can easily make us prey to insidious
pitfalls and fallacies.
First of all, we notice that \eqref{eq:lin_corr} depends on the marginals
and, in particular, that it is well defined if and only if the second moments exist,
\(\mathbb{E}(X_j^2)<\infty,\ \forall j\). Therefore this metric is not well defined
for a number of marginals, such as many power-law distributions for which the second moment does not exist.
Moreover, the linear correlation \eqref{eq:lin_corr} is also unable to capture
strong non-linear functional dependencies such as \(X_2=X_1^2\) or \(X_2 = \sin(X_1)\).
Indeed in general one has \(|\rho|\le 1\) and \(|\rho|=1\iff X_2 = aX_1+b\)
for some \(a\in\mathbb{R}\backslash \{0\},\ b\in\mathbb{R} \).
The linear correlation \(\rho\) is also invariant under strictly increasing
linear transformations, but not under more general strictly increasing transformations.
One further source of confusion arises from our habit of reasoning in terms of
normal distributions. Indeed, a number of seemingly intuitive statements on
correlations which are true for normal distributions do not generalise beyond them.
As an example, for jointly normal random variables zero correlation is equivalent to independence, which
is no longer true already for, e.g., Student-t distributed random variables.
Another fallacy is to think that the marginals and the correlation matrix
(\(F_1\), \(F_2\), and \(\rho\) in the bivariate case) are sufficient to determine
the joint distribution \(F\). This is true for elliptical distributions, but wrong in general.
Indeed the only mathematical object encoding all information concerning the dependency structure
is the copula itself.
Yet another fallacy is to think that, given two marginals \(F_1\), \(F_2\),
any value of \(\rho\in[-1,1]\) is attainable. Again, this is true for elliptically
distributed \((X_1, X_2)\) with finite second moments, but wrong in general.
The attainable range can be computed via Hoeffding's formula
\begin{equation}
\text{cov}(X_1, X_2) = \int_{-\infty}^\infty\int_{-\infty}^\infty C(F_1(x_1), F_2(x_2))-F_1(x_1)F_2(x_2)\,\text{d} x_1 \text{d} x_2,
\label{eq:Hoeffding_formula}
\end{equation}
where \(\rho_{\text{min}}\) is attained for \(C=W_{\text{counter}}\) and \(\rho_{\text{max}}\) for \(C=C_{\text{co}}\).
The attainable range \([\rho_{\text{min}}, \rho_{\text{max}}]\) can be arbitrarily small for appropriate choices of the marginals \(F_1\) and \(F_2\).
Fig.1: Attainable range \([\rho_{\text{min}}, \rho_{\text{max}}]\)
of the linear correlation coefficient for two random variables
\(\log X_1 \sim \mathcal{N}(0,1)\) and \(\log X_2 \sim \mathcal{N}(0,\sigma^2)\).
See [1] for more details.
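The bounds in this example admit closed forms: for the comonotonic pair \((e^Z, e^{\sigma Z})\) and the countermonotonic pair \((e^Z, e^{-\sigma Z})\), with \(Z\sim\mathcal{N}(0,1)\), a direct computation using \(\mathbb{E}(e^{aZ})=e^{a^2/2}\) gives \(\rho_{\text{max}/\text{min}} = (e^{\pm\sigma}-1)/\sqrt{(e-1)(e^{\sigma^2}-1)}\). A minimal Python sketch evaluating them (the grid of \(\sigma\) values is an arbitrary choice):

```python
import numpy as np

# Attainable correlation bounds for log X1 ~ N(0,1), log X2 ~ N(0, sigma^2):
# rho_max is attained by the comonotonic pair (e^Z, e^{sigma Z}),
# rho_min by the countermonotonic pair (e^Z, e^{-sigma Z}).
def rho_bounds(sigma):
    denom = np.sqrt((np.e - 1) * (np.exp(sigma**2) - 1))
    return (np.exp(-sigma) - 1) / denom, (np.exp(sigma) - 1) / denom

for sigma in [1.0, 3.0, 5.0]:
    print(sigma, rho_bounds(sigma))  # both bounds shrink towards 0 as sigma grows
```

For \(\sigma=1\) one recovers \(\rho_{\text{max}}=1\) (identical marginals), while already for \(\sigma=5\) the whole attainable range collapses to a tiny interval around zero, despite the dependence being perfect in both extreme cases.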
Exercise 1
Consider two independent random variables \(Z, W \sim \mathcal{N}(0,1)\).
The random variables \(X=Z\) and \(Y = ZW\) are clearly not independent.
What's \(\rho(X, Y)\)?
The linear correlation coefficient is
\begin{align*}
\rho(X, Y) &= \text{cov}(X, Y) \newline
&= \mathbb{E}(XY)\newline
&= \mathbb{E}(W)\mathbb{E}(Z^2) = 0,
\end{align*}
where the first equality holds because \(\text{var}(X)=\text{var}(Y)=1\), and the second because \(\mathbb{E}(X)=0\).
◻
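As a quick sanity check, one can verify this numerically. Below is a minimal Monte Carlo sketch (the sample size is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000  # arbitrary sample size

z = rng.standard_normal(n)
w = rng.standard_normal(n)
x, y = z, z * w

# Sample linear correlation: ~0 despite the obvious dependence of Y on X.
print(np.corrcoef(x, y)[0, 1])
```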
Exercise 2
Prove \eqref{eq:Hoeffding_formula}.
Start by considering an independent copy \((Y_1, Y_2)\) of \((X_1, X_2)\),
so that \(Y_j \sim F_j\) and \((Y_1, Y_2) \perp\!\!\!\perp (X_1, X_2)\).
One has:
\begin{align*}
2\text{cov}(X_1, X_2) &= \mathbb{E}((X_1-\mathbb{E}X_1)(X_2-\mathbb{E}X_2)) + \mathbb{E}((Y_1-\mathbb{E}Y_1)(Y_2-\mathbb{E}Y_2)) \newline
&=\mathbb{E}\left(((X_1 - \mathbb{E}X_1)-(Y_1 - \mathbb{E}Y_1))((X_2 - \mathbb{E}X_2)-(Y_2 - \mathbb{E}Y_2))\right) \newline
&= \mathbb{E}((X_1-Y_1)(X_2-Y_2)).
\end{align*}
The first equality holds because the cross terms arising from expanding the product, e.g. \(\mathbb{E}((X_1-\mathbb{E}X_1)(Y_2-\mathbb{E}Y_2))\), vanish by independence.
Now recall that for any \(a,b\in\mathbb{R}\) one has
\[
b-a = \int_{-\infty}^\infty \Theta(x-a) - \Theta(x-b)\text{d}x,
\]
with \(\Theta(x)\) indicating Heaviside's theta function
(with the convention \(\Theta(0)=1\)). Therefore:
\begin{align*}
2\text{cov}(X_1, X_2) &=
\mathbb{E}\int_{-\infty}^\infty\int_{-\infty}^\infty
(\Theta(x_1-Y_1) - \Theta(x_1-X_1))(\Theta(x_2-Y_2) - \Theta(x_2-X_2))
\text{d}x_1\text{d}x_2 \newline
\xrightarrow[]{\text{Fubini}}&=\int_{-\infty}^\infty\int_{-\infty}^\infty
\mathbb{E}\left((\Theta(x_1-Y_1) - \Theta(x_1-X_1))(\Theta(x_2-Y_2) - \Theta(x_2-X_2))\right)
\text{d}x_1\text{d}x_2\newline
&=2\int_{-\infty}^\infty\int_{-\infty}^\infty F(x_1, x_2) - F_1(x_1)F_2(x_2)\text{d}x_1\text{d}x_2.
\end{align*}
◻
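The identity can also be checked numerically. The sketch below (all numerical choices are arbitrary) estimates the right-hand side of \eqref{eq:Hoeffding_formula} from samples of a bivariate normal with correlation \(\rho=0.6\), for which \(\text{cov}(X_1,X_2)=\rho\):

```python
import numpy as np

rng = np.random.default_rng(0)
rho, n = 0.6, 500_000
x = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=n)

# Empirical joint CDF on a grid, via a cumulated 2D histogram.
edges = np.linspace(-5, 5, 201)
counts, _, _ = np.histogram2d(x[:, 0], x[:, 1], bins=[edges, edges])
F = counts.cumsum(axis=0).cumsum(axis=1) / n  # F(x1, x2) at upper bin edges
F1, F2 = F[:, -1], F[-1, :]                   # marginal CDFs

# Riemann sum of F(x1, x2) - F1(x1) F2(x2) over the grid.
dx = edges[1] - edges[0]
print((F - np.outer(F1, F2)).sum() * dx**2)   # ~0.6 = cov(X1, X2)
```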
Rank correlation
Many of the drawbacks and pitfalls encountered with linear correlation
are resolved when considering rank correlations instead.
As we shall see, rank correlation coefficients are always defined
and are invariant under any strictly increasing transformation, implying
they depend exclusively on the copula.
In the following we
shall present two of the most prominent, namely Kendall's tau and
Spearman's rho.
Definition 1 — Kendall's tau
Consider \((X_1, X_2)\sim F\) and an independent copy \((Y_1, Y_2)\sim F\).
Kendall's tau is defined as
\begin{equation}
\begin{split}
\rho_\tau &= \mathbb{E}(\text{sign}((X_1-Y_1)(X_2-Y_2))) \newline
&= \mathbb{P}((X_1-Y_1)(X_2-Y_2)>0) - \mathbb{P}((X_1-Y_1)(X_2-Y_2)<0).
\end{split}
\label{eq:kendall_tau}
\end{equation}
That is, the probability of concordance minus the probability of discordance
(i.e. the probability that the segment joining two independent points drawn from \(F\) has positive slope, minus the probability that it has negative slope).
Proposition 1.1 — Formula for Kendall's tau
Consider two random variables \(X_1\) and \(X_2\)
with marginals \(F_1\) and \(F_2\) and copula \(C\). One has:
\begin{equation}
\begin{split}
\rho_\tau &= 4\int_{[0,1]^2}C(u_1, u_2)\text{d}C(u_1, u_2) - 1\newline
&=4\mathbb{E}(C(U_1, U_2)) - 1,
\end{split}
\label{eq:formula_kendall}
\end{equation}
with \((U_1, U_2)\sim C\).
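As a sanity check of \eqref{eq:formula_kendall}, one can compare a sample estimate of Kendall's tau with the known closed form \(\rho_\tau = \frac{2}{\pi}\arcsin(\rho)\) for the Gaussian copula. A minimal sketch (sample size and \(\rho\) are arbitrary choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
rho, n = 0.5, 20_000
x = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=n)

# Sample Kendall's tau vs. the Gaussian-copula closed form.
tau_sample, _ = stats.kendalltau(x[:, 0], x[:, 1])
print(tau_sample, 2 / np.pi * np.arcsin(rho))  # both ~1/3
```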
Definition 2 — Spearman's rho
Consider \(X_j \sim F_j\) for \(j \in \{1,2\}\).
Spearman's rho is defined as
\begin{equation}
\rho_S = \rho(F_1(X_1), F_2(X_2)).
\label{eq:spearman_rho}
\end{equation}
Proposition 2.1 — Formula for Spearman's rho
Consider two random variables \(X_1\) and \(X_2\)
with marginals \(F_1\) and \(F_2\) and copula \(C\). One has:
\begin{equation}
\begin{split}
\rho_S &=12 \int_0^1\int_0^1 C(u_1, u_2)\text{d}u_1\text{d}u_2 - 3 \newline
&= 12\mathbb{E}(C(U_1, U_2)) - 3,
\end{split}
\label{eq:formula_spearman}
\end{equation}
with \(U_1, U_2 \sim \mathcal{U}[0,1]\) and \(U_1 \perp\!\!\!\perp U_2\).
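Analogously, \eqref{eq:formula_spearman} can be checked against the known closed form \(\rho_S = \frac{6}{\pi}\arcsin(\rho/2)\) for the Gaussian copula (again, sample size and \(\rho\) are arbitrary choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
rho, n = 0.5, 20_000
x = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=n)

# Sample Spearman's rho vs. the Gaussian-copula closed form.
rho_s, _ = stats.spearmanr(x[:, 0], x[:, 1])
print(rho_s, 6 / np.pi * np.arcsin(rho / 2))  # both ~0.483
```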
Rank correlations are useful to characterise dependence, providing measures that are comparable across different marginals,
and can also serve as a tool for the calibration and estimation of a copula's parameter(s).
As we have mentioned, a number of fallacies are avoided, but not all.
Resolved fallacies
For \(\kappa=\rho_\tau\) as well as for \(\kappa=\rho_S\) one has:
- \(\kappa\) is always well defined;
- \(\kappa\) is invariant under any strictly increasing transformation of the random variables;
- \(\kappa=\pm 1\) if and only if \(X_1\) and \(X_2\) are co-/counter-monotonic;
- any \(\kappa \in [-1,1]\) is attainable.
Unresolved fallacies
However, in general for \(\kappa=\rho_\tau\) as well as for \(\kappa=\rho_S\) one has:
- The marginals \(F_1, F_2\) and the rank correlation \(\kappa\) are still not sufficient
to uniquely determine \(F\).
- While \(X_1 \perp\!\!\!\perp X_2 \implies \kappa=0\), the converse is still not true: \(\kappa=0\) does not imply independence.
The last point is a property one might still desire.
However, one can show that this requirement would be in contradiction with the
fundamental property of invariance under strictly increasing transformations.
Proposition 3
There exists no dependency measure \(\kappa\) such that:
- \(\kappa(X_1, X_2)=0 \iff X_1 \perp\!\!\!\perp X_2\), and
- \(\kappa(T(X_1), X_2)=\kappa(X_1, X_2)\) for \(T\) strictly increasing, with \(\kappa(T(X_1), X_2)=-\kappa(X_1, X_2)\) for \(T\) strictly decreasing.
See Exercise 8 for a proof. Nonetheless, it is still possible to define a dependency measure \(\kappa\) such that
\(\kappa(X_1, X_2)=0 \iff X_1 \perp\!\!\!\perp X_2\), as long as one is willing to trade off
other properties. In particular, it can be shown that one can have
- \(\kappa(X_1, X_2)=0 \iff X_1 \perp\!\!\!\perp X_2\),
- \(0\le \kappa(X_1, X_2)\le 1\),
- \(\kappa(X_1, X_2)=1 \iff X_1,X_2\) co-/counter-monotonic, and
- \(\kappa(T(X_1), X_2)=\kappa(X_1, X_2)\) for \(T\) strictly increasing.
See [1] for more details.
Exercise 5
Show that the comonotonic copula \(C_{\text{co}}\) implies \(\kappa=1\)
for both \(\kappa=\rho_\tau\) and \(\kappa=\rho_S\).
For Spearman's rho, recall that \(C_{\text{co}}(u_1, u_2)=\min(u_1, u_2)\) and take \(U_1, U_2\sim\mathcal{U}[0,1]\) with \(U_1 \perp\!\!\!\perp U_2\). Notice that
\[
\mathbb{P}(\text{min}(U_1, U_2) < t) = 1-\mathbb{P}(\text{min}(U_1, U_2) \ge t) = 1- (1-t)^2,
\]
so that
\(
f_{T=\text{min}(U_1, U_2)}(t) = 2(1-t).
\)
Therefore one has
\begin{align*}
\mathbb{E}(\text{min}(U_1, U_2)) &= \int_0^1 tf_{T=\text{min}(U_1, U_2)}(t) \text{d}t \newline
&= 2\int_0^1 t(1-t) \text{d}t =\frac{1}{3}.
\end{align*}
Hence:
\begin{align*}
\rho_S &= 12\mathbb{E}(\text{min}(U_1, U_2)) - 3 = \frac{12}{3} - 3 = 1.
\end{align*}
Let's consider Kendall's tau now. First, the copula viewed as a random variable
has a distribution function called the "Kendall distribution function", which for an Archimedean copula equals
\[
K_C(t) = t -\frac{\psi(t)}{\psi'(t)},
\]
with \(\psi(t)\) the copula's generator and \(\psi'(t)\) its first derivative.
One can then write the expected value of the copula as
\begin{align*}
\mathbb{E}[C(U_1, U_2)] &= \int_0^1t\text{d}K_C \newline
\xrightarrow[]{\text{by parts}} &= tK_C(t)\Big|_0^1 - \int_0^1K_C(t)\text{d}t.
\end{align*}
Recalling that the Clayton copula tends to \(C_{\text{co}}\) for \(\theta\rightarrow\infty\)
and to \(W_{\text{counter}}\) for \(\theta\rightarrow -1\),
let's consider the generator of
the Clayton copula and its derivative:
\begin{align*}
\psi(t) &= \theta^{-1}(t^{-\theta}-1), \newline
\psi'(t) &= -t^{-(1+\theta)},
\end{align*}
which gives
\[
K_C(t) = t(1+\theta^{-1}(1-t^\theta)).
\]
Then
\begin{align*}
\mathbb{E}[C(U_1, U_2)] &= tK_C(t)\Big|_0^1 - \int_0^1K_C(t)\text{d}t\newline
&= 1-\frac{\theta+3}{2\theta+4}.
\end{align*}
Finally, taking the limits \(\theta\rightarrow\infty\) for \(C_{\text{co}}\) and
\(\theta\rightarrow -1\) for \(W_{\text{counter}}\) one finds:
\begin{align*}
\mathbb{E}[C_{\text{co}}(U_1, U_2)] &= \frac{1}{2},\newline
\mathbb{E}[W_{\text{counter}}(U_1, U_2)] &= 0.\newline
\end{align*}
Therefore,
\begin{align*}
\rho_\tau &= 4\mathbb{E}[C_{\text{co}}(U_1, U_2)]-1 = 1.
\end{align*}
◻
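Incidentally, combining the expression for \(\mathbb{E}[C(U_1, U_2)]\) derived above with \eqref{eq:formula_kendall} gives \(\rho_\tau = \theta/(\theta+2)\) for the Clayton copula at finite \(\theta\). The sketch below (parameter values are arbitrary) checks this by sampling the bivariate Clayton copula via conditional inversion, i.e. by inverting the conditional distribution \(C_{2|1}(u_2|u_1)=\partial C/\partial u_1\):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
theta, n = 2.0, 20_000

# Conditional-inversion sampler for the bivariate Clayton copula.
u1 = rng.uniform(size=n)
v = rng.uniform(size=n)  # uniform draw fed through the inverse conditional CDF
u2 = (u1**(-theta) * (v**(-theta / (1 + theta)) - 1) + 1)**(-1 / theta)

tau, _ = stats.kendalltau(u1, u2)
print(tau, theta / (theta + 2))  # both ~0.5
```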
Exercise 6
Show that the countermonotonic copula \(W_{\text{counter}}\) implies \(\kappa=-1\)
for both \(\kappa=\rho_\tau\) and \(\kappa=\rho_S\).
Proceed as in the solution of Exercise 5. For Kendall's tau, the limit \(\theta\rightarrow -1\) computed there gives \(\rho_\tau = 4\cdot 0 - 1 = -1\). For Spearman's rho, note that \(W_{\text{counter}}(u_1, u_2)=\max(u_1+u_2-1, 0)\), so that with \(U_1 \perp\!\!\!\perp U_2\) one finds \(\mathbb{E}(\max(U_1+U_2-1, 0))=\frac{1}{6}\) and hence \(\rho_S = \frac{12}{6}-3=-1\).
◻
Exercise 7
Write a bivariate joint distribution parametrised by a single parameter \(\lambda\in [0,1]\)
and show that this can attain any \(\kappa \in [-1,1]\) for both \(\kappa=\rho_\tau\) and \(\kappa=\rho_S\).
Consider the following:
\begin{equation*}
F(x_1, x_2) = \lambda C_{\text{co}}(F_1(x_1), F_2(x_2)) + (1-\lambda)W_{\text{counter}}(F_1(x_1), F_2(x_2)).
\end{equation*}
Its copula is the mixture \(C_\lambda = \lambda C_{\text{co}} + (1-\lambda)W_{\text{counter}}\). Since \eqref{eq:formula_spearman} is linear in the copula and assigns the value \(1\) to \(C_{\text{co}}\) and \(-1\) to \(W_{\text{counter}}\) (Exercises 5 and 6), one immediately finds
\[
\rho_S = \lambda - (1-\lambda) = 2\lambda -1,
\]
and a direct computation of \eqref{eq:formula_kendall} for \(C_\lambda\) yields the same value. That is, one has \(\rho_\tau=\rho_S=2\lambda-1\), which spans all of \([-1,1]\) as \(\lambda\) ranges over \([0,1]\).
◻
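A quick numerical check of this construction (a minimal sketch; \(\lambda\) and the sample size are arbitrary choices): with probability \(\lambda\) we set \(U_2=U_1\) (comonotonic component) and otherwise \(U_2=1-U_1\) (countermonotonic component).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
lam, n = 0.75, 50_000

u1 = rng.uniform(size=n)
comono = rng.uniform(size=n) < lam    # mixture component indicator
u2 = np.where(comono, u1, 1 - u1)     # U2 = U1 w.p. lam, else 1 - U1

print(stats.kendalltau(u1, u2)[0],
      stats.spearmanr(u1, u2)[0],
      2 * lam - 1)                    # all ~0.5
```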
Exercise 8
Prove Proposition 3.
Consider \((X_1, X_2)\) uniformly distributed on the unit circle in \(\mathbb{R}^2\),
so that the vector can be parametrised by \(\phi\sim \mathcal{U}[0,2\pi]\)
as \((X_1, X_2)=(\cos\phi, \sin\phi)\).
Because \((X_1, X_2) \stackrel{\text{d}}{=} (-X_1, X_2)\), while the transformation property implies \(\kappa(-X_1, X_2) = -\kappa(X_1, X_2)\), one has
\begin{equation*}
\kappa(X_1, X_2) = \kappa(-X_1, X_2) = -\kappa(X_1, X_2).
\end{equation*}
This implies \(\kappa(X_1, X_2)=0\) although \(X_1\) and \(X_2\) are not independent,
which is a contradiction.
See also [1].
◻
Coefficients of tail dependence
If one wants to study extreme values, asymptotic measures of tail dependence
can be defined as a function of the copula. In what follows we shall
distinguish between upper tail dependence and lower tail dependence.
Definition 4 — Coefficient of tail dependence
Consider two random variables \(X_j\sim F_j\).
The associated coefficients of upper and lower tail dependence are:
\begin{equation}
\begin{split}
\lambda_{u} &= \lim_{\alpha\rightarrow 1^-}\mathbb{P}(X_2> F_2^{-1}(\alpha)|X_1 > F_1^{-1}(\alpha)),\newline
\lambda_{\ell} &= \lim_{\alpha\rightarrow 0^+}\mathbb{P}(X_2\le F_2^{-1}(\alpha)|X_1\le F_1^{-1}(\alpha)).
\end{split}
\end{equation}
If \(\lambda_{u}\in(0,1]\) (\(\lambda_{\ell}\in(0,1]\)), then \((X_1, X_2)\)
is said to be upper (lower) tail dependent, or more generally, asymptotically dependent. Similarly,
if \(\lambda_{u}=0\) (\(\lambda_{\ell}=0\)), then \((X_1, X_2)\)
is said to be upper (lower) tail independent, or more generally, asymptotically independent.
Proposition 4.1
The coefficients of upper and lower tail dependence can be written as a function
of the copula as:
\begin{equation}
\begin{split}
\lambda_{u} &= \lim_{\alpha\rightarrow 1^-}2-\frac{1-C(\alpha, \alpha)}{1-\alpha},\newline
\lambda_{\ell} &= \lim_{\alpha\rightarrow 0^+}\frac{C(\alpha,\alpha)}{\alpha}.
\end{split}
\end{equation}
Proposition 4.2
For radially symmetric copulae one has \(\lambda_{u}=\lambda_{\ell}\).
Proposition 4.3
For Archimedean copulae with strict generator \(\psi\), here denoting the generator in the convention \(C(u_1, u_2) = \psi(\psi^{-1}(u_1)+\psi^{-1}(u_2))\) (i.e. the inverse of the generator used in Exercise 5), one has
\begin{equation}
\begin{split}
\lambda_{u} &= 2-2\lim_{\alpha\rightarrow 0^+}\frac{\psi'(2\alpha)}{\psi'(\alpha)},\newline
\lambda_{\ell} &= 2\lim_{\alpha\rightarrow\infty}\frac{\psi'(2\alpha)}{\psi'(\alpha)}.
\end{split}
\end{equation}
Figure 2 below presents the coefficient of tail dependence
for the Student-t copula \(C_{\nu,\rho}^t\), for which one can show \(\lambda_\ell=\lambda_u\).
The tail dependence grows with the correlation coefficient \(\rho\) and quickly decreases
with increasing degrees of freedom \(\nu\). Therefore, recalling that the limit \(\nu\rightarrow\infty\)
leads to normality, one can easily see that the gaussian copula is
asymptotically independent for all \(\rho\) except \(\rho=1\).
Fig.2: Coefficient of tail dependence for the Student-t copula \(C_{\nu,\rho}^t\).
Exercise 9
Prove Proposition 4.3. Moreover, compute the upper and lower coefficients
for the Clayton and Gumbel copulae.
Consider the upper coefficient first. One has:
\begin{align*}
\lambda_u &= 2-\lim_{\alpha\rightarrow 1^-}\frac{1-\psi(2\psi^{-1}(\alpha))}{1-\alpha} \newline
\xrightarrow[]{\beta=\psi^{-1}(\alpha)}&= 2-\lim_{\beta\rightarrow 0^+}\frac{1-\psi(2\beta)}{1-\psi(\beta)}\newline
\xrightarrow[]{\text{de l'Hôpital}}&=2-2\lim_{\beta\rightarrow 0^+}\frac{\psi'(2\beta)}{\psi'(\beta)}.
\end{align*}
Now the lower coefficient:
\begin{align*}
\lambda_\ell &= \lim_{\alpha\rightarrow 0^+}\frac{\psi(2\psi^{-1}(\alpha))}{\alpha} \newline
\xrightarrow[]{\beta=\psi^{-1}(\alpha)}&= \lim_{\beta\rightarrow\infty}\frac{\psi(2\beta)}{\psi(\beta)}\newline
\xrightarrow[]{\text{de l'Hôpital}}&=2\lim_{\beta\rightarrow\infty}\frac{\psi'(2\beta)}{\psi'(\beta)}.
\end{align*}
Finally, for the Clayton copula (with \(\theta>0\)) one finds:
\begin{align*}
\lambda_u &= 0,\newline
\lambda_\ell &= 2^{-1/\theta},
\end{align*}
while for the Gumbel copula one has:
\begin{align*}
\lambda_u &= 2-2^{1/\theta},\newline
\lambda_\ell &= 0.
\end{align*}
◻
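As an illustration, the lower tail coefficient of the Clayton copula can also be checked directly from Proposition 4.1, evaluating \(C(\alpha, \alpha)/\alpha\) for decreasing \(\alpha\) (a minimal sketch; the value of \(\theta\) is an arbitrary choice):

```python
import numpy as np

theta = 2.0

def clayton(u1, u2):
    # Bivariate Clayton copula, theta > 0.
    return (u1**(-theta) + u2**(-theta) - 1)**(-1 / theta)

for alpha in [1e-2, 1e-4, 1e-6]:
    print(alpha, clayton(alpha, alpha) / alpha)  # converges to the limit below

print("limit:", 2**(-1 / theta))  # lambda_l = 2^(-1/theta) ~ 0.707
```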
One major issue with gaussian copulae is their asymptotic independence for any \(|\rho|<1\).
The great financial crisis of 2007-2008 is often partly attributed to the misuse of the gaussian
copula model [2][3],
and in particular to the infamous Li model of credit default
[4].
The inadequacy of the gaussian copula is evident:
attempting to model defaults in a portfolio of corporate bonds with it
leads to a crucial underestimation of the likelihood of joint defaults
(because of the gaussian copula's asymptotic independence).
This is especially important in times of financial distress, when defaults
are clustered and, conditional on observing one company defaulting,
one has a high likelihood of observing more defaults.
Consequently, some have argued that one cause of the crisis was a
"misplaced reliance on sophisticated mathematics" [5].
However, this could not be further from the truth. Quite the contrary, in fact:
models based on the gaussian copula were enthusiastically embraced by the financial industry
precisely because of their simplicity. Had the industry employed more 'sophisticated'
mathematics, which in this specific case means asymptotically dependent copulae,
the banks' risk management would arguably have been more sound, and capable of coping with
the materialisation of large clusters of defaults.