While numerous distinct correlation measures exist for continuous variates, such as the Pearson correlation \(\rho\),
Spearman's rho \(\rho_S\), and Kendall's tau \(\tau\),
all of these coincide for binary variates (with ties handled in the usual way, i.e. mid-ranks for \(\rho_S\) and the tie-corrected \(\tau_b\) for \(\tau\)):
\(\rho(X, Y)=\rho_S(X, Y)=\tau(X, Y)\).
The Yule phi coefficient [1]
(also known as the mean square contingency coefficient
or the Matthews correlation coefficient in the ML literature)
is a measure of the association of two binary variables,
which is also equivalent to Pearson's correlation coefficient in the case of dichotomous variables.
When considering two binary variates \((X,Y)\in \{0,1\}\times\{0,1\}\),
the correlation coefficient \(\rho\) between the two cannot, in general, span the full range \([-1,1]\).
Instead, denoting by \(p_j=\mathbb{P}(j=1)\) and \(q_j = 1- p_j\) for \(j\in\{X,Y\}\), and assuming \(0<p_j<1\), correlations are bounded as
\begin{equation}\label{eq:achievable_corrs}
\rho_{\text{min}} \le \rho \le \rho_{\text{max}},
\end{equation}
where
\begin{equation}
\begin{split}
\rho_{\text{min}} &= \max\left(-\sqrt{\frac{p_X p_Y}{ q_X q_Y}}, -\sqrt{\frac{q_X q_Y}{p_Xp_Y}}\right),\\
\rho_{\text{max}} &= \min\left(\sqrt{\frac{p_X q_Y}{ p_Y q_X}}, \sqrt{\frac{p_Y q_X}{p_X q_Y}}\right).
\end{split}
\label{eq:correlation_bounds}
\end{equation}
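As a quick numerical illustration (the marginals below are chosen here purely as an example and are not taken from the derivation), consider \(p_X=\tfrac{1}{5}\) and \(p_Y=\tfrac{1}{2}\), so that \(q_X=\tfrac{4}{5}\) and \(q_Y=\tfrac{1}{2}\). Then \(p_Xp_Y=p_Xq_Y=\tfrac{1}{10}\) and \(q_Xq_Y=p_Yq_X=\tfrac{2}{5}\), and the bounds evaluate to
\begin{equation*}
\begin{split}
\rho_{\text{min}} &= \max\left(-\sqrt{\tfrac{1/10}{2/5}}, -\sqrt{\tfrac{2/5}{1/10}}\right)=\max\left(-\tfrac{1}{2}, -2\right)=-\tfrac{1}{2},\\
\rho_{\text{max}} &= \min\left(\sqrt{\tfrac{1/10}{2/5}}, \sqrt{\tfrac{2/5}{1/10}}\right)=\min\left(\tfrac{1}{2}, 2\right)=\tfrac{1}{2}.
\end{split}
\end{equation*}
For these marginals, only correlations in \(\left[-\tfrac{1}{2},\tfrac{1}{2}\right]\) are attainable.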
To see this, recall that a Bernoulli random vector \((X, Y)\)
takes values in the Cartesian product space \(\{0,1\}\times \{0,1\}\), with probability mass function given by
\begin{equation}\label{eq:bernoulli_bivariate}
f(x, y) = p_{11}^{xy}p_{10}^{x(1-y)}p_{01}^{(1-x)y}p_{00}^{(1-x)(1-y)}
\end{equation}
where \(p_{ij}=\mathbb{P}(X=i, Y=j)\), and \(p_{00}+p_{01}+p_{10}+p_{11}=1\).
The marginal probabilities of \(X\) and \(Y\) are then clearly given by
\begin{equation}
\begin{split}
p_X &= p_{10} + p_{11},\\
p_Y &= p_{01} + p_{11}.
\end{split}
\end{equation}
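Since \(X\) and \(Y\) are Bernoulli variables, \(\mathbb{E}(X) = p_X\) and \(\mathbb{E}(Y) = p_Y\), with variances \(\sigma_X^2=p_Xq_X\) and \(\sigma_Y^2=p_Yq_Y\).
Therefore, recalling that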
\begin{equation}
\begin{split}
\rho &= \frac{\text{cov}\left(X, Y\right)}{\sigma_X\sigma_Y}\\
&= \frac{\mathbb{E}\left[XY\right]-\mathbb{E}\left[X\right]\mathbb{E}\left[Y\right]}{\sqrt{p_Xq_Xp_Yq_Y}}\\
&= \frac{\mathbb{E}\left[XY\right]-p_Xp_Y}{\sqrt{p_Xq_Xp_Yq_Y}},
\end{split}
\end{equation}
and noticing that
$$
\mathbb{E}[XY] = \sum_{x,y}xyp_{xy}=p_{11},
$$
(since every term in which \(x\) or \(y\) is zero vanishes), one obtains
\begin{equation}
\rho = \frac{p_{11}-p_Xp_Y}{\sqrt{p_Xq_Xp_Yq_Y}}.
\end{equation}
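As a hedged sanity check of this formula, take an illustrative joint distribution (the numbers are an assumption made here for the example) with \(p_{11}=0.2\), \(p_{10}=0.1\), \(p_{01}=0.3\), and \(p_{00}=0.4\). The marginals are \(p_X=p_{10}+p_{11}=0.3\) and \(p_Y=p_{01}+p_{11}=0.5\), so that
\begin{equation*}
\rho = \frac{0.2-0.3\cdot0.5}{\sqrt{0.3\cdot0.7\cdot0.5\cdot0.5}}=\frac{0.05}{\sqrt{0.0525}}=\frac{1}{\sqrt{21}}\approx0.218.
\end{equation*}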
At this point, notice that one must always have \(p_{11}\le \min\left(p_X, p_Y\right)\), since \(p_{11}=p_X-p_{10}=p_Y-p_{01}\) and \(p_{10},p_{01}\ge0\). Substituting \(p_{11}=\rho\sqrt{p_Xq_Xp_Yq_Y}+p_Xp_Y\) then gives:
\begin{equation}
\begin{split}
\rho\sqrt{p_Xq_Xp_Yq_Y}+p_Xp_Y &\le \min\left(p_X, p_Y\right)\\
\rho &\le \min\left(\frac{p_X-p_Xp_Y}{\sqrt{p_Xq_Xp_Yq_Y}}, \frac{p_Y-p_Xp_Y}{\sqrt{p_Xq_Xp_Yq_Y}}\right)\\
&= \min\left(\frac{p_Xq_Y}{\sqrt{p_Xq_Xp_Yq_Y}}, \frac{p_Yq_X}{\sqrt{p_Xq_Xp_Yq_Y}}\right)\\
&= \min\left(\sqrt{\frac{p_Xq_Y}{q_Xp_Y}}, \sqrt{\frac{p_Yq_X}{p_Xq_Y}}\right)\\
&=\rho_{\text{max}}.
\end{split}
\end{equation}
This proves the upper bound in \eqref{eq:correlation_bounds}.
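To illustrate that the upper bound is attained, consider the example marginals \(p_X=0.2\) and \(p_Y=0.5\) (values assumed here for illustration only). Setting \(p_{11}\) to its largest admissible value, \(p_{11}=\min\left(p_X,p_Y\right)=0.2\), fixes the remaining cells to \(p_{10}=0\), \(p_{01}=0.3\), and \(p_{00}=0.5\), and yields
\begin{equation*}
\rho=\frac{0.2-0.2\cdot0.5}{\sqrt{0.2\cdot0.8\cdot0.5\cdot0.5}}=\frac{0.1}{0.2}=0.5=\sqrt{\frac{p_Xq_Y}{q_Xp_Y}}=\rho_{\text{max}}.
\end{equation*}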
To prove the lower bound instead, consider that every joint probability must be non-negative: \(p_{ij}\ge0\) for all \(i\) and \(j\). This means that
\begin{equation}
\begin{split}
p_{00}&=1-p_{01}-p_{10}-p_{11}\\
&=1-p_X-p_Y+p_{11}\ge0\\
p_{11}&\ge p_X+p_Y-1,
\end{split}
\end{equation}
which, combined with \(p_{11}\ge0\), implies
\begin{equation}
p_{11} \ge \max\left(0, p_X+p_Y-1\right) .
\end{equation}
As before, substituting \(p_{11}=\rho\sqrt{p_Xq_Xp_Yq_Y}+p_Xp_Y\) results in:
\begin{equation}
\begin{split}
\rho\sqrt{p_Xq_Xp_Yq_Y}+p_Xp_Y &\ge \max\left(0, p_X+p_Y-1\right)\\
\rho &\ge \max\left(-\frac{p_Xp_Y}{\sqrt{p_Xq_Xp_Yq_Y}}, \frac{-p_Xp_Y+p_X+p_Y-1}{\sqrt{p_Xq_Xp_Yq_Y}}\right)\\
&= \max\left(-\sqrt{\frac{p_Xp_Y}{q_Xq_Y}}, \frac{p_Xq_Y-q_Y}{\sqrt{p_Xq_Xp_Yq_Y}}\right)\\
&= \max\left(-\sqrt{\frac{p_Xp_Y}{q_Xq_Y}}, -\sqrt{\frac{q_Xq_Y}{p_Xp_Y}}\right)\\
&=\rho_{\text{min}}.
\end{split}
\end{equation}
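To illustrate that the lower bound is attained as well, consider the example marginals \(p_X=0.2\) and \(p_Y=0.5\) (values assumed here for illustration only). Setting \(p_{11}\) to its smallest admissible value, \(p_{11}=\max\left(0,p_X+p_Y-1\right)=\max\left(0,-0.3\right)=0\), fixes the remaining cells to \(p_{10}=0.2\), \(p_{01}=0.5\), and \(p_{00}=0.3\), and yields
\begin{equation*}
\rho=\frac{0-0.2\cdot0.5}{\sqrt{0.2\cdot0.8\cdot0.5\cdot0.5}}=-\frac{0.1}{0.2}=-0.5=-\sqrt{\frac{p_Xp_Y}{q_Xq_Y}}=\rho_{\text{min}}.
\end{equation*}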
The bounds on the correlation \(\rho\) from \eqref{eq:correlation_bounds} are plotted in Figure 1.
Notice in particular that \(|\rho_{\text{min}}|\) is maximal (\(\rho_{\text{min}}=-1\)) when \(p_X=q_Y\),
while \(|\rho_{\text{max}}|\) is maximal (\(\rho_{\text{max}}=1\)) when \(p_X=p_Y\).
Conversely, the constraint on how negative correlations can get (\(\rho_{\text{min}}\))
is most binding when both marginals are small, \(p_X\approx p_Y\approx0\),
or both are large, \(p_X\approx p_Y\approx1\).
Likewise, the constraint on how positive correlations can get (\(\rho_{\text{max}}\))
is most binding when \(|p_X-p_Y|\approx1\), that is, when one marginal is large and the other is small.
Finally, the full range of possible correlations \([-1,1]\) is achievable only for \(p_X=p_Y=\frac{1}{2}\).
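These regimes can be checked on a few example marginals (chosen here for illustration; they are not part of the original discussion) by evaluating \eqref{eq:correlation_bounds} directly:
\begin{equation*}
\begin{split}
p_X=0.2,\ p_Y=0.8:&\quad \rho_{\text{min}}=-\sqrt{\tfrac{0.16}{0.16}}=-1,\quad \rho_{\text{max}}=\sqrt{\tfrac{0.04}{0.64}}=0.25,\\
p_X=0.2,\ p_Y=0.2:&\quad \rho_{\text{min}}=-\sqrt{\tfrac{0.04}{0.64}}=-0.25,\quad \rho_{\text{max}}=\sqrt{\tfrac{0.16}{0.16}}=1,\\
p_X=0.5,\ p_Y=0.5:&\quad \rho_{\text{min}}=-1,\quad \rho_{\text{max}}=1.
\end{split}
\end{equation*}
The first case has \(p_X=q_Y\) and reaches \(\rho_{\text{min}}=-1\) but caps positive correlations at \(0.25\); the second has \(p_X=p_Y\) and reaches \(\rho_{\text{max}}=1\) but caps negative correlations at \(-0.25\); only the third satisfies both conditions at once.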
Fig. 1: Correlation bounds (cf. equation \eqref{eq:achievable_corrs}) as a function of marginal probabilities.
Negative correlations are increasingly limited when \(p_X\) and \(p_Y\) are both large or both small;
conversely, positive correlations are limited when \(p_X\) is large and \(p_Y\) is small or vice versa.