Correlation is a Cosine

Background

You might have come across the statement, “correlation is a cosine,” but never taken the time to explore its precise meaning. It certainly sounds intriguing—how can the simplest bivariate summary statistic be connected to a trigonometric function you first encountered in sixth grade? What exactly is the relationship between correlation and cosines?

Let me explain.

Diving Deeper

The Law of Cosines

The law of cosines states that in any triangle with sides $x$, $y$, and $z$, where $\theta$ is the angle between $x$ and $y$, we have:

(1) $\begin{equation*} z^2 = x^2 + y^2 - 2 x y \cos(\theta). \end{equation*}$

In the special case when $\theta=\frac{\pi}{2}$, the last term on the right-hand side equals 0 and the equation reduces to the well-known Pythagorean Theorem.
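
As a quick sanity check, here is a minimal Python sketch of equation (1). The function names `third_side` and `third_side_from_coordinates` are my own, chosen only for illustration. The second function places the triangle in the plane and measures the third side directly, so the two results can be compared.

```python
import math


def third_side(x: float, y: float, theta: float) -> float:
    """Length of the side opposite the angle theta, using the law of cosines."""
    return math.sqrt(x**2 + y**2 - 2 * x * y * math.cos(theta))


def third_side_from_coordinates(x: float, y: float, theta: float) -> float:
    """Same length, measured directly from coordinates.

    The vertex with angle theta sits at the origin, side x lies along the
    horizontal axis, and side y is rotated by theta away from it.
    """
    p = (x, 0.0)
    q = (y * math.cos(theta), y * math.sin(theta))
    return math.dist(p, q)


# A right angle turns the law of cosines into the Pythagorean Theorem.
z_right = third_side(3.0, 4.0, math.pi / 2)
print(z_right)

# For any other angle the formula and the direct measurement should agree.
print(third_side(2.0, 3.0, 1.0), third_side_from_coordinates(2.0, 3.0, 1.0))
```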

The Variance of the Sum of Two Random Variables

Let’s imagine two random variables, $A$ and $B$. The variance of their sum is given by:

$\begin{equation*}var(A+B) = var(A)+var(B)+2 cov(A,B),\end{equation*}$

where $cov(\cdot)$ denotes covariance. Since correlation is defined as $corr(A,B) = cov(A,B) / (sd(A) sd(B))$, we can rewrite the last term as follows:

$\begin{equation*}var(A+B) = var(A)+var(B)+2 corr(A,B) sd(A) sd(B). \end{equation*}$

Next, we know that $var(\cdot)=sd^2(\cdot)$. Substituting, we get:

(2) $\begin{equation*}sd^2(A+B) = sd^2 (A)+ sd^2 (B)+2 corr(A,B) sd(A) sd(B).\end{equation*}$
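
Equation (2) can also be checked numerically. The sketch below uses simulated data; the sample size, the seed, and the way `b` is built from `a` are arbitrary assumptions made for illustration, and `correlation` is a small helper of my own rather than a library call. Because every moment uses the same population (divide by $n$) convention, the two sides should agree up to floating-point error, not just approximately.

```python
import math
import random
import statistics


def correlation(a, b):
    """Pearson correlation computed from population (divide-by-n) moments."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / len(a)
    return cov / (statistics.pstdev(a) * statistics.pstdev(b))


random.seed(0)  # arbitrary seed, only for reproducibility
a = [random.gauss(0, 1) for _ in range(1000)]
b = [0.5 * u + random.gauss(0, 1) for u in a]  # b is built to correlate with a
total = [u + v for u, v in zip(a, b)]

sd_a, sd_b = statistics.pstdev(a), statistics.pstdev(b)

# Left-hand and right-hand sides of equation (2).
lhs = statistics.pstdev(total) ** 2
rhs = sd_a**2 + sd_b**2 + 2 * correlation(a, b) * sd_a * sd_b

print(lhs, rhs)
```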

Putting the Two Equations Together

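Setting $x=sd(A)$, $y=sd(B)$, and $z=sd(A+B)$ in equation (1) gives the desired result, with one small caveat: the cosine term in equation (1) carries a negative sign, while the correlation term in equation (2) carries a positive one. To get around this we can simply look at the supplementary angle $\delta = \pi - \theta$, since $\cos(\pi - \theta) = -\cos(\theta)$, so that $corr(A,B) = \cos(\delta)$.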
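
Spelled out, the substitution into equation (1) reads:

$\begin{equation*}sd^2(A+B) = sd^2(A) + sd^2(B) - 2 sd(A) sd(B) \cos(\theta).\end{equation*}$

Comparing this term by term with equation (2), the first two terms on the right match, so the last terms must match as well:

$\begin{equation*}corr(A,B) = -\cos(\theta) = \cos(\pi - \theta) = \cos(\delta).\end{equation*}$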

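That is, we imagine a triangle with sides equal to $sd(A)$, $sd(B)$, and $sd(A+B)$, where $\theta$ is the interior angle between the sides $sd(A)$ and $sd(B)$. Comparing equations (1) and (2) gives $corr(A,B) = -\cos(\theta) = \cos(\delta)$ with $\delta = \pi - \theta$, which is the angle between the two sides when they are drawn as arrows starting from the same point. When this angle is small ($\delta < \frac{\pi}{2}$), the two arrows point in roughly the same direction and $A$ and $B$ are positively correlated. The opposite is true for $\delta > \frac{\pi}{2}$. As mentioned above, $\delta = \theta = \frac{\pi}{2}$ kills the correlation term, which means $A$ and $B$ are uncorrelated, as they are, for instance, when they are independent.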
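
The triangle can be built explicitly as a rough check. The sketch below, again on simulated data with arbitrary assumed parameters and a hypothetical `correlation` helper, recovers the interior angle $\theta$ from the three standard deviations via the law of cosines and then compares $\cos(\pi - \theta)$ with the correlation. Here `b` is deliberately constructed to be negatively correlated with `a`.

```python
import math
import random
import statistics


def correlation(a, b):
    """Pearson correlation computed from population (divide-by-n) moments."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / len(a)
    return cov / (statistics.pstdev(a) * statistics.pstdev(b))


random.seed(1)  # arbitrary seed, only for reproducibility
a = [random.gauss(0, 1) for _ in range(1000)]
b = [-0.8 * u + random.gauss(0, 1) for u in a]  # negatively related to a
total = [u + v for u, v in zip(a, b)]

# The three sides of the triangle.
x, y, z = statistics.pstdev(a), statistics.pstdev(b), statistics.pstdev(total)

# Solve the law of cosines for the interior angle between sides x and y.
# The clamp guards against tiny floating-point overshoots outside [-1, 1].
cos_theta = max(-1.0, min(1.0, (x**2 + y**2 - z**2) / (2 * x * y)))
theta = math.acos(cos_theta)

# The supplementary angle is the one whose cosine is the correlation.
delta = math.pi - theta

print(math.cos(delta), correlation(a, b))
```

With negatively correlated data, `delta` should come out larger than $\frac{\pi}{2}$ and `theta` smaller, in line with the sign flip between equations (1) and (2).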

Where to Learn More

As with anything else, a Google search is your friend here, with many links to helpful Stack Overflow posts explaining this connection from all sorts of angles. However, I do find John D. Cook’s blog post most helpful, and I am following his exposition closely.

Bottom Line

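The formula for the variance of the sum of two random variables has the same form as the law of cosines.
Matching terms shows that the correlation is the cosine of the supplementary angle $\delta = \pi - \theta$ in a triangle with sides $sd(A)$, $sd(B)$, and $sd(A+B)$.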

yasenov.com