Correlation is a Cosine

You might have heard a statement along the lines of “correlation is a cosine” without ever investigating what it means precisely. It certainly sounds interesting. How can the simplest bivariate summary statistic be related to a trigonometric function from grade school? What is the relation between correlation and cosines?

Let me explain.

Diving Deeper
1. The Law of Cosines

The law of cosines states that in any triangle with sides x, y, and z, where \theta is the angle between x and y, we have:

(1)   \begin{equation*} z^2 = x^2 + y^2 - 2 x y \cos(\theta). \end{equation*}

In the special case when \theta=\frac{\pi}{2}, the last term on the right-hand side equals 0 and the equation reduces to the well-known Pythagorean Theorem.
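As a quick sanity check, here is a minimal Python sketch of equation (1). The helper name `third_side` and the 3-4 triangle are my own illustrative choices, not part of the original exposition.

```python
import math


def third_side(x: float, y: float, theta: float) -> float:
    """Side opposite the angle theta (in radians) enclosed by sides x and y."""
    return math.sqrt(x**2 + y**2 - 2 * x * y * math.cos(theta))


# Right angle: the cosine term vanishes and we recover Pythagoras (3-4-5).
right = third_side(3.0, 4.0, math.pi / 2)

# Acute angle: the cosine term is positive, so the third side is shorter.
acute = third_side(3.0, 4.0, math.pi / 3)

print(right, acute)
```

Shrinking the angle below a right angle shortens the third side, and widening it lengthens the third side; this is the geometric effect the correlation term will play in the next section.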

2. The Variance of the Sum of Two Random Variables

Let’s imagine two random variables A, B. The variance of their sum is given by:

    \begin{equation*}var(A+B) = var(A)+var(B)+2 cov(A,B),\end{equation*}

where cov(\cdot) denotes covariance. Since corr(A,B) = cov(A,B) / (sd(A) sd(B)), we can rewrite the covariance in terms of the correlation:

    \begin{equation*}var(A+B) = var(A)+var(B)+2 corr(A,B) sd(A) sd(B). \end{equation*}

Next, we know that var(\cdot)=sd^2(\cdot). Substituting, we get:

(2)   \begin{equation*}sd^2(A+B) = sd^2 (A)+ sd^2 (B)+2 corr(A,B) sd(A) sd(B).\end{equation*}
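Equation (2) can also be checked numerically. The sketch below uses only the Python standard library; the simulated data, the seed, and the helper names (`mean`, `cov`, `sd`, `corr`) are assumptions made for illustration. It uses the divide-by-n versions of the sample moments, for which the identity holds exactly up to floating-point rounding.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def cov(u, v):
    """Sample covariance (dividing by n) of two equally long lists."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)


def sd(v):
    return math.sqrt(cov(v, v))


def corr(u, v):
    return cov(u, v) / (sd(u) * sd(v))


random.seed(0)
a = [random.gauss(0, 1) for _ in range(1000)]
b = [0.6 * x + random.gauss(0, 1) for x in a]  # correlated with a by construction
total = [x + y for x, y in zip(a, b)]

lhs = sd(total) ** 2
rhs = sd(a) ** 2 + sd(b) ** 2 + 2 * corr(a, b) * sd(a) * sd(b)
print(lhs, rhs)
```

The two printed numbers should agree up to rounding error, whatever data you plug in.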

3. Putting the Two Equations Together

Setting x=sd(A), y=sd(B), and z=sd(A+B) in equation (1) and comparing with equation (2) gives the desired result, with one small caveat: the cosine term in (1) carries a negative sign, so matching terms yields corr(A,B) = -\cos(\theta). To get around this we can simply look at the supplementary angle \delta = \pi - \theta, for which \cos(\delta) = -\cos(\theta):

    \begin{equation*}corr(A,B) = \cos(\delta).\end{equation*}

That is, we imagine a triangle with sides equal to sd(A), sd(B) and sd(A+B), where \theta is the angle between the sides sd(A) and sd(B) and \delta is its supplement. When \delta is small (\delta < \frac{\pi}{2}), the two sides point in roughly the same direction and A and B are positively correlated. The opposite is true for \delta > \frac{\pi}{2}. As mentioned above, \theta = \delta = \frac{\pi}{2} kills the correlation term, so A and B are uncorrelated, as is the case when they are independent.
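To see the triangle in action, the following sketch builds it from simulated data: it recovers \theta from the three side lengths via equation (1), takes \delta = \pi - \theta, and compares \cos(\delta) with the correlation computed directly. The data, the seed, and the helper names are hypothetical choices for illustration only.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def cov(u, v):
    """Sample covariance (dividing by n) of two equally long lists."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)


def sd(v):
    return math.sqrt(cov(v, v))


random.seed(1)
a = [random.gauss(0, 1) for _ in range(1000)]
b = [0.6 * x + random.gauss(0, 1) for x in a]  # positively correlated with a
total = [x + y for x, y in zip(a, b)]

# Sides of the triangle.
x, y, z = sd(a), sd(b), sd(total)

# Solve the law of cosines for the angle between sides x and y.
cos_theta = (x**2 + y**2 - z**2) / (2 * x * y)
cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding error
theta = math.acos(cos_theta)

# The supplementary angle is the one whose cosine is the correlation.
delta = math.pi - theta
corr_ab = cov(a, b) / (x * y)

print(math.cos(delta), corr_ab)
```

Because b was constructed to move with a, \delta comes out below \frac{\pi}{2}, and the two printed values coincide up to rounding.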

Where to Learn More

As with anything else, a Google search is your friend here, with many links to helpful Stack Overflow posts explaining this connection from all sorts of angles. However, I do find John D. Cook’s blog post most helpful, and I am following his exposition closely.

Bottom Line
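  • The formula for the variance of the sum of two random variables has the same form as the law of cosines.
  • Matching terms shows that corr(A,B) = \cos(\delta), where \delta = \pi - \theta and \theta is the triangle's angle between the sides sd(A) and sd(B).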
