Background
You might have come across the statement, “correlation is a cosine,” but never taken the time to explore its precise meaning. It certainly sounds intriguing—how can the simplest bivariate summary statistic be connected to a trigonometric function you first encountered in sixth grade? What exactly is the relationship between correlation and cosines?
Let me explain.
Diving Deeper
The Law of Cosines
The law of cosines states that in any triangle with sides $a$, $b$, and $c$ and an angle $\gamma$ (between $a$ and $b$), we have:
$$c^2 = a^2 + b^2 - 2ab\cos\gamma. \tag{1}$$
In the special case when $\gamma = 90^\circ$, the last term on the right-hand side equals 0 and the equation reduces to the well-known Pythagorean Theorem.
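As a quick sanity check, here is a minimal Python sketch of the law of cosines. The helper name `third_side` and the example triangles are my own choices for illustration, not part of any library.

```python
import math

def third_side(a, b, gamma):
    """Length of the side opposite the angle gamma (in radians) between sides a and b."""
    return math.sqrt(a**2 + b**2 - 2 * a * b * math.cos(gamma))

# Right angle: the cosine term vanishes and we recover Pythagoras (a 3-4-5 triangle).
c_right = third_side(3.0, 4.0, math.pi / 2)

# Two equal sides with a 60-degree angle between them form an equilateral triangle.
c_equilateral = third_side(2.0, 2.0, math.pi / 3)

print(c_right, c_equilateral)
```

Up to floating-point rounding, the two printed values should be 5 and 2.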
The Variance of the Sum of Two Random Variables
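Let’s imagine two random variables $X$, $Y$. The variance of their sum is given by:
$$\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y),$$
where $\mathrm{Cov}(\cdot,\cdot)$ denotes covariance. We can substitute the last term using the definition of the correlation coefficient, $\rho = \mathrm{Cov}(X, Y) / (\sigma_X \sigma_Y)$, as follows:
$$\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\rho\,\sigma_X \sigma_Y.$$
Next, we know that $\mathrm{Var}(X) = \sigma_X^2$. Substituting, we get:
$$\sigma_{X+Y}^2 = \sigma_X^2 + \sigma_Y^2 + 2\,\sigma_X \sigma_Y\,\rho. \tag{2}$$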
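The identity in equation (2) can be checked numerically. The sketch below uses only the Python standard library. The simulated data, the seed, and the helper names `mean` and `covariance` are assumptions made for illustration. Because the same identity holds exactly for sample moments, the two sides should agree up to floating-point rounding.

```python
import math
import random

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Population covariance (divides by n); covariance(xs, xs) is the variance."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

random.seed(0)  # arbitrary seed, only for reproducibility
n = 1000
xs = [random.gauss(0, 1) for _ in range(n)]
# Let ys depend partly on xs so the correlation is non-zero.
ys = [0.5 * x + random.gauss(0, 2) for x in xs]
sums = [x + y for x, y in zip(xs, ys)]

sd_x = math.sqrt(covariance(xs, xs))
sd_y = math.sqrt(covariance(ys, ys))
rho = covariance(xs, ys) / (sd_x * sd_y)

lhs = covariance(sums, sums)                      # Var(X + Y)
rhs = sd_x**2 + sd_y**2 + 2 * sd_x * sd_y * rho   # right-hand side of equation (2)
print(lhs, rhs)
```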
Putting the Two Equations Together
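Setting $a = \sigma_X$, $b = \sigma_Y$, and $c = \sigma_{X+Y}$ in equation (1) gives the desired result, with one small caveat: the negative sign on the cosine term. To get around this, we can simply look at the supplementary angle $\theta = \pi - \gamma$. Since $\cos(\pi - \gamma) = -\cos\gamma$, equation (1) becomes $c^2 = a^2 + b^2 + 2ab\cos\theta$, which matches equation (2) term by term with the correlation $\rho = \cos\theta$.
That is, we imagine a triangle with sides equal to $\sigma_X$ and $\sigma_Y$, where $\theta$ is the angle between them, measured with the two sides drawn tail to tail. When this angle is small ($\theta < 90^\circ$), the two sides point in roughly the same direction and $X$ and $Y$ are positively correlated. The opposite is true for $\theta > 90^\circ$. As mentioned above, $\theta = 90^\circ$ kills the correlation term, consistent with $X$ and $Y$ being uncorrelated.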
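One way to see the angle concretely is with sample data: if each variable is stored as a vector and its mean is subtracted, the Pearson correlation equals the cosine of the angle between the two centered vectors. The sketch below illustrates this under those assumptions. The toy data and the helper names (`center`, `dot`, `cosine`, `pearson`) are hypothetical and chosen only for this example.

```python
import math

def center(values):
    """Subtract the mean so the vector averages to zero."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    """Cosine of the angle between two vectors."""
    return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))

def pearson(xs, ys):
    """Pearson correlation from covariance and standard deviations."""
    n = len(xs)
    cx, cy = center(xs), center(ys)
    cov = dot(cx, cy) / n
    sd_x = math.sqrt(dot(cx, cx) / n)
    sd_y = math.sqrt(dot(cy, cy) / n)
    return cov / (sd_x * sd_y)

def angle_degrees(cos_value):
    # Clamp to [-1, 1] to guard against rounding before calling acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_value))))

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 6.0]   # tends to rise with xs
ys_flipped = [-y for y in ys]    # tends to fall as xs rises

cos_theta = cosine(center(xs), center(ys))
rho = pearson(xs, ys)
theta_deg = angle_degrees(cos_theta)

rho_flipped = pearson(xs, ys_flipped)
theta_flipped_deg = angle_degrees(cosine(center(xs), center(ys_flipped)))

print(rho, theta_deg)
print(rho_flipped, theta_flipped_deg)
```

For this toy data the correlation is about 0.82, with an angle below 90 degrees. Flipping the sign of `ys` flips the sign of the correlation and pushes the angle past 90 degrees.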
Where to Learn More
As with anything else, a Google search is your friend here, with many links to helpful Stack Overflow posts explaining this connection from all sorts of angles. However, I do find John D. Cook’s blog post most helpful, and I am following his exposition closely.
Bottom Line
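- The formula for the variance of the sum of two random variables has the same form as the law of cosines.
- Substituting the standard deviations for the side lengths and rearranging terms shows that the correlation plays the role of the cosine.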