Background
Conformal inference is a hot research topic among statisticians but has not made its way into the world of econometrics yet. My goal in this article is to provide a gentle introduction to the main idea behind conformal inference. You will learn a new way of thinking about uncertainty in the context of machine learning (i.e., prediction) models.
Let’s imagine an i.i.d. sample of size $n$ of an outcome variable $Y_i$ and a covariate vector $X_i$, $i = 1, \dots, n$. Conformal inference is concerned with building a confidence interval for a new outcome observation $Y_{n+1}$ from a new feature realization $X_{n+1} = x$.
Importantly, this interval should be valid:
- in finite samples (i.e., non-asymptotically),
- without assumptions on the data generating process, and
- for any estimator $\hat{\mu}$ of the regression function $\mu(x) = \mathbb{E}[Y \mid X = x]$.
In mathematical notation, given a significance level $\alpha \in (0, 1)$, we want to construct a confidence interval $C(x) \subseteq \mathbb{R}$ satisfying the above properties and such that:

$$\mathbb{P}\big(Y_{n+1} \in C(X_{n+1})\big) \geq 1 - \alpha.$$
While the technical term for this is a prediction interval, I will loosely be calling it a confidence interval, a term with which most of you are familiar.
As a teaser, the basic idea behind the method rests on a simple result about sample quantiles.
Let me explain.
Diving Deeper
Sample Quantiles
I will start by reviewing sample quantiles. Given an i.i.d. sample $U_1, \dots, U_n$, the $(1 - \alpha)$th quantile $\hat{q}_{1-\alpha}$ is the value such that approximately a fraction $1 - \alpha$ of the data is smaller than it. For instance, the 0.95 quantile (also known as the 95th percentile) is the value for which 95% of the observations are at least as small.
So, given a new observation $U_{n+1}$ from the same distribution, we know that:

$$\mathbb{P}\big(U_{n+1} \leq \hat{q}_{1-\alpha}\big) \geq 1 - \alpha,$$

which holds exactly in finite samples if we take $\hat{q}_{1-\alpha}$ to be the $\lceil (1-\alpha)(n+1) \rceil$th smallest observation.
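To see this fact in action, here is a minimal simulation sketch in base R (my own toy illustration, not code from the post):

```r
# Simulation: the conservative sample quantile covers a fresh draw with
# probability at least 1 - alpha. Base R only; all names are made up.
set.seed(42)
alpha  <- 0.05
n      <- 100
trials <- 10000

covered <- replicate(trials, {
  u     <- rnorm(n)   # i.i.d. sample
  u_new <- rnorm(1)   # a fresh observation from the same distribution
  # take the ceiling((1 - alpha) * (n + 1))-th order statistic
  k <- ceiling((1 - alpha) * (n + 1))
  u_new <= sort(u)[min(k, n)]
})
mean(covered)   # close to 0.95 (in fact about 96/101)
```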
The Naïve Approach
Let’s turn back to the regression example with outcomes $Y_i$ and covariates $X_i$, $i = 1, \dots, n$. We are given a new observation $X_{n+1}$ and our focus is on $Y_{n+1}$. Following the fact described above, a naïve way to construct a confidence interval for $Y_{n+1}$ is as follows:

$$C_{\text{naive}}(X_{n+1}) = \left[\hat{\mu}(X_{n+1}) - \hat{F}_n^{-1}(1-\alpha),\; \hat{\mu}(X_{n+1}) + \hat{F}_n^{-1}(1-\alpha)\right]$$

Here $\hat{\mu}$ is an estimate of the regression function $\mu$, $\hat{F}_n$ is the empirical distribution function of the fitted absolute residuals $|Y_i - \hat{\mu}(X_i)|$, $i = 1, \dots, n$, and $\hat{F}_n^{-1}(1-\alpha)$ is the $(1-\alpha)$th quantile of that distribution.
Put simply, we can look at an interval around our best prediction for $Y_{n+1}$ (i.e., $\hat{\mu}(X_{n+1})$) defined by the residuals estimated on the original data.
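As a concrete illustration, here is a sketch of the naïve construction on made-up data, with a cubic polynomial fit standing in for the learner (my own toy example):

```r
# A sketch of the naive interval: fit on all n observations, take the
# (1 - alpha) empirical quantile of the absolute in-sample residuals,
# and center the interval at the prediction. Toy data, base R.
set.seed(1)
n <- 200
x <- runif(n, -2, 2)
y <- sin(2 * x) + rnorm(n, sd = 0.3)

fit   <- lm(y ~ poly(x, 3))                 # any learner would do here
res   <- abs(y - fitted(fit))               # fitted (in-sample) residuals
alpha <- 0.1
q_hat <- unname(quantile(res, 1 - alpha))   # empirical residual quantile

mu_hat <- unname(predict(fit, newdata = data.frame(x = 0.5)))
c(lower = mu_hat - q_hat, upper = mu_hat + q_hat)
```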
It turns out this interval is too narrow. In a series of papers, Vladimir Vovk and co-authors show that the empirical distribution function of the fitted residuals is often biased downward, because the model has already adapted to the very observations used to compute the residuals, and hence this interval is invalid. This is where conformal inference comes in.
Conformal Inference
Consider the following strategy. For each candidate value $y \in \mathbb{R}$, we fit a regression $\hat{\mu}_y$ on the augmented sample $(X_1, Y_1), \dots, (X_n, Y_n), (X_{n+1}, y)$. We calculate the residuals $R_i = |Y_i - \hat{\mu}_y(X_i)|$ for $i = 1, \dots, n$ and $R_{n+1} = |y - \hat{\mu}_y(X_{n+1})|$, and count the proportion of $R_i$’s smaller than or equal to $R_{n+1}$. Let’s call this number $\pi(y)$. That is,

$$\pi(y) = \frac{1}{n+1} \sum_{i=1}^{n+1} \mathbb{1}\big(R_i \leq R_{n+1}\big),$$

where $\mathbb{1}(\cdot)$ is the indicator function, equal to one when the statement in the parentheses is true and zero when it is not.
Under the null hypothesis that $Y_{n+1} = y$, the $n+1$ residuals are exchangeable, so the test statistic $(n+1)\pi(y)$, which is the rank of $R_{n+1}$ among them, is uniformly distributed over the set $\{1, 2, \dots, n+1\}$, implying we can use $1 - \pi(y)$ as a valid p-value for testing the null that $Y_{n+1} = y$. Then, using the sample quantiles logic outlined above, we arrive at the following confidence interval for $Y_{n+1}$:

$$C_{\text{conf}}(X_{n+1}) = \left\{ y \in \mathbb{R} : (n+1)\pi(y) \leq \lceil (1-\alpha)(n+1) \rceil \right\}$$
This is summarized in the following procedure (a code sketch follows the list):
Algorithm: For each candidate value $y$ (in practice, over a fine grid of plausible outcomes):
- fit the regression function $\hat{\mu}_y$ on the augmented sample $(X_1, Y_1), \dots, (X_n, Y_n), (X_{n+1}, y)$ using your favorite estimator/learner.
- calculate the residuals $R_1, \dots, R_{n+1}$.
- calculate the proportion $\pi(y)$ of residuals no larger than $R_{n+1}$.
- include $y$ in $C_{\text{conf}}(X_{n+1})$ if $(n+1)\pi(y) \leq \lceil (1-\alpha)(n+1) \rceil$.
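Here is that loop as a minimal sketch in base R (my own toy illustration; a simple linear model stands in for the learner, and the grid bounds are an arbitrary choice):

```r
# Full conformal prediction as a sketch: for each candidate y on a grid,
# refit on the augmented sample and check the rank of the new residual.
full_conformal <- function(x, y, x_new, alpha = 0.1, grid_size = 200) {
  n    <- length(y)
  grid <- seq(min(y) - 3 * sd(y), max(y) + 3 * sd(y), length.out = grid_size)
  keep <- vapply(grid, function(y_cand) {
    xa  <- c(x, x_new)                      # augmented covariates
    ya  <- c(y, y_cand)                     # augmented outcomes
    fit <- lm(ya ~ xa)                      # refit on n + 1 points
    r   <- abs(ya - fitted(fit))            # residuals R_1, ..., R_{n+1}
    # (n + 1) * pi(y_cand) is the rank of the new residual:
    sum(r <= r[n + 1]) <= ceiling((1 - alpha) * (n + 1))
  }, logical(1))
  range(grid[keep])                         # endpoints of C(x_new)
}

set.seed(1)
x <- runif(100, -2, 2)
y <- 1 + 2 * x + rnorm(100)
full_conformal(x, y, x_new = 0.5)           # interval for the new outcome
```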
Two notes. First, conformal inference guarantees unconditional coverage, $\mathbb{P}(Y_{n+1} \in C(X_{n+1})) \geq 1 - \alpha$. This is conceptually different from, and should not be confused with, the conditional statement $\mathbb{P}(Y_{n+1} \in C(x) \mid X_{n+1} = x) \geq 1 - \alpha$. The latter is stronger and more difficult to assert, requiring additional assumptions such as consistency of our estimator of $\mu$.
Second, this procedure can be computationally expensive. For a given $X_{n+1}$ we need to fit a regression model and compute residuals for every candidate value $y$ which we consider including in the confidence interval. This is where split conformal inference comes in.
Split Conformal Inference
Split conformal inference is a modification of the original algorithm that requires significantly less computational power. The idea is to split the fitting and ranking steps so that the former is done only once. Here is the algorithm, followed by a code sketch.
Algorithm:
- Randomly split the data into two equal-sized bins.
- Fit $\hat{\mu}$ on the first bin.
- Calculate the absolute residuals $R_i = |Y_i - \hat{\mu}(X_i)|$ for each observation $i$ in the second bin.
- Let $d$ be the $s$th smallest of these residuals, where $s = \lceil (n/2 + 1)(1 - \alpha) \rceil$.
- Construct $C_{\text{split}}(X_{n+1}) = \left[\hat{\mu}(X_{n+1}) - d,\; \hat{\mu}(X_{n+1}) + d\right]$.
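Here is the split algorithm as a minimal base-R sketch (again my own toy illustration, with a linear model as the learner):

```r
# Split conformal as a sketch: one model fit on the first bin, absolute
# residuals ranked on the held-out second bin.
split_conformal <- function(x, y, x_new, alpha = 0.1) {
  n   <- length(y)
  i1  <- sample(n, floor(n / 2))                  # first bin, drawn at random
  fit <- lm(y ~ x, subset = i1)                   # fit only once
  res <- abs(y[-i1] - predict(fit, newdata = data.frame(x = x[-i1])))
  s   <- ceiling((length(res) + 1) * (1 - alpha)) # rank from the algorithm
  d   <- sort(res)[min(s, length(res))]           # the s-th smallest residual
  mu  <- unname(predict(fit, newdata = data.frame(x = x_new)))
  c(lower = mu - d, upper = mu + d)
}

set.seed(1)
x <- runif(200, -2, 2)
y <- 1 + 2 * x + rnorm(200)
split_conformal(x, y, x_new = 0.5)
```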
A downside of this splitting approach is the introduction of extra randomness: rerunning the procedure with a different split gives a different interval. One way to mitigate this is to perform the split multiple times, at appropriately adjusted levels, and construct a final confidence interval by taking the intersection of all intervals. The aggregation decreases the variability from a single data split and, as Lei et al. (2018) show, still remains valid. Similar random-split aggregation has also been used in the context of statistical significance in high-dimensional models.
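For concreteness, here is one way the aggregation could look, assuming a Bonferroni-style adjustment of the level across splits (my reading of the correction; the post does not spell it out). It reuses the `split_conformal()` helper from the sketch above:

```r
# Multiple-split aggregation: run split conformal n_splits times at the
# stricter level alpha / n_splits (assumed Bonferroni-style adjustment)
# and intersect the resulting intervals.
multi_split_conformal <- function(x, y, x_new, alpha = 0.1, n_splits = 5) {
  ints <- replicate(n_splits,
                    split_conformal(x, y, x_new, alpha = alpha / n_splits))
  c(lower = max(ints["lower", ]), upper = min(ints["upper", ]))  # intersect
}

multi_split_conformal(x, y, x_new = 0.5)   # x, y from the previous sketch
```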
An Example
I used the popular Titanic dataset to try out the conformalInference R package. Like most of my data demos, this is meant to be a mere illustration and you should not take the results seriously.
The outcome variable was `age`, and the covariate matrix included `pclass` (ticket class), `sibsp` (number of siblings aboard), `parch` (number of parents aboard), `fare`, `embarked` (port of embarkation), and `cabin`. Some of these were categorical, in which case I converted them to a set of binary indicator variables. I used the first 888 observations to estimate the regression function using the lasso and the 889th row to form the prediction (i.e., the test set).
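Here is roughly how such a call might look. This is a hedged sketch: the `conformal.pred()` and `lasso.funs()` interfaces follow my reading of the conformalInference package in the ryantibs/conformal GitHub repository, and `x_train`, `y_train`, `x_test` are placeholders for the prepared Titanic matrices described above:

```r
# A hedged sketch of the package call; not the post's actual code.
# devtools::install_github("ryantibs/conformal", subdir = "conformalInference")
library(conformalInference)

funs <- lasso.funs()                       # lasso training/prediction pair
out  <- conformal.pred(x = x_train, y = y_train, x0 = x_test,
                       train.fun = funs$train, predict.fun = funs$predict,
                       alpha = 0.1)        # level assumed; the post doesn't say
c(out$lo, out$up)                          # interval endpoints

# The split variant swaps in conformal.pred.split() with the same arguments.
```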
The actual `age` value in the test set was 32, while the conformal inference approach computed a confidence interval of (21.25, 123.75). It did contain the true value but was rather wide. The splitting algorithm gave similar results.
You can find the code in this GitHub repository.
Bottom Line
- Conformal inference offers a novel approach for constructing valid finite-sample prediction intervals in machine learning models.
Where to Learn More
Conformal inference in statistics is an ongoing research topic and I do not know of any review papers or textbook treatments of the subject. If you are interested in learning more, check the papers referenced below.
References
Lei, J., G’Sell, M., Rinaldo, A., Tibshirani, R. J., & Wasserman, L. (2018). Distribution-free predictive inference for regression. Journal of the American Statistical Association, 113(523), 1094-1111.
Lei, J., Rinaldo, A., & Wasserman, L. (2015). A conformal prediction approach to explore functional data. Annals of Mathematics and Artificial Intelligence, 74, 29-43.
Shafer, G., & Vovk, V. (2008). A Tutorial on Conformal Prediction. Journal of Machine Learning Research, 9(3).
Vovk, V., Gammerman, A., & Shafer, G. (2005). Algorithmic learning in a random world (Vol. 29). New York: Springer.