Abstract

Random forests are among the most popular machine learning techniques for prediction problems. When using random forests to predict a quantitative response, an important but often overlooked challenge is the determination of prediction intervals that will contain an unobserved response value with a specified probability. We propose new random forest prediction intervals that are based on the empirical distribution of out-of-bag prediction errors. These intervals can be obtained as a by-product of a single random forest. Under regularity conditions, we prove that the proposed intervals have asymptotically correct coverage rates. Simulation studies and analysis of 60 real datasets are used to compare the finite-sample properties of the proposed intervals with quantile regression forests and recently proposed split conformal intervals. The results indicate that intervals constructed with our proposed method tend to be narrower than those of competing methods while still maintaining marginal coverage rates approximately equal to nominal levels.
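As a rough illustration of the idea, the sketch below forms an interval by adding empirical quantiles of the out-of-bag prediction errors to a new point prediction. The abstract states only that the intervals are based on the empirical distribution of these errors, so the exact construction, the quantile convention, and the names `empirical_quantile` and `oob_prediction_interval` are assumptions made for illustration, not the authors' implementation. The out-of-bag predictions are taken as given inputs from whichever forest implementation is in use.

```python
import math


def empirical_quantile(values, p):
    """Return the smallest order statistic whose empirical CDF is at least p.

    This quantile convention is an assumption; other conventions exist.
    """
    ordered = sorted(values)
    n = len(ordered)
    rank = max(1, math.ceil(p * n))  # 1-based rank into the sorted errors
    return ordered[min(rank, n) - 1]


def oob_prediction_interval(y_train, oob_pred, new_pred, alpha=0.1):
    """Sketch of a (1 - alpha) prediction interval from out-of-bag errors.

    y_train  : observed training responses
    oob_pred : out-of-bag prediction for each training case (same order)
    new_pred : forest point prediction for the new case
    """
    # Out-of-bag prediction errors: observed response minus OOB prediction.
    errors = [y - p for y, p in zip(y_train, oob_pred)]
    lower = new_pred + empirical_quantile(errors, alpha / 2)
    upper = new_pred + empirical_quantile(errors, 1 - alpha / 2)
    return lower, upper


# Toy usage with made-up numbers (not data from the paper).
y_train = [3.0, 5.0, 4.0, 10.0]
oob_pred = [4.0, 4.0, 4.0, 4.0]
print(oob_prediction_interval(y_train, oob_pred, new_pred=10.0, alpha=0.5))
# prints (9.0, 11.0)
```

Because only the out-of-bag predictions from one fitted forest are needed, no additional model fitting or data splitting is required in this sketch, which is consistent with the abstract's remark that the intervals arise as a by-product of a single random forest.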

Acknowledgments

The authors gratefully acknowledge The Iowa State University Plant Sciences Institute Scholars Program.
