Bootstrapping the Out-of-sample Predictions for Efficient and Accurate Cross-Validation

Tsamardinos, Ioannis; Greasidou, Elissavet; Tsagris, Michalis; Borboudakis, Giorgos

arXiv:1708.07180 (cs)

[Submitted on 23 Aug 2017 (v1), last revised 25 Aug 2017 (this version, v2)]

Abstract: Cross-Validation (CV), and out-of-sample performance-estimation protocols in general, are often employed both for (a) selecting the optimal combination of algorithms and values of hyper-parameters (called a configuration) for producing the final predictive model, and (b) estimating the predictive performance of the final model. However, the cross-validated performance of the best configuration is optimistically biased. We present an efficient bootstrap method that corrects for this bias, called Bootstrap Bias Corrected CV (BBC-CV). BBC-CV's main idea is to bootstrap the whole process of selecting the best-performing configuration on the out-of-sample predictions of each configuration, without additional training of models. In comparison to the alternatives, namely nested cross-validation and a method by Tibshirani and Tibshirani, BBC-CV is computationally more efficient, has smaller variance and bias, and is applicable to any metric of performance (accuracy, AUC, concordance index, mean squared error). Subsequently, we again employ the idea of bootstrapping the out-of-sample predictions to speed up the CV process. Specifically, we use a bootstrap-based hypothesis test to stop training models on new folds for configurations that are statistically significantly inferior. We name this method Bootstrap Corrected with Early Dropping CV (BCED-CV); it is both efficient and provides accurate performance estimates.
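The abstract describes BBC-CV only at a high level. The sketch below is one reading of it, not the authors' implementation. It assumes the pooled out-of-sample predictions of every configuration are already stored as columns of a matrix. On each bootstrap resample of the rows, it re-runs the selection step by picking the configuration that is best on the in-bag rows, then scores that choice on the out-of-bag rows, and averages those scores. The in-bag/out-of-bag split, the use of accuracy as the metric, and the names `bbc_cv` and `accuracy` are illustrative assumptions.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of correct predictions (higher is better)."""
    return float(np.mean(y_true == y_pred))


def bbc_cv(oos_preds, y, metric, n_boot=1000, seed=0):
    """Bootstrap the configuration-selection step over pooled out-of-sample predictions.

    oos_preds: (n_samples, n_configs) array; column j holds the out-of-sample
               predictions of configuration j pooled over all CV folds.
    y:         (n_samples,) array of true labels.
    metric:    callable (y_true, y_pred) -> score; assumed higher-is-better.
    No models are retrained: only the stored predictions are resampled.
    """
    rng = np.random.default_rng(seed)
    n, n_configs = oos_preds.shape
    scores = []
    for _ in range(n_boot):
        # In-bag rows: a bootstrap sample of row indices, drawn with replacement.
        in_bag = rng.integers(0, n, size=n)
        # Out-of-bag rows: those not drawn in this resample.
        oob = np.ones(n, dtype=bool)
        oob[in_bag] = False
        if not oob.any():
            continue
        # Re-run the selection: pick the configuration that is best in-bag ...
        in_bag_scores = [metric(y[in_bag], oos_preds[in_bag, j]) for j in range(n_configs)]
        best = int(np.argmax(in_bag_scores))
        # ... and score it on rows that played no part in selecting it.
        scores.append(metric(y[oob], oos_preds[oob, best]))
    return float(np.mean(scores))


# Toy check: 50 configurations that all guess at random on a binary task.
rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=200)
preds = rng.integers(0, 2, size=(200, 50))
naive = max(accuracy(y, preds[:, j]) for j in range(50))  # optimistically biased
corrected = bbc_cv(preds, y, accuracy, n_boot=500)
print(f"naive best CV accuracy: {naive:.3f}, BBC-CV estimate: {corrected:.3f}")
```

Because every configuration in the toy example is a coin flip, the true accuracy of whichever one is selected is about 0.5. The naive maximum over columns sits above that, while the bootstrap estimate should land close to it. Only the stored predictions are resampled, which is why this kind of correction needs no additional model training.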

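The early-dropping idea can be sketched in the same style. The abstract does not specify the hypothesis test, so this sketch assumes one plausible rule. After the folds completed so far, it bootstraps the rows seen and drops a configuration if that configuration scores strictly below the current leader (the best configuration on all rows seen) in more than a `1 - alpha` fraction of resamples. The rule, the threshold `alpha`, and the names `early_drop` and `accuracy` are hypothetical.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of correct predictions (higher is better)."""
    return float(np.mean(y_true == y_pred))


def early_drop(oos_preds, y, metric, active, alpha=0.01, n_boot=1000, seed=0):
    """Return the configurations in `active` that should keep being trained.

    oos_preds: (n_seen, n_configs) out-of-sample predictions pooled over the
               folds completed so far; y: (n_seen,) true labels.
    active:    column indices of configurations still in the race.
    Hypothetical rule: drop configuration j if it scores strictly below the
    current leader in more than a (1 - alpha) fraction of bootstrap resamples.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    # Leader: the configuration that is best on all predictions seen so far.
    full_scores = {j: metric(y, oos_preds[:, j]) for j in active}
    leader = max(full_scores, key=full_scores.get)
    worse_counts = {j: 0 for j in active}
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # bootstrap resample of seen rows
        leader_score = metric(y[idx], oos_preds[idx, leader])
        for j in active:
            if metric(y[idx], oos_preds[idx, j]) < leader_score:
                worse_counts[j] += 1
    # Keep the leader and every configuration not significantly inferior to it.
    return [j for j in active if j == leader or worse_counts[j] / n_boot <= 1 - alpha]


# Toy check after one fold of 100 rows: configurations 0 and 2 are perfect,
# configuration 1 guesses at random, so only configuration 1 should be dropped.
rng = np.random.default_rng(2)
y = rng.integers(0, 2, size=100)
preds = np.column_stack([y, rng.integers(0, 2, size=100), y])
survivors = early_drop(preds, y, accuracy, active=[0, 1, 2], n_boot=200)
print(survivors)  # configurations that would be trained on the next fold
```

In a full CV loop, only the surviving configurations would be trained on the next fold, and the test would be repeated as more out-of-sample predictions accumulate. Dropped configurations stop consuming training time, which is where a speedup of the kind the abstract describes would come from.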
Comments:	Added acknowledgments
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:1708.07180 [cs.LG]
	(or arXiv:1708.07180v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1708.07180

Submission history

From: Giorgos Borboudakis
[v1] Wed, 23 Aug 2017 20:30:07 UTC (133 KB)
[v2] Fri, 25 Aug 2017 14:02:02 UTC (133 KB)