Expected Validation Performance and Estimation of a Random Variable’s Maximum

Jesse Dodge; Suchin Gururangan; Dallas Card; Roy Schwartz; Noah A. Smith

doi:10.18653/v1/2021.findings-emnlp.342

Abstract

Research in NLP is often supported by experimental results, and improved reporting of such results can lead to better understanding and more reproducible science. In this paper we analyze three statistical estimators for expected validation performance, a tool used for reporting performance (e.g., accuracy) as a function of computational budget (e.g., number of hyperparameter tuning experiments). Where previous work analyzing such estimators focused on the bias, we also examine the variance and mean squared error (MSE). In both synthetic and realistic scenarios, we evaluate three estimators and find the unbiased estimator has the highest variance, and the estimator with the smallest variance has the largest bias; the estimator with the smallest MSE strikes a balance between bias and variance, displaying a classic bias-variance tradeoff. We use expected validation performance to compare different models, and analyze how frequently each estimator leads to drawing incorrect conclusions about which of two models performs best. We find that the two biased estimators lead to the fewest incorrect conclusions, which hints at the importance of minimizing variance and MSE.
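Expected validation performance at budget n is the expected maximum validation score over n hyperparameter trials, E[max(V_1, ..., V_n)], estimated from N ≥ n observed scores. The abstract does not give the estimators' formulas, so the Python sketch below assumes two standard constructions: an unbiased estimator that averages the maximum over every size-n subset of the observed scores (sampling without replacement), and a plug-in estimator that takes the expected maximum of n draws with replacement from the empirical distribution. The function names, the Uniform score distributions, and the two "models" are hypothetical illustrations, not the paper's exact estimators or experimental setup. The simulation measures bias, variance, and MSE against a known true value (for n i.i.d. Uniform(0, 1) draws, E[max] = n/(n+1)), and counts how often an estimator fails to rank the truly better of two models first.

```python
import math
import random
from statistics import mean, pvariance


def evp_unbiased(scores, n):
    """Unbiased estimate of E[max of n draws]: the average maximum over all
    size-n subsets of the observed scores (sampling without replacement)."""
    v = sorted(scores)
    N = len(v)
    if not 1 <= n <= N:
        raise ValueError("need 1 <= n <= len(scores)")
    # The i-th smallest score (1-indexed) is the maximum of comb(i-1, n-1) subsets.
    weighted = sum(math.comb(i - 1, n - 1) * v[i - 1] for i in range(n, N + 1))
    return weighted / math.comb(N, n)


def evp_with_replacement(scores, n):
    """Plug-in estimate: expected maximum of n draws with replacement from the
    empirical distribution of the observed scores."""
    v = sorted(scores)
    N = len(v)
    # P(the largest drawn sorted index is i) = (i/N)^n - ((i-1)/N)^n.
    return sum(v[i - 1] * ((i / N) ** n - ((i - 1) / N) ** n) for i in range(1, N + 1))


def simulate(n, N, trials=2000, seed=0):
    """Bias, variance and MSE of each estimator when the N observed scores are
    i.i.d. Uniform(0, 1) (a hypothetical score distribution)."""
    rng = random.Random(seed)
    truth = n / (n + 1)  # exact E[max] of n i.i.d. Uniform(0, 1) draws
    estimators = {"unbiased": evp_unbiased, "with_replacement": evp_with_replacement}
    estimates = {name: [] for name in estimators}
    for _ in range(trials):
        scores = [rng.random() for _ in range(N)]
        for name, fn in estimators.items():
            estimates[name].append(fn(scores, n))
    results = {}
    for name, ests in estimates.items():
        bias = mean(ests) - truth
        var = pvariance(ests)
        mse = mean((e - truth) ** 2 for e in ests)  # equals bias**2 + var up to rounding
        results[name] = (bias, var, mse)
    return results


def wrong_conclusion_rate(estimator, n, N, trials=2000, seed=1):
    """Fraction of trials in which `estimator` fails to rank hypothetical model B
    (scores ~ Uniform(0.05, 1)) above model A (scores ~ Uniform(0, 1)),
    although B has the higher true expected maximum for every n."""
    rng = random.Random(seed)
    wrong = 0
    for _ in range(trials):
        a = [rng.uniform(0.0, 1.0) for _ in range(N)]
        b = [rng.uniform(0.05, 1.0) for _ in range(N)]
        if estimator(a, n) >= estimator(b, n):
            wrong += 1
    return wrong / trials


if __name__ == "__main__":
    for name, (bias, var, mse) in simulate(n=5, N=20).items():
        print(f"{name:>16}: bias={bias:+.4f} var={var:.5f} mse={mse:.5f}")
    for fn in (evp_unbiased, evp_with_replacement):
        print(fn.__name__, wrong_conclusion_rate(fn, n=5, N=20))
```

Because drawing with replacement can repeat the same observation, the plug-in estimate never exceeds the unbiased estimate computed on the same scores, so it tends to underestimate expected validation performance; whether that bias buys a lower variance or MSE is exactly what a simulation like this one can measure.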

Anthology ID: 2021.findings-emnlp.342
Volume: Findings of the Association for Computational Linguistics: EMNLP 2021
Month: November
Year: 2021
Address: Punta Cana, Dominican Republic
Editors: Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue: Findings
SIG: SIGDAT
Publisher: Association for Computational Linguistics
Pages: 4066–4073
URL: https://aclanthology.org/2021.findings-emnlp.342/
DOI: 10.18653/v1/2021.findings-emnlp.342
Cite (ACL): Jesse Dodge, Suchin Gururangan, Dallas Card, Roy Schwartz, and Noah A. Smith. 2021. Expected Validation Performance and Estimation of a Random Variable’s Maximum. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 4066–4073, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal): Expected Validation Performance and Estimation of a Random Variable’s Maximum (Dodge et al., Findings 2021)
PDF: https://aclanthology.org/2021.findings-emnlp.342.pdf
Video: https://aclanthology.org/2021.findings-emnlp.342.mp4
