Bernstein–von Mises theorem

{{Bayesian statistics}}
 
In [[Bayesian inference]], the '''Bernstein–von Mises theorem''' provides the basis for using Bayesian credible sets for confidence statements in [[parametric model|parametric models]]. It states that under some conditions, a posterior distribution converges in the [[Total variation distance of probability measures|total variation distance]] to a multivariate normal distribution centered at the maximum likelihood estimator <math>\widehat{\theta}_n</math> with covariance matrix given by <math>n^{-1} \mathcal{I}(\theta_0)^{-1}</math>, where <math>\theta_0</math> is the true population parameter and <math>\mathcal{I}(\theta_0)</math> is the [[Fisher information matrix]] at the true population parameter value:<ref>{{cite book |last=van der Vaart |first=A.W. |title=Asymptotic Statistics |year=1998 |publisher=Cambridge University Press |isbn=0-521-78450-6 |chapter=10.2 Bernstein–von Mises Theorem}}</ref>
:<math>{\left|\left|P(\theta\mid x_1,\dots, x_n) - \mathcal{N}\left({\widehat{\theta}}_n, n^{-1}\mathcal{I}(\theta_0)^{-1}\right)\right|\right|}_{\mathrm{TV}} \xrightarrow{P_{\theta_0}} 0.</math>
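For example (an illustrative special case, not part of the cited statement), in a [[Bernoulli distribution|Bernoulli]] model with unknown success probability <math>\theta</math>, the Fisher information is <math>\mathcal{I}(\theta) = 1/(\theta(1-\theta))</math>, so the theorem says that the posterior is approximately
:<math>\mathcal{N}\!\left(\widehat{\theta}_n, \frac{\theta_0(1-\theta_0)}{n}\right), \qquad \widehat{\theta}_n = \frac{1}{n}\sum_{i=1}^n x_i,</math>
for large <math>n</math>, essentially regardless of the (sufficiently regular) prior.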
 
The Bernstein–von Mises theorem links [[Bayesian inference]] with [[frequentist inference]]. It assumes there is some true probabilistic process that generates the observations, as in frequentism, and then studies the quality of Bayesian methods of recovering that process, and making uncertainty statements about that process. In particular, it states that asymptotically, many Bayesian credible sets of a certain credibility level <math>\alpha</math> will act as confidence sets of confidence level <math>\alpha</math>, which allows for a frequentist interpretation of Bayesian credible sets.
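A small simulation can make this concrete. The following sketch (illustrative only; the Beta–Bernoulli model, the flat <math>\mathrm{Beta}(1,1)</math> prior, and all parameter values are assumptions chosen so that the posterior has a closed form) estimates the frequentist coverage of a 95% equal-tailed posterior credible interval, which by the theorem should be close to 0.95:

<syntaxhighlight lang="python">
import numpy as np
from scipy import stats

# Illustrative sketch only: the Beta-Bernoulli model, the flat Beta(1, 1)
# prior, and all parameter values below are assumptions chosen so that the
# posterior, Beta(1 + successes, 1 + failures), has a closed form.
rng = np.random.default_rng(0)
theta0, n, level, reps = 0.3, 500, 0.95, 2000
covered = 0
for _ in range(reps):
    k = rng.binomial(n, theta0)               # simulate one data set of size n
    posterior = stats.beta(1 + k, 1 + n - k)  # conjugate posterior
    lo = posterior.ppf((1 - level) / 2)       # equal-tailed credible interval
    hi = posterior.ppf((1 + level) / 2)
    covered += lo <= theta0 <= hi             # does it contain the true value?

# By the Bernstein-von Mises theorem, the frequentist coverage of the 95%
# credible interval should be close to 0.95 for large n.
print(f"coverage: {covered / reps:.3f}")
</syntaxhighlight>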
 
==Statement==

Let <math>(P_\theta\,:\,\theta \in \Theta)</math> be a well-specified statistical model, where the parameter space <math>\Theta</math> is a subset of <math>\mathbb{R}^k</math>. Further, let data <math>X_1, \ldots, X_n \in \mathcal{X}</math> be independently and identically distributed from <math>P_{\theta_0}</math>. Suppose that all of the following conditions hold (a concrete check of them in a simple model is sketched after the list):
 
# The model admits densities <math>(p_\theta\,:\,\theta\in\Theta)</math> with respect to some measure <math>\mu</math>.
# The Fisher information matrix <math>\mathcal{I}(\theta_0)</math> is nonsingular.
# The model is differentiable in quadratic mean. That is, there exists a measurable function <math>f:\mathcal{X}\rightarrow\mathbb{R}^k</math> such that <math>\int\left[\sqrt{p_\theta(x)} - \sqrt{p_{\theta_0}(x)} - \frac{1}{2}(\theta - \theta_0)^\top f(x)\sqrt{p_{\theta_0}(x)}\right]^2 \mathrm{d}\mu(x) = o(||\theta - \theta_0||^2)</math> as <math>\theta \rightarrow \theta_0</math>.
# For every <math>\varepsilon > 0</math>, there exists a sequence of test functions <math>\phi_n:\mathcal{X}^n \rightarrow [0, 1]</math> such that <math>\mathbb{E}_{\mathbf{X} \sim P^n_{\theta_0}}\left[\phi_n(\mathbf{X})\right] \rightarrow 0</math> and <math>\sup_{\theta \,:\, ||\theta-\theta_0||>\varepsilon} \mathbb{E}_{\mathbf{X}\sim P^n_{\theta}}\left[1 - \phi_n(\mathbf{X})\right] \rightarrow 0</math> as <math>n \rightarrow \infty</math>.
# The prior measure is absolutely continuous with respect to the Lebesgue measure in a neighborhood of <math>\theta_0</math>, with a continuous positive density at <math>\theta_0</math>.
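As a concrete check of these conditions (a standard textbook example, not taken from the cited references), consider the Gaussian location model <math>P_\theta = \mathcal{N}(\theta, 1)</math> with <math>\Theta = \mathbb{R}</math>. Densities with respect to Lebesgue measure exist (condition 1); the Fisher information <math>\mathcal{I}(\theta_0) = 1</math> is nonsingular (condition 2); differentiability in quadratic mean holds with <math>f(x) = x - \theta_0</math> (condition 3); the tests <math>\phi_n = \mathbf{1}\{|\bar{X}_n - \theta_0| > \varepsilon/2\}</math> satisfy condition 4, since <math>\mathbb{E}_{\theta_0}[\phi_n] \rightarrow 0</math> by the law of large numbers while Chebyshev's inequality gives <math>\sup_{||\theta-\theta_0||>\varepsilon} \mathbb{E}_{\theta}[1 - \phi_n] \le 4/(n\varepsilon^2) \rightarrow 0</math>; and any prior with a continuous positive Lebesgue density near <math>\theta_0</math>, such as a standard normal prior, satisfies condition 5.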
 
If these conditions hold, then for any estimator <math>\widehat{\theta}_n</math> satisfying <math>\sqrt{n}({\widehat{\theta}}_n - \theta_0) \xrightarrow{d} \mathcal{N}(0, {\mathcal{I}}^{-1}(\theta_0))</math>, the posterior distribution <math>\Pi_n</math> of <math>\theta \mid X_1, \ldots, X_n</math> satisfies
:<math>{\left|\left|\Pi_n - \mathcal{N}\left(\widehat{\theta}_n, \frac{1}{n}{\mathcal{I}}^{-1}({\theta_0})\right)\right|\right|}_{\mathrm{TV}} \xrightarrow{P_{\theta_0}} 0</math>
as <math>n \rightarrow \infty</math>.
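The convergence can also be checked numerically. The sketch below (illustrative only; the Beta–Bernoulli model, flat prior, grid-based integration, and parameter values are assumptions made for this example) estimates the total variation distance between the exact posterior and its normal approximation for increasing <math>n</math>; the estimate should shrink toward zero:

<syntaxhighlight lang="python">
import numpy as np
from scipy import stats

# Illustrative sketch only: Bernoulli(theta0) data with a flat Beta(1, 1)
# prior, so the exact posterior is Beta(1 + k, 1 + n - k); the model, prior
# and parameter values are assumptions made for this example.
rng = np.random.default_rng(1)
theta0 = 0.3
fisher = 1.0 / (theta0 * (1.0 - theta0))  # Fisher information of Bernoulli(theta)
grid = np.linspace(1e-6, 1.0 - 1e-6, 200_001)
step = grid[1] - grid[0]

for n in (50, 500, 5000):
    k = rng.binomial(n, theta0)                              # observed successes
    posterior = stats.beta(1 + k, 1 + n - k)                 # exact posterior
    approx = stats.norm(k / n, np.sqrt(1.0 / (n * fisher)))  # N(mle, I(theta0)^{-1}/n)
    # Total variation distance = (1/2) * integral of |p - q|, via a Riemann sum.
    tv = 0.5 * np.sum(np.abs(posterior.pdf(grid) - approx.pdf(grid))) * step
    print(f"n = {n:5d}: estimated TV distance = {tv:.3f}")
</syntaxhighlight>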
 
==Relationship to maximum likelihood estimation==
Under certain regularity conditions, the [[maximum likelihood estimator]] is an asymptotically efficient estimator and can thus be used as <math>\widehat{\theta}_n</math> in the theorem statement. This then yields that the posterior distribution converges in total variation distance to the asymptotic distribution of the maximum likelihood estimator, which is commonly used to construct frequentist confidence sets.
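For instance (a standard construction, sketched here for illustration rather than taken from the cited references), the [[Wald test|Wald]]-type confidence set built from the maximum likelihood estimator,
:<math>C_n = \left\{\theta : n\,(\theta - \widehat{\theta}_n)^\top \mathcal{I}(\widehat{\theta}_n)\,(\theta - \widehat{\theta}_n) \le \chi^2_{k,1-\alpha}\right\},</math>
has frequentist coverage tending to <math>1-\alpha</math>; under the theorem its posterior probability also tends to <math>1-\alpha</math>, so asymptotically it can be read as either a confidence set or a credible set.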
 
==Implications==
Different summary statistics such as the [[Mode (statistics)|mode]] and mean may behave differently in the posterior distribution. In Freedman's examples, the posterior density and its mean can converge on the wrong result, but the posterior mode is consistent and will converge on the correct result.
 
==References==
{{Reflist}}
 
==Further reading==
*{{cite book|last=Hartigan |first=J. A. |authorlink=John A. Hartigan |chapter=Asymptotic Normality of Posterior Distributions |title=Bayes Theory |location=New York |publisher=Springer |year=1983 |isbn= |doi=10.1007/978-1-4613-8242-3_11 }}
*{{cite book |last=Le Cam |first=Lucien |authorlink=Lucien Le Cam |title=Asymptotic Methods in Statistical Decision Theory |chapter=Approximately Gaussian Posterior Distributions |pages=336–345 |location=New York |publisher=Springer |year=1986 |isbn=0-387-96307-3 }}
*{{cite book |last=van der Vaart |first=A. W. |title=Asymptotic Statistics |year=1998 |publisher=Cambridge University Press |isbn=0-521-49603-9 |chapter=Bernstein–von Mises Theorem}}