In [[statistics]], the '''method of moments''' is a method of estimation of population parameters. The same principle is used to derive higher moments like skewness and kurtosis.

It starts by expressing the population [[moment (mathematics)|moments]] (i.e., the [[expected value]]s of powers of the [[random variable]] under consideration) as functions of the parameters of interest. Those expressions are then set equal to the sample moments. The number of such equations is the same as the number of parameters to be estimated. Those equations are then solved for the parameters of interest. The solutions are estimates of those parameters.


The method of moments was introduced by [[Pafnuty Chebyshev]] in 1887 in the proof of the [[central limit theorem]]. The idea of matching empirical moments of a distribution to the population moments dates back at least to [[Karl Pearson|Pearson]].<ref name="Pearson1936">Pearson, K. (1936), "Method of Moments and Method of Maximum Likelihood", ''Biometrika'' '''28'''(1/2), 35–59.</ref>


==Method==
Suppose that the parameter <math>\theta = (\theta_1, \theta_2, \dots, \theta_k)</math> characterizes the [[probability distribution|distribution]] <math>f_W(w; \theta)</math> of the random variable <math>W</math>.<ref>[[Kimiko O. Bowman]] and L. R. Shenton, "Estimator: Method of Moments", pp. 2092–2098, ''Encyclopedia of statistical sciences'', Wiley (1998).</ref> Suppose the first <math>k</math> moments of the true distribution (the "population moments") can be expressed as functions of the <math>\theta</math>s:


: <math>
\begin{align}
\mu_1 & \equiv \operatorname E[W] = g_1(\theta_1, \theta_2, \ldots, \theta_k), \\[4pt]
\mu_2 & \equiv \operatorname E[W^2] = g_2(\theta_1, \theta_2, \ldots, \theta_k), \\
& \,\,\, \vdots \\
\mu_k & \equiv \operatorname E[W^k] = g_k(\theta_1, \theta_2, \ldots, \theta_k).
\end{align}
</math>

Suppose a sample of size <math>n</math> is drawn, resulting in the values <math>w_1, \dots, w_n</math>. For <math>j=1,\dots,k</math>, let
:<math>\widehat\mu_j = \frac{1}{n} \sum_{i=1}^n w_i^j</math>
be the ''j''-th sample moment, an estimate of <math>\mu_j</math>. The method of moments estimator for <math>\theta_1, \theta_2, \ldots, \theta_k</math>, denoted by <math>\widehat\theta_1, \widehat\theta_2, \dots, \widehat\theta_k</math>, is defined to be the solution (if one exists) to the equations:<ref name="Pearson1936" />


: <math>
\begin{align}
\widehat\mu_1 & = g_1(\widehat\theta_1, \widehat\theta_2, \ldots, \widehat\theta_k), \\[4pt]
\widehat\mu_2 & = g_2(\widehat\theta_1, \widehat\theta_2, \ldots, \widehat\theta_k), \\
& \,\,\, \vdots \\
\widehat\mu_k & = g_k(\widehat\theta_1, \widehat\theta_2, \ldots, \widehat\theta_k).
\end{align}
</math>


The method described here for a single random variable generalizes in an obvious manner to multiple random variables, leading to multiple choices for the moments to be used; different choices generally lead to different solutions [5], [6].
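
As a concrete illustration of this recipe (a minimal sketch in Python; the gamma distribution, the data, and the helper name are chosen for illustration and are not taken from the article), the following code equates the first two sample moments of gamma-distributed data to the population moments <math>\mu_1 = k\theta</math> and <math>\mu_2 = k\theta^2 + (k\theta)^2</math> and solves for the shape <math>k</math> and scale <math>\theta</math>:

<syntaxhighlight lang="python">
import numpy as np

def gamma_method_of_moments(w):
    """Illustrative sketch: estimate the gamma shape k and scale theta by
    equating the first two sample moments to k*theta and k*theta**2 + (k*theta)**2."""
    w = np.asarray(w, dtype=float)
    m1 = np.mean(w)            # first sample moment
    m2 = np.mean(w ** 2)       # second sample moment
    variance = m2 - m1 ** 2    # implied variance
    theta_hat = variance / m1  # scale
    k_hat = m1 / theta_hat     # shape
    return k_hat, theta_hat

rng = np.random.default_rng(0)
w = rng.gamma(shape=2.0, scale=3.0, size=10_000)
print(gamma_method_of_moments(w))  # roughly (2.0, 3.0)
</syntaxhighlight>

Here the two moment equations happen to have a closed-form solution; in general the system is nonlinear and a numerical solver may be needed.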


==Advantages and disadvantages==
The method of moments is fairly simple and yields consistent estimators (under very weak assumptions), though these estimators are often biased.

It is an alternative to the method of maximum likelihood.

However, in some cases the likelihood equations may be intractable without computers, whereas the method-of-moments estimators can be computed much more quickly and easily. Due to easy computability, method-of-moments estimates may be used as the first approximation to the solutions of the likelihood equations, and successive improved approximations may then be found by the [[Newton&ndash;Raphson method]]. In this way the method of moments can assist in finding maximum likelihood estimates.
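
A minimal Python sketch of that workflow (the gamma distribution and the helper name are illustrative assumptions, not taken from the article): the method-of-moments estimate of the gamma shape parameter starts a Newton&ndash;Raphson iteration on the profile-likelihood equation <math>\ln k - \psi(k) = \ln \bar w - \overline{\ln w}</math>.

<syntaxhighlight lang="python">
import numpy as np
from scipy.special import digamma, polygamma

def gamma_shape_mle(w, tol=1e-10, max_iter=50):
    """Illustrative sketch: refine the method-of-moments estimate of the gamma
    shape k by Newton-Raphson on f(k) = ln(k) - digamma(k) - s,
    where s = ln(mean(w)) - mean(ln(w))."""
    w = np.asarray(w, dtype=float)
    s = np.log(w.mean()) - np.log(w).mean()
    k = w.mean() ** 2 / w.var()            # method-of-moments starting value
    for _ in range(max_iter):
        step = (np.log(k) - digamma(k) - s) / (1.0 / k - polygamma(1, k))
        k -= step
        if abs(step) < tol:
            break
    return k, w.mean() / k                 # maximum likelihood shape and scale

rng = np.random.default_rng(1)
print(gamma_shape_mle(rng.gamma(shape=2.5, scale=1.5, size=10_000)))
</syntaxhighlight>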


In some cases, infrequent with large samples but less infrequent with small samples, the estimates given by the method of moments are outside the parameter space (as shown in the example below); it does not make sense to rely on them then. That problem never arises in the method of [[maximum likelihood]].<ref name="Pearson1936" /> Also, estimates by the method of moments are not necessarily [[sufficiency (statistics)|sufficient statistics]], i.e., they sometimes fail to take into account all relevant information in the sample.


When estimating other structural parameters (e.g., parameters of a [[utility|utility function]], instead of parameters of a known probability distribution), appropriate probability distributions may not be known, and moment-based estimates may be preferred to maximum likelihood estimation.

== Alternative method of moments ==
The equations to be solved in the method of moments (MoM) are in general nonlinear, and there are no generally applicable guarantees that tractable solutions exist.{{Citation needed|date=August 2023}} There is, however, an alternative approach to using sample moments to estimate data-model parameters from the known dependence of model moments on these parameters, and this alternative requires the solution of only linear equations or, more generally, tensor equations. This alternative is referred to as the Bayesian-Like MoM (BL-MoM), and it differs from the classical MoM in that it uses optimally weighted sample moments. Since the MoM is typically motivated by a lack of sufficient knowledge about the data model to determine likelihood functions and associated ''a posteriori'' probabilities of unknown or random parameters, the label ''Bayesian-Like'' may seem surprising. Its particular meaning, however, leads to a problem formulation in which the required knowledge of ''a posteriori'' probabilities is replaced with the required knowledge of only the dependence of model moments on the unknown model parameters, which is exactly the knowledge required by the traditional MoM [1],[2],[5]–[9]. The BL-MoM also uses knowledge of ''a priori'' probabilities of the parameters to be estimated, when available, but otherwise uses uniform priors.{{Citation needed|date=August 2023}}

The BL-MoM has been reported only in the applied statistics literature, in connection with parameter estimation and hypothesis testing using observations of stochastic processes for problems in information and communications theory and, in particular, communications receiver design in the absence of knowledge of likelihood functions or associated ''a posteriori'' probabilities ([10] and references therein). In addition, a restatement of this receiver-design approach for stochastic-process models, as an alternative to the classical MoM for any type of multivariate data, is available in tutorial form at a university website [11, page 11.4]. The applications in [10] and its references demonstrate some important characteristics of this alternative to the classical MoM, and a detailed list of relative advantages and disadvantages is given in [11, page 11.4], but the literature lacks direct comparisons of the classical MoM and the BL-MoM in specific applications.{{Citation needed|date=August 2023}}


==Examples==
An example application of the method of moments is to estimate polynomial probability density distributions. In this case, an approximating polynomial of order <math>N</math> is defined on an interval <math>[a,b]</math>. The method of moments then yields a system of equations, whose solution involves the inversion of a [[Hankel matrix]].<ref name="PolyD2">J. Munkhammar, L. Mattsson, J. Rydén (2017) "Polynomial probability distribution estimation using the method of moments". PLoS ONE 12(4): e0174573. https://doi.org/10.1371/journal.pone.0174573</ref>
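
A minimal sketch of that idea, under the simplifying assumption that the density is approximated as <math>f(x) \approx \sum_{i=0}^{N} c_i x^i</math> on <math>[a,b]</math>, so that the moment conditions <math>\mu_j = \sum_i c_i \tfrac{b^{\,i+j+1}-a^{\,i+j+1}}{i+j+1}</math> form a linear system with a Hankel-structured matrix (an illustration in the spirit of the cited method, not a reproduction of it):

<syntaxhighlight lang="python">
import numpy as np

def polynomial_density_coeffs(moments, a, b):
    """Illustrative sketch: solve for coefficients c_0..c_N of
    f(x) = sum_i c_i * x**i on [a, b] from the moments mu_0..mu_N,
    using the Hankel-structured matrix H[j, i] = (b**(i+j+1) - a**(i+j+1)) / (i+j+1)."""
    mu = np.asarray(moments, dtype=float)
    N = len(mu)
    H = np.array([[(b ** (i + j + 1) - a ** (i + j + 1)) / (i + j + 1)
                   for i in range(N)] for j in range(N)])
    return np.linalg.solve(H, mu)

# Uniform data on [0, 1]: the fitted density should be close to the constant 1.
rng = np.random.default_rng(2)
w = rng.uniform(0.0, 1.0, size=100_000)
sample_moments = [np.mean(w ** j) for j in range(3)]
print(polynomial_density_coeffs(sample_moments, 0.0, 1.0))  # close to [1, 0, 0]
</syntaxhighlight>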


=== Proving the central limit theorem ===
{{Disputed section|date=October 2024|Chebyshev proof}}
Let <math>X_1, X_2, \cdots</math> be independent random variables with mean 0 and variance 1, and let <math>S_n := \frac{1}{\sqrt n}\sum_{i=1}^n X_i</math>. We can compute the moments of <math>S_n</math> as<math display="block">E[S_n^0] = 1, E[S_n^1] = 0, E[S_n^2] = 1, E[S_n^3] = 0, \cdots</math>Explicit expansion shows that<math display="block">E[S_n^{2k+1}] = 0; \quad E[S_n^{2k}] = \frac{\binom{n}{k}\frac{(2k)!}{2^k}}{n^{k}} = \frac{n(n-1)\cdots(n-k+1)}{n^k} (2k-1)!!</math>where the numerator is the number of ways to select <math>k</math> distinct pairs of balls by picking one each from <math>2k</math> buckets, each containing balls numbered from <math>1</math> to <math>n</math>. In the limit <math>n \to \infty</math>, all moments converge to those of a standard normal distribution. Further analysis then shows that this convergence of moments implies convergence in distribution.


Essentially this argument was published by Chebyshev in 1887.<ref>{{Cite book |last=Fischer |first=Hans |url=https://www.worldcat.org/oclc/682910965 |title=History of the central limit theorem : from classical to modern probability theory |date=2011 |publisher=Springer |isbn=978-0-387-87857-7 |location=New York |chapter=4. Chebyshev’s and Markov’s Contributions |oclc=682910965}}</ref>
Essentially this argument was published by Chebyshev in 1887.<ref>{{Cite book |last=Fischer |first=Hans |url=https://www.worldcat.org/oclc/682910965 |title=History of the central limit theorem : from classical to modern probability theory |date=2011 |publisher=Springer |isbn=978-0-387-87857-7 |location=New York |chapter=4. Chebyshev’s and Markov’s Contributions |oclc=682910965}}</ref>
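
As an informal numerical illustration of the moment convergence used above (a simulation sketch, not part of the argument; the choice of Rademacher summands is an assumption for the example), one can compare Monte Carlo estimates of <math>E[S_n^{2k}]</math> with the standard normal moments <math>(2k-1)!!</math>:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(3)
n, trials = 100, 50_000
x = rng.choice([-1.0, 1.0], size=(trials, n))  # Rademacher: mean 0, variance 1
s = x.sum(axis=1) / np.sqrt(n)                 # one draw of S_n per trial

for k in (1, 2, 3):
    empirical = np.mean(s ** (2 * k))
    normal_moment = np.prod(np.arange(1, 2 * k, 2))  # (2k-1)!! = 1, 3, 15, ...
    print(f"2k={2 * k}: simulated {empirical:.3f} vs normal moment {normal_moment}")
</syntaxhighlight>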


=== Uniform distribution ===
Consider the uniform distribution on the interval <math>[a,b]</math>, <math>U(a,b)</math>. If <math>W \sim U(a,b)</math> then we have

:<math>\mu_1 = \operatorname E[W] = \frac{a+b}{2}</math>
:<math>\mu_2 = \operatorname E[W^2] = \frac{(a+b)^2}{4} + \frac{(b-a)^2}{12}</math>

Solving these equations gives

:<math>\widehat{a} = \mu_1 - \sqrt{3\left(\mu_2 - \mu_1^2\right)}</math>
:<math>\widehat{b} = \mu_1 + \sqrt{3\left(\mu_2 - \mu_1^2\right)}</math>

Given a set of samples <math>\{w_i\}</math> we can use the sample moments <math>\widehat\mu_1</math> and <math>\widehat\mu_2</math> in these formulae in order to estimate <math>a</math> and <math>b</math>.

Note, however, that this method can produce inconsistent results in some cases. For example, the set of samples <math>\{0, 0, 0, 0, 1\}</math> results in the estimate <math>\widehat{a} \approx -0.493,\ \widehat{b} \approx 0.893</math>, even though <math>\max\{w_i\} = 1 > \widehat{b}</math>, and so it is impossible for the set to have been drawn from <math>U(\widehat{a}, \widehat{b})</math> in this case.
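
The following short Python sketch (illustrative only) applies these formulae and reproduces the inconsistency for the sample set above:

<syntaxhighlight lang="python">
import numpy as np

def uniform_method_of_moments(w):
    """Estimate (a, b) of U(a, b) by matching the first two sample moments."""
    w = np.asarray(w, dtype=float)
    m1 = w.mean()
    m2 = np.mean(w ** 2)
    half_width = np.sqrt(3.0 * (m2 - m1 ** 2))
    return m1 - half_width, m1 + half_width

a_hat, b_hat = uniform_method_of_moments([0, 0, 0, 0, 1])
print(a_hat, b_hat)   # approximately -0.493 and 0.893
print(b_hat < 1)      # True: the observed value 1 lies outside the estimated support
</syntaxhighlight>
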
==References==
{{Reflist}}


===References needing to be wikified===
<!-- The following appear to be referenced from section "Alternative method of moments". -->
[4] Pearson, K. (1936), "Method of Moments and Method of Maximum Likelihood", ''Biometrika'' '''28'''(1/2), 35–59.

[5] Lindsay, B.G. & Basak P. (1993). “Multivariate normal mixtures: a fast consistent method of moments”, ''Journal of the American Statistical Association'' '''88''', 468–476.

[6] Quandt, R.E. & Ramsey, J.B. (1978). “Estimating mixtures of normal distributions and switching regressions”, ''Journal of the American Statistical Association'' '''73''', 730–752.

[7] <nowiki>https://real-statistics.com/distribution-fitting/method-of-moments/</nowiki>

[8] Hansen, L. (1982). “Large sample properties of generalized method of moments estimators”, ''Econometrica'' '''50''', 1029–1054.

[9] Lindsay, B.G. (1982). “Conditional score functions: some optimality results”, ''Biometrika'' '''69''', 503–512.

[10] Gardner, W.A. (1981). “Design of nearest prototype signal classifiers”, ''IEEE Transactions on Information Theory'' '''27'''(3), 368–372.

[11] [https://cyclostationarity.com Cyclostationarity]


{{Statistics}}

{{DEFAULTSORT:Method Of Moments (Statistics)}}
