Talk:P-value

== Alternating Coin Flips Example Should Be Removed ==

"By the second test statistic, the data yield a low p-value, suggesting that the pattern of flips observed is very, very unlikely. There is no "alternative hypothesis" (so only rejection of the null hypothesis is possible) and such data could have many causes. The data may instead be forged, or the coin may be flipped by a magician who intentionally alternated outcomes.

This example demonstrates that the p-value depends completely on the test statistic used and illustrates that p-values can only help researchers to reject a null hypothesis, not consider other hypotheses."

Why would there be "no alternative hypothesis?" Whenever there is a null hypothesis (H0), there must be an alternative hypothesis ("not H0"). In this case, the null hypothesis is that the coin-flipping is not biased toward alternation. Consequently, the alternative hypothesis is that the coin-flipping IS biased toward alternation. It seems that the author of this passage did not understand what "alternative hypothesis" means. The same confusion is apparent in the claim that p-values can't help researchers "consider other hypotheses." There are other problems with the passage as well (e.g., the unencyclopedic phrase "very, very" and, as another editor noted, a highly arbitrary description). I suggest getting rid of the whole section, which is completely unsourced, is full of questionable claims, is likely to cause confusion, and serves no apparent function in the article. <!-- Template:Unsigned IP --><small class="autosigned">—&nbsp;Preceding [[Wikipedia:Signatures|unsigned]] comment added by [[Special:Contributions/23.242.198.189|23.242.198.189]] ([[User talk:23.242.198.189#top|talk]]) 01:50, 24 July 2019 (UTC)</small> <!--Autosigned by SineBot-->
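For what it's worth, a test of "biased toward alternation" can be set up with an explicit alternative, as the comment above argues. The sketch below is my own illustration, not taken from the article; in particular, using the number of alternations among consecutive flips as the test statistic is an assumption, since the quoted passage does not say what its "second test statistic" actually is.

<syntaxhighlight lang="python">
# Sketch only: H0 = flips are independent and fair; H1 = flips are biased toward alternation.
# Under H0 the number of alternations T among the n-1 consecutive pairs is Binomial(n-1, 1/2),
# and a large T is evidence for H1, so the one-sided p-value is P(T >= t_obs).
from math import comb

def alternation_p_value(flips: str) -> float:
    """One-sided p-value for bias toward alternation under a fair, independent null."""
    n = len(flips)
    t_obs = sum(a != b for a, b in zip(flips, flips[1:]))  # observed alternations
    return sum(comb(n - 1, k) for k in range(t_obs, n)) / 2 ** (n - 1)

print(alternation_p_value("HTHTHTHTHTHTHTHTHTHT"))  # strict alternation over 20 flips: (1/2)**19 ≈ 1.9e-06
</syntaxhighlight>

Rejecting H0 here is precisely considering the alternative "biased toward alternation", whatever the physical mechanism (forged data, a magician, and so on) might be.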

Also, the very concept of coin-flipping that is biased toward alternation is quite odd and not particularly realistic outside of a fake-data scenario. The examples of trick coins that are biased towards one side or the other are much more intuitive, and thus much more useful in my opinion. [[Special:Contributions/23.242.198.189|23.242.198.189]] ([[User talk:23.242.198.189|talk]]) 06:55, 24 July 2019 (UTC)

:What on Earth is "in my opinion" supposed to mean in an unsigned "contribution"?
:FWIW, I agree with that opinion. I have neither seen nor ever heard of a coin being biased to alternate and cannot imagine how one might be made.
:[[User:David Lloyd-Jones|David Lloyd-Jones]] ([[User talk:David Lloyd-Jones|talk]]) 08:19, 4 May 2020 (UTC)

:I imagine that "in my opinion" means the same thing in an unsigned contribution that it means in a signed contribution. I don't see why that should be confusing or why there would be a need to put quotes around "contribution." [[Special:Contributions/99.47.245.32|99.47.245.32]] ([[User talk:99.47.245.32|talk]]) 20:16, 2 January 2021 (UTC)

Actually, most of the examples are problematic, completely unsourced, and should be removed. For instance, the "sample size dependence" example says: "If the coin was flipped only 5 times, the p-value would be 2/32 = 0.0625, which is not significant at the 0.05 level. But if the coin was flipped 10 times, the p-value would be 2/1024 ≈ 0.002, which is significant at the 0.05 level." Huh? How can you say what the p-value will be without knowing what the results of the coin-flips will be? And the "one roll of a pair of dice" example appears to be nonsensical; it's not even clear how the test statistic (the sum of the rolled numbers) is supposed to relate to the null hypothesis that the dice are fair, and the idea of computing a p-value from a single data point is very odd in itself. Thus, the example doesn't seem very realistic or useful for understanding how p-values work, and it actually risks causing confusion and misunderstanding. Therefore, I suggest that the article would be improved by removing all the "examples" except for the one entitled "coin flipping." [[Special:Contributions/131.179.60.237|131.179.60.237]] ([[User talk:131.179.60.237|talk]]) 20:42, 24 July 2019 (UTC)
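For reference, the quoted numbers do presuppose a particular outcome that the article text leaves implicit: under a fair-coin null they correspond to every flip landing the same way, with a two-sided p-value of P(all heads) + P(all tails). A minimal sketch under that reading (my assumption, not stated in the quoted passage):

<syntaxhighlight lang="python">
# Sketch only: two-sided p-value for observing n identical outcomes from a fair coin,
# i.e. P(all heads) + P(all tails) = 2 / 2**n. This reproduces the quoted figures.
def all_same_two_sided_p(n_flips: int) -> float:
    return 2 / 2 ** n_flips

print(all_same_two_sided_p(5))   # 2/32   = 0.0625   (not significant at the 0.05 level)
print(all_same_two_sided_p(10))  # 2/1024 ≈ 0.00195  (significant at the 0.05 level)
</syntaxhighlight>

Which rather supports the comment above: the p-value is a function of the observed data, so the example only makes sense once the data are spelled out.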

:This dreadful article nowhere tells us what a p-value test is, nor how one is calculated. It merely pretends to. The whole thing is just a lot of blather about some p-value tests people have reported under the pretence of telling us "what p-values do" or something of the sort.
:The promiscuous and incompetent use of commas leaves two or three lists of supposed distinctions muddy and ambiguous.
:Given the somewhat flamboyant and demonstrative use of X's and Greek letters, my impression is that this was written by a statistician of only moderate competence who regards himself, almost certainly a ''him'' self, as so far above us all that he need not actually focus on the questions at hand.
:[[User:David Lloyd-Jones|David Lloyd-Jones]] ([[User talk:David Lloyd-Jones|talk]]) 08:19, 4 May 2020 (UTC)

:: Indeed, the article is hopeless. I made some changes a year ago (see "Talk Archive 2") and explained on the talk pages why and what I had done, but that work has been undone by editors who did not understand the difficulties I had referred to. I think the article should start by describing the concept of statistical model: namely a family of possible probability distributions of some data. Then one should talk about a hypothesis: that's a subset of possible probability distributions. Then a test statistic. Finally one can give the correct definition of p-value as the largest probability which any null hypothesis model gives to the value of the statistic actually observed, or larger. I know it is a complex and convoluted definition. But one can give lots of examples of varying levels of complexity. Finally one can write statements about p-values which are actually true, such as for instance the fact that *if* the null hypothesis *fixes* the probability distribution of your statistic, and if that statistic is continuously distributed, *then* your p-value is uniformly distributed between 0 and 1 if the null hypothesis is true. I know that "truth" is not a criterion which Wikipedia editors may use. But hopefully, enough reliable sources exist to support my claims. What is presently written in the article on this subject is nonsense. [[User:Gill110951|Richard Gill]] ([[User talk:Gill110951|talk]]) 14:52, 22 June 2020 (UTC)
::I have made a whole lot of changes. [[User:Gill110951|Richard Gill]] ([[User talk:Gill110951|talk]]) 16:48, 22 June 2020 (UTC)
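In symbols, the definition sketched above is <math>p(t) = \sup_{P \in H_0} P(T \ge t)</math>: the largest probability that any null-hypothesis model assigns to the observed value of the statistic or anything larger. The uniformity claim is easy to check by simulation; the sketch below is my own illustration, with a standard-normal statistic as an assumed example of a null that completely fixes a continuous distribution:

<syntaxhighlight lang="python">
# Sketch only: if the null hypothesis fixes the distribution of a continuously distributed
# statistic T, then the p-value P(T >= t_obs) is uniform on (0, 1) when H0 is true.
# Assumed example: T ~ N(0, 1) under H0.
import random
from statistics import NormalDist

rng = random.Random(0)
null = NormalDist()  # H0 fixes T's distribution completely

p_values = []
for _ in range(100_000):
    t_obs = rng.gauss(0.0, 1.0)              # draw the statistic under H0
    p_values.append(1.0 - null.cdf(t_obs))   # one-sided p-value P(T >= t_obs)

# Each decile of (0, 1) should hold roughly 10% of the simulated p-values.
for i in range(10):
    lo, hi = i / 10, (i + 1) / 10
    share = sum(lo <= p < hi for p in p_values) / len(p_values)
    print(f"[{lo:.1f}, {hi:.1f}): {share:.3f}")
</syntaxhighlight>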

The article is moving in a good direction, thanks Richard Gill. A point about reader expectations with regard to the article: talk of p-values almost always occurs in the context of NHST; the 'Basic concepts' section is essentially an outline of NHST, but the article nowhere names NHST and [[Null hypothesis significance testing]] is a redirect to [[Statistical inference]], an article that is probably not the best introduction to the topic (we also have a redirect from the mishyphenated [[Null-hypothesis significance-testing]] to [[Statistical hypothesis testing]]). I suggest tweaking the 'Basic concepts' section so that NHST is defined there and have NHST redirect to this article. &mdash; [[User:Chalst|''Charles Stewart'']] <small>[[User_talk:Chalst|(talk)]]</small> 19:51, 22 June 2020 (UTC)

: Thanks, Chalst; I have made some more changes in the same direction, namely to distinguish between original data X and a statistic T. This also led to further adjustments to the material on one-sided versus two-sided tests and then to the example of 20 coin tosses. I'm glad more people are looking at this article! It's very central in statistics. The topic is difficult, no doubt about it. [[User:Gill110951|Richard Gill]] ([[User talk:Gill110951|talk]]) 12:28, 30 June 2020 (UTC)

I reconfigured the Basic Concepts section, building on Gill110951's work and Chalst's comments. I tried to clarify what null hypothesis testing is, what we do in it, and the importance of p-values to it. I focused on presenting p-values in terms of rejecting the null hypothesis, and tried to explain the importance of also looking at real-world relevance. (I'm not sure if I should put this here or in a separate section, but it seemed a continuation of what Gill110951 did) [[User:TryingToUnderstand11|TryingToUnderstand11]] ([[User talk:TryingToUnderstand11|talk]]) 09:55, 20 August 2021 (UTC)


== Misleading examples ==

The examples given are rather misleading. For example, in the section about the rolling of two dice the article says: "In this case, a single roll provides a very weak basis (that is, insufficient data) to draw a meaningful conclusion about the dice."

However, it makes no attempt to explain why this is so, and a slight alteration of the conditions of the experiment renders this statement false.

Consider a hustler/gambler who has two sets of apparently identical dice, one of which is loaded and the other fair. If he forgets which is which, and then rolls one set and gets two sixes immediately, it is quite clear that he has identified the loaded set.

The example relies upon the underlying assumption that dice are almost always fair, and therefore it would take more than a single roll to convince you that they are not. However, this assumption is never made explicit, which might mislead people into supposing that a 0.05 p-value would never be sufficient to establish statistical significance. Richard Cant — Preceding unsigned comment added by 152.71.70.77 (talk)

That cheating gambler would be wrong in his conclusion 1 out of 36 times, though. Yinwang888 (talk) 16:31, 24 November 2021 (UTC)
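For the record, the 1/36 figure both sides are relying on is just the probability of double sixes from one roll of a fair pair: under the null hypothesis "this set is fair" it is the single-roll p-value, and it is also how often the hustler's one-roll verdict would wrongly flag the fair set. A small sketch of the arithmetic (assuming, purely for illustration, that the loaded set always shows double sixes):

<syntaxhighlight lang="python">
# Sketch only: chance of a double six from one roll of a fair pair of dice.
# This is the single-roll p-value under "this set is fair" and, equivalently,
# the rate at which the fair set would be mistaken for the loaded one.
from fractions import Fraction

p_double_six_fair = Fraction(1, 6) ** 2
print(p_double_six_fair, float(p_double_six_fair))  # 1/36 ≈ 0.0278, below the conventional 0.05
</syntaxhighlight>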

== Recent edits ==

It is rather traditional that values of 5% and 1% are chosen as significance levels. In fact the value of p itself is an indication of the strength of the observed result. Whether or not the null hypothesis may be rejected is also a matter of 'taste'. But in any case, a small p-value does suggest that the observed data is sufficiently inconsistent with the null hypothesis. Madyno (talk) 09:47, 8 December 2021 (UTC)

The .05 level is by far the most conventional level. The .01 level is sometimes used but much more rarely. But in any case, the "Usage" section was mainly just a repetition of what had already been said in the "Basic Concepts" section and the "Definition and Interpretation" section, so I've trimmed it down considerably. A section that's just restating what's already been said doesn't need to give so much detail (if it needs to exist at all). 23.242.195.76 (talk) 02:28, 15 December 2021 (UTC)

== Does the hyphenization indeed vary? ==

"As far as I'm aware, APA guidelines say you have to italicize every statistic, period. Saying "p value" is no different than saying "DP value". I mean, it's not a symptom of dropping the hyphen, but merely a situation where the topic was the value of p, rather than the p-value. Whether that makes sense, i.e., that there really exists a difference between these situations which justifies the different styling, I do not know. But I'm under the impression that that's how people use it. It's the rationalization that I have been able to do, since I have seen many articles formatted under APA style that use "p-value" at some point. ~victorsouza (talk) 16:57, 17 March 2022 (UTC)[reply]