Talk:P-value: Difference between revisions

From Wikipedia, the free encyclopedia
(14 intermediate revisions by 8 users not shown)
Line 1: Line 1:
{{Talk header}}
{{Talk header|archive_age=3|archive_units=months|archive_bot=Lowercase sigmabot III}}
{{WikiProject banner shell|collapsed=yes|class=B|vital=yes|1=
{{Vital article|level=4|topic=Mathematics|class=B}}
{{WikiProject banner shell|collapsed=yes|1={{WikiProject Statistics|class=B|importance=top}}
{{WikiProject Statistics|importance=top}}
{{maths rating|frequentlyviewed=yes|field=probability and statistics|class=B|importance=mid}}
{{WikiProject Mathematics|importance=mid}}
}}
}}
{{annual readership}}
{{annual readership}}
Line 14: Line 14:
}}
}}


== Null hypothesis vs. "our hypothesis" ==
== Does the hyphenization indeed vary? ==


I am referring to the sentence:
As far as I'm aware, APA guidelines say you have to italicize every statistic, period. Saying "''p'' value" is no different than saying "''DP'' value". I mean, it's not a symptom of dropping the hyphen, but merely a situation where the topic was the value of ''p'', rather than the ''p''-value. Whether that makes sense, i.e., that there really exists a difference between these situations which justifies the different styling, I do not know. But I'm under the impression that that's how people use it. It's the rationalization that I have been able to do, since I have seen many articles formatted under APA style that use "p-value" at some point. [[User:Victorvscn|~victorsouza]] ([[User talk:Victorvscn|talk]]) 16:57, 17 March 2022 (UTC)


<code>As our statistical hypothesis will, by definition, state some property of the distribution, the [[null hypothesis]] is the default hypothesis under which that property does not exist.
In short, yes, hyphenization does indeed vary. I've seen "p value" with and without a hyphen in APA journals. In AMA journals (such as JAMA), I've typically seen "p value" or "P value" unhyphenated. But in American Statistical Association sources, I nearly always see "p-value" hyphenated. Regarding your claim that "APA guidelines say you have to italicize every statistic, period," there's no such guideline. In fact, the official APA style blog (https://blog.apastyle.org/apastyle/hyphenation/) explicitly recommends "t test" not be hyphenated unless used as an adjective (e.g., "t-test results"). [[Special:Contributions/172.91.120.102|172.91.120.102]] ([[User talk:172.91.120.102|talk]]) 05:01, 25 April 2022 (UTC)
</code>


In subsequent examples, the Null hypothesis is always stated explicitly, e.g. as "data comes from the standard normal distribution", "the coin is fair", etc. There is no example when a property of the distribution is stated and the null hypothesis would be defined as non-existence or a logical negation of the property. Furthermore, "data comes not from N(0,1)" makes little sense as "our statistical hypothesis" because it is too unspecific.
== continuous variables ==


This is especially confusing in the beginning when the reader does not know what is going to be tested.
Note that in the statistics of continuous variables, the probability that a variable will have any specific value is zero. (Unless it comes from a delta function.) In a statistical sense < and <= are the same. In numerical approximations, one might have to be more careful, but then that comes from the process of doing the approximation, not from the statistics itself. [[User:Gah4|Gah4]] ([[User talk:Gah4|talk]]) 20:56, 25 April 2022 (UTC)
:
:True. And in nearly all real-world circumstances, saying something like p ≤ .05 is indeed equivalent to saying something like p < .05. But not all variables are continuous. For some situations involving count data, you can end up with p-values that are rational numbers, and in theory the p-value could even be exactly .05. So I see the purpose of the "less than or equal to" language for the sake of more universal technical correctness, even if only to accommodate unlikely theoretical cases. [[Special:Contributions/172.91.120.102|172.91.120.102]] ([[User talk:172.91.120.102|talk]]) 06:06, 26 April 2022 (UTC)
::
::Hmm, OK. In most problems that I know, either the p variable is continuous, or close enough to continuous that assuming it is, is close enough. In the cases where it isn't, I am not so sure it makes sense either way. That is, if you have a problem where the difference between < and <= seems important, there is probably something else to worry about more. [[User:Gah4|Gah4]] ([[User talk:Gah4|talk]]) 05:04, 27 April 2022 (UTC)


Would it be better to stick with the Null hypothesis only, state that the test can reject it or not reject it and leave the logical implications to the reader? Or perhaps add a clear example where we can infer acceptance of "our hypothesis" based on rejection of the null hypothesis? [[User:Alexander Shekhovtsov|Alexander Shekhovtsov]] ([[User talk:Alexander Shekhovtsov|talk]]) 12:38, 14 June 2023 (UTC)
:I agree that there is likely no practical example where there is a consequential distinction between p < .05 and p ≤ .05. Even when the p-value is from a discrete distribution and is a rational number, I don't think it's plausible for it to be exactly .05 except in a highly contrived theoretical scenario. That said, I don't really see a drawback to using the ≤ symbol rather than the < symbol if that placates some theoretical quibble. [[Special:Contributions/134.69.229.134|134.69.229.134]] ([[User talk:134.69.229.134|talk]]) 19:24, 27 April 2022 (UTC)
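:A minimal sketch of the discrete case discussed in this thread, offered as an illustration rather than as anything taken from the article. It assumes a one-sided exact permutation test on two groups of three observations; the function name <code>permutation_p_value</code> and the data values are hypothetical. With 6 pooled values there are only C(6,3) = 20 equally likely splits, so the p-value is a multiple of 1/20 and the most extreme split gives exactly .05, where < and ≤ disagree.

```python
from fractions import Fraction
from itertools import combinations


def permutation_p_value(group_a, group_b):
    """One-sided exact permutation p-value.

    Test statistic: the sum of the values assigned to group A.
    Under the null, every split of the pooled data into groups of
    these sizes is equally likely.
    """
    pooled = list(group_a) + list(group_b)
    observed = sum(group_a)
    size_a = len(group_a)
    at_least_as_extreme = 0
    total_splits = 0
    for indices in combinations(range(len(pooled)), size_a):
        total_splits += 1
        if sum(pooled[i] for i in indices) >= observed:
            at_least_as_extreme += 1
    # Exact rational arithmetic avoids floating-point rounding at the threshold.
    return Fraction(at_least_as_extreme, total_splits)


alpha = Fraction(1, 20)  # the conventional .05 level, written exactly
p = permutation_p_value([4, 5, 6], [1, 2, 3])
print(p)            # 1/20
print(p < alpha)    # False
print(p <= alpha)   # True
```

:Whether such small designs matter in practice is a separate question; the sketch only shows that the boundary case can be reached without rounding.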


== Errors in the article ==
== p-value, P-value, p value, P value? ==


Which is the best form to use here? I have seen all of these. [[Special:Contributions/130.226.41.15|130.226.41.15]] ([[User talk:130.226.41.15|talk]]) 11:54, 16 June 2023 (UTC)
The article says the p-value is the 'Probability of obtaining a real-valued test statistic at least as extreme as the one actually obtained'. That's only true under the assumption that the null hypothesis actually holds. This is a key point!


:This is addressed at [[P-value#cite_note-2]].
https://amstat.tandfonline.com/doi/pdf/10.1080/00031305.2016.1154108?needAccess=true is a good source on all this. [[User:Sciencecritical|Sciencecritical]] ([[User talk:Sciencecritical|talk]]) 23:44, 28 September 2022 (UTC)
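:To make the conditioning on the null hypothesis concrete, here is a small sketch under stated assumptions. The coin-flip numbers (14 heads in 20 flips) and the function name <code>one_sided_binomial_p_value</code> are illustrative only. The point is that the tail probability is computed from the distribution specified by the null (a fair coin here), not from whatever the true distribution may be.

```python
from math import comb


def one_sided_binomial_p_value(heads, flips, prob_heads_under_null=0.5):
    """P(X >= heads), computed assuming the null X ~ Binomial(flips, p0) holds."""
    p0 = prob_heads_under_null
    return sum(
        comb(flips, k) * p0**k * (1 - p0) ** (flips - k)
        for k in range(heads, flips + 1)
    )


# Null hypothesis: the coin is fair. Observed: 14 heads in 20 flips.
p = one_sided_binomial_p_value(14, 20)
print(round(p, 4))  # 0.0577
```

:Changing <code>prob_heads_under_null</code> changes the result, which is one way to see that the number is a statement about the data given the null and not a probability that the null is true.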
:However, all three examples are from the USA. There should be more balance, preferably with an international source (ISO standard, or IMS, or IMU, or ISI?), or at least from some other countries/regions.
:''Cf.'' [https://doi.org/10.2307/2681417 Recommended Standards for Statistical Symbols and Notation. COPSS Committee on Symbols and Notation] from 1965.
:Personally I prefer lowercase ("''p''-value" or "''p'' value"), which I believe is more common, except when ''p'' has already been assigned to another variable. Italics should always be used for the "''p''"!
:—DIV ([[Special:Contributions/1.145.104.186|1.145.104.186]] ([[User talk:1.145.104.186|talk]]) 01:50, 14 August 2024 (UTC))


== Unnecessary hedging: "Usually, T is a test statistic." ==
First paragraph after "p-value as the statistic for performing significance tests" confuses significance tests with hypothesis tests. The p-value is a result of a significance test, not a statistic for performing statistical tests.


As a reader, if I read "usually" that suggests an exception. But there is no counter-example. A p-value is ALWAYS derived from a test statistic. Therefore, Wikipedia should drop "usually" in this sentence.
A good reference clarifying the differences between significance tests and hypothesis tests:

Biau DJ, Jolles BM, Porcher R. P value and the theory of hypothesis testing: an explanation for new researchers. Clin Orthop Relat Res. 2010 Mar;468(3):885-92. doi: 10.1007/s11999-009-1164-4. PMID: 19921345; PMCID: PMC2816758.
I propose the sentence say "As stated above, T is a test statistic." This matches a sentence earlier in the article: "The p-value is a function of the chosen test statistic and is therefore a random variable." [[User:DavidCJames|DavidCJames]] ([[User talk:DavidCJames|talk]]) 22:09, 29 June 2024 (UTC)

:[[WP:BOLD|Be bold]]. —DIV ([[Special:Contributions/1.145.104.186|1.145.104.186]] ([[User talk:1.145.104.186|talk]]) 01:54, 14 August 2024 (UTC))

Latest revision as of 01:54, 14 August 2024

Null hypothesis vs. "our hypothesis"

I am referring to the sentence:

As our statistical hypothesis will, by definition, state some property of the distribution, the null hypothesis is the default hypothesis under which that property does not exist.

In subsequent examples, the Null hypothesis is always stated explicitly, e.g. as "data comes from the standard normal distribution", "the coin is fair", etc. There is no example when a property of the distribution is stated and the null hypothesis would be defined as non-existence or a logical negation of the property. Furthermore, "data comes not from N(0,1)" makes little sense as "our statistical hypothesis" because it is too unspecific.

This is especially confusing in the beginning when the reader does not know what is going to be tested.

Would it be better to stick with the Null hypothesis only, state that the test can reject it or not reject it and leave the logical implications to the reader? Or perhaps add a clear example where we can infer acceptance of "our hypothesis" based on rejection of the null hypothesis? Alexander Shekhovtsov (talk) 12:38, 14 June 2023 (UTC)

p-value, P-value, p value, P value?

Which is the best form to use here? I have seen all of these. 130.226.41.15 (talk) 11:54, 16 June 2023 (UTC)

This is addressed at P-value#cite_note-2.
However, all three examples are from the USA. There should be more balance, preferably with an international source (ISO standard, or IMS, or IMU, or ISI?), or at least from some other countries/regions.
Cf. Recommended Standards for Statistical Symbols and Notation. COPSS Committee on Symbols and Notation from 1965.
Personally I prefer lowercase ("p-value" or "p value"), which I believe is more common, except when p has already been assigned to another variable. Italics should always be used for the "p"!
—DIV (1.145.104.186 (talk) 01:50, 14 August 2024 (UTC))

Unnecessary hedging: "Usually, T is a test statistic."

As a reader, if I read "usually" that suggests an exception. But there is no counter-example. A p-value is ALWAYS derived from a test statistic. Therefore, Wikipedia should drop "usually" in this sentence.

I propose the sentence say "As stated above, T is a test statistic." This matches a sentence earlier in the article: "The p-value is a function of the chosen test statistic and is therefore a random variable." DavidCJames (talk) 22:09, 29 June 2024 (UTC)

Be bold. —DIV (1.145.104.186 (talk) 01:54, 14 August 2024 (UTC))
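
For readers following this thread, a rough sketch of the quoted sentence "The p-value is a function of the chosen test statistic and is therefore a random variable." It assumes, purely for illustration, a statistic T that is standard normal under the null and an upper-tail test; the function name p_value_from_statistic and the seed are hypothetical. Each new draw of T yields a different p-value through the same fixed function.

```python
import math
import random


def p_value_from_statistic(t):
    """Upper-tail p-value, 1 - Phi(t), for a statistic T ~ N(0, 1) under the null."""
    return 0.5 * math.erfc(t / math.sqrt(2))


rng = random.Random(0)  # fixed seed so the run is repeatable
statistics = [rng.gauss(0.0, 1.0) for _ in range(5)]  # T simulated under the null
p_values = [p_value_from_statistic(t) for t in statistics]

for t, p in zip(statistics, p_values):
    print(f"T = {t:+.3f}  ->  p = {p:.3f}")
```

The mapping from T to p is deterministic; the randomness of the p-value comes entirely from the randomness of T.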