Statistical significance testing and p-values: defending the indefensible? A discussion paper and position statement
Much statistical teaching and many research reports focus on the ‘null hypothesis significance test’. Yet the correct meaning and interpretation of statistical significance tests are elusive. Misinterpretations are both common and persistent, leading many to question whether significance tests should be used at all. While most critics take aim at the arbitrary use of p<0.05 as a threshold for declaring ‘significance’, others extend the critique to suggest that the ‘p-value’ should be dispensed with entirely.
P-values and significance tests are still widely used as if they gave a measure of the size and importance of relationships, even though this misunderstanding has been observed and discussed for many years. We argue that they are intrinsically misleading. Point estimates of relationships and confidence intervals give direct information about the effect and the uncertainty of the estimate, without recourse to interpreting how a particular p-value might have arisen, or indeed to referring to p-values at all.
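To illustrate why a p-value alone says little about the size of an effect, the minimal sketch below uses two hypothetical studies (invented numbers, not drawn from the paper) whose estimates have the same ratio to their standard errors. It assumes a Wald-type test with a normal approximation; the helper name `wald_summary` is hypothetical. Both studies yield the same p-value, while the point estimates and 95% confidence intervals reveal very different effects.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def wald_summary(estimate, se, level=0.95):
    """Hypothetical helper: two-sided Wald p-value for H0 'effect == 0'
    and a symmetric confidence interval, assuming approximate normality."""
    z = estimate / se
    p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    crit = STD_NORMAL.inv_cdf(1 - (1 - level) / 2)  # about 1.96 for a 95% interval
    return p_value, estimate - crit * se, estimate + crit * se


# Hypothetical studies: identical estimate/SE ratio, very different effect sizes.
studies = {
    "A (large sample, tiny effect)": (0.5, 0.2),
    "B (small sample, large effect)": (10.0, 4.0),
}

for name, (est, se) in studies.items():
    p, lo, hi = wald_summary(est, se)
    print(f"{name}: estimate={est}, 95% CI {lo:.2f} to {hi:.2f}, p={p:.4f}")
```

Both studies report a p-value of about 0.012, yet the interval for study A (roughly 0.11 to 0.89) and that for study B (roughly 2.16 to 17.84) tell quite different stories about the likely size and importance of the effect.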
In this paper we briefly outline some of the problems with significance testing, offer a number of examples selected from a recent issue of the International Journal of Nursing Studies and discuss some proposed responses to these problems. Our paper concludes by offering some guidance to authors reporting statistical tests in journals and presents a position statement that has been adopted by the International Journal of Nursing Studies to guide its authors in reporting the results of statistical analyses.
While stopping short of calling for an outright ban on reporting p-values and significance tests, we urge authors (and journals) to place more emphasis on measures of effect and estimates of precision / uncertainty and, following the position of the American Statistical Association, emphasise that authors (and readers) should avoid using 0.05 or any other cut-off for a p-value as the basis for a decision about the meaningfulness/importance of an effect. If point estimates and confidence intervals are reported, the p-value may be redundant and can be omitted from reports. When authors write about ‘significance’, they need to state explicitly when they mean statistical significance, and we recommend that authors adopt the language of ‘importance’ when talking about effect sizes.
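The point that the p-value may be redundant once an estimate and confidence interval are reported can be checked directly: under a normal approximation the interval already encodes the standard error, so an approximate p-value can be recovered from it. The sketch below assumes a symmetric Wald-type interval on a scale where the estimate is approximately normal (for ratio measures such as odds ratios, this means working with the logarithms of the estimate and its limits); the function `p_from_ci` and the numbers are hypothetical. It also shows that a 95% interval with one limit exactly at zero corresponds to p = 0.05, so the conventional cut-off adds nothing that the interval does not already show.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def p_from_ci(estimate, lower, upper, level=0.95):
    """Hypothetical helper: approximate two-sided p-value for H0 'effect == 0',
    recovered from a symmetric Wald-type confidence interval."""
    crit = STD_NORMAL.inv_cdf(1 - (1 - level) / 2)  # about 1.96 for a 95% interval
    se = (upper - lower) / (2 * crit)  # interval width = 2 * crit * SE
    z = estimate / se
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


# Hypothetical reported result: mean difference 10.0, 95% CI 2.16 to 17.84.
print(f"recovered p = {p_from_ci(10.0, 2.16, 17.84):.3f}")

# An interval whose lower limit sits exactly at zero.
print(f"boundary case p = {p_from_ci(1.96, 0.0, 3.92):.3f}")
```

The first call gives roughly 0.012, matching a test computed from the standard error directly; the second gives 0.050.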
Griffiths, Peter
Needleman, Jack
Griffiths, Peter and Needleman, Jack (2019) Statistical significance testing and p-values: defending the indefensible? A discussion paper and position statement. International Journal of Nursing Studies. (doi:10.1016/j.ijnurstu.2019.07.001).
Text: Accepted Manuscript
More information
Accepted/In Press date: 13 July 2019
e-pub ahead of print date: 22 July 2019
Identifiers
Local EPrints ID: 432737
URI: http://eprints.soton.ac.uk/id/eprint/432737
ISSN: 0020-7489
PURE UUID: fcc33c0d-0a11-4b36-bffd-e13e1bc2aaa7
Catalogue record
Date deposited: 25 Jul 2019 16:30
Last modified: 16 Mar 2024 08:02
Contributors
Author: Peter Griffiths
Author: Jack Needleman