
The significance of p

After 20 years or so in research, I think that I am finally 95% sure that I understand the significance of p. 

It is often said in many a Methods Section (I have written it myself) that a “p-value of 0.05 was considered statistically significant”.  I now think this is sloppy writing. 

First, an aside about the dangers of a threshold.  When p > 0.05, there are several (bad) things we tend to do. 

But this all seems to happen because we write p < 0.05 when what we actually mean is that α = 0.05 is our threshold for significance. 

The p-value is just a probability that comes out of a test: it is the probability of seeing data at least as extreme as ours if the null hypothesis were true.  (It is not the probability that the null hypothesis is true; that’s a common mix-up, and one I’ve made myself.) 

Remember that hypothesis testing works by trying to reject the null hypothesis; we never validate the working hypothesis. 

If I hypothesize that A causes B, the null hypothesis (H0) is that A does not cause B.  The working hypothesis (H1) is that A does cause B. 

So if I say my threshold for statistical significance is 0.05, I’ve made an a-priori decision that I will only reject the null hypothesis when data this extreme would occur less than 5% of the time if the null were actually true. 

This means that when I say A causes B, I am controlling my Type I error rate at 5%.  (A Type I error is a false positive: rejecting a null hypothesis that is actually true.)  If the null really is true, a test run at α = 0.05 will wrongly reject it only 5% of the time. 
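That 5% figure can be checked with a small simulation (a sketch of my own, not anything from the original post; the sample sizes and the use of `scipy` are my assumptions): when the null hypothesis is true by construction, a test run at α = 0.05 rejects roughly 5% of the time.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
alpha = 0.05
trials = 10_000

# Both groups are drawn from the same distribution, so H0 is true
# by construction and every rejection is a Type I error.
false_positives = 0
for _ in range(trials):
    a = rng.normal(0.0, 1.0, size=30)
    b = rng.normal(0.0, 1.0, size=30)
    _, p = ttest_ind(a, b)
    if p < alpha:          # we'd (wrongly) declare "A causes B"
        false_positives += 1

print(false_positives / trials)  # hovers around alpha = 0.05
```

The printed rate drifts around 0.05 from run to run, which is exactly what α promises: it caps the long-run false-positive rate, nothing more.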

(Note, I didn’t say anything about Type II errors…  For that, we’ll need to look at power (1−β)…  stay tuned…) 

Realizing this has made me try to put real p-values in papers instead of ranges. 

(Well, I’ll still put p<0.001, because odds longer than 1:1000 start to get unhelpful.  But a p=0.01 means data this extreme would turn up only 1 time in 100 under the null, a p=0.005 means 5 in 1000, or 1 in 200, etc.  I personally think those values are good to know!) 

Why does this matter? Compare a p=0.051 versus a p=0.049.  Under the null, that is data you would see 5.1% of the time versus 4.9% of the time. 

This is a difference that doesn’t seem important to me, but many times a p=0.051 leads to p-hacking, i.e. a researcher will add one more sample (or exclude one) to get p<0.05.  That’s picking the answer you want, not doing research. 
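The add-one-more-sample move can be simulated too (again a sketch with made-up sample sizes, not anything from the post): if you re-test as each new sample arrives and stop the moment p dips below 0.05, the false-positive rate climbs well above the nominal 5%.

```python
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(1)
alpha = 0.05
trials = 2_000

# H0 is true: every sample really does come from a mean-zero distribution.
hacked_hits = 0
for _ in range(trials):
    x = list(rng.normal(0.0, 1.0, size=10))
    while True:
        _, p = ttest_1samp(x, 0.0)
        if p < alpha:               # stop as soon as it "works"
            hacked_hits += 1
            break
        if len(x) >= 30:            # give up after 30 samples
            break
        x.append(rng.normal(0.0, 1.0))

print(hacked_hits / trials)  # noticeably larger than 0.05
```

Peeking at the p-value after every new sample gives you many chances to cross the threshold by luck, which is exactly why α only means what it says when the stopping rule is fixed in advance.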

Nonetheless, we typically report p<0.05 as statistically significant.  It’s a good shorthand, but don’t forget what it means.  It is only a threshold, and we can always make it larger (tolerating more Type I errors) or smaller (tolerating fewer). 

Instead of saying a “p-value of 0.05 was considered statistically significant”, I’ll try to write, “α was set at 0.05”.  Regardless of the p-value, I can compare my p to my α. 
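In code, the comparison reads the same way (a sketch with made-up measurements; `ttest_ind` is just a stand-in for whatever test you are actually running): compute p once, then compare it to the α you fixed before seeing the data.

```python
from scipy.stats import ttest_ind

alpha = 0.05  # set a priori, before looking at the data

# Hypothetical measurements for two groups (illustration only).
group_a = [5.1, 4.9, 6.2, 5.8, 5.5, 6.0, 5.3, 5.7]
group_b = [4.2, 4.8, 4.5, 4.1, 4.9, 4.4, 4.6, 4.3]

_, p = ttest_ind(group_a, group_b)

# Report the exact p-value, and compare it to alpha separately.
print(f"p = {p:.4f}; significant at alpha = {alpha}? {p < alpha}")
```

Note that α lives in one line and p in another: the threshold is a choice, the p-value is a measurement, and the Methods Section should name them separately.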

No promises that I’ll do this perfectly, but I’ll start trying to edit my stats sections to read that “α was set to 0.05”. 

tl;dr:  α is a threshold that we set (somewhat arbitrarily) before the experiment.  A p-value comes from an individual test: it is the probability of seeing data at least that extreme if the null hypothesis were true.  Being bigger than 0.05 (or α) doesn’t make a p-value wrong.
