Science
Live by statistics, die by statistics
Pretty interesting result -- scientists doing some science on themselves
Now a paper has come out that ought to make some psychologists, who use that p value criterion a lot in their work, feel a little concerned. The researchers analyzed the distribution of reported p values in 3 well-regarded journals in experimental psychology, and described the pattern.
The circles represent the actual distribution of p values in the published papers. Remember, 0.05 is the arbitrarily determined standard for significance; you don't get accepted for publication if your observations don't rise to that level.
Notice that unusual and gigantic hump in the distribution just below 0.05? Uh-oh.
I repeat, uh-oh. That looks like about half the papers that report p values just under 0.05 may have benefited from a little adjustment.
What that implies is that investigators whose work reaches only marginal statistical significance are scrambling to nudge their numbers below the 0.05 level. It's not necessarily likely that they're actually making up data, but there could be a sneakier bias: oh, we almost meet the criterion, let's add a few more subjects and see if we can get it there. Oh, those data points are weird outliers, let's throw them out. Oh, our initial parameter of interest didn't meet the criterion, but this other incidental observation did, so let's report one and not bother with the other.
http://scienceblogs.com/pharyngula/2012/08/13/live-by-statistics-die-by-statistics/
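The third trick in the quoted passage, testing several incidental outcomes and reporting whichever one clears the bar, is easy to simulate. This is a minimal sketch, not anything from the paper: the choice of five outcomes per study is an illustrative assumption, and under a true null hypothesis each outcome's p-value is uniform on [0, 1].

```python
import random

random.seed(1)
trials, m, hits = 2000, 5, 0  # m = incidental outcomes examined per study (assumed)
for _ in range(trials):
    # With no real effect, each outcome's p-value is uniform on [0, 1].
    p_values = [random.random() for _ in range(m)]
    # Report the best-looking outcome: did *any* of them land below 0.05?
    if min(p_values) < 0.05:
        hits += 1

print(hits / trials)  # near 1 - 0.95**5, i.e. about 0.23, not 0.05
```

So a researcher who quietly picks the best of five outcomes roughly quadruples the chance of a "significant" result appearing by luck alone.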
Notice that if you are reading a bunch of science papers reporting p-values around 0.05, the implication is that about 1/20 of those papers got a spurious result! The only question is, which ones...
Another fun fact about the famous "0.05 pval threshold": suppose you are running an algorithm (like decision tree model training) that performs *many* p-value tests. For example, training a single interior tree node split can easily involve thousands of such tests on a large data set, each of which has a 5% chance of coming out "significant" purely by chance. A pval threshold of 0.05 is a common parameter in such training algorithms, and if you run the numbers, you find that the odds of your chosen split being truly significant can be a lot less than 95%. An adjustment such as the Bonferroni correction is often employed to compensate for this problem.
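Running those numbers takes two lines. The figure of a thousand tests per split is an illustrative assumption; the calculation treats the tests as independent, which is a simplification.

```python
# Probability that at least one of k independent tests at threshold
# alpha = 0.05 comes out "significant" purely by chance.
alpha, k = 0.05, 1000  # k = candidate tests at one tree node (assumed for illustration)
p_at_least_one_false = 1 - (1 - alpha) ** k
print(p_at_least_one_false)      # essentially 1.0 for k = 1000

# The Bonferroni correction divides the threshold by the number of tests,
# which caps the family-wise error rate back at roughly alpha.
bonferroni_threshold = alpha / k
print(bonferroni_threshold)      # 5e-05
```

With a thousand tests, a spurious "significant" split is all but guaranteed unless the threshold is tightened.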
tblue
(16,350 posts)I got no idea what this is. I might even totally agree with it, but I'd have to have it explained to me first.
You must have a lot of brilliant friends. Good for you, really! I have a BA and an MBA, but this is a few grade levels above me.
phantom power
(25,966 posts)A lot of science experiments are designed this way: "I'm going to collect two samples of data (under two different conditions). If my theory is correct, those two samples will have a different average. Or, a different standard deviation, etc."
So, you can imagine: if you collect two samples like that, there is *some* probability that, by bad luck, you'll get two different averages purely by random chance, and you'll be reporting that your theory is correct when it actually isn't.
There's an entire (enormous) sub-field of statistics that does nothing except provide mathematical ways for us to measure that probability we were just unlucky. That's what is often called the p-value. So if you measure a p-value of 0.05, then it's saying the probability is 5% (1/20) that you just got unlucky.
As you can see, you'd like your experiments to give you p-values as small as you can get them, because that means your probability of getting a spurious result is as small as possible. The particle physics guys typically won't say they've "confirmed" a new particle until their statistics are reporting p-values of 0.000001, or something. Other branches of science make do with larger p-values like 0.05, partly because smaller sample sizes result in larger p-values, and collecting very large samples in fields like psychology or biology, or in crash-testing cars for safety, is sometimes impossible.
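The "probability you just got unlucky" framing above can be checked with a quick simulation. This is a sketch under simple assumptions (normal data, no real effect, 50 subjects per group, a Welch-style t statistic with a ~1.98 critical value standing in for p < 0.05); none of it comes from the thread.

```python
import random
import statistics

def two_sample_t(a, b):
    """Welch's t statistic for two independent samples."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se = (va / len(a) + vb / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se

random.seed(42)
n, trials, false_positives = 50, 2000, 0
for _ in range(trials):
    # Both samples come from the SAME distribution: the "theory" is false.
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    # |t| > 1.98 corresponds roughly to p < 0.05 at these sample sizes.
    if abs(two_sample_t(a, b)) > 1.98:
        false_positives += 1

print(false_positives / trials)  # close to 0.05
```

Roughly 5% of the simulated experiments declare a difference that isn't there, which is exactly what a 0.05 threshold promises.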
FiveGoodMen
(20,018 posts)Nice, clear, brief, jargon-free.
Scuba
(53,475 posts)drm604
(16,230 posts)Maybe researchers are less likely to publish "statistically insignificant", i.e. p>=0.05, results. That could account for the sudden drop at higher values. Of course the sudden drop to the left of the 'hump" isn't explained by that.
phantom power
(25,966 posts)If it were a thing like "we aren't going to publish unless it's < 0.05," then I would expect to see all the dots to the left of 0.05 sitting measurably higher than the black line of prediction.
As Myers points out, there are a lot of non-deliberate ways this might be happening, but they are technically forms of cooking the data. One thing that could be happening is that you collect some data, find you didn't quite make 0.05, collect some more, retest, and keep doing that. There isn't really anything wrong with that in itself; however, it injects a kind of experimental bias, because each time you rerun the test, you are increasing your odds of getting the p-value you want. The Bonferroni adjustment is meant to correct for that. If you're *really* being a stickler, you would apply the Bonferroni adjustment to account for how many times you ran your experiment before you reported your final p-value.
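That collect-a-bit-more-and-retest loop can be simulated directly. A sketch under assumed conditions (normal data with no real effect, a start of 20 subjects, peeking after each batch of 10, a t-style statistic against a 1.96 cutoff); the batch sizes and number of peeks are illustrative choices, not from the thread.

```python
import random

def t_stat(xs):
    """One-sample t-style statistic against a true mean of 0."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return mean / (var / n) ** 0.5

random.seed(7)
trials, hits = 2000, 0
for _ in range(trials):
    xs = [random.gauss(0, 1) for _ in range(20)]
    # Peek after each extra batch of 10 subjects; stop at the first "success".
    for _ in range(5):
        if abs(t_stat(xs)) > 1.96:  # nominal p < 0.05
            hits += 1
            break
        xs += [random.gauss(0, 1) for _ in range(10)]

print(hits / trials)  # well above 0.05 -- peeking inflates the error rate
```

Even though every individual test nominally runs at the 0.05 level, giving yourself five chances to stop on a win pushes the overall false-positive rate to roughly two or three times that.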
To give a better answer, I'd have to read more about how the authors collected *their* data.
bananas
(27,509 posts)Tarmo Toikkanen 3:25 PM
Umm, APA guidelines instruct authors to report the prechosen significance value, most often 0.05, in the text, rather than the exact p-value. So this result is obvious and misleading.
Jim__
(14,075 posts)p value.
Thanks for this. Only papers that quoted the precise p value were included in the analysis. I'll add a clarification to make that clear.
Thor_MN
(11,843 posts)there's bound to be some splashes near the toilet.