
Data dredging and p-hacking

Data dredging (also called p-hacking) is the practice of mining data to uncover patterns that can be presented as statistically significant, without first devising a specific hypothesis as to the underlying causality.

Causality is the actual relationship between causes and effects.

Finding patterns in data is a major part of science. But if one is disingenuous, one can always sift through unrelated pieces of information until – by sheer coincidence – some of them appear to be connected.

If one isn’t careful, one could incorrectly infer that one variable is causing a change in the other.
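To see how easily coincidental "patterns" arise, here is a minimal sketch (Python, standard library only; the variable counts are chosen for illustration) that generates twenty unrelated noise variables and then dredges every pair for the strongest correlation:

```python
import random
import math

random.seed(1)

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient between two lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# 20 variables of pure random noise, 15 observations each --
# none of them has any real relationship to any other.
variables = [[random.gauss(0, 1) for _ in range(15)] for _ in range(20)]

# Dredge: test all 190 pairs and keep only the strongest correlation.
best = max(
    (abs(pearson_r(variables[i], variables[j])), i, j)
    for i in range(20) for j in range(i + 1, 20)
)
print(f"strongest |r| among 190 pairs of pure noise: {best[0]:.2f}")
```

With that many pairs, the single best correlation is typically strong enough to look like a real effect – which is exactly the trap: report only the winning pair, hide the other 189 tests, and noise masquerades as a finding.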


Distinguishing correlation from causation is important – otherwise we get data dredging

Interactive statistics app: Hack Your Way To Scientific Glory


How easy is it to fool others with p-hacking?

John Bohannon writes

I Fooled Millions Into Thinking Chocolate Helps Weight Loss. Here’s How.

“Slim by Chocolate!” the headlines blared. A team of German researchers had found that people on a low-carb diet lost weight 10 percent faster if they ate a chocolate bar every day. It made the front page of Bild, Europe’s largest daily newspaper, just beneath their update about the Germanwings crash. From there, it ricocheted around the internet and beyond, making news in more than 20 countries and half a dozen languages. It was discussed on television news shows. It appeared in glossy print, most recently in the June issue of Shape magazine (“Why You Must Eat Chocolate Daily,” page 128). Not only does chocolate accelerate weight loss, the study found, but it leads to healthier cholesterol levels and overall increased well-being. The Bild story quotes the study’s lead author, Johannes Bohannon, Ph.D., research director of the Institute of Diet and Health: “The best part is you can buy chocolate everywhere.”

I am Johannes Bohannon, Ph.D. Well, actually my name is John, and I’m a journalist. I do have a Ph.D., but it’s in the molecular biology of bacteria, not humans. The Institute of Diet and Health? That’s nothing more than a website.

Other than those fibs, the study was 100 percent authentic. My colleagues and I recruited actual human subjects in Germany. We ran an actual clinical trial, with subjects randomly assigned to different diet regimes. And the statistically significant benefits of chocolate that we reported are based on the actual data. It was, in fact, a fairly typical study for the field of diet research. Which is to say: It was terrible science. The results are meaningless, and the health claims that the media blasted out to millions of people around the world are utterly unfounded.

Here’s how we did it…. [read the article!] … I know what you’re thinking. The study did show accelerated weight loss in the chocolate group—shouldn’t we trust it? Isn’t that how science works?

Here’s a dirty little science secret: If you measure a large number of things about a small number of people, you are almost guaranteed to get a “statistically significant” result. Our study included 18 different measurements—weight, cholesterol, sodium, blood protein levels, sleep quality, well-being, etc.—from 15 people. (One subject was dropped.) That study design is a recipe for false positives.

Think of the measurements as lottery tickets. Each one has a small chance of paying off in the form of a “significant” result that we can spin a story around and sell to the media. The more tickets you buy, the more likely you are to win. We didn’t know exactly what would pan out—the headline could have been that chocolate improves sleep or lowers blood pressure—but we knew our chances of getting at least one “statistically significant” result were pretty good.

Whenever you hear that phrase, it means that some result has a small p value. The letter p seems to have totemic power, but it’s just a way to gauge the signal-to-noise ratio in the data. The conventional cutoff for being “significant” is 0.05, which means that there is just a 5 percent chance that your result is a random fluctuation. The more lottery tickets, the better your chances of getting a false positive. So how many tickets do you need to buy?

P(winning) = 1 − (1 − p)^n

With our 18 measurements, we had a 60% chance of getting some “significant” result with p < 0.05. (The measurements weren’t independent, so it could be even higher.) The game was stacked in our favor.
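The lottery-ticket formula above can be checked directly. A one-function Python sketch reproduces Bohannon's 60% figure for 18 independent tests at the p < 0.05 cutoff:

```python
def p_at_least_one(p, n):
    """Probability of at least one false positive when running n
    independent tests, each at significance level p:
    P(winning) = 1 - (1 - p)^n"""
    return 1 - (1 - p) ** n

# 18 measurements, conventional 0.05 cutoff -> roughly a 60% chance
# that at least one comes up "significant" by luck alone.
print(f"{p_at_least_one(0.05, 18):.3f}")  # ≈ 0.603
```

Note the assumption of independence: correlated measurements (as in the chocolate study) can push the real false-positive rate even higher.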


Why do scientists need to understand statistics?

Dorothy Bishop, Professor of Developmental Neuropsychology, writes:

The Amazing Significo: why researchers need to understand poker

…Quite simply p-values are only interpretable if you have the full context: if you pull out the ‘significant’ variables and pretend you did not test the others, you will be fooling yourself – and other people – by mistaking chance fluctuations for genuine effects. As we showed with our simulations, it can be extremely difficult to detect this kind of p-hacking, even using statistical methods such as p-curve analysis, which were designed for this purpose. This is why it is so important to either specify statistical tests in advance (akin to predicting which people will get three of a kind), or else adjust p-values for the number of comparisons in exploratory studies…
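One standard way to "adjust p-values for the number of comparisons," as Bishop recommends for exploratory studies, is the Bonferroni correction: with m tests, each p-value must clear α/m rather than α. A minimal Python sketch (the p-values here are made up for illustration):

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Bonferroni correction for multiple comparisons: with m tests,
    a result counts as significant only if p < alpha / m."""
    m = len(p_values)
    return [p < alpha / m for p in p_values]

# Four exploratory tests: only the first survives the corrected
# threshold of 0.05 / 4 = 0.0125, even though three are below 0.05.
pvals = [0.003, 0.020, 0.049, 0.600]
print(bonferroni_significant(pvals))  # [True, False, False, False]
```

Bonferroni is deliberately conservative; less strict alternatives (Holm, Benjamini–Hochberg) exist, but all of them require honestly reporting how many tests were run – which is precisely what p-hacking hides.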


Additional resources

The Extent and Consequences of P-Hacking in Science. Megan L. Head, Luke Holman, Rob Lanfear, Andrew T. Kahn, Michael D. Jennions. PLOS Biology, March 13, 2015

A focus on novel, confirmatory, and statistically significant results leads to substantial bias in the scientific literature. One type of bias, known as “p-hacking,” occurs when researchers collect or select data or statistical analyses until non-significant results become significant. Here, we use text-mining to demonstrate that p-hacking is widespread throughout science. We then illustrate how one can test for p-hacking when performing a meta-analysis and show that, while p-hacking is probably common, its effect seems to be weak relative to the real effect sizes being measured. This result suggests that p-hacking probably does not drastically alter scientific consensuses drawn from meta-analyses.


Director’s Blog: P-Hacking, Thomas Insel, 11/14/14, National Institute of Mental Health

Common misconceptions about data analysis and statistics, Motulsky H. J.
J Pharmacol Exp Ther. 2014 Oct;351(1):200-5. doi: 10.1124/jpet.114.219170.

Ideally, any experienced investigator with the right tools should be able to reproduce a finding published in a peer-reviewed biomedical science journal. In fact, however, the reproducibility of a large percentage of published findings has been questioned. Undoubtedly, there are many reasons for this, but one reason may be that investigators fool themselves due to a poor understanding of statistical concepts. In particular, investigators often make these mistakes: 1) P-hacking, which is when you reanalyze a data set in many different ways, or perhaps reanalyze with additional replicates, until you get the result you want; 2) overemphasis on P values rather than on the actual size of the observed effect; 3) overuse of statistical hypothesis testing, and being seduced by the word “significant”; and 4) over-reliance on standard errors, which are often misunderstood.


Learning Standards

2016 Massachusetts Science and Technology/Engineering Standards
Science and engineering practices:
• Construct, analyze, and/or interpret graphical displays of data and/or large data sets to identify linear and nonlinear relationships.
• Use graphical displays (e.g., maps, charts, graphs, and/or tables) of large data sets to identify temporal and spatial relationships.
• Distinguish between causal and correlational relationships in data.
• Analyze and interpret data to provide evidence for phenomena.


