I recently read this post about p-hacking (see also: data dredging, fishing, snooping). Two things that I found to be noteworthy were an interactive example of how p-hacking works, and a description of an experiment where different research teams analyzed the same data set:
“Twenty-nine teams with a total of 61 analysts took part. The researchers used a wide variety of methods, ranging — for those of you interested in the methodological gore — from simple linear regression techniques to complex multilevel regressions and Bayesian approaches. They also made different decisions about which secondary variables to use in their analyses.
Despite analyzing the same data, the researchers got a variety of results. Twenty teams concluded that soccer referees gave more red cards to dark-skinned players, and nine teams found no significant relationship between skin color and red cards.”
To reiterate, all of the methods used were justifiable. There wasn’t any fudging or fabricating of data. A group of skilled analysts sat down and came up with 29 defensible methods for analyzing the same data that gave different answers. To me, this is the stuff of existential crises. To quote the article, “[e]very result is a temporary truth” — which is pretty concerning if you’re working in a situation where temporary truths don’t cut it.
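The mechanism here is easy to see with a small simulation. This is a minimal sketch, not anything from the study itself: it assumes the analyst gets 20 independent shots at a dataset where there is *no* real effect, and declares victory on any |z| above 1.96. Real analytic choices on one shared dataset are correlated rather than independent, so this overstates the problem somewhat, but the direction is right: a 5% per-test false-positive rate compounds quickly.

```python
import random

random.seed(0)

N_SIMULATIONS = 1000
N_PER_GROUP = 40
N_ANALYSES = 20   # hypothetical number of defensible analysis choices
Z_CRIT = 1.96     # two-sided 5% threshold (normal approximation)

def z_stat(a, b):
    """Two-sample z statistic (normal approximation to the t test)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / ((va / len(a) + vb / len(b)) ** 0.5)

any_hit = 0
for _ in range(N_SIMULATIONS):
    for _ in range(N_ANALYSES):
        # Each "analysis" is a fresh draw under the null: both groups come
        # from the same distribution, so any significant result is spurious.
        a = [random.gauss(0, 1) for _ in range(N_PER_GROUP)]
        b = [random.gauss(0, 1) for _ in range(N_PER_GROUP)]
        if abs(z_stat(a, b)) > Z_CRIT:
            any_hit += 1
            break

print(f"Chance at least one of {N_ANALYSES} analyses 'works' under the null: "
      f"{any_hit / N_SIMULATIONS:.2f} (theory: {1 - 0.95 ** N_ANALYSES:.2f})")
```

With 20 fully independent tests the theoretical chance of at least one false positive is 1 − 0.95²⁰ ≈ 0.64, and the simulation lands in that neighborhood. The 29 teams weren’t doing this deliberately, of course; the unsettling part is that the same multiplicity operates even when each individual choice is made in good faith.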
Joshua Tewksbury is a biologist who spent 10 years as a professor at the University of Washington before moving to a position with the World Wildlife Fund. About a year ago, he wrote a post about transitioning to an NGO position where, he writes, “[s]cience shows up as just another wrench in the toolkit.” A deeply malleable tool, apparently. On the one hand, it’s troubling to think about making decisions with temporary truths. On the other hand, and this strikes me as almost heretical to type, if you deeply believe in your cause, maybe it’s not so bad to (ethically and with full disclosure) make subjective decisions in how you analyze your data to advance your cause.
After thinking about it for a while, I’m still not sure how bad my crisis should be. In the first post, one of the project leaders is quoted as saying:
“On the one hand, our study shows that results are heavily reliant on analytic choices,” Uhlmann told me. “On the other hand, it also suggests there’s a there there. It’s hard to look at that data and say there’s no bias against dark-skinned players.”
At first pass, this didn’t help me. To somebody who takes comfort in certainty (and don’t most scientists?), the “squint at it” method of assessing data is an endless source of frustration. But I’ve also realized that we might feel confident about one other thing from the soccer data set: no team concluded that lighter-skinned players received more red cards. Maybe there are some relatively permanent truths; they just don’t answer the question we set out to answer.