The Dangers of Faith In Data
Here are the notes from my Warm Gun SF keynote, based on one of the stories from The Year Without Pants (an Amazon.com best book of the year). Thanks to the folks who were there for being a great crowd.
“Confidence and faith in data grow in relation to your distance from the collection of it”
- The Data Paradox. The more data you have, the larger the role intuition plays in deciding how to interpret, explain and apply the data. Intuition must be used to decide which data to focus on, how to collect or organize the data, and what samples to use and exclude. For example, in A/B testing, you use intuition to decide what B is. Underneath all of our rational intellect is intuition, which influences our “rational” behavior far more than we admit. Often data yields unavoidable tradeoffs where two or more options are equally viable and someone must make a judgment call beyond the data. In strict paradox form: the more data you have the less you know.
- No team or organization is truly data-driven. Data is not conscious: it is merely a list of inert, dead numbers. Data doesn’t have a brain and therefore can’t drive or lead anything. At best you want to be data-influenced, where (living) decision makers have good data available that they can use to help answer good questions about what they’re doing, how well it’s being done and what perhaps they should be doing in the future. All data has bias and blind spots, and a truly data-driven organization will drive itself into the ground chasing the illusion of purely objective truth.
- Data is a flashlight. Data gives you specific information about a single, narrow vector. Data, like a flashlight, is only as useful as the person wielding it and the person interpreting what it shows. It has no magical powers. To get good information you want multiple sources so you can triangulate and compensate for the inherent biases each kind of data has. For example, A/B testing can tell you things customer interviews can’t and vice versa. Clickthrough data does not tell you what happened between clicks (did they punch a wall or jump for joy?). One analytical model suggests one hypothesis, but a different method can suggest another with the same data.
- Ban the phrase “The data says.” Data can’t say anything for the same reason it can’t drive anything: data is inert. People, including data experts or growth hackers, can never speak singularly for the data. At best they are interpreters, offering one interpretation of the useful story that can be derived from the data (if there is one at all). Better experts yield better interpretations, but their interpretation is never the only one available. If anyone utters “the data says” they are pretending data can have a singular interpretation, which it never does, and this false faith prevents the asking of good questions, such as: is there an equally valid hypothesis based on this data that suggests a different conclusion than yours? (The answer is often yes.)
- Cognitive Bias pollutes our view of data. We know our brains are kludges, vulnerable to optical illusions. We also have blind spots in our cognition called cognitive biases. The most common one regarding data is confirmation bias, where we seek only to validate our preconceptions and stop doing analysis as soon as we have a single hypothesis that supports our assumptions. Another dangerous bias is narrative bias, which is our attraction to stories. We love stories that are easy to understand, easy to say and that make us feel good, and we will prefer telling these stories over more complex ones even if the complex ones are truer.
- Cui Bono (“who benefits?”). Who paid for this data? What was their reason for paying for it? What ambitions do they have? Certain outcomes of data benefit the people asking for the data and the people who capture the data, biasing the results. In political elections it’s common to see competing campaigns find very different data for who is in the lead, each finding their own candidate in front. Another example is how company founders will select data that makes them sound the best when pitching for funding (and VC firms will listen for the kinds of data they want to hear). Generally in life when you’re confused about why a strange decision was made, or there is grand incompetence, or nothing is happening at all, ask cui bono?
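To make the point that the same data can support two different stories concrete, here is a small sketch in Python. The numbers are invented for illustration (a hypothetical A/B test split by device), and the names `results`, `segment_winner` and `overall_winner` are made up for this example. Sliced by segment, variant A wins everywhere; pooled together, variant B wins.

```python
# Hypothetical A/B test counts, invented for illustration only.
# Each entry is (conversions, visitors).
results = {
    "A": {"mobile": (80, 800), "desktop": (60, 200)},
    "B": {"mobile": (16, 200), "desktop": (200, 800)},
}


def rate(conversions, visitors):
    """Conversion rate as a fraction of visitors."""
    return conversions / visitors


def segment_winner(segment):
    """Variant with the higher conversion rate inside one segment."""
    return max(results, key=lambda variant: rate(*results[variant][segment]))


def overall_winner():
    """Variant with the higher conversion rate when segments are pooled."""
    def pooled_rate(variant):
        conversions = sum(c for c, _ in results[variant].values())
        visitors = sum(n for _, n in results[variant].values())
        return rate(conversions, visitors)
    return max(results, key=pooled_rate)


for segment in ("mobile", "desktop"):
    print(segment, "winner:", segment_winner(segment))
print("overall winner:", overall_winner())
```

Neither reading is “what the data says.” Which one is useful depends on a judgment call the data can’t make for you: whether the difference in traffic mix between the two variants matters for the decision at hand.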
Also see: Data Death Spiral, How To Call BS On A Guru.
You can watch the actual keynote presentation below:
Hi Scott, I thought this was up to your usual standard, as was the Hiroshima one.
If people failed to comment then it is not because you failed to communicate, it’s because you communicated so well.
Kind of like how people don’t clap after a really excellent performance, or don’t speak after a good quote. I for one was silent after reading Hiroshima.
Great post, Scott!
Addendum: If you use data in your organization to make decisions, you should be required to demonstrate you understand Simpson’s Paradox.
I particularly like #4, the attribution of agency to a non-agentive (inert) object. Reminds me of the way so many financial analysts – and non-financial [non-]analysts – attribute agency to the stock market, e.g., referring to what “the market thinks” or “the market says”.