When is good data impossible?

Posted on June 1, 2011 (October 24, 2012) in Culture, Philosophy by Scott Berkun

I was thinking recently about skepticism toward small sample sizes: how it’s wise to doubt big claims based on scant data.

But what things in life can never have large samples? Some ideas or beliefs will never be shared by many people no matter how useful or right they are.

The scientific method is based in part on repeatability: an experiment should produce consistent results, and that’s how we know a discovery is true. But what about things that are true, but are simply hard to repeat? To be provocative, what if perpetual motion is possible, but only once every 100 years? Or if UFOs exist, but they have equipment to ensure they only appear when crazy people with bad cameras are around? Sure, these things are very unlikely, but if they were true they would be impossible to prove.

The lack of data about a premise does not guarantee it isn’t true; it only guarantees it hasn’t been proven to be true. And my point is that some things that are true will always fall in that gap.

More specific to my half-baked line of inquiry: What situations in life have no possibility for good data, yet demand we make decisions anyway? I think there are more of these situations than we realize.

15 Responses to “When is good data impossible?”

nick June 1, 2011 at 4:56 pm.

I have found that it is usually the more important decisions in our lives that we have to make with insufficient or incomplete data.

Kevin June 1, 2011 at 5:35 pm.

Scott, if the hypothesis cannot be duplicated, it has failed. If, per your example, perpetual motion only works every 100 years, the hypothesis should contain the information needed to duplicate it in that setting. This kind of thing happens all the time. Hypothesize, fail to prove, adjust, repeat until it cannot be broken (thus becoming law).

Greg Linster June 1, 2011 at 5:59 pm.

I’ll state the obvious answer: economics. Nowhere are data and statistical analysis more abused than in econometric studies. Good data is nearly impossible to attain in any situation with a high level of complexity (most things in life), and it’s even harder to analyze complexity correctly. Furthermore, good data can still be susceptible to biased uses and interpretations. I actually think that most situations in life have no possibility for good data, yet we have to make decisions anyway.

Scott Berkun June 1, 2011 at 9:05 pm.

Nick: I agree with you. Care to share what you think some of these ‘important’ decisions are?

Simon June 2, 2011 at 5:22 am.

I think what you are talking about goes pretty much in the direction of Nassim Nicholas Taleb’s Black Swan (https://www.amazon.com/exec/obidos/ASIN/1400063515/scottberkunco-20/).

I happen to be more and more inclined to accept Karl Popper’s concept of falsification. According to that, theories can only be divided into two groups:
– Hypotheses that have already been falsified, and
– Hypotheses that have not yet been falsified

Eric Nehrlich June 2, 2011 at 6:47 am.

I went to a talk by Bob Sutton once where he cited a quote by Andy Grove, CEO of Intel:
“I think it is very important for you to do two things: act on your temporary conviction as if it was a real conviction; and when you realize that you are wrong, correct course very quickly.

Investment decisions or personnel decisions and prioritization don’t wait for the picture to be clarified. You have to make them when you have to make them. You take your shots and clean up the bad ones later.

(So you have to keep your own spirits up even though you well understand that you don’t know what you’re doing)”

I think this is one of the hardest things to learn as I progress in the business world. Many situations I’m asked to handle are novel, because routine decisions are handled by bureaucracy in the form of established processes, or at a more junior level. Taking action when I know I don’t have enough data requires a leap of faith.

Ben Kovitz June 2, 2011 at 7:17 am.

Don’t pretty much all real-life decisions involve massively incomplete information?

Every event in human history is unique. Every person is unique. Entrepreneurship usually aims to capitalize on an opportunity that has never existed before and will never exist again—and might not really exist now, for all the entrepreneur knows. The state of U.S. culture that produced the exact combination of shoes, chair upholstery, rugs, doilies, kitchen colors, bread-box decals, and newspapers on the floor in this photo happened only once, in 1952: http://www.shorpy.com/node/2778?size=_original . Could people have intelligently addressed the situation between the U.S. and the U.S.S.R. during the Cold War merely by referring to gigantic data sets of identical other situations?

Usually, the more important the decision, the less information: choosing a career, getting married, accepting a job (especially your first job), getting into a conversation with a stranger at a coffeehouse (sometimes that leads in completely unexpected and life-changing directions and sometimes it doesn’t). Will going to that new restaurant give you an enjoyable meal, an awful meal, an argument, a new friend, an allergic reaction of a kind you’ve never heard of, a new artistic inspiration, nothing memorable, a lawsuit, a fatal disease?

Everyone who is alive and paying attention already knows all this.

The idea of repeatability or statistical extrapolation as the essence of science is just one of those dumb philosophical theories, like logical positivism, where the reality is ignored in favor of an imaginary world chosen to be much more suitable for management. All that really exists are the individual cases. Samples are small sets of individual cases from which you mathematically extrapolate to other, much larger sets of individual cases, by assuming a convenient mathematical regularity of a sort that is well suited to extrapolation. The real thinking in statistical extrapolation is not crunching the numbers, it’s intelligently choosing the sample and the math appropriately, and making a reasonable judgement about how much that’s really telling you about the big population you’re making inferences about. Reasoning about how to statistically extrapolate from a sample is not itself statistical extrapolation from a sample.

nick June 2, 2011 at 10:49 am.

Scott, it’s hard to come up with a definition, but I’ve noticed that unobtainable information itself makes decisions harder and more important. (I realize this is circular reasoning, but life isn’t always logical.) Also, decisions with a long-term effect are definitely more important than those with a short-term effect (e.g. choosing a city or country to relocate to vs. deciding what to buy at the grocery store and what to cook for dinner).

I think the information that is unobtainable is usually information which relies on events that are currently happening, or that we know will happen. Even for events that have happened, sometimes it is difficult to see the effect until some time has passed.

Also, sometimes the availability of new information can even turn good decisions made with sufficient information into bad ones.

Carol Cartwright June 4, 2011 at 4:55 pm.

Actually the scientific method is based on formulating falsifiable and testable hypotheses…then having at it. The problem is that 20th century research methods have been too crude and simplistic to answer questions that involve something other than manipulating people in shopping. :D

The social “sciences” have developed “quasi-experimental” methods, and there is a vast array of qualitative research methods to help someone test hypotheses and draw good conclusions based on observations. The problem with that is that it’s generally a lot more work than people want to go through. Folks like easy and fast answers. Immediate gratification. And I’d guess that most people are not rational and empirical. That is to say, they don’t want to reserve judgment till the facts are in. They want mirrors, not knowledge.

Mike Nitabach June 5, 2011 at 2:53 pm.

A corollary to this that many (most?) people have trouble with is that a decision that you find out after-the-fact to have been wrong might nevertheless have been exactly the right decision when you made it. Second-guessing baseball fans are the absolute worst with this.

Aase June 10, 2011 at 1:06 am.

Many decisions, private and societal, are made based on insufficient data. For instance: choosing a career, getting married, changing jobs, or having a baby in the private sphere; legislation against drugs, the economy in general (especially where predictions of the future are involved), and many other political and ideological issues in the public sphere. I think the most important factor here is to know the validity of the (often meagre) data you have, and in this respect a lot of sins are committed. For instance, economic theory is often presented as if it was as “scientifically sound” as physics, but this is not the case, of course. Thus: the most important thing is to have data of known quality. Often, decisions are made giving too much weight to unreliable data.

rodica June 12, 2011 at 6:29 pm.

All data is good, but you need to ask the right questions in order to get the pieces of data that add up to an answer that makes sense within a context.

I agree with Nick and others on this thread that in business and in life, you don’t always have all the data to make decisions about life, career or which new product you should launch. Many of these, you’ll notice, are complex questions that depend on a lot of unknowns (for which no one will have all the data).

In science, when you have a big question, such as “what makes disease X happen?”, people tend to break that big question down into the simplest questions possible and gather data on those small questions in order to build up a better picture.

The same principle applies everywhere, really. Good data isn’t impossible; I just think a more constructive way to look at the problem is to learn how to ask better questions. :) Better questions = better data :)

Andrew June 22, 2011 at 4:32 am.

As a Physics student (with training in statistical methods), I find it very bizarre that the application of statistics is so far ranging these days. Don’t get me wrong, statistics is a very powerful tool, but I find myself increasingly cynical towards its usage in everyday life.

For example, in Physics you might deal with a (fairly basic) study of background radiation. You measure the number of decays over an extended period, and you get an average decay rate (how often your Geiger counter makes a click). This is a simple statistical scenario, because I only have one variable to measure, the number of decays.
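To make the Geiger counter example concrete, here is a minimal Python sketch. It assumes the decays follow a Poisson process with a constant rate; the rate of 2 clicks per second and the function names are hypothetical choices for illustration, not anything measured.

```python
import random


def simulate_decay_times(rate_per_s, duration_s, rng):
    """Return decay timestamps, assuming a Poisson process with a constant rate."""
    times = []
    t = rng.expovariate(rate_per_s)  # waiting time until the first decay
    while t < duration_s:
        times.append(t)
        t += rng.expovariate(rate_per_s)  # each waiting time is independent
    return times


def estimate_rate(times, duration_s):
    """Average decay rate: total clicks divided by total observation time."""
    return len(times) / duration_s


rng = random.Random(0)  # fixed seed so the run is repeatable
TRUE_RATE = 2.0  # hypothetical clicks per second, chosen for illustration

short_run = estimate_rate(simulate_decay_times(TRUE_RATE, 10, rng), 10)
long_run = estimate_rate(simulate_decay_times(TRUE_RATE, 10_000, rng), 10_000)
print(f"10 s estimate: {short_run:.2f}/s, 10,000 s estimate: {long_run:.3f}/s")
```

No single click can be predicted, but under this assumption the long-run average should settle close to the underlying rate, which is what makes the one-variable case tractable.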

In real life, there are hundreds and thousands of variables. With every extra variable you take into account, you increase the likelihood of discovering a pattern in your data that is totally random and therefore useless.
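A rough Python sketch of this effect, under stated assumptions: every variable below is pure random noise, so any "pattern" it shows against the outcome is spurious by construction. The sample size of 30 and the cutoff of 0.361 (roughly the conventional 5% significance threshold for a correlation at that sample size) are illustrative choices, and the function names are hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_spurious(n_variables, n_samples=30, threshold=0.361, seed=0):
    """Count noise variables whose correlation with a noise outcome looks 'significant'."""
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_samples)]
    hits = 0
    for _ in range(n_variables):
        noise = [rng.gauss(0, 1) for _ in range(n_samples)]  # unrelated to outcome
        if abs(pearson_r(noise, outcome)) > threshold:
            hits += 1
    return hits


for k in (10, 100, 1000):
    print(f"{k} noise variables -> {count_spurious(k)} apparent patterns")
```

With a fixed seed the larger runs reuse the same first variables, so the count can only grow as more variables are examined. On average about one in twenty noise variables should clear the cutoff.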

We humans in particular are exceedingly complicated. People often forget that we have no ‘Theory of the Human’; we don’t know how we work (unlike the background radiation case I mentioned earlier). With radiation, we know the decays happen randomly (you have no idea when any one decay might happen) but you can average it out over a period of time.

Similarly, imagine you want to study how often people go food shopping. Understanding the reasons why someone goes shopping when they do is insanely complicated: when was their last visit? How much food do they have? Do they eat out much? How many people is this food for? How much can they spend? Are they busy? Are they tired? Do they work late? Are the shops close? How many shops are in the area? Are there any deals on? How was the last harvest? What season is it? To name a few…

There are too many questions obviously, so you throw your hands up in the air, say we are as good as random, and work out the average time between food shopping trips. This is the essence of why I believe statistics will not prove as beneficial outside of Physics as it has within Physics.

The reasoning is a simplification of the problem, I admit. Our computers do not have enough processing power, and our theories of ourselves are not good enough (yet?). Are we clever enough to understand something as complicated as a human in the first place? Even if we are, can we understand how billions of humans interact with each other and the world around them? Will we ever be able to generate statistically valid theories of society, economics, etc.? (I’m not saying give up by any means; I’m merely questioning the abilities of a human, not the possibility of a genius (e.g. Einstein) or some superior being or genetically engineered human being able to do so.)

This is (ironically) why subjects such as sociology, psychology, and economics are actually harder (to find answers for) in my opinion than, say, Physics. I don’t think people realise this, as the big questions in these subjects have never been asked, let alone answered. Statistical data can take us as far as we like; the limitation is the person manipulating that data.

Erik Larson (@erikdlarson) October 24, 2012 at 10:09 pm.

It seems evident that all decisions that have a large and long-term impact on our lives are impossible for individuals to make based on ‘data’ or ‘information.’ Life decisions are beyond analysis in a scientific or mathematical sense, as several have mentioned here. Even if a super-human genius were to build a model of human behavior and the universe that we could all use to make the right decisions, it might work for all of us but it would rarely if ever work for any one of us.

In the end we still must make decisions. But our criteria must live above the level of science and math, at the level of wisdom, so that we make the best decisions we can while avoiding regret and feeling the greatest sense of happiness available to us.

I’m working on figuring out what decisions best fit this mold. I’ll let you know what I find.