A player stands facing three big doors: Door no. 1, Door no. 2, and Door no. 3. There is a highly desirable prize behind one of the doors, something like a new car, and a goat behind each of the other two. The player chooses one of the doors and gets the contents behind that door.
As each player stands facing the doors, he or she has a 1 in 3 chance of choosing the door that will be opened to reveal the valuable prize. After the player chooses a door, one of the remaining doors with a goat behind it is opened. The player is then asked whether he would like to change his mind and switch doors. Remember, both remaining doors are still closed, and the only new information the contestant has received is that a goat showed up behind one of the doors that he didn’t pick.
Should he switch?
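The intuition here is notoriously slippery, but a simulation settles it: switching wins about two-thirds of the time. A minimal sketch in Python:

```python
import random

def monty_hall(switch, trials=100_000):
    """Simulate the game; return the fraction of trials the player wins the car."""
    wins = 0
    for _ in range(trials):
        prize = random.randrange(3)    # door hiding the car
        choice = random.randrange(3)   # contestant's initial pick
        # Host opens a door that hides a goat and is not the player's pick.
        opened = next(d for d in range(3) if d not in (choice, prize))
        if switch:
            # Move to the single remaining closed door.
            choice = next(d for d in range(3) if d not in (choice, opened))
        wins += (choice == prize)
    return wins / trials

print(f"stay:   {monty_hall(switch=False):.3f}")   # ≈ 0.333
print(f"switch: {monty_hall(switch=True):.3f}")    # ≈ 0.667
```

The key is that the host's choice is constrained: he always reveals a goat, so switching converts the initial 2-in-3 chance of having picked wrong into a 2-in-3 chance of winning.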
Statistics can be overly accessible in the sense that anyone with data and a computer can do sophisticated statistical procedures with a few keystrokes.
The problem is that if the data are poor, or if the statistical techniques are used improperly, the conclusions can be wildly misleading and even potentially dangerous.
“People Who Take Short Breaks at Work Are Far More Likely to Die of Cancer.” Imagine that headline popping up while you are surfing the Web. According to a seemingly impressive study of 36,000 office workers (a huge data set!), those workers who reported leaving their offices to take regular ten-minute breaks during the workday were 41 percent more likely to develop cancer over the next five years than workers who did not leave their offices during the workday.
Clearly we need to act on this kind of finding—perhaps some kind of national awareness campaign to prevent short breaks on the job?
Or maybe we just need to think more clearly about what many workers are doing during that ten-minute break?
In fact, many of those workers who report leaving their offices for short breaks are huddled outside the entrance of the building smoking cigarettes (creating a haze of smoke through which the rest of us have to walk in order to get in or out). Hence it’s probably the cigarettes, and not the short breaks from work, that are causing the cancer.
Quarterback Performance in American Football
The National Football League’s “passer rating”—a statistic that condenses a quarterback’s performance into a single number—is a somewhat flawed and arbitrary measure of a quarterback’s game day performance.
The same data (completion rate, average yards per pass attempt, percentage of touchdown passes per pass attempt, and interception rate) could be combined in a different way, such as giving greater or lesser weight to any of those inputs, to generate a different but equally credible measure of performance.
Yet anyone who has watched football recognizes that it’s handy to have a single number that can be used to encapsulate a quarterback’s performance.
Is the quarterback rating perfect? No. Statistics rarely offers a single “right” way of doing anything. Does it provide meaningful information in an easily accessible way? Absolutely.
It’s a nice tool for making a quick comparison between the performances of two quarterbacks on a given day.
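The published NFL formula combines exactly those four inputs, each converted to a per-attempt component and clamped to the range 0 to 2.375, then averaged and scaled. A straightforward Python transcription:

```python
def passer_rating(completions, attempts, yards, touchdowns, interceptions):
    """NFL passer rating: four per-attempt components, each clamped to
    [0, 2.375], averaged and scaled so a perfect game scores 158.3."""
    clamp = lambda x: max(0.0, min(x, 2.375))
    a = clamp((completions / attempts - 0.3) * 5)       # completion rate
    b = clamp((yards / attempts - 3) * 0.25)            # yards per attempt
    c = clamp((touchdowns / attempts) * 20)             # touchdown rate
    d = clamp(2.375 - (interceptions / attempts) * 25)  # interception rate
    return (a + b + c + d) / 6 * 100

# A "perfect" stat line maxes out all four components.
print(round(passer_rating(10, 10, 125, 2, 0), 1))  # 158.3
```

Notice how the arbitrariness shows up directly in the code: the constants 0.3, 3, 20, and 25 are fixed weights that could just as defensibly be set to other values.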
During the 2011 playoffs, the Bears played the Packers; the Packers won. Chicago Bears quarterback Jay Cutler had a passer rating of 31.8. In contrast, the Packers quarterback Aaron Rodgers had a passer rating of 55.4.
Similarly, compare Jay Cutler’s playoff performance to his performance in a game earlier in the season against Green Bay, when he had a passer rating of 85.6. That tells you a lot of what you need to know in order to understand why the Bears beat the Packers earlier in the season but lost to them in the playoffs.
This is both the strength and the weakness of any descriptive statistic. One number tells you that Jay Cutler was outgunned by Aaron Rodgers in the Bears’ playoff loss. On the other hand, that number won’t tell you whether a quarterback had a bad break, such as throwing a perfect pass that was bobbled by the receiver and then intercepted, or whether he “stepped up” on certain key plays (since every completion is weighted the same, whether it is a crucial third down or a meaningless play at the end of the game), or whether the defense was terrible. And so on.
Gini Index
The Gini index measures how evenly wealth (or income) is shared within a country on a scale from zero to one. The statistic can be calculated for wealth or for annual income, and it can be calculated at the individual level or at the household level. (All of these statistics will be highly correlated but not identical.) The Gini index, like the passer rating, has no intrinsic meaning; it’s a tool for comparison.
A country in which every household had identical wealth would have a Gini index of zero. By contrast, a country in which a single household held the country’s entire wealth would have a Gini index of one.
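The definition above can be computed directly: one standard formulation of the Gini index is the mean absolute difference in wealth across all pairs of households, divided by twice the overall mean. A minimal illustration with made-up wealth figures (note that with a finite sample, the single-holder case gives (n−1)/n rather than exactly one):

```python
def gini(wealth):
    """Gini index: mean absolute difference across all pairs of households,
    divided by twice the overall mean. 0 = perfect equality; with one
    household holding everything it approaches 1 as the population grows."""
    n = len(wealth)
    mean = sum(wealth) / n
    total_diff = sum(abs(x - y) for x in wealth for y in wealth)
    return total_diff / (2 * n * n * mean)

print(gini([100, 100, 100, 100]))  # 0.0  -- identical wealth
print(gini([0, 0, 0, 400]))        # 0.75 -- one household holds it all (n=4)
```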
The Gini index for the United States was .41 in 1997 and grew to .45 over the next decade. (The most recent CIA data are for 2007.) This tells us in an objective way that while the United States grew richer over that period of time, the distribution of wealth grew more unequal.
Sweden has had significant economic growth over the past two decades, but the Gini index in Sweden actually fell from .25 in 1992 to .23 in 2005, meaning that Sweden grew richer and more equal over that period.
Is the Gini index the perfect measure of inequality? Absolutely not—just as the passer rating is not a perfect measure of quarterback performance. But it certainly gives us some valuable information on a socially significant phenomenon in a convenient format.
What is the point? The point is that statistics helps us process data, which is really just a fancy name for information. Sometimes the data are trivial in the grand scheme of things, as with sports statistics. Sometimes they offer insight into the nature of human existence, as with the Gini index.
Consider the following disparate questions:
How can we catch schools that are cheating on their standardized tests?
How does Netflix know what kind of movies you like?
How can we figure out what substances or behaviors cause cancer, given that we cannot conduct cancer-causing experiments on humans?
Does praying for surgical patients improve their outcomes?
Is there really an economic benefit to getting a degree from a highly selective college or university?
What is causing the rising incidence of autism?
Statistics can help answer these questions.
The world is producing more and more data, ever faster and faster. Statistics is the most powerful tool we have for using information to some meaningful end, whether that is identifying underrated baseball players or paying teachers more fairly.
A bowling score is a descriptive statistic. So is a batting average. Most American sports fans over the age of five are already conversant in the field of descriptive statistics. We use numbers, in sports and everywhere else in life, to summarize information.
Of course, baseball fans have also come to recognize that descriptive statistics other than batting average may better encapsulate a player’s value on the field.
We evaluate the academic performance of high school and college students by means of a grade point average, or GPA. By graduation, when high school students are applying to college and college students are looking for jobs, the grade point average is a handy tool for assessing their academic potential.
But it’s not perfect. The GPA does not reflect the difficulty of the courses that different students may have taken. How can we compare a student with a 3.4 GPA in classes that appear to be relatively nonchallenging and a student with a 2.9 GPA who has taken calculus, physics, and other tough subjects?
Descriptive statistics exist to simplify, which always implies some loss of nuance or detail. Anyone working with numbers needs to recognize as much.
One key function of statistics is to use the data we have to make informed conjectures about larger questions for which we do not have full information. In short, we can use data from the “known world” to make informed inferences about the “unknown world.”
Political Poll
A methodologically sound poll of 1,000 households will produce roughly the same results as a poll that attempted to contact every household in America.
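The reason a sample of 1,000 suffices is that the sampling error of a proportion shrinks with the square root of the sample size, not with the size of the population. A quick check using the standard 95 percent margin-of-error formula:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95 percent margin of error for an estimated proportion from a
    simple random sample of size n (worst case at p = 0.5)."""
    return z * math.sqrt(p * (1 - p) / n)

print(f"n = 1,000:     ±{margin_of_error(1_000):.1%}")      # ±3.1%
print(f"n = 1,000,000: ±{margin_of_error(1_000_000):.1%}")  # ±0.1%
```

Going from 1,000 respondents to a million buys only three extra percentage points of precision, which is why well-run national polls stop at roughly a thousand households.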
Does smoking cigarettes cause cancer?
We have an answer for that question—but the process of answering it was not nearly as straightforward as one might think.
The scientific method dictates that if we are testing a scientific hypothesis, we should conduct a controlled experiment in which the variable of interest (e.g., smoking) is the only thing that differs between the experimental group and the control group.
If we observe a marked difference in some outcome between the two groups (e.g., lung cancer), we can safely infer that the variable of interest is what caused that outcome.
We cannot do that kind of experiment on humans. If our working hypothesis is that smoking causes cancer, it would be unethical to assign recent college graduates to two groups, smokers and nonsmokers, and then see who has cancer at the twentieth reunion.
Now, you might point out that we do not need to conduct an ethically dubious experiment to observe the effects of smoking. Couldn’t we just skip the whole fancy methodology and compare cancer rates at the twentieth reunion between those who have smoked since graduation and those who have not?
No. Smokers and nonsmokers are likely to be different in ways other than their smoking behavior.
We cannot treat humans like laboratory rats. As a result, statistics is a lot like good detective work. The data yield clues and patterns that can ultimately lead to meaningful conclusions.
Even in the best of circumstances, statistical analysis rarely unveils “the truth.” We are usually building a circumstantial case based on imperfect data. As a result, there are numerous reasons that intellectually honest individuals may disagree about statistical results or their implications. At the most basic level, we may disagree on the question that is being answered.
There are limits on the data we can gather and the kinds of experiments we can perform.
We conduct statistical analysis using the best data and methodologies and resources available. The approach is not like addition or long division, in which the correct technique yields the “right” answer and a computer is always more precise and less fallible than a human.
Statistical analysis is more like good detective work (hence the commercial potential of CSI: Regression Analysis ). Smart and honest people will often disagree about what the data are trying to tell us.
But who says that everyone using statistics is smart or honest? The reality is that you can lie with statistics. Or you can make inadvertent errors. In either case, the mathematical precision attached to statistical analysis can dress up some serious nonsense.
To summarize huge quantities of data.
To make better decisions.
To answer important social questions.
To recognize patterns that can refine how we do everything.
To evaluate the effectiveness of policies, programs, drugs, medical procedures, and other innovations.
Let us ponder for a moment two seemingly unrelated questions:
A reasonable answer—though by no means the “right” answer—would be to calculate the change in per capita income in the United States over the course of a generation, which is roughly thirty years.
Per capita income is a simple average: total income divided by the size of the population.
By that measure, average income in the United States climbed from $7,787 in 1980 to $26,487 in 2010.
To begin with, the figures above are not adjusted for inflation. (A per capita income of $7,787 in 1980 is equal to about $19,600 when converted to 2010 dollars.)
Per capita income merely takes all of the income earned in the country and divides by the number of people, which tells us absolutely nothing about who is earning how much of that income—in 1980 or in 2010. Explosive growth in the incomes of the top 1 percent can raise per capita income significantly without putting any more money in the pockets of the other 99 percent. In other words, average income can go up without helping the average American.
The batting average is a gross simplification of a baseball player’s season. It is easy to understand, elegant in its simplicity—and limited in what it can tell us.
Baseball experts have a bevy of descriptive statistics that they consider to be more valuable than the batting average.
What the two questions have in common is that they can be used to illustrate the strengths and limitations of descriptive statistics, which are the numbers and calculations we use to summarize raw data.
From baseball to income, the most basic task when working with data is to summarize a great deal of information.
We perform calculations that reduce a complex array of data into a handful of numbers that describe those data, just as we might encapsulate a complex, multifaceted Olympic gymnastics performance with one number: 9.8.
The good news is that these descriptive statistics give us a manageable and meaningful summary of the underlying phenomenon. That’s what this chapter is about.
The bad news is that any simplification invites abuse. Descriptive statistics can be like online dating profiles: technically accurate and yet pretty darn misleading.
We should not gauge the economic health of the American middle class by looking at per capita income. Because there has been explosive growth in incomes at the top end of the distribution—CEOs, hedge fund managers, and famous athletes—the average income in the United States could be heavily skewed by the megarich, making it look a lot like the bar stools with Bill Gates at the end.
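The bar-stool effect is easy to reproduce with made-up numbers: a single extreme value drags the mean far from the median, while the median stays put.

```python
from statistics import mean, median

# Hypothetical incomes of ten bar patrons, in dollars.
incomes = [35_000] * 10
print(f"mean {mean(incomes):,.0f}  median {median(incomes):,.0f}")
# mean 35,000  median 35,000

# A billionaire takes the last stool (illustrative figure, not Gates's income).
incomes[-1] = 1_000_000_000
print(f"mean {mean(incomes):,.0f}  median {median(incomes):,.0f}")
# mean 100,031,500  median 35,000
```

The "average" patron is now a hundred-millionaire, yet nine of the ten people at the bar have exactly the income they started with.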
Suppose one data set consists of the weights of 250 people on an airplane headed for Boston, and another consists of the weights of a sample of 250 qualifiers for the Boston Marathon. Now assume that the mean weight for both groups is roughly the same, say 155 pounds. On the basis of the descriptive tools introduced so far, the weights of the airline passengers and the marathoners are nearly identical. But they’re not.
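What separates the two groups is dispersion, which the standard deviation captures. A toy illustration with hypothetical weights:

```python
from statistics import mean, stdev

# Hypothetical weights in pounds: same mean, very different spread.
passengers  = [80, 120, 155, 190, 230]   # a mix of children and adults
marathoners = [150, 153, 155, 157, 160]  # tightly clustered runners

print(mean(passengers), mean(marathoners))  # both means are 155
print(round(stdev(passengers), 1), round(stdev(marathoners), 1))  # 58.5 3.8
```

The means are identical, but the passengers' weights spread over a far wider range, which is exactly what the mean alone conceals.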
Suppose the mean score on the SAT math test is 500 with a standard deviation of 100. As with height, the bulk of students taking the test will be within one standard deviation of the mean, or between 400 and 600. How many students do you think score 720 or higher? Probably not very many, since that is more than two standard deviations above the mean.
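Under the normal-distribution assumption stated above, the exact proportions can be computed from the cumulative distribution:

```python
from statistics import NormalDist

sat = NormalDist(mu=500, sigma=100)
within_one_sd = sat.cdf(600) - sat.cdf(400)
share_720_plus = 1 - sat.cdf(720)

print(f"within one sd of the mean: {within_one_sd:.1%}")   # 68.3%
print(f"scoring 720 or higher:     {share_720_plus:.2%}")  # 1.39%
```

A 720 sits 2.2 standard deviations above the mean, so only about 1 student in 70 scores that high, confirming the "probably not very many" intuition.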
Suppose that Granola Cereal A contains 31 milligrams more sodium than Granola Cereal B. Unless you know an awful lot about sodium (and the serving sizes for granola cereal), that statement is not going to be particularly informative.
Or what if my friend Al earned $53,000 less this year than last year? Should we be worried about Al? Or is he a hedge fund manager for whom $53,000 is a rounding error in his annual compensation?
Suppose that last year the firm earned 27 cents, essentially nothing. This year the firm earned 39 cents, also essentially nothing. Yet the company’s profits grew from 27 cents to 39 cents, which is technically a 44 percent increase.
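The arithmetic is worth seeing explicitly: a tiny absolute change measured against a tiny base produces an impressive-sounding percentage.

```python
prior, current = 0.27, 0.39   # earnings per share, in dollars

absolute_change = current - prior            # $0.12 -- still almost nothing
relative_change = (current - prior) / prior  # 0.444...
print(f"profits grew by {relative_change:.0%}")  # profits grew by 44%
```

Both descriptions are technically true; only the relative one makes a headline.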
The advantage of any index is that it consolidates lots of complex information into a single number. We can then rank things that otherwise defy simple comparison.
Alas, the disadvantage of any index is that it consolidates lots of complex information into a single number. There are countless ways to do that; each has the potential to produce a different outcome.
To anyone who has ever contemplated dating, the phrase he’s got a great personality usually sets off alarm bells, not because the description is necessarily wrong, but for what it may not reveal, such as the fact that the guy has a prison record or that his divorce is “not entirely final.”
We don’t doubt that this guy has a great personality;
we are wary that a true statement, the great personality, is being used to mask or obscure other information in a way that is seriously misleading (assuming that most of us would prefer not to date ex-felons who are still married).
The statement is not a lie per se, meaning that it wouldn’t get you convicted of perjury, but it still could be so inaccurate as to be untruthful.
So it is with statistics. Although the field of statistics is rooted in mathematics, and mathematics is exact, the use of statistics to describe complex phenomena is not exact.
That leaves plenty of room for shading the truth. Mark Twain famously remarked that there are three kinds of lies: lies, damned lies, and statistics.
The crucial distinction between precision and accuracy
Example: Many of the Wall Street risk management models prior to the 2008 financial crisis were quite precise. The concept of value at risk allowed firms to quantify with precision the amount of the firm’s capital that could be lost under different scenarios. The problem lay with the supersophisticated models themselves. The math was complex and arcane. The answers they produced were reassuringly precise. But the assumptions about what might happen to global markets that were embedded in the models were just plain wrong, making the conclusions wholly inaccurate in ways that destabilized not only Wall Street but the entire global economy.
Even the most precise and accurate descriptive statistics can suffer from a more fundamental problem: a lack of clarity over what exactly we are trying to define, describe, or explain.
Even when we agree on a single measure of success, say, student test scores, there is plenty of statistical wiggle room.
Our old friends the mean and the median can also be used for nefarious ends.
A great many statistical shenanigans arise from apples and oranges comparisons.
Example: Hollywood studios may be among the most egregious abusers of the distortions caused by inflation when comparing figures at different points in time—and deliberately so. What were the top five highest-grossing films (domestic) of all time as of 2011?
The most accurate way to compare commercial success over time would be to adjust ticket receipts for inflation. Earning $100 million in 1939 is a lot more impressive than earning $500 million in 2011. So what are the top-grossing films in the U.S. of all time, adjusted for inflation?
In real terms, Avatar falls to number 14; Shrek 2 falls all the way to 31st.
Example: In a similar vein, your kindhearted boss might point out that as a matter of fairness, every employee will be getting the same raise this year, 10 percent. What a magnanimous gesture—except that if your boss makes $1 million and you make $50,000, his raise will be $100,000 and yours will be $5,000. The statement “everyone will get the same 10 percent raise this year” just sounds so much better than “my raise will be twenty times bigger than yours.” Both are true in this case.
There is a common business aphorism: You can’t manage what you can’t measure. True. But you had better be darn sure that what you are measuring is really what you are trying to manage.
Statistics measure the outcomes that matter; incentives give us a reason to improve those outcomes. But in some cases, people respond to those incentives by gaming the numbers, just to make the statistics look better. That’s the bad news.
Example: If school administrators are evaluated, and perhaps even compensated, on the basis of the high school graduation rate for students in a particular school district, they will focus their efforts on boosting the measured graduation rate. Of course, that is not necessarily the same thing as helping more students graduate. Students who leave school before graduation can simply be classified as “moving away” rather than dropping out.
This is not merely a hypothetical example; it is a charge that was leveled against former secretary of education Rod Paige during his tenure as the Houston school superintendent. Paige was hired by President George W. Bush to be U.S. secretary of education because of his remarkable success in Houston in reducing the dropout rate and boosting test scores.
Example: Cardiologists obviously care about their “scorecard.” However, the easiest way for a surgeon to improve his mortality rate is not by killing fewer people; presumably most doctors are already trying very hard to keep their patients alive. The easiest way for a doctor to improve his mortality rate is by refusing to operate on the sickest patients.
According to a survey conducted by the School of Medicine and Dentistry at the University of Rochester, the scorecard, which ostensibly serves patients, can also work to their detriment: 83 percent of the cardiologists surveyed said that, because of the public mortality statistics, some patients who might benefit from angioplasty might not receive the procedure; 79 percent of the doctors said that some of their personal medical decisions had been influenced by the knowledge that mortality data are collected and made public. The sad paradox of this seemingly helpful descriptive statistic is that cardiologists responded rationally by withholding care from the patients who needed it most.
Example: Rankings of Universities
Statistical malfeasance has very little to do with bad math. If anything, impressive calculations can obscure nefarious motives. The fact that you’ve calculated the mean correctly will not alter the fact that the median is a more accurate indicator. Judgment and integrity turn out to be surprisingly important. A detailed knowledge of statistics does not deter wrongdoing any more than a detailed knowledge of the law averts criminal behavior. With both statistics and crime, the bad guys often know exactly what they’re doing!