Stop Pissing on Nate Silver
Until a few months ago, Nate Silver was a nerd hero. If you’ve never heard of him, perhaps you’ve heard of sabermetrics (or you’ve seen Moneyball, a movie about sabermetrics). Well, Silver is best known for applying sabermetric models to the study of politics, especially presidential races. The accuracy of Silver’s predictions for the 2008 and 2012 elections earned him accolades and fame.
Silver runs FiveThirtyEight.com, a blog about number crunching and prediction that covers many aspects of life, from sports to politics. The folks at FiveThirtyEight are basically oddsmakers, and people tend to misunderstand what their products really mean. That’s what this post is about.
Before Tuesday’s election, Silver took quite a bit of criticism for expressing less certainty in a Clinton win than other prediction models did. One piece in particular kicked up a lot of dust: HuffPo’s Ryan Grim accused Silver of “…Unskewing Polls — All Of Them — In Trump’s Direction”, that is, of injecting too much subjective bias into his model. Silver fought back, noting that the way his model weights polls is grounded in empirical data, but lots of people didn’t buy it.
Silver’s final model gave Clinton a 71.4% chance of winning, but when Grim published his criticism, the number was closer to 65%. This was in stark contrast to others, many of whom gave her odds in the 90s. The final prediction of the New York Times was 85% and the HuffPo model that Grim was so confident in put her at 98%. You might be thinking that Grim is feeling pretty low right now, but I’m not so sure since he hedged his criticisms with:
If [Silver is] right, though, it was just a good guess…
Well, one could argue that nobody was right (except Allan Lichtman, who correctly predicted a Trump win using only 13 ‘key’ factors, some of which are highly subjective). But that doesn’t mean that everybody was wrong, either, despite the prediction-bashing I’ve been seeing in my Facebook feed since the election. In his criticism, Grim said some things that made me question whether he understood the very thing he was writing about, and those comments also provide some insight into why people are so upset with Silver post-election.
First, let’s talk very briefly about what these prediction models do. Now, keep in mind that I’m going to oversimplify the process a lot for several reasons, but the details are not particularly relevant to my points.
Most of these models use an aggregate of polls to come up with a prediction. Some, like Silver’s model, include other information (e.g., economic factors) and/or they weight each poll based on its past performance. Silver’s model is highly complex, including a measure of inter-dependency between states. It is in the weighting of polls that Grim’s problem with Silver’s model lies; he felt that Silver’s weights were too subjective and biased toward a Trump win. One Facebook comment I saw even suggested that 538 purposefully made the election look close in order to create anxiety that would keep people clicking on their site.
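To make the basic idea concrete, here is a minimal sketch of a weighted poll average in Python. The poll numbers and weights below are invented for illustration, and real aggregators (Silver’s included) do far more, adjusting for sample size, recency, house effects, and the relationships between states.

```python
# Minimal sketch of a weighted poll average. All numbers are made up.
polls = [
    # (Clinton %, Trump %, weight based on the pollster's past accuracy)
    (48.0, 44.0, 1.0),
    (46.0, 45.0, 0.6),
    (44.0, 47.0, 0.8),
]

total_weight = sum(w for _, _, w in polls)
clinton = sum(c * w for c, _, w in polls) / total_weight
trump = sum(t * w for _, t, w in polls) / total_weight

print(f"Weighted average: Clinton {clinton:.1f}%, Trump {trump:.1f}%")
```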
But here’s what people seem to misunderstand most about this process: a 65% chance of winning doesn’t necessarily mean “a close race” in terms of votes. Yes, those things tend to go hand-in-hand, but you need to separate them for a deeper understanding. Most people see that 65% probability and their brains translate the number into frequencies: they imagine 65% of people voting for Clinton, which is of course not what the number means. But even if your brain corrects that part of it, you may not be able to shake the concept completely. More on that in a bit.
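If it helps, here is a hedged Monte Carlo sketch (in Python, with invented numbers) of how a roughly 65% win probability can come out of a model whose expected vote margin is quite small: when the expected margin is small relative to the uncertainty around it, the favorite wins most, but far from all, of the simulated elections.

```python
import random

# Invented numbers for illustration: suppose a model expects Clinton to lead
# the popular vote by about 1.5 points, with an uncertainty (standard
# deviation) of about 4 points.
MEAN_MARGIN = 1.5
STDEV = 4.0
TRIALS = 100_000

wins = sum(1 for _ in range(TRIALS) if random.gauss(MEAN_MARGIN, STDEV) > 0)
print(f"Simulated win probability: {wins / TRIALS:.0%}")
# Roughly 65% of the simulated elections go to Clinton, even though the
# expected margin is only about 1.5 points, hardly a blowout.
```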
Most of the modern prediction models do not spit out a victor. The Huffington Post, for example, did not say “Clinton will win”. Well, okay, Grim did:
If you want to put your faith in the numbers, you can relax. She’s got this.
but the model’s output was “Clinton has a 98% chance of winning”. It’s not actually a prediction, but an estimate of the probability that an event will occur. Now, if you have ever read any of my notes on The Odds Must Be Crazy or listened to a segment of it on Skepticality, you know that low-odds events happen all the time. And, quite frankly, the nearly 30% that Silver gave Trump is not even low.
But my point is that it’s not accurate to say a prediction like this was “wrong”. I might use that word in the case of HuffPo, whose 98% confidence in Clinton was misplaced, but even then, they gave Trump a 2% chance. That Trump won doesn’t make their prediction “wrong”. They said there was a 2% chance of a Trump win, and an event with a 2% chance happened. That’s how odds work. And in our evaluation of those processes, we have to fight our human brains, which want to put things into boxes in irrational ways. [EDIT: If you roll a die, you have only a one-in-six chance of rolling a 6. So if you roll it and a six comes up, will you now say that prediction was wrong? That wouldn’t make sense, right?]
Grim should have known better, but his writing repeatedly suggested that he didn’t understand the difference between giving odds and picking a winner. He also seemed to think that the election outcome would tell us whose models were “right”; he said so three times in his piece.
Here’s the thing: the outcome of the election doesn’t even tell us for sure which model(s) was better. There just isn’t enough data, which is ironically one of the reasons the models appear to have failed (again, they didn’t ‘fail’).
Yet people are cursing the models and there’s a well-known psychological phenomenon to explain why. Human beings are notoriously bad at understanding and applying information about probability and concepts of uncertainty.
In general, when people hear there is a 10% chance of rain, they expect drizzle, and when they hear there is an 85% chance of rain, they expect a downpour. But that’s not what the prediction means. An 85% chance of rain means it might not rain, but if I were you, I’d take an umbrella.
Silver’s model also updated whenever new information came in, often many times in a single day, which does not help people understand it. Political prediction is, at this point, more complex than weather prediction, and people already have a hard time with weather forecasts. For example, when asked:
If there is a 55% chance of rain on Saturday and a 75% chance of rain on Sunday, what is the chance that it will rain at some point during the weekend?
you might be tempted to average the two and say 65%. However, the probability of rain at some point over the weekend cannot be less than the probability of rain on either day alone, so it must be at least 75%. Or you might be tempted to add them together; there is a famous story of a weatherman who said, “There’s a 50% chance of rain on Saturday and a 50% chance on Sunday, so there’s a 100% chance it will rain this weekend.” For those curious, the answer is actually about 88.75% (if you assume the two days are independent).
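For those who want to check that number, here is the arithmetic as a quick Python sketch (it assumes the two days are independent, which is itself a simplification):

```python
p_sat = 0.55  # chance of rain on Saturday
p_sun = 0.75  # chance of rain on Sunday

# It rains at some point over the weekend unless it stays dry both days.
p_weekend = 1 - (1 - p_sat) * (1 - p_sun)
print(f"{p_weekend:.2%}")  # 88.75%
```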
The bottom line is that Nate Silver’s model spit out a 71% chance of a Clinton win. The fact that she did not win is not a failure of the model. It’s just what happened. He also said there was a 10.5% chance that she would win the popular vote but lose in the Electoral College. That’s not an insignificant probability, so it’s not a ‘fluke’ that it happened.
I have also seen many people, both before and after the election, suggest that these models are not scientific. I disagree. What makes something a science — hell, what makes it good or bad science — is not its ability to uncover truths or predict outcomes. What makes it a science (and the difference between good and bad science) is the method used.
Now all of this said, there are good reasons that we should not bet on these models for presidential races. The biggest reason is that we simply do not have enough information.
Models are built using information about what has happened in the past. That is true for everything from baseball to weather. And we simply do not have enough real world data about presidential races because:
- The outcome of a presidential race is essentially dichotomous, at least for the one measure most people are interested in: who wins.
- We only have one trial/case every four years.
- The farther back in time we go, the less input information (e.g., polls and economic indicators) is available. While some of the gaps can be approximated using resampling techniques such as bootstrapping (a rough sketch follows this list), the less raw data we have, the less reliable the model will be.
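As a rough illustration of the kind of resampling I mean, here is a generic bootstrap sketch in Python; the poll margins are made up, and this is not a claim about how any particular election model handles sparse history:

```python
import random

# A tiny, invented sample of poll margins (Clinton minus Trump, in points).
margins = [3.0, 1.0, -2.0, 4.0, 0.5, 2.5]

# Bootstrap: resample with replacement many times to estimate how much the
# average margin could plausibly vary given such a small sample.
boot_means = []
for _ in range(10_000):
    resample = [random.choice(margins) for _ in margins]
    boot_means.append(sum(resample) / len(resample))

boot_means.sort()
low, high = boot_means[250], boot_means[9750]  # rough 95% interval
mean = sum(margins) / len(margins)
print(f"Mean margin {mean:.1f}, rough 95% interval ({low:.1f}, {high:.1f})")
```

The wide interval you get from so few data points is really the point: resampling can quantify how shaky the inputs are, but it cannot conjure information that was never collected.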
These models will get better as time goes on, but with a race only every four years, refining them will be a very, very slow process. And even then there will always be times when the predicted underdog wins. Always. Because even a 1% chance is a chance.
But the absolute most important reason these models didn’t tell us that Trump would win? The polls were wrong. You’ve heard “Garbage In, Garbage Out”? That’s what happened here. Most of the polls were just plain wrong (and that’s a whole other topic of discussion I won’t get into here). If the numbers you put into a good model are not accurate, the output is going to be inaccurate. Silver accounted for some of the reliability of the polls, but only in relation to each other, and very few appear to have been accurate.
Finally, even the best models will always have a hard time accounting for the complexity of human behavior. As a researcher, I often felt the futility of it all; as soon as we’ve figured something out, it changes. We do keep trying and we do make progress, but I think there will always be surprises. Humans will always be at least a little unpredictable.
This election was unprecedented in a myriad of ways. Very little about it was “normal”. It shouldn’t be surprising that the outcome wasn’t within the expected range.
I said it before the election and I will say it again now: I have more faith in Nate Silver’s model than in any other. The underlying approach and math have proven themselves (no pun intended) in other applications where there is a lot more data (e.g., baseball). But no model can overcome the problem of too little data.
One more thing before I go. You might notice that I have made no statement about the value of these predictions. I’ve heard and read a lot of comments, both before and after the election, suggesting that sites like 538 provide only useless information. On the one hand, I agree. I think that we love these sites because human beings just love information. It makes us feel more in control if we can predict outcomes, even if we are helpless to change them. On the other hand, they seem to be useful in sports and other applications, so I have to believe that they are useful in politics somehow. I don’t know, though; I’m not a politician. And thank goodness for that.
A final aside: a lot of people don’t seem to distinguish between the pollsters and the meta-pollsters. As far as I’m aware, Silver doesn’t do any polling of his own; he aggregates other people’s polls. I’m also curious whether the combination of obsessive poll watching and a long early-voting period worked to suppress turnout, if people read, as Grim wrote, that “She’s got this.”