Friday, October 31, 2008

Bouncy Bouncy

As a physicist, I can't resist the analytical approach of the work of baseball analyst Nate Silver at fivethirtyeight.com. Below is a graph of data from his projections from mid July through Thursday, October 30, five days before the vote. Although this might pass as his "prediction" of the outcome, my interest today is in the dynamics of this function. We can all look at the last few days of polling and the final - actual - poll about a week from now.


The curves indicate the projected electoral vote count from his analysis (more on that below the fold) with a few key dates flagged on the graph. The red arrow near the middle is the day Sarah Palin was announced as McCain's VP choice. The green brackets on either side span the Democratic and Republican conventions, respectively, while the green band with red lines indicates when the four debates took place. The VP debate (2nd of the four) is shown with a longer line.

You can clearly see the "bounce" from some of those events, but it seems to be delayed by a week or two. I don't know how much of this is the time it takes to execute a poll, how much is due to the damping factors Nate has in his model, and how much is due to the time it takes human beings to process information.

By the way, the now-infamous Katie Couric interviews started airing on September 24, just a few days before the first Presidential debate, so their effect really can't be separated from that of those other events.

What interests me is that large delay between what have been identified so far as 'critical' events and the response of the dynamical system. An engineer would probably say there is a lot of 'lash' in the system. If so, is there really any justification for all of the money spent on insta-polling right after a debate or a convention? Doesn't look like it to me.

The "bounce" that Obama got from his convention started before the convention (all that lead-up talk?) and continued on for at least two weeks, right through the Republican convention and beyond. The short-lived bounce that McCain got from his VP pick and his convention (one week going up and then one week going down) was also delayed by at least one and maybe two weeks. (For reference, the tick marks on my X axis are 14 days apart.)

You can certainly see that the electorate has settled in after the last debate, with only small statistical fluctuations over the last month. Of course, these are people, not a mechanical system, so only time will tell what they will actually do and what changes might occur between now and election day. However, it is also true that a lot of those people have already voted.

Nate Silver's analysis

Read his FAQ for details.

I've been following Nate's work with great interest. His Monte Carlo technique for simulating the results of an election is common in the world of physics, where it is used to simulate physical systems ranging from quark-gluon interactions to atoms in a lattice (such as a silicon chip) to the flow of radiation during the explosion of a nuclear weapon. His maps (see below for a sample) are based on the analysis of individual states, but his overall prediction of the electoral vote count summarized above comes from a simulation. He produces 10,000 "elections" by a Monte Carlo process, picking a possible result for each state based on the odds that it will vote for a particular candidate (as determined by his analysis of all polls in that state, their interaction with national polls, and a projection toward election day). This incorporates the inherent "fuzziness" of the poll numbers, each of which comes with a substantial sampling uncertainty.

This is a very powerful technique, since it replaces individual interpretation of those uncertainties on a state-by-state basis with the blind sledgehammer of massive statistical sampling. This method has proved extremely effective in experimental physics, particularly experimental high energy physics, for predicting what reactions will look like in a new detector and determining if the needle of a new particle can be found within the haystack of normal events.
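For the curious, here is a minimal sketch of the kind of Monte Carlo procedure described above. It is not Nate's code. The state names, electoral vote blocs, and win probabilities are all made up for illustration, and each state is drawn independently, which is a simplifying assumption (his model ties the states to national polls, so his draws are presumably correlated).

```python
import random

# Hypothetical inputs, invented for illustration only:
# name -> (electoral votes, probability that the Democrat carries it).
# The two "safe" blocs lump together states that are not in doubt.
STATES = {
    "Safe D bloc": (220, 1.00),
    "Safe R bloc": (186, 0.00),
    "Swing A": (27, 0.65),
    "Swing B": (21, 0.90),
    "Swing C": (20, 0.75),
    "Swing D": (15, 0.55),
    "Swing E": (13, 0.85),
    "Swing F": (11, 0.45),
    "Swing G": (11, 0.50),
    "Swing H": (9, 0.90),
    "Swing I": (5, 0.80),
}


def simulate_elections(states, n_trials=10_000, seed=0):
    """Return the Democrat's electoral vote total in each simulated election."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    totals = []
    for _ in range(n_trials):
        # One "election": flip a weighted coin for every state
        # (independent draws are an assumption of this sketch).
        ev = sum(votes for votes, p_dem in states.values()
                 if rng.random() < p_dem)
        totals.append(ev)
    return totals


totals = simulate_elections(STATES)
mean_ev = sum(totals) / len(totals)
# 270 of the 538 electoral votes are needed to win.
win_fraction = sum(1 for ev in totals if ev >= 270) / len(totals)
print(f"mean electoral votes: {mean_ev:.1f}")
print(f"fraction of simulated elections won: {win_fraction:.3f}")
```

The mean of the 10,000 totals plays the role of the projected electoral vote count plotted above, and the fraction of trials reaching 270 is the corresponding win probability.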

The one caveat, of course, is GIGO. Nate is not a pollster; he is an analyst of polls. His results are only as good as the polls themselves, despite his attempts to have his model learn about and account for the individual characteristics of those polls. But how can he (or even a pollster) account for my decision this evening to blow off someone from the U of Iowa polling about my attitudes about the election because I didn't feel like giving them a quarter hour of my time? Must drive them nuts.

Techy detail update:

My quick guess is that the half life in his model is responsible for some of the 'lash' in the response of his model to the driving force of various campaign effects, but that the rest is due to processing time by the electorate. Even with the trend-line adjustment of a given pollster's data, his model requires some time to reflect sudden changes in the views of voters. Although this might hide last-minute changes, all of the raw poll data are shown for any given state along with his model's analysis of those data, so you can draw your own conclusions about, say, Pennsylvania.
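To put a rough number on that guess, here is a toy calculation. I am assuming the half life acts as an exponential down-weighting of older polls (a poll that is one half life old counts half as much as one taken today); that is my reading, not a description of his actual code. The 14-day default and the made-up daily "polls" are only for illustration.

```python
def half_life_weight(age_days, half_life_days=14.0):
    """Weight of a poll that is age_days old (assumed exponential decay)."""
    return 0.5 ** (age_days / half_life_days)


def weighted_average(polls, today, half_life_days=14.0):
    """Half-life-weighted mean of (day, value) polls taken on or before today."""
    num = 0.0
    den = 0.0
    for day, value in polls:
        if day > today:
            continue  # a poll from the future cannot be counted yet
        w = half_life_weight(today - day, half_life_days)
        num += w * value
        den += w
    return num / den


# Made-up data: one poll per day, with opinion jumping from 50 to 54 on day 30.
polls = [(day, 50.0 if day < 30 else 54.0) for day in range(0, 60)]

for today in (29, 30, 37, 44, 58):
    avg = weighted_average(polls, today)
    print(f"day {today}: weighted average = {avg:.2f}")
```

In this toy model the weighted average is still well short of the new value two weeks after the jump, so a half life of that size would smear a sudden change over a few weeks. Whatever extra delay the graph shows beyond that would have to come from the voters themselves.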

Finally, for reference, the projected electoral map from Thursday is shown below.


This projection, and all of the data in the graph at the top, comes before his final fine-tuning of the model to include a 2-week half life.
