After-election exchange with John Shumaker

Date: Wed, 03 Nov 2004 12:19:43 -0600

From: John Shumaker

Subject: Re: Lies, Damn Lies, and Statistics

Hi Sam,

Per your request, I am writing back to you on Wednesday. I am attempting to provide you with constructive criticism: your political bias has indeed (100% probability) contaminated your otherwise solid scientific judgment and objective analysis. Did you see the movie Scarface? If so, recall the scene when Al Pacino shot dead the terrorist who was about to blow up a car with 2 children in it ("Look at you now, you stupid #$%&!"). That is worse than having "egg on your face". Very appropriate, I think. :-)

Going back two weeks, your web site was probably the most convincing and objective site on the proper way to analyze the Electoral College and presidential elections. Then, all of a sudden during the last two weeks, your HOPE and WISHES for a Kerry victory clouded your otherwise solid/scientific/unbiased analysis. Example:

Tuesday, November 2, 3:00PM: Early exit polls on Drudge (above the title). If you plot them on my brochure graph, their median is around +5% bias. This may regress as Republicans get to the polls, but I think this is a telling sign. Not sure if I will be blogging the returns - just a heads-up for you. I wonder if my prediction was too cautious! That's what I get for paying too much attention to my email.

You can almost see all your emotion coming out here, trying to "will" Kerry on to victory via any statistical method possible from your bag of tricks. Add in your +2% for Kerry via the incumbent OPINION (not fact) and your +2% for Kerry via the turnout OPINION (not fact). It is clear your analysis was in fact contaminated/biased. Another point worth mentioning is your hedging from a 99.9% Kerry win probability (Monday) down to a 98% win probability (yesterday), despite almost every political expert predicting a very close election. This has put you in the same category as another left-wing liberal analyst, TruthIsAll (http://www.geocities.com/electionmodel/index.htm), who also predicted a Kerry victory with a 99.8% probability.

Professors at major Ivy League colleges like Princeton are held (perhaps unfairly) to a standard of utmost integrity and OBJECTIVITY in making assumptions, performing analysis, and presenting their results to the community. The (obviously unfair) perception from the conservative right is the following: why should I continue to support Princeton with financial contributions when their extremely biased left-wing professors continue to teach their liberal political agenda (via their research and curriculum) to young (and easily impressionable) college students?

However, your bias was only documented during the last 2 weeks. I for one know that for several weeks before 10/15, your assumptions, analysis, and presentation were in fact completely objective. In my opinion, your web site was the best one on the internet for predicting the 2004 presidential election (up until about 10/15).

Finally, I have three suggestions for you to improve your objectivity for the 2008 presidential election:

1. When attempting to analyze assumptions that are based on OPINIONS, not fact (i.e., the +2% incumbent rule and the +2% turnout rule), document those assumptions, analysis, and results on a SEPARATE web page, clearly separate and distinct from your main web page.

2. Have a small refresher lesson on basic statistics (again, on a separate web page with a link from your main web page). This lesson should explain the standard normal bell-shaped curve, the area under that bell curve, Z scores/metrics, the standard error of the mean, etc. The bottom-line objective is that your users will then be able to convert estimated poll results (and MoE) to individual state probabilities. For example, this lesson would explain in detail the math behind how you computed the probability that Kerry would win Ohio at 30% (or 83% with your assumptions added in).

3. Allow each user to customize their own assumptions and poll estimates and enter their own data into the MATLAB program interactively and online. I am not sure if you have enough computer CPU and/or memory resources for this flexibility. Alternatively, you could develop a program that can be downloaded from your site, which would allow users to analyze their assumptions offline.
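The conversion John asks for in point 2 is standard: treat the polling margin as a normal variable whose standard error is the margin of error divided by 1.96, then take the normal CDF. A minimal sketch in Python (not Sam's actual MATLAB code); the input numbers below are invented for illustration and are not the real 2004 Ohio figures:

```python
from math import erf, sqrt

def win_probability(margin_pct, se_pct):
    """Probability that a candidate leading by margin_pct points wins,
    modeling the true margin as normal with standard error se_pct."""
    z = margin_pct / se_pct
    # Normal CDF via the error function: Phi(z) = (1 + erf(z/sqrt(2))) / 2
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Hypothetical example: a 1.3-point deficit with a 2.5-point standard
# error (roughly a 4.9-point MoE) gives about a 30% win probability.
print(round(win_probability(-1.3, 2.5), 2))
```

A tied race (zero margin) comes out at exactly 50%, which is a quick sanity check on the formula.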

Thanks for listening to my 2 cents,

John

Date: Wed, 3 Nov 2004 14:53:12 -0500 (EST)

From: Sam Wang

To: John Shumaker

Subject: Re: Lies, Damn Lies, and Statistics

Dear John,

Thanks for the long and thoughtful note. I like your quantitative/technical suggestions, but right now I'd like to address your other sentiments.

I agree with you that my analysis was purely numbers-based until the end. I am still sure about the overall validity of the undecideds/incumbent rule. This year seems to have been an exception, but I did point out that there was uncertainty in it in the first place - witness the +/-2% that I placed on it. The >99% figures I cited were nominal, and I didn't explain them very clearly - a failure as a teacher. Of course, my turnout estimate was also wrong.

Your paragraph arguing that professors must be objective is very thought-provoking. As a lab scientist I am continually testing ideas, some of them cherished. At the same time I have to be prepared for when I am wrong, and I am often wrong. Normally these errors are corrected by factual evidence, at which point I revise my thinking. In this case I am in a public sphere, and my error is visible to many readers, including you.

However, I need to point out something else. The reason that you, an astute reader, could detect this bias is that I documented everything. I provided data, code, explanations, and drew a bright line between data and assumptions. You did not find that on other sites, which is presumably why you read my site. In my view, total openness is an essential part of good critical reasoning.

In my work and with students here, I am similarly open. I have found that the most thoughtful ones respond as you have - if a problem exists, they call me on it. This is what you are doing. If you have doubts about elite universities, keep in mind that what you are asking for involves a dispassion that I don't have about anything, including this problem.

In short, I admit to personal bias, and admit that my hope induced me to make a false assumption. However, that bias stemmed from interest in what would otherwise be a fairly dry estimation problem. I think nearly all your sources of information have some bias or another. Unlike many of those other sources, I put my biases where you can see them.

Yours sincerely,

Sam Wang

P.S. Attached is a note I just sent to a reporter's follow-up query. Of possible interest.

>>>>>>>>>>>>>>>

Date: Wed, 3 Nov 2004 14:13:56 -0500 (EST)

From: Samuel Wang

Subject: Re: mop-up

Charles, in case we do not connect:

I think the purely statistical aspects of the analysis did extremely well. The electoral outcome looks like it will be close to the decided-voter outcome predicted by the polls. Victory margins are close to the pre-election polls: out of 23 battlegrounds, the direction of the outcome was predicted correctly in 22 (the exception was Wisconsin, where the polling margin was 0.4% for Bush and the actual margin was about 0.4% for Kerry). Quantitatively, 12 victory margins were within one standard error and 17 were within the 95% confidence interval. Not perfect, but not bad.
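A scorecard like the one above can be tallied mechanically: for each state, compare the predicted and actual margins against one standard error and against the full MoE (about 1.96 standard errors, i.e. the 95% interval). A minimal sketch with made-up state numbers, not the actual 2004 battleground data:

```python
# Hypothetical data: (state, predicted_margin, actual_margin, moe) in points.
# Positive margins favor one candidate, negative the other.
states = [
    ("A",  2.0, 1.5, 3.0),
    ("B", -0.4, 0.4, 2.5),   # direction miss, like Wisconsin in the text
    ("C",  5.0, 6.5, 4.0),
]

direction_ok = 0
within_1se = 0
within_ci = 0
for name, pred, actual, moe in states:
    se = moe / 1.96              # MoE is roughly 1.96 standard errors
    err = abs(actual - pred)
    direction_ok += (pred > 0) == (actual > 0)
    within_1se += err <= se
    within_ci += err <= moe      # 95% confidence interval is +/- MoE

print(direction_ok, within_1se, within_ci)
```

With these invented numbers the tally is 2 correct directions out of 3, with all three margins inside the 95% interval.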

The most significant errors had to do with the net effect of other factors not encompassed by polls. To make a final prediction, I used previous patterns of uncommitted voters breaking for the challenger as a guide, but this break either did not occur or was cancelled by other factors. My assumption of high turnout was flat-out wrong! In the end, the likely-voter models of pollsters were not too far off.

There has been talk of other factors, but a parsimonious explanation may be that the net effect of all other factors was zero. This isn't always true - in past years the outcome seems not to have matched the final polls. There seems to be some mystery offset that varies a bit. On the other hand, this year we had more data - maybe it's just a question of having enough data, and the right answer falls out.

One advantage of rigorous statistical modeling is that you can see a clear separation between factual information and assumptions of less certainty. In this case my baseline calculation was quite accurate, but the intangibles were wrong. As I said, in previous years at least one of the assumptions would have worked. What happened this year is a question for the political and policy people - in the end it goes to show that I am at my best with the numbers!

All the best,

Sam

> Sam,

>

> Left you a cell message to this effect. Am doing a brief modeling mop-up

> item. Would love to chat if you are awake. (I'm barely conscious...)

> Best sometime before 4 p.m....

>

> Thanks kindly.