Saturday, January 26, 2013

Michael Barone - a biased estimator (in a statistical sense)

A statistical estimator is a rule or formula that you apply to a sample of data in order to infer the properties of the whole population from which the sample was drawn.  For example, imagine I have a huge box filled with an infinite number of red and black ping pong balls.  I want to know what percentage of all the balls in the box are red.  One estimator might be: I draw $N$ balls out of the box, count the number of these that are red, and divide that number by $N$.  In this case $\frac{NumberOfRed}{N}$ would be my estimator of the percentage of all the balls in the box that are red.  A second estimator of the percentage of the balls in the box that are red might be to draw $N$ balls out of the box, count the number that are red, subtract 2, and then divide by $N$.  So that estimator would be $\frac{NumberOfRed-2}{N}$.  Perhaps the latter is not a good estimator, but it is still an estimator.

Typically the quality of an estimator is judged by two criteria: (1) consistency and (2) bias.  An estimator is said to be consistent if, as the size of my sample $N$ increases, the estimator converges to the true population value.  So if $N=1$ my first estimator $\frac{NumberOfRed}{N}$ may be very inaccurate.  With $N=30$ my first estimator should be fairly representative of the percentage of red balls in the box.  With $N=1000$ my first estimator should be extremely close to the true percentage of red balls in the box.  Interestingly, my second estimator $\frac{NumberOfRed-2}{N}$ is also consistent, in the sense that as $N$ increases we expect it to do a better and better job of inferring the true percentage of red balls in the box.  Now the second estimator will on average infer slightly low (because we subtracted 2), but it will still get more and more accurate as $N$ increases.
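As a rough illustration of consistency, here is a minimal Python sketch that simulates drawing from the box and averages each estimator over many repeated samples.  The function names, the true red fraction of 0.4, the number of trials, and the fixed random seed are all my own assumptions for the sake of the example; nothing here comes from real data.

```python
import random

def draw_red_count(n, p_red, rng):
    """Draw n balls; each one is red with probability p_red. Return how many were red."""
    return sum(1 for _ in range(n) if rng.random() < p_red)

def estimator_one(red, n):
    """First estimator: NumberOfRed / N."""
    return red / n

def estimator_two(red, n):
    """Second estimator: (NumberOfRed - 2) / N."""
    return (red - 2) / n

def average_estimates(n, p_red, trials, rng):
    """Average both estimators over repeated samples of size n."""
    total_one = 0.0
    total_two = 0.0
    for _ in range(trials):
        red = draw_red_count(n, p_red, rng)
        total_one += estimator_one(red, n)
        total_two += estimator_two(red, n)
    return total_one / trials, total_two / trials

if __name__ == "__main__":
    rng = random.Random(0)   # fixed seed so runs are repeatable (assumption)
    p_red = 0.4              # hypothetical true fraction of red balls
    for n in (1, 30, 1000):
        avg_one, avg_two = average_estimates(n, p_red, 500, rng)
        print(f"N={n:5d}  first={avg_one:.4f}  second={avg_two:.4f}")
```

Both averages should drift toward the assumed true fraction as $N$ grows, with the second always sitting $\frac{2}{N}$ below the first.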

Bias refers to whether an estimator systematically mis-infers.  In the example above the first estimator $\frac{NumberOfRed}{N}$ is unbiased, in the sense that we do not expect it to over- or underestimate the percentage of balls that are red.  Now it is possible that the first estimator produces a poor result (say we set $N=1$ and our first draw is black), but the first estimator has no tendency to systematically over- or underestimate the percentage of red balls in the box.  In contrast, we would expect the second estimator $\frac{NumberOfRed-2}{N}$ to slightly underestimate the percentage of balls in the box that are red (because we subtracted 2 in the numerator).  Now as $N$ gets large this underestimation will have less and less impact, but for any $N$ we still expect to slightly underestimate the percentage of red balls in the box.
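To make the size of that underestimate concrete, here is a short worked calculation.  The symbol $p$ for the true fraction of red balls in the box is my own notation, not something defined above.  Each draw is red with probability $p$, so the expected number of red balls in $N$ draws is $Np$, and:

$$E\left[\frac{NumberOfRed}{N}\right] = \frac{Np}{N} = p$$

$$E\left[\frac{NumberOfRed-2}{N}\right] = \frac{Np-2}{N} = p - \frac{2}{N}$$

So the first estimator is right on average for every $N$, while the second is low on average by exactly $\frac{2}{N}$.  That gap is never zero for any finite $N$ (biased), but it shrinks to zero as $N$ grows (consistent).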

It is possible to have two different estimators, one of which converges faster (i.e. it gets close to the true value with a smaller sample) but is biased, whereas the other is unbiased but converges very slowly.  Choosing which estimator is better under which conditions is a large part of what statistics is about.

Which brings us to this.

Washington Examiner:  Barone: Going out on a limb: Romney beats Obama, handily November 2, 2012

I always thought that Michael Barone was a bit of a Republican shill but for some reason he gets respect among some serious people.  He is the editor of the annual Almanac of American Politics so you would think that he has some reputation to lose by making extremely bad predictions.  Here are his November 2nd predictions followed by the actual results.
  • Indiana - Romney.  Romney 54-44.
  • Florida - Romney.  Obama 50-49.
  • Ohio - Romney.  Obama 50-47.
  • Virginia - Romney.  Obama 51-47.
  • Colorado - Romney.  Obama 51-46.
  • Iowa - Romney.  Obama 52-46.
  • Minnesota - Obama.  Obama 53-45.
  • New Hampshire - Romney.  Obama 52-46.
  • Pennsylvania - Romney.  Obama 52-46.
  • Nevada - Obama.  Obama 52-46.
  • Wisconsin - Romney.  Obama 53-46.
  • Oregon / New Mexico / New Jersey - Obama.  Obama 54-42 / 53-43 / 58-41.
  • Michigan - Obama.  Obama 54-45.
So it looks like the final result had to be roughly Obama 53-46 before Barone would call a state for Obama.  Next election, if Barone makes percentage predictions, subtract 3-4 points from the Republican and add them to the Democrat.  Which leads me to ponder...  If Michael Barone had an infinitely large sample, would he still have predicted a win for Romney?
