Boosting Groundhogs

It’s that time of year again! Everyone’s favorite holiday, Groundhog Day! It’s the day when stats and probability nerds and the media unite to remind everyone that, well, actually, poor Punxsutawney Phil is usually wrong. Luckily for us, we know about the innards of AdaBoost. If we can rely on Phil to be a weak classifier (insert hand-waving around the actual number of samples and whether Phil’s performance is statistically different from random), then we can put him in our ensemble.

At the very least, it’s an excuse to look at one of my favorite, subtle little equations in machine learning:

\theta_m = \frac{1}{2}\ln\frac{1 - \epsilon_m}{\epsilon_m}

Whoa, I can’t read that. Let’s render and embiggen:

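Epsilon, for the m-th classifier, is that model’s error rate, and theta is the weight its vote gets in the ensemble. Our m right now is Phil. Since Phil’s error rate is around 0.6, the ratio inside the log is 0.4/0.6, which is less than 1, so his weight comes out slightly negative (about -0.2). This means that when he goes into our ensemble, a prediction of an early spring from him becomes a little drop in the “early spring seems less likely” bucket.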

Why yes, clever reader, that is a Google calculator plot.
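
If you’d rather check the arithmetic than squint at a plot, here is a minimal Python sketch of the weighting formula. The function name `classifier_weight` is my own invention, and the 0.6 is the same rough, hand-wavy error rate for Phil used above, not a measured one.

```python
import math


def classifier_weight(error_rate: float) -> float:
    """Weight from the formula above: 0.5 * ln((1 - eps) / eps)."""
    # The formula blows up at 0 and 1, so only accept rates strictly in between.
    if not 0.0 < error_rate < 1.0:
        raise ValueError("error_rate must be strictly between 0 and 1")
    return 0.5 * math.log((1.0 - error_rate) / error_rate)


# Print the weight for a handful of error rates, Phil's rough 0.6 among them.
for eps in (0.1, 0.4, 0.5, 0.6, 0.9):
    print(f"error rate {eps:.1f} -> weight {classifier_weight(eps):+.3f}")
```

A coin-flipper at 0.5 gets a weight of exactly zero, anything better than a coin gets a positive weight, and Phil at 0.6 lands at roughly -0.2: small, but pointing the other way.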

To see the end result, we’d need a real sum of weighted model predictions, but you can see that this effectively flips the sign of Phil’s vote. So if Phil says “yes, 100% spring!” then the AdaBoost weighting formula will use the historical error rate and say: “OK, Phil is wrong slightly more often than not, so we’ll put a small ‘no’ in our pile of likely outcomes to match his confident ‘yes.’” Given a sufficient mass of weak (and uncorrelated) learners like Phil, we can actually get a fairly good model out, much like it can be an OK investment strategy to have a tiny fraction of your overall holdings in gold, even though a lump of gold will never invent a jetpack. This is the magic of ensemble learning.
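
For the curious, here is roughly what that sum of weighted predictions could look like in Python. This is a sketch under loud assumptions: the two non-Phil ensemble members, their error rates, and their predictions are all made up for illustration, and `ensemble_score` is a hypothetical helper, not anything from a real library. Predictions are encoded as +1 for “early spring” and -1 for “more winter.”

```python
import math


def classifier_weight(error_rate: float) -> float:
    """Weight from the formula above: 0.5 * ln((1 - eps) / eps)."""
    return 0.5 * math.log((1.0 - error_rate) / error_rate)


def ensemble_score(members):
    """Sum each member's prediction (+1 or -1) scaled by its weight.

    `members` is a list of (name, error_rate, prediction) tuples.
    A positive score leans "early spring", a negative one "more winter".
    """
    return sum(classifier_weight(err) * pred for _name, err, pred in members)


# Hypothetical ensemble: only Phil's rough 0.6 comes from the post;
# the other two entries are invented purely for illustration.
members = [
    ("Phil", 0.6, +1),                     # confident "yes, spring!"
    ("hypothetical almanac", 0.45, -1),    # made-up weak learner
    ("hypothetical forecast", 0.35, -1),   # made-up weak learner
]

score = ensemble_score(members)
print("early spring" if score > 0 else "more winter", f"(score {score:+.3f})")

# Phil on his own: his "yes" shows up as a small negative contribution.
print(f"Phil alone: {ensemble_score(members[:1]):+.3f}")
```

The thing to notice is the last line: Phil votes +1, but his contribution to the sum is negative, which is the sign flip described above.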

If you want to read more about AdaBoost, the Wikipedia article is preferred by n out of m groundhogs.

Happy Groundhog Day!