Fizz Buzz in TensorFlow: Remix

Joel Grus posted a pretty funny bit of satire yesterday, using TensorFlow to solve the classic Fizz Buzz interview problem.

Now, the point of the exercise is obviously not to build an accurate model. But he did end the post with this:

I guess maybe I should have used a deeper network.

Which got me thinking. So naturally I cloned the repo and tried to push the model’s accuracy as high as possible! In the end, I came up with a learning strategy that, when trained only on numbers outside the 1 to 100 range, makes out-of-fold (test) predictions on 1 to 100 with accuracy in the high nineties, and in some runs 100%:

['1' '2' 'fizz' '4' 'buzz' 'fizz' '7' '8' 'fizz' 'buzz' '11' 'fizz' '13'
 '14' 'fizzbuzz' '16' '17' 'fizz' '19' 'buzz' 'fizz' '22' '23' 'fizz'
 'buzz' '26' 'fizz' '28' '29' 'fizzbuzz' '31' '32' 'fizz' '34' 'buzz'
 'fizz' '37' '38' 'fizz' 'buzz' '41' 'fizz' '43' '44' 'fizzbuzz' '46' '47'
 'fizz' '49' 'buzz' 'fizz' '52' '53' 'fizz' 'buzz' '56' 'fizz' '58' '59'
 'fizzbuzz' '61' '62' 'fizz' '64' 'buzz' 'fizz' '67' '68' 'fizz' 'buzz'
 '71' 'fizz' '73' '74' 'fizzbuzz' '76' '77' 'fizz' '79' 'buzz' 'fizz' '82'
 '83' 'fizz' 'buzz' '86' 'fizz' '88' '89' 'fizzbuzz' '91' '92' 'fizz' '94'
 'buzz' 'fizz' '97' '98' 'fizz' 'buzz']
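
For anyone who wants to see what "trained on numbers other than 1 to 100" looks like in practice, here is a minimal sketch of the data setup in plain Python. The 10-bit binary input encoding and the 101 to 1023 training range are assumptions made for illustration, and the helper names (binary_encode, fizz_buzz_class, decode, accuracy) are hypothetical rather than taken from the fork.

```python
NUM_DIGITS = 10  # assumption: 10-bit inputs, enough for numbers below 1024


def binary_encode(i, num_digits=NUM_DIGITS):
    """Return the bits of i as a list, least significant bit first."""
    return [(i >> d) & 1 for d in range(num_digits)]


def fizz_buzz_class(i):
    """Class index: 0 = the number itself, 1 = fizz, 2 = buzz, 3 = fizzbuzz."""
    if i % 15 == 0:
        return 3
    if i % 5 == 0:
        return 2
    if i % 3 == 0:
        return 1
    return 0


def decode(i, class_index):
    """Turn a predicted class index back into the printed token."""
    return [str(i), "fizz", "buzz", "fizzbuzz"][class_index]


# Held-out test range: 1..100. Training range: every other 10-bit number
# from 101 upward, so the model never sees a test number during training.
test_numbers = list(range(1, 101))
train_numbers = list(range(101, 2 ** NUM_DIGITS))

train_x = [binary_encode(i) for i in train_numbers]
train_y = [fizz_buzz_class(i) for i in train_numbers]


def accuracy(predicted_classes, numbers):
    """Fraction of numbers whose predicted class matches the true class."""
    correct = sum(p == fizz_buzz_class(n) for p, n in zip(predicted_classes, numbers))
    return correct / len(numbers)
```

A trained model would supply predicted_classes for test_numbers; mapping each prediction through decode gives a list like the one printed above.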

So what did I change? Well, you can have a look at the full diff if you’d like, but here’s a short summary:

  • I noticed that the model spent a lot of time stuck in local minima, so I used RMSProp with momentum.

  • I noticed that even as it converged on a better solution, it jerked around quite a bit. Again, RMSProp to the rescue, this time with learning rate decay.

  • I added dropout and shrank the minibatch size further to curb overfitting.

  • I added a hidden layer and more neurons. Why? In short:

The more neurons, the more better!
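
To make the first two changes a bit more concrete, here is a rough sketch of an RMSProp update with momentum, plus an exponentially decaying learning rate, written in plain Python on a toy one-parameter problem. It follows the general form of the update rule rather than the fork's actual code; the function names, the hyperparameter values and the toy loss are all assumptions chosen for illustration.

```python
import math


def exponential_decay(initial_lr, step, decay_steps, decay_rate):
    """Learning rate that shrinks by decay_rate every decay_steps steps."""
    return initial_lr * decay_rate ** (step / decay_steps)


def rmsprop_momentum_step(w, grad, state, lr, decay=0.9, momentum=0.9, eps=1e-10):
    """One RMSProp-with-momentum update for a single scalar parameter."""
    # Running average of squared gradients; it rescales each step.
    state["ms"] = decay * state["ms"] + (1 - decay) * grad ** 2
    # Momentum accumulates the rescaled steps, which helps carry the
    # parameter through flat or bumpy regions of the loss.
    state["mom"] = momentum * state["mom"] + lr * grad / math.sqrt(state["ms"] + eps)
    return w - state["mom"]


# Toy problem (an assumption, not the Fizz Buzz loss): minimise (w - 3)^2.
w = 0.0
state = {"ms": 0.0, "mom": 0.0}
for step in range(2000):
    grad = 2 * (w - 3.0)
    # Halve the learning rate every 100 steps so late updates stop jerking around.
    lr = exponential_decay(0.05, step, decay_steps=100, decay_rate=0.5)
    w = rmsprop_momentum_step(w, grad, state, lr)

print(w)  # ends close to 3.0
```

The decay schedule matters here: RMSProp rescales gradients, so with a fixed learning rate the parameter keeps bouncing around the minimum, and shrinking the rate over time is what lets it settle.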

In the end, I only needed 125 epochs to get a “perfect” solution. If you’re interested, my little fork is available here.