Tuesday, November 18, 2008

Algorithmic Simplicity + Data > Algorithmic Complexity

Decades ago (during the era of Rodney Brooks, Takeo Kanade, and other such great computational thinkers) computer vision researchers were manually designing complex AI programs for image analysis. Back then, if the algorithm was able to work on a single real image it was publishable. The parmaters of some of these complicated models were often tuned by hand -- and that was okay -- there simply wasn't enough image data to fit these models from examples.

We are now living in a Machine Learning generation where hand tweaked parameters are looked down upon and if you want to publish an object recognition paper you'll need to test your algorithm on a standard dataset containing hundreds of images spanning many different types of objects. There is still a lot of excitement about Machine Learning in the air and new approaches are constantly being introduced as the new 'state-of-the-art' on canonical datasets. The problem with this mentality is that researchers are introducing a lot of complicated machinery and it is often unclear whether these new techniques will stand the test of time.

Peter Norvig -- now at Google -- advocates an alternative view. Rather than designing more advanced machine to work with a measly 20,000 or so training images for an object recognition task -- we shouldn't be too eager to make conclusions when dealing with such paltry training sets. In a recent Norvig video lecture I watched he showed some interesting results where the algorithms that obtained the best performance on a small dataset no longer did the best when the size of the training set was increased by an order of magnitude. In some cases, when fixing the test set, the simplest algorithms provided with an order of magnitude more training data outperformed the most advanced 'state-of-the-art.' Also, the mediocre algorithms in the small training size regime often outperformed their more complicated counterparts once more data was utilized.

The next generation of researchers will inevitably be using much more training data than we are at the moment, so if we want our scientific contributions to pass the test of time, we have to focus on designing simple yet principled algorithms. Focus on simplicity. Consider a particular recognition task, namely car recognition. Without any training data we are back in the 1960/1970s generation where we have to hard-code rules about what it means to be a car in order for an algorithm to work on a novel image. With a small amount of labeled training data, we can now learn the parameters of a general parts-based car detector -- we can even learn the appearance of such parts. But what can we do with millions of images of cars? Do we even need much more than a large scale nearest neighbor lookup?

As Rodney Brooks once said, "The world is its own best representation," and perhaps we should follow Google's mentaly and simply equip our ideas with more, more, more training data.


  1. Anonymous10:01 AM

    Hi Thomasz,

    nice article pointing out that our 20k or so data sets might not be a good representation of reality.

    How do you think we should handle this issue? Using 200k pictures would mean, I could only evaluate 3 feature combinations in the time I could evaluate 30 with 20k.


  2. With a large dataset the similarity measure (or feature combination) becomes less and less important. For small datasets the learning method and feature representation are of utmost importance; however, there is a danger of overfitting. Nonparametric approaches can outperform their parametric equivalents quite often and I think a very promising research direction is combining learning with data-driven approaches to reduce test time complexity.

    We should always compare the cost of developing and training a new method designed to work on small training sets versus being nonparametric and gathering more data.