
Sunday, March 29, 2009

My 2nd Summer Internship in Google's Computer Vision Research Group

This summer I will be returning to Google's Computer Vision Research Group in Mountain View, CA for my 2nd summer internship. My first real internship ever was last summer at Google -- and I loved it.

There are many reasons for going back for the summer. Being in the research group and getting to tackle the same kinds of vision/recognition problems as in my PhD is very important to me. It is not just a typical software engineering internship -- I get a better overall picture of how object recognition research can impact the world at a large scale, the Google scale, before I finish my PhD and become set in my ways. Being in an environment where one can develop something super cool and weeks later millions of people see a difference in how they interact with the internet (via Google's services, of course) is also super exciting. Finally, the computing infrastructure Google has set up for its researchers/engineers is unrivaled when it comes to large-scale machine learning.

Many Google researchers (such as Fernando Pereira) are big advocates of the data-driven mentality, where massive amounts of data coupled with simple algorithms hold more promise than complex algorithms coupled with small amounts of training data. In earlier posts I already mentioned how my advisor at CMU is a big advocate of this approach in Computer Vision. This Unreasonable Effectiveness of Data is a powerful mentality, yet difficult to embrace with the computational resources offered by one's computer science department. But this data-driven paradigm is not only viable at Google -- it is the essence of Google.

Friday, July 11, 2008

Learning Per-Exemplar Distance Functions == Learning Anisotropic Per-Exemplar Kernels for Non-Parametric Density Estimation

When estimating probability densities, one can be parametric or non-parametric. A parametric approach goes like this: a.) assume the density has a certain form, then b.) estimate the parameters of that distribution. A Gaussian is a simple example. In the parametric case, having more data means that we will get "better" estimates of the parameters. Unfortunately, the estimates will only be "better" if the modeling distribution is close to "the truth."
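To make the parametric route concrete, here is a minimal sketch (Python/NumPy, purely illustrative -- the function names are my own) of steps a.) and b.) for a 1-D Gaussian:

    import numpy as np

    def fit_gaussian(x):
        # Step b.): maximum-likelihood estimates of the two
        # parameters of the assumed 1-D Gaussian form (step a.)
        mu = x.mean()
        sigma2 = x.var()   # MLE variance uses 1/N, not 1/(N-1)
        return mu, sigma2

    def gaussian_pdf(x, mu, sigma2):
        return np.exp(-0.5 * (x - mu) ** 2 / sigma2) / np.sqrt(2 * np.pi * sigma2)

    data = np.random.randn(1000) * 2.0 + 5.0   # samples from N(5, 4)
    mu, sigma2 = fit_gaussian(data)            # mu ~ 5, sigma2 ~ 4

No matter how much data we pour in, fit_gaussian can never recover a bimodal truth -- that is the parametric trade-off.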

A non-parametric approach makes very weak assumptions about the underlying distribution -- however, the number of parameters in a non-parametric density estimator scales with the number of data points. Generally, estimation proceeds as follows: a.) store all the data, b.) use something like a Parzen density estimate, where a tiny Gaussian kernel is placed around each data point.
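Here is the non-parametric counterpart under the same illustrative assumptions: a Parzen estimate that just averages isotropic Gaussian kernels (with a hand-picked bandwidth h) centered on the stored points:

    import numpy as np

    def parzen_density(query, data, h=0.5):
        # Average of isotropic Gaussian kernels (bandwidth h) centered
        # on every stored data point: nothing is fit, the training set
        # is simply memorized.
        diffs = query[:, None] - data[None, :]   # all query-to-data differences
        k = np.exp(-0.5 * (diffs / h) ** 2) / (h * np.sqrt(2 * np.pi))
        return k.mean(axis=1)                    # one density value per query

    data = np.concatenate([np.random.randn(500), np.random.randn(500) + 6])
    xs = np.linspace(-4, 10, 200)
    p = parzen_density(xs, data)   # happily recovers a bimodal shape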

The strength of the non-parametric approach is that *in theory* we can approximate any underlying distribution. In reality, we would need a crazy amount of data to actually do so -- the pragmatic answer to "why be non-parametric?" is that there is no real learning going on: a non-parametric approach has no parameter estimation stage. While estimating the parameters of a single Gaussian is quite easy, consider a simple Gaussian mixture model -- the parameter estimation (generally done via EM) is not so simple.
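To see the contrast concretely, compare the zero-work Parzen "fit" above with a rough sketch (illustrative only, with a crude initialization) of EM for even the simplest two-component 1-D mixture:

    import numpy as np

    def em_gmm_1d(x, iters=50):
        # EM for a two-component 1-D Gaussian mixture model.
        mu = np.array([x.min(), x.max()])    # crude init; real code needs restarts
        var = np.array([x.var(), x.var()])
        pi = np.array([0.5, 0.5])
        for _ in range(iters):
            # E-step: responsibility of each component for each point
            lik = (pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var)
                   / np.sqrt(2 * np.pi * var))
            r = lik / lik.sum(axis=1, keepdims=True)
            # M-step: re-estimate mixing weights, means, and variances
            n = r.sum(axis=0)
            pi = n / len(x)
            mu = (r * x[:, None]).sum(axis=0) / n
            var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / n
        return pi, mu, var

    x = np.concatenate([np.random.randn(300), np.random.randn(300) + 6])
    pi, mu, var = em_gmm_1d(x)   # mu should land near 0 and 6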

With the availability of inexpensive memory, data-driven approaches have become popular in computer vision. Does this mean that by using more data we can simply bypass parameter estimation?

An alternative is to combine the best of both worlds: use lots of data, make very few assumptions about the underlying density, and still allow learning to improve the density estimates. Usually a single isotropic kernel is used in non-parametric density estimation -- if we really have no knowledge, this might be the only sensible choice. But maybe something better than an isotropic kernel can be used, and perhaps we can learn the shape of the kernel.
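As a toy illustration of this idea (and only an illustration -- the per-exemplar precision matrices M_i below are random placeholders standing in for whatever a learning procedure would actually produce), a per-exemplar anisotropic kernel estimate might look like this:

    import numpy as np

    def anisotropic_parzen(query, data, metrics):
        # Density at 'query' as an average of per-exemplar Gaussian
        # kernels, each shaped by its own precision matrix M_i
        # instead of one shared isotropic bandwidth.
        d = data.shape[1]
        total = 0.0
        for x_i, M_i in zip(data, metrics):
            diff = query - x_i
            dist2 = diff @ M_i @ diff    # per-exemplar Mahalanobis distance
            norm = np.sqrt(np.linalg.det(M_i)) / (2 * np.pi) ** (d / 2)
            total += norm * np.exp(-0.5 * dist2)
        return total / len(data)

    # Placeholder "learned" metrics: axis-aligned kernels stretched at random.
    rng = np.random.default_rng(0)
    data = rng.standard_normal((100, 2))
    metrics = [np.diag(rng.uniform(0.5, 4.0, size=2)) for _ in range(100)]
    p = anisotropic_parzen(np.zeros(2), data, metrics)

Learning enters exactly where the placeholders sit: a good procedure would shape each M_i from the data around its exemplar.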

The local per-exemplar distance function learning algorithm that I presented in my CVPR 2008 paper can be thought of as exactly such an anisotropic, per-data-point kernel estimator.

I put some MATLAB distance function learning code up on the project web page. Check it out!