Sunday, October 30, 2005

80's night good vibe + The New Deal in NYC?

I attended 80's night at The Upstage in Oakland this past Thursday. It was a lot of fun!

Even though I had to miss my Thursday night lift, the dance floor gave my legs a good workout. Dancing is a lot of fun; it reminded me of how much I love The New Deal shows. By the way, The New Deal is playing in NYC this New Year's and I might go. Below is the info:

Late Night New Year's Eve: THE NEW DEAL @ BB King's
Doors 1am Show 1:30am
$25.50 advance $30.00 day of show
This is an All Ages Show

Advance tix are available through ticketmaster.com, by Ticketmaster phone at 212-307-7171, or at the BB King box office, 237 West 42nd Street, NYC. Box office hours: 11am to 11pm daily.

Monday, October 24, 2005

Semantic Segmentation, Omnipotency Problem, Ill-Posing, and Futility

Disclaimer: I found this text file on my computer and I am deciding to post it as is. I probably never finished writing it, and meant to edit things later. But we all know how these things go (meaning that it would have never been finished).

So here it is:

First, let us discuss the notion of an object detector, then think about how the problem of image understanding is generally posed and finally look at a simple gedanken experiment.


I. The life of an object detector
An object detector's role is to localize an object in an image. Generally, a large training set of manually labelled images is obtained. These images contain one or more instances of the object of interest, and the location of each instance is also known. The training set is usually large enough to capture views of the object of interest under different orientations, scales, and lighting conditions. The training set generally has to be much larger if we want to detect a member of an object class, such as a car, versus recognizing a particular object, such as my car. Object detection refers to finding a member of a class, while object recognition refers to finding a particular instance.
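As a rough illustration of the localization idea (a toy sketch of my own, not any particular detector): slide a fixed-size window over the image and report the location whose contents best match a template. Here a simple correlation score stands in for whatever classifier training would produce.

```python
import numpy as np

def detect(image, template):
    """Return the (row, col) of the window that best matches the template."""
    h, w = template.shape
    best_score, best_loc = -np.inf, None
    for r in range(image.shape[0] - h + 1):
        for c in range(image.shape[1] - w + 1):
            window = image[r:r + h, c:c + w]
            # Correlation with the template stands in for a trained classifier.
            score = float(np.sum(window * template))
            if score > best_score:
                best_score, best_loc = score, (r, c)
    return best_loc

image = np.zeros((8, 8))
image[3:5, 4:6] = 1.0            # a 2x2 "object" at row 3, col 4
template = np.ones((2, 2))       # what training might have produced
print(detect(image, template))   # → (3, 4)
```

A real detector would additionally scan over scales and replace the correlation score with a learned classifier, but the exhaustive-search skeleton is the same.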

Why object detection? Generally one is interested in a particular vision task, such as creating an autonomous vehicle that can drive on highways. In this case, the vision researcher can reason about the types of objects that are generally seen in the particular application and train object detection modules for each object type. A car detector, a road sign detector, a tree detector, a bridge detector, a road lane detector, a person detector, a grass detector, a cloud detector, a gas station detector, a police car detector, and a sun detector could be used in combination to create a pretty decent scene understanding system. This system would look at an image and segment it by assigning each pixel in the image to one of those classes or to the 'unclassified' category. This 'unclassified' category is also known as the background category, or the clutter category; it represents the 'uninteresting' stuff.


II. What comes after seeing
Awesome! I just concocted a recipe for creating an autonomous vehicle!

Unfortunately, there are several problems with this approach. First of all, segmenting an image into the categories spelled out above falls short of having a car know what it should do to navigate this visual world. I didn't talk about how the segmentation of an image captured by a camera on top of a vehicle relates to navigation. Apparently, this recipe is only good for answering queries of the type, 'Where is object O in the image I?' Unfortunately, the only really interesting question is, 'What do I do once I see image I?'


III. Semantic Segmentation
The problem of computer vision is traditionally posed as something like this: given an image I, segment it into semantic categories and give me the 3D position and orientation of the objects found in the image with respect to the camera center. In addition, we might also want information about the lighting in the scene so that we can recover the true appearance of the objects found in the image. I want to call this mapping of an image into a set of object locations/orientations, object labels, and lighting conditions a semantic segmentation.

It seems that if we could obtain this semantic segmentation, we could then learn a mapping from a semantic segmentation to an action in order to have a real vision system.


--Omnipotency problem of vision
The problem with this approach is what I will refer to as the omnipotency problem of vision. The problem is that the vision system is required to know everything about the visual world in order to know what action to take. I honestly don't believe that we need all this information about the world to know what to do. A vision system should only care about extracting the minimal amount of information from an image in order to know what to do next, within some small error threshold.

-- Scaling problems with unbounded growth of object categories
Another problem with the semantic segmentation approach is that it doesn't scale when you start looking at vision systems that can perform a large number of vision tasks. The number of object categories is extremely large!

-- Ill-defined object categories
The big problem that I want to talk about is the problem of defining the objects that our system would detect. Do we have separate objects for 'baby' and 'old man', or do we treat them as just large geometric deformations of the concept 'human'? When you take one tire off a car, it is still a car; but when you start taking more and more pieces off of it, when does it cease to be a 'car'? Should a tree be considered one object, or should we treat it as an assembly of {leaves, branches, trunk}? Clearly the notion of 'object' is ill-defined. I think the biggest problem with contemporary vision is that not enough people really see how grand of a problem it is. Computer Vision isn't only concerned with hacking out a driving system; the deep questions that arise are some of the deepest philosophical questions that have been around since the start of man's inquiry.

--Naive desires
Will we ever solve the problem of computer vision? When somebody thinks of this problem in a naive 'hack-out-a-system' kind of way, then one would also think 'why not?' However, when one sees beyond the systems, beyond the geometry, and beyond the statistical modeling then one can see that the problem of computer vision isn't really about computers at all! How do we (humans) live so effortlessly in this complex world around us? This question has many nuances, and every generation of great thinkers has asked a slightly different question. Of course this makes perfect sense, since each generation has been thinking within the paradigm of their time and it is probably not a good idea to even think of this as a variation of the same question when we consider the incommensurability of ideas across paradigm shifts.

--The big problem: The bold answer
Will we ever solve the problem of computer vision? Of course not. If you (the reader) still think that you can solve this problem, then you need to get out more.

maybe the world isn't made up of objects

Disclaimer: I found this text file on my computer and I don't remember when I wrote this, but probably sometime in September.

The classic problem of computer vision is centered around the sub-problem of object recognition.
However, one key observation is that work in this field hasn't produced any stunning results in over 30 years. Perhaps people have been attacking this problem from the wrong direction.

When somebody is trying to do good object recognition, they would like to extract the locations of objects in an image. Recently, researchers have begun using machine learning techniques to learn the space of all object appearances; however, maybe the answer to this problem isn't in extracting objects from images.

When a human sees an image, they can easily decompose it into its constituent parts, namely the objects present in the scene. But is this statement really true about the nature of the human visual system? Language and society have introduced 'objects' into our understanding of the world, but perhaps we can still do vision without focusing on objects.

If we treat the image as a holistic entity, then perhaps once we have seen enough images, 'object' segmentation will happen naturally. If a human was never given a language-based context for describing what they see, would they see objects?

Are objects an artifact of the fact that human experience is for the most part dominated by the language we know? Clearly language is concerned with objects.

We should perhaps focus on image understanding as a memory-based approach. Thus to understand a scene we might not really need to segment it into object categories. Perhaps we only need to associate a given scene with some other scene we have encountered.

Wednesday, October 19, 2005

Reproducing Kernels

I'm starting to get a handle on this RKHS business. In fact, I can't wait to learn about SVMs in my ML class!

Latent Dirichlet Allocation

I have begun work on a MATLAB implementation of Latent Dirichlet Allocation à la Blei (who is currently at CMU). Jon and I will be looking at classification/clustering/search over a data set of 20,000 newsgroup articles.

LDA does not model a document as belonging to one topic. Instead, each document from the corpus is modeled as a finite mixture of topics. In this generative probabilistic model, parameter estimation and inference are the two main algorithms that we will implement.
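The generative story above can be sketched in a few lines (a toy example of my own with a made-up vocabulary and topics, not the MATLAB implementation): each document draws a topic mixture theta from a Dirichlet prior, then each word draws a topic from theta and a word from that topic's distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = ["ball", "game", "score", "vote", "law", "senate"]
# Two hypothetical topics as word distributions (each row sums to 1).
beta = np.array([
    [0.4, 0.4, 0.2, 0.0, 0.0, 0.0],   # a "sports" topic
    [0.0, 0.0, 0.0, 0.3, 0.3, 0.4],   # a "politics" topic
])
alpha = np.array([0.5, 0.5])           # symmetric Dirichlet prior

def generate_document(n_words=20):
    theta = rng.dirichlet(alpha)                 # per-document topic mixture
    words = []
    for _ in range(n_words):
        z = rng.choice(len(alpha), p=theta)      # topic assignment for this word
        w = rng.choice(len(vocab), p=beta[z])    # word drawn from that topic
        words.append(vocab[w])
    return theta, words

theta, doc = generate_document()
print(theta, doc[:5])
```

Parameter estimation and inference run this story in reverse: given only the documents, recover beta and the per-document mixtures.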

I am interested in LDA for several reasons. Primarily I want experience implementing machine learning algorithms; however, LDA is actually of interest to vision researchers. I think this project will teach me useful things about Machine Learning, and I look forward to having a MATLAB implementation of LDA that I can later adapt for vision tasks.

Sunday, October 16, 2005

NSF application + Gamma Function Awesomeness

I already mailed in a transcript request for RPI to send out my undergraduate transcript to three fellowship agencies. I'm applying to the NSF Graduate Research Fellowship Program, the DoD National Defense Science and Engineering Graduate Fellowship, and the DOE Computational Science Graduate Fellowship. I usually refer to them as NSF, NDSEG, and CSGF. I received Honorable Mention from NSF last year, so I think I have a decent chance of winning it this time around.

On another note, I've been summing up infinite series like there's no tomorrow. I've grown rather fond of the gamma function, which allows me to deal with integrals of polynomials over the positive real line when they are weighted by e^(-x). In fact, I have experimented with freely inserting n!/n! terms into my infinite series and rewriting the numerator as a gamma integral. Then I would interchange summation and integration, rearrange, and play with the newly created beast.
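To illustrate the trick on a familiar series (a standard identity, not one of the particular series I was playing with): using n! = Γ(n+1) = ∫₀^∞ tⁿ e^(-t) dt, insert n!/n! into the geometric series and swap the sum and the integral:

```latex
\sum_{n=0}^{\infty} x^n
  = \sum_{n=0}^{\infty} \frac{x^n}{n!} \int_0^\infty t^n e^{-t}\,dt
  = \int_0^\infty e^{-t} \sum_{n=0}^{\infty} \frac{(xt)^n}{n!}\,dt
  = \int_0^\infty e^{-t} e^{xt}\,dt
  = \frac{1}{1-x}, \qquad 0 \le x < 1.
```

Here the interchange of sum and integral is justified for 0 ≤ x < 1; the fun (and the danger) starts when you apply the same moves to series where it isn't.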

I've also been reading up on RKHS from Poggio's course website. I have made a comprehensive list of machine learning links on my CMU webpage.

Friday, October 14, 2005

Machine Teaching instead of Machine Learning

Being a disciple of the Machine Learning paradigm, I am not-so-proud to state that what is called 'Machine Learning' these days is actually more like Machine Teaching. Being a student who has decided to dedicate most of his time to his academic endeavours, I can honestly say that it is I who is 'Learning.' Each time I learn something new, I program it into the little box next to me. Believe me, this box is not learning anything. Until I feed it training data, it doesn't really have any motivation to do anything on its own.

The problem is that these classifiers and clustering units are very dependent on humans giving them data. It takes a lot of intelligence (on the part of the human) to make a decently smart (on the part of the machine) algorithm, but the result is far from intelligent.

Wednesday, October 12, 2005

Discovering Laguerre Polynomials and Shortcomings of Darwinian Evolution

I was recently playing with the Gamma Function and realized how it could be used to help evaluate a certain type of weighted inner product between polynomials defined over [0,inf). After a few hours of playing, I finally googled this type of expansion and was not surprised to find that these polynomials are called the Laguerre Polynomials. These orthonormal polynomials can be used to define a Laguerre-Fourier expansion of functions with wider support than other orthonormal polynomials, such as the Legendre polynomials, allow.
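For concreteness (these are the standard facts about the Laguerre polynomials, not results from my session): with the weight e^(-x) on [0,∞), the first few polynomials and the orthonormality relation are

```latex
L_0(x) = 1, \qquad L_1(x) = 1 - x, \qquad L_2(x) = 1 - 2x + \tfrac{x^2}{2},
\qquad
\int_0^\infty e^{-x}\, L_m(x)\, L_n(x)\,dx = \delta_{mn},
```

which gives the Laguerre-Fourier expansion of a suitable f on the half-line:

```latex
f(x) = \sum_{n=0}^{\infty} c_n L_n(x),
\qquad
c_n = \int_0^\infty e^{-x}\, f(x)\, L_n(x)\,dx.
```

The gamma function enters because each inner product reduces to integrals of the form ∫₀^∞ xᵏ e^(-x) dx = Γ(k+1) = k!.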

I recently had an interesting conversation with my friend Mark about the 'missing link' in evolutionary theory. We both agreed that one problem with Darwinian Evolution is that it requires some special mechanism for humans to possess that would explain their superiority in the modern world. We pretty much agreed that an advanced theory of mating partner selection can be ruled out on the basis of empirical evidence. Although a theory of intelligent partner selection could explain man's dominance in the modern world, the empirical evidence shows that modern man's selection algorithm is rather arbitrary. It brings back the 'big' question, "Why are we so advanced?"

Sunday, October 09, 2005

zeta strikes again!

I found my Riemann Zeta book Friday night. I say 'found' because after a few weeks of tripping on the Zeta, I try to 'hide' this book so that I forget about Zeta for a little bit.

Now I've been summing up series like crazy for a day or two. A recurring theme in my own little math sessions is the hyperbolic (not in the sense of 'hyperbola') use of infinite series and orthogonal function expansions.

Now I want Hardy's Divergent Series book!

Saturday, October 08, 2005

the problem of neural nets: the problem of people

The basic problem with artificial neural networks is very similar to the problem with people in the year 2005. A neural network is very sensitive to the order of the training inputs it is presented with. In fact, it is possible to present a neural network with training inputs in such an order that it forgets 'old' input-output pairs. This behavior is sometimes called catastrophic forgetting, and it is closely related to overfitting: the network fits the recent data too well at the expense of everything else.

When did I start to overfit?

Sometime back in 10th grade of high school I was presented with a lot of input relating to my current scholastic endeavours. This overstimulation of the quantitative part of my brain has left me in a rather peculiar situation.

Why didn't anybody present me with a test set?

The problem with overstimulating the quantitative part of the brain is that the distribution of analytic reasoning tasks is not representative of the distribution of tasks in the real world. My schooling is analogous to the training of a neural network, where the GPA plays the role of a performance metric. However, as in the case of overfitting, the GPA fails to generalize to non-scholarly tasks, and thus the performance of a pupil in a school system falls short of predicting performance on common real-world tasks.
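The generalization gap I'm describing shows up in the simplest regression setting (a toy example of my own, not any particular experiment): give a model enough parameters to memorize its noisy training points, and its training error collapses while its error on held-out points does not.

```python
import numpy as np

rng = np.random.default_rng(1)

# 10 noisy training samples of sin(x), and a clean held-out test grid.
x_train = np.linspace(0, 2 * np.pi, 10)
y_train = np.sin(x_train) + 0.1 * rng.standard_normal(10)
x_test = np.linspace(0, 2 * np.pi, 100)
y_test = np.sin(x_test)

# Degree 9 with 10 points: enough parameters to memorize the noise exactly.
coeffs = np.polyfit(x_train, y_train, deg=9)
train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)

print(f"train MSE: {train_err:.2e}, test MSE: {test_err:.2e}")
```

The training MSE is essentially zero (the polynomial interpolates the noise), while the test MSE stays bounded away from it: good performance on the training distribution, poor prediction off it.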

In the year 2005, many people overfit some aspect of life. I prefer the term 'overfit' to 'overspecialize' because it draws upon the context of regression (fitting). When each person is treated independently of others, overfitting can only be seen as a bad thing. When does one ever really want a neural network to overfit? However, in the context of society's machine, the overfitters are the ones who expand the horizon of modern life. It is as if overfitting was brought about by evolution, which probably favored species that steadily produced a small, yet non-infinitesimal, percentage of overfitters. Since people can be seen as the cream of the crop with respect to many evolutionary metrics, it is no surprise that we overfit so well.

Overfitters unite!

I think people can learn about themselves by studying machine learning. In a good article titled "The Parallel Distributed Processing Approach to Semantic Cognition" by James L. McClelland and Timothy T. Rogers, the degradation of semantic knowledge in humans (a condition known as semantic dementia) was compared to the behavior of a neural network. Traditionally, scientists would study humans in an attempt to develop better computational techniques for tasks such as machine learning and machine vision, but it is also important to study computational techniques because they can tell us something about ourselves.