Tombone's Computer Vision Blog

Tuesday, May 20, 2008

dude, where's my image?

Check out IM2GPS: estimating geographic information from a single image. This is CVPR 2008 work done by James Hays and Alexei Efros. Some crazy titles that have been suggested to James can also be seen on his project site -- some of them are rather funny too!

Anyway, you can just read his abstract and browse his results if you are interested in the kind of computer vision research that uses millions of images. The basic idea is to predict the location of an image using only the information embedded inside the image (plus a training set of over 6 million geo-tagged Flickr images).
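To make the "match against a huge geo-tagged collection" idea concrete, here is a minimal sketch, assuming each image has already been reduced to a small feature vector. The feature vectors, the coordinates, and the function name nearest_geotag are all made up for illustration; this is plain nearest-neighbor lookup, not the actual IM2GPS pipeline.

```python
import math

def nearest_geotag(query, database):
    """Return the (lat, lon) of the database image whose features are closest to query."""
    # database is a list of (feature_vector, (lat, lon)) pairs
    best = min(database, key=lambda item: math.dist(query, item[0]))
    return best[1]

# Toy "geo-tagged collection": hypothetical 3-D features and made-up coordinates.
database = [
    ([0.9, 0.1, 0.0], (48.86, 2.35)),
    ([0.1, 0.8, 0.1], (40.44, -79.99)),
    ([0.0, 0.2, 0.9], (-33.87, 151.21)),
]

query = [0.2, 0.7, 0.1]  # features of the image we want to locate
guess = nearest_geotag(query, database)
print(guess)
```

With millions of images one would presumably look at many neighbors rather than just one, but the single-neighbor version is enough to show the flavor of the approach.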

Saturday, May 17, 2008

what is recognition?

I want to briefly discuss what the terms recognition, classification, and categorization mean to me and how they relate to fields such as computer vision, machine learning, and psychology.

From my understanding, "category" == "class" and thus categorization and classification are the same thing! It is correct to say that when we categorize, we affix a label to some entity. But these labels do refer to categories, or classes. One can attribute the popularity of the term 'classification' to the field of machine learning. Categorization is a term that was more heavily used in psychology and has only recently started popping up in computer vision papers.

Because I see classification and categorization as the same thing, I don't agree that only one can be hierarchical.

Regarding the term recognition, the answer is a bit more complicated. In the field of computer vision, when one says that they are interested in recognition they are usually interested in recognizing novel instances from some predefined list of classes. To stress the interest in discrimination between a large number of object classes, vision researchers have recently begun using terms such as "a visual categorization system" or they talk about "object class recognition."

In all places that I have seen this term pop up, "identification" refers to specific instances. A face identification system might be designed to find faces of George Bush and might work on top of a face-class recognition system. The problem is that early work in computer vision was usually concerned with a fixed number of objects and the goal was to find those exact object instances inside an image -- and this was referred to as simply "recognition." Nowadays, we often use the term "recognition" to refer to category-level recognition and not specific objects.

In conclusion, recognition is a very general term that has been applied to both category-level recognition (dog vs. cat vs. car vs. person) and recognition of specific object instances (this particular blue ball vs. this particular face). To be more precise, one can use the terms "category-level recognition" and "identification."

This post has been written in response to Vidit Jain's blog post titled "Etymology of common learning-related words such as recognize."

Wednesday, April 23, 2008

newton's method fractal

Back in high school I was 'into' Newton's method fractals. Some old images can be seen by clicking on the following image.

When people make fractal videos (check them out on YouTube), they are usually zooming into a fixed fractal. I have generated a fractal where the axes are fixed and the equation is changing. Check it out!
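For the curious, the "fixed axes, changing equation" idea can be sketched roughly as follows. This is only an illustration under my own assumptions: the polynomial family p(z) = z^3 + c*z - 1, the parameter c, and the grid size are all hypothetical choices, not the equation used in the video. Each frame records how many Newton steps every grid point needs to settle on a root.

```python
def newton_iterations(z, c, max_iter=40, tol=1e-6):
    """Count Newton steps until z settles on a root of p(z) = z**3 + c*z - 1."""
    for i in range(max_iter):
        dp = 3 * z * z + c            # derivative p'(z)
        if abs(dp) < 1e-12:           # flat spot: Newton's step is undefined
            return max_iter
        step = (z ** 3 + c * z - 1) / dp
        z -= step
        if abs(step) < tol:
            return i + 1
    return max_iter

def render_frame(c, size=24, extent=2.0):
    """Iteration counts on a fixed size x size grid over [-extent, extent]^2."""
    coords = [-extent + 2 * extent * k / (size - 1) for k in range(size)]
    return [[newton_iterations(complex(x, y), c) for x in coords] for y in coords]

# The grid (the "axes") never moves; only the equation parameter c changes.
frames = [render_frame(c) for c in (0.0, 0.5, 1.0)]
```

Mapping the iteration counts (or the root each point converges to) to colors and stepping c in small increments would give an animation of this kind.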

Tuesday, April 08, 2008

Recognition by Association via Learning Per-exemplar Distances

Tomasz Malisiewicz, Alexei A. Efros. Recognition by Association via Learning Per-exemplar Distances. In CVPR, June 2008.

Abstract:

We pose the recognition problem as data association. In this setting, a novel object is explained solely in terms of a small set of exemplar objects to which it is visually similar. Inspired by the work of Frome et al., we learn separate distance functions for each exemplar; however, our distances are interpretable on an absolute scale and can be thresholded to detect the presence of an object. Our exemplars are represented as image regions and the learned distances capture the relative importance of shape, color, texture, and position features for that region. We use the distance functions to detect and segment objects in novel images by associating the bottom-up segments obtained from multiple image segmentations with the exemplar regions. We evaluate the detection and segmentation performance of our algorithm on real-world outdoor scenes from the LabelMe dataset and also show some promising qualitative image parsing results.

http://www.cs.cmu.edu/~tmalisie/projects/cvpr08/
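As a rough illustration of what "a separate, thresholdable distance function per exemplar" can look like, here is a minimal sketch. Everything in it is an assumption made for the example: the Exemplar class, the two feature types, the hand-picked weights, and the thresholds are hypothetical, and no learning is performed. It is not the formulation from the paper.

```python
import math
from dataclasses import dataclass

@dataclass
class Exemplar:
    label: str
    features: dict    # feature name -> list of floats
    weights: dict     # feature name -> weight (hand-picked here; learned in practice)
    threshold: float  # distances below this count as an association

def exemplar_distance(exemplar, region):
    """Weighted sum of per-feature Euclidean distances, using this exemplar's own weights."""
    return sum(
        w * math.dist(exemplar.features[name], region[name])
        for name, w in exemplar.weights.items()
    )

def associate(region, exemplars):
    """Labels of all exemplars whose own distance function accepts the region."""
    return [e.label for e in exemplars if exemplar_distance(e, region) < e.threshold]

# Toy exemplars: "sky" cares mostly about position, "grass" mostly about color.
exemplars = [
    Exemplar("sky", {"color": [0.2, 0.5, 0.9], "position": [0.5, 0.1]},
             {"color": 1.0, "position": 2.0}, threshold=0.5),
    Exemplar("grass", {"color": [0.2, 0.7, 0.2], "position": [0.5, 0.9]},
             {"color": 2.0, "position": 0.5}, threshold=0.5),
]

region = {"color": [0.25, 0.5, 0.85], "position": [0.4, 0.2]}
matches = associate(region, exemplars)
print(matches)
```

The point of the sketch is that each exemplar weighs the features differently, and a region that matches no exemplar simply gets no association.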

Thursday, April 03, 2008

Vocabulary Lesson: Transductive Learning

The goal of this blog post isn't necessarily to provide new insights into the relationship between Transductive Learning and Semi-Supervised Learning. I will attempt to simply answer the question: "What is Transductive Learning?" To understand what Transductive means, we have to understand what induction (or Inductive Learning) means.

Induction, as opposed to deduction, is a form of reasoning that makes generalizations based on individual instances. It is important to note that induction isn't the kind of reasoning that predicate calculus or any other logic system was meant to handle. The conclusions produced from induction might have a high probability of being true but are never as certain as the inputs. The generalizations obtained from induction can be propagated onto newly observed inputs. One can think of a generalization obtained from induction as a function -- an abstract entity that can always map inputs to outputs.

The Merriam-Webster definition of Transduction states that it is: the transfer of genetic material from one microorganism to another by a viral agent (as a bacteriophage). While this definition has its roots in one particular branch of science, the crucial component of this definition is still present. Transduction is the transfer of something from entity A to entity B.

The Machine Learning definition of Transduction states that it is reasoning from observed inputs to specific test inputs. The key difference between induction and transduction is that induction refers to learning a function that can be applied to any novel inputs, while transduction is only concerned with transferring some property onto a specific set of test inputs.
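The difference can be sketched with a toy 1-D example. This is only an illustration under my own assumptions: the nearest-centroid rule, the greedy label-spreading loop, and the data are all hypothetical choices, not a standard transductive algorithm and not the method from any particular paper. The inductive version returns a function that works on any input; the transductive version only returns labels for the specific test points it was given.

```python
def fit_inductive(train):
    """Induction: learn a reusable function (here, a nearest-centroid rule)."""
    sums = {}
    for x, y in train:
        total, count = sums.get(y, (0.0, 0))
        sums[y] = (total + x, count + 1)
    centroids = {y: total / count for y, (total, count) in sums.items()}

    def predict(x):
        return min(centroids, key=lambda y: abs(x - centroids[y]))

    return predict

def label_transductive(train, test_points):
    """Transduction: label only these test points, letting them influence each other."""
    labeled = list(train)
    remaining = list(test_points)
    result = {}
    while remaining:
        # Pair every unlabeled point with its nearest labeled point ...
        pairs = [(x, min(labeled, key=lambda p: abs(x - p[0]))) for x in remaining]
        # ... and commit to the most confident (closest) pair first.
        x, (_, y) = min(pairs, key=lambda pair: abs(pair[0] - pair[1][0]))
        result[x] = y
        labeled.append((x, y))
        remaining.remove(x)
    return result

train = [(0.0, "a"), (10.0, "b")]
test_points = [2.0, 4.0, 6.0, 9.0]

predict = fit_inductive(train)                    # usable on any future input
labels = label_transductive(train, test_points)   # a labeling of these four points only
print(predict(6.0), labels[6.0])
```

In this toy setup the two disagree on the point 6.0: the learned function assigns it to the nearer centroid, while the transductive labeling reaches it through the chain of test points 2.0 and 4.0.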

Rather than paraphrase Wikipedia, I will leave it to the interested reader to do some follow-up research of their own into the merits of Transductive Learning.

To conclude, a WILLOW Research Team member -- Olivier Duchenne -- gave a talk about their CVPR 2008 work on applying Transductive Learning to the problem of image segmentation. This was my first exposure to the concepts of transductive learning and it is always a good thing to learn new things.

Monday, March 31, 2008

keyword analysis

I've been looking at my logs, and here are the top things people searched for when they stumbled across my blog within the last month or so. It seems everybody wants to know about the ndseg fellowship -- especially when they will hear back. And I unfortunately can't provide any advice regarding the ndseg fellowship. Good luck to you, graduate students.

33.87% ndseg
4.84% ipod turn on
3.23% latent dirichlet allocation
3.23% bmvc 2007
3.23% ipod won't turn on
3.23% logistic normal latent dirichlet allocation
1.61% burton clash
1.61% ndseg fellowship 2007
1.61% park city utah blog
1.61% ipod turning on
1.61% 2008 ndseg winners
1.61% my dream car
1.61% ndseg thegradcafe
1.61% ndseg fellowship offers 2008
1.61% cvpr 2007
1.61% ndseg, heard back
1.61% ndseg anyone
1.61% computer vision grad school
1.61% ndseg forum
1.61% ipod display support url
1.61% nsf graduate fellowship hear back
1.61% my ipod wont turn on
1.61% latent dirichlet allocation gibbs sampling
1.61% jogging in pittsburgh, squirrel hill
1.61% ndseg 2008 winners
1.61% eye inverse optics
1.61% multiple segmentations
1.61% nsf graduate fellowship heard yet?
1.61% burton clash guitar
1.61% nsf graduate research fellowships
1.61% thegradcafe ndseg
1.61% nsf grf march
1.61% my first paper on cvpr conference
1.61% ndseg fellowship
1.61% burton clash 2005
1.61% ndseg anyone heard

Wednesday, March 19, 2008

Understanding the past

While a certain degree of advancement is possible when working in isolation on a scientific problem, interaction with the scientific community can drastically hasten one's progress. Most people have their own experiences with 'isolation' and 'interaction with a community' but I should explicitly delineate how I intend to use these terms. While 'interaction with a community' usually implies two-sided communication such as directly working together on a problem or simply discussing one's research with a group of other scientists, I want to consider a subtler form of interaction.

By reading about past accomplishments and former ideologies in a particular field, one is essentially communicating with the ideas of the past. While many scholarly articles -- in a field such as Computer Vision -- are mostly devoted to algorithmic details and experimental evaluations, it isn't too difficult to find manuscripts which reveal the philosophical underpinnings of the proposed research. It is even possible to find papers which are entirely devoted to understanding the philosophical motivations of a past generation of research.

A prime example of interaction with the past is the paper "Object Recognition in the Geometric Era: A Retrospective," by Joseph L. Mundy from Brown University. Such a compilation of ideas -- perhaps even a mini-summa -- is quite accessible to any researcher in the field of Computer Vision. Avoiding the specific details of any algorithm developed in the so-called Geometric Era of Computer Vision, this text is both entertaining and highly educational. By reading such texts one is effectively communicating (albeit one-way) with a larger scientific community of the past.

To conclude, I would like to point out that I neither agree with some of the past paradigms of Computer Vision, nor am I a die-hard proponent of the modern statistical machine learning school of thought. However, to explore new territories, what better way to scope out the world around you than by standing on the shoulders of giants? We should be aware of what has been done in the past, and sometimes de-emphasize algorithmic minutiae in order to understand the philosophical motivations behind former paradigms.