Thursday, June 17, 2010

more papers to check out from cvpr

Here are more CVPR 2010 papers which I either found interesting or plan on reading when I get back to PIT.  Enjoy!

Connecting Modalities: Semi-supervised Segmentation and Annotation of Images Using Unaligned Text Corpora  
Authors:  Richard Socher (Stanford University) , Li Fei-Fei (Stanford University) 

Cascade Object Detection with Deformable Part Models  
Authors:  Pedro Felzenszwalb (University of Chicago) , Ross Girshick (University ) , David McAllester (Toyota Technological Institute, Chicago) 

Beyond Active Noun Tagging: Modeling Contextual Interactions for Multi-Class Active Learning  
Authors:  Behjat Siddiquie (UMIACS) , Abhinav Gupta (Carnegie Mellon University)  

Tiered Scene Labeling with Dynamic Programming  
Authors:  Pedro Felzenszwalb (University of Chicago) , Olga Veksler (University of Western Ontario) 

Layered Object Detection for Multi-Class Segmentation  
Authors:  Yi Yang (UCI) , Sam Hallman () , Deva Ramanan () , Charless Fowlkes (UC Irvine) 

Efficiently Selecting Regions for Scene Understanding  
Authors:  M. Pawan Kumar (Stanford University) , Daphne Koller (Stanford)   

Image Webs: Computing and Exploiting Connectivity in Image Collections
Authors:  Kyle Heath (Stanford) , Natasha Gelfand (Nokia Research - Palo Alto, CA) , Maks Ovsjanikov (Stanford University) , Mridul Aanjaneya (Stanford University) , Leonidas Guibas (Stanford University)

Sunday, June 13, 2010

constrained parametric min-cuts: exciting segmentation for the sake of recognition

I would like to introduce two papers about Constrained Parametric Min-Cuts from C. Sminchisescu's group.  These papers are very relevant to my research direction (which lies at the intersection of segmentation and recognition).  Like my own work, these papers are about segmentation for recognition's sake.  The segmentation algorithm they propose is a sort of "sliding segment" approach, where many binary graph-cut optimization problems are solved for different Grab-Cut style initializations.  The resulting segments are then scored using a learned scoring function -- think regression rather than classification.  The authors show that the top-scoring segments are actually quite meaningful and correspond to object boundaries really well.  Finally, a tractable number of top hypotheses (still overlapping at this stage) are piped into a recognition engine.
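To make the generate-then-rank idea above concrete, here is a toy sketch in Python.  It is not the authors' implementation.  It solves one small min-cut per (seed, penalty) pair on a tiny binary image, pools the distinct segments, and ranks them.  Everything named here is an assumption made for illustration: the capacities BIG, STRONG, and WEAK, the uniform foreground penalty `lam` (a simplification of the parametric term), and above all `boundary_contrast`, a hand-written stand-in for the scoring function that the papers learn from data.

```python
from collections import deque

BIG, STRONG, WEAK = 10**6, 10, 1  # hypothetical integer edge capacities

def neighbors(image, r, c):
    """Yield the in-bounds 4-connected neighbors of pixel (r, c)."""
    for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
        if 0 <= nr < len(image) and 0 <= nc < len(image[0]):
            yield nr, nc

def build_graph(image, seed, lam):
    """Tie the seed to the source and border pixels to the sink; every
    interior pixel pays `lam` if it ends up labeled foreground."""
    rows, cols = len(image), len(image[0])
    cap = {"S": {seed: BIG}, "T": {}}
    for r in range(rows):
        for c in range(cols):
            on_border = r in (0, rows - 1) or c in (0, cols - 1)
            edges = {"T": BIG if on_border else lam}
            for nr, nc in neighbors(image, r, c):
                edges[(nr, nc)] = STRONG if image[r][c] == image[nr][nc] else WEAK
            cap[(r, c)] = edges
    return cap

def min_cut_source_side(capacity, source, sink):
    """Edmonds-Karp max-flow; return the nodes on the source side of a min cut."""
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual[v].setdefault(u, 0)  # reverse edges start empty
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:  # BFS over unsaturated edges
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return set(parent)  # everything still reachable from the source
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= flow
            residual[v][u] += flow

def boundary_contrast(image, segment):
    """Stand-in score: fraction of boundary edges that cross an intensity change."""
    crossing = total = 0
    for r, c in segment:
        for nr, nc in neighbors(image, r, c):
            if (nr, nc) not in segment:
                total += 1
                crossing += image[r][c] != image[nr][nc]
    return crossing / total if total else 0.0

def ranked_hypotheses(image, seeds, lams, top_k):
    """Pool the distinct segments from all (seed, lam) cuts and keep the best."""
    pool = set()
    for seed in seeds:
        for lam in lams:
            side = min_cut_source_side(build_graph(image, seed, lam), "S", "T")
            pool.add(frozenset(side - {"S"}))
    return sorted(pool, key=lambda s: boundary_contrast(image, s), reverse=True)[:top_k]

IMAGE = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
best = ranked_hypotheses(IMAGE, seeds=[(1, 1), (2, 2), (3, 3)], lams=[1, 5], top_k=2)
```

On this toy image the small penalty recovers the whole 2x2 blob from either seed inside it, while the large penalty collapses the segment to the seed pixel.  The pool therefore contains overlapping hypotheses, and the ranking step is what pushes the blob to the top.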

The idea that features derived from segments are better for recognition than features from the spatial support of a sliding rectangle resonates in all of my papers.  Regarding these CVPR2010 papers, I like their idea of learning a category-free "segmentation function", and the multiple-segmentation flavor of the algorithm is very appealing.  If I remember correctly, the idea of learning a segmentation function comes to us from X. Ren, and the idea of using multiple segmentations comes from D. Hoiem.  These papers combine both insights in a cool new way.

J. Carreira and C. Sminchisescu. Constrained Parametric Min-Cuts for Automatic Object Segmentation. In CVPR 2010.

F. Li, J. Carreira, and C. Sminchisescu. Object Recognition as Ranking Holistic Figure-Ground Hypotheses. In CVPR 2010.


Spotlights for these papers are in the following tracks at CVPR2010:
Object Recognition III: Similar Shapes
Segmentation and Grouping II: Semantic Segmentation

Sven Dickinson at POCV 2010, ACVHL Tomorrow

This morning, Sven Dickinson gave a talk to start the POCV 2010 Workshop at CVPR2010.  For those of you who might not know, POCV stands for Perceptual Organization in Computer Vision.  While segmentation can be thought of as a perceptual grouping process, contiguous regions don't have to be the end product of a meaningful perceptual grouping process.  There are many popular and useful algorithms which group non-accidental contours yet fall short of a full-blown image segmentation.

The title of Dickinson's talk was "The Role of Intermediate Shape Priors in Perceptual Grouping and Image Abstraction."  At the beginning of his talk, Sven pointed out how perceptual organization was in its prime in the mid-90s and declined in the 2000s due to the popularity of machine learning and the "detection" task.  He believes that good perceptual grouping is what is going to make vision scale -- that is, unless we first squeeze everything we can out of the bottom level, we are doomed to fail.

Dickinson showed some nice results from his most recent research efforts where objects are broken down into generic "parts" -- this reminded me of Biederman's geons, although Sven's fitting is done in the 2D image plane.  Sven emphasized that successful shape primitives must be category-independent if we are to have scalable recognition of thousands of visual concepts in images.  This is very different from the mainstream per-category object detection task which has been popularized by contests such as the PASCAL VOC.

While I personally believe that there is a good place for perceptual organization in vision, I wouldn't view it as the Holy Grail.  It is perhaps the Holy Bridge we must inevitably cross on the way to finding the Holy Grail.  I believe that for full-grown fully-functional members of society, our ability to effortlessly cope with the world is chiefly due to its simplicity and repeatability, and not due to some amazing internal perceptual organization algorithm.  Perhaps it is when we were children -- viewing the world through a psychedelic fog of innocence -- that perceptual grouping helped us cut up the world into meaningful entities.

A common theme in Sven's talk was the idea of Learning to Group in a category-independent way.  This means that all of the successes of Machine Learning aren't thrown out the door, and this appears to be quite a different way of grouping than what was done in the 1970s.

Tomorrow I will be at the ACVHL workshop, "Advancing Computer Vision with Humans in the Loop".  I haven't personally "turked" yet, but I feel I will be jumping on the bandwagon soon.  Anyway, the keynote speakers should make for an awesome workshop.  They do not need introductions: David Forsyth, Aude Oliva, Fei-Fei Li, Antonio Torralba, and Serge Belongie -- all influential visionaries.

everything is misc -- torralba cvpr paper to check out

Weinberger's Everything is Miscellaneous is a delightful read -- I just finished it today while flying from PIT to SFO.  It was recommended to me by my PhD advisor, Alyosha, and now I can see why!  Many of the key motivations behind my current research on object representation deeply resonate in Weinberger's book.

Weinberger motivates Rosch's theory of categorization (the Prototype Model), and explains how it is a significant break from two thousand years of Aristotelian thought.  Aristotle gave us the notion of a category -- centered around the notion of a definition.  For Aristotle, every object can be stripped to its essential core, and placed in its proper place in a God's-eye objective organization of the world.  It was Rosch who showed us that categories are much fuzzier and more hectic than suggested by the rigid Aristotelian system.  Just like Copernicus single-handedly stopped the Sun and set the Earth in motion, Rosch disintegrated our neatly organized world-view and demonstrated how an individual's path through life shapes h/er concepts.

I think it is fair to say that my own ideas, as well as Weinberger's, aren't so much an extension of the Roschian mode of thought as a significant break from the entire category-based way of thinking.  Given that Rosch studied Wittgenstein as a student, I'm surprised her stance wasn't more extreme, more along the anti-category line of thought.  I don't want to undermine her contribution to psychology and computer science in any way, and I want to be clear that she should only be lauded for her remarkable research.  Perhaps Wittgenstein was as extreme and iconoclastic as I like my philosophers to be, but Rosch provided us with a computational theory and not just a philosophical lecture.

From my limited expertise in theories of categorization in the field of Psychology, whether it is Prototype Models or the more recent data-driven Exemplar Models, these theories are still theories of categories.  Whether the similarity computations are between prototypes and stimuli, or between exemplars and stimuli, the output of a categorization model is still a category.  Weinberger is all about modern data-driven notions of knowledge organization, in a way that breaks free from the imprisoning notion of a category.  Knowledge is power, so why imprison it in rigid modules called categories?  Below is a toy visualization of a web of concepts, as imagined by me.  This is very much the web-based view of the world.  Wikipedia is a bunch of pages and links.

Artistic rendition of a "web of concepts"
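To see why both families of models discussed above are still theories of categories, here is a minimal Python sketch.  The prototype rule compares a stimulus to each category's mean, while the exemplar rule sums an exponentially decaying similarity over every stored example, in the spirit of exemplar models from the psychology literature.  The 2D feature vectors, the labels "A" and "B", and the `sensitivity` parameter are all made up for illustration.

```python
import math

def prototype_classify(stimulus, memory):
    """Prototype model: pick the category whose mean is closest to the stimulus."""
    def distance_to_prototype(label):
        points = memory[label]
        prototype = [sum(dim) / len(points) for dim in zip(*points)]
        return math.dist(stimulus, prototype)
    return min(memory, key=distance_to_prototype)

def exemplar_classify(stimulus, memory, sensitivity=1.0):
    """Exemplar model: pick the category with the largest summed similarity
    to its individual stored examples."""
    def summed_similarity(label):
        return sum(math.exp(-sensitivity * math.dist(stimulus, p))
                   for p in memory[label])
    return max(memory, key=summed_similarity)

# Hypothetical memory: category A is spread out, category B is a single point.
MEMORY = {"A": [(0.0, 0.0), (6.0, 0.0)], "B": [(3.0, 2.0)]}
STIMULUS = (5.5, 1.2)

prototype_answer = prototype_classify(STIMULUS, MEMORY)  # "B"
exemplar_answer = exemplar_classify(STIMULUS, MEMORY)    # "A"
```

The two rules disagree on this stimulus because the mean of A sits far from either of its members, yet in both cases the output is nothing more than a category label.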

I found it valuable to think of the Visual Memex, the model I'm developing in my thesis research, as an anti-categorization model of knowledge -- a vast network of object-object relationships.  The idea of using little concrete bits of information to create a rich non-parametric web is the recurring theme in Weinberger's book.  In my case, the problem of extracting primitives from images, and all of the problems in dealing with real-world images, are around to plague me, and the Visual Memex must rely on many Computer Vision techniques -- such things are not discussed in Weinberger's book.  The "perception" or "segmentation" component of the Visual Memex is not trivial, whereas linking words on the web is much easier.
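As a rough illustration of what a network of object-object relationships might look like as a data structure, here is a small Python sketch.  It is not the actual Visual Memex.  The class name `ConceptWeb`, the relation names, the region identifiers, and the strengths are all hypothetical.  The point is only that items are linked directly to other items and no category label is stored anywhere.

```python
from collections import defaultdict

class ConceptWeb:
    """Undirected web of items joined by typed, weighted links."""

    def __init__(self):
        self._links = defaultdict(dict)

    def link(self, a, b, relation, strength=1.0):
        """Connect two items in both directions."""
        self._links[a][b] = (relation, strength)
        self._links[b][a] = (relation, strength)

    def related(self, item, relation=None):
        """Return (other, relation, strength) triples, strongest first."""
        hits = [(other, rel, s)
                for other, (rel, s) in self._links.get(item, {}).items()
                if relation is None or rel == relation]
        return sorted(hits, key=lambda hit: hit[2], reverse=True)

web = ConceptWeb()
web.link("img1/region3", "img7/region1", "looks_like", 0.9)
web.link("img1/region3", "img1/region5", "appears_next_to", 0.6)

all_links = web.related("img1/region3")
lookalikes = web.related("img7/region1", relation="looks_like")
```

Asking what a region "is" then becomes a question of which other regions it links to, rather than which bin it falls into.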

CVPR paper to look out for

However, the category-based view is all around us.  I expect most of this year's CVPR papers to fit in this category-based view of the world. One paper, co-authored by the great Torralba, looks relevant to my interests.  It is yet another triumph for the category-based mentality in computer vision.  In fact, one of the figures in the paper demonstrates the category-based view of the world very well.  Unlike the memex, the organization is explicit in the following figure:

Myung Jin Choi, Joseph Lim, Antonio Torralba, and Alan S. Willsky. CVPR 2010.

Friday, June 11, 2010

blogging from CVPR2010

It might not be one of those glamorous Apple events during which Steve Jobs introduces a new shiny gadget for the masses to desire, but plenty of exciting stuff happens at CVPR which instills desire into our souls, that is, the souls of computer vision scientists. Wouldn't you rather see the Great Torralba give a talk over some big company's chief executive officer? For those of you who do not know what CVPR is -- it is one of the big Computer Vision conferences during which we (the geeks, scientists, engineers, developers, hackers, and mathematicians) exchange ideas regarding our most recent research in the world of computer vision.

I am flying to SF tomorrow morning, and will be blogging about some of the cool papers I encounter at this year's CVPR.  I do not have a paper at this year's conference, so I'm in full assimilate-knowledge mode where I hope to absorb thousands of ideas related to my field.  I already mentioned some of Kristen Grauman's cool segmentation papers, but expect to see in the next several blog posts many additional discussions of what I think are "exciting" papers.  I am already getting excited and have plenty of papers to read during my flight, in addition to finishing Everything is Miscellaneous.  I will be blogging from CVPR, like an Apple fanboy would at one of those Apple WWDC events -- but I will share math, theory, algorithms, and the like.

As always, the list of CVPR 2010 papers on the web can be found here.