Tombone's Computer Vision Blog: 2012

Tuesday, July 10, 2012

Machine Learning Doesn't Matter?

Bagpipes and International Conference of Machine Learning (ICML) in Edinburgh

Two weeks ago, I attended the ICML 2012 Conference in Edinburgh, UK. First of all, Edinburgh is a great place for a conference! The scenery is marvelous, the weather is comfortable, and most notably, the sound of bagpipes adds an inimitable charm to the city. I attended the conference because I was invited to give an applications talk during the invited talks session. In case you’re wondering, I did not have a plenary session (a plenary session is a session attended by all conference members), which is reserved for titans such as Yann LeCun, David MacKay, and Andrew Ng. My presentation was on the last day of ICML and was titled “Exemplar-SVMs for Visual Object Detection, Label Transfer and Image Retrieval,” during which I gave an overview of my ICCV 2011 paper on visual object detection as well as the SIGGRAPH ASIA 2011 paper on cross-domain image retrieval. As part of the invited talk, we submitted a two-page extended abstract which summarizes some key ideas behind the Exemplar-SVM project: you can check out the abstract as well as the presentation slides online. I believe the talk was recorded, so I will post the video link once it becomes available. It was a great opportunity to convey some of my ideas to a non-vision audience. I think I got a handful of new people excited about single example SVMs (i.e., Exemplar-SVMs)!

Tomasz Malisiewicz, Abhinav Shrivastava, Abhinav Gupta, and Alexei A. Efros. Exemplar-SVMs for Visual Object Detection, Label Transfer and Image Retrieval. To be presented as an invited applications talk at ICML, 2012. PDF | Talk Slides

Getting Ready for Edinburgh with David Hume

To get ready for my first visit to Edinburgh (pronounced Ed-in-bur-ah which does not rhyme with Pittsburgh), I bought a Kindle Touch and proceeded to read David Hume’s An Enquiry Concerning Human Understanding. David Hume is one of the great British Empiricists (together with John Locke and George Berkeley) who stood by the empiricist motto: impressions are the source of all ideas. Empiricists can be contrasted to rationalists who appeal to reason as the source of knowledge. [Of course, I am neither an empiricist nor a rationalist. Such polarizing extremes are a thing of the past. I am a pragmatist and my world-view combines elements from many different philosophies.] I chose Hume’s treatise because he is the one whom Kant credits for awakening him from his dogmatic slumber. I found Hume’s words rejuvenating and full of gedankenexperiments which show the limits of radical empiricism, and, most notably, the book is free on the Kindle store! In your attempts to build intelligent machines, maybe you will also find words of inspiration in the classics. It was a great book to get into the Edinburgh mindset (although the ICML crowd is probably more familiar with a different University of Edinburgh figure, namely Reverend Bayes).

Impressions of ICML

I would first like to say that the ICML website is well-organized and serves as a great tool during the conference! Good job ICML! There is a great mobile version of the ICML website which is excellent for visiting on your iPhone when figuring out which talk to go to next. The ICML website also provides a forum for discussing papers and every paper gets a presentation and a poster. The discussion boards do not seem heavily utilized but it would be great to use a moderator-style system to have the actual after-presentation questions come from this forum. I’m sure something like this will actually arise in the upcoming years. ICML is much smaller than CVPR (compare ~700 attendees with ~2000 attendees) which makes for a much more intimate environment. I was amazed by the number of people proving bounds and doing “theoretical” non-applied machine learning. It's like some people really don't care about anything other than analysis. However, this is not my style, and I personally prefer to build “real” systems and combine insights from disparate disciplines such as mathematics, cognitive science, philosophy, physics, and computer science. There is a bit of ICML and Machine Learning conferences which I think of as nothing more than mathturbation. I understand there's merit to doing analysis of this sort -- somebody’s gotta do it, but if you’re gonna do it, please at least try to understand the implications of the real-world problem your dataset and task are trying to address.

Machine Learning doesn’t Matter?

The highlight of the conference by far was Kiri Wagstaff’s plenary talk “Machine Learning that Matters.” Kiri gave an enchanting 30 minute presentation regarding what is rotten in the state of Edinburgh (aka what is wrong with the style of machine learning conferences). Her words were gentle, yet harsh, while simultaneously enlightening, yet morbid. She showed us, machine learning researchers, just how useless much of machine learning research is today. Let’s not forget that Machine Learning is one of the most revolutionary ideas of the modern computer science classroom. Trying to get a PhD in Computer Science and avoiding Machine Learning is like avoiding Calculus while getting an undergraduate degree in Engineering. There is nothing wrong with machine learning as a discipline, but there is something wrong with researchers making the field overly academic. Making a discipline overly academic means creating a self-contained, overly-mathematical, self-citing, and jargon-filled discipline which doesn’t care about world-impact but only cares to propagate a small community’s citation count. Note that many of these arguments also apply to the CVPR world. But do not take my words for granted, read Kiri’s treatise yourself. Abstract Below:

"Machine Learning that Matters" Abstract: Much of current machine learning (ML) research has lost its connection to problems of import to the larger world of science and society. From this perspective, there exist glaring limitations in the data sets we investigate, the metrics we employ for evaluation, and the degree to which results are communicated back to their originating domains. What changes are needed to how we conduct research to increase the impact that ML has? We present six Impact Challenges to explicitly focus the field’s energy and attention, and we discuss existing obstacles that must be addressed. We aim to inspire ongoing discussion and focus on ML that matters.

Kiri Wagstaff, "Machine Learning that Matters," ICML 2012.

PDF Link: http://icml.cc/2012/papers/298.pdf

If you have something to say in response to Kiri's treatise, check out her Machine Learning Impact Forum on http://mlimpact.com/.

Thursday, June 21, 2012

Predicting events in videos, before they happen. CVPR 2012 Best Paper

Intelligence is all about making inferences given observations, but somewhere in the history of Computer Vision, we (as a community) have put too much emphasis on classification tasks. What many researchers in the field (unfortunately this includes myself) focus on is extracting semantic meaning from images, image collections, and videos. Whether the output is a scene category label, an object identity and location, or an action category, the way we proceed is relatively straightforward:

Extract some measurements from the image (we call them "features", and SIFT and HOG are two very popular such features)
Feed those features into a machine learning algorithm which predicts the category these features belong to. Some popular choices of algorithms are Neural Networks, SVMs, decision trees, boosted decision stumps, etc.
Evaluate our features on a standard dataset (such as Caltech-256, PASCAL VOC, ImageNet, LabelMe, etc)
Publish (or as is commonly known in academic circles: publish-or-perish)
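
The first three steps of this template can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the 4x4 striped "images", the `gradient_features` descriptor (a crude stand-in for something like HOG), and the nearest-centroid classifier are all hypothetical choices made for brevity, not something taken from a particular paper or dataset.

```python
def gradient_features(image):
    """Step 1: mean absolute horizontal and vertical gradients (a crude stand-in for HOG)."""
    rows, cols = len(image), len(image[0])
    gx = sum(abs(image[r][c + 1] - image[r][c])
             for r in range(rows) for c in range(cols - 1))
    gy = sum(abs(image[r + 1][c] - image[r][c])
             for r in range(rows - 1) for c in range(cols))
    return (gx / (rows * (cols - 1)), gy / ((rows - 1) * cols))


def train_centroids(images, labels):
    """Step 2: a nearest-centroid classifier, i.e. one mean feature vector per category."""
    sums, counts = {}, {}
    for image, label in zip(images, labels):
        fx, fy = gradient_features(image)
        sx, sy = sums.get(label, (0.0, 0.0))
        sums[label] = (sx + fx, sy + fy)
        counts[label] = counts.get(label, 0) + 1
    return {label: (sx / counts[label], sy / counts[label])
            for label, (sx, sy) in sums.items()}


def predict(centroids, image):
    """Assign the category whose centroid is closest in feature space."""
    fx, fy = gradient_features(image)
    return min(centroids,
               key=lambda label: (fx - centroids[label][0]) ** 2
                                 + (fy - centroids[label][1]) ** 2)


def accuracy(centroids, images, labels):
    """Step 3: evaluate on held-out data."""
    hits = sum(predict(centroids, im) == lab for im, lab in zip(images, labels))
    return hits / len(images)


# Toy data (hypothetical): vertical stripes vs. horizontal stripes.
train_images = [
    [[0, 1, 0, 1]] * 4,
    [[1, 0, 1, 0]] * 4,
    [[0] * 4, [1] * 4, [0] * 4, [1] * 4],
    [[1] * 4, [0] * 4, [1] * 4, [0] * 4],
]
train_labels = ["vertical", "vertical", "horizontal", "horizontal"]
test_images = [
    [[0, 0, 1, 1]] * 4,
    [[0] * 4, [0] * 4, [1] * 4, [1] * 4],
]
test_labels = ["vertical", "horizontal"]

centroids = train_centroids(train_images, train_labels)
print(accuracy(centroids, test_images, test_labels))
```

Note that nothing in this pipeline looks at time: the classifier only speaks once it has seen the whole input.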

While only in the last 5 years has action recognition become popular, it still adheres to the generic machine vision pipeline. But let's consider a scenario where adhering to this template can have disastrous consequences. Let's ask ourselves the following question:

Q: Why did the robot cross the road?

Image courtesy of napkinville.com

A: The robot didn't cross the road -- he was obliterated by a car. This is because in order to make decisions in the world you can't just wait until all observations happened. To build a robot that can cross the road, you need to be able to predict things before they happen! (Alternate answer: The robot died because he wasn't using Minh's early-event detection framework, the topic of today's blog post.)

This year's Best Student Paper winner at CVPR has given us a flavor of something more, something beyond the traditional action recognition pipeline, aka "early event detection." Simply put, the goal is to detect an action before it completes. Minh's research is rather exciting and opens up room for a new paradigm in recognition. If we want intelligent machines roaming the world around us (and every CMU Robotics PhD student knows that this is really what vision is all about), then recognition after an action has happened will not enable our robots to do much beyond passive observation. Prediction (and not classification) is the killer app of computer vision because classification assumes you are given the data and prediction assumes there is an intent to act on and interpret the future.

While Minh's work focused on simpler problems such as facial expression recognition, gesture recognition, and human activity recognition, I believe these ideas will help make machines more intelligent and more suitable for performing actions in the real world.

Disgust detection example from CVPR 2012 paper

To give the vision hackers a few more details, this framework uses Structural SVMs (NOTE: trending topic at CVPR) and is able to estimate the probability of an action happening before it actually finishes. This is something which we, humans, seem to do all the time but has been somehow neglected by machine vision researchers.

Max-Margin Early Event Detectors.
Hoai, Minh & De la Torre, Fernando
CVPR 2012

Abstract:
The need for early detection of temporal events from sequential data arises in a wide spectrum of applications ranging from human-robot interaction to video security. While temporal event detection has been extensively studied, early detection is a relatively unexplored problem. This paper proposes a maximum-margin framework for training temporal event detectors to recognize partial events, enabling early detection. Our method is based on Structured Output SVM, but extends it to accommodate sequential data. Experiments on datasets of varying complexity, for detecting facial expressions, hand gestures, and human activities, demonstrate the benefits of our approach. To the best of our knowledge, this is the first paper in the literature of computer vision that proposes a learning formulation for early event detection.
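
To make "recognizing partial events" concrete, here is a minimal inference-only sketch. It is not the max-margin training procedure from the paper: it assumes a per-frame evidence score is already available (the `frame_scores` values below are hand-set and hypothetical) and simply fires as soon as the best-scoring segment ending at the current frame reaches a threshold, without waiting for future frames.

```python
def detect_early(frame_scores, threshold):
    """Return (fire_time, segment_start) for the first frame at which the best
    partial-event score reaches `threshold`, or None if it never does.

    frame_scores[t] is a per-frame evidence score (positive = looks like the event).
    The best segment ending at frame t is tracked online, so no future frames are needed.
    """
    best, start = 0.0, 0
    for t, score in enumerate(frame_scores):
        if best <= 0.0:
            # Restart: a fresh segment beginning at t is at least as good.
            best, start = score, t
        else:
            # Extend the current partial event by one frame.
            best += score
        if best >= threshold:
            return t, start
    return None


# Hypothetical scores: the event occupies frames 3..7.
scores = [-1, -1, -1, 1, 1, 1, 1, 1, -1]
print(detect_early(scores, threshold=2.5))  # fires at frame 5, two frames before the event ends
```

Lowering the threshold makes the detector fire earlier at the cost of more false alarms, which is the trade-off any early detector has to manage.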

Early Event Detector Project Page (code available on website)

Minh gave an excellent, enthusiastic, and entertaining presentation during day 3 of CVPR 2012 and was definitely one of the highlights of that day. He received his PhD from CMU's Robotics Institute (like me, yipee!) and is currently a Postdoctoral research scholar in Andrew Zisserman's group in Oxford. Let's all congratulate Minh for all his hard work.

CVPR 2012 Day 2: optimize, optimize, optimize

Due to popular request, here is my overview of some of the coolest stuff from Day 2 of CVPR 2012 in Providence, RI. While the Lobster dinner was the highlight for many of us, there were also some serious learning/optimization-based papers presented during Day 2 worthy of sharing. Here are some of the papers which left me with a very positive impression.

Dennis Strelow of Google Research in Mountain View presented a general framework for Wiberg minimization. This is a strategy for minimizing objective functions with multiple variables -- objectives which are typically tackled in an EM-style fashion. The idea is to express one of the variables as a linear function of the other variable, effectively making the problem depend on only one set of variables. The technique is quite general and has been shown to produce state-of-the-art results on a bundle adjustment problem. I know Dennis from my second internship at Google where we worked on some sparse-coding problems. If you perform lots of matrix decomposition problems, check out his paper!

Dennis Strelow
General and Nested Wiberg Minimization
CVPR 2012
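
The "eliminate one set of variables" idea can be illustrated on a tiny curve-fitting problem. This is a sketch of variable elimination in general, not of the algorithm in the paper: the model `y = a * exp(-b * x)`, the synthetic data, and the golden-section search over `b` are all assumptions chosen for brevity (Wiberg-style methods use a linearized, Gauss-Newton-like update instead of a 1-D search).

```python
import math


def best_amplitude(b, xs, ys):
    """Closed-form least-squares amplitude a for a fixed decay rate b."""
    basis = [math.exp(-b * x) for x in xs]
    return sum(y * e for y, e in zip(ys, basis)) / sum(e * e for e in basis)


def reduced_objective(b, xs, ys):
    """Squared error as a function of b alone, with a eliminated."""
    a = best_amplitude(b, xs, ys)
    return sum((y - a * math.exp(-b * x)) ** 2 for x, y in zip(xs, ys))


def golden_section_min(f, lo, hi, iterations=80):
    """Minimize a 1-D function on [lo, hi], assuming it is unimodal there."""
    ratio = (math.sqrt(5) - 1) / 2
    c, d = hi - ratio * (hi - lo), lo + ratio * (hi - lo)
    for _ in range(iterations):
        if f(c) < f(d):
            hi = d
        else:
            lo = c
        c, d = hi - ratio * (hi - lo), lo + ratio * (hi - lo)
    return (lo + hi) / 2


# Synthetic, noise-free data generated with a = 2.0 and b = 0.5.
xs = [0.5 * i for i in range(9)]
ys = [2.0 * math.exp(-0.5 * x) for x in xs]

b_hat = golden_section_min(lambda b: reduced_objective(b, xs, ys), 0.0, 2.0)
a_hat = best_amplitude(b_hat, xs, ys)
print(a_hat, b_hat)
```

The outer search only ever sees `b`; the amplitude is recomputed in closed form on every evaluation, so the two-variable problem behaves like a one-variable one.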

Another cool paper which is all about learning is Hossein Mobahi's algorithm for optimizing objectives by smoothing them to avoid getting stuck in local minima. This paper is not about blurry images, but about applying Gaussians to objective functions. In fact, for the problem of image alignment, Hossein provides closed form versions of image operators. Now when you apply these operators to images, you efficiently smooth the underlying cross-correlation alignment objective. You decrease the blur, while following the optimum path, and get much nicer answers than doing naive image alignment.

Hossein Mobahi, C. Lawrence Zitnick, Yi Ma
Seeing through the Blur
CVPR 2012
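
The smooth-then-sharpen strategy can be tried on a 1-D toy objective. This is a sketch under stated assumptions, not the image-alignment method from the paper: the objective `f(x) = 0.5*x**2 - cos(3*x)`, the step size, and the schedule of `sigma` values are all made up for illustration. The objective was picked because its Gaussian-smoothed gradient has a closed form.

```python
import math


def smoothed_gradient(x, sigma):
    """Gradient of f(x) = 0.5*x**2 - cos(3*x) after convolving f with a Gaussian of width sigma.

    Smoothing leaves the quadratic's gradient unchanged and damps the
    oscillating term by a factor exp(-4.5 * sigma**2).
    """
    return x + 3.0 * math.exp(-4.5 * sigma ** 2) * math.sin(3.0 * x)


def descend(x, sigma, steps=200, lr=0.05):
    """Plain gradient descent on the objective smoothed at a fixed sigma."""
    for _ in range(steps):
        x -= lr * smoothed_gradient(x, sigma)
    return x


def coarse_to_fine(x, sigmas=(1.5, 1.0, 0.5, 0.25, 0.0)):
    """Start with heavy smoothing, then decrease it while following the optimum."""
    for sigma in sigmas:
        x = descend(x, sigma)
    return x


naive = descend(2.5, sigma=0.0, steps=1000)   # gets stuck in a local minimum near x = 1.87
smoothed = coarse_to_fine(2.5)                # ends up at the global minimum x = 0
print(naive, smoothed)
```

Both runs start from the same point and spend the same number of gradient steps; the only difference is the smoothing schedule.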

Ira Kemelmacher-Shlizerman, of Photobios fame, showed a really cool algorithm for computing optical flow between two different faces based on learning a subspace (using a large database of faces). The idea is quite simple and allows for flowing between two very different faces where the underlying operation produces a sequence of intermediate faces in an interpolation-like manner. She shared this video with us during her presentation, but it is on Youtube, so now you can enjoy it for yourself.

Ira Kemelmacher-Shlizerman, Steven M. Seitz
Collection Flow
CVPR 2012

Now talk about cool ideas! Pyry, of CMU fame, presented a recommendation engine for classifiers. The idea is to take techniques from collaborative filtering (think Netflix!) and apply them to the classifier selection problem. Pyry has been working on action recognition and the ideas presented in this work are not only quite general, but are also quite intuitive and likely to benefit anybody working with large collections of classifiers.

Pyry Matikainen, Rahul Sukthankar, Martial Hebert
Model Recommendation for Action Recognition
CVPR 2012
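
As a rough illustration of treating classifier selection as a recommendation problem, here is a nearest-neighbor collaborative filtering sketch. It is an assumption about how such a recommender could look, not the method from the paper: the `library` of per-task scores, the task names, and the probe scores are all hypothetical.

```python
def recommend_model(library, probe_scores):
    """Recommend an untested model for a new task.

    library: {task_name: [score of model 0, score of model 1, ...]} for known tasks.
    probe_scores: {model_index: score measured on the new task} for a few tried models.
    Returns (index of the recommended untested model, name of the most similar known task).
    """
    def distance(task_scores):
        # Compare tasks only on the models that were actually tried on the new task.
        return sum((task_scores[m] - s) ** 2 for m, s in probe_scores.items())

    nearest = min(library, key=lambda name: distance(library[name]))
    untested = [m for m in range(len(library[nearest])) if m not in probe_scores]
    best = max(untested, key=lambda m: library[nearest][m])
    return best, nearest


# Hypothetical "ratings": how well each of four classifiers did on three known tasks.
library = {
    "running": [0.9, 0.8, 0.2, 0.3],
    "waving": [0.2, 0.3, 0.9, 0.7],
    "jumping": [0.8, 0.9, 0.3, 0.1],
}
# On the new task only classifiers 0 and 1 have been evaluated so far.
print(recommend_model(library, {0: 0.25, 1: 0.35}))
```

The new task "rates" classifiers 0 and 1 the way the waving task does, so the sketch recommends the classifier that did best on waving among those not yet tried.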

And finally, a super-easy algorithm presented for metric learning by Martin Köstinger had me intrigued! This is a Mahalanobis distance metric learning paper which uses equivalence relationships. This means that you are given pairs of similar items and pairs of dissimilar items. The underlying algorithm is really not much more than fitting two covariance matrices, one to the positive equivalence relations, and another to the non-equivalence relations. They have lots of code online, and if you don't believe that such a simple algorithm can beat LMNN (Large-Margin Nearest Neighbor from Kilian Weinberger), then get their code and hack away!

Martin Köstinger, Martin Hirzer, Paul Wohlhart, Peter M. Roth, Horst Bischof
Large Scale Metric Learning from Equivalence Constraints
CVPR 2012
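
The "fit two covariance matrices" recipe can be sketched for 2-D points as follows. This is a minimal illustration, not the authors' code: it assumes the metric is the inverse covariance of similar-pair differences minus the inverse covariance of dissimilar-pair differences, it omits the projection onto positive semi-definite matrices that a full implementation would apply, and the point pairs are hypothetical.

```python
def difference_covariance(pairs):
    """2x2 covariance of pairwise differences (treated as zero-mean)."""
    sxx = sxy = syy = 0.0
    for (ax, ay), (bx, by) in pairs:
        dx, dy = ax - bx, ay - by
        sxx += dx * dx
        sxy += dx * dy
        syy += dy * dy
    n = len(pairs)
    return [[sxx / n, sxy / n], [sxy / n, syy / n]]


def inverse_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def fit_metric(similar_pairs, dissimilar_pairs):
    """M = inv(cov of similar differences) - inv(cov of dissimilar differences)."""
    inv_s = inverse_2x2(difference_covariance(similar_pairs))
    inv_d = inverse_2x2(difference_covariance(dissimilar_pairs))
    return [[inv_s[i][j] - inv_d[i][j] for j in range(2)] for i in range(2)]


def metric_distance(metric, a, b):
    """Squared Mahalanobis-style distance (a - b)^T M (a - b) for a symmetric M."""
    dx, dy = a[0] - b[0], a[1] - b[1]
    return metric[0][0] * dx * dx + 2 * metric[0][1] * dx * dy + metric[1][1] * dy * dy


# Hypothetical data: the first coordinate separates classes, the second is nuisance variation.
similar = [((0.0, 0.0), (0.1, 1.0)), ((1.0, 2.0), (0.9, 1.0)),
           ((2.0, 1.0), (2.1, 0.0)), ((3.0, 0.0), (2.9, 1.0))]
dissimilar = [((0.0, 0.0), (2.0, 1.0)), ((1.0, 2.0), (3.0, 1.0)),
              ((2.0, 1.0), (0.0, 0.0)), ((3.0, 0.0), (1.0, 1.0))]

metric = fit_metric(similar, dissimilar)
d_same = metric_distance(metric, (0.0, 0.0), (0.1, 3.0))  # differs mostly in the nuisance coordinate
d_diff = metric_distance(metric, (0.0, 0.0), (2.0, 0.0))  # differs in the informative coordinate
print(d_same, d_diff)
```

The learned metric puts almost all of its weight on the first coordinate, so a large change in the nuisance coordinate costs far less than a change in the informative one.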

CVPR 2012 gave us many very math-oriented papers, and while I cannot list all of them, I hope you found my short list useful.

Tuesday, June 19, 2012

CVPR 2012 Day 1: Accidental Cameras, Large Jigsaws, and Cosegmentation

Today ended the first day of CVPR 2012 in Providence, RI. And here's a quick recap:

On the administrative end of things, Deva Ramanan received an award for his contributions to the field as a new young CVPR researcher. This is a new nomination-based award so be sure to vote for your favorite vision scientists next year! Deva's work has truly influenced the field and he is well-known for being a co-author of the Felzenszwalb et al. DPM object detector, but since then he has pushed his ideas on part-based models to the next level. Congratulations Deva, you are the type of researcher we should all strive to be.
Secondly, it looks like CVPR 2015 will be in Boston.
Here are some noteworthy papers from the oral sessions of Day 1:

During the first oral session, Antonio Torralba gave an intriguing talk where he showed the world how accidental anti-pinhole and pin-speck cameras are "all around us." In his presentation, he showed how a person walking in front of a window can be used to image the world outside of a window. Additionally he showed a variant of image-based Van Eck phreaking, where his technique could be used to view what is on a person's computer screen without having to look at the screen directly.

Accidental pinhole and pinspeck cameras: revealing the scene outside the picture
Antonio Torralba and William T. Freeman
CVPR 2012

Andrew Gallagher gave a really great presentation on using computer vision to solve jigsaw puzzles, where not only are the pieces jumbled, but their orientation is unknown. His algorithm was used to solve really really large puzzles, ones which are much larger than could be tackled by a human.

Jigsaw Puzzles with Pieces of Unknown Orientation
Andrew Gallagher
CVPR 2012

Gunhee Kim presented his newest work on co-segmentation. He has been working on this for quite some time and if you are interested in segmentation in image collections, you should definitely check it out.

On Multiple Foreground Cosegmentation
Gunhee Kim and Eric P. Xing
CVPR 2012

Sunday, June 17, 2012

Workshop on Egocentric Vision @ CVPR 2012

Today (Sunday 6/17/2012) is the second day of CVPR 2012 workshops and I'll be going to the Egocentric Vision workshop. The workshop kicks off at 8:50am (come earlier for some CVPR breakfast) and will start with a keynote talk by Takeo Kanade. There will also be a talk by Hartmut Neven of Neven Vision, which is now a part of Google. Also, during the poster session, my colleague Abhinav Shrivastava will be presenting his work on applying ExemplarSVMs to detection from a first-person point of view --- yet another super-cool application of ExemplarSVMs.

Object detection from first person's view using exemplar SVMs

There are plenty of other cool talks during this workshop including: action recognition from a first-person point of view, experience classification, as well as a study of the obtrusiveness of wearable computing platforms by some fellow MIT vision hackers.

The accuracy-obtrusiveness tradeoff for wearable vision platforms

You might be thinking, "What is egocentric vision?" but nothing explains it better than the following video from Google about its super exciting research project codenamed Project Glass. I'm really hoping Hartmut talks about this...

If you're looking for me, you know where I'll be tomorrow. Happy computing.

Wednesday, May 23, 2012

Why your vision lab needs a reading group

I have a certain attitude when it comes to computer vision research -- don't do it in isolation. Reading vision papers on your own is not enough. Learning how your peers analyze computer vision ideas will only strengthen your own understanding of the field and help you become a more critical thinker. And that is why at places like CMU and MIT we have computer vision reading groups. The computer vision reading group at CMU (also known as MISC-read to the CMU vision hackers) has a long tradition, and Martial Hebert has made sure it is a strong part of the CMU vision culture. Other ex-CMU hackers such as Sanjiv Kumar have continued the vision reading group tradition onto places such as Google Research in NY (correct me if this is no longer the case). I have continued the reading group tradition to MIT (where I'm currently a postdoc) because I was surprised there wasn't one already! In reality, we spend so much time talking about papers in an informal setting, that I felt it was a shame to not do so in a more organized fashion.

Image courtesy of Platypus

My personal philosophy is that as a vision researcher, the way towards the goal of creating novel long-lasting ideas is learning how others think about the field. There's a lot of value in being able to analyze, criticize, and re-synthesize other researchers' ideas. Believe me when I say that a lot of new vision papers come out of top tier vision conferences every year. You should be reading them! But not just reading, also criticizing them among your peers. Because once you learn to criticize others' ideas, you will become better at promulgating your own. Do not equate criticism with nasty words for the sake of being nasty -- good criticism stems from a keen understanding of what must be done in science to convince a broad audience of your ideas.

In case you want to start your own computer vision reading group, I've collected some tips, tricks, and advice:

1. You don't need faculty. If you can't find a seasoned vision veteran to help you organize the event, do not worry. You just need 3+ people interested in vision and the motivation to maintain weekly meetings. Who cares if you don't understand every detail of every paper! Nobody besides the authors will ever understand every detail.

2. Be fearless. Ask dumb questions. Alyosha Efros taught me that if you're reading a paper or listening to a presentation and you don't understand something, then there's a good chance you're not the only one in the audience with the same questions. Sometimes younger PhD students are afraid of "asking a dumb question" in front of an audience. But if you love knowledge, then it is your duty to ask. Silence will not get you far. Be bold, be curious, and grow wise.

3. Choose your own papers to present. Do not present papers that others want you to present -- that is better left for a seminar course led by a faculty member. In a reading group it is very important that you care about the problems you will be discussing with your peers. If you keep up with this trend then when it comes to "paper writing time" you should be up to date on many relevant papers in your field and you will know about your other lab mates' research interests.

4. It is better to show a paper PDF up on a projector than cancel a meeting. Even if everybody is busy, and the presenter didn't have time to create slides, it is important to keep the momentum going.

5. After a major conference, have all of the people who attended the conference present their "top K papers." The week after CVPR it will be valuable to have such a massive vision brain dump onto your peers because it is unlikely that everybody got to attend.

6. Book a room every week and try to have the meeting at the same time and place. Have either the presenter or the reading group organizer send out an announcement with the paper they will be presenting ahead of time. At MIT we share a Google Doc with the information about interesting papers and the upcoming presenter usually chooses the paper one week in advance so that the following week's presenter doesn't choose the same paper. If somebody has already presented your paper, don't do it a second time! Choose another paper. cvpapers.com is a great resource to find upcoming papers.

At CMU, there is a long rotating schedule which includes every vision student and faculty member. Once it is your time to present, you can only get off the hook if you swap your slot with somebody else. Being on a schedule months in advance means you'll have lots of time to prepare your slides. At MIT, we are currently following the object recognition / scene understanding / object detection theme where we (Prof. Torralba, his students, his postdocs, his visiting students, etc) choose a paper highly relevant to our interests. By keeping such a focus, we can really jump into the relevant details without having to explain fundamental concepts such as SVMs, features, etc. However, at CMU the reading group is much broader because on the queue are students/profs interested in all aspects of vision and related fields such as graphics, illumination, geometry, learning, etc.

Wednesday, April 18, 2012

One Part Basis to Rule them All: Steerable Part Models

Last week, some of us vision hackers at MIT started an Object Recognition Reading Group. The group is currently in stealth-mode, but our goal is to analyze, criticize, and re-synthesize ideas from the object detection/recognition community. To inaugurate the group, I covered Hamed Pirsiavash's Steerable Part Models paper from the upcoming CVPR 2012 conference. As background reading, I had to go over the mathematical basics of learning with tensors (i.e., multidimensional arrays) which were outlined in their earlier NIPS 2009 paper, Bilinear Classifiers for Visual Recognition. After reading up on their work, I have a better grasp of what the trace operator actually does. It is nothing more than a Hermitian inner product defined on the space of linear operators from C^N to C^M (see post here for geometric interpretations of the trace).
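
For real-valued matrices, the inner product in question is tr(A^T B), which is the same number you get by multiplying the two matrices entry by entry and summing. A small pure-Python check, restricted to the real case (the complex case would use the conjugate transpose) and using arbitrary example matrices:

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def trace(m):
    return sum(m[i][i] for i in range(len(m)))


def trace_inner_product(a, b):
    """<A, B> = tr(A^T B) for real matrices of the same shape."""
    return trace(matmul(transpose(a), b))


def elementwise_inner_product(a, b):
    """Sum of entry-by-entry products, i.e. the dot product of the flattened matrices."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8, 9], [10, 11, 12]]
print(trace_inner_product(A, B), elementwise_inner_product(A, B))  # both are 217
```

In other words, scoring a matrix-shaped feature against a matrix-shaped template with the trace is just an ordinary dot product that happens to remember the 2-D layout.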

Hamed Pirsiavash, Deva Ramanan, "Steerable part models", CVPR 2012

"Our representation can be seen as an approach to sharing parts."
-- H. Pirsiavash and D. Ramanan

The idea behind this paper is relatively simple -- instead of learning category-specific part-models, learn a part-basis from which all category-specific part models are derived. Consider the different parts learned from a deformable part model (see Felzenszwalb's DPM page for more info about DPMs) and their depiction below. If you take a close look you see that the parts are quite general, and it makes sense to assume that there is a finite basis from which these parts come.

Parts from a Part-model

The model learns a steerable basis by factoring the matrix of all part models into the product of two low rank matrices, and because the basis is shared across categories, this performs both dimensionality reduction (likely to help prevent over-fitting as well as speed up the final detectors) and sharing (likely to boost performance).

The learned steerable basis

While the objective function is not convex, it can be tackled via a simple alternating optimization algorithm where the resulting sub-objectives are convex and can be optimized using off-the-shelf Linear SVM solvers. They call this property bi-convexity, and it doesn't guarantee finding the global optimum, just makes using standard tools easy.
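
The alternating scheme is easy to see on the simplest bi-convex problem, a rank-one factorization. This is a sketch with a swapped-in loss: the paper solves each convex sub-problem with a linear SVM solver, whereas the toy below uses least squares so that each half-step has a closed form. The matrix and the starting point are arbitrary.

```python
def solve_columns(matrix, u):
    """With u fixed, the least-squares optimal v is v[j] = sum_i M[i][j]*u[i] / sum_i u[i]**2."""
    uu = sum(x * x for x in u)
    return [sum(matrix[i][j] * u[i] for i in range(len(u))) / uu
            for j in range(len(matrix[0]))]


def solve_rows(matrix, v):
    """With v fixed, the least-squares optimal u is u[i] = sum_j M[i][j]*v[j] / sum_j v[j]**2."""
    vv = sum(x * x for x in v)
    return [sum(row[j] * v[j] for j in range(len(v))) / vv for row in matrix]


def alternating_rank_one(matrix, iterations=20):
    """Minimize sum_ij (M[i][j] - u[i]*v[j])**2 by alternating two convex sub-problems."""
    u = [1.0] * len(matrix)
    for _ in range(iterations):
        v = solve_columns(matrix, u)
        u = solve_rows(matrix, v)
    return u, v


def squared_error(matrix, u, v):
    return sum((matrix[i][j] - u[i] * v[j]) ** 2
               for i in range(len(u)) for j in range(len(v)))


# An exactly rank-one matrix: the outer product of [1, 2, 3] and [4, 5].
M = [[4.0, 5.0], [8.0, 10.0], [12.0, 15.0]]
u, v = alternating_rank_one(M)
print(squared_error(M, u, v))
```

The objective is not convex in `u` and `v` jointly, but each half-step is, which is what lets a standard convex solver do all of the work.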

While the results on PASCAL VOC2007 do not show an improvement in performance, they show a significant computational speed-up. (VOC2007 is not a very good dataset for sharing, as only a few category combinations, e.g., bicycle and motorbike, should in theory benefit significantly from it.) Below is a picture of the part-based car model from Felzenszwalb et al, as well as the one from their steerable basis approach. Note that the HOG visualizations look very similar.

In conclusion, this is one paper worthy of checking out if you are serious about object recognition research. The simplicity of the approach is a strong point, and if you are a HOG-hacker (like many of us these days) then you will be able to understand the paper without a problem.

Tuesday, April 17, 2012

Using Panoramas for Better Scene Understanding

There's a lot more to automated object interpretation than merely predicting the correct category label. If we want machines to be able to one day interact with objects in the physical world, then predicting additional properties of objects such as their attributes, segmentations, and poses is of utmost importance. This has been one of the key motivations in my own research behind exemplar-based models of object recognition.

The same argument holds for scenes. If we want to build machines which understand environments around them, then they will have to do much more than predict some sloppy "scene category." Consider what happens when a machine automatically analyzes a picture and says that it is from the "theatre" category. Well, the picture could be of the stage, the emergency exit, or just about anything else within a theatre -- in each of these cases, the "theatre" category would be deemed correct, but would fall short of explaining the content of the image. Most scene understanding papers either focus on getting the scene category right, or strive to obtain a pixel-wise semantic segmentation map. However, there's more to scene categories than meets the eye.

Well, there is an interesting paper which will be presented this summer at the CVPR2012 Conference in Rhode Island which tries to bring the concept of "pose" into scene understanding. Pose-estimation has already been well established in the object recognition literature, but this is one of the first serious attempts to bring this new way of thinking into scene understanding.

J. Xiao, K. A. Ehinger, A. Oliva and A. Torralba.

Recognizing Scene Viewpoint using Panoramic Place Representation.
Proceedings of 25th IEEE Conference on Computer Vision and Pattern Recognition, 2012.

The SUN360 panorama project page also has links to code, etc.

The basic representation unit of places in their paper is that of a panorama. If you've ever taken a vision course, then you probably stitched some of your own. Below are some examples of cool looking panoramas from their online gallery. A panorama roughly covers the space of all images you could take while centered within a place.

Car interior panoramas from SUN360 page

Building interior panoramas from SUN360 page

What the proposed algorithm accomplishes is twofold. First it acts like an ordinary scene categorization system, but in addition to producing a meaningful semantic label, it also predicts the likely view within a place. This is very much like predicting that there is a car in an image, and then providing an estimate of the car's orientation. Below are some pictures of inputs (left column), a compass-like visualization which shows the orientation of the picture (with respect to a cylindrical panorama), as well as a depiction of the likely image content to fall outside of the image boundary. The middle column shows per-place mean panoramas (in the style of TorralbaArt), as well as the input image aligned with the mean panorama.

I think panoramas are a very natural representation for places, perhaps not as rich as a full 3D reconstruction of places, but definitely much richer than static photos. If we want to build better image understanding systems, then we should seriously start looking at using richer sources of information as compared to static images. There is only so much you can do with static images and MTurk, thus videos, 3D models, panoramas, etc are likely to be big players in the upcoming years.

Thursday, March 08, 2012

"I shot the cat with my proton gun."

I often listen to lectures and audiobooks when I drive more than 2 hours because I don't always have the privilege of enjoying a good conversation with a passenger. Recently I was listening to some philosophy of science podcasts on my iPhone while driving from Boston to New York when the following sentence popped into my head:

"I shot the cat with my proton gun."

I had just listened to three separate podcasts (one about Kant, one about Wittgenstein and one about Popper) when the sentence came to my mind. What is so interesting about this sentence is that while it is effortless to grasp, it uses two different types of concepts in a single sentence, a "proton gun" and a "cat." It is a perfectly normal sentence, and the above illustration describes the sentence fairly well (photo credits to http://afashionloaf.blogspot.com/2010/03/cat-nap-mares.html for the kitty, and http://www.colemanzone.com/ for the proton gun).

Cat == an "everyday" empirical concept

"Cat" is an everyday "empirical" concept, a concept with which most people have first hand experience (i.e., empirical knowledge). It is commonly believed that such everyday concepts are acquired by children at a young age -- it is an example of a basic level concept which people like Immanuel Kant and Ludwig Wittgenstein discuss at great length. We do not need a theory of cats for the idea of a cat to stick.

Image from shadowpaw99

Proton Gun == a "scientific" theoretical concept

On the other extreme is the "proton gun." It is an example of a theoretical concept -- a type of concept which rests upon classroom (i.e., "scientific") knowledge. The idea of a proton gun is akin to the idea of Pluto, an esophagus or cancer -- we do not directly observe such entities, we learn about them from books and by seeing illustrations such as the one below. Such theoretical constructs are the entities which Karl Popper and the Logical Positivists would often discuss.

While many of us have never seen a proton (nor a proton gun), it is a perfectly valid concept to invoke in my sentence. If you have a scientific background, then you have probably seen so many artistic renditions of protons (see Figure below) and spent so many endless nights studying for chemistry and physics exams, that the word proton conjures a mental image. It is hard for me to think of entities which trigger mental imagery as non-empirical.

How do we learn such concepts? The proton gun comes from scientific education! The cat comes from experience! But since the origins of the concept "proton" and the concept "cat" are so disjoint, our (human) mind/brain must be more-amazing-than-previously-thought because we have no problem mixing such concepts in a single clause. It does not feel like these two different types of concepts are stored in different parts of the brain.

The idea which I would like you, the reader, to entertain over the next minute or so is the following:

Perhaps the line between ordinary "empirical" concepts and complex "theoretical" concepts is an imaginary boundary -- a boundary which has done more harm than good.

One useful thing I learned from Philosophy of Science is that it is worthwhile to doubt the existence of theoretical entities. Not for iconoclastic ideals, but for the advancement of science! Descartes' hyperbolic doubt is not dead. Another useful thing to keep in mind is Wittgenstein's Philosophical Investigations and his account of the acquisition of knowledge. Wittgenstein argued elegantly that "everyday" concepts are far from "easy-to-define" (see his family resemblance argument and his argument on defining a "game"). Kant, with his transcendental aesthetic, has taught me to question a hardcore empiricist account of knowledge.

So then, as good cognitive scientists, researchers, and pioneers in artificial intelligence, we must also doubt the rigidity of those everyday concepts which appear to us so ordinary. If we want to build intelligent machines, then we must be ready to break down our own understanding of reality, and not be afraid to question things which appear unquestionable.

In conclusion, if you find popular culture references more palatable than my philosophical pseudo-science mumbo-jumbo, then let me leave you with two inspirational quotes. First, let's not forget Pink Floyd's lyrics which argued against the rigidity of formal education: "We don't need no education, We don't need no thought control." Second, a misunderstood yet witty aphorism from Dr. Timothy Leary reminds us that there is a time for education and there is a time for reflection. In his own words: "Turn on, tune in, drop out."

Friday, January 27, 2012

drawing sexy graphs in matlab

Everybody who loves computer science loves graphs. But the fat 'n juicy graphs, the ones with complex structure, you just gotta visualize. To enjoy these beautiful data structures, the hackers at AT&T gave us, the world, Graphviz as a powerful tool for visualizing complex graphs in two dimensions. I do a lot of stuff in Matlab, so I've put my simple graphviz matlab wrappers up on Github so everybody can enjoy them. I do a lot of stuff with graphs...

My repository, which I'm already using as a submodule in many of my projects, can be found here:
https://github.com/quantombone/graphviz_matlab_magic

Here is a matlab script (included as a Github gist), which should be run in an empty directory; it will download a nice mat file, clone my repo, and show the following nice graph. I perform two graphviz passes: the first one is used to read the graphviz coordinates (from the sfdp embedding), and I then use Matlab's jet colormap to color the edges based on distances in this space. In other words, nearby nodes which are connected will be connected by red (hot) edges and faraway nodes will be connected by blue (cold) edges.
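
If you don't have Matlab handy, the edge-coloring idea can be sketched in a few lines of Python. To be clear, this is not the code from my repo or the gist, just a minimal illustration under some assumptions: the node coordinates are taken as already known (in the real pipeline they come from the first sfdp pass), the `jet` function is a simple piecewise-linear approximation of Matlab's jet colormap, and the names `jet`, `edge_colors`, and `to_dot` are made up for this example.

```python
import math


def jet(t):
    """Piecewise-linear approximation of a jet colormap.

    t = 0 maps to dark blue (cold), t = 1 maps to dark red (hot).
    Returns an (r, g, b) tuple with components in [0, 1].
    """
    t = min(max(t, 0.0), 1.0)
    r = min(max(1.5 - abs(4.0 * t - 3.0), 0.0), 1.0)
    g = min(max(1.5 - abs(4.0 * t - 2.0), 0.0), 1.0)
    b = min(max(1.5 - abs(4.0 * t - 1.0), 0.0), 1.0)
    return (r, g, b)


def edge_colors(coords, edges):
    """Return one hex color per edge: short edges hot (red), long edges cold (blue).

    coords: dict mapping node name -> (x, y), assumed to come from a layout pass.
    edges:  list of (u, v) node-name pairs.
    """
    if not edges:
        return []
    lengths = [
        math.hypot(coords[u][0] - coords[v][0], coords[u][1] - coords[v][1])
        for u, v in edges
    ]
    lo, hi = min(lengths), max(lengths)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all edges are equal
    colors = []
    for d in lengths:
        heat = 1.0 - (d - lo) / span  # shortest edge -> 1.0, longest -> 0.0
        colors.append("#%02x%02x%02x" % tuple(int(c * 255 + 0.5) for c in jet(heat)))
    return colors


def to_dot(coords, edges):
    """Build DOT text with pinned node positions and per-edge colors."""
    lines = ["graph G {"]
    for name, (x, y) in coords.items():
        lines.append('  %s [pos="%g,%g!"];' % (name, x, y))
    for (u, v), color in zip(edges, edge_colors(coords, edges)):
        lines.append('  %s -- %s [color="%s"];' % (u, v, color))
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy coordinates standing in for the output of the first layout pass.
    coords = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (10.0, 0.0)}
    edges = [("a", "b"), ("a", "c")]
    print(to_dot(coords, edges))
```

The resulting DOT text is what a second Graphviz pass would render. Normalizing by the shortest and longest edge keeps the full color range in use no matter what scale the layout coordinates happen to be in.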

The matrix visualized comes from an electromagnetic model; the details can be found here: http://www.cise.ufl.edu/research/sparse/matrices/Bai/qc324.html

The original picture generated by Yifan Hu is here for comparison:

Enjoy
--Tomasz

Tuesday, January 10, 2012

100,000+ page views on my computer vision blog

I like high-risk / high-reward activity. While some say that this is my temperament (perhaps a vestige of youth?) I simply say: "that's how I roll." Maybe I was too young when I read Kuhn's Structure of Scientific Revolutions, or maybe I was born with iconoclastic ideals, but I earnestly believe that life is too short to always do what you've been told. One of my favorite maxims is the following: "The only limits we have are the ones we impose upon ourselves."

I took a gamble when I started this blog, blurring the line between all things related to computer vision, philosophy, artificial intelligence, machine learning, and other fun things which constitute my intellectual life. During my PhD I was even discouraged from blogging, because "my superiors" incessantly reminded me that "you get famous by writing CVPR papers" and not by wasting time maintaining a "cute" blog. Today I'd like to argue that my adventure in blogging has not been a failure at all!

I had multiple reasons for wanting to blog, several of which I list below:

I wanted to practice my writing, and what better way to practice writing than by writing!
I wanted an outlet to discuss certain ideas which I find invaluable in my pursuit of building intelligence, but which aren't necessarily publishable. On my blog I am the sole contributor, the sole editor. If you don't like what I have to say, start your own blog. I don't need anonymous reviews; the CVPR submission process stresses me out enough for one lifetime.
I wanted a medium to advertise my own work as well as other works which I find important for graduate students in Computer Vision to know about.
I wanted to expose the field of Computer Vision to a broader audience and hopefully get others excited about this amazing research field.

Today I'm glad to announce that according to statcounter, my computer vision blog has reached over 100,000 views. In an absolute sense, this really is nothing to be excited about. But since my CMU homepage has approximately 30,000 views, this means that my blog is 3x as popular as my academic homepage! Next goal: 1,000,000 page views!

I actually meet more people who know me through my blog than through my research papers, even though I put in 100x the effort in doing the research behind those papers. I don't plan on taking up blogging full time anytime soon, but it feels good to know that my blogging adventure has paid off.

Here are some of the top keywords which have been used to find my blog:

computer vision blog

tombone

cmu computer vision blog

newton fractal matlab

tombone's blog

Here are some of my most popular blog posts of all time:

Computer Vision is Artificial Intelligence

The vision hacker culture at Google

graph visualizations as sexy as fractals

Simple Newton's Method Fractal code in MATLAB

Kinect Object Datasets: Berkeley's B3DO, UW's RGB-D, and NYU's Depth Dataset

I encourage anybody who reads my blog to shoot me a quick "yo what's up!" at a local conference or wherever else our paths might cross. I also encourage everybody to suggest the types of things they would like to read about on my blog.