Friday, July 03, 2009

I have been an anti-realist since my freshman year of college. Due to my lack of philosophical vocabulary, I might even have called myself an idealist back then, though looking back it would have been much better to use the word 'anti-realist.' I was mainly opposed to the correspondence theory of truth, which presupposes an external, observer-independent reality to which our thoughts and notions are supposed to adhere. It was in the context of the Philosophy of Science that I acquired my strong anti-realist views (developed while taking Quantum Mechanics, Epistemology, and Artificial Intelligence courses at the same time). Pragmatism -- the philosophy of William James -- was the single view that best summarized my position. While pragmatism is a rejection of absolutes, an abandonment of metaphysics, it does not get in the way of making progress in science. It is merely a new perspective on science, a view that does not undermine the creativity of the creator of scientific theories, a re-rendering of the scientist as more of an artist and less of a machine.
However, pragmatism is not the anything-goes postmodern philosophy that many believe it to be. It is as if there is something about the world that compels scientists to do science in a similar way and compels ideas to converge. I recently came across the concept of Linguistic Idealism, and as a recent reader of Wittgenstein I find it a truly novel concept. Linguistic Idealism is a sort of dependence on language, the Game-of-all-games that we (humans) play. It is the epiphany that all statements we make about the world are statements within the customs of language -- a somewhat arbitrary set of rules we follow when we communicate -- and this undermines the validity of those statements as correspondences to an external reality. Philosophers such as Sellars have gone as far as to say that all awareness is linguistically mediated. If we step back, can we say anything at all about perception?
I'm currently reading a book on Wittgenstein called "Wittgenstein's Copernican Revolution: The Question of Linguistic Idealism."
Tuesday, June 16, 2009
On Edelman's "On what it means to see"
I have previously mentioned Shimon Edelman on this blog and explained why his ideas are important for the advancement of computer vision. Today I want to post a review of a powerful and potentially influential 2009 piece written by Edelman.
Below is a review of the June 16th, 2009 version of this paper:
Shimon Edelman, On what it means to see, and what we can do about it, in Object Categorization: Computer and Human Vision Perspectives, S. Dickinson, A. Leonardis, B. Schiele, and M. J. Tarr, eds. (Cambridge University Press, 2009, in press). Penultimate draft.
I will refer to the article as OWMS (On What it Means to See).
The goal of Edelman's article is to demonstrate the limitations of conceptual vision (referred to as "seeing as"), criticize the modern computer vision paradigm as being overly conceptual, and show how providing a richer representation of a scene is required for advancing computer vision.
Edelman proposes non-conceptual vision, where categorization isn't forced on the input -- "because the input may best be left altogether uninterpreted in the traditional sense." (OWMS) I have to agree with the author: abstracting the image away into a conceptual map is not only an impoverished view of the world, it is also unclear whether such a limited representation is useful for other tasks relying on vision (something like the bottom of Figure 1.2 in OWMS, or the figure below, taken from my Recognition by Association talk).
[Figure: Building a Conceptual Map = Abstracting Away]
Drawing on insights from the influential philosopher Ludwig Wittgenstein, Edelman discusses the difference between "seeing" and "seeing as." "Seeing as" is the easy-to-formalize, map-pixels-to-objects attitude that modern computer vision students are spoon-fed from the first day of graduate school -- and precisely the attitude Edelman attacks in this wonderful article. To explain "seeing," Edelman uses some nice prose from Wittgenstein's Philosophical Investigations; however, instead of repeating the passages Edelman selected, I will complement the discussion with a relevant passage from William James:
The germinal question concerning things brought for the first time before consciousness is not the theoretic "What is that?" but the practical "Who goes there?" or rather, as Horwicz has admirably put it, "What is to be done?" ... In all our discussions about the intelligence of lower animals the only test we use is that of their acting as if for a purpose. (William James in Principles of Psychology, page 941)
"Seeing as" is a non-invertible process that abstracts away visual information to produce a lower dimensional conceptual map (see Figure above), whereas "seeing" provides a richer representation of the input scene. Its not exactly clear what is the best way to operationalize this "seeing" notion in a computer vision system, but the escapability-from-formalization might be one of the subtle points Edelman is trying to make about non-conceptual vision. Quoting Edelman, when "seeing" we are "letting the seething mass of categorization processes that in any purposive visual system vie for the privilege of interpreting the input be the representation of the scene, without allowing any one of them to gain the upper hand." (OWMS) Edelman goes on to criticize "seeing as" because vision systems have to be open-ended in the sense that we cannot specify ahead of time all the tasks that vision will be applied to. According to Edelman, conceptual vision cannot capture the ineffability (or richness) of the human visual experience. Linguistic concepts capture a mere subset of visual experience, and casting the goal of vision as providing a linguistic (or conceptual) interpretation is limited. The sparsity of conceptual understanding is one key limitation of the modern computer vision paradigm. Edelman also criticizes the notion of a "ground-truth" segmentation in computer vision, arguing that a fragmentation of the scene into useful chunks is in the eye of the beholder.
To summarize, Edelman points out that "The missing component is the capacity for having rich visual experiences... The visual world is always more complex than can be expressed in terms of a fixed set of concepts, most of which, moreover, only ever exist in the imagination of the beholder." (OWMS) As a pragmatist, I find that many of these words resonate deeply within my soul, and I'm particularly attracted to elements of Edelman's antirealism.
I have to give two thumbs up to this article for pointing out the flaws in how computer vision scientists currently tackle vision problems: researchers too often work blindly inside the current computer vision paradigm and too rarely question the fundamental assumptions that could help new paradigms arise. I have already pointed out many similar concerns regarding Computer Vision on this blog, and it is reassuring to find others pointing to the same paradigmatic weaknesses. Such insights need to somehow leave the Philosophy/Psychology literature and make a long-lasting impact on the CVPR/NIPS/ICCV/ECCV/ICML communities. The problem is that too many researchers/hackers actually building vision systems and teaching Computer Vision courses have no clue who Wittgenstein is, nor that they can gain invaluable insights from Philosophy and Psychology alike. Computer Vision is simply not lacking in computational methods; what it lacks are the critical insights that cannot be found inside an Emacs buffer. To advance the field, one needs to read, write, philosophize, as well as mathematize, exercise, diversify; be a hacker, be a speaker, be one with the terminal, be one with prose; be a teacher, always a student, a master of all trades -- or simply put, be a Computer Vision Jedi.
Sunday, March 29, 2009
My 2nd Summer Internship in Google's Computer Vision Research Group
This summer I will be going for my 2nd summer internship at Google's Computer Vision Research Group in Mountain View, CA. My first real internship ever was last summer at Google -- I loved it.
There are many reasons for going back for the summer. Being in the research group and getting to address the same types of vision/recognition problems as in my PhD is very important to me. It is not just a typical software engineering internship -- I get a better overall picture of how object recognition research can impact the world at a large scale, the Google scale, before I finish my PhD and become set in my ways. Being in an environment where one can develop something super cool and, weeks later, millions of people see a difference in the way they interact with the internet (via Google's services, of course) is also super exciting. Finally, the computing infrastructure that Google has set up for its researchers/engineers is unrivaled when it comes to large-scale machine learning.
Many Google researchers (such as Fernando Pereira) are big advocates of the data-driven mentality, where massive amounts of data coupled with simple algorithms hold more promise than complex algorithms with small amounts of training data. In earlier posts I mentioned that my advisor at CMU is a big advocate of this approach in Computer Vision. This "Unreasonable Effectiveness of Data" is a powerful mentality, yet difficult to embrace with the computational resources offered by one's computer science department. But the data-driven paradigm is not only viable at Google -- it is the essence of Google.
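As a toy illustration of this mentality (my own sketch, not Google's actual pipeline), consider how a trivially simple algorithm -- nearest-neighbor lookup -- becomes powerful once the database behind it is large. The feature vectors and labels below are hypothetical stand-ins for web-scale data:

```python
import numpy as np

def nearest_neighbor_label(query, database, labels):
    """With enough data, plain 1-nearest-neighbor on raw features can rival
    a complex model trained on a small set -- the data does the work."""
    dists = np.linalg.norm(database - query, axis=1)
    return labels[int(np.argmin(dists))]

# Hypothetical stand-in for a web-scale collection of labeled feature vectors.
rng = np.random.default_rng(0)
database = rng.standard_normal((100_000, 128))   # feature vectors
labels = rng.integers(0, 10, size=100_000)       # associated labels
query = rng.standard_normal(128)
print(nearest_neighbor_label(query, database, labels))
```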
Tuesday, January 13, 2009
Computer Vision Courses, Measurement, and Perception
The new semester began at CMU and I'm happy to announce that I'm TAing my advisor's 16-721 Learning Based Methods in Vision this semester. I'm also auditing Martial Hebert's Geometry Based Methods in Vision.
This semester we're trying to encourage students of 16-721 LBMV09 to discuss papers using a course discussion blog. Quicktopic has been used in the past, but this semester we're using Google's Blogger.com for the discussion!
In the first lecture of LBMV, we discussed the problem of Measurement versus Perception in a Computer Vision context. The idea is that while we can build vision systems to measure the external world, it is percepts such as "there is a car at the bottom of the image," and not measurements such as "the bottom of the image is gray," that we are ultimately interested in. However, the line between measurement and perception is somewhat blurry. Consider the following gedanken experiment: place a human in a box, and feed him an image along with the question "is there a car at the bottom of the image?" Is it legitimate to call this apparatus a measurement device? If so, isn't perception a type of measurement? We would still have the problem of building a second copy of this measurement device -- different people have different notions of cars, and once we start feeding the two apparatuses examples of objects that are very close to trucks/buses/vans/cars, we would lose measurement repeatability.
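Here is a small sketch of the distinction, under my own assumptions: the "measurement" below is a repeatable pixel statistic, while the "percept" routes through a hypothetical car classifier standing in for the human-in-a-box.

```python
import numpy as np

def measure_bottom_grayness(image):
    """A measurement: a repeatable statistic computed from the pixels."""
    bottom = image[int(0.8 * image.shape[0]):, :]   # bottom 20% of the image
    return float(bottom.mean())

def perceive_car_at_bottom(image, car_classifier):
    """A 'percept': the answer routes through a learned, observer-dependent
    notion of 'car', so repeatability across observers is not guaranteed."""
    bottom = image[int(0.8 * image.shape[0]):, :]
    return car_classifier(bottom) > 0.5

# 'car_classifier' is a hypothetical stand-in for the human-in-a-box.
image = np.zeros((240, 320))
print(measure_bottom_grayness(image))                    # 0.0
print(perceive_car_at_bottom(image, lambda patch: 0.1))  # False
```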
This whole notion of measurement versus perception in computer vision is awfully similar to the problem of theory-ladenness of observation in the philosophy of science. Thomas Kuhn would say that the window through which we peer (our scientific paradigm) circumscribes the world we see, and thus it is not possible to make theory-independent observations. For a long time I have been a proponent of this postmodern view of the world. The big question that remains: for computer vision to be successful, how much consensus must there be between human perception and machine perception? If, according to Kuhn, Aristotelian and Galilean physicists would have different "observations" of the same experiment, should we expect intelligent machines to see the same world that we see?
Wednesday, March 19, 2008
Understanding the past
While a certain degree of advancement is possible when working on a scientific problem in isolation, interaction with the scientific community can drastically hasten one's progress. Most people have their own experience of 'isolation' and 'interaction with a community,' but I should explicitly delineate how I intend to use these terms. While 'interaction with a community' usually implies two-sided communication, such as directly working together on a problem or discussing one's research with a group of other scientists, I want to consider a subtler form of interaction.
By reading about past accomplishments and former ideologies in a particular field, one is essentially communicating with the ideas of the past. While many scholarly articles -- in a field such as Computer Vision -- are mostly devoted to algorithmic details and experimental evaluations, it isn't too difficult to find manuscripts which reveal the philosophic underpinnings of the proposed research. It is even possible to find papers which are entirely devoted to understanding the philosophical motivations of a past generation of research.
A prime example of interaction with the past is the paper "Object Recognition in the Geometric Era: A Retrospective," by Joseph L. Mundy from Brown University. Such a compilation of ideas -- perhaps even a mini-summa -- is quite accessible to any researcher in the field of Computer Vision. Avoiding the specific details of any algorithm developed in the so-called Geometric Era of Computer Vision, this text is both entertaining and highly educational. By reading such texts one is effectively communicating (albeit one-way) with a larger scientific community of the past.
To conclude, I would like to point out that I do not fully agree with the past paradigms of Computer Vision, nor am I a die-hard proponent of the modern statistical machine learning school of thought. However, when exploring new territories, what better way to scope out the world around you than by standing on the shoulders of giants? We should be aware of what has been done in the past, and sometimes de-emphasize algorithmic minutiae in order to understand the philosophical motivations behind former paradigms.