Friday, July 31, 2009

Simple Newton's Method Fractal code in MATLAB

By popular request, I'm sharing some very simple Newton's Method fractal code in MATLAB. It produces the following 800x800 image (in about 2.5 seconds on my 2.4GHz MacBook Pro):

>> [niters,solutions] = matlab_fractal;
>> imagesc(niters)

function [niters,solutions] = matlab_fractal
%Create Newton's Method Fractal Image
%Tomasz Malisiewicz (
NITER = 40;
threshold = .001;

[xs,ys] = meshgrid(linspace(-1,1,800), linspace(-1,1,800));
solutions = xs(:) + 1i*ys(:);
select = 1:numel(xs);
niters = NITER*ones(numel(xs), 1);

for iteration = 1:NITER
  oldi = solutions(select);

  %in Newton's method we have z_{i+1} = z_i - f(z_i) / f'(z_i)
  solutions(select) = oldi - f(oldi) ./ fprime(oldi);

  %check for convergence or NaN (division by zero)
  differ = (oldi - solutions(select));
  converged = abs(differ) < threshold;
  problematic = isnan(differ);

  niters(select(converged)) = iteration;
  niters(select(problematic)) = NITER+1;
  select(converged | problematic) = [];
end

niters = reshape(niters,size(xs));
solutions = reshape(solutions,size(xs));

function res = f(x)
res = x.^3 - 1;

function res = fprime(x)
res = 3*x.^2;
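For readers without MATLAB, here is a rough NumPy port of the same vectorized scheme (a sketch under my own choices of function name, signature, and error handling, not part of the original post):

```python
import numpy as np

def newton_fractal(n=800, niter=40, threshold=1e-3):
    """Vectorized Newton's method for f(z) = z^3 - 1 over [-1,1]^2."""
    xs, ys = np.meshgrid(np.linspace(-1, 1, n), np.linspace(-1, 1, n))
    z = (xs + 1j * ys).ravel()
    niters = np.full(z.size, niter)
    select = np.arange(z.size)          # indices of points still iterating
    for iteration in range(1, niter + 1):
        old = z[select]
        # Newton update: z <- z - f(z)/f'(z)
        with np.errstate(divide='ignore', invalid='ignore'):
            z[select] = old - (old**3 - 1) / (3 * old**2)
        diff = old - z[select]
        converged = np.abs(diff) < threshold
        problematic = np.isnan(diff)    # division by zero at z = 0
        niters[select[converged]] = iteration
        niters[select[problematic]] = niter + 1
        select = select[~(converged | problematic)]
    return niters.reshape(n, n), z.reshape(n, n)
```

As in the MATLAB version, the trick is shrinking `select` each iteration so that converged (or blown-up) points are never touched again; only the still-active subset is updated. Displaying `niters` with any colormap (e.g. `matplotlib.pyplot.imshow`) reproduces the fractal.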

Wednesday, July 15, 2009

Spin Images for object recognition in 3D Laser Data

Today's post is about 3D object recognition: the localization and recognition of objects in 3D laser data (and not the perception/recovery of 3D structure from 2D images).

My first exposure to object recognition was in the context of specific object recognition inside 3D laser scans. In specific object recognition, you are looking for 'stapler X' or 'computer keyboard Y' and not just any stapler or computer keyboard. If the computer keyboard is black, then it will always be black, since we assume intrinsic appearance doesn't change in specific object recognition. This is a different (and easier!) problem than category-based recognition, where colors and shapes can change due to intra-class variation.

The problem of specific object 3D recognition I'll be discussing is as follows:

Given M detailed 3D object models, localize all (if any) of these objects (in any spatial configuration) in a 3D laser scan of a scene potentially containing much more stuff than just the objects of interest (aka the clutter).

There was actually quite a lot of research in this style of 3D recognition in the 1990s, driven by the belief that 3D recognition would be much simpler than recognition from 2D images. The idea (Marr's idea, actually) was that object recognition in 2D images would be preceded by object-identity-independent 3D surface extraction, so that 2D recognition would resemble this version of 3D recognition after some initial geometric processing.

However, it turns out that many of the ambiguities present in 2D imagery are also present in 3D laser data -- the problems of bottom-up perceptual grouping are as difficult in 3D as in 2D. Just because you have 3D locations associated with parts of an object does not make it any easier to tell where the object begins and where it ends (namely, the problem of segmentation). It is this inability to segment out objects that led to the widespread use of local descriptors such as SIFT.

Many of today's 2D object recognition problems rely on local descriptors which bypass the problem of segmentation, and it isn't surprising that the 3D recognition problem I described above was elegantly approached by A.E. Johnson and M. Hebert as early as 1997 via a local 3D descriptor known as a Spin Image.

The idea behind a Spin Image is actually very similar to that of a SIFT descriptor used in image-based object recognition. A spin image is a regional point descriptor used to characterize the shape properties of a 3D object with respect to a single oriented point. It is called a "spin" image because the process of creating such a descriptor can be envisioned as spinning a sheet around the axis defined by an oriented point and collecting the contributions of nearby points. Since a point's normal can be computed fairly robustly from its neighboring points, the spin image is highly robust to rigid transformations when defined with respect to this canonical frame. Since it is 2D and not 3D, it does lose some discriminative power -- two different yet related surface chunks can have the same spin image.

The idea behind using this descriptor for recognition is that we can compute many of these descriptors all over the surface of our object models as well as over the input 3D laser scan. We then perform matching over these descriptors to create a set of correspondences (potentially spatially verified).
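The "spinning sheet" construction can be sketched concretely. This is my own illustrative NumPy version, not Johnson and Hebert's code, and the support radius and bin count are arbitrary choices: for an oriented point p with unit normal n, each neighbor is mapped to a radial distance alpha from the axis through p along n and a signed height beta along n, and the (alpha, beta) pairs are accumulated into a 2D histogram.

```python
import numpy as np

def spin_image(p, n, points, support=1.0, bins=8):
    """Accumulate a spin image for the oriented point (p, n).

    alpha: distance from the axis through p along normal n (always >= 0)
    beta:  signed height above the tangent plane at p
    Illustrative sketch only; support radius and bin count are arbitrary.
    """
    d = points - p                      # vectors from p to every cloud point
    beta = d @ n                        # projection onto the normal
    # radial distance from the spin axis, clipped to avoid tiny negatives
    alpha = np.sqrt(np.maximum(np.sum(d * d, axis=1) - beta**2, 0.0))
    # keep only points inside the cylindrical support region
    keep = (alpha < support) & (np.abs(beta) < support)
    hist, _, _ = np.histogram2d(alpha[keep], beta[keep], bins=bins,
                                range=[[0, support], [-support, support]])
    return hist
```

Note that alpha and beta discard the angle around the normal, which is exactly why the descriptor is invariant to rotation about the spin axis. Comparing spin images (e.g., by normalized correlation) between model points and scene points then yields the candidate correspondences mentioned above.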

(For a fairly recent overview of spin images as well as other similar regional shape descriptors and their applications to 3D object recognition check out Andrea Frome's ECCV 2004 paper, Recognizing Objects in Range Data Using Regional Point Descriptors.)

Spin images aren't a thing of the past; in fact, here is a link to an RSS 2009 paper by Kevin Lai and Dieter Fox that uses spin images (and my local distance function learning approach!):
3D Laser Scan Classification Using Web Data and Domain Adaptation

Friday, July 03, 2009

Linguistic Idealism

I have been an anti-realist since I was a freshman in college. Due to my lack of philosophical vocabulary, I might have even called myself an idealist back then; looking back, it would have been much better to use the word 'anti-realist.' I was mainly opposed to the correspondence theory of truth, which presupposes an external, observer-independent reality to which our thoughts and notions are supposed to adhere. It was in the context of the philosophy of science that I acquired my strong anti-realist views (developed while taking Quantum Mechanics, Epistemology, and Artificial Intelligence courses at the same time). Pragmatism -- the offspring of William James -- was the view that best summarized my philosophical position. While pragmatism is a rejection of absolutes, an abandonment of metaphysics, it does not get in the way of making progress in science. It is merely a new perspective on science, a view that does not undermine the creativity of the creator of scientific theories, a re-rendering of the scientist as more of an artist and less of a machine.

However, pragmatism is not the anything-goes postmodern philosophy that many believe it to be. It is as if there is something about the world which compels scientists to do science in a similar way and for ideas to converge. I recently came across the concept of Linguistic Idealism, and as a recent reader of Wittgenstein I find it a truly novel concept. Linguistic Idealism is a sort of dependence on language, the Gamest-of-all-games that we humans play. It is a sort of epiphany that all statements we make about the world are statements within the customs of language, which leads to a criticism of those statements' validity with respect to correspondence to an external reality: they rely on language, a somewhat arbitrary set of customs and rules which we follow when we communicate. Philosophers such as Sellars have gone as far as to say that all awareness is linguistically mediated. If we step back, can we say anything at all about perception?

I'm currently reading a book on Wittgenstein called "Wittgenstein's Copernican Revolution: The Question of Linguistic Idealism."