Today's post is about 3D object recognition, that is, the localization and recognition of objects from 3D laser data (and not the perception/recovery of 3D from 2D images).
My first exposure to object recognition was in the context of specific object recognition inside 3D laser scans. In specific object recognition, you are looking for 'stapler X' or 'computer keyboard Y' and not just any stapler/computer keyboard. If the computer keyboard was black then it will always be black since we assume intrinsic appearance doesn't change in specific object recognition. This is a different (and easier!) problem than category-based recognition where colors and shapes can change due to intra-class variation.
The problem of specific object 3D recognition I'll be discussing is as follows:
Given M detailed 3D object models, localize all (if any) of these objects (in any spatial configuration) in a 3D laser scan of a scene potentially containing much more stuff than just the objects of interest (aka the clutter).
There was actually quite a lot of research in this style of 3D recognition in the 1990s, with the belief that 3D recognition would be much simpler than recognition from 2D images. The idea (Marr's idea, actually) was that object recognition in 2D images would be preceded by object-identity independent 3D surface extraction so that 2D recognition would resemble this version of 3D recognition after some initial geometric processing.
However, it turns out that many of the ambiguities present in 2D imagery were also present in 3D laser data -- the problems of bottom-up perceptual grouping were as difficult in 3D as in 2D. Just because you have 3D locations associated with parts of an object does not make it any easier to tell where the object begins and where it ends (namely the problem of segmentation). It is this inability to segment out objects that resulted in the widespread usage of local descriptors such as SIFT.
Many of today's 2D object recognition approaches rely on local descriptors which bypass the problem of segmentation, and it isn't surprising that the 3D recognition problem I described above was elegantly approached by A.E. Johnson and M. Hebert as early as 1997 via a local 3D descriptor known as a Spin Image.
The idea behind a Spin Image is actually very similar to that of a SIFT descriptor used in image-based object recognition. A spin image is a regional point descriptor used to characterize the shape properties of a 3D object with respect to a single oriented point (a surface point together with its normal). It is called a "spin" image because the process of creating such a descriptor can be envisioned as spinning a sheet around the axis defined by an oriented point and collecting the contributions of nearby points. Since a point's normal can be computed fairly robustly given its neighboring points, the spin image is highly robust to rigid transformations when defined with respect to this canonical frame. Since the descriptor is 2D and not 3D, it does lose some discriminative power -- two different yet related surface chunks can have the same spin image. The idea behind using this descriptor for recognition is that we can compute many of these descriptors all over the surface of our object models as well as the input 3D laser scan. We then have to match these descriptors to create correspondences between model and scene (potentially spatially verified).
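To make the "spinning sheet" picture concrete, here is a minimal sketch in Python/NumPy. It is an illustration under stated assumptions, not Johnson and Hebert's implementation: each neighbor is dropped into its nearest bin with no smoothing across bins, and the names spin_image, correlation, bin_size and image_width are made up for this example. Each neighbor is reduced to two numbers relative to the oriented point (p, n): beta, its signed height along the normal, and alpha, its distance from the normal axis. The angle around the axis is thrown away, which is why the descriptor is 2D. The small demo at the end applies a random rigid transform to a toy point cloud and compares the two resulting spin images with a plain correlation coefficient, one common way to score two such histograms.

```python
import numpy as np

def spin_image(points, p, n, bin_size=0.1, image_width=8):
    """2D histogram of (alpha, beta) coordinates around the oriented point (p, n).

    Assumption: nearest-bin accumulation, no interpolation between bins.
    """
    n = n / np.linalg.norm(n)
    d = points - p                       # offsets from the basis point
    beta = d @ n                         # signed height along the normal
    # distance from the normal axis; clip guards against tiny negative values
    alpha = np.sqrt(np.clip(np.sum(d * d, axis=1) - beta ** 2, 0.0, None))
    half_height = image_width * bin_size / 2.0
    rows = np.floor((half_height - beta) / bin_size).astype(int)  # top row = largest beta
    cols = np.floor(alpha / bin_size).astype(int)
    keep = (rows >= 0) & (rows < image_width) & (cols >= 0) & (cols < image_width)
    img = np.zeros((image_width, image_width))
    np.add.at(img, (rows[keep], cols[keep]), 1.0)  # count points per bin
    return img

def correlation(a, b):
    """Correlation coefficient between two spin images (0.0 if either is constant)."""
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

# Demo on synthetic data: a rigid transform should leave the descriptor unchanged.
rng = np.random.default_rng(0)
points = rng.uniform(-0.5, 0.5, size=(500, 3))   # toy "scan", not real laser data
p, n = points[0], np.array([0.0, 0.0, 1.0])      # hypothetical oriented point

Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))     # random orthogonal matrix
if np.linalg.det(Q) < 0:
    Q[:, 0] *= -1.0                              # make it a proper rotation
t = np.array([1.0, -2.0, 0.5])

img_a = spin_image(points, p, n)
img_b = spin_image(points @ Q.T + t, Q @ p + t, Q @ n)
score = correlation(img_a, img_b)
print(score)
```

In a real pipeline the normal would be estimated from each point's neighborhood rather than given, and bin_size would be tied to the resolution of the mesh or scan.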
(For a fairly recent overview of spin images as well as other similar regional shape descriptors and their applications to 3D object recognition check out Andrea Frome's ECCV 2004 paper, Recognizing Objects in Range Data Using Regional Point Descriptors.)
Spin images aren't a thing of the past; in fact, here is a link to an RSS 2009 paper by Kevin Lai and Dieter Fox which uses spin images (and my local distance function learning approach!):
3D Laser Scan Classification Using Web Data and Domain Adaptation
Do you know of any good implementations of spin images (in Matlab)?
Unfortunately, I don't know of any spin image Matlab implementations. My research as an undergraduate student was on Range Data Registration, and I used spin images in my research back then.
I had written my own code back then, and I suspect a Matlab implementation of spin images is fairly straightforward if you have several years of hacking experience. The tricky bit is the normal estimation and scale estimation for the descriptor. There might also be some issues with non-uniform sampling of points. Check out some of Daniel Huber's papers if you want to learn more about spin images used for 3D vision tasks. Huber also worked with Martial Hebert (briefly after A.E. Johnson, the "spin images were my thesis" guy, graduated from CMU).
If I find any spin-image implementation (especially Matlab), I'll post an update on my blog.
I think the year spin images first appeared is 1999, not 1997.
I remember 1997 because that was when A.E. Johnson finished his PhD thesis at CMU.
Spin Images PhD thesis.
It was the first full PhD thesis I ever read, so I remember 1997 on the front cover very clearly. There was also a PAMI submission in 1998 on the topic, and some other papers came out in 1999. So it really depends on how you date ideas. For me a PhD dissertation is legit enough to count it as the official idea start-time. But he was probably working on these ideas as far back as 1995 or 1994.
Dear Tomasz,
I am a student doing my thesis on Object Recognition in 3D Point Clouds. I am implementing spin images. I am able to generate spin images based on normals (generated using the PCA method). However, I am facing a problem in matching the model spin image with the scene spin image. Can you please let me know how to match spin images?
I haven't worked with Spin Images for quite some time, but from my undergraduate years (when I worked on Spin Images) I remember that I had to re-sample my meshes to have equal point density per surface area element. You can probably use Deep Learning to learn a better regional point descriptor these days.
A great summary can be found in the 2004 Frome et al. paper titled Recognizing Objects in Range Data Using Regional Point Descriptors: http://www.cs.jhu.edu/~misha/Papers/Frome04.pdf