Thursday, October 06, 2011

Kinect Object Datasets: Berkeley's B3DO, UW's RGB-D, and NYU's Depth Dataset

Why Kinect?
The Kinect, made by Microsoft, is becoming a common item in Robotics and Computer Vision research. While the Robotics community has been using the Kinect as a cheap laser sensor for obstacle avoidance, the vision community has been excited about using the 2.5D data (a per-pixel depth map aligned with the color image) that the Kinect provides for object detection and recognition. The possibility of building object recognition systems which have access to pixel features as well as 2.5D features is truly exciting for the vision hacker community!
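
For anyone wondering what working with 2.5D data looks like in practice, here is a minimal sketch that back-projects a depth map into colored 3D points using a standard pinhole camera model. The intrinsics (FX, FY, CX, CY), the function name backproject, and the tiny synthetic frame are all my own illustrative assumptions; they are not calibrated Kinect values and do not come from any of the datasets discussed in this post.

```python
from typing import List, Tuple

# Hypothetical pinhole intrinsics in pixels (focal lengths and principal point).
# Placeholder values for illustration only, not a real Kinect calibration.
FX, FY = 525.0, 525.0
CX, CY = 1.0, 1.0

Point = Tuple[float, float, float]
Color = Tuple[int, int, int]


def backproject(depth_m: List[List[float]],
                rgb: List[List[Color]]) -> List[Tuple[Point, Color]]:
    """Turn a depth map (meters) plus an aligned RGB image into colored 3D points.

    Pixels with depth <= 0 are treated as missing readings and skipped.
    """
    points = []
    for v, row in enumerate(depth_m):        # v indexes image rows
        for u, z in enumerate(row):          # u indexes image columns
            if z <= 0.0:
                continue                     # no depth reading at this pixel
            x = (u - CX) * z / FX            # pinhole model: X = (u - cx) * Z / fx
            y = (v - CY) * z / FY            # pinhole model: Y = (v - cy) * Z / fy
            points.append(((x, y, z), rgb[v][u]))
    return points


# Tiny 3x3 synthetic frame: a flat surface 2 m away with one missing reading.
depth = [[2.0, 2.0, 2.0],
         [2.0, 0.0, 2.0],
         [2.0, 2.0, 2.0]]
colors = [[(255, 0, 0)] * 3 for _ in range(3)]

cloud = backproject(depth, colors)
print(len(cloud))  # 8 points: nine pixels minus the one missing reading
```

Each entry pairs a 3D position with the color at the same pixel, which is the kind of joint pixel-plus-depth input a recognition system could build features from.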

Berkeley's B3DO
First of all, I would like to mention that it looks like the Berkeley Vision Group has jumped on the Kinect bandwagon. The data collection effort will be crowdsourced -- they need your help! They need you to use your Kinect to capture your own home/office environments and upload the captures to their servers. This way, a very large dataset will be collected, and we, the vision hackers, can use machine learning techniques to learn what sofas, desks, chairs, monitors, and paintings look like. The Berkeley hackers have a paper on this at one of the ICCV 2011 workshops in Barcelona. Here is the paper information:

A Category-Level 3-D Object Dataset: Putting the Kinect to Work
Allison Janoch, Sergey Karayev, Yangqing Jia, Jonathan T. Barron, Mario Fritz, Kate Saenko, Trevor Darrell
ICCV-W 2011

UW's RGB-D Object Dataset
On another note, if you want to use 3D for your own object recognition experiments, then you might want to check out the following dataset: University of Washington's RGB-D Object Dataset. With this dataset you'll be able to compare against UW's current state-of-the-art results.

In this dataset you will find RGB+Kinect3D data for many household items taken from different views.  Here is the really cool paper which got me excited about the RGB-D Dataset:
A Scalable Tree-based Approach for Joint Object and Pose Recognition
Kevin Lai, Liefeng Bo, Xiaofeng Ren, and Dieter Fox
In the Twenty-Fifth Conference on Artificial Intelligence (AAAI), August 2011.

NYU's Depth Dataset
I have to admit that I did not know about this dataset (created by Nathan Silberman of NYU) until after I blogged about the other two datasets. However, the internet is great, and only a few hours after I posted this short blog post, somebody let me know that I had left out this really cool NYU dataset. Check out the NYU Depth Dataset homepage. In fact, it looks like this particular dataset might be at the LabelMe level regarding dense object annotations, but with accompanying Kinect data. Rob Fergus & Co strike again!

Nathan Silberman, Rob Fergus. Indoor Scene Segmentation using a Structured Light Sensor. To Appear: ICCV 2011 Workshop on 3D Representation and Recognition


  1. Anonymous 2:47 PM

    Nice post! BTW, the device was made by PrimeSense, not Microsoft (they just buy stuff).

  2. Anonymous 5:27 PM

    There's another Kinect dataset available from NYU, also being published at an ICCV workshop:

    It's larger than both the UW and Berkeley sets (combined) and contains dense labels over each scene for a very large number of classes. Also, I believe the full Kinect video streams are downloadable.

  3. Hey, thanks for letting me know about the NYU dataset! I learned something new today!

    I have updated the post to include the NYU dataset; it definitely looks like a cool dataset. And it has some MATLAB code too!

  4. Anonymous 1:18 AM

    Nice post. Do you know about local features that capture visual as well as depth characteristics? I know of BF-SIFT and 3D SIFT, but I was wondering what's the state of the art on this and if there's code ready for download. Thought you may know about this. Thanks a million for your post.

  5. Unfortunately, I don't know of any good RGB/3D features that are suitable for the Kinect, but I expect there to be dozens coming out over the next year or two.

    If I hear about anything exciting on this front at ICCV, I'll be sure to post an update.

  6. Anonymous 4:38 PM

    After seeing examples of the Kinect technology and speech-to-text technology from Microsoft, I wonder if you can design a system that would capture all the needed elements of a patient visit, such as SOAP (Subjective, Objective, Assessment, and Plan), using a Kinect-driven robot that can also synthesize all these observations into an Electronic Medical Record database.

    Such a system could be sold to every medical clinic and ER in the world. Think Kinect-robotic documentation, like a flight recorder!

    Feel free to share this idea with anyone who can build such a prototype. I would be happy to give feedback and testing ideas.


    Cesar "Bud" Labitan, MD, FAAFP, MBA

  7. Anonymous 3:08 PM

    Thanks for this post. There is another dataset for the evaluation of SLAM systems here: