This paper uses ideas from Abhinav Gupta's work on 3D scene understanding as well as Ali Farhadi's work on visual phrases; however, it also uses RGB-D input data (like many other CVPR 2013 papers).
W. Choi, Y.-W. Chao, C. Pantofaru, S. Savarese. "Understanding Indoor Scenes Using 3D Geometric Phrases" in CVPR, 2013. [pdf]
This paper uses the crowd to learn which parts of birds are useful for fine-grained categorization. If you work on fine-grained categorization or run experiments with MTurk, then you gotta check this out!
Fine-Grained Crowdsourcing for Fine-Grained Recognition. Jia Deng, Jonathan Krause, Li Fei-Fei. CVPR, 2013. [pdf]
This paper won the best paper award. Congrats Google Research!
Fast, Accurate Detection of 100,000 Object Classes on a Single Machine. Thomas Dean, Mark Ruzon, Mark Segal, Jon Shlens, Sudheendra Vijayanarasimhan, Jay Yagnik. CVPR, 2013 [pdf]
The following is the Scene-SIRFS paper, which I thought was one of the best papers at this year's CVPR. The idea is to decompose an input image into intrinsic images using Barron's algorithm, which was initially shown to work on objects but is now being applied to realistic scenes.
Intrinsic Scene Properties from a Single RGB-D Image. Jonathan T. Barron, Jitendra Malik. CVPR, 2013 [pdf]
This is a graph-based localization paper which uses a sort of "Visual Memex" to solve the problem.
Graph-Based Discriminative Learning for Location Recognition. Song Cao, Noah Snavely. CVPR, 2013. [pdf]
This paper provides an exciting new way of localizing contours in images that is orders of magnitude faster than gPb. There is code available, so the impact is likely to be high.
Sketch Tokens: A Learned Mid-level Representation for Contour and Object Detection. Joseph J. Lim, C. Lawrence Zitnick, and Piotr Dollar. CVPR, 2013. [pdf] [code@github]