Tuesday, January 12, 2010

Image Interpretation Objectives

An example of a typical complex outdoor natural scene that a general knowledge-based image interpretation system might be expected to understand is shown in Figure 1. An objective of such systems is to identify semantically meaningful visual entities in a digitized and segmented image of some scene. That is, to correctly assign semantically meaningful labels (e.g., house, tree, grass, and so on) to regions in an image -- see [29,30]. A computer-based image interpretation system can be viewed as having two major components, a "low-level" component and a "high-level" component [19],[31]. In many respects, the low-level portion of the system is designed to mimic the early stages of visual image processing in human-like systems. In these early stages, it is believed that scenes are partitioned, to some extent, into regions that are homogeneous with respect to some set of perceivable features (i.e., feature vector) in the scene [6],[40],[39]. To this extent, most low-level general purpose computer vision systems are designed to perform the same task. An example of a partitioning (i.e., segmentation) of Figure 1 into homogeneous regions is shown in Figure 2. The knowledge-based computer vision system we shall describe in this paper is not currently concerned with resegmenting portions of an image. Rather, its task is to correctly label as many regions as possible in a given segmentation.

This a direct quote from a 1984 paper on computer vision. A great example of segmentation-driven scene understanding. The content is similar enough to my own line of work that it could have been an excerpt from my own thesis.

It is actually in a section called Image Interpretation Objectives from "Evidential Knowledge-Based Computer Vision" by Leonard P. Wesley, 1984. I found this while reading lots of good tech reports from SRI International's AI Center in Menlo Park. Some good stuff there by Tenenbaum, Barrow, Duda, Hart, Nillson, Fischler, Pereira, Pentland, Fua, Szeliski, to name a few. Lots of stuff there is relevant to scene understanding and grounds the problem in robotics (since there was no "internet" vision back in the 70s and 80s).

On another note, I still haven't been able to find a copy of the classic paper, Experiments in Interpretation-Guided Segmentation by Tenenbaum and Barrow from 1978. If anybody knows where to find a pdf copy send me an email. UPDATE: Thanks to the quick reply! I have the paper now.