Friday, September 01, 2006

visual class recognition inside a bounding box

Let's consider the problem of determining which object class is located inside a bounding box.

Consider the PASCAL 2006 Visual Object Classes:
{'bicycle' 'bus' 'car' 'cat' 'cow' 'dog' 'horse' 'motorbike' 'person' 'sheep'}

If we use boosted decision trees to train a classifier that returns a posterior distribution over visual object classes given the size of the bounding box as input, how well can we expect our classifier to perform?
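
To make the input concrete, here is a minimal sketch of the features such a classifier would see. It assumes the box is given as corner coordinates (xmin, ymin, xmax, ymax) and that "size" means width and height in pixels; the helper name box_features and the example box are hypothetical, not taken from the PASCAL tools.

```python
from typing import NamedTuple


class BoxFeatures(NamedTuple):
    width: float
    height: float


def box_features(xmin, ymin, xmax, ymax):
    """Reduce a bounding box to the only inputs the classifier gets: its size.

    Assumption: width and height are plain coordinate differences.
    """
    if xmax <= xmin or ymax <= ymin:
        raise ValueError("box must have positive width and height")
    return BoxFeatures(width=xmax - xmin, height=ymax - ymin)


# Hypothetical box, for illustration only (not from the dataset).
example = box_features(48, 30, 148, 330)
```

Nothing about the pixels inside the box is used; the position of the box in the image is discarded as well.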

Surprisingly, it turns out that for the PASCAL 2006 'trainval' dataset, such a classifier labels 40% of the test examples correctly. Given no information we expect a 10% accuracy rate for 10 object classes (this is just like guessing randomly; you're correct 10% of the time), so 40% is decent given that the classifier didn't get any appearance features from the interior of the bounding box.
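
The 10% baseline can be checked with a few lines of arithmetic. The sketch below assumes "guessing randomly" means picking each of the 10 classes with equal probability; the skewed class frequencies are made up for illustration and are not the PASCAL statistics.

```python
def uniform_guess_accuracy(class_priors):
    """Expected accuracy when every class is guessed with probability 1/K.

    An example of class c occurs with probability p_c and is guessed
    correctly with probability 1/K, so the sum is 1/K for any priors.
    """
    k = len(class_priors)
    return sum(p * (1.0 / k) for p in class_priors)


balanced = [0.1] * 10
skewed = [0.55] + [0.05] * 9  # hypothetical frequencies, illustration only

chance_balanced = uniform_guess_accuracy(balanced)
chance_skewed = uniform_guess_accuracy(skewed)

# How many times better than chance the reported 40% is.
lift = 0.40 / chance_balanced
```

Both calls give 0.1 (up to floating-point rounding), so the 40% figure is about four times the uniform-guessing baseline no matter how the classes are distributed.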

If we look at the boosted decision tree's performance on each visual class separately, we see something rather interesting:

'bicycle' 0.2139
'bus' 0.0242
'car' 0.5973
'cat' 0.1714
'cow' 0.0314
'dog' 0.1381
'horse' 0.0760
'motorbike' 0.0375
'person' 0.8327
'sheep' 0.1472
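
As a quick way to read these numbers, the sketch below ranks the classes and splits them at the 10% chance level. The rates are copied from the list above; the variable names are my own.

```python
# Per-class rates copied from the list above.
per_class_rate = {
    "bicycle": 0.2139,
    "bus": 0.0242,
    "car": 0.5973,
    "cat": 0.1714,
    "cow": 0.0314,
    "dog": 0.1381,
    "horse": 0.0760,
    "motorbike": 0.0375,
    "person": 0.8327,
    "sheep": 0.1472,
}

CHANCE = 1.0 / len(per_class_rate)  # 10% for 10 classes

# Sort classes from best to worst recognized.
ranked = sorted(per_class_rate.items(), key=lambda kv: kv[1], reverse=True)
above_chance = [name for name, rate in ranked if rate > CHANCE]
below_chance = [name for name, rate in ranked if rate <= CHANCE]

# Unweighted mean over the ten classes.
mean_rate = sum(per_class_rate.values()) / len(per_class_rate)

for name, rate in ranked:
    print(f"{name:<10} {rate:.4f}")
```

Six classes come out above chance (person, car, bicycle, cat, sheep, dog) and four below (horse, motorbike, cow, bus). The unweighted mean of the ten rates is about 0.227, well under the overall 40%, which would be consistent with person and car boxes making up a large share of the examples, although the post does not give the class counts.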

This means that the bounding box dimensions are highly discriminative for the person and car classes. 83% of the bounding boxes containing a person were given the correct label by the classifier! This is not too surprising, though: since people are generally standing, a person bounding box is generally much taller than it is wide, and the height-to-width ratio tends to be similar from one box to the next.
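
The "similar ratio" point can be illustrated with a small sketch. The two boxes below are hypothetical (a standing person seen up close and far away), and aspect_ratio is a helper of my own; the classifier described above was given the box size, not this ratio directly.

```python
def aspect_ratio(width, height):
    """Height divided by width; values above 1 mean the box is taller than wide."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return height / width


# Hypothetical standing-person boxes at two different scales.
near = aspect_ratio(width=100, height=300)
far = aspect_ratio(width=20, height=60)
```

Both boxes have a ratio of 3.0 even though one is five times larger than the other, which is the kind of regularity a tree splitting on width and height could pick up.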