What do you get when you average 421 bounding boxes of manually segmented sheep?
I averaged scenes (Torralba-style) containing objects of interest and the bounding boxes of objects of interest from the PASCAL 2006 Challenge (see the blog entry below). You can see the results here:
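The averaging itself is simple: crop each annotated bounding box, resize the crops to a common resolution, and take the pixel-wise mean. A minimal sketch of this, assuming PIL/NumPy and a hypothetical list of (image path, box) pairs standing in for the PASCAL 2006 annotations:

```python
import numpy as np
from PIL import Image

# Hypothetical annotations: (image_path, bounding_box) pairs, where each
# box is (left, upper, right, lower) in pixels, as PIL's crop() expects.
annotations = [
    ("sheep_001.jpg", (34, 50, 210, 180)),
    ("sheep_002.jpg", (12, 40, 190, 170)),
    # ... one entry per annotated sheep
]

def average_boxes(annotations, size=(100, 100)):
    """Crop each bounding box, resize to a common size, and average."""
    acc = np.zeros((size[1], size[0], 3), dtype=np.float64)
    for path, box in annotations:
        crop = Image.open(path).convert("RGB").crop(box).resize(size)
        acc += np.asarray(crop, dtype=np.float64)
    acc /= len(annotations)  # pixel-wise mean over all crops
    return Image.fromarray(acc.astype(np.uint8))

# average_boxes(annotations).save("average_sheep.png")
```

Averaging whole scenes (Torralba-style) works the same way, just without the crop step: resize each full image to a common size and take the mean.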
That's cool, it kinda looks like a sheep, but I guess the segmentation makes it kinda blurry, due to slight differences between the different images.
There is no segmentation going on here. The 'kinda looks like a sheep' effect happens because this particular data set (and many others like it) contains sheep in very similar poses and at similar scales.
If you take a random sample of possible images of the natural world that contain sheep (and do not constrain the viewpoint to be human-centered) and you average them, you will not see a blurred-out sheep. If you add the restriction that the images are sampled from a human-centered coordinate frame, then you will drastically reduce the space of all sheep-containing images. Taken to the next level, if you only keep images that contain one sheep that takes up most of the pixels in the image, then you will again drastically reduce the space of all sheep-containing images. Clearly, as you make more and more restrictions like this, your average image will look more and more like each individual sample, and thus the 'kinda looks like a sheep' effect arises.
Humans are able to recognize sheep in many different scenarios, but the 'kinda looks like a sheep' effect only happens because many of the images in this data set contain one sheep in the center of the image, at roughly the same size, with green grass around it. If you train a classifier on these types of images, it will most likely overfit to this restricted viewpoint.