Disclaimer: I found this text file on my computer and I don't remember when I wrote this, but probably sometime in September.
The classic problem of computer vision is centered around the sub-problem of object recognition.
However, one key observation about vision is that work in this field has not produced any stunning results in over 30 years. Perhaps people have been attacking this problem from the wrong direction.
When somebody tries to do good object recognition, they want to extract the locations of objects in an image. Recently, researchers have begun using machine learning techniques to learn the space of all object appearances; however, maybe the answer to this problem isn't in extracting objects from images.
When a human sees an image, they can easily decompose it into its constituent parts, namely the objects present in the scene. But is this statement really true about the nature of the human visual system? Language and society have introduced 'objects' into our understanding of the world, but perhaps we can still do vision without focusing on objects.
If we treat the image as a holistic entity, perhaps 'object' segmentation happens naturally once we have seen enough images. If a human were never given a language-based context for describing what they see, would they see objects?
Are objects an artifact of the fact that human experience is, for the most part, dominated by the language we know? Clearly language is concerned with objects.
We should perhaps treat image understanding as a memory-based problem. Thus, to understand a scene, we might not really need to segment it into object categories. Perhaps we only need to associate a given scene with some other scene we have encountered.
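
To make the memory-based idea a little more concrete, here is a minimal sketch of what "associate a given scene with some other scene we have encountered" could look like. Everything in it is an assumption made for illustration and not something the text above specifies: the names `describe` and `SceneMemory` are hypothetical, the whole-image descriptor (mean brightness over a coarse grid) is just one simple choice, and Euclidean distance is only one possible notion of similarity. Note that no objects are ever segmented; the image is compared as a whole.

```python
import math


def describe(image, grid=2):
    """Hypothetical holistic descriptor: mean intensity of each cell in a
    grid x grid layout. `image` is a list of equal-length rows of grayscale
    values and must be at least grid x grid pixels."""
    height, width = len(image), len(image[0])
    descriptor = []
    for gy in range(grid):
        for gx in range(grid):
            rows = range(gy * height // grid, (gy + 1) * height // grid)
            cols = range(gx * width // grid, (gx + 1) * width // grid)
            cell = [image[y][x] for y in rows for x in cols]
            descriptor.append(sum(cell) / len(cell))
    return descriptor


class SceneMemory:
    """Stores previously seen scenes and recalls the most similar one."""

    def __init__(self):
        self.entries = []  # list of (label, descriptor) pairs

    def remember(self, label, image):
        self.entries.append((label, describe(image)))

    def recall(self, image):
        if not self.entries:
            raise ValueError("no scenes remembered yet")
        query = describe(image)
        # Nearest neighbour under Euclidean distance between descriptors.
        label, _ = min(self.entries, key=lambda e: math.dist(query, e[1]))
        return label


memory = SceneMemory()
memory.remember("bright top, dark bottom", [[9, 9, 9, 9], [9, 9, 9, 9],
                                            [1, 1, 1, 1], [1, 1, 1, 1]])
memory.remember("dark left, bright right", [[1, 1, 9, 9]] * 4)

new_scene = [[8, 9, 8, 9], [9, 8, 9, 8], [2, 1, 1, 2], [1, 2, 2, 1]]
print(memory.recall(new_scene))  # prints "bright top, dark bottom"
```

A real system would need a far richer descriptor and a much larger memory, but the shape of the computation stays the same: understanding a new scene reduces to finding the remembered scene it most resembles.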