The sliding window approach to detecting rigid objects (such as
cars) is predicated on the belief that the object can be identified
from its appearance in a small region around the object. Other
types of objects of amorphous spatial extent (\eg trees, sky),
however, are more naturally classified based on texture or color.
In this paper, we seek to combine recognition of these two types of
objects into a system that leverages ``context'' toward improving
detection. In particular, we cluster image regions based on their
ability to serve as context for the detection of objects. Rather
than providing an explicit training set with region labels, our
method automatically groups regions based on both their appearance
and their relationships to the detections in the image. We show
that our things and stuff (TAS) context model produces meaningful,
readily interpretable clusters and improves detection performance.
We also present a method for learning the active set of
relationships for a particular dataset. We present results on object
detection in images from the PASCAL VOC 2005/2006 datasets and on
the task of overhead car detection in satellite images, demonstrating
significant improvements over state-of-the-art detectors.
Figure: Example detections from the satellite dataset that
demonstrate context. Using local appearance alone, we might classify
both windows at left as cars. However, when seen in context, the
bottom detection is unlikely to be an actual car.
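The intuition in the figure can be made concrete with a minimal Python sketch of context rescoring. It is a simplification, not the TAS model. It assumes the region clusters are already known, whereas TAS learns them from appearance and relationships to detections. The cluster names, the relationship label "on", the weight values, and the function `context_rescore` are all hypothetical. Context contributions are simply added to the detector's log-odds, which amounts to a naive-Bayes-style independence assumption.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical (cluster, relationship) pair: ("road", "on") means the
# candidate window lies on a region assigned to the "road" cluster.
ContextPair = Tuple[str, str]

# Hypothetical log-odds contributions. A context model would learn these;
# they are hand-set here purely for illustration.
WEIGHTS: Dict[ContextPair, float] = {
    ("road", "on"): 1.5,    # cars on roads are plausible
    ("tree", "on"): -2.0,   # cars on tree canopy are implausible
}


def sigmoid(x: float) -> float:
    """Logistic function mapping log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def context_rescore(local_logit: float,
                    context: List[ContextPair],
                    weights: Dict[ContextPair, float]) -> float:
    """Add context log-odds to the local detector's log-odds.

    Pairs missing from `weights` contribute nothing (weight 0.0).
    Returns the resulting probability that the window is a car.
    """
    total = local_logit + sum(weights.get(pair, 0.0) for pair in context)
    return sigmoid(total)


if __name__ == "__main__":
    # Both windows look equally car-like to the local detector (log-odds 1.0).
    top = context_rescore(1.0, [("road", "on")], WEIGHTS)
    bottom = context_rescore(1.0, [("tree", "on")], WEIGHTS)
    print(f"top: {top:.3f}, bottom: {bottom:.3f}")
```

With these illustrative weights, the two windows share the same local score but end up at roughly 0.924 and 0.269. Their surroundings alone separate the likely car from the unlikely one, mirroring the caption.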