Learning Spatial Context: Using Stuff to Find Things

Project Contributors: Geremy Heitz and Daphne Koller

Project Abstract

The sliding window approach of detecting rigid objects (such as cars) is predicated on the belief that the object can be identified from the appearance in a small region around the object. Other types of objects of amorphous spatial extent (e.g., trees, sky), however, are more naturally classified based on texture or color. In this paper, we seek to combine recognition of these two types of objects into a system that leverages "context" toward improving detection. In particular, we cluster image regions based on their ability to serve as context for the detection of objects. Rather than providing an explicit training set with region labels, our method automatically groups regions based on both their appearance and their relationships to the detections in the image. We show that our things and stuff (TAS) context model produces meaningful clusters that are readily interpretable, and helps improve our detection ability over state-of-the-art detectors. We also present a method for learning the active set of relationships for a particular dataset. We present results on object detection in images from the PASCAL VOC 2005/2006 datasets and on the task of overhead car detection in satellite images, demonstrating significant improvements over state-of-the-art detectors.

Figure: Example detections from the satellite dataset that demonstrate context. Classifying using local appearance only, we might think that both windows at left are cars. However, when seen in context, the bottom detection is unlikely to be an actual car.


Learning Spatial Context: Using Stuff to Find Things.
Geremy Heitz and Daphne Koller.
European Conference on Computer Vision (ECCV), 2008.

Things and Stuff Public Package

This page is under construction. My hope is to offer the Things and Stuff (TAS) code as a package that members of the research community can easily download and use. The package is implemented in MATLAB and requires that you supply an object detector, an image segmentation tool, and a region feature extractor. The method builds on these basic inputs to produce a model that learns to use contextual cues to improve object detection. Details of the method are available in my ECCV 2008 paper. The package includes a README file that walks you through how to use it.
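
To make the three required inputs more concrete, here is a minimal sketch of what they might look like as data, together with one coarse spatial relationship between a candidate detection and a segmented region. It is written in Python for illustration only and is not the TAS package's interface: the names `Detection`, `Region`, and `spatial_relation`, the relation labels, and the `near_factor` threshold are all assumptions made for this example. The README and the ECCV 2008 paper remain the authority on the actual inputs and on the relationships the model uses.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    """Hypothetical output of an object detector for one candidate window."""
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), pixels
    score: float                            # detector confidence


@dataclass
class Region:
    """Hypothetical output of segmentation plus region feature extraction."""
    centroid: Tuple[float, float]  # (x, y) of the region centre, pixels
    features: List[float]          # appearance descriptor (e.g. colour, texture)


def spatial_relation(det: Detection, region: Region, near_factor: float = 1.5) -> str:
    """Return a coarse, illustrative relation of `region` to `det`.

    Assumed labels: "in", "above", "below", "left", "right", "far".
    Image coordinates are assumed, so y grows downward.
    """
    x_min, y_min, x_max, y_max = det.box
    cx, cy = region.centroid

    # Region centre falls inside the candidate window.
    if x_min <= cx <= x_max and y_min <= cy <= y_max:
        return "in"

    # Offset of the region centre from the window centre.
    dx = cx - (x_min + x_max) / 2.0
    dy = cy - (y_min + y_max) / 2.0
    size = max(x_max - x_min, y_max - y_min)

    # Too distant to count as local context (threshold is an assumption).
    if max(abs(dx), abs(dy)) > near_factor * size:
        return "far"

    # Otherwise label by the dominant direction of the offset.
    if abs(dy) >= abs(dx):
        return "above" if dy < 0 else "below"
    return "left" if dx < 0 else "right"
```

For example, with a window at `(10, 10, 20, 20)`, a region centred at `(15, 2)` is labelled "above" and one centred at `(15, 100)` is labelled "far". Discrete labels like these are only one simple way to describe how a region sits relative to a detection.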

Download the TAS Package

Please send feedback and questions to Geremy Heitz at gaheitz@stanford.edu
