abstract
Human object recognition in a physical 3-D environment
is still far superior to that of any robotic
vision system. We believe that one reason (out
of many) for this, one that has not heretofore
been significantly exploited in the artificial vision
literature, is that humans use a fovea to fixate on,
or near, an object, thus obtaining a very high resolution
image of the object and rendering it easy to recognize.
In this paper, we present a novel method for
identifying and tracking objects in multi-resolution
digital video of partially cluttered environments.
Our method is motivated by biological vision systems
and uses a learned "attentive" interest map
on a low resolution data stream to direct a high
resolution "fovea." Objects that are recognized in
the fovea can then be tracked using peripheral vision.
Because object recognition is run only on
a small foveal image, our system achieves performance
in real-time object recognition and tracking
that is well beyond that of simpler systems.
demonstrations
The following videos demonstrate the experiments outlined in the paper
"Peripheral-Foveal Vision for Real-time Object Recognition and Tracking in Video."
The videos show four panes organized as follows:
Attentive Map | Peripheral View | Foveal View | Results
In the results pane, red boxes represent ground truth and green boxes represent objects detected and tracked (which includes both true and false positives). The rectangle in the peripheral view pane shows where the fovea is being directed. The border is green during frames for which objects are tracked and changes color when the classifiers are run.
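As a rough illustration of how an interest map computed on the low-resolution stream could place the fovea rectangle shown in the peripheral view, here is a minimal sketch. It is not the method from the paper: the function name `direct_fovea`, the integer `scale` between the two streams, and the simple rule of picking the highest-scoring cell are all assumptions made for illustration, and the learned interest map itself is taken as given.

```python
def direct_fovea(interest_map, scale, fovea_size, frame_size):
    """Return (left, top, right, bottom) of a fovea window in high-res pixels.

    interest_map: 2-D list of scores on the low-resolution grid (rows of columns).
    scale:        assumed integer ratio of high-res to low-res pixels.
    fovea_size:   (width, height) of the foveal window in high-res pixels.
    frame_size:   (width, height) of the high-res frame.
    """
    rows, cols = len(interest_map), len(interest_map[0])

    # Assumption: fixate on the single most interesting low-res cell.
    best_row, best_col = max(
        ((r, c) for r in range(rows) for c in range(cols)),
        key=lambda rc: interest_map[rc[0]][rc[1]],
    )

    # Map the centre of that cell into high-resolution coordinates.
    cx = best_col * scale + scale // 2
    cy = best_row * scale + scale // 2

    fovea_w, fovea_h = fovea_size
    frame_w, frame_h = frame_size

    # Centre the window on the fixation point, clamped to stay inside the frame.
    left = min(max(cx - fovea_w // 2, 0), frame_w - fovea_w)
    top = min(max(cy - fovea_h // 2, 0), frame_h - fovea_h)
    return left, top, left + fovea_w, top + fovea_h


# Hypothetical 2x3 interest map over a 300x200 high-res frame (scale 100).
example_map = [[0.1, 0.2, 0.1],
               [0.0, 0.9, 0.3]]
fovea_box = direct_fovea(example_map, scale=100, fovea_size=(80, 60), frame_size=(300, 200))
```

Only the pixels inside the returned box would be passed to the object classifiers, which is why recognition on a small foveal image can be cheaper than running it on the full high-resolution frame.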
View the poster from our NIPS 2006 conference demonstration.
papers