Human object recognition in a physical 3-d environment is still far superior to that of any robotic vision system. We believe that one reason (out of many) for this—one that has not heretofore been significantly exploited in the artificial vision literature—is that humans use a fovea to fixate on, or near an object, thus obtaining a very high resolution image of the object and rendering it easy to recognize. In this paper, we present a novel method for identifying and tracking objects in multi-resolution digital video of partially cluttered environments. Our method is motivated by biological vision systems and uses a learned "attentive" interest map on a low resolution data stream to direct a high resolution "fovea." Objects that are recognized in the fovea can then be tracked using peripheral vision. Because object recognition is run only on a small foveal image, our system achieves performance in real-time object recognition and tracking that is well beyond simpler systems.
The following videos demonstrate the experiments outlined in the paper Peripheral-Foveal Vision for Real-time Object Recognition and Tracking in Video. The videos show four panes organized as follows:
In the results pane, red boxes represent groundtruth and green boxes represent objects detected and tracked (which includes both true- and false-positives). The rectangle in the peripheral view pane shows where the fovea is being directed. The border is green during frames for which objects are tracked and changes color when the classifiers are run.