An egocentric vision system is a framework built around a wearable camera that continuously captures the scene in front of the first person. In particular, I define an egocentric vision system as a framework that leverages different levels of first-person attention to identify the important objects and faces in the scene that contribute to the subject's activities. The first person's behavior, including where she looks (gaze) and what she does (hands manipulating objects), provides invaluable context for determining which objects capture her attention at any given time. My goal is to use these structured sources of first-person information to enable weakly supervised recognition of objects and activities.
I aim to develop action recognition techniques that rely on semantically meaningful features capturing the interactions of objects with one another. This is in contrast to state-of-the-art techniques based on space-time interest points or point trajectories. Many actions involve similar dynamics and hand-object relationships but differ in their purpose and meaning. The key to differentiating such actions is the ability to identify how they change the state of objects and materials in the environment.
Segmentation is one of the most fundamental problems in computer vision. If segmentation were solved, many of the field's major challenges would become far more tractable.