Stanford-ECM Dataset


We introduce the Stanford egocentric multimodal dataset (Stanford-ECM), which comprises about 27 hours of egocentric video augmented with heart rate and acceleration data. Individual videos range from 3 minutes to about 51 minutes in length. A mobile phone was used to collect egocentric video at 720x1280 resolution and 30 fps, along with triaxial acceleration at 30 Hz. The phone was equipped with a wide-angle lens that enlarged the horizontal field of view from 45 degrees to about 64 degrees. A wrist-worn heart rate sensor captured the heart rate every 5 seconds. The phone and heart rate monitor were time-synchronized through Bluetooth, and all data was stored in the phone's storage. Piecewise cubic polynomial interpolation was used to fill any gaps in the heart rate data, and all signals were finally aligned at 30 Hz to millisecond precision.
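The gap filling and alignment described above can be sketched as follows. This is an illustrative example, not the released preprocessing code: the function name, argument names, and the use of SciPy's `CubicSpline` (one form of piecewise cubic polynomial interpolation) are our assumptions.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def resample_heart_rate(hr_times_s, hr_bpm, video_duration_s, fps=30):
    """Upsample a sparse (~0.2 Hz) heart-rate stream to the 30 Hz video
    frame rate using piecewise cubic interpolation.

    Illustrative sketch only; names and edge handling are assumptions.
    hr_times_s : 1-D array of sensor timestamps in seconds
    hr_bpm     : heart rate readings at those timestamps
    """
    # Piecewise cubic polynomial through the sensed samples
    spline = CubicSpline(hr_times_s, hr_bpm)

    # One timestamp per video frame at 30 Hz
    frame_times = np.arange(0.0, video_duration_s, 1.0 / fps)

    # Only evaluate inside the sensed interval; hold the edge values
    # constant rather than extrapolating the cubic.
    clipped = np.clip(frame_times, hr_times_s[0], hr_times_s[-1])
    return frame_times, spline(clipped)
```

At 30 fps this yields one interpolated heart-rate value per video frame, which is what allows the heart rate, acceleration, and video streams to be aligned on a common 30 Hz timeline.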


Please submit the data use agreement to receive a full copy of the dataset.


All videos were de-identified via face pixelization and audio removal. Due to the difficulty of face pixelization, we are able to share only 90 of the 113 videos used in the paper. The list of test set videos is identical to that in the paper.


Download the paper here.

@inproceedings{nakamura2017jointly,
        title={Jointly Learning Energy Expenditures and Activities using Egocentric Multimodal Signals},
        author={Nakamura, Katsuyuki and Yeung, Serena and Alahi, Alexandre and Fei-Fei, Li},
        booktitle={Computer Vision and Pattern Recognition (CVPR)},
        year={2017}
}


stanford.ecm [at]