Objects play a crucial role in our everyday activities. Though multisensory object-centric learning has shown great potential lately, the modeling of objects in prior work is rather unrealistic. ObjectFolder 1.0 is a recent dataset that introduces 100 virtualized objects with visual, acoustic, and tactile sensory data. However, the dataset is small in scale and the multisensory data is of limited quality, hampering generalization to real-world scenarios. We present ObjectFolder 2.0, a large-scale, multisensory dataset of common household objects in the form of implicit neural representations that significantly enhances ObjectFolder 1.0 in three aspects. First, our dataset is 10 times larger in the number of objects and orders of magnitude faster in rendering time. Second, we significantly improve the multisensory rendering quality for all three modalities. Third, we show that models learned from virtual objects in our dataset successfully transfer to their real-world counterparts in three challenging tasks: object scale estimation, contact localization, and shape reconstruction. ObjectFolder 2.0 offers a new path and testbed for multisensory learning in computer vision and robotics.
In the supplementary video, we show 1) the motivation and goal of our dataset, and a comparison with ObjectFolder 1.0; 2) visualizations of the visual appearance of all 1,000 objects obtained from Object Files under different camera viewpoints and lighting conditions; 3) examples of the acoustic data (impact sounds) obtained from Object Files for some sample objects; 4) examples of the tactile RGB images obtained from Object Files for some sample objects; 5) demos of three Sim2Real tasks.
R. Gao, Z. Si, Y. Chang, S. Clarke, J. Bohg, L. Fei-Fei, W. Yuan, J. Wu. "ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer". In CVPR, 2022.
R. Gao, Y. Chang, S. Mall, L. Fei-Fei, J. Wu. "ObjectFolder: A Dataset of Objects with Implicit Visual, Auditory, and Tactile Representations". In CoRL, 2021.
We thank Sudharshan Suresh, Mark Rau, Doug James, and Stephen Tian for helpful discussions. This work is in part supported by the Stanford Institute for Human-Centered AI (HAI), the Stanford Center for Integrated Facility Engineering, NSF CCRI #2120095, Toyota Research Institute (TRI), Samsung, Autodesk, Amazon, Adobe, Google, and Facebook.