Ruohan Gao

Postdoctoral Research Fellow
Department of Computer Science
Stanford University
Email: rhgao[AT]cs[DOT]stanford[DOT]edu

CV     Google Scholar     GitHub     Short Bio     Twitter     Thesis

I am a SAIL Postdoctoral Fellow working with Prof. Jiajun Wu, Prof. Fei-Fei Li, and Prof. Silvio Savarese at the Stanford Vision and Learning Lab. I received my Ph.D. from The University of Texas at Austin, advised by Prof. Kristen Grauman, and my B.Eng. from The Chinese University of Hong Kong. My research interests are mainly in computer vision and machine learning. In particular, I am interested in multisensory learning with sight, sound, and touch. My research goal is to teach machines to see, hear, and feel like humans so that they can perceive, understand, and interact with the multisensory world.

News

  • I will serve as an Area Chair for ICCV 2023 and as a Senior Program Committee (SPC) member for AAAI 2023.

  • We are organizing the Creative AI Across Modalities Workshop at AAAI 2023.

  • We are organizing the AV4D Workshop at ECCV 2022.

  • We are organizing the Sight and Sound Workshop at CVPR 2022.

  • I am very honored to have received the Michael H. Granof Award, which recognizes UT Austin's top doctoral dissertation of 2021.

  • We are organizing the Sight and Sound Workshop at CVPR 2021.

  • We are organizing the Embodied Multimodal Learning Workshop at ICLR 2021.

Publications

    (*equal contribution, †equal advising)

    The ObjectFolder Benchmark: Multisensory Learning with Neural and Real Objects

    Ruohan Gao*, Yiming Dou*, Hao Li*, Tanmay Agarwal, Jeannette Bohg, Yunzhu Li, Li Fei-Fei, Jiajun Wu
    Conference on Computer Vision and Pattern Recognition (CVPR), 2023.

    PDF Project Page ObjectFolder Real Demo Code

    RealImpact: A Dataset of Impact Sound Fields for Real Objects

    Samuel Clarke, Ruohan Gao, Mason Wang, Mark Rau, Julia Xu, Jui-Hsien Wang, Doug James, Jiajun Wu
    Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
    (Highlight)

    PDF Supp

    Learning Object-centric Neural Scattering Functions for Free-viewpoint Relighting and Scene Composition

    Hong-Xing Yu*, Michelle Guo*, Alireza Fathi, Yen-Yu Chang, Eric Ryan Chan, Ruohan Gao, Thomas Funkhouser, Jiajun Wu
    Transactions on Machine Learning Research (TMLR), 2023.
    PDF

    Sonicverse: A Multisensory Simulation Platform for Embodied Household Agents that See and Hear

    Ruohan Gao*, Hao Li*, Gokul Dharan, Zhuzhu Wang, Chengshu Li, Fei Xia, Silvio Savarese, Li Fei-Fei, Jiajun Wu
    International Conference on Robotics and Automation (ICRA), 2023.
    PDF Project Page Code

    Differentiable Physics Simulation of Dynamics-Augmented Neural Objects

    Simon Le Cleac'h, Hong-Xing Yu, Michelle Guo, Taylor A Howell, Ruohan Gao, Jiajun Wu, Zachary Manchester, Mac Schwager
    IEEE Robotics and Automation Letters (RA-L), 2023.

    PDF

    An Extensible Multi-modal Multi-task Object Dataset with Materials

    Trevor Scott Standley, Ruohan Gao, Dawn Chen, Jiajun Wu, Silvio Savarese
    International Conference on Learning Representations (ICLR), 2023.
    PDF

    See, Hear, Feel: Smart Sensory Fusion for Robotic Manipulation

    Hao Li*, Yizhi Zhang*, Junzhe Zhu, Shaoxiong Wang, Michelle A Lee, Huazhe Xu, Edward Adelson, Li Fei-Fei, Ruohan Gao, Jiajun Wu
    Conference on Robot Learning (CoRL), 2022.

    PDF Supp Project Page Code

    ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer

    Ruohan Gao*, Zilin Si*, Yen-Yu Chang*, Samuel Clarke, Jeannette Bohg, Li Fei-Fei, Wenzhen Yuan, Jiajun Wu.
    Conference on Computer Vision and Pattern Recognition (CVPR), 2022.

    PDF Supp Project Page Dataset/Code

    Visual Acoustic Matching

    Changan Chen, Ruohan Gao, Paul Calamia, Kristen Grauman.
    Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
    (Oral Presentation)

    PDF Project Page Code Media Coverage

    ObjectFolder: A Dataset of Objects with Implicit Visual, Auditory, and Tactile Representations

    Ruohan Gao, Yen-Yu Chang*, Shivani Mall*, Li Fei-Fei, Jiajun Wu.
    Conference on Robot Learning (CoRL), 2021.

    PDF Supp Project Page Dataset/Code

    DiffImpact: Differentiable Rendering and Identification of Impact Sounds

    Samuel Clarke, Negin Heravi, Mark Rau, Ruohan Gao, Jiajun Wu, Doug James, Jeannette Bohg.
    Conference on Robot Learning (CoRL), 2021.
    (Oral Presentation)

    PDF Project Page

    Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video

    Rishabh Garg, Ruohan Gao, Kristen Grauman.
    British Machine Vision Conference (BMVC), 2021.
    (Oral Presentation) [Best Paper Award Runner Up]

    PDF Project Page Dataset

    Look and Listen: From Semantic to Spatial Audio-Visual Perception

    Ruohan Gao
    Ph.D. Dissertation, UT Austin, 2021.
    Michael H. Granof University's Best Doctoral Dissertation Award
    UT Austin Outstanding Dissertation Award in Mathematics, Engineering, Physical Science, and Biological and Life Sciences

    VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency

    Ruohan Gao and Kristen Grauman.
    Conference on Computer Vision and Pattern Recognition (CVPR), 2021.

    PDF Supp Project Page Code Media Coverage

    Learning to Set Waypoints for Audio-Visual Navigation

    Changan Chen, Sagnik Majumder, Ziad Al-Halah, Ruohan Gao, Santhosh K. Ramakrishnan, Kristen Grauman.
    International Conference on Learning Representations (ICLR), 2021.
    PDF Project Page Code

    VisualEchoes: Spatial Image Representation Learning through Echolocation

    Ruohan Gao, Changan Chen, Ziad Al-Halah, Carl Schissler, Kristen Grauman.
    European Conference on Computer Vision (ECCV), 2020.
    PDF Supp Data Project Page

    Listen to Look: Action Recognition by Previewing Audio

    Ruohan Gao, Tae-Hyun Oh, Kristen Grauman, Lorenzo Torresani.
    Conference on Computer Vision and Pattern Recognition (CVPR), 2020.

    PDF Supp Poster Project Page Code

    Co-Separating Sounds of Visual Objects

    Ruohan Gao and Kristen Grauman.
    International Conference on Computer Vision (ICCV), 2019.

    PDF Supp Poster Project Page Code

    2.5D Visual Sound

    Ruohan Gao and Kristen Grauman.
    Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
    (Oral Presentation) [Best Paper Award Finalist]
    PDF Project Page Dataset Code Media Coverage Oral Video

    Learning to Separate Object Sounds by Watching Unlabeled Video

    Ruohan Gao, Rogerio Feris, Kristen Grauman.
    European Conference on Computer Vision (ECCV), 2018.
    (Oral Presentation)
    PDF Supp Poster Project Page Code Oral Video

    ShapeCodes: Self-Supervised Feature Learning by Lifting Views to Viewgrids

    Dinesh Jayaraman, Ruohan Gao, Kristen Grauman.
    European Conference on Computer Vision (ECCV), 2018.
    PDF Supp

    Im2Flow: Motion Hallucination from Static Images for Action Recognition

    Ruohan Gao, Bo Xiong, Kristen Grauman.
    Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
    (Oral Presentation)
    PDF Supp Poster Project Page Code Oral Video

    On-Demand Learning for Deep Image Restoration

    Ruohan Gao and Kristen Grauman.
    International Conference on Computer Vision (ICCV), 2017.

    PDF Supp Poster Project Page Code

    Object-Centric Representation Learning from Unlabeled Videos

    Ruohan Gao, Dinesh Jayaraman, Kristen Grauman.
    Asian Conference on Computer Vision (ACCV), 2016.

    PDF Poster Project Page

Teaching

  • CS231N: Deep Learning for Computer Vision, Spring 2023 (Co-Instructing with Fei-Fei Li and Yunzhu Li)

  • CS231N: Deep Learning for Computer Vision, Spring 2022 (Co-Instructing with Fei-Fei Li and Jiajun Wu)

Talks

  • Invited Talk at MERL Seminar Series, Sept. 2021, "Look and Listen: From Semantic to Spatial Audio-Visual Perception" (PDF)

  • Invited Talk at the UTSA AI Consortium Seminar Series, April 2020, "Look to Listen and Listen to Look: Audio-Visual Learning from Video" (PDF, PPT)

  • Invited Talk at the MIT Vision Seminar Series, Sept. 2019, "Learning to See and Hear with Unlabeled Video" (PDF, PPT)

  • Invited Talk at the Sight and Sound Workshop, CVPR'19, "Learning to See and Hear with Unlabeled Video" (PDF, PPT)

  • CVPR'19 Oral, Long Beach, "2.5D Visual Sound" (Video, PDF, PPT)

  • ECCV'18 Oral, Munich, Germany, "Learning to Separate Object Sounds by Watching Unlabeled Video" (Video, PDF, PPT)

  • CVPR'18 Oral, Salt Lake City, "Im2Flow: Motion Hallucination from Static Images for Action Recognition" (Video, PDF)

Media Coverage

  • Engadget: Auditory AIs promise a more immersive AR/VR experience.

  • UT Austin News: Looking and Listening in Machine Learning.

  • Facebook AI Blog: New milestones in embodied AI.

  • MIT Technology Review: Deep learning turns mono recordings into immersive sound.

  • Two Minute Papers: This AI produces binaural (2.5D) audio.

  • Facebook AI Blog: Creating 2.5D visual sound for an immersive audio experience.

Undergraduate Research Publications

    Ruohan Gao, Huanle Xu, Pili Hu, Wing Cheong Lau, “Accelerating Graph Mining Algorithms via Uniform Random Edge Sampling”, IEEE ICC, 2016. [PDF]

    Ruohan Gao, Pili Hu, Wing Cheong Lau, “Graph Property Preservation under Community-Based Sampling”, IEEE Globecom, 2015. [PDF]

    Ruohan Gao, Huanle Xu, Pili Hu, Wing Cheong Lau, “Accelerating Graph Mining Algorithms via Uniform Random Edge Sampling (Poster)”, ACM Conference on Online Social Networks (COSN), 2015.