The Conference on Computer Vision and Pattern Recognition (CVPR) 2022 is taking place June 19-24. We’re excited to share all the work from SAIL that’s being presented, and you’ll find links to papers, videos, and websites below. Feel free to reach out to the contact authors directly to learn more about the work that’s happening at Stanford!

List of Accepted Papers

Fixing Malfunctional Objects With Learned Physical Simulation and Functional Prediction

Authors: Yining Hong, Kaichun Mo, Li Yi, Leonidas J. Guibas, Antonio Torralba, Joshua Tenenbaum, Chuang Gan
Contact: kaichun@cs.stanford.edu
Links: Paper | Video | Website
Keywords: fixing malfunctional 3d shapes, shape functionality, dynamic model


Generating Useful Accident-Prone Driving Scenarios via a Learned Traffic Prior

Authors: Davis Rempe, Jonah Philion, Leonidas Guibas, Sanja Fidler, Or Litany
Contact: drempe@stanford.edu
Links: Paper | Website
Keywords: autonomous vehicles, adversarial scenario generation, traffic simulation


Measuring Compositional Consistency for Video Question Answering

Authors: Mona Gandhi, Mustafa Omer Gul, Eva Prakash, Madeleine Grunde-McLaughlin, Ranjay Krishna, Maneesh Agrawala
Contact: momergul@alumni.stanford.edu
Links: Paper | Video | Website
Keywords: compositionality, video question answering, evaluation, dataset, metrics


Mind the Gap: Understanding the Modality Gap in Multi-modal Contrastive Representation Learning

Authors: Weixin Liang*, Yuhui Zhang*, Yongchan Kwon*, Serena Yeung, James Zou
Contact: yuhuiz@stanford.edu
Links: Paper | Website
Keywords: multi-modal representation learning, contrastive representation learning, cone effect, modality gap


Multi-Objective Diverse Human Motion Prediction with Knowledge Distillation

Authors: Hengbo Ma, Jiachen Li, Ramtin Hosseini, Masayoshi Tomizuka, Chiho Choi
Contact: hengbo_ma@berkeley.edu; jiachen_li@stanford.edu
Award nominations: Oral Presentation
Links: Paper
Keywords: human motion prediction, robotics


ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer

Authors: Ruohan Gao*, Zilin Si*, Yen-Yu Chang*, Samuel Clarke, Jeannette Bohg, Li Fei-Fei, Wenzhen Yuan, Jiajun Wu
Contact: rhgao@cs.stanford.edu
Links: Paper | Video | Website
Keywords: multisensory, object, dataset, sim2real


PartGlot: Learning Shape Part Segmentation from Language Reference Games

Authors: Juil Koo, Ian Huang, Panos Achlioptas, Leonidas Guibas, Minhyuk Sung
Contact: ianhuang@stanford.edu
Links: Paper | Video | Website
Keywords: language grounding, semantic part segmentation, multimodal learning, natural language processing, 3d vision


Point2Cyl: Reverse Engineering 3D Objects from Point Clouds to Extrusion Cylinders

Authors: Mikaela Angelina Uy*, Yen-Yu Chang*, Minhyuk Sung, Purvi Goel, Joseph Lambourne, Tolga Birdal, Leonidas Guibas
Contact: mikacuy@stanford.edu
Links: Paper | Video | Website
Keywords: reverse engineering, cad, shape modeling, editing, segmentation, point clouds


Programmatic Concept Learning for Human Motion Description and Synthesis

Authors: Sumith Kulal*, Jiayuan Mao*, Alex Aiken§, Jiajun Wu§
Contact: sumith@cs.stanford.edu
Links: Paper | Website
Keywords: hierarchical representation, human motion, video understanding, video synthesis


Revisiting the “Video” in Video-Language Understanding

Authors: Shyamal Buch, Cristóbal Eyzaguirre, Adrien Gaidon, Jiajun Wu, Li Fei-Fei, Juan Carlos Niebles
Contact: shyamal@cs.stanford.edu
Award nominations: Oral Presentation
Links: Paper | Website
Keywords: video understanding, vision and language, multimodal


Rotationally Equivariant 3D Object Detection

Authors: Hong-Xing Yu, Jiajun Wu, Li Yi
Contact: koven@cs.stanford.edu
Links: Paper | Video | Website
Keywords: rotation equivariance, detection, object


We look forward to seeing you at CVPR!