The Computer Vision and Pattern Recognition Conference (CVPR) 2023 is being hosted in Vancouver, Canada, on June 18th–22nd. We’re excited to share all the work from SAIL that’s being presented, and you’ll find links to papers, videos, and blogs below. Feel free to reach out to the contact authors directly to learn more about the work that’s happening at Stanford!
List of Accepted Papers
Finetune like you pretrain: Improved finetuning of zero-shot vision models
Authors: Sachin Goyal, Ananya Kumar, Sankalp Garg, Zico Kolter, Aditi Raghunathan
Contact: sachingo@andrew.cmu.edu
Keywords: robust fine-tuning, multimodal, robustness, clip, transfer learning
Multi-Object Manipulation via Object-Centric Neural Scattering Functions
Authors: Stephen Tian*, Yancheng Cai*, Hong-Xing Yu, Sergey Zakharov, Katherine Liu, Adrien Gaidon, Yunzhu Li, Jiajun Wu
Contact: tians@stanford.edu
Links: Video | Website
Keywords: dynamics models, neural rendering, robotic manipulation
Accidental Light Probes
Authors: Hong-Xing Yu, Samir Agarwala, Charles Herrmann, Richard Szeliski, Noah Snavely, Jiajun Wu, Deqing Sun
Contact: koven@cs.stanford.edu
Links: Paper | Video | Website
Keywords: inverse rendering, lighting estimation
CARTO: Category and Joint Agnostic Reconstruction of ARTiculated Objects
Authors: Nick Heppert, Muhammad Zubair Irshad, Sergey Zakharov, Katherine Liu, Rares Andrei Ambrus, Jeannette Bohg, Abhinav Valada, Thomas Kollar
Contact: heppert@cs.uni-freiburg.de
Links: Paper | Video | Website
Keywords: single-shot 3d reconstruction, articulated objects
CIRCLE: Capture in Rich Contextual Environments
Authors: João Pedro Araújo, Jiaman Li, Karthik Vetrivel, Rishi Agarwal, Jiajun Wu, Deepak Gopinath, Alexander William Clegg, C. Karen Liu
Contact: jparaujo@stanford.edu
Links: Paper | Website
Keywords: motion capture, motion generation, virtual reality, egocentric video
EDGE: Editable Dance Generation from Music
Authors: Jonathan Tseng, Rodrigo Castellon, C. Karen Liu
Contact: jtseng20@stanford.edu
Links: Paper | Website
Keywords: motion, diffusion, music, dance, editing
EFEM: Equivariant Neural Field Expectation Maximization for 3D Object Segmentation Without Scene Supervision
Authors: Jiahui Lei, Congyue Deng, Karl Schmeckpeper, Leonidas Guibas, Kostas Daniilidis
Contact: leijh@cis.upenn.edu
Links: Paper | Video | Website
Keywords: point cloud segmentation, equivariance, weakly-supervised learning
Ego-Body Pose Estimation via Ego-Head Pose Estimation
Authors: Jiaman Li, C. Karen Liu†, Jiajun Wu†
Contact: jiamanli@stanford.edu
Award nominations: Award Candidate
Links: Paper | Video | Website
Keywords: egocentric video, human motion estimation, decomposition, conditional diffusion
GINA-3D: Learning to Generate Implicit Neural Assets in the Wild
Authors: Bokui Shen, Xinchen Yan, Charles R. Qi, Mahyar Najibi, Boyang Deng, Leonidas Guibas, Yin Zhou, Dragomir Anguelov
Contact: willshen@stanford.edu
Links: Paper | Video | Website
Keywords: generative ai, autonomous driving, generative 3d, simulation assets
NS3D: Neuro-Symbolic Grounding of 3D Objects and Relations
Authors: Joy Hsu, Jiayuan Mao, Jiajun Wu
Contact: joycj@stanford.edu
Links: Paper | Website
Keywords: neuro-symbolic learning, visual reasoning, 3d grounding
NeMo: 3D Neural Motion Fields from Multiple Video Instances of the Same Action
Authors: Kuan-Chieh Wang, Zhenzhen Weng, Maria Xenochristou, Joao Pedro Araujo, Jeffrey Gu, C. Karen Liu, Serena Yeung
Contact: wangkua1@stanford.edu
Links: Paper | Video | Website
Keywords: human mesh recovery, human motion, 3d vision, neural field
NeRDi: Single-View NeRF Synthesis With Language-Guided Diffusion As General Image Priors
Authors: Congyue Deng, Chiyu “Max” Jiang, Charles R. Qi, Xinchen Yan, Yin Zhou, Leonidas Guibas, Dragomir Anguelov
Contact: congyue@stanford.edu
Links: Paper | Video | Website
Keywords: nerf, diffusion model, single view to 3d
PROB: Probabilistic Objectness for Open World Object Detection
Authors: Orr Zohar, Kuan-Chieh Wang, Serena Yeung
Contact: orrzohar@stanford.edu
Links: Paper | Video | Website
Keywords: open world learning, open world object detection, object detection, class-agnostic object detection
Partial-View Object View Synthesis via Filtering Inversion
Authors: Fan-Yun Sun, Jonathan Tremblay, Valts Blukis, Kevin Lin, Danfei Xu, Boris Ivanovic, Peter Karkus, Stan Birchfield, Dieter Fox, Ruohan Zhang, Yunzhu Li, Jiajun Wu, Marco Pavone, Nick Haber
Contact: fanyun@stanford.edu
Links: Paper | Website
Keywords: view synthesis, partial-view, filtering inversion, gan
Putting People in Their Place: Affordance-Aware Human Insertion into Scenes
Authors: Sumith Kulal, Tim Brooks, Alex Aiken, Jiajun Wu, Jimei Yang, Jingwan Lu, Alexei A. Efros, Krishna Kumar Singh
Contact: sumith@stanford.edu
Links: Paper | Website
Keywords: affordances, self-supervision, image synthesis, editing
RealImpact: A Dataset of Impact Sound Fields for Real Objects
Authors: Samuel Clarke, Ruohan Gao, Mason Wang, Mark Rau, Julia Xu, Jui-Hsien Wang, Doug L. James, Jiajun Wu
Contact: spclarke@stanford.edu
Award nominations: Highlight
Links: Paper | Video | Website
Keywords: audio processing, acoustic learning, multimodal data, sound
SCADE: NeRFs from Space Carving with Ambiguity-Aware Depth Estimates
Authors: Mikaela Angelina Uy, Ricardo Martin-Brualla, Leonidas Guibas, Ke Li
Contact: mikacuy@stanford.edu
Links: Paper | Video | Website
Keywords: nerfs, sparse view, monocular depth, cimle, distribution, ambiguity
Seeing a Rose in Five Thousand Ways
Authors: Yunzhi Zhang, Shangzhe Wu, Noah Snavely, Jiajun Wu
Contact: yzzhang@stanford.edu
Links: Paper | Video | Website
Keywords: generative modelling, inverse rendering, gan, image generation, 3d reconstruction
The ObjectFolder Benchmark: Multisensory Learning with Neural and Real Objects
Authors: Ruohan Gao*, Yiming Dou*, Hao Li*, Tanmay Agarwal, Jeannette Bohg, Yunzhu Li, Li Fei-Fei, Jiajun Wu
Contact: rhgao@cs.stanford.edu
Links: Paper | Video | Website
Keywords: multisensory, benchmark, object-centric learning
We look forward to seeing you at CVPR!