Fan-Yun Sun
sunfanyun [at] cs.stanford.edu
I am a final-year CS PhD candidate at the Stanford AI Lab,
affiliated with the Autonomous Agents Lab and the Stanford Vision and Learning Lab.
During my PhD, I have also worked extensively with NVIDIA Research, including the Learning and Perception Research Group, Metropolis Deep Learning (Omniverse), and the Autonomous Vehicle Research Group.
I'm interested in generating embodied (3D) environments and data for training robotics and RL policies, particularly to advance embodied, multimodal foundation models and their spatial reasoning abilities.
Before my PhD, I was fortunate to work with Jure Leskovec, Jian Tang at MILA, and Shou-De Lin at NTU.
Outside of research, I occasionally write on X, build AI applications to experiment with new ways of creating (for example ), and have enjoyed hosting conferences and bootcamps.
LinkedIn / Twitter / GitHub / Scholar
News
I'm honored to have received the Google Graduate Fellowship in Computer Science.
I gave a talk at BuzzRobot.
My research has been featured in the press (Tech Times, TechXplore, and Interesting Engineering).
I have received my master's degree in CS as part of the PhD program.
Selected Publications
3D-Generalist: Self-Improving Vision-Language-Action Models for Crafting 3D Worlds
Fan-Yun Sun, Shengguang Wu, Christian Jacobsen, Thomas Yim, Haoming Zou, Alex Zook, Shangru Li, Ethem Can, Xunlei Wu, Clemens Eppner, Valts Blukis, Jonathan Tremblay, Jiajun Wu, Stan Birchfield†, Nick Haber†
arXiv, 2024
project page / arXiv (TBA)
Symmetrical Visual Contrastive Optimization: Aligning Vision-Language Models with Minimal Contrastive Images
Shengguang Wu, Fan-Yun Sun, Kaiyue Wen, Nick Haber
To appear
project page / arXiv
LayoutVLM: Differentiable Optimization of 3D Layout via Vision-Language Models
Fan-Yun Sun*, Weiyu Liu*, Siyi Gu, Dylan Lim, Goutam Bhat, Federico Tombari, Manling Li, Nick Haber, Jiajun Wu
* Equal Contribution
project page / paper
Task Arithmetic can Mitigate Synthetic-to-Real Gap in Automatic Speech Recognition
Hsuan Su, Hua Farn, Fan-Yun Sun, Shang-Tse Chen, Hung-yi Lee
EMNLP, 2024
project page / code
GRS: Generating Robotic Simulation Tasks from Real-World Images
Alex Zook, Fan-Yun Sun, Josef Spjut, Valts Blukis, Stan Birchfield, Jonathan Tremblay
arXiv, 2024
arXiv
FactorSim: Generative Simulation via Factorized Representation
Fan-Yun Sun, S. I. Harini, Angela Yi, Yihan Zhou, Alex Zook, Jonathan Tremblay, Logan Cross, Jiajun Wu, Nick Haber
NeurIPS, 2024
project page / code
Holodeck: Language-Guided Generation of 3D Embodied Environments
Yue Yang*, Fan-Yun Sun*, Luca Weihs*, Eli Vanderbilt, Alvaro Herrasti, Winson Han, Jiajun Wu, Nick Haber, Ranjay Krishna, Lingjie Liu, Chris Callison-Burch, Mark Yatskar, Aniruddha Kembhavi, Christopher Clark
* Equal Technical Contribution
CVPR, 2024
project page / code
Partial-View Object View Synthesis via Filtering Inversion
Fan-Yun Sun, Jonathan Tremblay, Valts Blukis, Kevin Lin, Danfei Xu, Boris Ivanovic, Peter Karkus, Stan Birchfield, Dieter Fox, Ruohan Zhang, Yunzhu Li, Jiajun Wu, Marco Pavone, Nick Haber
XRNeRF Workshop, CVPR, 2023
3DV, 2024 (Spotlight)
project page / paper / code
Interaction Modeling with Multiplex Attention
NeurIPS, 2022
project page / code
Physion: Evaluating Physical Prediction from Vision in Humans and Machines
Daniel M Bear, Elias Wang, Damian Mrowca, Felix J Binder, Hsiau-Yu Fish Tung, RT Pramod, Cameron Holdaway, Sirui Tao, Kevin Smith, Fan-Yun Sun, Fei-Fei Li, Nancy Kanwisher, Joshua B Tenenbaum, Daniel LK Yamins, Judith E Fan
NeurIPS, Datasets and Benchmarks Track, 2021
project page / code
Equivariant Neural Network for Factor Graphs
arXiv
InfoGraph: Unsupervised and Semi-supervised Graph-Level Representation Learning via Mutual Information Maximization
ICLR, 2020 (Spotlight)
project page
vGraph: A Generative Model for Joint Community Detection and Node Representation Learning
NeurIPS, 2019
arXiv / slides / poster
Organ At Risk Segmentation with Multiple Modality
Kuan-Lun Tseng, Winston Hsu, Chun Ting Wu, Ya-Fang Shih, Fan-Yun Sun
arXiv
Designing Non-Greedy Agents through Reward Shaping and Regulation Enforcement in Multi-Agent Reinforcement Learning
[1] AAAI/ACM Conference on AI, Ethics, and Society, 2018 (Oral)
[2] AAMAS, 2019 (Montreal, QC)
paper 1 / paper 2
Academic Service & Awards
Program Committee @ AAAI
Reviewer @ NeurIPS, ICLR, CVPR, ICCV, ICML, SIGGRAPH
[1st place] ACM-ICPC Asia Regionals