
The Neural Information Processing Systems conference (NeurIPS) 2025 is being hosted in San Diego from December 2nd to 7th. We’re excited to share all the work from SAIL that’s being presented, and you’ll find links to papers, videos and blogs below. Feel free to reach out to the contact authors directly to learn more about the work that’s happening at Stanford!
List of Accepted Papers
MCP Explorer: Interactive Learning Experience
Authors: Jiayu He, Sherry Ruan, James Landay
Contact: sruan@cs.stanford.edu
Workshop: NeurIPS Educational Content for the AI Education Resource Showcase (Oral Presentation)
Links: Website
Keywords: model context protocol (mcp), ai assistants, interactive learning, responsible ai, tool use
Preference Learning with Response Time
Authors: Ayush Sawarni, Sahasrajit Sarmasarkar, Vasilis Syrgkanis
Contact: ayushsaw@stanford.edu
Workshop: Main Conference
Links: Video
Keywords: rlhf, preference learning, orthogonal statistics
Procurement Auctions with Predictions: Improved Frugality for Facility Location
Authors: Eric Balkanski, Nicholas DeFilippis, Vasilis Gkatzelis, Xizhi Tan
Contact: xizhi@stanford.edu
Workshop: Main Conference
Keywords: frugality, mechanism design, procurement auction
Retrospective In-Context Learning for Temporal Credit Assignment with Large Language Models
Authors: Wentse Chen, Jiayu Chen, Fahim Tajwar, Hao Zhu, Xintong Duan, Ruslan Salakhutdinov, Jeff Schneider
Contact: zhuhao@stanford.edu
Workshop: Main Conference
Keywords: reinforcement learning, large language models, self-evolving agents
A Practical Guide for Incorporating Symmetry in Diffusion Policy
Authors: Dian Wang, Boce Hu, Shuran Song, Robin Walters, Robert Platt
Contact: dianwang@stanford.edu
Workshop: Main Conference
Links: Paper | Website
Keywords: robotic manipulation, equivariance, diffusion model
Agentic Bridge Framework: Closing the Gap Between Agentic Capability and Performance Benchmarks
Authors: Yun Du, Rubens Lacouture, Qizheng Zhang, Genghan Zhang, Tian Zhao, Kunle Olukotun
Contact: yundu27@stanford.edu
Workshop: Workshop
Links: Paper | Website
Keywords: agents, llms, benchmarking, gaia benchmark, ml systems, multi-agent systems, system optimizations, agentic workflows, agentic ai, trace-level telemetry
Agentic Plan Caching: Test-Time Memory for Fast and Cost-Efficient LLM Agents
Authors: Qizheng Zhang, Michael Wornow, Kunle Olukotun
Contact: qizhengz@stanford.edu
Workshop: Main Conference
Links: Paper | Video
Keywords: caching, memory, serving, llm agents
CodeARC: Benchmarking Reasoning Capabilities of LLM Agents for Inductive Program Synthesis
Authors: Anjiang Wei, Tarun Suresh, Jiannan Cao, Naveen Kannan, Yuheng Wu, Kai Yan, Thiago S. F. X. Teixeira, Ke Wang, Alex Aiken
Contact: anjiang@cs.stanford.edu
Workshop: Workshop
Links: Paper | Website
Keywords: agent, large language model, reasoning, code, program synthesis
Discovering Latent Graphs with GFlowNets for Diverse Conditional Image Generation
Authors: Bailey Trang, Parham Saremi, Alan Wang, Fangrui Huang, Zahra TehraniNasab, Amar Kumar, Tal Arbel, Fei-Fei Li, Ehsan Adeli
Contact: eadeli@stanford.edu
Workshop: Main Conference
Links: Paper
Keywords: generative models, diffusion model, diversity, gflownet
DynaGuide: Steering Diffusion Policies with Active Dynamic Guidance
Authors: Maximilian Du, Shuran Song
Contact: maxjdu@stanford.edu
Workshop: Main Conference
Links: Paper | Video | Website
Keywords: robots, steering behaviors, imitation learning
Exploring Diffusion Transformer Designs via Grafting
Authors: Keshigeyan Chandrasegaran, Michael Poli, Daniel Y. Fu, Dongjun Kim, Lea M. Hadzic, Manling Li, Agrim Gupta, Stefano Massaroli, Azalia Mirhoseini, Juan Carlos Niebles, Stefano Ermon, Li Fei-Fei
Contact: keshik@stanford.edu
Workshop: Main Conference
Award nominations: Oral
Links: Paper | Blog Post | Website
Keywords: diffusion transformers, model grafting, architectural editing, hybrid model architectures
Fantastic Bugs and Where to Find Them in AI Benchmarks
Authors: Sang Truong, Yuheng Tu, Michael Hardy, Anka Reuel, Zeyu Tang, Jirayu Burapacheep, Jonathan Perera, Chibuike Uwakwe, Ben Domingue, Nick Haber, Sanmi Koyejo
Contact: sttruong@cs.stanford.edu
Workshop: Main Conference
Links: Paper
Keywords: benchmark, evaluation, measurement theory
From Programs to Poses: Factored Real-World Scene Generation via Learned Program Libraries
Authors: Joy Hsu, Emily Jin, Jiajun Wu, Niloy J. Mitra
Contact: joycj@stanford.edu
Workshop: Main Conference
Links: Paper | Website
Keywords: factorization, library learning, real-world scene generation
HouseLayout3D Benchmark and Training-Free Baseline for 3D Layout Estimation in the Wild
Authors: Valentin Bieri, Marie-Julie Rakotosaona, Keisuke Tateno, Francis Engelmann, Leonidas Guibas
Contact: engelmann@cs.stanford.edu
Workshop: Main Conference
Links: Paper | Video | Website
Keywords: 3d scene understanding, 3d scene generation, cadification
In-Context Learning Strategies Emerge Rationally
Authors: Daniel Wurgaft, Ekdeep Singh Lubana, Core Francisco Park, Hidenori Tanaka, Gautam Reddy, Noah D. Goodman
Contact: wurgaft@stanford.edu
Workshop: Main Conference
Links: Paper
Keywords: in-context learning, loss-complexity tradeoff, bayesian modeling, algorithmic complexity
Joint Design of Protein Surface and Structure Using a Diffusion Bridge Model
Authors: Guanlue Li, Xufeng Zhao, Fang Wu, Sören Laue
Contact: fangwu97@stanford.edu
Workshop: Main Conference
Links: Paper | Website
Keywords: protein design, diffusion model
LLM-Guided Autoscheduling for Large-Scale Sparse Machine Learning
Authors: Rubens Lacouture, Genghan Zhang, Konstantin Hossfeld, Tian Zhao, Kunle Olukotun
Contact: rubensl@stanford.edu
Workshop: Workshop
Links: Paper
Keywords: sparse machine learning, compiler optimization, autoscheduling
Latent Policy Barrier: Learning Robust Visuomotor Policies by Staying In-Distribution
Authors: Zhanyi Sun, Shuran Song
Contact: zhanyis@stanford.edu
Workshop: Main Conference
Links: Paper | Website
Keywords: out-of-distribution generalization, imitation learning, robotic manipulation
On the Entropy Calibration of Language Models
Authors: Steven Cao, Gregory Valiant, Percy Liang
Contact: shcao@stanford.edu
Workshop: Main Conference
Links: Paper | Video
Keywords: language models, language generation, calibration, entropy, error accumulation, scaling laws, language model theory, rl theory
SATBench: Benchmarking LLMs’ Logical Reasoning via Automated Puzzle Generation from SAT Formulas
Authors: Anjiang Wei, Yuheng Wu, Yingjia Wan, Tarun Suresh, Huanmi Tan, Zhanke Zhou, Sanmi Koyejo, Ke Wang, Alex Aiken
Contact: anjiang@cs.stanford.edu
Workshop: Workshop
Links: Paper | Website
Keywords: reasoning, sat solving, benchmark
SWE-smith: Scaling Data for Software Engineering Agents
Authors: John Yang, Kilian Lieret, Carlos E. Jimenez, Alexander Wettig, Kabir Khandpur, Yanzhe Zhang, Binyuan Hui, Ofir Press, Ludwig Schmidt, Diyi Yang
Contact: johnby@stanford.edu
Workshop: Main Conference
Award nominations: Spotlight
Links: Paper | Blog Post | Video | Website
Keywords: software engineering, language models, swe-bench, swe-agent
SoMi-ToM: Evaluating Multi-Perspective Theory of Mind in Embodied Social Interactions
Authors: Xianzhe Fan, Xuhui Zhou, Chuanyang Jin, Kolby Nottingham, Hao Zhu, Maarten Sap
Contact: zhuhao@stanford.edu
Workshop: Main Conference
Links: Paper
Keywords: theory of mind, embodied ai, vision-language models
VIPScene: Video Perception Models for 3D Scene Synthesis
Authors: Rui Huang, Guangyao Zhai, Zuria Bauer, Marc Pollefeys, Federico Tombari, Leonidas Guibas, Gao Huang, Francis Engelmann
Contact: engelmann@cs.stanford.edu
Workshop: Main Conference
Links: Paper | Video | Website
Keywords: 3d scene generation, video models
Unlocking hidden biomolecular conformational landscapes in diffusion models at inference time
Authors: Daniel D. Richman, Jessica Karaguesian, Carl-Mikael Suomivuori, Ron O. Dror
Contact: ddrichma@stanford.edu, jkara@stanford.edu
Workshop: Main Conference
Links: Paper
Keywords: protein structure, diffusion
We look forward to seeing you at NeurIPS 2025!