The Forty-Second International Conference on Machine Learning (ICML 2025) is being held in Vancouver from July 13 to July 19. We’re excited to share all the work from SAIL that’s being presented, and you’ll find links to papers, videos and blogs below. Feel free to reach out to the contact authors directly to learn more about the work that’s happening at Stanford!

List of Accepted Papers

ExPLoRA: Parameter-Efficient Extended Pre-Training to Adapt Vision Transformers under Domain Shifts

Authors: Samar Khanna, Medhanie Irgau, David Lobell, Stefano Ermon
Contact: samarkhanna@cs.stanford.edu
Workshop: Main Conference
Links: Paper | Website
Keywords: parameter-efficient pre-training, continual learning, PEFT, unsupervised learning


An analytic theory of creativity in convolutional diffusion models

Authors: Mason Kamb, Surya Ganguli
Contact: kambm@stanford.edu
Workshop: Main Conference
Links: Paper
Keywords: diffusion models, creativity, combinatorial generalization, theory


What can large language models do for sustainable food?

Authors: Anna T. Thomas, Adam Yee, Andrew Mayne, Maya B. Mathur, Dan Jurafsky, Kristina Gligorić
Contact: thomasat@stanford.edu
Workshop: Main Conference
Links: Paper
Keywords: large language models, sustainability, climate, food, health, optimization


Archon: An Architecture Search Framework for Inference-Time Techniques

Authors: Jon Saad-Falcon, Adrian Gamarra Lafuente, Shlok Natarajan, Nahum Maru, Hristo Todorov, Etash Guha, E. Kelly Buchanan, Mayee Chen, Neel Guha, Christopher Ré, Azalia Mirhoseini
Contact: jonsaadfalcon@gmail.com
Workshop: Main Conference
Award nominations: Oral Presentation, ICLR 2025 SSM FM Workshop
Links: Paper | Website
Keywords: inference-time techniques, test-time scaling, machine learning, natural language processing


Auditing Prompt Caching in Language Model APIs

Authors: Chenchen Gu, Xiang Lisa Li, Rohith Kuditipudi, Percy Liang, Tatsunori Hashimoto
Contact: cygu@cs.stanford.edu
Workshop: Main Conference
Links: Paper | Website
Keywords: large language models, statistical hypothesis testing, timing attack, privacy


AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders

Authors: Zhengxuan Wu, Aryaman Arora, Atticus Geiger, Zheng Wang, Jing Huang, Dan Jurafsky, Christopher D. Manning, Christopher Potts
Contact: wuzhengx@stanford.edu
Workshop: Main Conference
Award nominations: Spotlight poster
Links: Paper | Website
Keywords: mechanistic interpretability


Causal-PIK: Causality-based Physical Reasoning with a Physics-Informed Kernel

Authors: Carlota Parés-Morlans, Michelle Yi, Claire Chen, Sarah A. Wu, Rika Antonova, Tobias Gerstenberg, Jeannette Bohg
Contact: cpares@stanford.edu
Workshop: Main Conference
Links: Paper | Website
Keywords: active exploration, physical reasoning, causality, bayesian optimization


CollabLLM: From Passive Responders to Active Collaborators

Authors: Shirley Wu, Michel Galley, Baolin Peng, Hao Cheng, Gavin Li, Yao Dou, Weixin Cai, James Zou, Jure Leskovec, Jianfeng Gao
Contact: shirwu@cs.stanford.edu
Workshop: Main Conference
Award nominations: Oral
Links: Paper | Blog Post | Website
Keywords: human-centered large language model, multiturn interaction, collaborative problem-solving, reinforcement learning


Collapse or Thrive? Perils and Promises of Synthetic Data in a Self-Generating World

Authors: Joshua Kazdan, Rylan Schaeffer, Apratim Dey, Matthias Gerstgrasser, Rafael Rafailov, David L. Donoho, Sanmi Koyejo
Contact: rschaef@cs.stanford.edu
Workshop: Main Conference
Links: Paper
Keywords: model collapse, synthetic data, model-data feedback loops, generative modeling, kernel density estimation, supervised finetuning


Confounder-Free Continual Learning via Recursive Feature Normalization

Authors: Yash Shah, Camila Gonzalez, Mohammad H. Abbasi, Qingyu Zhao, Kilian M. Pohl, Ehsan Adeli
Contact: {ynshah,eadeli}@stanford.edu
Workshop: Main Conference
Links: Paper | Website
Keywords: deep neural networks, confounders, continual learning, invariant representations, statistical regression


Cost-efficient Collaboration between On-device and Cloud Language Models

Authors: Avanika Narayan*, Dan Biderman*, Sabri Eyuboglu*, Avner May, Scott Linderman, James Zou, Christopher Ré
Contact: avanikan@stanford.edu
Workshop: Main Conference
Links: Paper | Blog Post | Video | Website
Keywords: local-remote collaboration, edge ai, on-device ai, edge-cloud hybrid systems


Gaussian Mixture Flow Matching Models

Authors: Hansheng Chen, Kai Zhang, Hao Tan, Zexiang Xu, Fujun Luan, Leonidas Guibas, Gordon Wetzstein, Sai Bi
Contact: hanshengchen@stanford.edu
Workshop: Main Conference
Links: Paper | Website
Keywords: diffusion models


Geometric Algebra Planes: Convex Implicit Neural Volumes

Authors: Irmak Sivgin, Sara Fridovich-Keil, Gordon Wetzstein, Mert Pilanci
Contact: gordon.wetzstein@stanford.edu
Workshop: Main Conference
Links: Paper | Website
Keywords: representation learning


Geometric Generative Modeling with Noise-Conditioned Graph Networks

Authors: Peter Pao-Huang, Mitchell Black, Xiaojie Qiu
Contact: peterph@cs.stanford.edu
Workshop: Main Conference
Links: Paper | Website
Keywords: graph neural networks, generative models, diffusion models, flow-matching


How Do Large Language Monkeys Get Their Power (Laws)?

Authors: Rylan Schaeffer, Joshua Kazdan, John Hughes, Jordan Juravsky, Sara Price, Aengus Lynch, Erik Jones, Robert Kirk, Azalia Mirhoseini, Sanmi Koyejo
Contact: rschaef@cs.stanford.edu
Workshop: Main Conference
Award nominations: Oral
Links: Paper
Keywords: scaling laws, inference compute, test-time compute, language models, evaluations, scaling-predictable evaluations


Independence Tests for Language Models

Authors: Sally Zhu, Ahmed M. Ahmed, Rohith Kuditipudi, Percy Liang
Contact: salzhu@stanford.edu
Workshop: Main Conference
Award nominations: Spotlight
Links: Paper
Keywords: model provenance, language models


Internal Causal Mechanisms Robustly Predict Language Model Out-of-Distribution Behaviors

Authors: Jing Huang*, Junyi Tao*, Thomas Icard, Diyi Yang, Christopher Potts
Contact: hij@stanford.edu
Workshop: Main Conference, Workshop
Links: Paper
Keywords: causal abstraction, causal interpretability, ood, correctness prediction


KernelBench: Can LLMs Write Efficient GPU Kernels?

Authors: Anne Ouyang*, Simon Guo*, Simran Arora, Alex L. Zhang, William Hu, Christopher Ré, Azalia Mirhoseini
Contact: simonguo@stanford.edu
Workshop: Main Conference
Links: Paper | Blog Post | Website
Keywords: benchmark, gpu kernel design, code generation


Latent Diffusion Planning for Imitation Learning

Authors: Amber Xie, Oleh Rybkin, Dorsa Sadigh, Chelsea Finn
Contact: amberxie@stanford.edu
Workshop: Main Conference
Award nominations: Spotlight Poster
Links: Paper | Website
Keywords: imitation learning, diffusion, planning, robotics


RBench: Graduate-level Multi-disciplinary Benchmarks for LLM & MLLM Complex Reasoning Evaluation

Authors: Meng-Hao Guo, Jiajun Xu, Yi Zhang, Jiaxi Song, Haoyang Peng, Yi-Xuan Deng, Xinzhi Dong, Kiyohiro Nakayama, Zhengyang Geng, Chen Wang, Bolin Ni, Guo-Wei Yang, Yongming Rao, Houwen Peng, Han Hu, Gordon Wetzstein, Shi-min Hu
Contact: gordon.wetzstein@stanford.edu
Workshop: Main Conference
Links: Paper | Website
Keywords: benchmark


Recommendations and Reporting Checklist for Rigorous & Transparent Human Baselines in Model Evaluations

Authors: Kevin L. Wei, Patricia Paskov, Sunishchal Dev, Michael J. Byun, Anka Reuel, Xavier Roberts-Gaal, Rachel Calcott, Evie Coxon, Chinmay Deshpande
Contact: kevinwei@acm.org, anka.reuel@stanford.edu
Workshop: Main Conference
Links: Paper
Keywords: human baselines, evaluation, ai, governance


RelGNN: Composite Message Passing for Relational Deep Learning

Authors: Tianlang Chen, Charilaos Kanatsoulis, Jure Leskovec
Contact: chentl@stanford.edu
Workshop: Main Conference
Links: Paper
Keywords: gnn, relational deep learning


Shrinking the Generation-Verification Gap with Weak Verifiers

Authors: Jon Saad-Falcon, E. Kelly Buchanan, Mayee F. Chen, Tzu-Heng Huang, Brendan McLaughlin, Tanvir Bhathal, Shang Zhu, Ben Athiwaratkun, Frederic Sala, Scott Linderman, Azalia Mirhoseini, Christopher Ré
Contact: jonsaadfalcon@gmail.com
Workshop: Workshop
Links: Paper | Blog Post | Website
Keywords: test-time compute, repeated sampling, weak supervision, weak verification


TAROT: Targeted Data Selection via Optimal Transport

Authors: Lan Feng, Fan Nie, Yuejiang Liu, Alexandre Alahi
Contact: yuejiang.liu@stanford.edu
Workshop: Main Conference
Links: Paper | Blog Post
Keywords: data selection, optimal transport, data attribution


Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?

Authors: Rylan Schaeffer, Hailey Schoelkopf, Brando Miranda, Gabriel Mukobi, Varun Madan, Adam Ibrahim, Herbie Bradley, Stella Biderman, Sanmi Koyejo
Contact: rschaef@cs.stanford.edu
Workshop: Main Conference
Links: Paper
Keywords: evaluations, benchmarks, scaling laws, emergent abilities, capabilities, frontier models, foundation models


We look forward to seeing you at ICML 2025!