
The Forty-Second International Conference on Machine Learning (ICML 2025) is being held in Vancouver from July 13 to July 19. We’re excited to share the work from SAIL that’s being presented, with links to papers, videos, and blogs below. Feel free to reach out to the contact authors directly to learn more about the work happening at Stanford!
List of Accepted Papers
ExPLoRA: Parameter-Efficient Extended Pre-Training to Adapt Vision Transformers under Domain Shifts
Authors: Samar Khanna, Medhanie Irgau, David Lobell, Stefano Ermon
Contact: samarkhanna@cs.stanford.edu
Workshop: Main Conference
Links: Paper | Website
Keywords: parameter-efficient pre-training, continual learning, PEFT, unsupervised learning
An analytic theory of creativity in convolutional diffusion models
Authors: Mason Kamb, Surya Ganguli
Contact: kambm@stanford.edu
Workshop: Main Conference
Links: Paper
Keywords: diffusion models, creativity, combinatorial generalization, theory
What can large language models do for sustainable food?
Authors: Anna T. Thomas, Adam Yee, Andrew Mayne, Maya B. Mathur, Dan Jurafsky, Kristina Gligorić
Contact: thomasat@stanford.edu
Workshop: Main Conference
Links: Paper
Keywords: large language models, sustainability, climate, food, health, optimization
Archon: An Architecture Search Framework for Inference-Time Techniques
Authors: Jon Saad-Falcon, Adrian Gamarra Lafuente, Shlok Natarajan, Nahum Maru, Hristo Todorov, Etash Guha, E. Kelly Buchanan, Mayee Chen, Neel Guha, Christopher Ré, Azalia Mirhoseini
Contact: jonsaadfalcon@gmail.com
Workshop: Main Conference
Award nominations: ICLR 2025: SSM FM Workshop, Oral Presentation
Links: Paper | Website
Keywords: inference-time techniques, test-time scaling, machine learning, natural language processing
Auditing Prompt Caching in Language Model APIs
Authors: Chenchen Gu, Xiang Lisa Li, Rohith Kuditipudi, Percy Liang, Tatsunori Hashimoto
Contact: cygu@cs.stanford.edu
Workshop: Main Conference
Links: Paper | Website
Keywords: large language models, statistical hypothesis testing, timing attack, privacy
AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders
Authors: Zhengxuan Wu, Aryaman Arora, Atticus Geiger, Zheng Wang, Jing Huang, Dan Jurafsky, Christopher D. Manning, Christopher Potts
Contact: wuzhengx@stanford.edu
Workshop: Main Conference
Award nominations: Spotlight poster
Links: Paper | Website
Keywords: mechanistic interpretability
Causal-PIK: Causality-based Physical Reasoning with a Physics-Informed Kernel
Authors: Carlota Parés-Morlans, Michelle Yi, Claire Chen, Sarah A. Wu, Rika Antonova, Tobias Gerstenberg, Jeannette Bohg
Contact: cpares@stanford.edu
Workshop: Main Conference
Links: Paper | Website
Keywords: active exploration, physical reasoning, causality, bayesian optimization
CollabLLM: From Passive Responders to Active Collaborators
Authors: Shirley Wu, Michel Galley, Baolin Peng, Hao Cheng, Gavin Li, Yao Dou, Weixin Cai, James Zou, Jure Leskovec, Jianfeng Gao
Contact: shirwu@cs.stanford.edu
Workshop: Main Conference
Award nominations: Oral, Outstanding Paper
Links: Paper | Blog Post | Website
Keywords: human-centered large language model, multiturn interaction, collaborative problem-solving, reinforcement learning
Collapse or Thrive? Perils and Promises of Synthetic Data in a Self-Generating World
Authors: Joshua Kazdan, Rylan Schaeffer, Apratim Dey, Matthias Gerstgrasser, Rafael Rafailov, David L. Donoho, Sanmi Koyejo
Contact: rschaef@cs.stanford.edu
Workshop: Main Conference
Links: Paper
Keywords: model collapse, synthetic data, model-data feedback loops, data-model feedback loops, generative models, generative modeling, kernel density estimation, supervised finetuning, machine learning, icml
Confounder-Free Continual Learning via Recursive Feature Normalization
Authors: Yash Shah, Camila Gonzalez, Mohammad H. Abbasi, Qingyu Zhao, Kilian M. Pohl, Ehsan Adeli
Contact: {ynshah,eadeli}@stanford.edu
Workshop: Main Conference
Links: Paper | Website
Keywords: deep neural networks, confounders, continual learning, invariant representations, statistical regression
Cost-efficient Collaboration between On-device and Cloud Language Models
Authors: Avanika Narayan*, Dan Biderman*, Sabri Eyuboglu*, Avner May, Scott Linderman, James Zou, Christopher Ré
Contact: avanikan@stanford.edu
Workshop: Main Conference
Links: Paper | Blog Post | Video | Website
Keywords: local-remote collaboration, edge ai, on-device ai, edge-cloud hybrid systems
Gaussian Mixture Flow Matching Models
Authors: Hansheng Chen, Kai Zhang, Hao Tan, Zexiang Xu, Fujun Luan, Leonidas Guibas, Gordon Wetzstein, Sai Bi
Contact: hanshengchen@stanford.edu
Workshop: Main Conference
Links: Paper | Website
Keywords: diffusion models
Geometric Algebra Planes: Convex Implicit Neural Volumes
Authors: Irmak Sivgin, Sara Fridovich-Keil, Gordon Wetzstein, Mert Pilanci
Contact: gordon.wetzstein@stanford.edu
Workshop: Main Conference
Links: Paper | Website
Keywords: representation learning
Geometric Generative Modeling with Noise-Conditioned Graph Networks
Authors: Peter Pao-Huang, Mitchell Black, Xiaojie Qiu
Contact: peterph@cs.stanford.edu
Workshop: Main Conference
Links: Paper | Website
Keywords: graph neural networks, generative models, diffusion models, flow-matching
How Do Large Language Monkeys Get Their Power (Laws)?
Authors: Rylan Schaeffer, Joshua Kazdan, John Hughes, Jordan Juravsky, Sara Price, Aengus Lynch, Erik Jones, Robert Kirk, Azalia Mirhoseini, Sanmi Koyejo
Contact: rschaef@cs.stanford.edu
Workshop: Main Conference
Award nominations: Oral
Links: Paper
Keywords: scaling laws, inference compute, scaling inference compute, test-time compute, scaling test-time compute, language models, evaluations, scaling-predictable evaluations, machine learning, icml
Independence Tests for Language Models
Authors: Sally Zhu, Ahmed M. Ahmed, Rohith Kuditipudi, Percy Liang
Contact: salzhu@stanford.edu
Workshop: Main Conference
Award nominations: Spotlight
Links: Paper
Keywords: model provenance, language models
Internal Causal Mechanisms Robustly Predict Language Model Out-of-Distribution Behaviors
Authors: Jing Huang*, Junyi Tao*, Thomas Icard, Diyi Yang, Christopher Potts
Contact: hij@stanford.edu
Workshop: Main Conference, Workshop
Links: Paper
Keywords: causal abstraction, causal interpretability, ood, correctness prediction
KernelBench: Can LLMs Write Efficient GPU Kernels?
Authors: Anne Ouyang*, Simon Guo*, Simran Arora, Alex L. Zhang, William Hu, Christopher Ré, Azalia Mirhoseini
Contact: simonguo@stanford.edu
Workshop: Main Conference
Links: Paper | Blog Post | Website
Keywords: benchmark, gpu kernel design, code generation
Latent Diffusion Planning for Imitation Learning
Authors: Amber Xie, Oleh Rybkin, Dorsa Sadigh, Chelsea Finn
Contact: amberxie@stanford.edu
Workshop: Main Conference
Award nominations: Spotlight Poster
Links: Paper | Website
Keywords: imitation learning, diffusion, planning, robotics
RBench: Graduate-level Multi-disciplinary Benchmarks for LLM & MLLM Complex Reasoning Evaluation
Authors: Meng-Hao Guo, Jiajun Xu, Yi Zhang, Jiaxi Song, Haoyang Peng, Yi-Xuan Deng, Xinzhi Dong, Kiyohiro Nakayama, Zhengyang Geng, Chen Wang, Bolin Ni, Guo-Wei Yang, Yongming Rao, Houwen Peng, Han Hu, Gordon Wetzstein, Shi-min Hu
Contact: gordon.wetzstein@stanford.edu
Workshop: Main Conference
Links: Paper | Website
Keywords: benchmark
Recommendations and Reporting Checklist for Rigorous & Transparent Human Baselines in Model Evaluations
Authors: Kevin L. Wei, Patricia Paskov, Sunishchal Dev, Michael J. Byun, Anka Reuel, Xavier Roberts-Gaal, Rachel Calcott, Evie Coxon, Chinmay Deshpande
Contact: kevinwei@acm.org, anka.reuel@stanford.edu
Workshop: Main Conference
Links: Paper
Keywords: human baselines, evaluation, ai, governance
RelGNN: Composite Message Passing for Relational Deep Learning
Authors: Tianlang Chen, Charilaos Kanatsoulis, Jure Leskovec
Contact: chentl@stanford.edu
Workshop: Main Conference
Links: Paper
Keywords: gnn, relational deep learning
Shrinking the Generation-Verification Gap with Weak Verifiers
Authors: Jon Saad-Falcon, E. Kelly Buchanan, Mayee F. Chen, Tzu-Heng Huang, Brendan McLaughlin, Tanvir Bhathal, Shang Zhu, Ben Athiwaratkun, Frederic Sala, Scott Linderman, Azalia Mirhoseini, Christopher Ré
Contact: jonsaadfalcon@gmail.com
Workshop: Workshop
Links: Paper | Blog Post | Website
Keywords: test-time compute, repeated sampling, weak supervision, weak verification
TAROT: Targeted Data Selection via Optimal Transport
Authors: Lan Feng, Fan Nie, Yuejiang Liu, Alexandre Alahi
Contact: yuejiang.liu@stanford.edu
Workshop: Main Conference
Links: Paper | Blog Post
Keywords: data selection, optimal transport, data attribution
Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?
Authors: Rylan Schaeffer, Hailey Schoelkopf, Brando Miranda, Gabriel Mukobi, Varun Madan, Adam Ibrahim, Herbie Bradley, Stella Biderman, Sanmi Koyejo
Contact: rschaef@cs.stanford.edu
Workshop: Main Conference
Links: Paper
Keywords: evaluations, benchmarks, scaling laws, emergent abilities, capabilities, frontier models, foundation models, machine learning, icml
We look forward to seeing you at ICML 2025!