The International Conference on Learning Representations (ICLR) 2025 is being hosted in Singapore from April 24th - April 28th. We’re excited to share all the work from SAIL that’s being presented, and you’ll find links to papers, videos and blogs below. Feel free to reach out to the contact authors directly to learn more about the work that’s happening at Stanford!

List of Accepted Papers

Aligning Language Models with Demonstrated Feedback

Authors: Omar Shaikh, Michelle S. Lam, Joey Hejna, Yijia Shao, Hyundong Justin Cho, Michael S. Bernstein, Diyi Yang
Contact: oshaikh@stanford.edu
Workshop: Main Conference
Keywords: personalization, few-shot learning, human computer interaction, alignment


3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian Splatting

Authors: Qihang Zhang, Yinghao Xu, Chaoyang Wang, Hsin-Ying Lee, Gordon Wetzstein, Bolei Zhou, Ceyuan Yang
Contact: yhxu@stanford.edu
Workshop: Main Conference
Links: Paper | Website
Keywords: 3d scene editing; gaussian splatting


Adaptive Self-improvement LLM Agentic System for ML Library Development

Authors: Genghan Zhang, Weixin Liang, Olivia Hsu, Kunle Olukotun
Contact: zgh23@stanford.edu
Workshop: Workshop
Award nominations: DL4C @ ICLR 2025 Best Paper
Links: Paper | Blog Post | Website
Keywords: llm agents, self-improvement learning, machine learning library


Archon: An Architecture Search Framework for Inference-Time Techniques

Authors: Jon Saad-Falcon, Adrian Gamarra Lafuente, Shlok Natarajan, Nahum Maru, Hristo Todorov, Etash Kumar Guha, E. Kelly Buchanan, Mayee F Chen, Neel Guha, Christopher Ré, Azalia Mirhoseini
Contact: jonsaadfalcon@gmail.com
Workshop: Workshop
Award nominations: Oral Presentation
Links: Paper | Website
Keywords: inference-time techniques, test-time scaling, machine learning, natural language processing


BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval

Authors: Hongjin Su, Howard Yen, Mengzhou Xia, Weijia Shi, Niklas Muennighoff, Han-yu Wang, Liu Haisu, Quan Shi, Zachary S Siegel, Michael Tang, Ruoxi Sun, Jinsung Yoon, Sercan O Arik, Danqi Chen, Tao Yu
Contact: hjsu@cs.hku.hk
Workshop: Main Conference
Award nominations: Spotlight
Links: Paper | Blog Post | Website
Keywords: retrieval benchmark, reasoning


Bidirectional Decoding: Improving Action Chunking via Guided Test-Time Sampling

Authors: Yuejiang Liu, Jubayer Ibn Hamid, Annie Xie, Yoonho Lee, Max Du, Chelsea Finn
Contact: yuejiang.liu@cs.stanford.edu
Workshop: Main Conference
Links: Paper | Website
Keywords: robot learning, action chunking, action decoding, test-time compute


BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions

Authors: Terry Yue Zhuo, Vu Minh Chien, Jenny Chim, Han Hu, Wenhao Yu, Ratnadira Widyasari, Imam Nur Bani Yusuf, Haolan Zhan, Junda He, Indraneil Paul, Simon Brunner, Chen GONG, James Hoang, Armel Randy Zebaze, Xiaoheng Hong, Wen-Ding Li, Jean Kaddour, Ming Xu, Zhihan Zhang, Prateek Yadav, Naman Jain, Alex Gu, Zhoujun Cheng, Jiawei Liu, Qian Liu, Zijian Wang, Binyuan Hui, Niklas Muennighoff, David Lo, Daniel Fried, Xiaoning Du, Harm de Vries, Leandro Von Werra
Contact: contact@bigcode-project.org
Workshop: Main Conference
Award nominations: Oral
Links: Paper | Blog Post | Website
Keywords: code generation, tool use, instruction following, benchmark


Bridging the Data Provenance Gap Across Text, Speech, and Video

Authors: Shayne Longpre, Nikhil Singh, Manuel Cherep, Kushagra Tiwary, Joanna Materzynska, William Brannon, Robert Mahari, Naana Obeng-Marnu, Manan Dey, Mohammed Hamdy, Nayan Saxena, Ahmad Mustafa Anis, Emad A. Alghamdi, Vu Minh Chien, Da Yin, Kun Qian, Yizhi LI, Minnie Liang, An Dinh, Shrestha Mohanty, Deividas Mataciunas, Tobin South, Jianguo Zhang, Ariel N. Lee, Campbell S. Lund, Christopher Klamm, Damien Sileo, Diganta Misra, Enrico Shippole, Kevin Klyman, Lester James Validad Miranda, Niklas Muennighoff, Seonghyeon Ye, Seungone Kim, Vipul Gupta, Vivek Sharma, Xuhui Zhou, Caiming Xiong, Luis Villa, Stella Biderman, Alex Pentland, Sara Hooker, Jad Kabbara
Contact: data.provenance.init@gmail.com
Workshop: Main Conference
Links: Paper | Website
Keywords: training data, audit, speech, video, text


CameraCtrl: Enabling Camera Control for Text-to-Video Generation

Authors: Hao He, Yinghao Xu, Yuwei Guo, Gordon Wetzstein, Bo Dai, Hongsheng Li, Ceyuan Yang
Contact: yhxu@stanford.edu
Workshop: Main Conference
Links: Paper | Website
Keywords: video generative models; 3d control for video generation


Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers

Authors: Chenglei Si, Diyi Yang, Tatsunori Hashimoto
Contact: clsi@stanford.edu
Workshop: Main Conference
Links: Paper
Keywords: large language models, automating research


Context Clues: Evaluating Long Context Models for Clinical Prediction Tasks on EHR Data

Authors: Michael Wornow, Suhana Bedi, Miguel Angel Fuentes Hernandez, Ethan Steinberg, Jason Alan Fries, Christopher Ré, Sanmi Koyejo, Nigam Shah
Contact: mwornow@stanford.edu
Workshop: Main Conference
Links: Paper
Keywords: healthcare, foundation models, long context


Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risks of Language Models

Authors: Andy K Zhang, Neil Perry, Riya Dulepet, Joey Ji, Celeste Menders, Justin W Lin, Eliot Jones, Gashon Hussein, Samantha Liu, Donovan Julian Jasper, Pura Peetathawatchai, Ari Glenn, Vikram Sivashankar, Daniel Zamoshchin, Leo Glikbarg, Derek Askaryar, Haoxiang Yang, Aolin Zhang, Rishi Alluri, Nathan Tran, Rinnara Sangpisit, Kenny O Oseleononmen, Dan Boneh, Daniel E. Ho, Percy Liang
Contact: andyzh@stanford.edu
Workshop: Main Conference
Award nominations: Oral
Links: Paper | Website
Keywords: language model agents, benchmark, cybersecurity, risk


Dr.

Authors: Christopher Fifty, Ronald Guenther Junkins, Dennis Duan, Aniketh Iyengar, Jerry Weihong Liu, Ehsan Amid, Sebastian Thrun, Christopher Ré
Contact: fifty@cs.stanford.edu
Workshop: Main Conference
Award nominations: Oral
Links: Paper | Website
Keywords: generative modeling, computer vision


Energy-Based Diffusion Language Models for Text Generation

Authors: Minkai Xu, Tomas Geffner, Karsten Kreis, Weili Nie, Yilun Xu, Jure Leskovec, Stefano Ermon, Arash Vahdat
Contact: minkai@cs.stanford.edu
Workshop: Main Conference
Links: Paper | Website
Keywords: language models, discrete diffusion models, energy-based models


Failures to Find Transferable Image Jailbreaks Between Vision-Language Models

Authors: Rylan Schaeffer, Dan Valentine, Luke Bailey, James Chua, Cristóbal Eyzaguirre, Zane Durante, Joe Benton, Brando Miranda, Henry Sleight, John Hughes, Rajashree Agrawal, Mrinank Sharma, Scott Emmons, Sanmi Koyejo, Ethan Perez
Contact: rschaef@cs.stanford.edu
Workshop: Main Conference
Links: Paper | Website
Keywords: adversarial robustness, jailbreaking, language model, vision language model


Foundation Models Secretly Understand Neural Network Weights: Enhancing Hypernetwork Architectures with Foundation Models

Authors: Jeffrey Gu, Serena Yeung-Levy
Contact: jeffgu@stanford.edu
Workshop: Main Conference
Links: Paper | Website
Keywords: hypernetworks, neural fields, implicit neural representations, generalizable neural fields, foundation models


Generative Representational Instruction Tuning

Authors: Niklas Muennighoff, Hongjin Su, Liang Wang, Nan Yang, Furu Wei, Tao Yu, Amanpreet Singh, Douwe Kiela
Contact: niklasm@stanford.edu
Workshop: Main Conference
Links: Paper | Website
Keywords: large language models, instruction tuning, text embedding


KernelBench: Can LLMs Write Efficient GPU Kernels?

Authors: Anne Ouyang*, Simon Guo*, Simran Arora, Alex L. Zhang, William Hu, Christopher Ré, Azalia Mirhoseini
Contact: simonguo@stanford.edu
Workshop: Workshop
Award nominations: Best Paper - Deep Learning for Code Workshop
Links: Paper | Blog Post | Website
Keywords: code generation, ml systems, gpu kernels, benchmark


Learning Efficient Positional Encodings with Graph Neural Networks

Authors: Charilaos Kanatsoulis, Evelyn Choi, Stefanie Jegelka, Jure Leskovec, Alejandro Ribeiro
Contact: charilaos@cs.stanford.edu
Workshop: Main Conference
Links: Paper
Keywords: graph transformers, positional encodings, graph neural networks


LoLCATs: On Low-Rank Linearizing of Large Language Models

Authors: Michael Zhang, Simran Arora, Rahul Chalamala, Benjamin Frederick Spector, Alan Wu, Krithik Ramesh, Aaryan Singhal, Christopher Ré
Contact: mzhang@cs.stanford.edu
Workshop: Main Conference
Links: Paper | Blog Post
Keywords: llms, efficient architectures, attention


MMTEB: Massive Multilingual Text Embedding Benchmark

Authors: Kenneth Enevoldsen, Isaac Chung, Imene Kerboua, Márton Kardos, Ashwin Mathur, David Stap, Jay Gala, Wissam Siblini, Dominik Krzemiński, Genta Indra Winata, Saba Sturua, Saiteja Utpala, Mathieu Ciancone, Marion Schaeffer, Diganta Misra, Shreeya Dhakal, Jonathan Rystrøm, Roman Solomatin, Ömer Veysel Çağatan, Akash Kundu, Martin Bernstorff, Shitao Xiao, Akshita Sukhlecha, Bhavish Pahwa, Rafał Poświata, Kranthi Kiran GV, Shawon Ashraf, Daniel Auras, Björn Plüster, Jan Philipp Harries, Loïc Magne, Isabelle Mohr, Dawei Zhu, Hippolyte Gisserot-Boukhlef, Tom Aarsen, Jan Kostkan, Konrad Wojtasik, Taemin Lee, Marek Suppa, Crystina Zhang, Roberta Rocca, Mohammed Hamdy, Andrianos Michail, John Yang, Manuel Faysse, Aleksei Vatolin, Nandan Thakur, Manan Dey, Dipam Vasani, Pranjal A Chitale, Simone Tedeschi, Nguyen Tai, Artem Snegirev, Mariya Hendriksen, Michael Günther, Mengzhou Xia, Weijia Shi, Xing Han Lù, Jordan Clive, Gayatri K, Maksimova Anna, Silvan Wehrli, Maria Tikhonova, Henil Shalin Panchal, Aleksandr Abramov, Malte Ostendorff, Zheng Liu, Simon Clematide, Lester James Validad Miranda, Alena Fenogenova, Guangyu Song, Ruqiya Bin Safi, Wen-Ding Li, Alessia Borghini, Federico Cassano, Lasse Hansen, Sara Hooker, Chenghao Xiao, Vaibhav Adlakha, Orion Weller, Siva Reddy, Niklas Muennighoff
Contact: kenneth.enevoldsen@cas.au.dk
Workshop: Main Conference
Links: Paper | Website
Keywords: natural language processing, benchmark, sentence embeddings, multilingual


Mechanistic Interpretability Meets Vision Language Models: Insights and Limitations

Authors: Yiming Liu*, Yuhui Zhang*, Serena Yeung-Levy
Contact: yuhuiz@stanford.edu
Workshop: Blog Track
Links: Paper
Keywords: vision language models, mechanistic interpretability


Model Equality Testing: Which Model is this API Serving?

Authors: Irena Gao, Percy Liang, Carlos Guestrin
Contact: irena@cs.stanford.edu
Workshop: Main Conference
Links: Paper | Website
Keywords: api monitoring, model shift, two-sample testing


MrT5: Dynamic Token Merging for Efficient Byte-level Language Models

Authors: Julie Kallini, Shikhar Murty, Christopher D. Manning, Christopher Potts, Róbert Csordás
Contact: kallini@stanford.edu
Workshop: Main Conference
Links: Paper | Website
Keywords: nlp, byt5, t5, tokenization, byte-level language models, character-level language models


OLMoE: Open Mixture-of-Experts Language Models

Authors: Niklas Muennighoff, Luca Soldaini, Dirk Groeneveld, Kyle Lo, Jacob Morrison, Sewon Min, Weijia Shi, Evan Pete Walsh, Oyvind Tafjord, Nathan Lambert, Yuling Gu, Shane Arora, Akshita Bhagia, Dustin Schwenk, David Wadden, Alexander Wettig, Binyuan Hui, Tim Dettmers, Douwe Kiela, Ali Farhadi, Noah A. Smith, Pang Wei Koh, Amanpreet Singh, Hannaneh Hajishirzi
Contact: niklasm@stanford.edu
Workshop: Main Conference
Award nominations: Oral
Links: Paper | Website
Keywords: large language models, mixture-of-experts, open-source


OpenHands: An Open Platform for AI Software Developers as Generalist Agents

Authors: Xingyao Wang, Boxuan Li, Yufan Song, Frank F. Xu, Xiangru Tang, Mingchen Zhuge, Jiayi Pan, Yueqi Song, Bowen Li, Jaskirat Singh, Hoang H. Tran, Fuqiang Li, Ren Ma, Mingzhang Zheng, Bill Qian, Yanjun Shao, Niklas Muennighoff, Yizhe Zhang, Binyuan Hui, Junyang Lin, Robert Brennan, Hao Peng, Heng Ji, Graham Neubig
Contact: xingyao6@illinois.edu, gneubig@cs.cmu.edu
Workshop: Main Conference
Links: Paper | Website
Keywords: ai agents, evaluation, infrastructure, benchmark


Predicate Hierarchies Improve Few-Shot State Classification

Authors: Emily Jin*, Joy Hsu*, Jiajun Wu
Contact: emilyjin@stanford.edu
Workshop: Main Conference
Links: Paper | Website
Keywords: few-shot state classification, predicate hierarchies


Real2Code: Reconstruct Articulated Objects via Code Generation

Authors: Zhao Mandi, Yijia Weng, Dominik Bauer, Shuran Song
Contact: mandi@stanford.edu
Workshop: Main Conference
Links: Paper | Blog Post | Website
Keywords: code llms; articulated objects; digital twins; foundation models


Reducing Hallucinations in Large Vision-Language Models via Latent Space Steering

Authors: Sheng Liu, Haotian Ye, James Zou
Contact: shengl@stanford.edu
Workshop: Main Conference
Award nominations: Spotlight
Links: Paper | Website
Keywords: hallucination, multimodal language model, large language model


RegMix: Data Mixture as Regression for Language Model Pre-training

Authors: Qian Liu, Xiaosen Zheng, Niklas Muennighoff, Guangtao Zeng, Longxu Dou, Tianyu Pang, Jing Jiang, Min Lin
Contact: liuqian.sea@gmail.com
Workshop: Main Conference
Award nominations: Spotlight
Links: Paper | Website
Keywords: language model pre-training, data mixture, regression


SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains?

Authors: John Yang, Carlos E. Jimenez, Alex L. Zhang, Kilian Lieret, Joyce Yang, Xindi Wu, Ori Press, Niklas Muennighoff, Gabriel Synnaeve, Karthik R. Narasimhan, Diyi Yang, Sida I. Wang, Ofir Press
Contact: johnby@stanford.edu
Workshop: Main Conference
Links: Paper | Website
Keywords: language models, natural language processing, software engineering


Scaling Laws for Precision

Authors: Tanishq Kumar, Zachary Ankner, Benjamin Frederick Spector, Blake Bordelon, Niklas Muennighoff, Mansheej Paul, Cengiz Pehlevan, Christopher Ré, Aditi Raghunathan
Contact: tkumar@college.harvard.edu
Workshop: Main Conference
Award nominations: Oral
Links: Paper
Keywords: quantization, scaling laws, precision, language models


Societal Impacts Research Requires Benchmarks for Creative Composition Tasks

Authors: Judy Hanwen Shen, Carlos Guestrin
Contact: jhshen@stanford.edu
Workshop: Workshop
Award nominations: Oral Presentation
Links: Paper
Keywords: societal impacts, creativity, position paper


Synthetic Continued Pretraining

Authors: Zitong Yang*, Neil Band*, Shuangping Li, Emmanuel Candès, Tatsunori Hashimoto
Contact: zitong@stanford.edu
Workshop: Main Conference
Links: Paper | Website
Keywords: synthetic data, continued pretraining


TEOChat: Large Language and Vision Assistant for Temporal Earth Observation Data

Authors: Jeremy Andrew Irvin, Emily Ruoyu Liu, Joyce Chuyi Chen, Ines Dormoy, Jinyoung Kim, Samar Khanna, Zhuo Zheng, Stefano Ermon
Contact: jirvin16@cs.stanford.edu
Workshop: Main Conference
Links: Paper | Website
Keywords: vision-language model, large multimodal model, satellite imagery, earth observation, change detection


TabDiff: a Mixed-type Diffusion Model for Tabular Data Generation

Authors: Juntong Shi, Minkai Xu, Harper Hua, Hengrui Zhang, Stefano Ermon, Jure Leskovec
Contact: minkai@cs.stanford.edu
Workshop: Main Conference
Links: Paper | Website
Keywords: tabular representation learning, generative models, diffusion models


The Utility and Complexity of in- and out-of-Distribution Machine Unlearning

Authors: Youssef Allouah, Joshua Kazdan, Rachid Guerraoui, Sanmi Koyejo
Contact: youssef.allouah@epfl.ch
Workshop: Main Conference
Links: Paper
Keywords: machine unlearning, differential privacy, optimization, theory, right to be forgotten


Think, Prune, Train, Improve: Scaling Reasoning Without Scaling Models

Authors: Caia Costello
Contact: caia@stanford.edu
Workshop: Workshop
Links: Paper
Keywords: fine-tuning, code generation, synthetic data, self-improvement, reasoning


TopoLM: brain-like spatio-functional organization in a topographic language model

Authors: Neil Rathi, Johannes Mehrer, Badr AlKhamissi, Taha Osama A Binhuraib, Nicholas Blauch, Martin Schrimpf
Contact: rathi@stanford.edu
Workshop: Main Conference
Award nominations: Oral
Links: Paper | Website
Keywords: language modeling, topography, fmri, neuroscience


Video Action Differencing

Authors: James Burgess, Xiaohan Wang, Yuhui Zhang, Anita Rau, Alejandro Lozano, Lisa Dunlap, Trevor Darrell, Serena Yeung-Levy
Contact: jmhb@stanford.edu
Workshop: Main Conference
Links: Paper | Blog Post | Website
Keywords: video, action, comparison, lvm, lmm, benchmark


What Makes a Maze Look Like a Maze?

Authors: Joy Hsu, Jiayuan Mao, Joshua B. Tenenbaum, Noah D. Goodman, Jiajun Wu
Contact: joycj@stanford.edu
Workshop: Main Conference
Links: Paper | Website
Keywords: visual reasoning, abstract concepts, schemas


What’s the Move? Hybrid Imitation Learning via Salient Points

Authors: Priya Sundaresan*, Hengyuan Hu*, Quan Vuong, Jeannette Bohg, Dorsa Sadigh
Contact: priyasun@stanford.edu
Workshop: Main Conference
Links: Paper | Website
Keywords: imitation learning, robot learning, robot manipulation, robotics


s1: Simple test-time scaling

Authors: Niklas Muennighoff, Zitong Yang, Weijia Shi, Xiang Lisa Li, Li Fei-Fei, Hannaneh Hajishirzi, Luke Zettlemoyer, Percy Liang, Emmanuel Candès, Tatsunori Hashimoto
Contact: niklasm@stanford.edu
Workshop: Workshop
Award nominations: Oral
Links: Paper | Video | Website
Keywords: test-time scaling, reasoning, large language models


“I Am the One and Only, Your Cyber BFF”: Understanding the Impact of GenAI Requires Understanding the Impact of Anthropomorphic AI

Authors: Myra Cheng, Alicia DeVrio, Lisa Egede, Su Lin Blodgett, Alexandra Olteanu
Contact: myra1@stanford.edu
Workshop: Blogposts Track
Keywords: anthropomorphism, societal impacts


We look forward to seeing you at ICLR!