The Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023) is being hosted in New Orleans from December 10th to December 16th. We’re excited to share all the work from SAIL that’s being presented at the main conference, in the Datasets and Benchmarks track, and at the various workshops. You can find links to papers, videos, and blogs below.

Feel free to reach out to the contact authors directly to learn more about the work that’s happening at Stanford!

Main Conference

Are Emergent Abilities of Large Language Models a Mirage?

Authors: Rylan Schaeffer, Brando Miranda, Sanmi Koyejo
Contact: rschaef@cs.stanford.edu
Award nominations: Oral
Abstract: Recent work claims that large language models display emergent abilities, abilities not present in smaller-scale models that are present in larger-scale models. What makes emergent abilities intriguing is two-fold: their sharpness, transitioning seemingly instantaneously from not present to present, and their unpredictability, appearing at seemingly unforeseeable model scales. Here, we present an alternative explanation for emergent abilities: that for a particular task and model family, when analyzing fixed model outputs, emergent abilities appear due to the researcher’s choice of metric rather than due to fundamental changes in model behavior with scale. Specifically, nonlinear or discontinuous metrics produce apparent emergent abilities, whereas linear or continuous metrics produce smooth, continuous, predictable changes in model performance. We present our alternative explanation in a simple mathematical model, then test it in three complementary ways: we (1) make, test, and confirm three predictions on the effect of metric choice using the InstructGPT/GPT-3 family on tasks with claimed emergent abilities; (2) make, test, and confirm two predictions about metric choices in a meta-analysis of emergent abilities on BIG-Bench; and (3) show how to choose metrics to produce never-before-seen, seemingly emergent abilities in multiple vision tasks across diverse deep networks. Via all three analyses, we provide evidence that alleged emergent abilities evaporate with different metrics or with better statistics, and may not be a fundamental property of scaling AI models.
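
The paper’s central claim lends itself to a tiny worked example. The Python sketch below is our own illustration, not the authors’ code, and all numbers are invented: when per-token accuracy improves smoothly with scale, a nonlinear metric such as exact match over an L-token answer still appears to jump abruptly from “not present” to “present”.

```python
import numpy as np

# Toy illustration (assumed numbers, not the paper's data): per-token accuracy
# improves smoothly and predictably with model scale, but exact-match accuracy
# over an L-token answer (a nonlinear metric) appears to "emerge" abruptly.
model_params = np.logspace(7, 12, 6)        # hypothetical model sizes
per_token_acc = np.linspace(0.50, 0.99, 6)  # smooth, linear-in-log-scale improvement

L = 20  # tokens in the target output; exact match requires all L to be correct
exact_match = per_token_acc ** L

for n, p, em in zip(model_params, per_token_acc, exact_match):
    print(f"params={n:.0e}  per-token acc={p:.2f}  exact match={em:.6f}")
```

Under the continuous metric the improvement is visibly gradual; under exact match the same models appear to cross a sudden capability threshold, which is the mirage the paper describes.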


DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models

Authors: Boxin Wang, Weixin Chen, Hengzhi Pei, Chulin Xie, Mintong Kang, Chenhui Zhang, Chejian Xu, Zidi Xiong, Ritik Dutta, Rylan Schaeffer, Sang T Truong, Simran Arora, Mantas Mazeika, Dan Hendrycks, Zinan Lin, Yu Cheng, Sanmi Koyejo, Dawn Song, Bo Li
Contact: rschaef@cs.stanford.edu
Award nominations: Oral
Keywords: large language models, natural language processing, trustworthiness


Self-Supervised Learning of Representations for Space Generates Multi-Modular Grid Cells

Authors: Rylan Schaeffer, Mikail Khona, Tzuhsuan Ma, Cristóbal Eyzaguirre, Sanmi Koyejo, Ila Rani Fiete
Contact: rschaef@cs.stanford.edu
Keywords: self-supervised learning, neuroscience


AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback

Authors: Yann Dubois, Xuechen Li, Rohan Taori, Tianyi Zhang, Ishaan Gulrajani, Jimmy Ba, Carlos Guestrin, Percy Liang, Tatsunori B. Hashimoto
Contact: lxuechen@cs.stanford.edu
Links: Paper | Blog Post | Website
Keywords: instruction-following, large language models, reinforcement learning from human feedback


BARFI: Behavior Alignment via Reward Function Optimization

Authors: Dhawal Gupta, Yash Chandak, Scott Jordan, Philip Thomas, Bruno Castro da Silva
Contact: ychandak@stanford.edu
Award nominations: Spotlight
Links: Paper
Keywords: reward design, reward shaping, bi-level optimization


Banana: Banach Fixed-Point Network for Pointcloud Segmentation with Inter-Part Equivariance

Authors: Congyue Deng*, Jiahui Lei*, Bokui Shen, Kostas Daniilidis, Leonidas Guibas (*equal contribution)
Contact: congyue@stanford.edu
Links: Paper | Video | Website
Keywords: equivariance, pointcloud segmentation, iterative inference


Beyond Confidence: Reliable Models Should Also Consider Atypicality

Authors: Mert Yuksekgonul, Linjun Zhang, James Zou, Carlos Guestrin
Contact: merty@stanford.edu
Links: Paper | Website
Keywords: reliable machine learning, uncertainty, calibration


Calibration by Distribution Matching: Trainable Kernel Calibration Metrics

Authors: Charles Marx*, Sofian Zalouk*, Stefano Ermon
Contact: ctmarx@stanford.edu
Links: Paper
Keywords: calibration, uncertainty quantification, decision-making under uncertainty


Convolutional State Space Models for Long-Range Spatiotemporal Modeling

Authors: Jimmy T.H. Smith, Shalini De Mello, Jan Kautz, Scott W. Linderman, Wonmin Byeon
Contact: jsmith14@stanford.edu
Links: Paper | Website
Keywords: ssms, convlstm, spatiotemporal modeling, video prediction


Data Selection for Language Models via Importance Resampling

Authors: Sang Michael Xie, Shibani Santurkar, Tengyu Ma, Percy Liang
Contact: xie@cs.stanford.edu
Links: Paper | Website
Keywords: language models, data selection, pretraining, data-centric ml


DataComp: In search of the next generation of multimodal datasets

Authors: Samir Yitzhak Gadre*, Gabriel Ilharco*, Alex Fang*, Jonathan Hayase, Georgios Smyrnis, Thao Nguyen, Ryan Marten, Mitchell Wortsman, Dhruba Ghosh, Jieyu Zhang, Eyal Orgad, Rahim Entezari, Giannis Daras, Sarah Pratt, Vivek Ramanujan, Yonatan Bitton, Kalyani Marathe, Stephen Mussmann, Richard Vencu, Mehdi Cherti, Ranjay Krishna, Pang Wei Koh, Olga Saukh, Alexander Ratner, Shuran Song, Hannaneh Hajishirzi, Ali Farhadi, Romain Beaumont, Sewoong Oh, Alex Dimakis, Jenia Jitsev, Yair Carmon, Vaishaal Shankar, Ludwig Schmidt
Contact: syagadre@gmail.com
Links: Paper | Blog Post | Website
Keywords: clip, zero-shot, data curation, vision-and-language, datasets, pre-training, benchmark


Disentanglement via Latent Quantization

Authors: Kyle Hsu, Will Dorrell, James C. R. Whittington, Jiajun Wu, Chelsea Finn
Contact: kylehsu@cs.stanford.edu
Links: Paper | Website
Keywords: disentanglement, representation learning, discrete representations


Diverse Conventions for Human-AI Collaboration

Authors: Bidipta Sarkar, Andy Shih, Dorsa Sadigh
Contact: bidiptas@stanford.edu
Links: Paper | Video | Website
Keywords: multi-agent rl, multi-agent coordination, human-ai coordination


DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining

Authors: Sang Michael Xie, Hieu Pham, Xuanyi Dong, Nan Du, Hanxiao Liu, Yifeng Lu, Percy Liang, Quoc Le, Tengyu Ma, Adams Wei Yu
Contact: xie@cs.stanford.edu
Award nominations: Spotlight
Links: Paper | Blog Post | Website
Keywords: large language models, pretraining, data mixtures, data-centric ml


Embroid: Unsupervised Prediction Smoothing Can Improve Few-Shot Classification

Authors: Neel Guha, Mayee F. Chen, Kush Bhatia, Azalia Mirhoseini, Frederic Sala, Christopher Ré
Contact: nguha@stanford.edu
Links: Paper | Blog Post | Website
Keywords: large language models, prompt correction, weak supervision, graphical models


Exact Optimality of Communication-Privacy-Utility Tradeoffs in Distributed Mean Estimation

Authors: Berivan Isik, Wei-Ning Chen, Ayfer Ozgur, Tsachy Weissman, Albert No
Contact: berivan.isik@stanford.edu
Links: Paper
Keywords: distributed mean estimation, privacy, compression, communication, federated analytics


High dimensional, tabular deep learning with an auxiliary knowledge graph

Authors: Camilo Ruiz*, Hongyu Ren*, Kexin Huang, Jure Leskovec
Contact: caruiz@cs.stanford.edu
Links: Paper
Keywords: deep learning, high dimensional, tabular prediction, knowledge graph, graph machine learning


Inferring Hybrid Fluid Fields from Videos

Authors: Hong-Xing Yu*, Yang Zheng*, Yuan Gao, Yitong Deng, Bo Zhu, Jiajun Wu
Contact: xkoven@gmail.com
Links: Paper | Website
Keywords: fluid, video, motion, physics, reconstruction


Inverse Preference Learning: Preference-based RL without a Reward Function

Authors: Joey Hejna, Dorsa Sadigh
Contact: jhejna@stanford.edu
Links: Paper
Keywords: reinforcement learning, preference-based rl, rlhf


Lexinvariant Language Models

Authors: Qian Huang, Eric Zelikman, Sarah Li Chen, Yuhuai Wu, Gregory Valiant, Percy Liang
Contact: qhwang@stanford.edu
Award nominations: Spotlight
Links: Paper
Keywords: large language model, in-context learning, pretraining


MoCa: Measuring Human-Language Model Alignment on Causal and Moral Judgment Tasks

Authors: Allen Nie, Yuhui Zhang, Atharva Amdekar, Chris Piech, Tatsunori Hashimoto, Tobias Gerstenberg
Contact: anie@stanford.edu
Links: Paper | Website
Keywords: cognitive science, causal reasoning, moral reasoning, dataset, language models


NAP: Neural 3D Articulation Prior

Authors: Jiahui Lei, Congyue Deng, Bokui Shen, Leonidas Guibas, Kostas Daniilidis
Contact: congyue@stanford.edu
Links: Paper | Video | Website
Keywords: 3d generative model, articulated object, diffusion model


NAS-X: Neural Adaptive Smoothing via Twisting

Authors: Dieterich Lawson*, Michael Y. Li*, Scott W. Linderman
Contact: dieterich.lawson@gmail.com, michaelyli@stanford.edu
Links: Paper
Keywords: sequence models, probabilistic inference, reweighted wake-sleep, sequential monte carlo, smoothing, mechanistic models


OpenDataVal: a Unified Benchmark for Data Valuation

Authors: Kevin Fu Jiang, Weixin Liang, James Zou, Yongchan Kwon
Contact: wxliang@stanford.edu
Links: Paper | Website
Keywords: data valuation, influence function, data shapley


PRODIGY: Enabling In-context Learning Over Graphs

Authors: Qian Huang, Hongyu Ren, Peng Chen, Gregor Kržmanc, Daniel Zeng, Percy Liang, Jure Leskovec
Contact: qhwang@stanford.edu
Award nominations: Spotlight
Links: Paper | Website
Keywords: graph neural network, in-context learning, pretraining


Parallel Sampling of Diffusion Models

Authors: Andy Shih, Suneel Belkhale, Stefano Ermon, Dorsa Sadigh, Nima Anari
Contact: andyshih@stanford.edu
Award nominations: Spotlight
Links: Paper
Keywords: diffusion model, sampling, parallel


Parsel🐍: Algorithmic Reasoning with Language Models by Composing Decompositions

Authors: Eric Zelikman, Qian Huang, Gabriel Poesia, Noah Goodman, Nick Haber
Contact: ezelikman@cs.stanford.edu
Award nominations: Spotlight
Links: Paper | Website
Keywords: reasoning, language models, code synthesis, decomposition


Selectively Sharing Experiences Improves Multi-Agent Reinforcement Learning

Authors: Matthias Gerstgrasser, Tom Danino, Sarah Keren
Contact: mgerst@stanford.edu
Links: Paper | Website
Keywords: multi-agent reinforcement learning, cooperative ai, dqn


Siamese Masked Autoencoders

Authors: Agrim Gupta, Jiajun Wu, Jia Deng, Li Fei-Fei
Contact: agrim@stanford.edu
Award nominations: Oral
Links: Paper | Website
Keywords: representation learning, visual correspondence, self-supervised learning, videos


Supervised Pretraining Can Learn In-Context Reinforcement Learning

Authors: Jonathan Lee, Annie Xie, Aldo Pacchiano, Yash Chandak, Chelsea Finn, Ofir Nachum, Emma Brunskill
Contact: jnl@stanford.edu, anniexie@stanford.edu
Award nominations: Spotlight
Links: Paper | Website
Keywords: decision making, reinforcement learning, in-context learning, bandits, transformers, offline reinforcement learning, exploration, reinforcement learning theory


Tartarus: A Benchmarking Platform for Realistic and Practical Inverse Molecular Design

Authors: AkshatKumar Nigam, Robert Pollice, Gary Tom, Kjell Jorner, John Willes, Luca A. Thiede, Anshul Kundaje, Alan Aspuru-Guzik
Contact: akshat98@stanford.edu
Links: Paper | Website
Keywords: molecular design, generative modelling


Towards Revealing the Mystery behind Chain of Thought: A Theoretical Perspective

Authors: Guhao Feng, Bohang Zhang, Yuntian Gu, Haotian Ye, Di He, Liwei Wang
Contact: haotianye@stanford.edu
Award nominations: Oral
Links: Paper | Video | Website
Keywords: chain-of-thought prompting, large language models, theory, circuit complexity, dynamic programming


VeriX: Towards Verified Explainability of Deep Neural Networks

Authors: Min Wu, Haoze Wu, Clark Barrett
Contact: minwu@stanford.edu
Links: Paper | Video | Website
Keywords: trustworthy machine learning, deep neural networks, explainability, interpretability, formal methods, automated verification


What’s Left? Concept Grounding with Logic-Enhanced Foundation Models

Authors: Joy Hsu*, Jiayuan Mao*, Joshua B. Tenenbaum, Jiajun Wu
Contact: joycj@stanford.edu
Links: Paper | Website
Keywords: concept learning, visual reasoning, large language models, neuro-symbolic learning


Why think step by step? Reasoning emerges from the locality of experience

Authors: Ben Prystawski, Michael Y. Li, Noah D. Goodman
Contact: benpry@stanford.edu
Award nominations: Oral
Links: Paper
Keywords: chain-of-thought, language models, reasoning


Zero-shot causal learning

Authors: Hamed Nilforoshan, Michael Moor, Yusuf Roohani, Yining Chen, Anja Šurina, Michihiro Yasunaga, Sara Oblak, Jure Leskovec
Contact: hamedn@cs.stanford.edu, mdmoor@cs.stanford.edu
Award nominations: Spotlight
Links: Paper
Keywords: causal inference, zero-shot, meta-learning, health, drug side effects


Datasets and Benchmarks Track

Are These the Same Apple? Comparing Images Based on Object Intrinsics

Authors: Klemen Kotar*, Stephen Tian*, Hong-Xing Yu, Daniel L.K. Yamins, Jiajun Wu
Contact: tians@stanford.edu
Track: Datasets and Benchmarks
Links: Paper | Website
Keywords: computer vision, image similarity


EHRSHOT: An EHR Benchmark for Few-Shot Evaluation of Foundation Models

Authors: Michael Wornow, Rahul Thapa, Ethan Steinberg, Jason A. Fries, Nigam H. Shah
Contact: mwornow@stanford.edu
Track: Datasets and Benchmarks
Award nominations: Spotlight
Links: Paper | Website
Keywords: foundation models, ehrs, healthcare


INSPECT: A Multimodal Dataset for Pulmonary Embolism Diagnosis and Prognosis

Authors: Shih-Cheng Huang, Zepeng Huo, Ethan Steinberg, Chia-Chun Chiang, Matthew P. Lungren, Curtis P. Langlotz, Serena Yeung, Nigam H. Shah, Jason A. Fries
Contact: zphuo@stanford.edu
Track: Datasets and Benchmarks
Links: Paper | Website
Keywords: multimodal fusion, medical imaging, electronic health records


LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models

Authors: Neel Guha, Julian Nyarko, Daniel E. Ho, Christopher Ré, Adam Chilton, Aditya Narayana, Alex Chohlas-Wood, Austin Peters, Brandon Waldon, Daniel N. Rockmore, Diego Zambrano, Dmitry Talisman, Enam Hoque, Faiz Surani, Frank Fagan, Galit Sarfaty, Gregory M. Dickinson, Haggai Porat, Jason Hegland, Jessica Wu, Joe Nudell, Joel Niklaus, John Nay, Jonathan H. Choi, Kevin Tobia, Margaret Hagan, Megan Ma, Michael Livermore, Nikon Rasumov-Rahe, Nils Holzenberger, Noam Kolt, Peter Henderson, Sean Rehaag, Sharad Goel, Shang Gao, Spencer Williams, Sunny Gandhi, Tom Zur, Varun Iyer, Zehua Li
Contact: nguha@stanford.edu
Track: Datasets and Benchmarks
Links: Paper | Website
Keywords: law, legal applications, large language models, benchmarks


Workshop Papers

An Information-Theoretic Understanding of Maximum Manifold Capacity Representations

Authors: Rylan Schaeffer, Berivan Isik, Victor Lecomte, Mikail Khona, Yann LeCun, Andrey Gromov, Ravid Shwartz-Ziv, Sanmi Koyejo
Contact: rschaef@cs.stanford.edu
Workshop: Unifying Representations in Neural Models, Information-Theoretic Principles in Cognitive Systems, Symmetry and Geometry in Neural Representations, Self-Supervised Learning Theory and Practice
Award nominations: Oral at Unifying Representations in Neural Models, Spotlight at Information-Theoretic Principles in Cognitive Systems
Keywords: machine learning, self-supervised learning, manifolds


Associative Memory Under the Probabilistic Lens: Improved Transformers & Dynamic Memory Creation

Authors: Rylan Schaeffer, Mikail Khona, Nika Zahedi, Ila Rani Fiete, Andrey Gromov, Sanmi Koyejo
Contact: rschaef@cs.stanford.edu
Workshop: Associative Memory & Hopfield Networks
Keywords: associative memory, probabilistic modeling, bayesian nonparametrics


Beyond Expectations: Model-Driven Amplification of Dataset Biases in Data Feedback Loops

Authors: Rylan Schaeffer, Sanmi Koyejo
Contact: rschaef@cs.stanford.edu
Workshop: Algorithmic Fairness through the Lens of Time
Keywords: bias, feedback loops, machine learning


Divergence at the Interpolation Threshold: Identifying, Interpreting & Ablating the Sources of a Deep Learning Puzzle

Authors: Rylan Schaeffer, Zachary Robertson, Akhilan Boopathy, Mikail Khona, Ila Fiete, Andrey Gromov, Sanmi Koyejo
Contact: rschaef@cs.stanford.edu
Workshop: Mathematics of Modern Machine Learning, Attributing Model Behavior at Scale
Keywords: machine learning, double descent


AutoFT: Robust Fine-Tuning by Optimizing Hyperparameters on OOD Data

Authors: Caroline Choi*, Yoonho Lee*, Annie S Chen, Allan Zhou, Aditi Raghunathan, Chelsea Finn
Contact: cchoi1@stanford.edu
Workshop: DistShift
Links: Paper
Keywords: robust fine-tuning, foundation models, adaptation, few-shot learning, meta-learning, hyperparameter optimization


Benchmarking Large Language Models As AI Research Agents

Authors: Qian Huang, Jian Vora, Percy Liang, Jure Leskovec
Contact: qhwang@stanford.edu
Workshop: Foundation Models for Decision Making
Award nominations: Oral
Links: Paper | Website
Keywords: benchmark, llm agent


Confidence-Based Model Selection: When to Take Shortcuts in Spurious Settings

Authors: Annie S Chen, Yoonho Lee, Amrith Setlur, Sergey Levine, Chelsea Finn
Contact: asc8@stanford.edu
Workshop: DistShift
Links: Paper
Keywords: distribution-shift robustness, spurious correlations, shortcut features, subpopulation shifts


Context-Aware Meta-Learning

Authors: Christopher Fifty, Dennis Duan, Ronald G. Junkins, Ehsan Amid, Jure Leskovec, Christopher Ré, Sebastian Thrun
Contact: fifty@cs.stanford.edu
Workshop: Distribution Shifts (DistShift): New Frontiers with Foundation Models
Links: Paper
Keywords: meta-learning, few-shot learning, deep learning, elmes


Enhancing Ligand Pose Sampling for Machine Learning–Based Docking

Authors: Patricia Suriana, Ron O. Dror
Contact: psuriana@stanford.edu
Workshop: Machine Learning in Structural Biology
Links: Paper
Keywords: ligand docking, deep learning


Generative AI for designing and validating easily synthesizable and structurally novel antibiotics

Authors: Kyle Swanson, Gary Liu, Denise Catacutan, Jonathan Stokes, James Zou
Contact: swansonk@stanford.edu
Workshop: Generative AI and Biology
Links: Paper | Website
Keywords: generative ai, antibiotic discovery, drug design, synthesizability


Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction

Authors: Christopher Fifty, Joseph M. Paggi, Ehsan Amid, Jure Leskovec, Ron O. Dror
Contact: fifty@cs.stanford.edu
Workshop: Machine Learning in Structural Biology
Links: Paper | Website
Keywords: few-shot learning, structural biology, deep learning


Interactive Model Correction with Natural Language

Authors: Yoonho Lee, Michelle Lam, Helena Vasconcelos, Michael Bernstein, Chelsea Finn
Contact: yoonho@stanford.edu
Workshop: ICBINB, XAI in Action
Links: Paper
Keywords: spurious correlations, human-computer interaction, natural language feedback, vision-language models


Mini-BEHAVIOR: A Procedurally Generated Benchmark for Long-horizon Decision-Making in Embodied AI

Authors: Emily Jin, Jiaheng Hu, Zhuoyi Huang, Ruohan Zhang, Jiajun Wu, Li Fei-Fei, Roberto Martín-Martín
Contact: emilyjin@stanford.edu
Workshop: Generalization in Planning (GenPlan), Agent Learning in Open-Endedness (ALOE)
Links: Paper | Website
Keywords: symbolic, complex, long-horizon, decision-making, embodied ai benchmark


Oracles & Followers: Stackelberg Equilibria in Deep Multi-Agent Reinforcement Learning

Authors: Matthias Gerstgrasser, David Parkes
Contact: mgerst@stanford.edu
Workshop: Multi-Agent Security Workshop
Links: Paper | Website
Keywords: multi-agent reinforcement learning, mechanism design, security games, stackelberg equilibria


Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation

Authors: Eric Zelikman, Eliana Lorch, Lester Mackey, Adam Tauman Kalai
Contact: ezelikman@cs.stanford.edu
Workshop: OPT 2023: Optimization for Machine Learning
Links: Paper
Keywords: reasoning, language models, self-improvement, code generation


Social Contract AI: Aligning AI Assistants with Implicit Group Norms

Authors: Jan-Philipp Fränken, Sam Kwok, Peixuan Ye, Kanishk Gandhi, Dilip Arumugam, Jared Moore, Alex Tamkin, Tobias Gerstenberg, Noah D Goodman
Contact: janphilipp.franken@gmail.com
Workshop: Socially Responsible Language Modelling Research (SoLaR)
Links: Paper | Website
Keywords: alignment, preference learning, simulation


Testing Assumptions Underlying a Unified Theory for the Origin of Grid Cells

Authors: Rylan Schaeffer, Mikail Khona, Adrian Bertagnoli, Sanmi Koyejo, Ila Rani Fiete
Contact: rschaef@cs.stanford.edu
Workshop: Unifying Representations in Neural Models, Symmetry and Geometry in Neural Representations, AI for Science
Links: Paper
Keywords: neuroscience, artificial intelligence, computational biology


Unifying Corroborative and Contributive Attributions in Large Language Models

Authors: Theodora Worledge, Judy Hanwen Shen, Nicole Meister, Caleb Winston, Carlos Guestrin
Contact: jhshen@stanford.edu
Workshop: ATTRIB
Links: Paper
Keywords: llm, attributions, training data attributions, fact checking, fact tracing, information retrieval, retrieval-augmented generation


We look forward to seeing you at NeurIPS 2023!