Join us at the Thirty-Eighth Conference on Neural Information Processing Systems (NeurIPS 2024), taking place in Vancouver from December 10th to December 15th. Stanford Artificial Intelligence Laboratory (SAIL) researchers will be presenting at the main conference, in the Datasets and Benchmarks track, and at various workshops. Here’s some of the SAIL work you may run into at the conference!

Interested in learning more about SAIL’s latest innovations? Our researchers welcome your questions - don’t hesitate to connect with the contact authors listed for each paper.

List of Accepted Papers

Main Conference

Are More LLM Calls All You Need? Towards the Scaling Properties of Compound AI Systems

Authors: Lingjiao Chen, Jared Quincy Davis, Boris Hanin, Peter Bailis, Ion Stoica, Matei Zaharia, James Zou
Contact: lingjiao@stanford.edu
Links: Paper | Blog Post | Website
Keywords: scaling laws; compound ai systems; language models


BetterBench: Assessing AI Benchmarks, Uncovering Issues, and Establishing Best Practices

Authors: Anka Reuel, Amelia Hardy, Chandler Smith, Max Lamparth, Malcolm Hardy, Mykel J. Kochenderfer
Contact: anka.reuel@stanford.edu, ahardy@stanford.edu
Award nominations: Spotlight (Datasets and Benchmarks Track)
Links: Paper | Website
Keywords: benchmarking, assessment, best practices, evaluation, benchmark


Consent in Crisis: The Rapid Decline of the AI Data Commons

Authors: Shayne Longpre, Robert Mahari, Ariel N. Lee, Campbell S. Lund, Hamidah Oderinwale, William Brannon, Nayan Saxena, Naana Obeng-Marnu, Tobin South, Cole J Hunter, Kevin Klyman, Christopher Klamm, Hailey Schoelkopf, Nikhil Singh, Manuel Cherep, Ahmad Mustafa Anis, An Dinh, Caroline Shamiso Chitongo, Da Yin, Damien Sileo, Deividas Mataciunas, Diganta Misra, Emad A. Alghamdi, Enrico Shippole, Jianguo Zhang, Joanna Materzynska, Kun Qian, Kushagra Tiwary, Lester James Validad Miranda, Manan Dey, Minnie Liang, Mohammed Hamdy, Niklas Muennighoff, Seonghyeon Ye, Seungone Kim, Shrestha Mohanty, Vipul Gupta, Vivek Sharma, Vu Minh Chien, Xuhui Zhou, Yizhi LI, Caiming Xiong, Luis Villa, Stella Biderman, Hanlin Li, Daphne Ippolito, Sara Hooker, Jad Kabbara, Alex Pentland
Contact: data.provenance.init@gmail.com
Links: Paper | Website
Keywords: training data, audits, text


DART-Eval: A Comprehensive DNA Language Model Evaluation Benchmark on Regulatory DNA

Authors: Aman Patel*, Arpita Singhal*, Austin Wang*, Anusri Pampari, Maya Kasowski, Anshul Kundaje
Contact: arpitas@stanford.edu
Links: Paper
Keywords: dna, language models, llms, biology, foundation models, benchmarks, gene regulation, healthcare


Geometric Trajectory Diffusion Models

Authors: Jiaqi Han, Minkai Xu, Aaron Lou, Haotian Ye, Stefano Ermon
Contact: jiaqihan@stanford.edu
Links: Paper | Website
Keywords: diffusion models, trajectory generation


Optimistic Verifiable Training by Controlling Hardware Nondeterminism

Authors: Megha Srivastava, Simran Arora, Dan Boneh
Contact: meghas@stanford.edu
Links: Paper | Website
Keywords: security, verification, robustness, reproducibility, systems


ReFT: Representation Finetuning for Language Models

Authors: Zhengxuan Wu, Aryaman Arora, Zheng Wang, Atticus Geiger, Dan Jurafsky, Christopher D. Manning, Christopher Potts
Contact: wuzhengx@stanford.edu, aryamana@stanford.edu
Links: Paper | Website
Keywords: interpretability, efficient training


Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies

Authors: Chaofan Tao, Qian Liu, Longxu Dou, Niklas Muennighoff, Zhongwei Wan, Ping Luo, Min Lin, Ngai Wong
Contact: cftao@connect.hku.hk, liuqian.sea@gmail.com
Links: Paper | Website
Keywords: natural language processing, scaling laws, efficient neural networks, large language models


Smoothie: Label Free Language Model Routing

Authors: Neel Guha, Mayee F. Chen, Trevor Chow, Ishan S. Khare, Christopher Ré
Contact: neelguha@gmail.com
Links: Paper
Keywords: routing, llms


Streaming Detection of Queried Event Start

Authors: Cristóbal Eyzaguirre, Eric Tang, Shyamal Buch, Adrien Gaidon, Jiajun Wu, Juan Carlos Niebles
Contact: ceyzagui@stanford.edu
Links: Paper | Website
Keywords: streaming, video, vlm, cv, online


Structured flexibility in recurrent neural networks via neuromodulation

Authors: Julia Costacurta, Shaunak Bhandarkar, David Zoltowski, Scott Linderman
Contact: jcostac@stanford.edu
Links: Paper
Keywords: recurrent neural networks, neuroscience


The Scandinavian Embedding Benchmarks: Comprehensive Assessment of Multilingual and Monolingual Text Embedding

Authors: Kenneth Enevoldsen, Márton Kardos, Niklas Muennighoff, Kristoffer Nielbo
Contact: kenneth.enevoldsen@cas.au.dk
Links: Paper
Keywords: sentence embeddings, rag, low-resource nlp, danish, norwegian, swedish, scandinavian


Towards Scalable and Stable Parallelization of Nonlinear RNNs

Authors: Xavier Gonzalez, Andrew Warrington, Jimmy T.H. Smith, Scott W. Linderman
Contact: xavier18@stanford.edu
Links: Paper | Blog Post | Video
Keywords: rnns, ssms, parallel algorithms, optimization


Why are Visually-Grounded Language Models Bad at Image Classification?

Authors: Yuhui Zhang, Alyssa Unell, Xiaohan Wang, Dhruba Ghosh, Yuchang Su, Ludwig Schmidt, Serena Yeung-Levy
Contact: yuhuiz@stanford.edu
Links: Paper | Video | Website
Keywords: vision language model, image classification


Datasets and Benchmarks Track

UniTox: Leveraging LLMs to Curate a Unified Dataset of Drug-Induced Toxicity from FDA Labels

Authors: Jake Silberg, Kyle Swanson, Elana Simon, Angela Zhang, Zaniar Ghazizadeh, Scott Ogden, Hisham Hamadeh, James Zou
Contact: jsilberg@stanford.edu
Award nominations: Spotlight
Links: Paper | Blog Post | Website
Keywords: large language model, gpt, biomedicine, drug discovery, drug toxicity, drug safety


Workshop Papers

AI Governance and the Developmental Immaturity of the Science of AI Safety

Authors: Rob Reich
Contact: reich@stanford.edu
Workshop: Regulatable ML Workshop
Keywords: ai governance, ai safety


On Short Textual Value Column Representation Using Symbol Level Language Models

Authors: Ron Begleiter, Nathan Roll
Contact: nroll@stanford.edu
Workshop: Table Representation Learning Workshop
Links: Paper
Keywords: symbol level language models, column matching


OLMoE: Open Mixture-of-Experts Language Models

Authors: Niklas Muennighoff, Luca Soldaini, Dirk Groeneveld, Kyle Lo, Jacob Morrison, Sewon Min, Weijia Shi, Pete Walsh, Oyvind Tafjord, Nathan Lambert, Yuling Gu, Shane Arora, Akshita Bhagia, Dustin Schwenk, David Wadden, Alexander Wettig, Binyuan Hui, Tim Dettmers, Douwe Kiela, Ali Farhadi, Noah A. Smith, Pang Wei Koh, Amanpreet Singh, Hannaneh Hajishirzi
Contact: niklasm@stanford.edu
Workshop: Efficient Natural Language and Speech Processing Workshop
Award nominations: Oral (Spotlight)
Links: Paper | Blog Post | Website
Keywords: large language models, mixture-of-experts, foundation models


Measuring Free-Form Decision-Making Inconsistency of Language Models in Military Crisis Simulations

Authors: Aryan Shrivastava, Jessica Hullman, Max Lamparth
Contact: lamparth@stanford.edu
Workshop: Socially Responsible Language Modelling Research Workshop
Links: Paper
Keywords: language models, ai safety, natural language processing, military, inconsistency, transparency


DivShift: Exploring Domain-Specific Distribution Shift in Large-Scale, Volunteer-Collected Biodiversity Datasets

Authors: Elena Sierra, Lauren Gillespie, Salim Soltani, Moisés Expósito-Alonso, Teja Kattenborn
Contact: esierra@stanford.edu
Workshop: Tackling Climate Change with Machine Learning Workshop
Links: Paper | Video
Keywords: biodiversity, data bias


Intuitions of Compromise: Utilitarianism vs. Contractualism

Authors: Jared Moore, Yejin Choi, Sydney Levine
Contact: jlcmoore@stanford.edu
Workshop: Pluralistic Alignment Workshop
Links: Paper | Website
Keywords: value aggregation, nash product, expected utility, compromise, moral decision making, social welfare


We look forward to seeing you at NeurIPS this year!