The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP) is being hosted in Miami, Florida, from November 12 to 16. We’re excited to share all the work from SAIL that’s being presented, and you’ll find links to papers, videos and blogs below. Feel free to reach out to the contact authors directly to learn more about the work that’s happening at Stanford!

List of Accepted Papers

Are Large Language Models Consistent over Value-laden Questions?

Authors: Jared Moore, Tanvi Deshpande, Diyi Yang
Contact: jlcmoore@stanford.edu
Links: Paper | Blog Post | Website
Keywords: computational social science and cultural analytics, interpretability and analysis of models for nlp, language modeling, question answering


CultureBank: An Online Community-Driven Knowledge Base Towards Culturally Aware Language Technologies

Authors: Weiyan Shi, Ryan Li, Yutong Zhang, Caleb Ziems, Sunny Yu, Raya Horesh, Rogério Abreu de Paula, Diyi Yang
Contact: weiyans@cs.stanford.edu
Links: Paper | Website
Keywords: nlp datasets; corpus creation; automatic creation and evaluation of language resources


Decoding Susceptibility: Modeling Misbelief to Misinformation Through a Computational Approach

Authors: Yanchen Liu, Mingyu Derek Ma, Wenna Qin, Azure Zhou, Jiaao Chen, Weiyan Shi, Wei Wang, Diyi Yang
Contact: yanchenliu@g.harvard.edu
Links: Paper
Keywords: misinformation, human behavior analysis, nlp for social analysis, quantitative analyses of news and/or social media


Demystifying Verbatim Memorization in Large Language Models

Authors: Jing Huang, Diyi Yang, Christopher Potts
Contact: hij@stanford.edu
Links: Paper | Website
Keywords: memorization; unlearning; causal interpretability


Fine-Tuning and Prompt Optimization: Two Great Steps that Work Better Together

Authors: Dilara Soylu, Christopher Potts, Omar Khattab
Contact: soylu@stanford.edu
Links: Paper | Website
Keywords: fine-tuning, chain of thought, few-shot learning, in-context learning, multi-hop reasoning, prompting techniques


Into the Unknown Unknowns: Engaged Human Learning through Participation in Language Model Agent Conversations

Authors: Yucheng Jiang, Yijia Shao, Dekun Ma, Sina J. Semnani, Monica S. Lam
Contact: yuchengj@cs.stanford.edu, shaoyj@cs.stanford.edu, lam@cs.stanford.edu
Links: Paper | Blog Post | Website
Keywords: human-ai collaboration, complex information seeking, mixed-initiative system, multi-agent dialogue system, long form report generation


Is Child-Directed Speech Effective Training Data for Language Models?

Authors: Steven Y. Feng, Noah D. Goodman, Michael C. Frank
Contact: syfeng@stanford.edu
Links: Paper | Website
Keywords: child-directed speech, language models, language acquisition, gpt-2, roberta, synthetic data, tinydialogues, babylm challenge, pretraining, data efficiency, learning efficiency, curricularization, curriculum learning, child language development, data quality


Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs

Authors: Krista Opsahl-Ong*, Michael J. Ryan*, Josh Purtell, David Broman, Christopher Potts, Matei Zaharia, Omar Khattab
Contact: mryan0@stanford.edu
Links: Paper | Video | Website
Keywords: lm programs, prompt optimization, dspy, lm systems


Problem-Oriented Segmentation and Retrieval: Case Study on Tutoring Conversations

Authors: Rose E. Wang, Pawan Wirawarn, Kenny Lam, Omar Khattab, Dorottya Demszky
Contact: rewang@cs.stanford.edu
Links: Paper | Video
Keywords: segmentation, retrieval, discourse, education


Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations

Authors: Róbert Csordás, Christopher Potts, Christopher D. Manning, Atticus Geiger
Contact: rcsordas@stanford.edu
Links: Paper | Website
Keywords: linear representation hypothesis, lrh, nonlinear representations, mechanistic interpretability, rnns, memory


Roleplay-doh: Enabling Domain-Experts to Create LLM-simulated Patients via Eliciting and Adhering to Principles

Authors: Ryan Louie, Ananjan Nandi, William Fang, Cheng Chang, Emma Brunskill, Diyi Yang
Contact: rylouie@stanford.edu
Links: Paper | Website
Keywords: human-ai interaction, expert-in-the-loop, domain-expert evaluation, healthcare applications, mental health applications, nlp for social good


SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages

Authors: Holy Lovenia, Rahmad Mahendra, Salsabil Maulana Akbar, Lester James Validad Miranda, Jennifer Santoso, Elyanah Aco, Akhdan Fadhilah, Jonibek Mansurov, Joseph Marvin Imperial, Onno P. Kampman, Joel Ruben Antony Moniz, Muhammad Ravi Shulthan Habibi, Frederikus Hudi, Jann Railey Montalan, Ryan Ignatius Hadiwijaya, Joanito Agili Lopo, William Nixon, Börje F. Karlsson, James Jaya, Ryandito Diandaru, Yuze GAO, Patrick Amadeus Irawan, Bin Wang, Jan Christian Blaise Cruz, Chenxi Whitehouse, Ivan Halim Parmonangan, Maria Khelli, Wenyu Zhang, Lucky Susanto, Reynard Adha Ryanda, Sonny Lazuardi Hermawan, Dan John Velasco, Muhammad Dehan Al Kautsar, Willy Fitra Hendria, Yasmin Moslem, Noah Flynn, Muhammad Farid Adilazuarda, Haochen Li, Johanes Lee, R. Damanhuri, Shuo Sun, Muhammad Reza Qorib, Amirbek Djanibekov, Wei Qi Leong, Quyet V. Do, Niklas Muennighoff, Tanrada Pansuwan, Ilham Firdausi Putra, Yan Xu, Tai Ngee Chia, Ayu Purwarianti, Sebastian Ruder, William Chandra Tjhi, Peerat Limkonchotiwat, Alham Fikri Aji, Sedrick Keh, Genta Indra Winata, Ruochen Zhang, Fajri Koto, Zheng Xin Yong, Samuel Cahyawijaya
Contact: https://seacrowd.github.io/seacrowd-catalogue/
Links: Paper | Website
Keywords: multilingual evaluation; resources for less-resourced languages; software and tools; speech and vision evaluation; benchmarking; language resources; multilingual corpora


SPINACH: SPARQL-Based Information Navigation for Challenging Real-World Questions

Authors: Shicheng Liu*, Sina J. Semnani*, Harold Triedman, Jialiang Xu, Isaac Dan Zhao, Monica S. Lam
Contact: shicheng@cs.stanford.edu
Links: Paper | Video | Website
Keywords: knowledge base qa, semantic parsing


Statistical Uncertainty in Word Embeddings: GloVe-V

Authors: Andrea Vallebueno, Cassandra Handan-Nader, Christopher D. Manning, Daniel E. Ho
Contact: deho@stanford.edu
Links: Paper | Website
Keywords: static word embeddings, glove model, reconstruction error, uncertainty estimates, hypothesis testing, computational social science


Updating CLIP to Prefer Descriptions Over Captions

Authors: Amir Zur, Elisa Kreiss, Karel D’Oosterlinck, Christopher Potts, Atticus Geiger
Contact: amir.zur1212@gmail.com
Links: Paper
Keywords: accessibility, image text matching, counterfactual/contrastive explanations, human-centered evaluation


Zero-shot Persuasive Chatbots with LLM-Generated Strategies and Information Retrieval

Authors: Kazuaki Furumai, Roberto Legaspi, Julio Romero, Yudai Yamazaki, Yasutaka Nishimura, Sina J. Semnani, Kazushi Ikeda, Weiyan Shi, Monica S. Lam
Contact: lam@cs.stanford.edu
Links: Paper
Keywords: persuasion, task-oriented, factuality, information retrieval


We look forward to seeing you at EMNLP 2024!