The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP) is being hosted in Miami, Florida, from November 12 to 16. We’re excited to share all the work from SAIL that’s being presented, and you’ll find links to papers, videos, and blogs below. Feel free to reach out to the contact authors directly to learn more about the work that’s happening at Stanford!
List of Accepted Papers
Are Large Language Models Consistent over Value-laden Questions?
Authors: Jared Moore, Tanvi Deshpande, Diyi Yang
Contact: jlcmoore@stanford.edu
Links: Paper | Blog Post | Website
Keywords: computational social science and cultural analytics, interpretability and analysis of models for nlp, language modeling, question answering
CultureBank: An Online Community-Driven Knowledge Base Towards Culturally Aware Language Technologies
Authors: Weiyan Shi, Ryan Li, Yutong Zhang, Caleb Ziems, Sunny Yu, Raya Horesh, Rogério Abreu de Paula, Diyi Yang
Contact: weiyans@cs.stanford.edu
Links: Paper | Website
Keywords: nlp datasets, corpus creation, automatic creation and evaluation of language resources
Decoding Susceptibility: Modeling Misbelief to Misinformation Through a Computational Approach
Authors: Yanchen Liu, Mingyu Derek Ma, Wenna Qin, Azure Zhou, Jiaao Chen, Weiyan Shi, Wei Wang, Diyi Yang
Contact: yanchenliu@g.harvard.edu
Links: Paper
Keywords: misinformation, human behavior analysis, nlp for social analysis, quantitative analyses of news and/or social media
Demystifying Verbatim Memorization in Large Language Models
Authors: Jing Huang, Diyi Yang, Christopher Potts
Contact: hij@stanford.edu
Links: Paper | Website
Keywords: memorization, unlearning, causal interpretability
Fine-Tuning and Prompt Optimization: Two Great Steps that Work Better Together
Authors: Dilara Soylu, Christopher Potts, Omar Khattab
Contact: soylu@stanford.edu
Links: Paper | Website
Keywords: fine-tuning, chain of thought, few-shot learning, in-context learning, multi-hop reasoning, prompting techniques
Into the Unknown Unknowns: Engaged Human Learning through Participation in Language Model Agent Conversations
Authors: Yucheng Jiang, Yijia Shao, Dekun Ma, Sina J. Semnani, Monica S. Lam
Contact: yuchengj@cs.stanford.edu, shaoyj@cs.stanford.edu, lam@cs.stanford.edu
Links: Paper | Blog Post | Website
Keywords: human-ai collaboration, complex information seeking, mixed-initiative system, multi-agent dialogue system, long form report generation
Is Child-Directed Speech Effective Training Data for Language Models?
Authors: Steven Y. Feng, Noah D. Goodman, Michael C. Frank
Contact: syfeng@stanford.edu
Links: Paper | Website
Keywords: child-directed speech, language models, language acquisition, gpt-2, roberta, synthetic data, tinydialogues, babylm challenge, pretraining, data efficiency, learning efficiency, curricularization, curriculum learning, child language development, data quality
Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs
Authors: Krista Opsahl-Ong*, Michael J. Ryan*, Josh Purtell, David Broman, Christopher Potts, Matei Zaharia, Omar Khattab
Contact: mryan0@stanford.edu
Links: Paper | Video | Website
Keywords: lm programs, prompt optimization, dspy, lm systems
Problem-Oriented Segmentation and Retrieval: Case Study on Tutoring Conversations
Authors: Rose E. Wang, Pawan Wirawarn, Kenny Lam, Omar Khattab, Dorottya Demszky
Contact: rewang@cs.stanford.edu
Links: Paper | Video
Keywords: segmentation, retrieval, discourse, education
Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations
Authors: Róbert Csordás, Christopher Potts, Christopher D. Manning, Atticus Geiger
Contact: rcsordas@stanford.edu
Links: Paper | Website
Keywords: linear representation hypothesis, lrh, nonlinear representations, mechanistic interpretability, rnns, memory
Roleplay-doh: Enabling Domain-Experts to Create LLM-simulated Patients via Eliciting and Adhering to Principles
Authors: Ryan Louie, Ananjan Nandi, William Fang, Cheng Chang, Emma Brunskill, Diyi Yang
Contact: rylouie@stanford.edu
Links: Paper | Website
Keywords: human-ai interaction, expert-in-the-loop, domain-expert evaluation, healthcare applications, mental health applications, nlp for social good
SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages
Authors: Holy Lovenia, Rahmad Mahendra, Salsabil Maulana Akbar, Lester James Validad Miranda, Jennifer Santoso, Elyanah Aco, Akhdan Fadhilah, Jonibek Mansurov, Joseph Marvin Imperial, Onno P. Kampman, Joel Ruben Antony Moniz, Muhammad Ravi Shulthan Habibi, Frederikus Hudi, Jann Railey Montalan, Ryan Ignatius Hadiwijaya, Joanito Agili Lopo, William Nixon, Börje F. Karlsson, James Jaya, Ryandito Diandaru, Yuze GAO, Patrick Amadeus Irawan, Bin Wang, Jan Christian Blaise Cruz, Chenxi Whitehouse, Ivan Halim Parmonangan, Maria Khelli, Wenyu Zhang, Lucky Susanto, Reynard Adha Ryanda, Sonny Lazuardi Hermawan, Dan John Velasco, Muhammad Dehan Al Kautsar, Willy Fitra Hendria, Yasmin Moslem, Noah Flynn, Muhammad Farid Adilazuarda, Haochen Li, Johanes Lee, R. Damanhuri, Shuo Sun, Muhammad Reza Qorib, Amirbek Djanibekov, Wei Qi Leong, Quyet V. Do, Niklas Muennighoff, Tanrada Pansuwan, Ilham Firdausi Putra, Yan Xu, Tai Ngee Chia, Ayu Purwarianti, Sebastian Ruder, William Chandra Tjhi, Peerat Limkonchotiwat, Alham Fikri Aji, Sedrick Keh, Genta Indra Winata, Ruochen Zhang, Fajri Koto, Zheng Xin Yong, Samuel Cahyawijaya
Contact: https://seacrowd.github.io/seacrowd-catalogue/
Links: Paper | Website
Keywords: multilingual evaluation, resources for less-resourced languages, software and tools, speech and vision evaluation, benchmarking, language resources, multilingual corpora
SPINACH: SPARQL-Based Information Navigation for Challenging Real-World Questions
Authors: Shicheng Liu*, Sina J. Semnani*, Harold Triedman, Jialiang Xu, Isaac Dan Zhao, Monica S. Lam
Contact: shicheng@cs.stanford.edu
Links: Paper | Video | Website
Keywords: knowledge base qa, semantic parsing
Statistical Uncertainty in Word Embeddings: GloVe-V
Authors: Andrea Vallebueno, Cassandra Handan-Nader, Christopher D. Manning, Daniel E. Ho
Contact: deho@stanford.edu
Links: Paper | Website
Keywords: static word embeddings, glove model, reconstruction error, uncertainty estimates, hypothesis testing, computational social science
Updating CLIP to Prefer Descriptions Over Captions
Authors: Amir Zur, Elisa Kreiss, Karel D’Oosterlinck, Christopher Potts, Atticus Geiger
Contact: amir.zur1212@gmail.com
Links: Paper
Keywords: accessibility, image text matching, counterfactual/contrastive explanations, human-centered evaluation
Zero-shot Persuasive Chatbots with LLM-Generated Strategies and Information Retrieval
Authors: Kazuaki Furumai, Roberto Legaspi, Julio Romero, Yudai Yamazaki, Yasutaka Nishimura, Sina J. Semnani, Kazushi Ikeda, Weiyan Shi, Monica S. Lam
Contact: lam@cs.stanford.edu
Links: Paper
Keywords: persuasion, task-oriented, factuality, information retrieval
We look forward to seeing you at EMNLP 2024!