I'm a Ph.D. student in Computer Science at Stanford University, where I have been fortunate to be advised by Prof. James Zou. I am part of the Stanford Artificial Intelligence Laboratory (SAIL), where I have collaborated with Prof. Daniel A. McFarland, Prof. Christopher D Manning, Prof. Christopher Potts, Prof. Daniel Jurafsky, Prof. Li Fei-Fei, and Prof. Serena Yeung.
I completed my master's degree in Electrical Engineering at Stanford, working with Prof. James Zou and Prof. Zhou Yu. Before Stanford, I received a B.S. in Computer Science from Zhejiang University in 2019, where I worked with Prof. Kai Bu and Prof. Mingli Song. I have also interned at Meta FAIR, Amazon Alexa AI, Apple, and Tencent.
Weixin Liang,
Lili Yu,
Liang Luo,
Srinivasan Iyer,
Ning Dong,
Chunting Zhou,
Gargi Ghosh,
Mike Lewis,
Wen-tau Yih,
Luke Zettlemoyer,
Xi Victoria Lin
arXiv preprint (2024)
Paper
Twitter
Bluesky
Code
How can we reduce pretraining costs for multi-modal models without sacrificing quality? At Meta, we introduce Mixture-of-Transformers (MoT), a sparse architecture with modality-aware sparsity for every non-embedding transformer parameter (e.g., feed-forward networks, attention matrices, and layer normalization). MoT achieves dense-level performance with up to 66% fewer FLOPs!
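As a rough illustration of the modality-aware sparsity idea, here is a minimal PyTorch sketch in which a single linear projection is duplicated per modality and each token is routed to the weights of its own modality. The class and variable names are illustrative, not the paper's implementation; in MoT this decoupling applies to all non-embedding parameters, while self-attention still operates globally over the full sequence.

```python
import torch
import torch.nn as nn

class ModalityAwareLinear(nn.Module):
    """Illustrative sketch: one set of projection weights per modality."""
    def __init__(self, d_in: int, d_out: int, num_modalities: int = 2):
        super().__init__()
        # One projection per modality (e.g., 0 = text, 1 = image).
        self.projs = nn.ModuleList(
            nn.Linear(d_in, d_out) for _ in range(num_modalities)
        )

    def forward(self, x: torch.Tensor, modality: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_in); modality: (num_tokens,) integer labels.
        out = torch.empty(x.shape[0], self.projs[0].out_features, device=x.device)
        for m, proj in enumerate(self.projs):
            mask = modality == m
            if mask.any():
                out[mask] = proj(x[mask])  # tokens use their modality's weights
        return out

tokens = torch.randn(6, 32)
modality = torch.tensor([0, 0, 1, 1, 0, 1])
layer = ModalityAwareLinear(32, 32)
print(layer(tokens, modality).shape)  # torch.Size([6, 32])
```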
Weixin Liang*,
Zachary Izzo*,
Yaohui Zhang*,
Haley Lepp,
Hancheng Cao,
Xuandong Zhao,
Lingjiao Chen,
Haotian Ye,
Sheng Liu,
Zhi Huang,
Daniel A. McFarland,
James Y. Zou
International Conference on Machine Learning (ICML 2024)
Oral/top 5% of accepted papers, ICML 2024
Best Presentation Runner-up award at ICSSI 2024 (International Conference on the Science of Science and Innovation), National Academy of Sciences in Washington, DC.
Paper
NY Times Article
Twitter
Code
Slides PDF
Google Slides
Our estimates suggest that 10.6% of ICLR 2024 review sentences and 16.9% of EMNLP review sentences have been substantially modified by ChatGPT, with no significant evidence of ChatGPT usage in Nature portfolio reviews. Estimated ChatGPT usage in reviews spikes significantly within three days of review deadlines. We present an approach for estimating the fraction of text in a large corpus that is likely to be substantially modified or produced by a large language model (LLM).
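At a high level, the corpus-level estimate can be framed as fitting the weight of an LLM-generated reference distribution in a mixture with a human-written one by maximum likelihood. A minimal sketch, assuming per-document likelihoods under the two reference distributions have already been computed; the function name and the grid search are illustrative, and the actual method estimates the reference distributions from held-out human-written and LLM-generated text:

```python
import numpy as np

def estimate_llm_fraction(p_human: np.ndarray, p_ai: np.ndarray) -> float:
    # p_human[i], p_ai[i]: likelihood of document i under each reference
    # distribution. Grid-search the mixture weight alpha by MLE.
    alphas = np.linspace(0.0, 1.0, 1001)
    log_liks = [
        np.sum(np.log((1 - a) * p_human + a * p_ai + 1e-12)) for a in alphas
    ]
    return float(alphas[int(np.argmax(log_liks))])
```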
Weixin Liang*,
Yaohui Zhang*,
Zhengxuan Wu*,
Haley Lepp,
Wenlong Ji,
Xuandong Zhao,
Hancheng Cao,
Sheng Liu,
Siyu He,
Zhi Huang,
Diyi Yang,
Christopher Potts†,
Christopher D Manning†,
James Y. Zou†
Conference on Language Modeling (COLM 2024)
Paper
Twitter
Code
Our new study estimates that ~17% of recent CS arXiv papers used LLMs substantially in their writing, compared with around 8% of bioRxiv papers. Moreover, at an aggregate level, our analysis reveals that higher levels of LLM modification are associated with papers whose first authors post preprints more frequently, papers in more crowded research areas, and papers of shorter lengths. Our findings suggest that LLMs are being broadly used in scientific writing.
Weixin Liang*,
Yuhui Zhang*,
Hancheng Cao*,
Binglu Wang,
Daisy Ding,
Xinyu Yang,
Kailas Vodrahalli,
Siyu He,
Daniel Smith,
Yian Yin,
Daniel A. McFarland,
James Zou
NEJM AI (2024)
Paper
Twitter
Code
With the breakthrough of large language models (LLMs) such as GPT-4, there is growing interest in using LLMs to generate scientific feedback on research manuscripts. However, the utility of LLM-generated feedback has not been systematically studied. To address this gap, we created an automated pipeline using GPT-4 to provide comments on the full PDFs of scientific papers. Our results suggest that LLM and human feedback can complement each other. While human expert review is, and should continue to be, the foundation of a rigorous scientific process, LLM feedback could benefit researchers, especially when timely expert feedback is not available and in the earlier stages of manuscript preparation, before peer review.
Weixin Liang,
Nazneen Rajani,
Xinyu Yang,
Ezinwanne Ozoani,
Eric Wu,
Yiqun Chen,
Daniel Scott Smith,
James Zou
Nature Machine Intelligence (2024)
Paper
The rapid proliferation of AI models has underscored the importance of thorough documentation, which enables users to understand, trust, and effectively utilize these models in various applications. Although developers are encouraged to produce model cards, it is not clear how much or what information these cards contain. In this study, we conduct a comprehensive analysis of the documentation of 32,111 AI models on Hugging Face, a leading platform for distributing and deploying AI models. Our investigation sheds light on prevailing model card documentation practices. Most AI models with substantial downloads provide model cards, though the cards are uneven in informativeness. We find that sections addressing environmental impact, limitations, and evaluation have the lowest fill-out rates, while the training section is the most consistently completed. Our study opens up a new perspective for analyzing community norms and practices for model documentation through large-scale data science and linguistic analysis.
Xinyu Yang*,
Weixin Liang*,
James Zou
ICLR 2024 (2024)
PDF
Code
Advances in machine learning are closely tied to the creation of datasets. While data documentation is widely recognized as essential to the reliability, reproducibility, and transparency of ML, we lack a systematic empirical understanding of current dataset documentation practices. By analyzing all 7,433 dataset documentation pages on Hugging Face, our investigation provides an overview of the Hugging Face dataset ecosystem and insights into dataset documentation practices.
Weixin Liang*,
Mert Yuksekgonul*,
Yining Mao*,
Eric Wu*,
James Zou
Patterns (2023)
Paper
Cell.com
Code
Data
We should be very cautious when using detectors to classify whether text is written by AI or humans. Our research has shown that such detectors classify over 50% of real text written by non-native English speakers as AI-generated, while most polished essays generated by GPT evade detection. This creates bias and false positives against non-native speakers, as more literary language tends to be classified as "human."
Media Coverage: The Markup, The Guardian, Fortune, Stanford HAI, Stanford EE, The Hechinger Report
Kevin Jiang*,
Weixin Liang*,
James Zou,
Yongchan Kwon
NeurIPS Datasets and Benchmarks Track (2023)
Paper
Project Page
Code
Twitter
Assessing the quality and impact of individual data points is critical for improving model performance and mitigating undesirable biases within the training dataset. In this paper, we introduce OpenDataVal, an easy-to-use and unified benchmark framework that empowers researchers and practitioners to apply and compare various data valuation algorithms.
Kevin Wu,
Eric Wu,
Brandon Theodorou,
Weixin Liang,
Christina Mack,
Lucas Glass,
Jimeng Sun,
James Zou
NEJM AI (2023)
Paper
We analyze billions of insurance claims to provide a first look at medical AI adoption.
Media Coverage: Nature Medicine, The Imaging Wire, Cardiac Wire
Weixin Liang*,
Yining Mao*,
Yongchan Kwon*,
Xinyu Yang,
James Zou
International Conference on Machine Learning (ICML 2023)
Paper
Poster
Website
Recording
Code
Recent work empirically finds a strong linear relationship between in-distribution (ID) and out-of-distribution (OOD) performance, but we show that this does not necessarily hold under subpopulation shifts: OOD performance often exhibits a nonlinear ("moon-shaped") correlation with ID performance.
Weixin Liang,
Girmaw Abebe Tadesse,
Daniel Ho,
Li Fei-Fei,
Matei Zaharia,
Ce Zhang,
James Zou
Nature Machine Intelligence (2022)
Paper
Nature.com
Twitter
As AI model-building becomes more automated, much of the time and resources in practice are devoted to deciding what data to collect, cleaning data, annotating it, and evaluating it. Our article discusses best practices, new challenges, and opportunities for each of these key components of the data-for-AI pipeline.
Weixin Liang*,
Yuhui Zhang*,
Yongchan Kwon*,
Serena Yeung,
James Zou
NeurIPS (2022)
Paper
HTML
Poster
Website
Code
Our new paper explains the intriguing modality gap: in multi-modal AI, large gaps in the representation space separate different data types. We show that changing the gap improves zero-shot learning and fairness. Interestingly, the modality gap is created at model initialization and reinforced by contrastive learning.
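For intuition, the gap can be summarized as the distance between the two modality centroids on the unit hypersphere. A minimal sketch, assuming paired image and text embeddings from any two-tower encoder (the function name is illustrative):

```python
import torch
import torch.nn.functional as F

def modality_gap(image_embs: torch.Tensor, text_embs: torch.Tensor) -> float:
    # Normalize so all embeddings lie on the unit hypersphere.
    img = F.normalize(image_embs, dim=-1)
    txt = F.normalize(text_embs, dim=-1)
    # The gap vector connects the two modality centroids; its norm
    # summarizes how far apart the modalities sit in embedding space.
    return (img.mean(0) - txt.mean(0)).norm().item()
```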
Weixin Liang,
Scott Elrod,
Daniel A. McFarland,
James Zou
Patterns (2022)
Paper
Cell.com
Twitter
Recording
Stanford HAI News: Analyzing 50 Years of Stanford Patents
Finding patterns of success across 50 years of innovation | Scope
OTL 50th Anniversary Report: A Half Century of Pioneering Innovation
Computational analysis of 4,512 inventions marketed by Stanford's Office of Technology Licensing between 1970 and 2020 characterizes how the academic innovation landscape changed over time. We identified factors, such as the composition of the inventors, associated with the commercial success of the inventions. We also identified linguistic differences in how high-revenue and low-revenue inventions in the same field are described and marketed.
Weixin Liang,
James Zou
Contributed Talk at ICML 2022 Workshop on
Shift happens: Crowdsourcing metrics and test datasets beyond ImageNet
International Conference on Learning Representations (ICLR 2022)
Paper
HTML
Website
Code
HuggingFace
Recording
Blog
MetaShift introduces a collection of >10K sets of images with annotated contexts! Context is missing in many ML datasets but is critical for understanding model performance. MetaShift enables evaluating how ML works in different contexts (e.g., indoor cat vs. outdoor cat). Bonus: we provide a distance score between contexts.
Nazneen Rajani,
Weixin Liang,
Lingjiao Chen,
Meg Mitchell,
James Zou
Empirical Methods in Natural Language Processing (EMNLP 2022)
Paper
SEAL toolkit and Demo
Machine learning systems that seemingly perform well on average can still make systematic errors on important subsets of data. We introduce Systematic Error Analysis and Labeling (SEAL), an interactive tool that uses a two-step approach: it first identifies high-error slices of data and then gives human-understandable semantics to those under-performing slices.
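As a minimal sketch of the first step (discovering high-error slices), one could cluster the embeddings of misclassified validation examples; this only illustrates the idea and is not the SEAL implementation, which also attaches human-understandable semantics to each slice in the second step:

```python
import numpy as np
from sklearn.cluster import KMeans

def find_error_slices(embeddings: np.ndarray, errors: np.ndarray, n_slices: int = 5):
    # embeddings: (n, d) example representations; errors: (n,) boolean mask
    # marking misclassified examples.
    err_embs = embeddings[errors]
    km = KMeans(n_clusters=n_slices, n_init=10).fit(err_embs)
    # Step 2 would summarize each cluster in human-understandable terms,
    # e.g., by surfacing its most representative examples for labeling.
    return km.labels_
```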
Huaxiu Yao,
Yu Wang,
Sai Li,
Linjun Zhang,
Weixin Liang,
James Zou,
Chelsea Finn
International Conference on Machine Learning (ICML 2022)
Paper
HTML
Poster
Code
Recording
To deploy machine learning algorithms in real-world applications, we must pay attention to distribution shift: when the test distribution differs from the training distribution, model performance degrades substantially. We propose a simple mixup-based method to learn invariant functions via selective augmentation.
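A minimal sketch of the selective-augmentation idea: mixup is applied only to pairs chosen by a selection rule (same label but different domains, or same domain but different labels), so the interpolation tends to cancel domain-specific features. The pairing is assumed to happen upstream, and the names are illustrative:

```python
import torch

def selective_mixup(x1, y1, x2, y2, alpha: float = 2.0):
    # x1/x2: batches already paired by the selection rule above;
    # y1/y2: one-hot labels.
    lam = torch.distributions.Beta(alpha, alpha).sample()
    x_mix = lam * x1 + (1 - lam) * x2
    y_mix = lam * y1 + (1 - lam) * y2  # unchanged when y1 == y2
    return x_mix, y_mix
```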
R Daneshjou*,
K Vodrahalli*,
W Liang*,
R Novoa,
M Jenkins,
V Rotemberg,
J Ko,
S Swetter,
E Bailey,
O Gevaert,
P Mukherjee,
M Phung,
K Yekrang,
B Fong,
R Sahasrabudhe,
Albert Chiou,
James Zou
Science Advances (2022)
Machine Learning for Health (ML4H 2021)
Paper
Science.org
Diverse Dermatology Images (DDI) Dataset
Training physicians and algorithms in dermatology diversity | Scope
In order to train and test AI algorithms in dermatology, we need diverse, validated benchmarks. We curated the Diverse Dermatology Images (DDI) dataset to meet this need—the first publicly available, expertly curated, and pathologically confirmed image dataset with diverse skin tones.
Weixin Liang,
James Zou
International Symposium on Information Theory (ISIT 2021)
Paper
HTML
Code
Slides
Our new ISIT 2021 paper proposes neural group testing to speed up deep learning inference. The idea is to adaptively apply the network to groups of samples pooled at suitable layers, which greatly reduces total compute.
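A minimal sketch of the grouping idea, assuming the network is split into a per-sample trunk and a shared head; the split point, pooling operator, and threshold are illustrative:

```python
import torch

def group_test(trunk, head, batch, group_size: int = 4, thresh: float = 0.5):
    feats = trunk(batch)                            # per-sample features
    flags = []
    for g in feats.split(group_size):
        pooled = g.max(dim=0, keepdim=True).values  # pool the whole group
        flags.append(head(pooled).sigmoid().item() > thresh)
    # A True flag means the group may contain a positive sample and gets
    # split and retested; a False flag clears every member in one pass.
    return flags
```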
Weixin Liang*,
Kaihui Liang*,
Zhou Yu
Annual Conference of the Association of Computational Linguistics (ACL 2021)
Paper
HTML
Code
Recording
We propose a workflow that automatically labels training data with minimal human effort, built upon our previous ACL 2020 work.
Weixin Liang,
James Zou,
Zhou Yu
Empirical Methods in Natural Language Processing (EMNLP 2020)
Paper
HTML
Slides
Recording
Blog
Review Ratings: 4, 4, 4.5 on a 5-point scale
Our new EMNLP paper shows how to teach ML models via natural language explanations of the contrasts between concepts (e.g., "the difference between COVID and flu is ..."). This is much more efficient than using labeled examples. Excited for more human-like learning!
Weixin Liang,
James Zou,
Zhou Yu
Annual Conference of the Association for Computational Linguistics (ACL 2020)
Paper
HTML
Code
Slides
Recording
Blog
Review Ratings: 4.5, 4.5, 5 on a 5-point scale
For dialog system evaluation, we found that self-reported dialog ratings are skewed, noisy, and insensitive due to bias and variance among different users. We propose a three-stage pipeline to denoise self-reported ratings and, at the same time, build a comparison-based automatic dialog quality predictor.
Weixin Liang*,
Youzhi Tian*,
Chengcai Chen,
Zhou Yu
AAAI Conference on Artificial Intelligence (AAAI 2020)
Paper
HTML
Slides
Press
We propose an end-to-end framework for task-oriented dialog systems that can flexibly incorporate supervision from multiple intermediate dialog system modules (e.g., natural language understanding, dialog state tracking, dialog policy learning, and natural language generation).
VS Mailthody,
Z Qureshi,
W Liang,
Z Feng,
SG De Gonzalo,
Y Li,
H Franke,
J Xiong,
J Huang,
Wen-mei Hwu
International Symposium on Microarchitecture (MICRO 2019)
Paper
A computer architecture conference paper on in-storage hardware acceleration for deep learning.
Zunlei Feng,
Weixin Liang,
Daocheng Tao,
Li Sun,
Anxiang Zeng,
Mingli Song
International Journal of Computer Vision (IJCV 2019)
Paper
We are the first to leverage computer vision techniques for image-based, nondestructive textile fiber identification, which is practically useful in the fashion, decoration, and design industries. Existing methods based on physical, chemical, and microscopy techniques are typically limited by long identification cycles, dependence on human factors, high technological barriers, and the damage they cause to samples.
Weixin Liang,
Kai Bu,
Ke Li,
Jinhong Li,
Arya Tavakoli
Annual Computer Security Applications Conference (ACSAC 2018)
Paper
Outstanding Graduation Thesis, Zhejiang University
Access patterns over untrusted memory have long been exploited to infer sensitive information such as program types or even secret keys. We propose a lightweight obfuscation solution to hide real memory accesses.