Weixin Liang 梁伟欣

I'm a first-year Ph.D. student in Computer Science at Stanford University, where I am fortunate to be advised by Prof. James Zou. I am part of the Stanford Artificial Intelligence Laboratory (SAIL), where I have collaborated with Prof. Daniel Jurafsky, Prof. Daniel A. McFarland, and Prof. Serena Yeung.

I completed my master's degree in Electrical Engineering at Stanford, working with Prof. James Zou and Prof. Zhou Yu. Prior to Stanford, I received a B.S. in Computer Science from Zhejiang University in 2019, where I worked with Prof. Kai Bu and Prof. Mingli Song. I have also spent time interning at Amazon Alexa AI, Apple, and Tencent.

  • Semantic Scholar
  • Google Scholar
  • DBLP


    Advances, opportunities and challenges in creating data for trustworthy AI

    Weixin Liang, Girmaw Abebe Tadesse, Daniel Ho, Fei-Fei Li, Matei Zaharia, Ce Zhang, James Zou
    Nature Machine Intelligence (2022) 
    Paper Nature.com Twitter

    As AI model-building becomes more automated, much of the time and resources in practice are devoted to deciding what data to collect and to data cleaning, annotation, and evaluation. Our article discusses best practices, new challenges, and opportunities for each of these key components of the data-for-AI pipeline.

    MetaShift: A Dataset of Datasets for Evaluating Contextual Distribution Shifts and Training Conflicts

    Weixin Liang*, Xinyu Yang*, James Zou
    Contributed Talk at ICML 2022 Workshop on Shift happens: Crowdsourcing metrics and test datasets beyond ImageNet
    International Conference on Learning Representations (ICLR 2022) 
    Paper HTML Website Code
    HuggingFace Recording Blog

    MetaShift introduces a collection of more than 10,000 sets of images with annotated contexts. Context is missing in many ML datasets but is critical for understanding model performance. MetaShift enables evaluating how ML models work in different contexts (e.g. indoor cats vs. outdoor cats).
    Bonus: we also provide distances between contexts.
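    The kind of contextual evaluation MetaShift enables can be sketched as follows. This is a minimal illustration, not code from the MetaShift release; the data layout and the `predict` function are hypothetical.

```python
def accuracy_by_context(examples, predict):
    """Report per-context accuracy, where each example is an
    (input, label, context) triple, e.g. (img, 'cat', 'indoor')."""
    by_ctx = {}
    for x, label, ctx in examples:
        hit, total = by_ctx.get(ctx, (0, 0))
        by_ctx[ctx] = (hit + int(predict(x) == label), total + 1)
    return {ctx: hit / total for ctx, (hit, total) in by_ctx.items()}

# Toy usage: a classifier that only recognizes indoor cats.
examples = [
    (0, "cat", "indoor"),
    (1, "cat", "indoor"),
    (2, "cat", "outdoor"),
]
predict = lambda x: "cat" if x < 2 else "dog"
scores = accuracy_by_context(examples, predict)
```

    Aggregate accuracy (2/3 here) would hide the fact that the model fails entirely on the outdoor context, which is exactly the failure mode per-context evaluation surfaces.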

    Improving Out-of-Distribution Robustness via Selective Augmentation

    Huaxiu Yao, Yu Wang, Sai Li, Linjun Zhang, Weixin Liang, James Zou, Chelsea Finn
    International Conference on Machine Learning (ICML 2022) 
    Paper HTML Poster Code Recording

    To deploy machine learning algorithms in real-world applications, we must pay attention to distribution shift: when the test distribution differs from the training distribution, model performance can degrade substantially. We propose a simple mixup-based method to learn invariant functions via selective augmentation.
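    A rough sketch of the selective-augmentation idea: standard mixup interpolation applied only to selected pairs, e.g. same label but different domains, so domain-specific (spurious) features tend to cancel. The pairing rule and data layout here are simplified illustrations, not the paper's implementation.

```python
def mixup(x1, y1, x2, y2, lam):
    """Standard mixup: convex combination of two inputs and their labels."""
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y

def same_label_cross_domain_pairs(dataset):
    """Select pairs that share a label but come from different domains.
    `dataset` holds (features, one_hot_label, domain) triples."""
    pairs = []
    for i, (xi, yi, di) in enumerate(dataset):
        for xj, yj, dj in dataset[i + 1:]:
            if yi == yj and di != dj:
                pairs.append(((xi, yi), (xj, yj)))
    return pairs

# Toy usage: two 'cat' examples from different domains get mixed.
data = [
    ([0.0, 0.0], [1, 0], "photo"),
    ([2.0, 2.0], [1, 0], "sketch"),
    ([1.0, 1.0], [0, 1], "photo"),
]
(x1, y1), (x2, y2) = same_label_cross_domain_pairs(data)[0]
x_mix, y_mix = mixup(x1, y1, x2, y2, 0.5)
```

    In practice the interpolation coefficient would be sampled from a Beta distribution rather than fixed at 0.5.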

    Mind the Gap: Understanding the Modality Gap in Multi-modal Contrastive Representation Learning

    Weixin Liang*, Yuhui Zhang*, Yongchan Kwon*, Serena Yeung, James Zou
    ICML 2022 Workshop on Pre-training: Perspectives, Pitfalls, and Paths Forward
    Paper HTML Poster Website Code

    Our new paper explains an intriguing phenomenon, the modality gap: in multi-modal models, different data types are separated by a large gap in the shared representation space. We show that changing the gap improves zero-shot learning and fairness. Interestingly, the modality gap is created at model initialization and is reinforced by contrastive learning.
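    One simple way to quantify such a gap is the distance between the centroids of the two embedding sets. This is a minimal sketch under that assumption (embeddings taken as already normalized), not the paper's exact measurement code.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]

def modality_gap(image_embs, text_embs):
    """Euclidean distance between the centroids of image and text
    embeddings in the shared representation space."""
    ci, ct = centroid(image_embs), centroid(text_embs)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(ci, ct)))

# Toy usage: two modality clusters sitting on different axes of a 2-D space.
gap = modality_gap([[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]])
```

    A gap of zero would mean the two modalities' embeddings share a common center; real contrastive models like CLIP exhibit a clearly nonzero gap.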

    On the nonlinear correlation of ML performance between data subpopulations

    Weixin Liang*, Yining Mao*, Yongchan Kwon*, Xinyu Yang, James Zou
    ICML 2022 Workshop on Principles of Distribution Shift
    ICML 2022 Workshop on Spurious Correlations, Invariance, and Stability
    Paper Poster Website Recording Code

    Recent work has empirically found a strong linear relationship between in-distribution (ID) and out-of-distribution (OOD) performance, but we show that this does not necessarily hold under subpopulation shifts. In this paper, we empirically show that OOD performance often has a nonlinear ("moon-shaped") correlation with ID performance under subpopulation shifts.

    Manuscripts in Computational Social Science

    Systematic analysis of 50 years of Stanford University Technology Transfer and Commercialization

    Weixin Liang, Scott Elrod, Daniel A. McFarland, James Zou
    Patterns (2022) 
    Paper Cell.com News Twitter Recording
    50th Anniversary Report: A Half Century of Pioneering Innovation

    Computational analysis of 4,512 inventions marketed by Stanford's Office of Technology Licensing between 1970 and 2020 characterizes how the academic innovation landscape changed over time. We identified factors, such as the composition of the inventors, associated with the commercial success of the inventions. We also identified linguistic differences in how high-revenue and low-revenue inventions in the same field are described and marketed.

    How random is the review outcome? A systematic study of the impact of external factors on eLife peer review

    Weixin Liang, Kyle Mahowald, Jennifer Raymond, Vamshi Krishna, Daniel Smith, Daniel Jurafsky, Daniel A. McFarland, James Zou
    Paper (Abstract) Code

    A robust peer review process is essential for the advancement of knowledge. However, peer review outcomes can depend on external factors (e.g. the timing of the submission, the availability of specific editors and reviewers) that are largely random and orthogonal to the quality of the work. While researchers often complain about the luck of the draw in the review process, there has been no systematic analysis of the impact of these external factors.


    Disparities in Dermatology AI Performance on a Diverse, Curated Clinical Image Set

    R Daneshjou*, K Vodrahalli*, W Liang*, R Novoa, M Jenkins, V Rotemberg, J Ko, S Swetter, E Bailey, O Gevaert, P Mukherjee, M Phung, K Yekrang, B Fong, R Sahasrabudhe, Albert Chiou, James Zou
    Science Advances (2022) 
    Machine Learning for Health (ML4H 2021) 
    Paper Science.org Diverse Dermatology Images (DDI) Dataset

    In order to train and test AI algorithms in dermatology, we need diverse, validated benchmarks. We curated the Diverse Dermatology Images (DDI) dataset to meet this need—the first publicly available, expertly curated, and pathologically confirmed image dataset with diverse skin tones.

    Neural Group Testing to Accelerate Deep Learning

    Weixin Liang, James Zou
    International Symposium on Information Theory (ISIT 2021) 
    Paper HTML Code Slides

    Our new ISIT 2021 paper proposes neural group testing to speed up deep learning inference. The idea is to adaptively apply the network to groups of data pooled at suitable layers, which greatly reduces the total compute.
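    The adaptive idea can be illustrated with classical binary-splitting group testing. In the paper, the group "test" is the network run on inputs pooled at an internal layer; here `test_group` is an abstract oracle and the setup is a toy illustration, not the paper's method.

```python
def adaptive_group_test(items, test_group):
    """Find positive items with few tests: test a whole group at once,
    and only split groups that test positive."""
    positives, num_tests = [], 0
    stack = [list(items)]
    while stack:
        group = stack.pop()
        num_tests += 1
        if not test_group(group):
            continue  # whole group is negative: cleared by one test
        if len(group) == 1:
            positives.append(group[0])
        else:
            mid = len(group) // 2
            stack += [group[:mid], group[mid:]]
    return positives, num_tests

# Toy usage: one positive among 16 items, found in 9 tests instead of 16.
hits = {3}
positives, num_tests = adaptive_group_test(
    range(16), lambda g: any(i in hits for i in g)
)
```

    The savings grow as positives get rarer, which is why group testing suits workloads where most inputs are "negative" (e.g. filtering rare content).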

    HERALD: An Annotation Efficient Method to Train User Engagement Predictors in Dialogs

    Weixin Liang*, Kaihui Liang*, Zhou Yu
    Annual Conference of the Association for Computational Linguistics (ACL 2021) 
    Paper HTML Code Recording

    We propose a workflow that automatically labels training data with minimal human effort, building upon our previous ACL 2020 work.

    ALICE: Active Learning with Contrastive Natural Language Explanations

    Weixin Liang, Zhou Yu, James Zou
    Empirical Methods in Natural Language Processing (EMNLP 2020) 
    Paper HTML Slides Recording Blog

    Review ratings: 4, 4, 4.5 on a 5-point scale

    Our new EMNLP paper shows how to teach ML models via natural language explanations of contrasts between concepts (e.g. "the difference between COVID and flu is ..."). This is much more label-efficient than using labeled examples alone. Excited for more human-like learning!

    Beyond User Self-Reported Likert Scale Ratings: A Comparison Model for Automatic Dialog Evaluation

    Weixin Liang, Zhou Yu, James Zou
    Annual Conference of the Association for Computational Linguistics (ACL 2020) 
    Paper HTML Code Slides Recording Blog

    Review ratings: 4.5, 4.5, 5 on a 5-point scale

    For dialog system evaluation, we found that self-reported dialog ratings are skewed, noisy, and insensitive due to bias and variance across users. We propose a three-stage pipeline to denoise self-reported ratings and, at the same time, build an automatic comparison-based dialog quality predictor.

    MOSS: Training End-to-End Dialog Systems with Modular Supervision

    Weixin Liang*, Youzhi Tian*, Chengcai Chen, Zhou Yu
    AAAI Conference on Artificial Intelligence (AAAI 2020) 
    Paper HTML Slides Press

    We propose an end-to-end framework for task-oriented dialog systems that can flexibly incorporate supervision from multiple intermediate dialog system modules (e.g. natural language understanding, dialog state tracking, dialog policy learning, and natural language generation).

    DeepStore: In-Storage Acceleration for Intelligent Queries

    VS Mailthody, Z Qureshi, W Liang, Z Feng, SG De Gonzalo, Y Li, H Franke, J Xiong, J Huang, Wen-mei Hwu
    International Symposium on Microarchitecture (MICRO 2019) 

    A computer architecture conference paper on in-storage hardware acceleration for deep learning.

    CU-Net: Component Unmixing Network for Textile Fiber Identification.

    Zunlei Feng, Weixin Liang, Daocheng Tao, Li Sun, Anxiang Zeng, Mingli Song
    International Journal of Computer Vision (IJCV 2019) 

    We are the first to leverage computer vision techniques for image-based nondestructive textile fiber identification, which is practically useful in the fashion, decoration, and design industries. Existing methods based on physical, chemical, and microscopy techniques are typically limited by long identification cycles, heavy dependence on human factors, high technological barriers, and damage to the samples.

    MemCloak: Practical Access Obfuscation for Untrusted Memory

    Weixin Liang, Kai Bu, Ke Li, Jinhong Li, Arya Tavakoli
    Annual Computer Security Applications Conference (ACSAC 2018) 

    Outstanding Graduation Thesis, Zhejiang University

    Access patterns over untrusted memory have long been exploited to infer sensitive information such as program types or even secret keys. We propose a lightweight obfuscation solution to hide real memory accesses.

    Course Projects

    GraghVQA: Language-Guided Graph Neural Networks for Graph-based Visual Question Answering

    Weixin Liang*, Yanhao Jiang*, Zixuan Liu*
    NAACL Workshop on Multimodal Artificial Intelligence (NAACL MAI 2021) 
    Paper Code Stanford CS224W: Machine Learning with Graphs

    Latest News

    • [Aug 22, 2022] Nature Machine Intelligence article on data-centric AI; analysis of 50 years of Stanford research commercialization published in Patterns; Science Advances paper on disparities in skin cancer AI and the new Diverse Dermatology Images dataset.
    • [Jul 17, 2022] Look forward to meeting you all at ICML 2022 in Baltimore, Maryland!
    • [Jun 13, 2022] Happy to share that I graduated from the master's program at @Stanford today and will stay as a Ph.D. student starting this fall! Endlessly grateful to the people who supported me throughout the journey.
    • [Apr 23, 2022] New AI4Health dataset! To train and test AI algorithms in dermatology, we need diverse, validated benchmarks. We curated the Diverse Dermatology Images (DDI) dataset to meet this need: the first publicly available, expertly curated, and pathologically confirmed image dataset with diverse skin tones.
    • [Mar 3, 2022] Our new paper explains the intriguing modality gap in multi-modal AI.
    • [Jan 22, 2022] New ICLR paper MetaShift offers a resource of thousands of distribution shifts.
    • [Sep 6, 2019] Arrived @Stanford!