Bryan He | Publications

Publications

[Show/Hide Abstracts]

Multi-Modal Contrastive Learning across Cardiac Diagnostics

Bryan He, Milos Vukadinovic, Grant Duffy, James Zou, David Ouyang

Preprint, 2025.

[PDF]
A Multimodal Sleep Foundation Model Developed with 500K Hours of Sleep Recordings for Disease Predictions

Rahul Thapa, Magnus Ruud Kjær, Bryan He, Ian Covert, Hyatt Moore, Umaer Hanif, Gauri Ganjoo, Brandon M Westover, Poul Jennum, Andreas Brink-Kjær, Emmanuel Mignot, James Zou

Preprint, 2025.

[Abstract] [PDF] [Code]

Sleep is a fundamental biological process with profound implications for physical and mental health, yet our understanding of its complex patterns and their relationships to a broad spectrum of diseases remains limited. While polysomnography (PSG), the gold standard for sleep analysis, captures rich multimodal physiological data, analyzing these measurements has been challenging due to limited flexibility across recording environments, poor generalizability across cohorts, and difficulty in leveraging information from multiple signals simultaneously. To address this gap, we curated over 585,000 hours of high-quality sleep recordings from approximately 65,000 participants across multiple cohorts and developed SleepFM, a multimodal sleep foundation model trained with a novel contrastive learning approach, designed to accommodate any PSG montage. SleepFM produces informative sleep embeddings that enable predictions of future diseases. We systematically demonstrate that SleepFM embeddings can predict 130 future diseases, as modeled by Phecodes, with C-Index and AUROC of at least 0.75 on held-out participants (Bonferroni-corrected p < 0.01). This includes accurate predictions for death (C-Index: 0.84 [95% CI: 0.81–0.87]), heart failure (C-Index: 0.80 [95% CI: 0.77–0.83]), chronic kidney disease (C-Index: 0.79 [95% CI: 0.77–0.81]), dementia (C-Index: 0.85 [95% CI: 0.82–0.87]), stroke (C-Index: 0.78 [95% CI: 0.76–0.81]), atrial fibrillation (C-Index: 0.78 [95% CI: 0.75–0.81]), and myocardial infarction (C-Index: 0.81 [95% CI: 0.78–0.84]). The model’s generalizability was further validated through strong performance on the Sleep Heart Health Study (SHHS), a dataset unseen during pre-training. Additionally, SleepFM demonstrates strong performance on traditional sleep analysis tasks, achieving competitive results in both sleep staging (mean F1 scores: 0.70–0.78) and sleep apnea diagnosis (AUROC: 0.90–0.94). Beyond these standard applications, our analysis reveals that specific sleep stages and physiological signals carry distinct predictive power for different diseases. This work demonstrates how foundation models can leverage sleep polysomnography data to uncover the extensive relationship between sleep physiology and future disease risk.
EchoPrime: A Multi-Video View-Informed Vision-Language Model for Comprehensive Echocardiography Interpretation

Milos Vukadinovic, Xiu Tang, Neal Yuan, Paul Cheng, Debiao Li, Susan Cheng, Bryan He, David Ouyang

Preprint, 2024.

[Abstract] [PDF] [Code]

Echocardiography is the most widely used cardiac imaging modality, capturing ultrasound video data to assess cardiac structure and function. Artificial intelligence (AI) in echocardiography has the potential to streamline manual tasks and improve reproducibility and precision. However, most echocardiography AI models are single-view, single-task systems that do not synthesize complementary information from multiple views captured during a full exam, and thus lead to limited performance and scope of applications. To address this problem, we introduce EchoPrime, a multi-view, view-informed, video-based vision-language foundation model trained on over 12 million video-report pairs. EchoPrime uses contrastive learning to train a unified embedding model for all standard views in a comprehensive echocardiogram study with representation of both rare and common diseases and diagnoses. EchoPrime then utilizes view-classification and a view-informed anatomic attention model to weight video-specific interpretations that accurately maps the relationship between echocardiographic views and anatomical structures. With retrieval-augmented interpretation, EchoPrime integrates information from all echocardiogram videos in a comprehensive study and performs holistic comprehensive clinical echocardiography interpretation. In datasets from two independent healthcare systems, EchoPrime achieves state-of-the art performance on 23 diverse benchmarks of cardiac form and function, surpassing the performance of both task-specific approaches and prior foundation models. Following rigorous clinical evaluation, EchoPrime can assist physicians in the automated preliminary assessment of comprehensive echocardiography.
SleepFM: Multi-modal Representation Learning for Sleep Across Brain Activity, ECG and Respiratory Signals

Rahul Thapa, Bryan He, Magnus Ruud Kjaer, Hyatt Moore, Gauri Ganjoo, Emmanuel Mignot, and James Zou

International Conference on Machine Learning, 2024.

[Abstract] [PDF] [Code]

Sleep is a complex physiological process evaluated through various modalities recording electrical brain, cardiac, and respiratory activities. We curate a large polysomnography dataset from over 14,000 participants comprising over 100,000 hours of multi-modal sleep recordings. Leveraging this extensive dataset, we developed SleepFM, the first multi-modal foundation model for sleep analysis. We show that a novel leave-one-out approach for contrastive learning significantly improves downstream task performance compared to representations from standard pairwise contrastive learning. A logistic regression model trained on SleepFM’s learned embeddings outperforms an end-to-end trained convolutional neural network (CNN) on sleep stage classification (macro AUROC 0.88 vs 0.72 and macro AUPRC 0.72 vs 0.48) and sleep disordered breathing detection (AUROC 0.85 vs 0.69 and AUPRC 0.77 vs 0.61). Notably, the learned embeddings achieve 48% top-1 average accuracy in retrieving modality clip pairs from 90,000 candidates. This work demonstrates the value of holistic multi-modal sleep modeling to fully capture the richness of sleep recordings. SleepFM is open source and available at https://github.com/rthapa84/sleepfm-codebase.
Deep learning for transesophageal echocardiography view classification

Kirsten R. Steffner, Matthew Christensen, George Gill, Michael Bowdish, Justin Rhee, Abirami Kumaresan, Bryan He, James Zou, and David Ouyang

Scientific Reports, 2024.

[Abstract] [PDF] [Code]

Transesophageal echocardiography (TEE) imaging is a vital tool used in the evaluation of complex cardiac pathology and the management of cardiac surgery patients. A key limitation to the application of deep learning strategies to intraoperative and intraprocedural TEE data is the complexity and unstructured nature of these images. In the present study, we developed a deep learning-based, multi-category TEE view classification model that can be used to add structure to intraoperative and intraprocedural TEE imaging data. More specifically, we trained a convolutional neural network (CNN) to predict standardized TEE views using labeled intraoperative and intraprocedural TEE videos from Cedars-Sinai Medical Center (CSMC). We externally validated our model on intraoperative TEE videos from Stanford University Medical Center (SUMC). Accuracy of our model was high across all labeled views. The highest performance was achieved for the Trans-Gastric Left Ventricular Short Axis View (area under the receiver operating curve [AUC] = 0.971 at CSMC, 0.957 at SUMC), the Mid-Esophageal Long Axis View (AUC = 0.954 at CSMC, 0.905 at SUMC), the Mid-Esophageal Aortic Valve Short Axis View (AUC = 0.946 at CSMC, 0.898 at SUMC), and the Mid-Esophageal 4-Chamber View (AUC = 0.939 at CSMC, 0.902 at SUMC). Ultimately, we demonstrate that our deep learning model can accurately classify standardized TEE views, which will facilitate further downstream deep learning analyses for intraoperative and intraprocedural TEE imaging.
Electrocardiographic deep learning for predicting post-procedural mortality: a model development and validation study

David Ouyang, John Theurer, Nathan R. Stein, J. Weston Hughes, Pierre Elias, Bryan He, Neal Yuan, Grant Duffy, Roopinder K. Sandhu, Joseph Ebinger, Patrick Botting, Melvin Jujjavarapu, Brian Claggett, James E. Tooley, Tim Poterucha, Jonathan H. Chen, Michael Nurok, Marco Perez, Adler Perotte, James Y. Zou, Nancy R. Cook, Sumeet S. Chugh, Susan Cheng, and Christine M. Albert.

The Lancet Digital Health, 2024.

[Abstract] [PDF] [Code]

Background: Preoperative risk assessments used in clinical practice are insufficient in their ability to identify risk for postoperative mortality. Deep-learning analysis of electrocardiography can identify hidden risk markers that can help to prognosticate postoperative mortality. We aimed to develop a prognostic model that accurately predicts postoperative mortality in patients undergoing medical procedures and who had received preoperative electrocardiographic diagnostic testing.
Methods: In a derivation cohort of preoperative patients with available electrocardiograms (ECGs) from Cedars-Sinai Medical Center (Los Angeles, CA, USA) between Jan 1, 2015 and Dec 31, 2019, a deep-learning algorithm was developed to leverage waveform signals to discriminate postoperative mortality. We randomly split patients (8:1:1) into subsets for training, internal validation, and final algorithm test analyses. Model performance was assessed using area under the receiver operating characteristic curve (AUC) values in the hold-out test dataset and in two external hospital cohorts and compared with the established Revised Cardiac Risk Index (RCRI) score. The primary outcome was post-procedural mortality across three health-care systems.
Findings: 45 969 patients had a complete ECG waveform image available for at least one 12-lead ECG performed within the 30 days before the procedure date (59 975 inpatient procedures and 112 794 ECGs): 36 839 patients in the training dataset, 4549 in the internal validation dataset, and 4581 in the internal test dataset. In the held-out internal test cohort, the algorithm discriminates mortality with an AUC value of 0.83 (95% CI 0.79-0.87), surpassing the discrimination of the RCRI score with an AUC of 0.67 (0.61-0.72). The algorithm similarly discriminated risk for mortality in two independent US health-care systems, with AUCs of 0.79 (0.75-0.83) and 0.75 (0.74-0.76), respectively. Patients determined to be high risk by the deep-learning model had an unadjusted odds ratio (OR) of 8.83 (5.57-13.20) for postoperative mortality compared with an unadjusted OR of 2.08 (0.77-3.50) for postoperative mortality for RCRI scores of more than 2. The deep-learning algorithm performed similarly for patients undergoing cardiac surgery (AUC 0.85 [0.77-0.92]), non-cardiac surgery (AUC 0.83 [0.79-0.88]), and catheterisation or endoscopy suite procedures (AUC 0.76 [0.72-0.81]).
Interpretation: A deep-learning algorithm interpreting preoperative ECGs can improve discrimination of postoperative mortality. The deep-learning algorithm worked equally well for risk stratification of cardiac surgeries, non-cardiac surgeries, and catheterisation laboratory procedures, and was validated in three independent health-care systems. This algorithm can provide additional information to clinicians making the decision to perform medical procedures and stratify the risk of future complications.
Video-based deep learning for automated assessment of left ventricular ejection fraction in pediatric patients

Charitha D. Reddy, Leo Lopez, David Ouyang, James Y. Zou, and Bryan He

Journal of the American Society of Echocardiography, 2023.

[Abstract] [PDF] [Code] [Dataset]

Background: Significant interobserver and interstudy variability occurs for left ventricular (LV) functional indices despite standardization of measurement techniques. Artificial intelligence models trained on adult echocardiograms are not likely to be applicable to a pediatric population. We present EchoNet-Peds, a video-based deep learning algorithm, which matches human expert performance of LV segmentation and ejection fraction (EF).
Methods: A large pediatric data set of 4,467 echocardiograms was used to develop EchoNet-Peds. EchoNet-Peds was trained on 80% of the data for segmentation of the left ventricle and estimation of LVEF. The remaining 20% was used to fine-tune and validate the algorithm.
Results: In both apical 4-chamber and parasternal short-axis views, EchoNet-Peds segments the left ventricle with a Dice similarity coefficient of 0.89. EchoNet-Peds estimates EF with a mean absolute error of 3.66% and can routinely identify pediatric patients with systolic dysfunction (area under the curve of 0.95). EchoNet-Peds was trained on pediatric echocardiograms and performed significantly better to estimate EF (P < .001) than an adult model applied to the same data.
Conclusions: Accurate, rapid automation of EF assessment and recognition of systolic dysfunction in a pediatric population are feasible using EchoNet-Peds with the potential for far-reaching clinical impact. In addition, the first large pediatric data set of annotated echocardiograms is now publicly available for efforts to develop pediatric-specific artificial intelligence algorithms.
Blinded, randomized trial of sonographer versus AI cardiac function assessment

Bryan He, Alan C. Kwan, Jae Hyung Cho, Neal Yuan, Charles Pollick, Takahiro Shiota, Joseph Ebinger, Natalie A. Bello, Janet Wei, Kiranbir Josan, Grant Duffy, Melvin Jujjavarapu, Robert Siegel, Susan Cheng, James Y. Zou, and David Ouyang

Nature, 2023.

[Abstract] [PDF] [Code]

Artificial intelligence (AI) has been developed for echocardiography, although it has not yet been tested with blinding and randomization. Here we designed a blinded, randomized non-inferiority clinical trial (ClinicalTrials.gov ID: NCT05140642; no outside funding) of AI versus sonographer initial assessment of left ventricular ejection fraction (LVEF) to evaluate the impact of AI in the interpretation workflow. The primary end point was the change in the LVEF between initial AI or sonographer assessment and final cardiologist assessment, evaluated by the proportion of studies with substantial change (more than 5% change). From 3,769 echocardiographic studies screened, 274 studies were excluded owing to poor image quality. The proportion of studies substantially changed was 16.8% in the AI group and 27.2% in the sonographer group (difference of −10.4%, 95% confidence interval: −13.2% to −7.7%, P < 0.001 for non-inferiority, P < 0.001 for superiority). The mean absolute difference between final cardiologist assessment and independent previous cardiologist assessment was 6.29% in the AI group and 7.23% in the sonographer group (difference of −0.96%, 95% confidence interval: −1.34% to −0.54%, P < 0.001 for superiority). The AI-guided workflow saved time for both sonographers and cardiologists, and cardiologists were not able to distinguish between the initial assessments by AI versus the sonographer (blinding index of 0.088). For patients undergoing echocardiographic quantification of cardiac function, initial assessment of LVEF by AI was non-inferior to assessment by sonographers.
AI-enabled assessment of cardiac function and video quality in emergency department point-of-care echocardiograms

Bryan He*, Dev Dash*, Youyou Duanmu, Ting Xu Tan, David Ouyang, James Zou

Journal of Emergency Medicine, 2023.

[Abstract] [PDF]

Background: The adoption of point-of-care ultrasound (POCUS) has greatly improved the ability to rapidly evaluate unstable emergency department (ED) patients at the bedside. One major use of POCUS is to obtain echocardiograms to assess cardiac function.
Objectives: We developed EchoNet-POCUS, a novel deep learning system, to aid emergency physicians (EPs) in interpreting POCUS echocardiograms and to reduce operator-to-operator variability.
Methods: We collected a new dataset of POCUS echocardiogram videos obtained in the ED by EPs and annotated the cardiac function and quality of each video. Using this dataset, we train EchoNet-POCUS to evaluate both cardiac function and video quality in POCUS echocardiograms.
Results: EchoNet-POCUS achieves an area under the receiver operating characteristic curve (AUROC) of 0.92 (0.89–0.94) for predicting whether cardiac function is abnormal and an AUROC of 0.81 (0.78–0.85) for predicting video quality.
Conclusions: EchoNet-POCUS can be applied to bedside echocardiogram videos in real time using commodity hardware, as we demonstrate in a prospective pilot study.
Confounders mediate AI prediction of demographics in medical imaging.

Grant Duffy, Shoa L. Clarke, Matthew Christensen, Bryan He, Neal Yuan, Susan Cheng, and David Ouyang

npj Digital Medicine, 2022.

[Abstract] [PDF]

Deep learning has been shown to accurately assess “hidden” phenotypes from medical imaging beyond traditional clinician interpretation. Using large echocardiography datasets from two healthcare systems, we test whether it is possible to predict age, race, and sex from cardiac ultrasound images using deep learning algorithms and assess the impact of varying confounding variables. Using a total of 433,469 videos from Cedars-Sinai Medical Center and 99,909 videos from Stanford Medical Center, we trained video-based convolutional neural networks to predict age, sex, and race. We found that deep learning models were able to identify age and sex, while unable to reliably predict race. Without considering confounding differences between categories, the AI model predicted sex with an AUC of 0.85 (95% CI 0.84–0.86), age with a mean absolute error of 9.12 years (95% CI 9.00–9.25), and race with AUCs ranging from 0.63 to 0.71. When predicting race, we show that tuning the proportion of confounding variables (age or sex) in the training data significantly impacts model AUC (ranging from 0.53 to 0.85), while sex and age prediction was not particularly impacted by adjusting race proportion in the training dataset AUC of 0.81–0.83 and 0.80–0.84, respectively. This suggests significant proportion of AI’s performance on predicting race could come from confounding features being detected. Further work remains to identify the particular imaging features that associate with demographic information and to better understand the risks of demographic identification in medical AI as it pertains to potentially perpetuating bias and disparities.
CloudPred: Predicting Patient Phenotypes From Single-cell RNA-seq

Bryan He, Matthew Thomson, Meena Subramaniam, Richard Perez, Chun Jimmie Ye, James Zou

Pacific Symposium on Biocomputing, 2022.

[Abstract] [PDF]

Single-cell RNA sequencing (scRNA-seq) has the potential to provide powerful, high-resolution signatures to inform disease prognosis and precision medicine. This paper takes an important first step towards this goal by developing an interpretable machine learning algorithm, CloudPred, to predict individuals' disease phenotypes from their scRNA-seq data. Predicting phenotype from scRNA-seq is challenging for standard machine learning methods -- the number of cells measured can vary by orders of magnitude across individuals and the cell populations are also highly heterogeneous. Typical analysis creates pseudo-bulk samples which are biased toward prior annotations and also lose the single cell resolution. CloudPred addresses these challenges via a novel end-to-end differentiable learning algorithm which is coupled with a biologically informed mixture of cell types model. CloudPred automatically infers the cell subpopulation that are salient for the phenotype without prior annotations. We developed a systematic simulation platform to evaluate the performance of CloudPred and several alternative methods we propose, and find that CloudPred outperforms the alternative methods across several settings. We further validated CloudPred on a real scRNA-seq dataset of 142 lupus patients and controls. CloudPred achieves AUROC of 0.98 while identifying a specific subpopulation of CD4 T cells whose presence is highly indicative of lupus. CloudPred is a powerful new framework to predict clinical phenotypes from scRNA-seq data and to identify relevant cells.
AI-enabled in silico immunohistochemical characterization for Alzheimer's disease

Bryan He, Syed Bukhari, Edward Fox, Abubakar Abid, Jeanne Shen, Claudia Kawas, Maria Corrada, Thomas Montine, and James Zou

Cell Reports Methods, 2022.

[Abstract] [PDF] [Code]

We develop a deep learning approach, in silico immunohistochemistry (IHC), which takes routinely collected histochemical-stained samples as input and computationally generates virtual IHC slide images. We apply in silico IHC to Alzheimer's disease samples, where several hallmark changes are conventionally identified using IHC staining across many regions of the brain. In silico IHC computationally identifies neurofibrillary tangles, β-amyloid plaques, and neuritic plaques at a high spatial resolution directly from the histochemical images, with areas under the receiver operating characteristic curve of between 0.88 and 0.92. In silico IHC learns to identify subtle cellular morphologies associated with these lesions and can generate in silico IHC slides that capture key features of the actual IHC.
High-throughput precision phenotyping of left ventricular hypertrophy with cardiovascular deep learning

Grant Duffy, Paul P Cheng, Neal Yuan, Bryan He, Alan C Kwan, Matthew J Shun-Shin, Kevin M Alexander, Joseph Ebinger, Matthew P Lungren, Florian Rader, David H Liang, Ingela Schnittger, Euan A Ashley, James Y Zou, Jignesh Patel, Ronald Witteles, Susan Cheng, and David Ouyang

JAMA Cardiology, 2022.

[Abstract] [PDF] [Code] [Dataset]

Importance: Early detection and characterization of increased left ventricular (LV) wall thickness can markedly impact patient care but is limited by under-recognition of hypertrophy, measurement error and variability, and difficulty differentiating causes of increased wall thickness, such as hypertrophy, cardiomyopathy, and cardiac amyloidosis.
Objective: To assess the accuracy of a deep learning workflow in quantifying ventricular hypertrophy and predicting the cause of increased LV wall thickness.
Design, settings, and participants: This cohort study included physician-curated cohorts from the Stanford Amyloid Center and Cedars-Sinai Medical Center (CSMC) Advanced Heart Disease Clinic for cardiac amyloidosis and the Stanford Center for Inherited Cardiovascular Disease and the CSMC Hypertrophic Cardiomyopathy Clinic for hypertrophic cardiomyopathy from January 1, 2008, to December 31, 2020. The deep learning algorithm was trained and tested on retrospectively obtained independent echocardiogram videos from Stanford Healthcare, CSMC, and the Unity Imaging Collaborative.
Main outcomes and measures: The main outcome was the accuracy of the deep learning algorithm in measuring left ventricular dimensions and identifying patients with increased LV wall thickness diagnosed with hypertrophic cardiomyopathy and cardiac amyloidosis.
Super-resolved spatial transcriptomics by deep data fusion

Ludvig Bergenstråhle, Bryan He, Joseph Bergenstråhle, Alma Andersson, Joakim Lundeberg, James Zou, Jonas Maaskola

Nature Biotechnology, 2021.

[Abstract] [PDF]

In situ RNA capturing has made it possible to record histology and spatial gene expression from the same tissue section. Here, we introduce a method that combines data from both modalities to infer super-resolved full-transcriptome expression maps. Our method unravels transcriptional heterogeneity in micrometer-scale anatomical features and enables image-based in silico spatial transcriptomics without hybridization or sequencing.
Deep learning evaluation of biomarkers from echocardiogram videos

J Weston Hughes, Neal Yuan, Bryan He, Jiahong Ouyang, Joseph Ebinger, Patrick Botting, Jasper Lee, John Theurer, James E Tooley, Koen Nieman, Matthew P Lungren, David H Liang, Ingela Schnittger, Jonathan H Chen, Euan A Ashley, Susan Cheng, David Ouyang, James Zou

EBioMedicine, 2021.

[Abstract] [PDF]

Background

Laboratory testing is routinely used to assay blood biomarkers to provide information on physiologic state beyond what clinicians can evaluate from interpreting medical imaging. We hypothesized that deep learning interpretation of echocardiogram videos can provide additional value in understanding disease states and can evaluate common biomarkers results.

Methods

We developed EchoNet-Labs, a video-based deep learning algorithm to detect evidence of anemia, elevated B-type natriuretic peptide (BNP), troponin I, and blood urea nitrogen (BUN), as well as values of ten additional lab tests directly from echocardiograms. We included patients (n = 39,460) aged 18 years or older with one or more apical-4-chamber echocardiogram videos (n = 70,066) from Stanford Healthcare for training and internal testing of EchoNet-Lab's performance in estimating the most proximal biomarker result. Without fine-tuning, the performance of EchoNet-Labs was further evaluated on an additional external test dataset (n = 1,301) from Cedars-Sinai Medical Center. We calculated the area under the curve (AUC) of the receiver operating characteristic curve for the internal and external test datasets.

Findings

On the held-out test set of Stanford patients not previously seen during model training, EchoNet-Labs achieved an AUC of 0.80 (0.79-0.81) in detecting anemia (low hemoglobin), 0.86 (0.85-0.88) in detecting elevated BNP, 0.75 (0.73-0.78) in detecting elevated troponin I, and 0.74 (0.72-0.76) in detecting elevated BUN. On the external test dataset from Cedars-Sinai, EchoNet-Labs achieved an AUC of 0.80 (0.77-0.82) in detecting anemia, of 0.82 (0.79-0.84) in detecting elevated BNP, of 0.75 (0.72-0.78) in detecting elevated troponin I, and of 0.69 (0.66-0.71) in detecting elevated BUN. We further demonstrate the utility of the model in detecting abnormalities in 10 additional lab tests. We investigate the features necessary for EchoNet-Labs to make successful detection and identify potential mechanisms for each biomarker using well-known and novel explainability techniques.

Interpretation
These results show that deep learning applied to diagnostic imaging can provide additional clinical value and identify phenotypic information beyond current imaging interpretation methods.
Systematic Quantification of Sources of Variation in Ejection Fraction Calculation Using Deep Learning

Neal Yuan, Ishan Jain, Neeraj Rattehalli, Bryan He, Charles Pollick, David Liang, Paul Heidenreich, James Zou, Susan Cheng, and David Ouyang

JACC: Cardiovascular Imaging, 2021.

[Abstract] [PDF]

Accurate left ventricular (LV) ejection fraction (LVEF) assessment is essential for diagnosing and managing many medical conditions, including heart failure, myocardial infarction, valvular disease, and even cancer. Echocardiography is the most frequently used modality to assess LVEF because of its lack of ionizing radiation, widespread availability, and high temporal resolution. However, echocardiographic assessment is also prone to significant intraprovider variability because of its reliance on expert view acquisition and measurements. Potential sources of error in tracings and view acquisition are known. However, the degree to which small variations affect downstream calculations of LVEF has not been well studied.
How to evaluate deep learning for cancer diagnostics – factors and recommendations

Roxana Daneshjou, Bryan He, David Ouyang, James Zou

Biochimica et Biophysica Acta (BBA) - Reviews on Cancer, 2021.

[Abstract] [PDF]

The large volume of data used in cancer diagnosis presents a unique opportunity for deep learning algorithms, which improve in predictive performance with increasing data. When applying deep learning to cancer diagnosis, the goal is often to learn how to classify an input sample (such as images or biomarkers) into predefined categories (such as benign or cancerous). In this article, we examine examples of how deep learning algorithms have been implemented to make predictions related to cancer diagnosis using clinical, radiological, and pathological image data. We present a systematic approach for evaluating the development and application of clinical deep learning algorithms. Based on these examples and the current state of deep learning in medicine, we discuss the future possibilities in this space and outline a roadmap for implementations of deep learning in cancer diagnosis.
Deep learning for biomedical videos: perspective and recommendations

David Ouyang, Zhenqin Wu, Bryan He, James Zou

Artificial Intelligence in Medicine, 2021.

[Abstract] [PDF]

Medical videos capture dynamic information of motion, velocity, and perturbation, which can assist in the diagnosis and understanding of disease. Common examples of medical videos include cardiac ultrasound to assess cardiac motion, endoscopies to screen for gastrointestinal cancers, natural videos to track human behaviors in population health, and microscopy to understand cellular interactions. Deep learning for medical video analysis is rapidly progressing and holds tremendous potential to extract actionable insights from these rich complex data. Here we provide an overview of deep learning approaches to perform segmentation, object tracking, and motion analysis from medical videos. Using cardiac ultrasound and cellular microscopy as case studies, we highlight the unique challenges of working with videos compared to the more standard models used on still images. We further discuss available video datasets that may search as good training sets and benchmarks. We conclude by discussing the future directions for this field with recommendations to practitioners.
Integrating spatial gene expression and breast tumour morphology via deep learning

Bryan He, Ludvig Bergenstråhle, Linnea Stenbeck, Abubakar Abid, Alma Andersson, Åke Borg, Jonas Maaskola, Joakim Lundeberg, and James Zou

Nature Biomedical Engineering, 2020.

[Abstract] [PDF] [Code] [Dataset]

Spatial transcriptomics allows for the measurement of RNA abundance at a high spatial resolution, making it possible to systematically link the morphology of cellular neighbourhoods and spatially localized gene expression. Here, we report the development of a deep learning algorithm for the prediction of local gene expression from haematoxylin-and-eosin-stained histopathology images using a new dataset of 30,612 spatially resolved gene expression data matched to histopathology images from 23 patients with breast cancer. We identified over 100 genes, including known breast cancer biomarkers of intratumoral heterogeneity and the co-localization of tumour growth and immune activation, the expression of which can be predicted from the histopathology images at a resolution of 100 µm. We also show that the algorithm generalizes well to The Cancer Genome Atlas and to other breast cancer gene expression datasets without the need for re-training. Predicting the spatially resolved transcriptome of a tissue directly from tissue images may enable image-based screening for molecular biomarkers with spatial variation.
The Diversity–Innovation Paradox in Science

Bas Hofstra, Vivek V. Kulkarni, Sebastian Munoz-Najar Galvez, Bryan He, Dan Jurafsky, and Daniel A. McFarland

Proceedings of the National Academy of Sciences of the United States of America, 2020.

[Abstract] [PDF] [Code]

Prior work finds a diversity paradox: Diversity breeds innovation, yet underrepresented groups that diversify organizations have less successful careers within them. Does the diversity paradox hold for scientists as well? We study this by utilizing a near-complete population of ∼1.2 million US doctoral recipients from 1977 to 2015 and following their careers into publishing and faculty positions. We use text analysis and machine learning to answer a series of questions: How do we detect scientific innovations? Are underrepresented groups more likely to generate scientific innovations? And are the innovations of underrepresented groups adopted and rewarded? Our analyses show that underrepresented groups produce higher rates of scientific novelty. However, their novel contributions are devalued and discounted: For example, novel contributions by gender and racial minorities are taken up by other scholars at lower rates than novel contributions by gender and racial majorities, and equally impactful contributions of gender and racial minorities are less likely to result in successful scientific careers than for majority groups. These results suggest there may be unwarranted reproduction of stratification in academic careers that discounts diversity’s role in innovation and partly explains the underrepresentation of some groups in academia.
Video-based AI for beat-to-beat cardiac function assessment

David Ouyang, Bryan He, Amirata Ghorbani, Neal Yuan, Joseph Ebinger, Curt P. Langlotz, Paul A. Heidenreich, Robert A. Harrington, David H. Liang, Euan A. Ashley, and James Y. Zou

Nature, 2020.

[Abstract] [PDF] [Code] [Dataset]

Accurate assessment of cardiac function is crucial for diagnosing cardiovascular disease, screening for cardiotoxicity and deciding clinical management in patients with critical illness. However human assessment of cardiac function focuses on a limited sampling of cardiac cycles and has significant interobserver variability despite years of training. To overcome this challenge, we present the first beat-to-beat deep learning algorithm that surpasses human expert performance in the critical tasks of segmenting the left ventricle, estimating ejection fraction, and assessing cardiomyopathy. Trained on echocardiogram videos, our model accurately segments the left ventricle with a Dice Similarity Coefficient of 0.92, predicts ejection fraction with mean absolute error of 4.1%, and reliably classifies heart failure with reduced ejection fraction (AUC of 0.97). Prospective evaluation with repeated human measurements confirms that our model has less variance than experts. By leveraging information across multiple cardiac cycles, our model can identify subtle changes in ejection fraction, is more reproducible than human evaluation, and lays the foundation for precise diagnosis of cardiovascular disease. As a new resource to promote further innovation, we also make publicly available one of the largest medical video dataset of over 10,000 annotated echocardiograms.
Deep learning interpretation of echocardiograms

Amirata Ghorbani, David Ouyang, Abubakar Abid, Bryan He, Jonathan H. Chen, Robert A. Harrington, David H. Liang, Euan A. Ashley, James Y. Zou

npj Digital Medicine, 2020.

[Abstract] [PDF]

Echocardiography uses ultrasound technology to capture high temporal and spatial resolution images of the heart and surrounding structures, and is the most common imaging modality in cardiovascular medicine. Using convolutional neural networks on a large new dataset, we show that deep learning applied to echocardiography can identify local cardiac structures, estimate cardiac function, and predict systemic phenotypes that modify cardiovascular risk but not readily identifiable to human interpretation. Our deep learning model, EchoNet, accurately identified the presence of pacemaker leads (AUC = 0.89), enlarged left atrium (AUC = 0.86), left ventricular hypertrophy (AUC = 0.75), left ventricular end systolic and diastolic volumes (R2 = 0.74 and R2 = 0.70), and ejection fraction (R2 = 0.50), as well as predicted systemic phenotypes of age (R2 = 0.46), sex (AUC = 0.88), weight (R2 = 0.56), and height (R2 = 0.33). Interpretation analysis validates that EchoNet shows appropriate attention to key cardiac structures when performing human-explainable tasks and highlights hypothesis-generating regions of interest when predicting systemic phenotypes difficult for human interpretation. Machine learning on echocardiography images can streamline repetitive tasks in the clinical workflow, provide preliminary interpretation in areas with insufficient qualified cardiologists, and predict phenotypes challenging for human evaluation.
Accelerated Stochastic Power Iteration

Peng Xu, Bryan He, Christopher De Sa, Ioannis Mitliagkas, Chris Ré

Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, 2018.

[Abstract] [PDF]

Principal component analysis (PCA) is one of the most powerful tools for analyzing matrices in machine learning. In this paper, we study methods to accelerate power iteration in the stochastic setting by adding a momentum term. While in the deterministic setting, power iteration with momentum has optimal iteration complexity, we show that naively adding momentum to a stochastic method does not always result in acceleration. We perform a novel, tight variance analysis that reveals a "breaking-point variance" beyond which this acceleration does not occur. Combining this insight with modern variance reduction techniques yields a simple version of power iteration with momentum that achieves the optimal iteration complexities in both the online and offline setting. Our methods are embarrassingly parallel and can produce wall-clock-time speedups. Our approach is very general and applies to many non-convex optimization problems that can now be accelerated using the same technique.
Inferring Generative Model Structure with Static Analysis

Paroma Varma, Bryan He, Payal Bajaj, Nishith Khandwala, Imon Banerjee, Daniel Rubin, Christopher Ré

Advances in Neural Information Processing Systems (NeurIPS), 2017.

[Abstract] [PDF] [Poster]

Obtaining enough labeled data to robustly train complex discriminative models is a major bottleneck in the machine learning pipeline. A popular solution is combining multiple sources of weak supervision using generative models. The structure of these models affects the quality of the training labels, but is difficult to learn without any ground truth labels. We instead rely on weak supervision sources having some structure by virtue of being encoded programmatically. We present Coral, a paradigm that infers generative model structure by statically analyzing the code for these heuristics, thus significantly reducing the amount of data required to learn structure. We prove that Coral's sample complexity scales quasilinearly with the number of heuristics and number of relations identified, improving over the standard sample complexity, which is exponential in n for learning n-th degree relations. Empirically, Coral matches or outperforms traditional structure learning approaches by up to 3.81 F1 points. Using Coral to model dependencies instead of assuming independence results in better performance than a fully supervised model by 3.07 accuracy points when heuristics are used to label radiology data without ground truth labels.
Learning the Structure of Generative Models without Labeled Data

Stephen H. Bach, Bryan He, Alexander Ratner, Christopher Ré

Proceedings of the 34th International Conference on Machine Learning, 2017.

[Abstract] [PDF]

Curating labeled training data has become the primary bottleneck in machine learning. Recent frameworks address this bottleneck with generative models to synthesize labels at scale from weak supervision sources. The generative model’s dependency structure directly affects the quality of the estimated labels, but selecting a structure automatically without any labeled data is a distinct challenge. We propose a structure estimation method that maximizes the l₁-regularized marginal pseudolikelihood of the observed data. Our analysis shows that the amount of unlabeled data required to identify the true structure scales sublinearly in the number of possible dependencies for a broad class of models. Simulations show that our method is 100x faster than a maximum likelihood approach and selects 1/4 as many extraneous dependencies. We also show that our method provides an average of 1.5 F1 points of improvement over existing, user-developed information extraction applications on real-world data such as PubMed journal abstracts.
Scan Order in Gibbs Sampling: Models in Which it Matters and Bounds on How Much

Bryan He, Christopher De Sa, Ioannis Mitliagkas, Christopher Ré

Advances in Neural Information Processing Systems (NeurIPS), 2016.

[Abstract] [PDF] [Poster]

Gibbs sampling is a Markov Chain Monte Carlo sampling technique that iteratively samples variables from their conditional distributions. There are two common scan orders for the variables: random scan and systematic scan. Due to the benefits of locality in hardware, systematic scan is commonly used, even though most statistical guarantees are only for random scan. While it has been conjectured that the mixing times of random scan and systematic scan do not differ by more than a logarithmic factor, we show by counterexample that this is not the case, and we prove that that the mixing times do not differ by more than a polynomial factor under mild conditions. To prove these relative bounds, we introduce a method of augmenting the state space to study systematic scan using conductance.
Signal Quality of Endovascular Electroencephalography

Bryan He, Mosalam Ebrahimi, Leon Palafox, Lakshminarayan Srinivasan

Journal of Neural Engineering, 2016.

[Abstract] [PDF]

Objective, Approach. A growing number of prototypes for diagnosing and treating neurological and psychiatric diseases are predicated on access to high-quality brain signals, which typically requires surgically opening the skull. Where endovascular navigation previously transformed the treatment of cerebral vascular malformations, we now show that it can provide access to brain signals with substantially higher signal quality than scalp recordings. Main results. While endovascular signals were known to be larger in amplitude than scalp signals, our analysis in rabbits borrows a standard technique from communication theory to show endovascular signals also have up to 100× better signal-to-noise ratio. Significance. With a viable minimally-invasive path to high-quality brain signals, patients with brain diseases could one day receive potent electroceuticals through the bloodstream, in the course of a brief outpatient procedure.
Smooth Interactive Submodular Set Cover

Bryan He, Yisong Yue

Advances in Neural Information Processing Systems (NeurIPS), 2015.

[Abstract] [PDF] [Code] [Poster]

Interactive submodular set cover is an interactive variant of submodular set cover over a hypothesis class of submodular functions, where the goal is to satisfy all sufficiently plausible submodular functions to a target threshold using as few (cost-weighted) actions as possible. It models settings where there is uncertainty regarding which submodular function to optimize. In this paper, we propose a new extension, which we call smooth interactive submodular set cover, that allows the target threshold to vary depending on the plausibility of each hypothesis. We present the first algorithm for this more general setting with theoretical guarantees on optimality. We further show how to extend our approach to deal with real-valued functions, which yields new theoretical results for real-valued submodular set cover for both the interactive and non-interactive settings.
Generalized Analog Thresholding for Spike Acquisition at Ultralow Sampling Rates

Bryan He, Alexander Wein, Lav Varshney, Julius Kusuma, Andrew Richardson, Lakshminarayan Srinivasan

Journal of Neurophysiology, 2015.

[Abstract] [PDF]

Efficient spike acquisition techniques are needed to bridge the divide from creating large multielectrode arrays (MEA) to achieving whole-cortex electrophysiology. In this paper, we introduce generalized analog thresholding (gAT), which achieves millisecond temporal resolution with sampling rates as low as 10 Hz. Consider the torrent of data from a single 1,000-channel MEA, which would generate more than 3 GB/min using standard 30-kHz Nyquist sampling. Recent neural signal processing methods based on compressive sensing still require Nyquist sampling as a first step and use iterative methods to reconstruct spikes. Analog thresholding (AT) remains the best existing alternative, where spike waveforms are passed through an analog comparator and sampled at 1 kHz, with instant spike reconstruction. By generalizing AT, the new method reduces sampling rates another order of magnitude, detects more than one spike per interval, and reconstructs spike width. Unlike compressive sensing, the new method reveals a simple closed-form solution to achieve instant (noniterative) spike reconstruction. The base method is already robust to hardware nonidealities, including realistic quantization error and integration noise. Because it achieves these considerable specifications using hardware-friendly components like integrators and comparators, generalized AT could translate large-scale MEAs into implantable devices for scientific investigation and medical technology.
Feasibility of FRI-based Square-Wave Reconstruction with Quantization Error and Integrator Noise

Bryan He, Alexander Wein, Lakshminarayan Srinivasan

IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015.

[Abstract] [PDF] [Poster]

Conventional Nyquist sampling and reconstruction of square waves at a finite rate will always result in aliasing because square waves are not band limited. Based on methods for signals with finite rate of innovation (FRI), generalized Analog Thresholding (gAT-n) is able to sample square waves at a much lower rate under ideal conditions. The target application is efficient, real-time, implantable neurotechnology that extracts spiking neural signals from the brain. This paper studies the effect of integrator noise and quantization error on the accuracy of reconstructed square waves. We explore realistic values for integrator noise and input signal amplitude, using specifications from the Texas Instruments IVC102 integrator chip as a first-pass example because of its readily-available data sheet. ADC resolution is varied from 1 to 16 bits. This analysis indicates that gAT-1 is robust against these hardware non-idealities where gAT-2 degrades less gracefully, which makes gAT-1 a prime target for hardware implementation in a custom integrated circuit.
A Simple Optimal Binary Representation of Mosaic Floorplans and Baxter Permutations

Bryan He

Theoretical Computer Science, 2014.

[Abstract] [PDF]

Mosaic floorplans are rectangular structures subdivided into smaller rectangular sections and are widely used in VLSI circuit design. Baxter permutations are a set of permutations that have been shown to have a one-to-one correspondence to objects in the Baxter combinatorial family, which includes mosaic floorplans. An important problem in this area is to find short binary string representations of the set of n-block mosaic floorplans and Baxter permutations of length n. The best known representation is the Quarter-State Sequence which uses 4n bits. This paper introduces a simple binary representation of n-block mosaic floorplan using 3n−3 bits. It has been shown that any binary representation of n-block mosaic floorplans must use at least (3n−o(n)) bits. Therefore, the representation presented in this paper is optimal (up to an additive lower order term).
Neural Shaping with Joint Optimization of Controller and Plant under Restricted Dynamics

Bryan He, Lakshminarayan Srinivasan

Information Theory and Applications Workshop (ITA), 2014.

[Abstract] [PDF]

The prototypical brain-computer interface (BCI) algorithm translates brain activity into changes in the states of a computer program, for typing or cursor movement. Most approaches use neural decoding which learns how the user has encoded their intent in their noisy neural signals. Recent adaptive decoders for cursor movement improved BCI performance by modeling the user as a feedback controller; when this model accounts for adaptive control, the neural decoder is termed co-adaptive. This recent collection of control-inspired neural decoding strategies disregards a major antecedent conceptual realization, whereby the user could be induced to adopt an encoding strategy (control policy) such that the encoder-decoder pair (or equivalently, controller-plant pair) is optimal under a desired cost function. We call this alternate conceptual approach neural shaping, in contradistinction to neural decoding. Previous work illuminated the general form of optimal controller-plant pair under a cost representing information gain. For BCI applications requiring the user to issue discrete-valued commands, the information-gain-optimal pair, based on the posterior matching scheme, can be user-friendly. In this paper, we discuss the application of neural shaping to cursor control with continuous-valued states based on continuous-valued user commands. We examine the problem of jointly optimizing controller and plant under quadratic expected cost and restricted linear plant dynamics. This simplification reduces joint controller-plant selection to a static optimization problem, similar to approaches in structural engineering and other areas. This perspective suggests that recent BCI approaches that alternate between adaptive neural decoders and static neural decoders could be local Pareto-optimal, representing a suboptimal iterative-type solution to the optimal joint controller-plant problem.
Dynamic Analysis of Naive Adaptive Brain-Machine Interfaces

Kevin Kowalski, Bryan He, Lakshminarayan Srinivasan

Neural Computation, 2013.

[Abstract] [PDF]

The closed-loop operation of brain-machine interfaces (BMI) provides a context to discover foundational principles behind human-computer interaction, with emerging clinical applications to stroke, neuromuscular diseases, and trauma. In the canonical BMI, a user controls a prosthetic limb through neural signals that are recorded by electrodes and processed by a decoder into limb movements. In laboratory demonstrations with able-bodied test subjects, parameters of the decoder are commonly tuned using training data that include neural signals and corresponding overt arm movements. In the application of BMI to paralysis or amputation, arm movements are not feasible, and imagined movements create weaker, partially unrelated patterns of neural activity. BMI training must begin naive, without access to these prototypical methods for parameter initialization used in most laboratory BMI demonstrations.
Naive adaptive BMI refer to a class of methods recently introduced to address this problem. We first identify the basic elements of existing approaches based on adaptive filtering and define a decoder, ReFIT-PPF to represent these existing approaches. We then present Joint RSE, a novel approach that logically extends prior approaches. Using recently developed human- and synthetic-subjects closed-loop BMI simulation platforms, we show that Joint RSE significantly outperforms ReFIT-PPF and nonadaptive (static) decoders. Control experiments demonstrate the critical role of jointly estimating neural parameters and user intent. In addition, we show that nonzero sensorimotor delay in the user significantly degrades ReFIT-PPF but not Joint RSE, owing to differences in the prior on intended velocity. Paradoxically, substantial differences in the nature of sensory feedback between these methods do not contribute to differences in performance between Joint RSE and ReFIT-PPF. Instead, BMI performance improvement is driven by machine learning, which outpaces rates of human learning in the human-subjects simulation platform. In this regime, nuances of error-related feedback to the human user are less relevant to rapid BMI mastery.

Publications

Background

Methods

Findings

Interpretation