To appear in Pacific Symposium on Biocomputing, 2022.
Single-cell RNA sequencing (scRNA-seq) has the potential to provide powerful, high-resolution signatures to inform disease prognosis and precision medicine. This paper takes an important first step towards this goal by developing an interpretable machine learning algorithm, CloudPred, to predict individuals' disease phenotypes from their scRNA-seq data. Predicting phenotype from scRNA-seq is challenging for standard machine learning methods -- the number of cells measured can vary by orders of magnitude across individuals and the cell populations are also highly heterogeneous. Typical analysis creates pseudo-bulk samples which are biased toward prior annotations and also lose the single-cell resolution. CloudPred addresses these challenges via a novel end-to-end differentiable learning algorithm which is coupled with a biologically informed mixture of cell types model. CloudPred automatically infers the cell subpopulations that are salient for the phenotype without prior annotations. We developed a systematic simulation platform to evaluate the performance of CloudPred and several alternative methods we propose, and find that CloudPred outperforms the alternative methods across several settings. We further validated CloudPred on a real scRNA-seq dataset of 142 lupus patients and controls. CloudPred achieves an AUROC of 0.98 while identifying a specific subpopulation of CD4 T cells whose presence is highly indicative of lupus. CloudPred is a powerful new framework to predict clinical phenotypes from scRNA-seq data and to identify relevant cells.
To appear in Nature Biotechnology, 2021.
In situ RNA capturing has made it possible to record histology and spatial gene expression from the same tissue section. Here, we introduce a method that combines data from both modalities to infer super-resolved full-transcriptome expression maps. Our method unravels transcriptional heterogeneity in micrometer-scale anatomical features and enables image-based in silico spatial transcriptomics without hybridization or sequencing.
Laboratory testing is routinely used to assay blood biomarkers to provide information on physiologic state beyond what clinicians can evaluate from interpreting medical imaging. We hypothesized that deep learning interpretation of echocardiogram videos can provide additional value in understanding disease states and can evaluate common biomarker results.
We developed EchoNet-Labs, a video-based deep learning algorithm to detect evidence of anemia, elevated B-type natriuretic peptide (BNP), troponin I, and blood urea nitrogen (BUN), as well as values of ten additional lab tests directly from echocardiograms. We included patients (n = 39,460) aged 18 years or older with one or more apical-4-chamber echocardiogram videos (n = 70,066) from Stanford Healthcare for training and internal testing of EchoNet-Labs' performance in estimating the most proximal biomarker result. Without fine-tuning, the performance of EchoNet-Labs was further evaluated on an additional external test dataset (n = 1,301) from Cedars-Sinai Medical Center. We calculated the area under the curve (AUC) of the receiver operating characteristic curve for the internal and external test datasets.
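For readers unfamiliar with the metric, the following is a minimal sketch of how an AUC can be computed from model scores and binary labels. It uses the pairwise (rank-based) definition rather than any particular library, and it is not the evaluation code used for EchoNet-Labs; the function name and the toy scores are illustrative assumptions.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a positive case outscores a negative one.

    Ties between a positive and a negative score count as half a win.
    `scores` and `labels` are hypothetical inputs: one score and one 0/1
    label per study.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: four studies, two with an abnormal lab value (label 1).
toy_auc = roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
print(toy_auc)
```

The quadratic loop is fine for illustration; for datasets of the size described here a sort-based implementation would be the more practical choice.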
On the held-out test set of Stanford patients not previously seen during model training, EchoNet-Labs achieved an AUC of 0.80 (0.79-0.81) in detecting anemia (low hemoglobin), 0.86 (0.85-0.88) in detecting elevated BNP, 0.75 (0.73-0.78) in detecting elevated troponin I, and 0.74 (0.72-0.76) in detecting elevated BUN. On the external test dataset from Cedars-Sinai, EchoNet-Labs achieved an AUC of 0.80 (0.77-0.82) in detecting anemia, of 0.82 (0.79-0.84) in detecting elevated BNP, of 0.75 (0.72-0.78) in detecting elevated troponin I, and of 0.69 (0.66-0.71) in detecting elevated BUN. We further demonstrate the utility of the model in detecting abnormalities in 10 additional lab tests. We investigate the features necessary for EchoNet-Labs to make successful detections and identify potential mechanisms for each biomarker using well-known and novel explainability techniques.
These results show that deep learning applied to diagnostic imaging can provide additional clinical value and identify phenotypic information beyond current imaging interpretation methods.
JACC: Cardiovascular Imaging, 2021.
Accurate left ventricular (LV) ejection fraction (LVEF) assessment is essential for diagnosing and managing many medical conditions, including heart failure, myocardial infarction, valvular disease, and even cancer. Echocardiography is the most frequently used modality to assess LVEF because of its lack of ionizing radiation, widespread availability, and high temporal resolution. However, echocardiographic assessment is also prone to significant intraprovider variability because of its reliance on expert view acquisition and measurements. Potential sources of error in tracings and view acquisition are known. However, the degree to which small variations affect downstream calculations of LVEF has not been well studied.
Biochimica et Biophysica Acta (BBA) - Reviews on Cancer, 2021.
The large volume of data used in cancer diagnosis presents a unique opportunity for deep learning algorithms, which improve in predictive performance with increasing data. When applying deep learning to cancer diagnosis, the goal is often to learn how to classify an input sample (such as images or biomarkers) into predefined categories (such as benign or cancerous). In this article, we examine examples of how deep learning algorithms have been implemented to make predictions related to cancer diagnosis using clinical, radiological, and pathological image data. We present a systematic approach for evaluating the development and application of clinical deep learning algorithms. Based on these examples and the current state of deep learning in medicine, we discuss the future possibilities in this space and outline a roadmap for implementations of deep learning in cancer diagnosis.
Artificial Intelligence in Medicine, 2021.
Medical videos capture dynamic information of motion, velocity, and perturbation, which can assist in the diagnosis and understanding of disease. Common examples of medical videos include cardiac ultrasound to assess cardiac motion, endoscopies to screen for gastrointestinal cancers, natural videos to track human behaviors in population health, and microscopy to understand cellular interactions. Deep learning for medical video analysis is rapidly progressing and holds tremendous potential to extract actionable insights from these rich, complex data. Here we provide an overview of deep learning approaches to perform segmentation, object tracking, and motion analysis from medical videos. Using cardiac ultrasound and cellular microscopy as case studies, we highlight the unique challenges of working with videos compared to the more standard models used on still images. We further discuss available video datasets that may serve as good training sets and benchmarks. We conclude by discussing the future directions for this field with recommendations to practitioners.
Nature Biomedical Engineering, 2020.
Spatial transcriptomics allows for the measurement of RNA abundance at a high spatial resolution, making it possible to systematically link the morphology of cellular neighbourhoods and spatially localized gene expression. Here, we report the development of a deep learning algorithm for the prediction of local gene expression from haematoxylin-and-eosin-stained histopathology images using a new dataset of 30,612 spatially resolved gene expression data matched to histopathology images from 23 patients with breast cancer. We identified over 100 genes, including known breast cancer biomarkers of intratumoral heterogeneity and the co-localization of tumour growth and immune activation, the expression of which can be predicted from the histopathology images at a resolution of 100 µm. We also show that the algorithm generalizes well to The Cancer Genome Atlas and to other breast cancer gene expression datasets without the need for re-training. Predicting the spatially resolved transcriptome of a tissue directly from tissue images may enable image-based screening for molecular biomarkers with spatial variation.
Proceedings of the National Academy of Sciences of the United States of America, 2020.
Prior work finds a diversity paradox: Diversity breeds innovation, yet underrepresented groups that diversify organizations have less successful careers within them. Does the diversity paradox hold for scientists as well? We study this by utilizing a near-complete population of ∼1.2 million US doctoral recipients from 1977 to 2015 and following their careers into publishing and faculty positions. We use text analysis and machine learning to answer a series of questions: How do we detect scientific innovations? Are underrepresented groups more likely to generate scientific innovations? And are the innovations of underrepresented groups adopted and rewarded? Our analyses show that underrepresented groups produce higher rates of scientific novelty. However, their novel contributions are devalued and discounted: For example, novel contributions by gender and racial minorities are taken up by other scholars at lower rates than novel contributions by gender and racial majorities, and equally impactful contributions of gender and racial minorities are less likely to result in successful scientific careers than for majority groups. These results suggest there may be unwarranted reproduction of stratification in academic careers that discounts diversity’s role in innovation and partly explains the underrepresentation of some groups in academia.
Accurate assessment of cardiac function is crucial for diagnosing cardiovascular disease, screening for cardiotoxicity and deciding clinical management in patients with critical illness. However, human assessment of cardiac function focuses on a limited sampling of cardiac cycles and has significant interobserver variability despite years of training. To overcome this challenge, we present the first beat-to-beat deep learning algorithm that surpasses human expert performance in the critical tasks of segmenting the left ventricle, estimating ejection fraction, and assessing cardiomyopathy. Trained on echocardiogram videos, our model accurately segments the left ventricle with a Dice Similarity Coefficient of 0.92, predicts ejection fraction with mean absolute error of 4.1%, and reliably classifies heart failure with reduced ejection fraction (AUC of 0.97). Prospective evaluation with repeated human measurements confirms that our model has less variance than experts. By leveraging information across multiple cardiac cycles, our model can identify subtle changes in ejection fraction, is more reproducible than human evaluation, and lays the foundation for precise diagnosis of cardiovascular disease. As a new resource to promote further innovation, we also make publicly available one of the largest medical video datasets of over 10,000 annotated echocardiograms.
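The two quantities reported above have simple textbook definitions, sketched here for orientation. This is not the model or its evaluation pipeline: the masks are tiny hand-made grids, the volumes are made-up numbers, and the function names are illustrative assumptions.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice Similarity Coefficient between two binary masks (lists of rows).

    Dice = 2 * |A intersect B| / (|A| + |B|), where A and B are the sets of
    pixels marked as left ventricle in each mask.
    """
    a = {(r, c) for r, row in enumerate(mask_a) for c, v in enumerate(row) if v}
    b = {(r, c) for r, row in enumerate(mask_b) for c, v in enumerate(row) if v}
    if not a and not b:
        return 1.0  # two empty masks agree perfectly by convention
    return 2.0 * len(a & b) / (len(a) + len(b))


def ejection_fraction(end_diastolic_volume, end_systolic_volume):
    """Ejection fraction in percent: (EDV - ESV) / EDV * 100."""
    return 100.0 * (end_diastolic_volume - end_systolic_volume) / end_diastolic_volume


# Hypothetical 2x3 masks: the prediction marks one pixel too many.
predicted = [[0, 1, 1],
             [0, 1, 0]]
reference = [[0, 1, 1],
             [0, 0, 0]]
print(dice_coefficient(predicted, reference))  # 2*2 / (3+2)

# Hypothetical volumes in mL.
print(ejection_fraction(120.0, 50.0))
```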
npj Digital Medicine, 2020.
Echocardiography uses ultrasound technology to capture high temporal and spatial resolution images of the heart and surrounding structures, and is the most common imaging modality in cardiovascular medicine. Using convolutional neural networks on a large new dataset, we show that deep learning applied to echocardiography can identify local cardiac structures, estimate cardiac function, and predict systemic phenotypes that modify cardiovascular risk but are not readily identifiable to human interpretation. Our deep learning model, EchoNet, accurately identified the presence of pacemaker leads (AUC = 0.89), enlarged left atrium (AUC = 0.86), left ventricular hypertrophy (AUC = 0.75), left ventricular end systolic and diastolic volumes (R² = 0.74 and R² = 0.70), and ejection fraction (R² = 0.50), as well as predicted systemic phenotypes of age (R² = 0.46), sex (AUC = 0.88), weight (R² = 0.56), and height (R² = 0.33). Interpretation analysis validates that EchoNet shows appropriate attention to key cardiac structures when performing human-explainable tasks and highlights hypothesis-generating regions of interest when predicting systemic phenotypes difficult for human interpretation. Machine learning on echocardiography images can streamline repetitive tasks in the clinical workflow, provide preliminary interpretation in areas with insufficient qualified cardiologists, and predict phenotypes challenging for human evaluation.
Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, 2018.
Principal component analysis (PCA) is one of the most powerful tools for analyzing matrices in machine learning. In this paper, we study methods to accelerate power iteration in the stochastic setting by adding a momentum term. While in the deterministic setting, power iteration with momentum has optimal iteration complexity, we show that naively adding momentum to a stochastic method does not always result in acceleration. We perform a novel, tight variance analysis that reveals a "breaking-point variance" beyond which this acceleration does not occur. Combining this insight with modern variance reduction techniques yields a simple version of power iteration with momentum that achieves the optimal iteration complexities in both the online and offline settings. Our methods are embarrassingly parallel and can produce wall-clock-time speedups. Our approach is very general and applies to many non-convex optimization problems that can now be accelerated using the same technique.
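As a rough illustration of the deterministic case only, the sketch below adds a momentum term to plain power iteration on a small symmetric matrix. It assumes the heavy-ball style recurrence w_next = A w - beta * w_prev with a shared rescaling of both iterates; the matrix, the value of beta, and the function names are illustrative choices, and none of the stochastic or variance-reduced variants studied in the paper are shown.

```python
import math


def matvec(matrix, vector):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def power_iteration_momentum(matrix, beta=0.0, iters=200):
    """Estimate the top eigenvalue of a symmetric matrix.

    beta = 0 gives plain power iteration; beta > 0 adds the momentum term.
    Both iterates are divided by the same scale each step, which keeps the
    numbers bounded without changing the direction of the recurrence.
    """
    n = len(matrix)
    w_prev = [0.0] * n
    w = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        aw = matvec(matrix, w)
        w_next = [a - beta * p for a, p in zip(aw, w_prev)]
        scale = math.sqrt(sum(x * x for x in w_next))
        w_prev = [x / scale for x in w]
        w = [x / scale for x in w_next]
    # w has unit norm, so the Rayleigh quotient is simply w . (A w).
    eigenvalue = sum(x * y for x, y in zip(w, matvec(matrix, w)))
    return eigenvalue, w


A = [[4.0, 1.0],
     [1.0, 3.0]]  # eigenvalues are (7 +/- sqrt(5)) / 2
plain_value, _ = power_iteration_momentum(A, beta=0.0)
momentum_value, _ = power_iteration_momentum(A, beta=1.0)
print(plain_value, momentum_value)
```

In this deterministic setting a natural choice of beta depends on the second-largest eigenvalue, which is usually unknown in practice; the value used here was picked by hand for the toy matrix.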
Advances in Neural Information Processing Systems (NeurIPS), 2017.
Obtaining enough labeled data to robustly train complex discriminative models is a major bottleneck in the machine learning pipeline. A popular solution is combining multiple sources of weak supervision using generative models. The structure of these models affects the quality of the training labels, but is difficult to learn without any ground truth labels. We instead rely on weak supervision sources having some structure by virtue of being encoded programmatically. We present Coral, a paradigm that infers generative model structure by statically analyzing the code for these heuristics, thus significantly reducing the amount of data required to learn structure. We prove that Coral's sample complexity scales quasilinearly with the number of heuristics and number of relations identified, improving over the standard sample complexity, which is exponential in n for learning n-th degree relations. Empirically, Coral matches or outperforms traditional structure learning approaches by up to 3.81 F1 points. Using Coral to model dependencies instead of assuming independence results in better performance than a fully supervised model by 3.07 accuracy points when heuristics are used to label radiology data without ground truth labels.
Proceedings of the 34th International Conference on Machine Learning, 2017.
Curating labeled training data has become the primary bottleneck in machine learning. Recent frameworks address this bottleneck with generative models to synthesize labels at scale from weak supervision sources. The generative model’s dependency structure directly affects the quality of the estimated labels, but selecting a structure automatically without any labeled data is a distinct challenge. We propose a structure estimation method that maximizes the l1-regularized marginal pseudolikelihood of the observed data. Our analysis shows that the amount of unlabeled data required to identify the true structure scales sublinearly in the number of possible dependencies for a broad class of models. Simulations show that our method is 100x faster than a maximum likelihood approach and selects 1/4 as many extraneous dependencies. We also show that our method provides an average of 1.5 F1 points of improvement over existing, user-developed information extraction applications on real-world data such as PubMed journal abstracts.
Advances in Neural Information Processing Systems (NeurIPS), 2016.
Gibbs sampling is a Markov Chain Monte Carlo sampling technique that iteratively samples variables from their conditional distributions. There are two common scan orders for the variables: random scan and systematic scan. Due to the benefits of locality in hardware, systematic scan is commonly used, even though most statistical guarantees are only for random scan. While it has been conjectured that the mixing times of random scan and systematic scan do not differ by more than a logarithmic factor, we show by counterexample that this is not the case, and we prove that the mixing times do not differ by more than a polynomial factor under mild conditions. To prove these relative bounds, we introduce a method of augmenting the state space to study systematic scan using conductance.
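To make the two scan orders concrete, here is a minimal sketch of a Gibbs sampler on a short chain of ±1 spins that can run in either mode. The model, the coupling strength, and all names are illustrative assumptions; the example only shows how the update order differs and says nothing about the mixing-time results themselves.

```python
import math
import random


def gibbs_spin_chain(n_vars=3, theta=0.3, sweeps=20000, scan="systematic", seed=0):
    """Gibbs sampling on a chain of +/-1 spins with neighbour coupling theta.

    scan="systematic" updates variables 0, 1, ..., n_vars - 1 in a fixed order.
    scan="random" picks a uniformly random variable for each update.
    Returns the empirical mean of each spin, which is 0 under the exact
    distribution by symmetry.
    """
    rng = random.Random(seed)
    x = [1] * n_vars
    totals = [0] * n_vars
    for _ in range(sweeps):
        if scan == "systematic":
            order = list(range(n_vars))
        else:
            order = [rng.randrange(n_vars) for _ in range(n_vars)]
        for i in order:
            # Sum of neighbouring spins; chain ends have a single neighbour.
            field = 0
            if i > 0:
                field += x[i - 1]
            if i < n_vars - 1:
                field += x[i + 1]
            # Conditional probability that spin i is +1 given its neighbours.
            p_up = 1.0 / (1.0 + math.exp(-2.0 * theta * field))
            x[i] = 1 if rng.random() < p_up else -1
        for i in range(n_vars):
            totals[i] += x[i]
    return [t / sweeps for t in totals]


systematic_means = gibbs_spin_chain(scan="systematic")
random_means = gibbs_spin_chain(scan="random")
print(systematic_means, random_means)
```

On a model this small both orders mix quickly, so the empirical means from either scan should be close to zero.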
Journal of Neural Engineering, 2016.
Objective, Approach. A growing number of prototypes for diagnosing and treating neurological and psychiatric diseases are predicated on access to high-quality brain signals, which typically requires surgically opening the skull. Where endovascular navigation previously transformed the treatment of cerebral vascular malformations, we now show that it can provide access to brain signals with substantially higher signal quality than scalp recordings. Main results. While endovascular signals were known to be larger in amplitude than scalp signals, our analysis in rabbits borrows a standard technique from communication theory to show endovascular signals also have up to 100× better signal-to-noise ratio. Significance. With a viable minimally-invasive path to high-quality brain signals, patients with brain diseases could one day receive potent electroceuticals through the bloodstream, in the course of a brief outpatient procedure.
Advances in Neural Information Processing Systems (NeurIPS), 2015.
Interactive submodular set cover is an interactive variant of submodular set cover over a hypothesis class of submodular functions, where the goal is to satisfy all sufficiently plausible submodular functions to a target threshold using as few (cost-weighted) actions as possible. It models settings where there is uncertainty regarding which submodular function to optimize. In this paper, we propose a new extension, which we call smooth interactive submodular set cover, that allows the target threshold to vary depending on the plausibility of each hypothesis. We present the first algorithm for this more general setting with theoretical guarantees on optimality. We further show how to extend our approach to deal with real-valued functions, which yields new theoretical results for real-valued submodular set cover for both the interactive and non-interactive settings.
Journal of Neurophysiology, 2015.
Efficient spike acquisition techniques are needed to bridge the divide from creating large multielectrode arrays (MEA) to achieving whole-cortex electrophysiology. In this paper, we introduce generalized analog thresholding (gAT), which achieves millisecond temporal resolution with sampling rates as low as 10 Hz. Consider the torrent of data from a single 1,000-channel MEA, which would generate more than 3 GB/min using standard 30-kHz Nyquist sampling. Recent neural signal processing methods based on compressive sensing still require Nyquist sampling as a first step and use iterative methods to reconstruct spikes. Analog thresholding (AT) remains the best existing alternative, where spike waveforms are passed through an analog comparator and sampled at 1 kHz, with instant spike reconstruction. By generalizing AT, the new method reduces sampling rates another order of magnitude, detects more than one spike per interval, and reconstructs spike width. Unlike compressive sensing, the new method reveals a simple closed-form solution to achieve instant (noniterative) spike reconstruction. The base method is already robust to hardware nonidealities, including realistic quantization error and integration noise. Because it achieves these considerable specifications using hardware-friendly components like integrators and comparators, generalized AT could translate large-scale MEAs into implantable devices for scientific investigation and medical technology.
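The data-rate figure quoted above can be reproduced with a few lines of arithmetic. The sketch assumes 16-bit (2-byte) samples, which the abstract does not state, and counts decimal gigabytes; the function and variable names are illustrative.

```python
def samples_per_minute(channels, sample_rate_hz):
    """Total number of samples produced per minute across all channels."""
    return channels * sample_rate_hz * 60


BYTES_PER_SAMPLE = 2  # assumption: 16-bit samples

nyquist_samples = samples_per_minute(1000, 30_000)  # standard 30-kHz sampling
at_samples = samples_per_minute(1000, 1_000)        # analog thresholding at 1 kHz
gat_samples = samples_per_minute(1000, 10)          # generalized AT at 10 Hz

nyquist_gb_per_min = nyquist_samples * BYTES_PER_SAMPLE / 1e9
print(nyquist_gb_per_min)             # GB per minute under the 16-bit assumption
print(nyquist_samples // gat_samples) # reduction factor in sample count
```

Under these assumptions the 30-kHz case comes to 3.6 GB/min, consistent with "more than 3 GB/min", and the 10 Hz rate is 100 times lower than the 1 kHz AT rate in sample count.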
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015.
Conventional Nyquist sampling and reconstruction of square waves at a finite rate will always result in aliasing because square waves are not band limited. Based on methods for signals with finite rate of innovation (FRI), generalized Analog Thresholding (gAT-n) is able to sample square waves at a much lower rate under ideal conditions. The target application is efficient, real-time, implantable neurotechnology that extracts spiking neural signals from the brain. This paper studies the effect of integrator noise and quantization error on the accuracy of reconstructed square waves. We explore realistic values for integrator noise and input signal amplitude, using specifications from the Texas Instruments IVC102 integrator chip as a first-pass example because of its readily-available data sheet. ADC resolution is varied from 1 to 16 bits. This analysis indicates that gAT-1 is robust against these hardware non-idealities where gAT-2 degrades less gracefully, which makes gAT-1 a prime target for hardware implementation in a custom integrated circuit.
Theoretical Computer Science, 2014.
Mosaic floorplans are rectangular structures subdivided into smaller rectangular sections and are widely used in VLSI circuit design. Baxter permutations are a set of permutations that have been shown to have a one-to-one correspondence to objects in the Baxter combinatorial family, which includes mosaic floorplans. An important problem in this area is to find short binary string representations of the set of n-block mosaic floorplans and Baxter permutations of length n. The best known representation is the Quarter-State Sequence, which uses 4n bits. This paper introduces a simple binary representation of n-block mosaic floorplans using 3n−3 bits. It has been shown that any binary representation of n-block mosaic floorplans must use at least (3n−o(n)) bits. Therefore, the representation presented in this paper is optimal (up to an additive lower order term).
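The relationship between the 3n−3 bits used here and a counting lower bound can be checked numerically for small n. The sketch below is not the encoding from the paper; it only counts objects. It relies on the standard closed-form sum for the Baxter numbers, which by the one-to-one correspondence above also count n-block mosaic floorplans; the function names are illustrative.

```python
from math import comb


def baxter_number(n):
    """Number of Baxter permutations of length n (closed-form sum)."""
    total = sum(
        comb(n + 1, k - 1) * comb(n + 1, k) * comb(n + 1, k + 1)
        for k in range(1, n + 1)
    )
    return total // (comb(n + 1, 1) * comb(n + 1, 2))


def min_bits(count):
    """Fewest bits that can give each of `count` objects a distinct code."""
    return (count - 1).bit_length()


for n in range(1, 11):
    b = baxter_number(n)
    # Columns: n, Baxter number, counting bound, 3n-3 scheme, 4n scheme.
    print(n, b, min_bits(b), 3 * n - 3, 4 * n)
```

The third column is the information-theoretic minimum for a fixed-length code, so the 3n−3 column can never fall below it, while the 4n column shows the length of the Quarter-State Sequence for comparison.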
Information Theory and Applications Workshop (ITA), 2014.
The prototypical brain-computer interface (BCI) algorithm translates brain activity into changes in the states of a computer program, for typing or cursor movement. Most approaches use neural decoding which learns how the user has encoded their intent in their noisy neural signals. Recent adaptive decoders for cursor movement improved BCI performance by modeling the user as a feedback controller; when this model accounts for adaptive control, the neural decoder is termed co-adaptive. This recent collection of control-inspired neural decoding strategies disregards a major antecedent conceptual realization, whereby the user could be induced to adopt an encoding strategy (control policy) such that the encoder-decoder pair (or equivalently, controller-plant pair) is optimal under a desired cost function. We call this alternate conceptual approach neural shaping, in contradistinction to neural decoding. Previous work illuminated the general form of optimal controller-plant pair under a cost representing information gain. For BCI applications requiring the user to issue discrete-valued commands, the information-gain-optimal pair, based on the posterior matching scheme, can be user-friendly. In this paper, we discuss the application of neural shaping to cursor control with continuous-valued states based on continuous-valued user commands. We examine the problem of jointly optimizing controller and plant under quadratic expected cost and restricted linear plant dynamics. This simplification reduces joint controller-plant selection to a static optimization problem, similar to approaches in structural engineering and other areas. This perspective suggests that recent BCI approaches that alternate between adaptive neural decoders and static neural decoders could be local Pareto-optimal, representing a suboptimal iterative-type solution to the optimal joint controller-plant problem.
Neural Computation, 2013.
The closed-loop operation of brain-machine interfaces (BMI) provides a context to discover foundational principles behind human-computer interaction, with emerging clinical applications to stroke, neuromuscular diseases, and trauma. In the canonical BMI, a user controls a prosthetic limb through neural signals that are recorded by electrodes and processed by a decoder into limb movements. In laboratory demonstrations with able-bodied test subjects, parameters of the decoder are commonly tuned using training data that include neural signals and corresponding overt arm movements. In the application of BMI to paralysis or amputation, arm movements are not feasible, and imagined movements create weaker, partially unrelated patterns of neural activity. BMI training must begin naive, without access to these prototypical methods for parameter initialization used in most laboratory BMI demonstrations.
Naive adaptive BMI refers to a class of methods recently introduced to address this problem. We first identify the basic elements of existing approaches based on adaptive filtering and define a decoder, ReFIT-PPF, to represent these existing approaches. We then present Joint RSE, a novel approach that logically extends prior approaches. Using recently developed human- and synthetic-subjects closed-loop BMI simulation platforms, we show that Joint RSE significantly outperforms ReFIT-PPF and nonadaptive (static) decoders. Control experiments demonstrate the critical role of jointly estimating neural parameters and user intent. In addition, we show that nonzero sensorimotor delay in the user significantly degrades ReFIT-PPF but not Joint RSE, owing to differences in the prior on intended velocity. Paradoxically, substantial differences in the nature of sensory feedback between these methods do not contribute to differences in performance between Joint RSE and ReFIT-PPF. Instead, BMI performance improvement is driven by machine learning, which outpaces rates of human learning in the human-subjects simulation platform. In this regime, nuances of error-related feedback to the human user are less relevant to rapid BMI mastery.
Frontiers in Algorithmics and Algorithmic Aspects in Information and , May 2012.
A floorplan is a rectangle subdivided into smaller rectangular blocks by horizontal and vertical line segments. Two floorplans are considered equivalent if and only if there is a bijection between the blocks in the two floorplans such that the corresponding blocks have the same horizontal and vertical boundaries. Mosaic floorplans use the same objects as floorplans but use an alternative definition of equivalence. Two mosaic floorplans are considered equivalent if and only if they can be converted into equivalent floorplans by sliding the line segments that divide the blocks. The Quarter-State Sequence method of representing mosaic floorplans uses 4n bits, where n is the number of blocks. This paper introduces a method of representing an n-block mosaic floorplan with a (3n − 3)-bit binary string. It has been proven that the shortest possible binary string representation of a mosaic floorplan has a length of (3n − o(n)) bits. Therefore, the representation presented in this paper is asymptotically optimal. Baxter permutations are a set of permutations defined by prohibited subsequences. There exists a bijection between mosaic floorplans and Baxter permutations. As a result, the methods introduced in this paper also create an optimal binary string representation of Baxter permutations.