Rich Probabilistic Models for Gene Expression
E. Segal, B. Taskar, A. Gasch, N. Friedman, and D. Koller
In 9th International Conference on Intelligent Systems for Molecular Biology (ISMB), 2001.
Clustering is commonly used for analyzing gene expression data.
Despite their successes, clustering methods suffer from a number of
limitations. First, these methods reveal similarities that exist over
all of the measurements, while obscuring relationships that exist over
only a subset of the data. Second, clustering methods cannot readily
incorporate additional types of information, such as clinical data or
known attributes of genes. To circumvent these shortcomings, we
propose the use of a single coherent probabilistic model that
encompasses much of the rich structure in the genomic expression data,
while incorporating additional information such as experiment type,
putative binding sites, or functional information. We show how this
model can be learned from the data, allowing us to discover patterns
in the data and dependencies between the gene expression patterns and
additional attributes. The learned model reveals
context-specific relationships that exist only over a subset
of the experiments in the dataset. We demonstrate the power of our
approach on synthetic data and on two real-world gene expression data
sets for yeast. For example, we demonstrate a novel capability
that falls naturally out of our framework: predicting the "cluster"
of the array resulting from a gene mutation based only on the gene's
expression pattern in the context of other mutations.