We propose a probabilistic model for cellular processes and an algorithm
for discovering them from gene expression data. A process is associated with
a set of genes that participate in it; unlike clustering techniques, our
model allows genes to participate in multiple processes. Each process may be
active to a different degree in each experiment. The expression measurement
for gene g in array a is modeled as a sum, over all processes in which
g participates, of the activity levels of those processes in array
a. We describe an iterative procedure, based on the expectation-maximization (EM) algorithm, for
decomposing the expression matrix into a given number of processes. We
present results on yeast gene expression data, which indicate that our
approach identifies real biological processes.
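
The additive model described above can be illustrated with a minimal sketch. It rests on assumptions not stated in the text: gene-process membership is treated as binary, no noise term is included, and the names `membership`, `activity`, and `predicted_expression` as well as the toy numbers are hypothetical. It shows only how an expression value arises as a sum of process activities, not the EM-based decomposition procedure.

```python
# Illustrative sketch only: names and numbers are hypothetical, not from the paper.

def predicted_expression(membership, activity):
    """Return the noise-free expression matrix implied by the additive model.

    membership[g][p] is 1 if gene g participates in process p, else 0.
    activity[p][a] is the activity level of process p in array a.
    """
    n_arrays = len(activity[0])
    expression = []
    for gene_row in membership:
        row = []
        for a in range(n_arrays):
            # Sum activity levels over the processes this gene participates in.
            total = sum(activity[p][a] for p, m in enumerate(gene_row) if m)
            row.append(total)
        expression.append(row)
    return expression


# Toy example: 3 genes, 2 processes, 2 arrays.
membership = [
    [1, 0],  # gene 0 participates in process 0 only
    [0, 1],  # gene 1 participates in process 1 only
    [1, 1],  # gene 2 participates in both processes
]
activity = [
    [2.0, -1.0],  # process 0 activity in arrays 0 and 1
    [0.5, 3.0],   # process 1 activity in arrays 0 and 1
]
expression = predicted_expression(membership, activity)
```

In this toy example, gene 2 participates in both processes, so its value in each array is the sum of the two activity levels. A hard clustering, which assigns each gene to exactly one group, could not express this overlap.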