Many domains are naturally organized in an abstraction hierarchy or
taxonomy, where the instances in "nearby" classes in the taxonomy
are similar.  In this paper, we provide a general probabilistic
framework for clustering data into a set of classes organized as a
taxonomy, where each class is associated with a probabilistic model
from which the data was generated.  The clustering algorithm
simultaneously optimizes three things: the assignment of data
instances to clusters, the models associated with the clusters, and
the structure of the abstraction hierarchy.  A unique feature of our
approach is that it utilizes global optimization algorithms for both
of the last two steps, reducing the sensitivity to noise and the
propensity to local maxima that are characteristic of algorithms that
only take local steps.  We provide a theoretical analysis for our
algorithm, showing that it converges to a local maximum of the
probability of model and data.  We present experimental results on
synthetic data, and on real data in the domains of gene expression and
text.
<p>