Biological data, such as gene expression profiles or protein sequences, is often organized in a hierarchy of classes, where the instances assigned to nearby classes in the tree are similar. Most approaches for constructing a hierarchy use simple local operations, that are very sensitive to noise or variation in the data. In this paper, we describe probabilistic abstraction hierarchies (PAH), a general probabilistic framework for clustering data into a hierarchy, and show how it can be applied to a wide variety of biological data sets. In a PAH, each class is associated with a probabilistic generative model for the data in the class. The PAH clustering algorithm simultaneously optimizes three things: the assignment of data instances to clusters, the models associated with the clusters, and the structure of the abstraction hierarchy. A unique feature of the PAH approach is that it utilizes global optimization algorithms for the last two steps, substantially reducing the sensitivity to noise and the propensity to local maxima. We show how to apply this framework to gene expression data, protein sequence data, and HIV protease sequence data. We also show how our framework supports hierarchies involving more than one type of data. We demonstrate that our method extracts useful biological knowledge and is substantially more robust than hierarchical agglomerative clustering.