Clustering

When analyzing biomolecular data, researchers are often confronted with questions such as "How many groups are in my data?" or "How robust is the identified grouping?"
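One common, general way to make the second question concrete is resampling-based stability: cluster two random subsamples of the data and check how consistently pairs of shared points end up together. Comparing this score across several candidate values of k also gives a rough, heuristic handle on the first question. The pure-Python sketch below assumes k-means with k-means++ seeding as the clustering algorithm and the Rand index as the agreement measure. All function names and parameters (`kmeans`, `rand_index`, `stability`, `frac`, `n_pairs`, `n_init`) are hypothetical illustrations, not the group's own software or method.

```python
import random
from itertools import combinations


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, rng, n_init=5, n_iter=50):
    """Lloyd's k-means with k-means++ seeding; returns the labels of the lowest-SSE restart.
    Assumes the points are not all identical (otherwise the seeding weights are all zero)."""
    best_labels, best_sse = None, float("inf")
    for _ in range(n_init):
        # k-means++ seeding: each further center is drawn with probability proportional
        # to its squared distance from the nearest center chosen so far.
        centers = [rng.choice(points)]
        while len(centers) < k:
            d2 = [min(sq_dist(p, c) for c in centers) for p in points]
            centers.append(rng.choices(points, weights=d2)[0])
        for _ in range(n_iter):
            # Assignment step: index of the nearest center for every point.
            labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
            # Update step: move centers to cluster means (an empty cluster keeps its old center).
            new_centers = []
            for j in range(k):
                members = [p for p, lab in zip(points, labels) if lab == j]
                new_centers.append(
                    tuple(sum(c) / len(members) for c in zip(*members)) if members else centers[j]
                )
            if new_centers == centers:
                break
            centers = new_centers
        # Within-cluster sum of squares of this restart; keep the best one.
        sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
        if sse < best_sse:
            best_labels, best_sse = labels, sse
    return best_labels


def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree (same cluster vs. different cluster)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def stability(points, k, n_pairs=10, frac=0.8, seed=0):
    """Mean Rand index between clusterings of two random subsamples, on the points they share."""
    rng = random.Random(seed)
    n = len(points)
    m = int(frac * n)
    scores = []
    for _ in range(n_pairs):
        idx1 = rng.sample(range(n), m)
        idx2 = rng.sample(range(n), m)
        lab1 = dict(zip(idx1, kmeans([points[i] for i in idx1], k, rng)))
        lab2 = dict(zip(idx2, kmeans([points[i] for i in idx2], k, rng)))
        shared = sorted(set(idx1) & set(idx2))
        scores.append(rand_index([lab1[i] for i in shared], [lab2[i] for i in shared]))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Synthetic toy data: three well-separated 2-D Gaussian blobs of 20 points each.
    data_rng = random.Random(1)
    centres = [(0, 0), (10, 0), (5, 8.66)]
    points = [(cx + data_rng.gauss(0, 0.5), cy + data_rng.gauss(0, 0.5))
              for cx, cy in centres for _ in range(20)]
    for k in range(2, 6):
        print(k, round(stability(points, k), 3))
```

Because the Rand index only compares co-membership of pairs, it ignores arbitrary label permutations between runs. High stability is a necessary but not sufficient sign of a meaningful grouping: a coarse, consistently merged partition can also be stable, so in practice such a score is usually read alongside other validation indices.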

Figure: Typical cluster analysis workflow


Cluster analysis provides the mathematical and algorithmic foundations for identifying groups of similar objects. However, performance quickly becomes a bottleneck for typical data sets comprising thousands of samples and up to millions of features. We study how clustering approaches can be adapted to high-dimensional biomolecular data, including parallel clustering algorithms. Furthermore, we develop new methods for evaluating the stability of clusterings, such as combining multiple cluster validation indices.
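Different validation indices often disagree about which clustering is best. One general way to combine them without collapsing everything into a single weighted score is Pareto selection: keep every candidate clustering that no other candidate beats on all indices at once. The sketch below is a minimal illustration of that generic idea, not a reproduction of the group's published selection procedure; the function names and the assumption that every index is oriented so that larger is better are hypothetical choices.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if score vector a is at least as good as b on every index and strictly better on one.
    Assumes every index is oriented so that larger values are better
    (e.g. negate indices such as Davies-Bouldin, where smaller is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: Dict[str, Sequence[float]]) -> List[str]:
    """Names of candidate clusterings that are not dominated by any other candidate."""
    return [
        name
        for name, s in scores.items()
        if not any(dominates(other, s) for o, other in scores.items() if o != name)
    ]


if __name__ == "__main__":
    # Made-up scores purely for illustration: (silhouette, stability, -Davies-Bouldin).
    scores = {
        "k=2": (0.61, 0.92, -0.70),
        "k=3": (0.58, 0.97, -0.65),
        "k=4": (0.45, 0.88, -0.90),
        "k=5": (0.40, 0.90, -0.95),
    }
    print(pareto_front(scores))  # k=2 and k=3 trade off; k=4 and k=5 are dominated by k=3
```

The result is a small set of alternatives rather than a single winner, which leaves the final choice to the analyst when the indices genuinely conflict.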


Selected publications


J. Kraus, L. Lausser, and H. A. Kestler. Exhaustive k-nearest-neighbour subspace clustering. Journal of Statistical Computation and Simulation, 85(1):30–46, 2015.

J. M. Kraus, C. Müssel, G. Palm, and H. A. Kestler. Multi-objective selection for collecting cluster alternatives. Computational Statistics, 26(2):341–353, 2011.

J. M. Kraus and H. A. Kestler. A highly efficient multi-core algorithm for clustering extremely large datasets. BMC Bioinformatics, 11(1):169, 2010.

H. A. Kestler, J. Kraus, G. Palm, and F. Schwenker. On the effects of constraints in semi-supervised hierarchical clustering. In F. Schwenker and S. Marinai, editors, Artificial Neural Networks in Pattern Recognition (ANNPR 06), volume LNAI 4087, pages 57–66. Springer-Verlag, Heidelberg, 2006.

T. Mattfeldt, H. Wolter, R. Kemmerling, H.-W. Gottfried, and H. A. Kestler. Cluster analysis of comparative genomic hybridization (CGH) data using self-organizing maps: Application to prostate carcinomas. Analytical Cellular Pathology, 23(1):29–37, 2001.