A Deterministic Information Bottleneck Method for Clustering Mixed-Type Data

Efthymios Costa; Ioanna Papatsouma; Angelos Markos

Conferences CIMPA, 18th International Federation of Classification Societies

Last modified: 2024-05-14

Abstract

In this paper, we present an information-theoretic method for clustering mixed-type data, that is, data consisting of both continuous and categorical variables. The method is a variant of the Deterministic Information Bottleneck algorithm, which optimally compresses the data while retaining relevant information about the underlying structure. We compare the performance of the proposed method with that of three well-established clustering methods (KAMILA, K-Prototypes, and Partitioning Around Medoids with Gower’s dissimilarity) on simulated and real-world datasets. The results demonstrate that the proposed approach is a competitive alternative to conventional clustering techniques, particularly in scenarios with unbalanced clusters and significant overlap between clusters.

Keywords

Deterministic Information Bottleneck, Clustering, Mixed-type Data, Mutual Information