Conferences CIMPA, 18th International Federation of Classification Societies

Classifying multivariate observations in data sets with asymmetric features and outlying observations
Brian Franczak

Last modified: 2024-04-25

Abstract


Classification can be defined as the process of sorting similar objects into groups. It can be performed in unsupervised, semi-supervised, or fully supervised settings. In the unsupervised setting, also known as clustering, no prior information about group membership is used, while the other two settings use some prior knowledge. Model-based clustering is the use of a finite mixture model for unsupervised classification. This talk will discuss an approach for performing model-based clustering and outlier detection for incomplete multivariate data sets. An expectation-maximization (EM) based parameter estimation scheme is discussed and applied to the considered mixtures of contaminated shifted asymmetric Laplace distributions. This EM-based scheme iteratively performs single imputation while computing maximum likelihood estimates of the parameters of the model of interest. At convergence, we use traditional likelihood-based criteria, such as the Bayesian information criterion, for model selection. We assess classification performance using the adjusted Rand index and give other relevant statistics demonstrating the overall performance of the parameter estimation scheme. We demonstrate the effectiveness of the proposed model using simulated and real data sets.
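
As a rough illustration of the two evaluation tools named above, the following Python sketch computes the adjusted Rand index from two label vectors and the Bayesian information criterion from a maximized log-likelihood. It is not the talk's implementation: the function names `adjusted_rand_index` and `bic` are hypothetical, and the BIC sign convention used here (smaller is better) is an assumption, since some clustering software reports the negated form.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two partitions of the same observations."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label vectors must have the same length")
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        # Fewer than two observations: the partitions agree trivially.
        return 1.0

    # Contingency table counts and its row and column margins.
    cell_counts = Counter(zip(labels_true, labels_pred))
    row_counts = Counter(labels_true)
    col_counts = Counter(labels_pred)

    index = sum(comb(c, 2) for c in cell_counts.values())
    sum_rows = sum(comb(c, 2) for c in row_counts.values())
    sum_cols = sum(comb(c, 2) for c in col_counts.values())

    expected = sum_rows * sum_cols / total_pairs  # chance agreement
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate case, e.g. both partitions place everything in one group.
        return 1.0
    return (index - expected) / (max_index - expected)


def bic(log_likelihood, n_free_params, n_obs):
    """BIC in the 'smaller is better' convention (an assumption of this sketch)."""
    return -2.0 * log_likelihood + n_free_params * log(n_obs)
```

An adjusted Rand index of 1 indicates perfect agreement between the estimated and reference partitions, and values near 0 indicate agreement no better than chance. Under the convention assumed here, the fitted model with the smallest `bic` value among the candidates would be selected.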

Keywords


Model-based clustering, outlier detection, imputation, finite mixture models, expectation-maximization algorithm