Conferences CIMPA, 18th International Federation of Classification Societies

Clustering for High-Dimensional, Nested Data with Categorical Outcomes Using a Generalized Linear Mixed Effects Model with Simultaneous Variable Selection
Samantha Manning

Last modified: 2024-05-14

Abstract


I propose a regularized, model-based clustering method for high-dimensional longitudinal data with categorical outcomes. The method was motivated in part by a study of 177 Thai mother-child dyads that aimed to identify risk factors for early childhood caries (ECC), and in part by a dental visit study of 308 pregnant women that aimed to ascertain determinants of successful dental appointment attendance. There is no available method capable of clustering longitudinal categorical outcomes while also selecting relevant variables. Within each cluster, a generalized linear mixed-effects model is fit with a convex penalty function imposed on the fixed-effect parameters. Within the expectation-maximization algorithm, model coefficients are estimated using the Laplace approximation inside a coordinate descent algorithm, and the estimated values are then used to cluster subjects via k-means clustering for longitudinal data. The Bayesian information criterion can be used to select the number of clusters and the tuning parameters through a grid search. My simulation studies demonstrate that the method performs satisfactorily, accommodates high-dimensional, multi-level effects, and identifies longitudinal patterns in categorical outcomes.
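
As a rough illustration of the quantities described above, the within-cluster objective and the selection criterion might be written as follows. This is only a sketch under assumed notation, none of which comes from the abstract: C_k denotes the subjects currently assigned to cluster k, beta_k the fixed effects of that cluster, b_i the random effects of subject i (assumed here to be normal with covariance Sigma_k), and P_lambda a convex penalty, with the lasso shown as one possible choice rather than the penalty actually used.

```latex
% Hypothetical notation (not from the abstract):
%   C_k = subjects in cluster k, \beta_k = fixed effects of cluster k,
%   b_i = random effects of subject i, assumed N(0, \Sigma_k),
%   P_\lambda = convex penalty with tuning parameter \lambda.
\begin{align*}
Q_k(\beta_k, \Sigma_k)
  &= -\sum_{i \in C_k} \log \int f(y_i \mid b_i; \beta_k)\,
     \phi(b_i; 0, \Sigma_k)\, db_i \;+\; P_\lambda(\beta_k), \\
P_\lambda(\beta_k)
  &= \lambda \sum_{j=1}^{p} \lvert \beta_{kj} \rvert
  \quad \text{(lasso, one possible convex choice)}, \\
\mathrm{BIC}(K, \lambda)
  &= -2\, \hat{\ell}(K, \lambda) + d(K, \lambda) \log n .
\end{align*}
```

For categorical outcomes the integral over b_i generally has no closed form, which is the step where a Laplace approximation would enter. In the last line, the fitted log-likelihood, the number of free parameters d, and the sample size n are again assumed symbols; under this sign convention a grid search would pick the pair (K, lambda) with the smallest criterion value.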

Keywords


Model-based clustering, Modelling high-dimensional and complex data, Generalized linear models, Mixed-effects models