Conferences CIMPA, 18th International Federation of Classification Societies

Clustering of human gut microbiome data using the finite mixture of generalized Dirichlet-multinomial models
Xiaoke Qin

Last modified: 2024-05-15

Abstract


The composition of the human gut microbiome has been reported to be associated with health conditions and the pathogenesis of diseases. Consequently, the clustering of microbiome data has attracted increasing interest, with the aim of identifying subgroups within a population, or enterotypes, whose members share similar microbiome compositions. The finite mixture of Dirichlet-multinomial distributions has been widely used for cluster analysis, but it is limited by the restrictive covariance pattern and the neutrality property of the Dirichlet distribution. In this paper, we propose a finite mixture of generalized Dirichlet-multinomial (GDM) models, which allows a more flexible covariance matrix and relaxes the neutrality requirement on the data. Furthermore, we discuss the lack of permutation invariance of the GDM. Examples are presented to show the need to account for the ordering of the variables in a dataset during model selection. Based on these features, a generalized expectation-maximization algorithm is developed to fit the model, and a stepwise procedure for permuting the columns is suggested. A series of simulations is conducted to demonstrate the performance of the proposed approach. We apply the model to two human microbiome datasets to capture the latent components and reveal the correlation structures of the data. The potential association between the order of microbial taxa and the keystone species in the human gut is also discussed. Our model provides a novel perspective on the clustering of compositional data and can extract new information from data that are not permutation invariant.
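
To make the order dependence concrete, here is a minimal sketch of a GDM log-probability in Python. It assumes the common stick-breaking (Connor-Mosimann) parameterization, in which the GDM factorizes into beta-binomial terms. The abstract does not state the parameterization used in the paper, so the function names and the parameters `a` and `b` are illustrative assumptions, not the authors' implementation.

```python
from math import lgamma


def log_beta(p, q):
    """Logarithm of the beta function B(p, q)."""
    return lgamma(p) + lgamma(q) - lgamma(p + q)


def gdm_logpmf(x, a, b):
    """Log-pmf of a generalized Dirichlet-multinomial (assumed parameterization).

    x    : counts for D ordered categories
    a, b : positive stick-breaking parameters, each of length D - 1
    """
    n = sum(x)
    # Multinomial coefficient n! / prod(x_j!).
    out = lgamma(n + 1) - sum(lgamma(xj + 1) for xj in x)
    tail = n
    # zip stops after D - 1 terms because a and b have length D - 1.
    for xj, aj, bj in zip(x, a, b):
        tail -= xj  # total count in the categories that come after j
        out += log_beta(aj + xj, bj + tail) - log_beta(aj, bj)
    return out


def dm_logpmf(x, alpha):
    """Log-pmf of the ordinary Dirichlet-multinomial, for comparison."""
    n, total = sum(x), sum(alpha)
    out = lgamma(n + 1) - sum(lgamma(xj + 1) for xj in x)
    out += lgamma(total) - lgamma(total + n)
    out += sum(lgamma(aj + xj) - lgamma(aj) for xj, aj in zip(x, alpha))
    return out


# Hypothetical parameters and counts, chosen only for illustration.
a, b = [2.0, 0.5], [1.0, 3.0]
x = [5, 1, 2]
print(gdm_logpmf(x, a, b))        # counts in the original column order
print(gdm_logpmf(x[::-1], a, b))  # same counts with the columns reversed
```

Because each factor depends on the tail sum of the counts after category j, reordering the columns changes the likelihood, which is why the column order enters model selection. Under this parameterization, setting each b_j to the sum of the Dirichlet parameters of the later categories recovers the ordinary Dirichlet-multinomial.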

Keywords


Model-based clustering, Compositional data, Mixture model, Microbiome data