Conferences CIMPA, 18th International Federation of Classification Societies

Predicting soil bacterial and fungal communities at different taxonomic levels using machine learning
Vladimir Makarenkov, Zahia Aouabed, Mohamed Achraf Bouaoune, Mohamed Hijri

Last modified: 2024-06-19

Abstract


It is widely known that predictions about macrobiological communities depend on the taxonomic scale. Whether the same holds for soil microbial communities remains uncertain. This study applies several traditional machine learning techniques to predict soil bacterial and fungal communities at different taxonomic levels, using an extensive soil microbiome dataset collected by diverse research groups. For bacteria, prediction accuracy was significantly higher at the Phylum, Class, and Order levels than at the Family and Genus levels. Prediction scores for fungi were generally lower than those for bacteria, with the best results obtained at the Phylum and Class levels. Overall, our findings suggest a consistent trend across taxonomic scales that bridges macrobiological and soil microbial communities. For bacterial data, the predictions obtained with the Random Forest and Gradient Boosting methods were generally better than those reported by Averill and co-authors, who used the Dirichlet multivariate regression model in their study recently published in Nature Ecology and Evolution. For fungal data, we recommend using Random Forest to predict soil communities.

Keywords


biological data prediction, linear regression, decision trees, random forest, gradient boosting