Conferences CIMPA, 18th International Federation of Classification Societies

Robust distance-based generalized linear models: A new tool for classification
Eva Boj, Aurea Grané, Agustín Mayo-Íscar

Last modified: 2024-05-14

Abstract


Understanding the nature of the data and dealing with outliers and redundant information are key issues when designing a proper metric for clustering and classification. Distance-based generalized linear models are prediction tools that can be applied to any kind of data, provided a distance measure can be computed between units. In this work, robust ad hoc metrics are proposed for use in the predictor space of these models, making the tool more flexible. Their performance is evaluated in a simulation study and compared with that of Gower’s and generalized Gower’s metrics on several datasets of heterogeneous multivariate data containing anomalous observations. The misclassification rate is used to measure how well responses are predicted. Additionally, ensemble methods for these models are explored in the context of big data. Applications to real data illustrate the predictive power of these models. Computations are carried out with the dbstats package for R.