A Robust approach of the Clusterwise Regression method for distributional data
Last modified: 2024-05-14
Abstract
This work presents a robust version of the clusterwise regression algorithm for distributional data (CRM-D). CRM-D is based on a new regression method for distributional data (Bock and Diday, 1999), which maps density functions into a Hilbert space via a logarithmic transformation of the derivatives of the quantile functions (LDQ) (Petersen and Müller, 2016). We consider an extension of the LDQ functions for distributional data that uses a functional representation of the data. The values of the explanatory distributional variable, expressed as LDQ functions, are represented as functional data by smoothing B-splines with knots at the quantiles of the distributions. CRM-D predicts the response variable within each of the K classes into which the set of objects is partitioned. In accordance with the clusterwise criterion, the set of objects is partitioned according to the best fit of the local regression models. The main contribution of the present proposal is to reduce the instability of the results caused by the greater variability of the lowest and highest quantiles of the distributions. By suitably trimming the distributional data, more stable predictions of the response variable are obtained in the K clusters. The performance of the new approach is evaluated on real data through the improvement in the fit of the partitioned data to their respective cluster regression models.
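As an illustration only, the two ingredients described above (a trimmed log-transformed quantile derivative and a clusterwise regression step) might be sketched as follows. This is not the authors' CRM-D implementation: the function names `trimmed_lqd` and `clusterwise_regression`, the trimming level, and the grid size are hypothetical. The sketch also swaps in a coarse finite-difference grid for the smoothing B-spline representation and ordinary least squares for the functional regression model.

```python
import numpy as np


def trimmed_lqd(sample, n_grid=20, trim=0.05):
    """Log of the quantile-function derivative on a trimmed probability grid.

    Assumption: the quantile function Q(t) is estimated empirically on
    t in [trim, 1 - trim], so the most variable extreme quantiles are dropped.
    """
    t = np.linspace(trim, 1.0 - trim, n_grid)
    Q = np.quantile(sample, t)
    q = np.gradient(Q, t)                   # finite-difference estimate of Q'(t)
    return np.log(np.maximum(q, 1e-12))     # guard against ties where Q'(t) = 0


def clusterwise_regression(X, y, K, n_iter=50, seed=0):
    """Alternate between fitting K linear models and reassigning objects.

    Each object is assigned to the cluster whose local model gives the
    smallest squared residual (the clusterwise criterion).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Xd = np.column_stack([np.ones(n), X])   # add an intercept column
    p = Xd.shape[1]
    labels = rng.integers(K, size=n)        # random initial partition
    for _ in range(n_iter):
        coefs = np.empty((K, p))
        for k in range(K):
            mask = labels == k
            if mask.sum() < p:
                # Cluster too small to fit: refit it on p random objects.
                mask = np.zeros(n, dtype=bool)
                mask[rng.choice(n, size=p, replace=False)] = True
            coefs[k], *_ = np.linalg.lstsq(Xd[mask], y[mask], rcond=None)
        resid = (y[:, None] - Xd @ coefs.T) ** 2
        new_labels = resid.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                           # partition is stable
        labels = new_labels
    return labels, coefs


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    scales = rng.uniform(0.5, 2.0, size=60)
    # One trimmed feature vector per object; each object is a sample of 500 draws.
    X = np.array([trimmed_lqd(rng.normal(0.0, s, 500), n_grid=5) for s in scales])
    group = np.arange(60) % 2
    y = np.where(group == 0, 1.0 + 2.0 * np.log(scales), 4.0 - 2.0 * np.log(scales))
    labels, coefs = clusterwise_regression(X, y, K=2)
    print(np.bincount(labels, minlength=2))
```

Raising `trim` discards more of the tails, which is the lever the proposal uses to stabilise the fit. Like any alternating scheme of this kind, the sketch can stop in a local optimum, so several random starts would normally be compared.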
Keywords
Distributional Data, LDQ, Regression model for distributional data
References
1. Bock H., Diday E.: Analysis of symbolic data: exploratory methods for extracting statistical information from complex data. Springer Science & Business Media (1999)
2. Brito P., Dias S.: Analysis of Distributional Data. Chapman Hall (2022)
3. Petersen A., Müller H.: Functional data analysis for density functions by transformation to a Hilbert space. The Annals of Statistics 44(1), 183-218 (2016)