Last modified: 2024-05-15
Abstract
This work proposes a robust variant of the clusterwise regression method (Spaeth, 1979) for distributional data (CRM-D) (Bock and Diday, 1999).
CRM-D is based on a new regression method for distributional data, which maps density functions into a Hilbert space via a logarithmic transformation of the derivative of the quantile function (LDQ) (Petersen and Muller, 2016). We extend the LDQ functions for distributional data using a functional representation of the data (Ramsay and Silverman, 2005). The elements of the explanatory distributional variable, expressed as LDQ functions, are represented as functional data using smoothing B-splines with knots placed at the quantiles of the distributions.
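As an illustration of the LDQ transformation, the following minimal sketch approximates the logarithm of the derivative of the quantile function from a finite set of quantiles, and then inverts it. It rests on several assumptions: the derivative is approximated by finite differences between consecutive quantile levels, whereas the method described here uses smoothing B-splines, and the function names `ldq_from_quantiles` and `quantiles_from_ldq` are hypothetical.

```python
import math


def ldq_from_quantiles(probs, quantiles):
    """Approximate log(Q'(t)) at the midpoints of consecutive quantile levels.

    probs     : strictly increasing quantile levels in (0, 1)
    quantiles : strictly increasing quantile values Q(probs[j])
    The derivative Q'(t) is replaced by a finite difference (an assumption
    of this sketch, not the B-spline smoothing used in the abstract).
    """
    if len(probs) != len(quantiles) or len(probs) < 2:
        raise ValueError("need at least two (level, quantile) pairs")
    mids, ldq = [], []
    for j in range(len(probs) - 1):
        dp = probs[j + 1] - probs[j]
        dq = quantiles[j + 1] - quantiles[j]
        if dp <= 0 or dq <= 0:
            raise ValueError("levels and quantiles must be strictly increasing")
        mids.append(0.5 * (probs[j] + probs[j + 1]))
        ldq.append(math.log(dq / dp))
    return mids, ldq


def quantiles_from_ldq(probs, ldq, q_start):
    """Invert the transformation: accumulate exp(LDQ) * step from q_start."""
    quantiles = [q_start]
    for j, value in enumerate(ldq):
        dp = probs[j + 1] - probs[j]
        quantiles.append(quantiles[-1] + math.exp(value) * dp)
    return quantiles


# Quantiles of a uniform distribution on [0, 2]: Q(t) = 2t, so Q'(t) = 2.
levels = [0.1, 0.25, 0.5, 0.75, 0.9]
values = [0.2, 0.5, 1.0, 1.5, 1.8]
midpoints, ldq_values = ldq_from_quantiles(levels, values)
recovered = quantiles_from_ldq(levels, ldq_values, values[0])
```

Because the transformed values are unconstrained real numbers, ordinary linear operations such as regression can be applied to them, and the inverse step always returns an increasing quantile function.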
CRM-D predicts the response variable within each of the K clusters into which the set of objects is partitioned. In accordance with the clusterwise criterion, the partition is chosen according to the best fit of the local regression models. The main contribution of this proposal is to reduce the instability of the results caused by the greater variability of the lowest and highest quantiles of the distributions. With a suitable trimming of the distributional data, more stable predictions of the response variable are obtained in the K clusters. The performance of the new approach is evaluated through the improvement in the fit of the partitioned data to their respective cluster regression models. Preliminary results on real data have confirmed the effectiveness of the proposed method.
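The two ingredients described above, trimming of the extreme quantiles and the clusterwise criterion, can be sketched as follows. This is a deliberately simplified illustration and not the CRM-D procedure itself: the predictor and response are scalars rather than LDQ functions, the local models are simple least-squares lines, and the names `trim_quantiles`, `fit_line` and `clusterwise_regression`, as well as the trimming level `alpha`, are hypothetical.

```python
def trim_quantiles(probs, quantiles, alpha):
    """Keep only the quantiles whose level lies in [alpha, 1 - alpha]."""
    kept = [(p, q) for p, q in zip(probs, quantiles) if alpha <= p <= 1.0 - alpha]
    return [p for p, _ in kept], [q for _, q in kept]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        return mean_y, 0.0  # degenerate cluster: constant model
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def clusterwise_regression(xs, ys, labels, n_clusters, max_iter=100):
    """Alternate between fitting one line per cluster and reassigning each
    object to the cluster whose line gives the smallest squared residual."""
    labels = list(labels)
    models = [(0.0, 0.0)] * n_clusters
    for _ in range(max_iter):
        for k in range(n_clusters):
            members = [i for i, lab in enumerate(labels) if lab == k]
            if members:  # an empty cluster keeps its previous model
                models[k] = fit_line([xs[i] for i in members],
                                     [ys[i] for i in members])
        new_labels = [
            min(range(n_clusters),
                key=lambda k: (y - models[k][0] - models[k][1] * x) ** 2)
            for x, y in zip(xs, ys)
        ]
        if new_labels == labels:  # partition is stable: stop
            break
        labels = new_labels
    return labels, models
```

The result depends on the initial partition passed in `labels`, so in practice several starting partitions would typically be tried and the one with the smallest total squared residual retained.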
Keywords
References
Bock H., Diday E.: Analysis of symbolic data: exploratory methods for extracting statistical information from complex data. Springer Science & Business Media, (1999).
Petersen A., Muller H.: Functional data analysis for density functions by transformation to a Hilbert space. The Annals of Statistics 44(1), 183-218, (2016).
Ramsay J. O., Silverman B. W.: Functional Data Analysis, 2nd Edition. Springer, New York, (2005).
Spaeth H.: Clusterwise Linear Regression. Computing 22(4), 367-373, (1979).