Conferences CIMPA, 18th International Federation of Classification Societies

A new distance for categorical data with moderate association
Aurea Grané, Silvia Salini, Gabriele Infante

Last modified: 2024-05-14

Abstract


Categorical variables coming from surveys often share a large proportion of their information. This redundancy may lead to misleading results in data visualization techniques and clustering procedures, since units with similar characteristics can be treated as completely different. In general, this situation arises when additive dissimilarity coefficients are used on datasets with moderate or high association, producing the typical horseshoe effect that appears when the data are visualized in low dimension. In this work we propose a new distance for categorical data that takes into account the association/correlation structure of the data. Its performance is evaluated and compared with that of the Hamming distance in MDS configurations. Additionally, applications to novel data on co-creation antecedents of telemedicine and to vehicle accident rate data illustrate the methodology.
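
As a rough illustration of the baseline only, the sketch below computes a normalized Hamming distance on a toy categorical table and feeds it to classical (Torgerson) MDS. It does not implement the proposed distance, which the abstract does not specify. The toy data, the function names `hamming_distance_matrix` and `classical_mds`, and the choice of classical rather than another MDS variant are all assumptions made for the example. The first two columns are exact copies of each other, so a single underlying disagreement is counted twice by the additive coefficient.

```python
import numpy as np

def hamming_distance_matrix(X):
    """Pairwise normalized Hamming distance: share of variables on which two units differ."""
    X = np.asarray(X)
    # Compare every pair of rows variable by variable, then average the mismatches.
    return (X[:, None, :] != X[None, :, :]).mean(axis=2)

def classical_mds(D, k=2):
    """Classical (Torgerson) MDS coordinates from a distance matrix D (assumed variant)."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]     # indices of the k largest eigenvalues
    lam = np.clip(eigvals[order], 0.0, None)  # guard against small negative values
    return eigvecs[:, order] * np.sqrt(lam)

# Hypothetical toy data: columns 0 and 1 are perfectly associated (redundant).
X = np.array([
    ["a", "a", "x"],   # unit 0
    ["b", "b", "x"],   # unit 1: differs from unit 0 on one trait, recorded twice
    ["a", "a", "y"],   # unit 2: differs from unit 0 on one trait, recorded once
])

D = hamming_distance_matrix(X)
coords = classical_mds(D, k=2)
print(D)
print(coords)
```

In this toy table, units 0 and 1 end up twice as far apart as units 0 and 2, although each pair disagrees on a single underlying trait. This is the kind of inflation by redundant variables that motivates a distance sensitive to association.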

Keywords


association, categorical data, horseshoe effect, MDS, redundant information