A new distance for categorical data with moderate association
Last modified: 2024-05-14
Abstract
Categorical variables coming from surveys often share a large amount of information. Redundant information may lead to misleading results in data visualization techniques and clustering procedures, since units with similar characteristics can be treated as completely different. This situation typically arises when additive dissimilarity coefficients are used on datasets with moderate or high association, producing the well-known horseshoe effect when the data are visualized in low dimension. In this work we propose a new distance for categorical data that takes into account the association/correlation structure of the data. Its performance is evaluated and compared with the Hamming distance in multidimensional scaling (MDS) configurations. Additionally, applications to novel data on co-creation antecedents of telemedicine and to vehicle accident rate data illustrate the methodology.
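Why redundancy misleads an additive coefficient can be shown with the Hamming distance, the baseline named above. It counts, for each pair of units, the variables on which they disagree. The sketch below is a minimal illustration with hypothetical data. It is not the proposed distance. The variable names and values are invented, and the third variable is assumed to be a perfectly associated copy of the second.

```python
def hamming(u, v):
    """Additive dissimilarity: number of variables on which u and v disagree."""
    if len(u) != len(v):
        raise ValueError("units must have the same number of variables")
    return sum(a != b for a, b in zip(u, v))

# Hypothetical survey units: (region, smoker, smoker_reported_by_partner).
# The third variable duplicates the second, i.e. the two are perfectly associated.
units = {
    "A": ("north", "yes", "yes"),
    "B": ("north", "no", "no"),
    "C": ("south", "yes", "yes"),
}

# Keep only the first two variables, i.e. drop the redundant copy.
deduplicated = {k: v[:2] for k, v in units.items()}

for i, j in [("A", "B"), ("A", "C"), ("B", "C")]:
    full = hamming(units[i], units[j])
    dedup = hamming(deduplicated[i], deduplicated[j])
    print(i, j, "with copy:", full, "without copy:", dedup)
```

A and B differ only in smoking status, and A and C differ only in region. Without the copy, both pairs are at distance 1. With the copy, A and B move to distance 2, because the shared information is counted twice. An additive coefficient cannot detect this kind of inflation on its own. An association-aware distance is meant to correct for it.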
Keywords
association, categorical data, horseshoe effect, MDS, redundant information