Conferences CIMPA, 18th International Federation of Classification Societies

A toolbox for clustering ordinal data in the presence of missing values
Lena Ortega Menjivar

Last modified: 2024-05-15

Abstract


Ordinal response scales and ‘Don’t know’ options are ubiquitous in surveys.
As survey results are a common basis for separating respondents
into (consumer) segments, there is a great need for clustering algorithms able to
handle ordinal and mixed-with-ordinal data with missing values. While there have
been significant advances in this field in recent years, especially in
model-based clustering, solutions tend to be tailored towards specific applications,
and no general review is known to the authors. This work presents an in-depth investigation
of existing implementations. Common strategies for handling missing values
in ordinal clustering include (1) imputing and down-weighting missing
distances, (2) inferring the cluster memberships of missing observations in bi-
or other multiview-clustering methods from their cluster memberships in
non-missing dimensions, and (3) including models for missingness patterns in mixtures
of ordinal or mixed-type models. As both categorization and quantification are
common strategies in the handling of ordinal data, common methods for clustering
interval and categorical data with missing values are also included (4) and
used to benchmark the methods designed specifically for ordinal data. The collected
implementations are categorized by their assumptions about variable types
and missingness mechanisms, and applied to real data sets. Their performance is
evaluated numerically via common cluster indices, and qualitatively with regard to
their practicality. The result of this work is a toolbox of clustering algorithms
for ordinal data with missing values, which serves as decision support for
selecting methods in future applications.
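
As an illustration of strategy (1), the following is a minimal sketch of a Gower-style distance in which comparisons involving a missing answer receive weight zero and the remaining comparisons are averaged. It is not taken from any of the reviewed implementations. The function name ordinal_gower_distance, the coding of answers as integers 1..K with NaN for ‘Don’t know’, and the treatment of levels as equally spaced (a simple quantification) are all assumptions made for this example.

```python
import numpy as np

def ordinal_gower_distance(X, n_levels):
    """Pairwise Gower-style distance for ordinal codes 1..n_levels[j].

    NaN marks a missing answer. Assumes every variable has at least 2 levels
    and that levels are equally spaced (an illustrative simplification).
    """
    X = np.asarray(X, dtype=float)
    levels = np.asarray(n_levels, dtype=float)
    # Rescale each variable to [0, 1] so all variables contribute equally.
    Z = (X - 1.0) / (levels - 1.0)
    observed = ~np.isnan(Z)
    n = Z.shape[0]
    D = np.full((n, n), np.nan)
    for i in range(n):
        for j in range(n):
            # Weight 0 for variables missing in either respondent.
            both = observed[i] & observed[j]
            if both.any():
                D[i, j] = np.abs(Z[i, both] - Z[j, both]).mean()
            # Pairs with no jointly observed variable stay NaN.
    return D

# Three respondents, three items with 5, 5 and 3 levels.
X = [[1, 5, np.nan],
     [2, np.nan, 3],
     [5, 1, 1]]
D = ordinal_gower_distance(X, n_levels=[5, 5, 3])
print(np.round(D, 3))
```

The resulting matrix can be passed to any distance-based clustering method that accepts precomputed dissimilarities. Pairs without a jointly observed variable remain undefined and would need separate handling, for example by imputation.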

Keywords


clustering, ordinal data, missing values