Conferences CIMPA, 18th International Federation of Classification Societies

Unsupervised methods for the creation of orthonormal bases in compositional data: R-mode clustering
Jose Antonio Martin Fernandez

Last modified: 2024-05-14

Abstract


R-mode hierarchical clustering (HC) identifies interrelationships between variables that are useful for variable selection and dimension reduction. The application of R-mode HC to compositional data (CoDa) must be consistent with the fundamental properties of compositional geometry, also known as the Aitchison geometry. A composition is a multivariate quantitative description of the parts or components of a whole that conveys relative information, commonly expressed as a vector of proportions. The critical element of the Aitchison geometry is the inner product, defined via the log-ratio coordinates of the compositions. This geometry allows a composition to be expressed as coordinates with respect to an orthonormal basis; these coordinates are formed by log-ratios and are called orthonormal log-ratio (olr) coordinates.
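
As an illustration of olr-coordinates, the following minimal sketch builds an orthonormal basis from a sign matrix that encodes a sequential binary partition, and checks that the Aitchison inner product (computed here as the Euclidean inner product of centred log-ratio vectors) is preserved by the coordinates. The function names `clr`, `sbp_to_basis`, and `olr`, and the three-part example partition, are assumptions made for this example and do not come from the talk.

```python
import numpy as np

def clr(x):
    """Centred log-ratio transform of a composition with strictly positive parts."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean(axis=-1, keepdims=True)

def sbp_to_basis(sbp):
    """Convert a sign matrix (one row per partition step, entries +1, -1, 0)
    into a contrast matrix whose rows are orthonormal and sum to zero."""
    sbp = np.asarray(sbp, dtype=float)
    basis = np.zeros_like(sbp)
    for i, row in enumerate(sbp):
        r = np.sum(row > 0)  # number of parts in the "+" group
        s = np.sum(row < 0)  # number of parts in the "-" group
        basis[i, row > 0] = np.sqrt(s / (r * (r + s)))
        basis[i, row < 0] = -np.sqrt(r / (s * (r + s)))
    return basis

def olr(x, basis):
    """Coordinates of a composition with respect to the given contrast matrix."""
    return clr(x) @ basis.T

# Hypothetical partition of three parts: {1, 2} vs {3}, then {1} vs {2}.
example_sbp = [[1, 1, -1],
               [1, -1, 0]]
example_basis = sbp_to_basis(example_sbp)

x = np.array([0.2, 0.3, 0.5])
y = np.array([0.6, 0.1, 0.3])

# Aitchison inner product via clr, and the same quantity via olr-coordinates.
inner_clr = clr(x) @ clr(y)
inner_olr = olr(x, example_basis) @ olr(y, example_basis)
```

Because the rows of the contrast matrix are orthonormal and span the zero-sum subspace in which clr vectors live, `inner_clr` and `inner_olr` agree up to floating-point error.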

Recent publications introduce R-mode agglomerative HC methods in CoDa for creating orthonormal log-ratio bases. These HC methods form a hierarchy of mutually exclusive groups of parts, which can be associated with a sequential binary partition of the parts. In this talk, we explore the basic concepts of the R-mode clustering algorithms and the connections between concepts such as the distance between parts, the cluster representative of a group of parts, and the compositional biplot. Practical examples will be presented to visually illustrate the proposed approach.
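
The link between an agglomerative hierarchy of parts and a sequential binary partition can be sketched as follows. This is only an illustration under stated assumptions: the distance between two parts is taken as the square root of the variance of their log-ratio, and average linkage from SciPy is used. The methods in the publications referred to above may use a different distance or linkage criterion. The names `variation_matrix` and `sbp_from_parts` are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

def variation_matrix(X):
    """Variance of log(x_i / x_j) over the samples, for every pair of parts.
    X has one composition per row and one part per column."""
    logX = np.log(np.asarray(X, dtype=float))
    diff = logX[:, :, None] - logX[:, None, :]  # shape: samples x parts x parts
    return diff.var(axis=0, ddof=1)

def sbp_from_parts(X, method="average"):
    """Cluster the parts (R-mode) and read a sign matrix off the merge tree.
    Assumption: distance between parts = sqrt of the log-ratio variance."""
    T = variation_matrix(X)
    dist = np.sqrt(T)
    Z = linkage(squareform(dist, checks=False), method=method)
    D = T.shape[0]
    members = {j: [j] for j in range(D)}  # cluster id -> indices of its parts
    sbp = np.zeros((D - 1, D), dtype=int)
    for k, (a, b, _, _) in enumerate(Z):
        left, right = members[int(a)], members[int(b)]
        sbp[k, left] = 1    # parts of one merged group
        sbp[k, right] = -1  # parts of the other merged group
        members[D + k] = left + right
    # Reverse so that the first row is the top-level split of all parts.
    return sbp[::-1]

# Simulated compositions (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
raw = rng.lognormal(size=(30, 4))
X_demo = raw / raw.sum(axis=1, keepdims=True)
sbp_demo = sbp_from_parts(X_demo)
```

Each merge in the dendrogram contributes one row of the sign matrix, so a hierarchy on D parts yields D - 1 rows, which is the number of coordinates needed for an orthonormal log-ratio basis.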


Keywords


compositional data, log-ratio, simplex, R-mode clustering