Conferences CIMPA, 18th International Federation of Classification Societies

A Gene Selection Method for Classification with Three Classes Using Proportional Overlapping Scores
Anusa Suwanwong, Andrew Harrison, Osama Mahmoud

Last modified: 2024-05-15

Abstract


Genomics experiments, such as microarrays, allow the expression levels of thousands of genes to be measured within individual samples. They play an important role in distinguishing between multiple stages or phenotypes of diseases such as cancer. Building a classifier on a small set of discriminative genes improves both its interpretability and its prediction accuracy. Proportional Overlapping Scores (POS), a feature selection method for binary classification of genomic data, has been proposed and shown to achieve good prediction accuracy [1, 2]. Here we propose an extension, named 3-class POS (3cPOS), which handles feature selection for classification problems with three classes.

3cPOS analyses the gene expression data and derives a score for the overlap across the three classes, taking into account the proportion of overlapping samples. For each feature, we define a representative mask describing the ability of its gene to distinguish between the target classes. The 3cPOS scores and the feature masks are then used to select a subset of informative genes for the classification task of interest.
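To make the score and mask ideas concrete, here is a minimal Python sketch for illustration only; it is not the published 3cPOS procedure. As simplifying assumptions, each class's region for a gene is taken to be the [min, max] range of its expression values, a sample counts as overlapped if it falls inside any other class's range, the score is the proportion of overlapped samples, and selection is an assumed greedy cover over the masks with ties broken by the lower score. The functions `class_ranges`, `feature_mask`, `overlap_score` and `select_genes` are hypothetical names.

```python
# Hypothetical, simplified illustration of three-class overlap scores and masks.
# Assumption: each class's region is the [min, max] range of its values.

def class_ranges(values, labels):
    """Return {class: (min, max)} of one gene's expression values per class."""
    ranges = {}
    for v, c in zip(values, labels):
        lo, hi = ranges.get(c, (v, v))
        ranges[c] = (min(lo, v), max(hi, v))
    return ranges


def feature_mask(values, labels):
    """1 if a sample lies outside every other class's range (distinguishable), else 0."""
    ranges = class_ranges(values, labels)
    return [
        0 if any(lo <= v <= hi for k, (lo, hi) in ranges.items() if k != c) else 1
        for v, c in zip(values, labels)
    ]


def overlap_score(values, labels):
    """Proportion of samples in an overlap region; lower means more discriminative."""
    mask = feature_mask(values, labels)
    return 1 - sum(mask) / len(mask)


def select_genes(X, labels):
    """Assumed greedy cover: add the gene whose mask covers the most still-uncovered
    samples, breaking ties by the lower overlap score, until nothing more is covered."""
    n_genes = len(X[0])
    columns = [[row[g] for row in X] for g in range(n_genes)]
    masks = [feature_mask(col, labels) for col in columns]
    scores = [overlap_score(col, labels) for col in columns]
    uncovered = set(range(len(labels)))
    selected = []

    def gain(g):
        # Number of still-uncovered samples that gene g distinguishes.
        return sum(masks[g][i] for i in uncovered)

    while uncovered:
        best = max(
            (g for g in range(n_genes) if g not in selected),
            key=lambda g: (gain(g), -scores[g]),
            default=None,
        )
        if best is None or gain(best) == 0:
            break
        selected.append(best)
        uncovered = {i for i in uncovered if not masks[best][i]}
    return selected, scores


# Toy data: rows are samples, columns are genes; gene 0 separates all three classes.
labels = [0, 0, 1, 1, 2, 2]
X = [[1, 1, 1], [2, 5, 2], [5, 1, 2], [6, 5, 6], [9, 1, 9], [10, 5, 10]]
selected, scores = select_genes(X, labels)  # selected -> [0]
```

Treating selection as a covering problem means a gene is added only when it resolves samples that the genes already chosen leave overlapped; ranking by score alone could instead pick several redundant genes that separate the same samples.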


3cPOS is compared with the Kruskal-Wallis test, the Least Absolute Shrinkage and Selection Operator (LASSO), and Minimum Redundancy Maximum Relevance (mRMR) on seven benchmark gene expression datasets. The classification accuracy of Random Forest, K-Nearest Neighbours, Support Vector Machine and Extreme Gradient Boosting (XGBoost) classifiers is evaluated using 20 repetitions of 5-fold cross-validation, both on the feature subsets selected by each method and on the full feature set. Our experiments show that 3cPOS provides outstanding accuracy for the majority of datasets and classification models.
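The evaluation protocol, 20 repetitions of 5-fold cross-validation, can be sketched with the Python standard library alone. This is a sketch under stated assumptions rather than the experimental code: folds are assumed to be stratified by class, the classifier is a plain k-nearest-neighbours vote (one of the four models named above), and the feature selector is re-run on the training portion of each fold so that held-out samples never influence which genes are kept. The names `stratified_folds`, `knn_predict`, `repeated_cv_accuracy` and the `select` callback are hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, n_splits, rng):
    """Shuffle each class, then deal its samples across folds so class proportions are kept."""
    by_class = defaultdict(list)
    for i, c in enumerate(labels):
        by_class[c].append(i)
    folds = [[] for _ in range(n_splits)]
    pos = 0
    for idx in by_class.values():
        rng.shuffle(idx)
        for i in idx:
            folds[pos % n_splits].append(i)
            pos += 1
    return folds


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training samples (squared Euclidean distance)."""
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def repeated_cv_accuracy(X, y, select, n_splits=5, n_repeats=20, k=3, seed=0):
    """Mean test accuracy over n_repeats rounds of stratified n_splits-fold CV.
    `select(train_X, train_y)` returns feature indices and sees training data only."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_repeats):
        for test_idx in stratified_folds(y, n_splits, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(y)) if i not in held_out]
            train_y = [y[i] for i in train_idx]
            feats = select([X[i] for i in train_idx], train_y)
            train_X = [[X[i][f] for f in feats] for i in train_idx]
            correct = sum(
                knn_predict(train_X, train_y, [X[i][f] for f in feats], k) == y[i]
                for i in test_idx
            )
            accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)
```

Passing a selector that returns every column index gives the full-feature-set baseline. In practice, scikit-learn's `RepeatedStratifiedKFold(n_splits=5, n_repeats=20)` generates the same kind of split scheme.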


Keywords


feature selection, microarray classification, proportional overlap score