University of Southampton Institutional Repository

Identifying (quasi) equally informative subsets in feature selection problems for classification: a max-relevance min-redundancy approach

Karakaya, Gulsah, Galelli, Stefano, Ahipasaoglu, Selin Damla and Taormina, Riccardo (2016) Identifying (quasi) equally informative subsets in feature selection problems for classification: a max-relevance min-redundancy approach. IEEE Transactions on Cybernetics, 46 (6), 1424-1437. (doi:10.1109/TCYB.2015.2444435).

Record type: Article

Abstract

An emerging trend in feature selection is the development of two-objective algorithms that analyze the tradeoff between the number of features and the classification performance of the model built with these features. Since these two objectives conflict, a typical result consists of a set of Pareto-efficient subsets, each having a different cardinality and a corresponding discriminating power. However, this approach overlooks the fact that, for a given cardinality, there can be several subsets with similar information content. The study reported here addresses this problem and introduces a novel multiobjective feature selection approach conceived to identify: 1) a subset that maximizes the performance of a given classifier and 2) a set of subsets that are quasi equally informative, i.e., have almost the same classification performance as the performance-maximizing subset. The approach consists of a wrapper [Wrapper for Quasi Equally Informative Subset Selection (W-QEISS)] built on the formulation of a four-objective optimization problem, which is aimed at maximizing the accuracy of a classifier, minimizing the number of features, and optimizing two entropy-based measures of relevance and redundancy. This allows the search to be conducted in a larger space, thus enabling the wrapper to generate a large number of Pareto-efficient solutions. The algorithm is compared against the mRMR algorithm, a two-objective wrapper, and a computationally efficient filter [Filter for Quasi Equally Informative Subset Selection (F-QEISS)] on 24 University of California, Irvine (UCI) datasets covering both binary and multiclass classification. Experimental results show that W-QEISS is capable of evolving a rich and diverse set of Pareto-efficient solutions, and that their availability helps in: 1) studying the tradeoff between multiple measures of classification performance and 2) understanding the relative importance of each feature. The quasi equally informative subsets are identified at the cost of a marginal increase in computational time, thanks to the adoption of the Borg Multiobjective Evolutionary Algorithm and the Extreme Learning Machine as global optimization and learning algorithms, respectively.
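
The four-objective formulation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not give the exact relevance and redundancy formulas, so the sketch assumes mRMR-style averages of mutual information; it replaces the Borg search with exhaustive enumeration on a toy dataset; and it uses a hypothetical `lookup_accuracy` stand-in instead of an Extreme Learning Machine. The tolerance `delta` and all function names are likewise assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import log2


def entropy(values):
    """Shannon entropy (bits) of a sequence of discrete symbols."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two discrete sequences."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def relevance(columns, subset, labels):
    """Assumed measure: mean mutual information between features and class."""
    return sum(mutual_information(columns[j], labels) for j in subset) / len(subset)


def redundancy(columns, subset):
    """Assumed measure: mean pairwise mutual information among the features."""
    pairs = list(combinations(subset, 2))
    if not pairs:
        return 0.0
    return sum(mutual_information(columns[a], columns[b]) for a, b in pairs) / len(pairs)


def lookup_accuracy(columns, subset, labels):
    """Hypothetical stand-in classifier: majority label per feature pattern."""
    groups = {}
    for i, y in enumerate(labels):
        key = tuple(columns[j][i] for j in subset)
        groups.setdefault(key, Counter())[y] += 1
    return sum(max(c.values()) for c in groups.values()) / len(labels)


def dominates(a, b):
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(objectives):
    """Keep the subsets whose objective vectors (all minimized) are non-dominated."""
    return {s: f for s, f in objectives.items()
            if not any(dominates(g, f) for t, g in objectives.items() if t != s)}


# Toy data: features 0 and 1 both copy the class; feature 2 is uninformative.
labels = [0, 0, 1, 1, 0, 0, 1, 1]
columns = [
    [0, 0, 1, 1, 0, 0, 1, 1],
    [0, 0, 1, 1, 0, 0, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
]

# Exhaustive enumeration stands in for the evolutionary search.
subsets = [s for k in range(1, 4) for s in combinations(range(3), k)]
accuracy = {s: lookup_accuracy(columns, s, labels) for s in subsets}

# Four objectives, all expressed as minimization:
# classification error, subset size, negative relevance, redundancy.
objectives = {
    s: (1.0 - accuracy[s], len(s), -relevance(columns, s, labels), redundancy(columns, s))
    for s in subsets
}
front = pareto_front(objectives)

# Assumed definition: Pareto-efficient subsets within a fraction delta of the best accuracy.
delta = 0.05
best = max(accuracy[s] for s in front)
qeis = sorted(s for s in front if accuracy[s] >= (1.0 - delta) * best)
print(qeis)
```

On this toy dataset the two single-feature subsets `(0,)` and `(1,)` have identical objective vectors, so neither dominates the other and both survive as equally informative alternatives of the same cardinality, which is the situation a two-objective formulation would tend to hide.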

This record has no associated files available for download.

More information

Published date: June 2016
Keywords: Classification algorithms, extreme learning machine, feature selection, multiobjective optimization, neural networks, redundancy, relevance

Identifiers

Local EPrints ID: 443176
URI: http://eprints.soton.ac.uk/id/eprint/443176
ISSN: 2168-2267
PURE UUID: fb0cb088-d000-45cb-9a9d-dcc53cd4519a
ORCID for Selin Damla Ahipasaoglu: orcid.org/0000-0003-1371-315X

Catalogue record

Date deposited: 13 Aug 2020 16:38
Last modified: 17 Mar 2024 04:03

Contributors

Author: Gulsah Karakaya
Author: Stefano Galelli
Author: Selin Damla Ahipasaoglu
Author: Riccardo Taormina
