Identifying (quasi) equally informative subsets in feature selection problems for classification: a max-relevance min-redundancy approach
Karakaya, Gulsah, Galelli, Stefano, Ahipasaoglu, Selin Damla and Taormina, Riccardo (2016) Identifying (quasi) equally informative subsets in feature selection problems for classification: a max-relevance min-redundancy approach. IEEE Transactions on Cybernetics, 46 (6), 1424-1437. (doi:10.1109/TCYB.2015.2444435)
Abstract
An emerging trend in feature selection is the development of two-objective algorithms that analyze the tradeoff between the number of features and the classification performance of the model built with these features. Since these two objectives are conflicting, a typical result is a set of Pareto-efficient subsets, each with a different cardinality and a corresponding discriminating power. However, this approach overlooks the fact that, for a given cardinality, there can be several subsets with similar information content. The study reported here addresses this problem and introduces a novel multiobjective feature selection approach conceived to identify: 1) a subset that maximizes the performance of a given classifier and 2) a set of subsets that are quasi equally informative, i.e., have almost the same classification performance as the performance-maximizing subset. The approach consists of a wrapper [Wrapper for Quasi Equally Informative Subset Selection (W-QEISS)] built on the formulation of a four-objective optimization problem, which aims to maximize the accuracy of a classifier, minimize the number of features, and optimize two entropy-based measures of relevance and redundancy. This allows the search to be conducted in a larger space, thus enabling the wrapper to generate a large number of Pareto-efficient solutions. The algorithm is compared against the mRMR algorithm, a two-objective wrapper, and a computationally efficient filter [Filter for Quasi Equally Informative Subset Selection (F-QEISS)] on 24 University of California, Irvine (UCI) datasets including both binary and multiclass classification problems. Experimental results show that W-QEISS can evolve a rich and diverse set of Pareto-efficient solutions, and that their availability helps in: 1) studying the tradeoff between multiple measures of classification performance and 2) understanding the relative importance of each feature. The quasi equally informative subsets are identified at the cost of only a marginal increase in computational time, thanks to the adoption of the Borg Multiobjective Evolutionary Algorithm and the Extreme Learning Machine as the global optimization and learning algorithms, respectively.
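For illustration, the sketch below shows how the four objectives described in the abstract could be evaluated for a single candidate feature subset, assuming mRMR-style definitions: relevance as the mean mutual information between each selected feature and the class label, and redundancy as the mean pairwise mutual information among the selected features. The function name evaluate_subset, the scikit-learn MLP stand-in for the Extreme Learning Machine, and the cross-validated accuracy estimate are illustrative assumptions rather than the paper's exact formulation; in W-QEISS these four values would be returned to a multiobjective optimizer such as Borg, which searches over binary feature-selection vectors.

# Minimal sketch of the four W-QEISS-style objectives for one candidate
# feature subset. Assumptions (not taken from the paper): mRMR-style
# relevance/redundancy, a scikit-learn MLP as a stand-in for the ELM,
# and 5-fold cross-validated accuracy as the wrapper objective.
from itertools import combinations

import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier  # stand-in for the ELM


def evaluate_subset(X, y, subset):
    """Return (accuracy, n_features, relevance, redundancy) for one subset."""
    Xs = X[:, subset]

    # 1) classification accuracy (wrapper objective, to be maximized)
    clf = MLPClassifier(hidden_layer_sizes=(50,), max_iter=500, random_state=0)
    accuracy = cross_val_score(clf, Xs, y, cv=5, scoring="accuracy").mean()

    # 2) subset cardinality (to be minimized)
    n_features = len(subset)

    # 3) relevance: mean mutual information between each selected feature
    #    and the class label (to be maximized)
    relevance = mutual_info_classif(Xs, y, random_state=0).mean()

    # 4) redundancy: mean pairwise mutual information among the selected
    #    features (to be minimized)
    pair_mi = [
        mutual_info_regression(Xs[:, [i]], Xs[:, j], random_state=0)[0]
        for i, j in combinations(range(n_features), 2)
    ]
    redundancy = float(np.mean(pair_mi)) if pair_mi else 0.0

    return accuracy, n_features, relevance, redundancy


# Example call on hypothetical data: evaluate the candidate subset {0, 2, 5}
# X, y = load_some_uci_dataset()
# print(evaluate_subset(X, y, [0, 2, 5]))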
More information
Published date: June 2016
Keywords: Classification algorithms, extreme learning machine, feature selection, multiobjective optimization, neural networks, redundancy, relevance
Identifiers
Local EPrints ID: 443176
URI: http://eprints.soton.ac.uk/id/eprint/443176
ISSN: 2168-2267
PURE UUID: fb0cb088-d000-45cb-9a9d-dcc53cd4519a
Catalogue record
Date deposited: 13 Aug 2020 16:38
Last modified: 17 Mar 2024 04:03
Contributors
Author: Gulsah Karakaya
Author: Stefano Galelli
Author: Selin Damla Ahipasaoglu
Author: Riccardo Taormina