The University of Southampton
University of Southampton Institutional Repository

Quantile encoder: tackling high cardinality categorical features in regression problems


Mougan, Carlos, Masip, David, Nin, Jordi and Pujol, Oriol (2021) Quantile encoder: tackling high cardinality categorical features in regression problems. Torra, Vicenç and Narukawa, Yasuo (eds.) In Modeling Decisions for Artificial Intelligence. vol. 12898, Springer. pp. 168-180. (doi:10.1007/978-3-030-85529-1_14).

Record type: Conference or Workshop Item (Paper)

Abstract

Regression problems have been widely studied in the machine learning literature, resulting in a plethora of regression models and performance measures. However, few techniques are specifically dedicated to incorporating categorical features into regression problems. Categorical feature encoders are usually general enough to cover both classification and regression problems, and this lack of specificity results in underperforming regression models. In this paper, we provide an in-depth analysis of how to tackle high cardinality categorical features with the quantile. Our proposal outperforms state-of-the-art encoders, including the traditional statistical mean target encoder, when considering the Mean Absolute Error, especially in the presence of long-tailed or skewed distributions. In addition, to deal with possible overfitting when some categories have small support, our encoder uses additive smoothing. Finally, we describe how to expand the encoded values by creating a set of features with different quantiles. This expanded encoder provides a more informative output about the categorical feature in question, further boosting the performance of the regression model.
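The abstract describes the encoder only in outline. The Python sketch below illustrates the idea under stated assumptions: each category is replaced by the p-quantile of its target values (p = 0.5 gives the median, which minimizes absolute error, whereas the mean minimizes squared error), shrunk toward the global p-quantile with an additive-smoothing weight m; the expanded variant stacks one such feature per quantile level. The exact smoothing formula, the fallback for unseen categories, and all function names (quantile, fit_quantile_encoder, fit_expanded_encoder, transform_expanded) are illustrative assumptions, not the authors' published code.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

# Hypothetical alias: (per-category encoded values, fallback for unseen categories).
Encoder = Tuple[Dict[Hashable, float], float]


def quantile(values: Sequence[float], p: float) -> float:
    """p-quantile using linear interpolation between order statistics
    (the same convention as NumPy's default 'linear' method)."""
    if not values:
        raise ValueError("quantile of an empty sequence")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    s = sorted(values)
    pos = p * (len(s) - 1)
    lo = int(pos)                    # index of the lower order statistic
    hi = min(lo + 1, len(s) - 1)     # index of the upper order statistic
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def fit_quantile_encoder(categories: Sequence[Hashable], targets: Sequence[float],
                         p: float = 0.5, m: float = 1.0) -> Encoder:
    """Map each category to its target p-quantile, shrunk toward the global one.

    Assumed additive smoothing: (n_c * q_c + m * q_global) / (n_c + m), so a
    category with few rows (small n_c) stays close to the global quantile.
    """
    if len(categories) != len(targets):
        raise ValueError("categories and targets must have the same length")
    if m < 0:
        raise ValueError("smoothing weight m must be non-negative")
    groups: Dict[Hashable, List[float]] = defaultdict(list)
    for c, y in zip(categories, targets):
        groups[c].append(y)
    q_global = quantile(list(targets), p)
    mapping = {
        c: (len(ys) * quantile(ys, p) + m * q_global) / (len(ys) + m)
        for c, ys in groups.items()
    }
    return mapping, q_global


def fit_expanded_encoder(categories: Sequence[Hashable], targets: Sequence[float],
                         ps: Sequence[float] = (0.25, 0.5, 0.75),
                         m: float = 1.0) -> Dict[float, Encoder]:
    """Expanded variant: one smoothed-quantile encoder (one feature) per level p."""
    return {p: fit_quantile_encoder(categories, targets, p, m) for p in ps}


def transform_expanded(encoders: Dict[float, Encoder],
                       categories: Sequence[Hashable]) -> List[List[float]]:
    """One row per input category, one column per quantile level (in fit order).
    Unseen categories fall back to the global quantile (an assumption)."""
    return [[mapping.get(c, fallback) for mapping, fallback in encoders.values()]
            for c in categories]


if __name__ == "__main__":
    cats = ["a", "a", "a", "b", "b", "c"]
    y = [1.0, 2.0, 100.0, 3.0, 5.0, 4.0]   # long tail inside category "a"
    mapping, fallback = fit_quantile_encoder(cats, y, p=0.5, m=1.0)
    print(fallback)       # 3.5 (global median)
    print(mapping["a"])   # 2.375 (median 2.0 of [1, 2, 100] shrunk toward 3.5)
    encoders = fit_expanded_encoder(cats, y)
    # "z" is unseen, so its row is the global quantiles [2.25, 3.5, 4.75]
    print(transform_expanded(encoders, ["a", "z"]))
```

In this toy data the single outlier in category "a" barely moves its median-based encoding (2.375), whereas a plain mean target encoding of the same category would be about 34.3. Increasing m pulls categories with small support more strongly toward the global quantile, which is the overfitting guard the abstract attributes to additive smoothing.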

This record has no associated files available for download.

More information

Published date: 20 September 2021

Identifiers

Local EPrints ID: 479473
URI: http://eprints.soton.ac.uk/id/eprint/479473
ISSN: 0302-9743
PURE UUID: ef726233-fef3-4527-a909-d1324a3b9e02

Catalogue record

Date deposited: 25 Jul 2023 16:31
Last modified: 17 Mar 2024 03:34

Contributors

Author: Carlos Mougan
Author: David Masip
Author: Jordi Nin
Author: Oriol Pujol
Editor: Vicenç Torra
Editor: Yasuo Narukawa

Contact ePrints Soton: eprints@soton.ac.uk

ePrints Soton supports OAI 2.0 with a base URL of http://eprints.soton.ac.uk/cgi/oai2

This repository has been built using EPrints software, developed at the University of Southampton, but available to everyone to use.
