The University of Southampton
University of Southampton Institutional Repository

On the discriminative power of credit scoring systems trained on independent samples

Biron, Miguel and Bravo, Cristian (2013) On the discriminative power of credit scoring systems trained on independent samples. In Data Analysis, Machine Learning and Knowledge Discovery. New York, US: Springer, pp. 247-254. (doi:10.1007/978-3-319-01595-8_27).

Record type: Book Section

Abstract

The aim of this work is to assess the importance of the independence assumption in behavioral scoring models built with logistic regression. We develop four sampling methods that control which observations associated with each client are included in the training set, avoiding a functional dependence between observations of the same client. We then calibrate logistic regressions with variable selection on the samples created by each method, plus one using all the data in the training set (the biased base method), and validate the models on an independent data set. We find that the regression built using all the observations shows the highest area under the ROC curve and Kolmogorov–Smirnov statistic, while the regression that uses the fewest observations shows the lowest performance and the highest variance of these indicators. Nevertheless, the fourth selection algorithm presented shows almost the same performance as the base method using just 14% of the dataset and 14 fewer variables. We conclude that violating the independence assumption does not strongly affect the results and, furthermore, that trying to control it by using less data can harm the performance of the calibrated models, although a better sampling method does lead to equivalent results with a far smaller dataset.
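
The abstract does not specify the four sampling methods, so the following is only a minimal sketch of the general idea, under stated assumptions. It keeps one randomly chosen observation per client (a hypothetical scheme, not one of the authors' methods) and computes the two discrimination measures the abstract names, the area under the ROC curve and the Kolmogorov–Smirnov statistic. The function names and the `(client_id, features, label)` row layout are illustrative assumptions.

```python
import random
from collections import defaultdict


def one_observation_per_client(rows, seed=0):
    """Keep a single randomly chosen observation per client.

    `rows` is a list of (client_id, features, label) tuples. This is a
    hypothetical scheme for illustration, not one of the paper's methods.
    """
    rng = random.Random(seed)
    by_client = defaultdict(list)
    for row in rows:
        by_client[row[0]].append(row)
    return [rng.choice(group) for group in by_client.values()]


def auc(scores, labels):
    """Area under the ROC curve, computed as the probability that a
    positive case scores above a negative case (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ks_statistic(scores, labels):
    """Largest gap between the empirical score CDFs of the two classes."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    gap = 0.0
    for t in sorted(set(scores)):
        cdf_pos = sum(s <= t for s in pos) / len(pos)
        cdf_neg = sum(s <= t for s in neg) / len(neg)
        gap = max(gap, abs(cdf_pos - cdf_neg))
    return gap


# Toy data: three clients, two of them observed more than once.
rows = [
    ("a", [0.1], 0), ("a", [0.2], 0),
    ("b", [0.9], 1),
    ("c", [0.4], 0), ("c", [0.5], 1),
]
sample = one_observation_per_client(rows)
```

In a comparison like the one the abstract describes, a model would be fitted on `sample` and on the full `rows`, and both would be scored on a separate validation set with `auc` and `ks_statistic`.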

This record has no associated files available for download.

More information

Published date: 10 October 2013
Organisations: Southampton Business School

Identifiers

Local EPrints ID: 396678
URI: http://eprints.soton.ac.uk/id/eprint/396678
PURE UUID: 03ef5e98-8425-45f8-b6f0-4c1127784347
ORCID for Cristian Bravo: orcid.org/0000-0003-1579-1565

Catalogue record

Date deposited: 10 Jun 2016 10:19
Last modified: 15 Mar 2024 03:33

Contributors

Author: Miguel Biron
Author: Cristian Bravo
