Selected modelling problems in credit scoring
Bijak, Katarzyna (2013) Selected modelling problems in credit scoring. University of Southampton, School of Management, Doctoral Thesis, 179pp.
Record type: Thesis (Doctoral)
Abstract
This research addresses three selected modelling problems that occur in credit scoring: segmentation, modelling Loss Given Default (LGD) for unsecured loans, and affordability assessment.

Segmentation, i.e. dividing the population into a number of groups and building separate scorecards for them, is usually expected to improve model performance. The most common statistical methods for segmentation are two-step approaches, in which logistic regression follows Classification and Regression Trees (CART) or Chi-square Automatic Interaction Detection (CHAID) trees. In this research, these approaches and a simultaneous method, Logistic Trees with Unbiased Selection (LOTUS), in which segmentation and scorecards are optimised at the same time, are applied to data provided by two UK banks and a European credit bureau. Model performance measures are compared to assess any improvement due to segmentation.

For unsecured retail loans, LGD is often found difficult to model. In the frequentist (classical) two-step approach, a first model (logistic regression) separates positive LGD values from zeroes, and a second model (e.g. linear regression) estimates the positive values. Alternatively, one can build a Bayesian hierarchical model, which is a more coherent approach. In this research, Bayesian methods and the frequentist approach are applied to data on personal loans provided by a UK bank. The Bayesian model generates an individual predictive distribution of LGD for each loan; its potential applications include approximating downturn LGD and stress testing LGD under Basel II.

An applicant's affordability (ability to repay) is often checked using a simple, static approach. In this research, a theoretical framework for dynamic affordability assessment is proposed. Both income and consumption are allowed to vary over time, and their changes are described with random effects models for panel data. On this basis, a simulation is run for a given applicant: the ability to repay is checked over the life of the loan and for all possible instalment amounts. As a result, a probability of default is assigned to each amount, which can help find the maximum affordable instalment. This is illustrated with an example based on artificial data.
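The frequentist two-step LGD approach mentioned in the abstract can be sketched as follows. This is a minimal illustration on synthetic data, not the models fitted in the thesis (which uses personal-loan data from a UK bank); the variable names and the simulated data-generating process are assumptions for demonstration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

# Synthetic stand-in data: a few borrower/loan characteristics and an LGD in [0, 1].
rng = np.random.default_rng(42)
n = 5000
X = rng.normal(size=(n, 3))
p_positive = 1 / (1 + np.exp(-(0.5 * X[:, 0] - 0.3 * X[:, 1])))
is_positive = rng.binomial(1, p_positive)            # 1 if LGD > 0, 0 if LGD == 0
lgd = np.where(is_positive == 1,
               np.clip(0.4 + 0.2 * X[:, 2] + rng.normal(0, 0.1, n), 0, 1),
               0.0)

# Step 1: logistic regression separates zero LGDs from positive LGDs.
stage1 = LogisticRegression().fit(X, is_positive)

# Step 2: linear regression estimates LGD on the loans with positive losses.
mask = lgd > 0
stage2 = LinearRegression().fit(X[mask], lgd[mask])

# Combined prediction: P(LGD > 0) * E[LGD | LGD > 0].
expected_lgd = stage1.predict_proba(X)[:, 1] * stage2.predict(X)
print("Mean predicted LGD:", expected_lgd.mean())
```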
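The dynamic affordability idea can likewise be sketched on artificial data, as in the thesis itself. The random-walk dynamics, the three-missed-payments default definition, and all parameter values below are illustrative assumptions standing in for the random effects panel models estimated in the thesis.

```python
import numpy as np

rng = np.random.default_rng(1)

def default_probability(instalment, term=36, n_sims=2000,
                        income0=2000.0, consumption0=1500.0,
                        sigma_income=100.0, sigma_consumption=80.0):
    """Estimate the probability of default (here: 3 consecutive missed
    instalments) for a given instalment amount, by simulating income and
    consumption paths over the life of the loan."""
    defaults = 0
    for _ in range(n_sims):
        income, consumption = income0, consumption0
        missed = 0
        for _ in range(term):
            income += rng.normal(0, sigma_income)            # income shock
            consumption += rng.normal(0, sigma_consumption)  # consumption shock
            surplus = income - consumption
            missed = missed + 1 if surplus < instalment else 0
            if missed >= 3:                                  # default event
                defaults += 1
                break
    return defaults / n_sims

# Scan candidate instalment amounts; the maximum affordable instalment is the
# largest amount whose default probability stays below a chosen threshold.
for amount in range(100, 900, 100):
    print(amount, default_probability(amount))
```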
Text: Final PhD thesis - Kasia Bijak.pdf (Other)
More information
Published date: August 2013
Organisations: University of Southampton, Southampton Business School
Identifiers
Local EPrints ID: 359285
URI: http://eprints.soton.ac.uk/id/eprint/359285
PURE UUID: 0ded7cc3-dc24-4f2e-bfe2-2604f4da9417
Catalogue record
Date deposited: 16 Dec 2013 13:45
Last modified: 15 Mar 2024 03:36
Contributors
Thesis advisor: Lyn C. Thomas