Applications of machine learning in consumer credit risk modelling
This thesis investigates three separate types of prediction problem, in differing contexts, with a common theme of experimental comparison of standard methods with more advanced machine learning methods. The objective is to evaluate the predictive power of machine learning methods through experiments on real-world data. The first paper applies machine learning classification methods to predict mortgage arrears. It finds that both machine learning and a flexible statistical model outperform standard approaches. This can help identify important predictive factors for the management of loan arrears within banks and loan servicers. The second paper applies both regression and classification methods to the prediction of peer-to-peer (P2P) loan returns and default using different types of information. The main findings are that linear methods perform well on several (but not all) criteria, and that whether machine learning ensemble methods perform better than individual methods depends on the performance measure used to assess them. Use of alternative text-based information does not improve predictive outcomes. As a consequence, investors can be better informed about investments in this market. The third paper uses survival analysis to predict the time to sale of property collateral used for mortgage loans. When property sales occur, a separate set of statistical and machine learning models is used to predict the haircut, or discount, between the indexed property valuation at the point of sale and the actual transaction price. Random survival forests worked well in predicting the time to sale, while deep learning, random forests, and neural network regression methods performed best in predicting the discount. Based on predictive models for these two parameters, a sensitivity analysis illustrated how predictive modelling of these parameters produces more conservative (i.e., higher) loss estimates than one current industry approach.
University of Southampton
Fitzpatrick, Trevor
b3d78774-8c4d-4f7d-875c-8483843da9ef
June 2020
Mues, Christophe
07438e46-bad6-48ba-8f56-f945bc2ff934
Fitzpatrick, Trevor (2020) Applications of machine learning in consumer credit risk modelling. University of Southampton, Doctoral Thesis, 134pp.
Record type: Thesis (Doctoral)
Text
thesis_final_electronic
- Version of Record
Text
Final_Permission to deposit thesis form_both_signed.pdf
Restricted to Repository staff only
More information
Published date: June 2020
Identifiers
Local EPrints ID: 447638
URI: http://eprints.soton.ac.uk/id/eprint/447638
PURE UUID: ce1663ce-413e-4901-96b8-11b7374a541b
Catalogue record
Date deposited: 17 Mar 2021 17:33
Last modified: 17 Mar 2024 06:26
Contributors
Author: Trevor Fitzpatrick