The University of Southampton
University of Southampton Institutional Repository

Foundation models for credit risk prediction. A game changer?

arXiv

Abstract

Predictive models play a pivotal role in credit risk management, guiding critical decisions through accurate estimation of default probabilities and losses. Extensive research has introduced new modeling techniques, complemented by large-scale benchmarking studies that consolidate the state of the art. Today, quasi-standards such as gradient-boosting models paired with SHAP explainers have emerged, yet continuous improvement of risk models remains a top priority. Concurrently, rapid advances in AI, most notably large language models, have disrupted predictive modeling paradigms. Foundation models, pretrained on extensive datasets from diverse domains, have demonstrated remarkable performance by leveraging prior knowledge. While prevalent in natural language processing and computer vision, foundation models for tabular data have only recently emerged. We conjecture that pretraining on out-of-domain data is particularly beneficial in small-data settings, such as SME lending or specialized corporate portfolios, and may help address longstanding challenges including low-default portfolios and class imbalance. This paper benchmarks recently proposed tabular foundation models against a broad set of competitors, including established and advanced machine learning techniques, across two core tasks: probability of default (PD) and loss given default (LGD) modeling. Our evaluation encompasses various datasets, performance indicators, and experimental conditions. We find that tabular foundation models generally perform best across datasets and tasks. Moreover, they offer significant improvements in predictive performance as dataset size shrinks. These results are remarkable given that the models are tested out of the box, without hyperparameter tuning, which eases use and reduces computational cost.

Text
2605.18147v1 - Author's Original
Available under License Other.

More information

Published date: 18 May 2026
Keywords: cs.LG

Identifiers

Local EPrints ID: 511815
URI: http://eprints.soton.ac.uk/id/eprint/511815
PURE UUID: 9b09ac7d-7043-4cc0-af06-6a98dd845e15
ORCID for Bart Baesens: orcid.org/0000-0002-5831-5668
ORCID for Christophe Mues: orcid.org/0000-0002-6289-5490
ORCID for Maria Oskarsdóttir: orcid.org/0000-0001-5095-5356

Catalogue record

Date deposited: 03 Jun 2026 16:50
Last modified: 04 Jun 2026 02:18

Contributors

Author: Bart Baesens
Author: Andreas Goethals
Author: Stefan Lessmann
Author: Simon De Vos
Author: Cristián Bravo
Author: David Martens
Author: Victor Medina-Olivares
Author: Christophe Mues
Author: Maria Oskarsdóttir
Author: Seppe vanden Broucke
Author: Tim Verdonck
Author: Wouter Verbeke
