The University of Southampton
University of Southampton Institutional Repository

Using machine learning to detect financial distress from sustainability reports

Qin, Songshan, Bakoush, Mohamed and McGroarty, Frank (2026) Using machine learning to detect financial distress from sustainability reports. Business Strategy and the Environment. (doi:10.1002/bse.70563).

Record type: Article

Abstract

This study examines the incremental predictive value of sustainability reports in forecasting corporate financial distress. We first construct a unique sample of 1,220 sustainability reports produced by 244 firms in the S&P 500 index between 2018 and 2022. We then employ natural language processing (NLP) techniques to extract key features from the textual content of these reports and introduce them as a novel input to financial distress prediction models. A suite of machine learning algorithms is then applied to assess predictive performance. Our results show that incorporating textual sustainability disclosures significantly improves model performance relative to using quantitative variables alone. These reports outline corporate sustainability strategies and provide additional insights that enhance the prediction of financial distress. Among the tested models, the Random Forest and XGBoost regressors perform best. We also find that the materiality of specific ESG issues for predicting financial distress varies across sectors. Overall, this study offers a framework for integrating sustainability reports and ensemble learning into corporate credit risk assessment.
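The abstract does not say which NLP features were extracted or how they were joined to the financial data. As a minimal sketch, assuming TF-IDF weights over a small, hypothetical vocabulary of ESG terms, the step of turning report text into numeric inputs and appending them to each firm-year's quantitative variables might look like this. All function names, sample reports, vocabulary terms and ratio values below are illustrative assumptions, not the authors' code or data.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case and split on non-letters (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_features(docs: list[str], vocab: list[str]) -> list[list[float]]:
    """TF-IDF scores for a fixed, hypothetical vocabulary of ESG terms.

    TF is the term's relative frequency in the report; IDF uses the smoothed
    form ln((1 + n) / (1 + df)) + 1, so terms found in every report still get
    a small positive weight.
    """
    tokenized = [tokenize(d) for d in docs]
    token_sets = [set(toks) for toks in tokenized]
    n_docs = len(docs)
    # Document frequency: in how many reports each vocabulary term appears.
    df = {t: sum(1 for s in token_sets if t in s) for t in vocab}
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid division by zero on an empty report
        rows.append([counts[t] / total * idf[t] for t in vocab])
    return rows


def combine(quant: list[list[float]], text: list[list[float]]) -> list[list[float]]:
    """Append the text features to each firm-year's quantitative variables."""
    if len(quant) != len(text):
        raise ValueError("one text row is needed per firm-year")
    return [q + t for q, t in zip(quant, text)]


# Usage with made-up inputs (hypothetical reports, vocabulary and ratios).
reports = [
    "We reduced emissions and expanded renewable energy use.",
    "Litigation and restructuring costs weighed on our climate targets.",
]
vocab = ["emissions", "renewable", "litigation", "restructuring"]
quant = [[0.42, 1.8], [0.15, 0.6]]  # e.g. a leverage ratio and a current ratio
X = combine(quant, tfidf_features(reports, vocab))
```

The combined rows could then be passed to an ensemble learner such as scikit-learn's `RandomForestRegressor` or XGBoost's `XGBRegressor` and scored against a quantitative-only baseline, which is one way to measure the incremental value of the text the study reports.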

Text
Using_ML_to_detect_financial_distress_BSE - Accepted Manuscript
Restricted to Repository staff only until 16 January 2028.

More information

Accepted/In Press date: 4 January 2026
e-pub ahead of print date: 16 January 2026
Keywords: ESG, NLP, credit risk, machine learning, sustainability, textual analysis

Identifiers

Local EPrints ID: 509383
URI: http://eprints.soton.ac.uk/id/eprint/509383
ISSN: 0964-4733
PURE UUID: 0817bc98-88bb-44c0-8ece-77c98ea3b3ef
ORCID for Mohamed Bakoush: orcid.org/0000-0001-9624-9828
ORCID for Frank McGroarty: orcid.org/0000-0003-2962-0927

Catalogue record

Date deposited: 19 Feb 2026 17:52
Last modified: 20 Feb 2026 02:58

Contributors

Author: Songshan Qin
Author: Mohamed Bakoush
Author: Frank McGroarty
