Using machine learning to detect financial distress from sustainability reports
Qin, Songshan, Bakoush, Mohamed and McGroarty, Frank (2026) Using machine learning to detect financial distress from sustainability reports. Business Strategy and the Environment. (doi:10.1002/bse.70563).
Abstract
This study examines the incremental predictive value of sustainability reports in forecasting corporate financial distress. We first construct a unique sample of 1,220 sustainability reports produced by 244 firms in the S&P 500 index between 2018 and 2022. We then employ natural language processing (NLP) techniques to extract key features from the text of these reports and introduce them as a novel input to financial distress prediction models. A suite of machine learning algorithms is then applied to assess predictive performance. Our results show that incorporating textual sustainability disclosures significantly improves model performance relative to using quantitative variables alone. The reports outline corporate sustainability strategies, providing additional insight that enhances the prediction of financial distress. Among the tested models, Random Forest and XGBoost regressors exhibit superior performance. We also find that the materiality of specific ESG issues in predicting financial distress varies across sectors. Overall, this study offers a framework for integrating sustainability reports and ensemble learning into corporate credit risk assessment.
Text: Using_ML_to_detect_financial_distress_BSE - Accepted Manuscript
Restricted to Repository staff only until 16 January 2028.
More information
Accepted/In Press date: 4 January 2026
e-pub ahead of print date: 16 January 2026
Keywords: ESG, NLP, credit risk, machine learning, sustainability, textual analysis
Identifiers
Local EPrints ID: 509383
URI: http://eprints.soton.ac.uk/id/eprint/509383
ISSN: 0964-4733
PURE UUID: 0817bc98-88bb-44c0-8ece-77c98ea3b3ef
Catalogue record
Date deposited: 19 Feb 2026 17:52
Last modified: 20 Feb 2026 02:58
Contributors
Author: Songshan Qin
Author: Mohamed Bakoush
Author: Frank McGroarty