University of Southampton Institutional Repository

Reinforcement learning for bond portfolio management: an actor-critic approach


Nunes, Manuel, Gerding, Enrico, McGroarty, Frank and Niranjan, Mahesan (2026) Reinforcement learning for bond portfolio management: an actor-critic approach. European Journal of Finance. (doi:10.1080/1351847X.2025.2605061).

Record type: Article

Abstract

Portfolio management poses unique challenges for traditional forecasting methods due to its complex, sequential decision-making process. This study leverages reinforcement learning (RL) to address these challenges, focusing on fixed income portfolio management. We develop a novel autonomous RL system using a custom environment for bond exchange-traded fund (ETF) dynamics and the Deep Deterministic Policy Gradient (DDPG) algorithm. Unlike prior studies that merely report algorithmic instability, our work systematically addresses this issue by introducing a robust agent selection process during training. To illustrate the practical benefits, we construct a simple equally weighted ensemble of selected agents that outperforms the static buy-and-hold benchmark by 4.3% and achieves a total return comparable to the portfolio's best-performing asset, while exhibiting superior risk characteristics during periods of market stress. Our methodology incorporates further innovations, including a scaled reward structure to improve learning in bond markets. While instability is observed in the DDPG algorithm, our results demonstrate that this challenge can be mitigated through robust agent selection and ensemble methods. These findings establish RL as a powerful tool for financial strategies where direct forecasting is complex and uncertain, offering a practical framework for implementation in fixed income markets.
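The record itself contains no code, but as a purely illustrative sketch, the Python snippet below shows one way the equally weighted ensemble idea described in the abstract could combine the allocations of several trained DDPG actors into a single bond ETF portfolio. Everything here is an assumption for illustration: StubActor stands in for a trained actor network, the softmax mapping to long-only weights and the asset/state dimensions are invented, and none of it is the authors' implementation.

    import numpy as np

    def softmax(x):
        # Map unconstrained actor outputs to long-only weights summing to one
        # (an assumed action parameterisation, not necessarily the paper's).
        e = np.exp(x - x.max())
        return e / e.sum()

    class StubActor:
        # Hypothetical stand-in for a trained DDPG actor network pi(state).
        def __init__(self, seed, n_assets):
            self.rng = np.random.default_rng(seed)
            self.n_assets = n_assets

        def act(self, state):
            # A real actor would map the market state to an allocation;
            # random logits are used here only so the sketch is runnable.
            return softmax(self.rng.normal(size=self.n_assets))

    def ensemble_allocation(actors, state):
        # Equally weighted ensemble: average the selected agents' allocations.
        # An average of points on the probability simplex stays on the
        # simplex, so the result is itself a valid long-only portfolio.
        return np.mean([a.act(state) for a in actors], axis=0)

    actors = [StubActor(seed=s, n_assets=4) for s in range(10)]
    state = np.zeros(16)  # placeholder market-state vector
    weights = ensemble_allocation(actors, state)
    print(weights, weights.sum())  # weights sum to 1.0

Because each agent's output lies on the simplex, the equal-weight average is automatically a valid allocation, which is one reason this kind of ensembling is a natural way to smooth out instability across individually trained DDPG agents.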

Text
Reinforcement learning for bond portfolio management an actor-critic approach - Version of Record
Available under License Creative Commons Attribution.
Download (3MB)

More information

Accepted/In Press date: 11 December 2025
e-pub ahead of print date: 4 January 2026
Keywords: portfolio management, reinforcement learning, actor-critic algorithm, deep deterministic policy gradient, fixed income

Identifiers

Local EPrints ID: 507872
URI: http://eprints.soton.ac.uk/id/eprint/507872
ISSN: 1351-847X
PURE UUID: 636030ff-9ec1-495d-95f2-c2191d32509b
ORCID for Manuel Nunes: orcid.org/0000-0002-7116-5502
ORCID for Enrico Gerding: orcid.org/0000-0001-7200-552X
ORCID for Frank McGroarty: orcid.org/0000-0003-2962-0927
ORCID for Mahesan Niranjan: orcid.org/0000-0001-7021-140X

Catalogue record

Date deposited: 07 Jan 2026 00:47
Last modified: 08 Jan 2026 03:14

Contributors

Author: Manuel Nunes
Author: Enrico Gerding
Author: Frank McGroarty
Author: Mahesan Niranjan
