Reinforcement learning for bond portfolio management: an actor-critic approach
Nunes, Manuel
af597793-a85a-463c-9d12-0ae4be7e0a69
Gerding, Enrico
d9e92ee5-1a8c-4467-a689-8363e7743362
McGroarty, Frank
693a5396-8e01-4d68-8973-d74184c03072
Niranjan, Mahesan
5cbaeea8-7288-4b55-a89c-c43d212ddd4f
Nunes, Manuel, Gerding, Enrico, McGroarty, Frank and Niranjan, Mahesan (2026) Reinforcement learning for bond portfolio management: an actor-critic approach. European Journal of Finance. (doi:10.1080/1351847X.2025.2605061).
Abstract
Portfolio management poses unique challenges for traditional forecasting methods due to its complex, sequential decision-making process. This study leverages reinforcement learning (RL) to address these challenges, focusing on fixed income portfolio management. We develop a novel autonomous RL system using a custom environment for bond exchange-traded fund (ETF) dynamics and the Deep Deterministic Policy Gradient (DDPG) algorithm. Unlike prior studies that merely report algorithmic instability, our work systematically addresses this issue by introducing a robust agent selection process during training. To illustrate the practical benefits, we construct a simple equally weighted ensemble of selected agents that outperforms the static buy-and-hold benchmark by 4.3% and achieves a total return comparable to the portfolio's best-performing asset, while exhibiting superior risk characteristics during periods of market stress. Our approach also incorporates methodological innovations, including a scaled reward structure to improve learning in bond markets. While instability is observed in the DDPG algorithm, our results demonstrate that this challenge can be systematically mitigated through robust agent selection and ensemble methods. These findings establish RL as a powerful tool for financial strategies where direct forecasting is complex and uncertain, offering a practical framework for implementation in fixed income markets.
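The record itself contains no code, but as a rough illustration of the ensemble construction described in the abstract, the sketch below (in Python) averages the portfolio weights proposed by several selected agents and applies a scaled reward. The softmax normalisation, the scale factor, and all function names are assumptions made for illustration only, not the authors' implementation.

import numpy as np

def softmax(x):
    # Map an agent's raw actor outputs to non-negative weights that sum to one.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def ensemble_weights(actor_outputs):
    # Equally weighted ensemble: average the per-agent allocations.
    # actor_outputs: list of 1-D arrays, one per selected agent, each giving
    # that agent's proposed allocation across the bond ETFs.
    per_agent = np.array([softmax(a) for a in actor_outputs])
    return per_agent.mean(axis=0)

def scaled_reward(portfolio_return, scale=100.0):
    # Scaled reward (illustrative): amplify small daily bond returns so the
    # critic receives a usable learning signal; the paper's exact scaling is
    # not reproduced here.
    return scale * portfolio_return

# Example: three agents proposing allocations over four bond ETFs.
agents = [np.array([0.2, 0.1, -0.3, 0.5]),
          np.array([0.0, 0.4, 0.1, 0.2]),
          np.array([-0.1, 0.3, 0.2, 0.1])]
weights = ensemble_weights(agents)   # combined portfolio weights, sum to 1
reward = scaled_reward(0.0004)       # small daily return becomes 0.04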
Text
Reinforcement learning for bond portfolio management: an actor-critic approach
- Version of Record
More information
Accepted/In Press date: 11 December 2025
e-pub ahead of print date: 4 January 2026
Keywords:
portfolio management, reinforcement learning, actor-critic algorithm, deep deterministic policy gradient, fixed income
Identifiers
Local EPrints ID: 507872
URI: http://eprints.soton.ac.uk/id/eprint/507872
ISSN: 1351-847X
PURE UUID: 636030ff-9ec1-495d-95f2-c2191d32509b
Catalogue record
Date deposited: 07 Jan 2026 00:47
Last modified: 08 Jan 2026 03:14
Contributors
Author:
Manuel Nunes
Author:
Enrico Gerding
Author:
Frank McGroarty
Author:
Mahesan Niranjan