Reinforcement learning for bond portfolio management: an actor-critic approach
Nunes, Manuel
af597793-a85a-463c-9d12-0ae4be7e0a69
Gerding, Enrico
d9e92ee5-1a8c-4467-a689-8363e7743362
McGroarty, Frank
693a5396-8e01-4d68-8973-d74184c03072
Niranjan, Mahesan
5cbaeea8-7288-4b55-a89c-c43d212ddd4f
Nunes, Manuel, Gerding, Enrico, McGroarty, Frank and Niranjan, Mahesan (2026) Reinforcement learning for bond portfolio management: an actor-critic approach. European Journal of Finance. (doi:10.1080/1351847X.2025.2605061).
Abstract
Portfolio management poses unique challenges for traditional forecasting methods due to its complex, sequential decision-making process. This study leverages reinforcement learning (RL) to address these challenges, focusing on fixed income portfolio management. We develop a novel autonomous RL system using a custom environment for bond exchange-traded fund (ETF) dynamics and the Deep Deterministic Policy Gradient (DDPG) algorithm. Unlike prior studies that merely report algorithmic instability, our work systematically addresses this issue by introducing a robust agent selection process during training. To illustrate the practical benefits, we construct a simple equally weighted ensemble of selected agents that outperforms the static buy-and-hold benchmark by 4.3% and achieves a total return comparable to the portfolio's best-performing asset, while exhibiting superior risk characteristics during periods of market stress. Our approach also incorporates methodological innovations, including a scaled reward structure to improve learning in bond markets. While instability is observed in the DDPG algorithm, our results demonstrate that this challenge can be systematically mitigated through robust agent selection and ensemble methods. These findings establish RL as a powerful tool for financial strategies where direct forecasting is complex and uncertain, offering a practical framework for implementation in fixed income markets.
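The record itself contains no code, but as a rough illustration of the ensemble construction described in the abstract, the sketch below (in Python) averages the portfolio weights proposed by several selected agents and applies a scaled reward. The softmax normalisation, the scale factor, and all function names are assumptions made for illustration only, not the authors' implementation.

import numpy as np

def softmax(x):
    # Map an agent's raw actor outputs to non-negative weights that sum to one.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def ensemble_weights(actor_outputs):
    # Equally weighted ensemble: average the per-agent allocations.
    # actor_outputs: list of 1-D arrays, one per selected agent, each giving
    # that agent's proposed allocation across the bond ETFs.
    per_agent = np.array([softmax(a) for a in actor_outputs])
    return per_agent.mean(axis=0)

def scaled_reward(portfolio_return, scale=100.0):
    # Scaled reward (illustrative): amplify small daily bond returns so the
    # critic receives a usable learning signal; the paper's exact scaling is
    # not reproduced here.
    return scale * portfolio_return

# Example: three agents proposing allocations over four bond ETFs.
agents = [np.array([0.2, 0.1, -0.3, 0.5]),
          np.array([0.0, 0.4, 0.1, 0.2]),
          np.array([-0.1, 0.3, 0.2, 0.1])]
weights = ensemble_weights(agents)   # combined portfolio weights, sum to 1
reward = scaled_reward(0.0004)       # small daily return becomes 0.04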
Text
Reinforcement learning for bond portfolio management: an actor-critic approach
- Version of Record
More information
Accepted/In Press date: 11 December 2025
e-pub ahead of print date: 4 January 2026
Keywords:
portfolio management, reinforcement learning, actor-critic algorithm, deep deterministic policy gradient, fixed income
Identifiers
Local EPrints ID: 507872
URI: http://eprints.soton.ac.uk/id/eprint/507872
ISSN: 1351-847X
PURE UUID: 636030ff-9ec1-495d-95f2-c2191d32509b
Catalogue record
Date deposited: 07 Jan 2026 00:47
Last modified: 08 Jan 2026 03:14
Contributors
Author:
Manuel Nunes
Author:
Enrico Gerding
Author:
Frank McGroarty
Author:
Mahesan Niranjan