Deep reinforcement learning for portfolio selection

This study proposes an advanced model-free deep reinforcement learning (DRL) framework for constructing optimal portfolio strategies in dynamic, complex, and high-dimensional financial markets. Investors' risk aversion and transaction cost constraints are embedded in an extended Markowitz mean-variance reward function, and the resulting problem is solved with a twin-delayed deep deterministic policy gradient (TD3) algorithm. The resulting DRL-TD3 portfolio is sensitive to both risk and transaction costs, and combines advanced exploration strategies with dynamic policy updates. The proposed method effectively addresses the challenges posed by high-dimensional state and action spaces in complex financial markets. The methodology is applied to two asset universes, (i) the constituents of the Dow Jones Industrial Average and (ii) the constituents of the S&P 100 index, yielding an optimal portfolio for each while flexibly controlling transaction and risk costs. The results show that the proposed DRL portfolio performs strongly relative to several competitors from the traditional and DRL literature.
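To make the idea of a risk- and transaction-cost-sensitive reward concrete, the following is a minimal sketch of one possible one-step mean-variance reward. It is not the paper's formulation: the abstract does not give the exact functional form, so the function name `mean_variance_reward`, the parameters `risk_aversion` and `cost_rate`, their default values, and the use of a proportional turnover cost are all assumptions made for illustration.

```python
from typing import Sequence


def mean_variance_reward(
    weights: Sequence[float],
    prev_weights: Sequence[float],
    asset_returns: Sequence[float],
    past_portfolio_returns: Sequence[float],
    risk_aversion: float = 1.0,  # hypothetical default, not from the paper
    cost_rate: float = 0.001,    # hypothetical proportional cost per unit turnover
) -> float:
    """Illustrative reward: portfolio return minus risk penalty minus trading cost."""
    # Portfolio return realised over the step with the new weights.
    port_ret = sum(w * r for w, r in zip(weights, asset_returns))

    # Population variance of the portfolio return history, including this step.
    history = list(past_portfolio_returns) + [port_ret]
    mean = sum(history) / len(history)
    variance = sum((x - mean) ** 2 for x in history) / len(history)

    # Turnover: total absolute change in weights caused by rebalancing.
    turnover = sum(abs(w - p) for w, p in zip(weights, prev_weights))

    # Assumed form: return - risk_aversion * variance - cost_rate * turnover.
    return port_ret - risk_aversion * variance - cost_rate * turnover
```

For example, calling `mean_variance_reward([0.5, 0.5], [1.0, 0.0], [0.02, 0.0], [0.01])` gives a portfolio return of 0.01, zero variance, and a turnover of 1.0, so the reward is about 0.009. In a TD3 setting, a scalar of this kind would be the signal returned by the environment after each rebalancing action; raising `risk_aversion` or `cost_rate` penalises volatile or high-turnover policies more heavily.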

Deep reinforcement learning, Portfolio constraint, Portfolio risk awareness, Portfolio trading, Transaction cost

10.1016/j.gfj.2024.101016

1044-0283

Jiang, Yifu

cff1ccf8-1299-45de-95ec-f449f30fa0b8

Olmo, Jose

706f68c8-f991-4959-8245-6657a591056e

Atwi, Majed

a713c2fd-6b12-412d-9065-8a72ae788ad7