Boosting reinforcement learning with strongly delayed feedback through auxiliary short delays

Reinforcement learning (RL) is challenging in the common case of delays between events and their sensory perceptions. State-of-the-art (SOTA) state augmentation techniques either suffer from state space explosion or performance degeneration in stochastic environments. To address these challenges, we present a novel Auxiliary-Delayed Reinforcement Learning (AD-RL) method that leverages auxiliary tasks involving short delays to accelerate RL with long delays, without compromising performance in stochastic environments. Specifically, AD-RL learns a value function for short delays and uses bootstrapping and policy improvement techniques to adjust it for long delays. We theoretically show that this can greatly reduce the sample complexity. On deterministic and stochastic benchmarks, our method significantly outperforms the SOTAs in both sample efficiency and policy performance. Code is available at https://github.com/QingyuanWuNothing/AD-RL.
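The state space explosion mentioned in the abstract can be illustrated with a small sketch. It assumes the standard state augmentation construction for a constant delay, in which the agent's augmented state is the most recently observed state together with the actions taken since that observation. It is not the AD-RL implementation. The names `augmented_state_count` and `DelayedObservationBuffer`, and the `default_action` padding, are hypothetical and chosen only for illustration.

```python
from collections import deque


def augmented_state_count(num_states: int, num_actions: int, delay: int) -> int:
    """Number of augmented states: one observed state plus `delay` pending actions."""
    return num_states * num_actions ** delay


class DelayedObservationBuffer:
    """Tracks the augmented state (last observed state, actions taken since)."""

    def __init__(self, initial_state, delay: int, default_action=0):
        self.last_observed = initial_state
        # Assumption: the action history is padded with a default action at the start.
        self.pending_actions = deque([default_action] * delay, maxlen=delay)

    def step(self, action, newly_observed_state):
        """Record the action just taken and the (delayed) state that just arrived."""
        self.pending_actions.append(action)  # the oldest action drops out automatically
        self.last_observed = newly_observed_state
        return self.augmented_state()

    def augmented_state(self):
        return (self.last_observed, tuple(self.pending_actions))


if __name__ == "__main__":
    # With 10 states and 4 actions, the augmented space grows exponentially in the delay.
    print(augmented_state_count(10, 4, 2))   # 160
    print(augmented_state_count(10, 4, 10))  # 10485760

    buf = DelayedObservationBuffer("s0", delay=2)
    print(buf.step(1, "s1"))  # ('s1', (0, 1))
    print(buf.step(3, "s2"))  # ('s2', (1, 3))
```

Under these assumptions, a task with a short auxiliary delay has a far smaller augmented state space than the long-delay task, which is consistent with the abstract's motivation for learning a value function on short delays first.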

JMLR.org

Wu, Qingyuan

Zhan, Simon Sinong

Wang, Yixuan

Wang, Yuhui

Lin, Chung-Wei

Lv, Chen

Zhu, Qi

Schmidhuber, Jurgen

Huang, Chao

Salakhutdinov, Ruslan

Kolter, Zico

Heller, Katherine

Weller, Adrian

Scarlett, Jonathan

Berkenkamp, Felix

21 July 2024

Wu, Qingyuan, Zhan, Simon Sinong, Wang, Yixuan, Wang, Yuhui, Lin, Chung-Wei, Lv, Chen, Zhu, Qi, Schmidhuber, Jurgen and Huang, Chao (2024) Boosting reinforcement learning with strongly delayed feedback through auxiliary short delays. Salakhutdinov, Ruslan, Kolter, Zico, Heller, Katherine, Weller, Adrian, Scarlett, Jonathan and Berkenkamp, Felix (eds.) In Proceedings of the 41st International Conference on Machine Learning. vol. 235, JMLR.org. 26 pp.

Record type: Conference or Workshop Item (Paper)

2735_Boosting_Reinforcement_Le - Version of Record

Restricted to Repository staff only

More information

Published date: 21 July 2024

Venue - Dates: ICML'24: International Conference on Machine Learning, Vienna, Austria, 2024-07-21 - 2024-07-27

Identifiers

Local EPrints ID: 500737

URI: http://eprints.soton.ac.uk/id/eprint/500737

PURE UUID: 78e55ad0-94f2-4a55-87d8-7e0cf011dd8d

ORCID for Chao Huang: orcid.org/0000-0002-9300-1787

Catalogue record

Date deposited: 12 May 2025 16:41

Last modified: 13 May 2025 02:09

Contributors

Author: Qingyuan Wu

Author: Simon Sinong Zhan

Author: Yixuan Wang

Author: Yuhui Wang

Author: Chung-Wei Lin

Author: Chen Lv

Author: Qi Zhu

Author: Jurgen Schmidhuber

Author: Chao Huang

Editor: Ruslan Salakhutdinov

Editor: Zico Kolter

Editor: Katherine Heller

Editor: Adrian Weller

Editor: Jonathan Scarlett

Editor: Felix Berkenkamp
