University of Southampton Institutional Repository

Boosting reinforcement learning with strongly delayed feedback through auxiliary short delays


Wu, Qingyuan, Zhan, Simon Sinong, Wang, Yixuan, Wang, Yuhui, Lin, Chung-Wei, Lv, Chen, Zhu, Qi, Schmidhuber, Jurgen and Huang, Chao (2024) Boosting reinforcement learning with strongly delayed feedback through auxiliary short delays. Salakhutdinov, Ruslan, Kolter, Zico, Heller, Katherine, Weller, Adrian, Scarlett, Jonathan and Berkenkamp, Felix (eds.) In Proceedings of the 41st International Conference on Machine Learning. vol. 235, JMLR.org. 26 pp.

Record type: Conference or Workshop Item (Paper)

Abstract

Reinforcement learning (RL) is challenging in the common case of delays between events and their sensory perceptions. State-of-the-art (SOTA) state augmentation techniques either suffer from state space explosion or performance degeneration in stochastic environments. To address these challenges, we present a novel Auxiliary-Delayed Reinforcement Learning (AD-RL) method that leverages auxiliary tasks involving short delays to accelerate RL with long delays, without compromising performance in stochastic environments. Specifically, AD-RL learns a value function for short delays and uses bootstrapping and policy improvement techniques to adjust it for long delays. We theoretically show that this can greatly reduce the sample complexity. On deterministic and stochastic benchmarks, our method significantly outperforms the SOTAs in both sample efficiency and policy performance. Code is available at https://github.com/QingyuanWuNothing/AD-RL.

Text
2735_Boosting_Reinforcement_Le - Version of Record
Restricted to Repository staff only

More information

Published date: 21 July 2024
Venue - Dates: ICML'24: International Conference on Machine Learning, Vienna, Austria, 2024-07-21 - 2024-07-27

Identifiers

Local EPrints ID: 500737
URI: http://eprints.soton.ac.uk/id/eprint/500737
PURE UUID: 78e55ad0-94f2-4a55-87d8-7e0cf011dd8d
ORCID for Chao Huang: orcid.org/0000-0002-9300-1787

Catalogue record

Date deposited: 12 May 2025 16:41
Last modified: 13 May 2025 02:09

Contributors

Author: Qingyuan Wu
Author: Simon Sinong Zhan
Author: Yixuan Wang
Author: Yuhui Wang
Author: Chung-Wei Lin
Author: Chen Lv
Author: Qi Zhu
Author: Jurgen Schmidhuber
Author: Chao Huang
Editor: Ruslan Salakhutdinov
Editor: Zico Kolter
Editor: Katherine Heller
Editor: Adrian Weller
Editor: Jonathan Scarlett
Editor: Felix Berkenkamp
