The University of Southampton
University of Southampton Institutional Repository

Continual deep reinforcement learning with task-agnostic policy distillation


Hafez, Muhammad Burhan and Erekmen, Kerim (2024) Continual deep reinforcement learning with task-agnostic policy distillation. Scientific Reports, 14 (1), [31661]. (doi:10.1038/s41598-024-80774-8).

Record type: Article

Abstract

Central to the development of universal learning systems is the ability to solve multiple tasks without retraining from scratch when new data arrives. This is crucial because each task requires significant training time. Addressing the problem of continual learning necessitates various methods due to the complexity of the problem space. This problem space includes: (1) addressing catastrophic forgetting to retain previously learned tasks, (2) demonstrating positive forward transfer for faster learning, (3) ensuring scalability across numerous tasks, and (4) facilitating learning without requiring task labels, even in the absence of clear task boundaries. In this paper, the Task-Agnostic Policy Distillation (TAPD) framework is introduced. This framework alleviates problems (1)–(4) by incorporating a task-agnostic phase, where an agent explores its environment without any external goal and maximizes only its intrinsic motivation. The knowledge gained during this phase is later distilled for further exploration. Therefore, the agent acts in a self-supervised manner by systematically seeking novel states. By utilizing task-agnostic distilled knowledge, the agent can solve downstream tasks more efficiently, leading to improved sample efficiency. Our code is available at the repository: https://github.com/wabbajack1/TAPD.

Text: preprint-Nature SR - Accepted Manuscript (restricted to repository staff only)
Text: s41598-024-80774-8 - Version of Record (available under License Creative Commons Attribution; 4MB)

More information

Accepted/In Press date: 21 November 2024
Published date: 30 December 2024
Keywords: Continual learning, Reinforcement learning, Self-supervised learning, Task-agnostic learning

Identifiers

Local EPrints ID: 496809
URI: http://eprints.soton.ac.uk/id/eprint/496809
ISSN: 2045-2322
PURE UUID: 559956e4-6fbf-441e-8100-9d85a2e84034
ORCID for Muhammad Burhan Hafez: orcid.org/0000-0003-1670-8962

Catalogue record

Date deposited: 08 Jan 2025 07:09
Last modified: 22 Aug 2025 02:42

Contributors

Author: Muhammad Burhan Hafez
Author: Kerim Erekmen
