The University of Southampton
University of Southampton Institutional Repository

Lifetime policy reuse and the importance of task capacity

0921-7126
Bossens, David M.
519875d6-653c-47e6-8caa-9d1f2eb24246
Sobey, Adam J.
e850606f-aa79-4c99-8682-2cfffda3cd28

Bossens, David M. and Sobey, Adam J. (2023) Lifetime policy reuse and the importance of task capacity. AI Communications. (doi:10.3233/AIC-230040).

Record type: Article

Abstract

A long-standing challenge in artificial intelligence is lifelong reinforcement learning, where learners are given many tasks in sequence and must transfer knowledge between tasks while avoiding catastrophic forgetting. Policy reuse and other multi-policy reinforcement learning techniques can learn multiple tasks but may generate many policies. This paper presents two novel contributions, namely 1) Lifetime Policy Reuse, a model-agnostic policy reuse algorithm that avoids generating many policies by optimising a fixed number of near-optimal policies through a combination of policy optimisation and adaptive policy selection; and 2) the task capacity, a measure for the maximal number of tasks that a policy can accurately solve. Comparing two state-of-the-art base-learners, the results demonstrate the importance of Lifetime Policy Reuse and task capacity based pre-selection on an 18-task partially observable Pacman domain and a Cartpole domain of up to 125 tasks.
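As a rough illustration of the idea of keeping a fixed number of policies and adaptively selecting one per task, the sketch below uses a simple epsilon-greedy rule over per-task running mean returns. This is an assumption made for illustration only and is not taken from the paper: the class name `FixedPolicyPool`, its methods, and the epsilon-greedy rule are all hypothetical, and the policy optimisation step itself is left out.

```python
import random
from collections import defaultdict


class FixedPolicyPool:
    """Hypothetical sketch: a fixed-size pool of policies with
    epsilon-greedy selection per task (not the paper's algorithm)."""

    def __init__(self, num_policies, epsilon=0.1, seed=0):
        self.num_policies = num_policies
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # Running mean return and visit count for every (task, policy) pair.
        self.mean_return = defaultdict(lambda: [0.0] * num_policies)
        self.count = defaultdict(lambda: [0] * num_policies)

    def select(self, task_id):
        """Return the index of the policy to run on this task."""
        # Explore: occasionally try a random policy from the pool.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.num_policies)
        # Exploit: pick the policy with the best mean return so far
        # (ties resolve to the lowest index).
        means = self.mean_return[task_id]
        return max(range(self.num_policies), key=lambda i: means[i])

    def update(self, task_id, policy_idx, episode_return):
        """Fold one episode's return into the running mean.

        In a full system the selected policy would also be trained on
        the episode here; that optimisation step is omitted.
        """
        self.count[task_id][policy_idx] += 1
        n = self.count[task_id][policy_idx]
        mean = self.mean_return[task_id][policy_idx]
        self.mean_return[task_id][policy_idx] = mean + (episode_return - mean) / n
```

Because the pool size is fixed up front, the number of policies never grows with the number of tasks; only the per-task selection statistics do.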

Text
manuscript_AICfinal - Accepted Manuscript
Available under License Creative Commons Attribution.

More information

Accepted/In Press date: 3 October 2023
e-pub ahead of print date: 24 October 2023
Additional Information: This research was funded by the Engineering and Physical Sciences Research Council and by Lloyd’s Register Foundation.

Identifiers

Local EPrints ID: 483617
URI: http://eprints.soton.ac.uk/id/eprint/483617
ISSN: 0921-7126
PURE UUID: 9902098f-8a2a-4751-99c4-9ed5ee3e21fe
ORCID for Adam J. Sobey: orcid.org/0000-0001-6880-8338

Catalogue record

Date deposited: 02 Nov 2023 17:49
Last modified: 18 Mar 2024 03:07


