Lifetime policy reuse and the importance of task capacity
Bossens, David M.
Sobey, Adam J.
Bossens, David M. and Sobey, Adam J. (2023) Lifetime policy reuse and the importance of task capacity. AI Communications. (doi:10.3233/AIC-230040).
Abstract
A long-standing challenge in artificial intelligence is lifelong reinforcement learning, where learners are given many tasks in sequence and must transfer knowledge between tasks while avoiding catastrophic forgetting. Policy reuse and other multi-policy reinforcement learning techniques can learn multiple tasks but may generate many policies. This paper presents two novel contributions, namely 1) Lifetime Policy Reuse, a model-agnostic policy reuse algorithm that avoids generating many policies by optimising a fixed number of near-optimal policies through a combination of policy optimisation and adaptive policy selection; and 2) the task capacity, a measure of the maximal number of tasks that a policy can accurately solve. Comparing two state-of-the-art base-learners, the results demonstrate the importance of Lifetime Policy Reuse and task-capacity-based pre-selection on an 18-task partially observable Pacman domain and a Cartpole domain of up to 125 tasks.
Text: manuscript_AICfinal (Accepted Manuscript)
More information
Accepted/In Press date: 3 October 2023
e-pub ahead of print date: 24 October 2023
Additional Information:
This research was funded by the Engineering and Physical Sciences Research Council and by Lloyd’s Register Foundation.
Identifiers
Local EPrints ID: 483617
URI: http://eprints.soton.ac.uk/id/eprint/483617
ISSN: 0921-7126
PURE UUID: 9902098f-8a2a-4751-99c4-9ed5ee3e21fe
Catalogue record
Date deposited: 02 Nov 2023 17:49
Last modified: 18 Mar 2024 03:07