How to train your agent: active learning from human preferences and justifications in safety-critical environments
Kazantzidis, Ilias, Norman, Timothy, Du, Yali and Freeman, Christopher
(2022)
How to train your agent: active learning from human preferences and justifications in safety-critical environments.
International Conference on Autonomous Agents and Multi-Agent Systems 2022, Auckland, New Zealand.
09 - 13 May 2022.
Record type: Conference or Workshop Item (Paper)
Abstract
Training reinforcement learning agents in real-world environments is costly, particularly for safety-critical applications. Human input can enable an agent to learn a good policy while avoiding unsafe actions, but at the cost of bothering the human with repeated queries. We present a model for safe learning from human input in safety-critical environments that minimises bother cost. Our model, JPAL-HA, provides an efficient mechanism for harnessing human preferences and justifications to significantly improve safety during learning without increasing the number of interactions with a user. We demonstrate this with both simulation and human experiments.
Text: JPAL_AAMAS_2022_Extended_Abstract - Version of Record
More information
Accepted/In Press date: 2022
Published date: 9 May 2022
Venue - Dates:
International Conference on Autonomous Agents and Multi-Agent Systems 2022, Auckland, New Zealand, 2022-05-09 - 2022-05-13
Keywords:
paper, human-agent interaction, reinforcement learning, supervised learning, human-robot interface
Identifiers
Local EPrints ID: 454808
URI: http://eprints.soton.ac.uk/id/eprint/454808
PURE UUID: eb507c70-5ab0-427d-9507-632bc06f9eb5
Catalogue record
Date deposited: 24 Feb 2022 21:49
Last modified: 17 Mar 2024 04:04
Contributors
Author:
Ilias Kazantzidis
Author:
Timothy Norman
Author:
Yali Du
Author:
Christopher Freeman