The University of Southampton
University of Southampton Institutional Repository

How to train your agent: active learning from human preferences and justifications in safety-critical environments


Kazantzidis, Ilias, Norman, Timothy, Du, Yali and Freeman, Christopher (2022) How to train your agent: active learning from human preferences and justifications in safety-critical environments. International Conference on Autonomous Agents and Multi-Agent Systems 2022, Auckland, New Zealand. 9-13 May 2022.

Record type: Conference or Workshop Item (Paper)

Abstract

Training reinforcement learning agents in real-world environments is costly, particularly for safety-critical applications. Human input can enable an agent to learn a good policy while avoiding unsafe actions, but at the cost of bothering the human with repeated queries. We present a model for safe learning in safety-critical environments from human input that minimises bother cost. Our model, JPAL-HA, proposes an efficient mechanism to harness human preferences and justifications to significantly improve safety during the learning process without increasing the number of interactions with a user. We show this with both simulation and human experiments.

Text: JPAL_AAMAS_2022_Extended_Abstract (Version of Record, 1MB)

More information

Accepted/In Press date: 2022
Published date: 9 May 2022
Venue - Dates: International Conference on Autonomous Agents and Multi-Agent Systems 2022, Auckland, New Zealand, 2022-05-09 - 2022-05-13
Keywords: paper, human-agent interaction, Reinforcement Learning, Supervised learning, human-robot interface

Identifiers

Local EPrints ID: 454808
URI: http://eprints.soton.ac.uk/id/eprint/454808
PURE UUID: eb507c70-5ab0-427d-9507-632bc06f9eb5
ORCID for Ilias Kazantzidis: orcid.org/0000-0002-1127-3843
ORCID for Timothy Norman: orcid.org/0000-0002-6387-4034

Catalogue record

Date deposited: 24 Feb 2022 21:49
Last modified: 17 Mar 2024 04:04

Contributors

Author: Ilias Kazantzidis
Author: Timothy Norman
Author: Yali Du
Author: Christopher Freeman
