University of Southampton Institutional Repository

Multi-trainer binary feedback interactive reinforcement learning




Guo, Zhaori, Norman, Timothy J. and Gerding, Enrico H. (2024) Multi-trainer binary feedback interactive reinforcement learning. Annals of Mathematics and Artificial Intelligence. (doi:10.1007/s10472-024-09956-4).

Record type: Article

Abstract

Interactive reinforcement learning is an effective way to train agents via human feedback. However, it often requires the trainer (a human who provides feedback to the agent) to know the correct action for the agent. If the trainer is unreliable, wrong feedback may hinder the agent's training. Moreover, there is no consensus on the best form of human feedback in interactive reinforcement learning. To address these problems, we explore the performance of binary reward as the reward form, and we propose a novel interactive reinforcement learning system called Multi-Trainer Interactive Reinforcement Learning (MTIRL), which aggregates binary feedback from multiple imperfect trainers into a reliable reward for agent training in a reward-sparse environment. A review model in MTIRL further corrects unreliable rewards. Our experiments evaluating reward forms show that binary reward outperforms other forms, including ranking, scaling, and state-value rewards. Our question-answer experiments show that our aggregation method outperforms state-of-the-art aggregation methods, including majority voting, weighted voting, and Bayesian aggregation. Finally, grid-world experiments show that the policy trained by MTIRL with the review model is closer to the optimal policy than one trained without it.
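The abstract compares the proposed aggregation method against baselines such as majority voting and reliability-weighted voting over binary (+1/-1) trainer feedback. As a rough illustration of those two baselines only (not the paper's MTIRL algorithm, whose details are not given here), a minimal sketch might look like this; the function names and reliability values are hypothetical:

```python
# Illustrative sketch of two baseline aggregation methods for binary
# trainer feedback (+1 approve / -1 disapprove). NOT the MTIRL method
# from the paper; names and reliability estimates are hypothetical.

def majority_vote(feedback):
    """Return +1 if most trainers approve, -1 if most disapprove, 0 on a tie."""
    total = sum(feedback)
    return (total > 0) - (total < 0)

def weighted_vote(feedback, reliabilities):
    """Weight each trainer's binary vote by an estimated reliability in [0, 1]."""
    total = sum(w * f for f, w in zip(feedback, reliabilities))
    return (total > 0) - (total < 0)

votes = [+1, -1, +1]          # binary feedback from three trainers
weights = [0.9, 0.4, 0.7]     # hypothetical per-trainer reliability estimates
print(majority_vote(votes))          # -> 1
print(weighted_vote(votes, weights)) # -> 1  (0.9 - 0.4 + 0.7 > 0)
```

A weighted scheme can overrule a numerical majority when the dissenting trainers are estimated to be more reliable, which is the intuition behind moving beyond simple majority voting.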

This record has no associated files available for download.

More information

Accepted/In Press date: 17 September 2024
Published date: 2 October 2024
Keywords: 68T01, Human-in-the-loop reinforcement learning, Interactive reinforcement learning, Multiple people decision

Identifiers

Local EPrints ID: 504717
URI: http://eprints.soton.ac.uk/id/eprint/504717
ISSN: 1012-2443
PURE UUID: 2015ace6-8c9d-4c99-82ec-8496d007716d
ORCID for Zhaori Guo: orcid.org/0000-0002-1957-7059
ORCID for Timothy J. Norman: orcid.org/0000-0002-6387-4034
ORCID for Enrico H. Gerding: orcid.org/0000-0001-7200-552X

Catalogue record

Date deposited: 18 Sep 2025 16:40
Last modified: 19 Sep 2025 01:50


Contributors

Author: Zhaori Guo
Author: Timothy J. Norman
Author: Enrico H. Gerding



