University of Southampton Institutional Repository

Multi-trainer binary feedback interactive reinforcement learning




Guo, Zhaori, Norman, Timothy J. and Gerding, Enrico H. (2024) Multi-trainer binary feedback interactive reinforcement learning. Annals of Mathematics and Artificial Intelligence. (doi:10.1007/s10472-024-09956-4).

Record type: Article

Abstract

Interactive reinforcement learning is an effective way to train agents via human feedback. However, it often requires the trainer (a human who provides feedback to the agent) to know the correct action for the agent. If the trainer is unreliable, wrong feedback may hinder the agent's training. Moreover, there is no consensus on the best form of human feedback in interactive reinforcement learning. To address these problems, we explore the performance of binary reward as the reward form, and we propose a novel interactive reinforcement learning system called Multi-Trainer Interactive Reinforcement Learning (MTIRL), which aggregates binary feedback from multiple imperfect trainers into a reliable reward for agent training in a reward-sparse environment. A review model in MTIRL further corrects unreliable rewards. Our experiments evaluating reward forms show that binary reward outperforms other forms, including ranking, scaling, and state-value rewards. Our question-answer experiments show that our aggregation method outperforms state-of-the-art aggregation methods, including majority voting, weighted voting, and Bayesian aggregation. Finally, grid-world experiments show that the policy trained by MTIRL with the review model is closer to the optimal policy than one trained without it.
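The abstract compares the proposed aggregation method against baselines such as majority voting and reliability-weighted voting over binary (+1/-1) trainer feedback. As a rough illustration of those two baselines only (not the paper's MTIRL algorithm, whose details are not given here), a minimal sketch might look like this; the function names and reliability values are hypothetical:

```python
# Illustrative sketch of two baseline aggregation methods for binary
# trainer feedback (+1 approve / -1 disapprove). NOT the MTIRL method
# from the paper; names and reliability estimates are hypothetical.

def majority_vote(feedback):
    """Return +1 if most trainers approve, -1 if most disapprove, 0 on a tie."""
    total = sum(feedback)
    return (total > 0) - (total < 0)

def weighted_vote(feedback, reliabilities):
    """Weight each trainer's binary vote by an estimated reliability in [0, 1]."""
    total = sum(w * f for f, w in zip(feedback, reliabilities))
    return (total > 0) - (total < 0)

votes = [+1, -1, +1]          # binary feedback from three trainers
weights = [0.9, 0.4, 0.7]     # hypothetical per-trainer reliability estimates
print(majority_vote(votes))          # -> 1
print(weighted_vote(votes, weights)) # -> 1  (0.9 - 0.4 + 0.7 > 0)
```

A weighted scheme can overrule a numerical majority when the dissenting trainers are estimated to be more reliable, which is the intuition behind moving beyond simple majority voting.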

This record has no associated files available for download.

More information

Accepted/In Press date: 17 September 2024
Published date: 2 October 2024
Keywords: 68T01, Human-in-the-loop reinforcement learning, Interactive reinforcement learning, Multiple people decision

Identifiers

Local EPrints ID: 504717
URI: http://eprints.soton.ac.uk/id/eprint/504717
ISSN: 1012-2443
PURE UUID: 2015ace6-8c9d-4c99-82ec-8496d007716d
ORCID for Zhaori Guo: orcid.org/0000-0002-1957-7059
ORCID for Timothy J. Norman: orcid.org/0000-0002-6387-4034
ORCID for Enrico H. Gerding: orcid.org/0000-0001-7200-552X

Catalogue record

Date deposited: 18 Sep 2025 16:40
Last modified: 19 Sep 2025 01:50


Contributors

Author: Zhaori Guo
Author: Timothy J. Norman
Author: Enrico H. Gerding



