Multi-trainer binary feedback interactive reinforcement learning
Guo, Zhaori, Norman, Timothy J. and Gerding, Enrico H. (2024) Multi-trainer binary feedback interactive reinforcement learning. Annals of Mathematics and Artificial Intelligence. (doi:10.1007/s10472-024-09956-4).
Abstract
Interactive reinforcement learning is an effective way to train agents via human feedback. However, it often requires the trainer (a human who provides feedback to the agent) to know the correct action for the agent. If the trainer is unreliable, incorrect feedback may hinder the agent’s training. Moreover, there is no consensus on the best form of human feedback in interactive reinforcement learning. To address these problems, we first evaluate binary reward as the form of feedback. We then propose a novel interactive reinforcement learning system, Multi-Trainer Interactive Reinforcement Learning (MTIRL), which aggregates binary feedback from multiple imperfect trainers into a reliable reward for training an agent in a reward-sparse environment. A review model in MTIRL further corrects unreliable rewards. Our experiments comparing reward forms show that binary reward outperforms other reward forms, including ranking reward, scaling reward, and state value reward. Our question-answer experiments show that our aggregation method outperforms state-of-the-art aggregation methods, including majority voting, weighted voting, and Bayesian aggregation. Finally, grid-world experiments show that the policy trained by MTIRL with the review model is closer to the optimal policy than the policy trained without it.
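To illustrate two of the baseline aggregation methods named in the abstract (majority voting and weighted voting), the following minimal Python sketch combines binary feedback from several trainers into a single reward. It is not the MTIRL aggregation method or its review model, and it does not cover the Bayesian baseline. The function names, the +1/-1 encoding of approval and disapproval, and the log-odds reliability weights (which assume each trainer's accuracy is known and that trainers err independently) are all assumptions made for this example.

```python
import math
from collections.abc import Sequence


def _sign(score: float) -> int:
    """Map an aggregate score to a binary reward: +1, -1, or 0 on a tie."""
    if score > 0:
        return 1
    if score < 0:
        return -1
    return 0


def majority_vote(feedback: Sequence[int]) -> int:
    """Majority voting: every trainer's +1/-1 label counts equally."""
    return _sign(sum(feedback))


def reliability_weight(accuracy: float) -> float:
    """Log-odds weight for a trainer assumed correct with probability `accuracy`.

    Assumption for illustration: trainers are independent with known accuracy
    in (0, 1). A trainer at 0.5 gets weight 0; one below 0.5 gets a negative
    weight, so its vote is effectively flipped.
    """
    if not 0.0 < accuracy < 1.0:
        raise ValueError("accuracy must lie strictly between 0 and 1")
    return math.log(accuracy / (1.0 - accuracy))


def weighted_vote(feedback: Sequence[int], weights: Sequence[float]) -> int:
    """Weighted voting: each trainer's +1/-1 label is scaled by its weight."""
    if len(feedback) != len(weights):
        raise ValueError("need exactly one weight per trainer")
    return _sign(sum(w * f for f, w in zip(feedback, weights)))


# Hypothetical example: one reliable trainer approves, two noisy trainers disapprove.
feedback = [1, -1, -1]
weights = [reliability_weight(p) for p in (0.95, 0.6, 0.6)]
print(majority_vote(feedback))           # -1: the two noisy trainers win
print(weighted_vote(feedback, weights))  # 1: the reliable trainer outweighs them
```

With equal weights the two rules coincide; the log-odds weighting lets one accurate trainer override several noisy ones. Both baselines, however, need the trainers' reliability to be known or estimated in advance, which is the difficulty that an aggregator for imperfect trainers has to address.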
This record has no associated files available for download.
More information
Accepted/In Press date: 17 September 2024
Published date: 2 October 2024
Keywords:
68T01, Human-in-the-loop reinforcement learning, Interactive reinforcement learning, Multiple people decision
Identifiers
Local EPrints ID: 504717
URI: http://eprints.soton.ac.uk/id/eprint/504717
ISSN: 1012-2443
PURE UUID: 2015ace6-8c9d-4c99-82ec-8496d007716d
Catalogue record
Date deposited: 18 Sep 2025 16:40
Last modified: 19 Sep 2025 01:50
Contributors
Author:
Zhaori Guo
Author:
Enrico H. Gerding