MTIRL: Multi-trainer interactive reinforcement learning system
Interactive reinforcement learning can effectively facilitate agent training via human feedback. However, such methods often require the human teacher to know the correct action that the agent should take. In other words, a human teacher who is not always reliable cannot consistently guide the agent through its training. In this paper, we propose a more effective interactive reinforcement learning system that introduces multiple trainers, namely Multi-Trainer Interactive Reinforcement Learning (MTIRL), which aggregates the binary feedback from multiple imperfect trainers into a more reliable reward for training an agent in a reward-sparse environment. In particular, our trainer feedback aggregation experiments show that our aggregation method achieves the best accuracy when compared with majority voting, weighted voting, and the Bayesian method. Finally, we conduct a grid-world experiment to show that the policy trained by MTIRL with the review model is closer to the optimal policy than the policy trained without a review model.
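For orientation, the two simplest baselines named in the abstract, majority voting and weighted voting over binary trainer feedback, can be sketched as follows. This is a minimal illustration and not the MTIRL aggregation method itself. The function names, the +1/-1 encoding of feedback, the tie-handling rule, and the weights are all assumptions made for this example.

```python
from typing import Sequence


def majority_vote(feedback: Sequence[int]) -> int:
    """Aggregate binary feedback (+1 = approve, -1 = disapprove).

    Returns +1 or -1 according to the majority, or 0 on a tie
    (the tie rule is an assumption of this sketch).
    """
    total = sum(feedback)
    return (total > 0) - (total < 0)


def weighted_vote(feedback: Sequence[int], weights: Sequence[float]) -> int:
    """Aggregate binary feedback with one non-negative weight per trainer.

    The weights are hypothetical trust scores; how they would be
    estimated is outside the scope of this sketch.
    """
    if len(feedback) != len(weights):
        raise ValueError("feedback and weights must have the same length")
    score = sum(f * w for f, w in zip(feedback, weights))
    return (score > 0) - (score < 0)


if __name__ == "__main__":
    votes = [1, -1, -1]
    # Two of three trainers disapprove.
    print(majority_vote(votes))
    # A highly trusted first trainer outweighs the other two.
    print(weighted_vote(votes, [3.0, 1.0, 1.0]))
```

With these inputs the two baselines disagree: the unweighted majority follows the two disapproving trainers, while the weighted vote follows the single trainer with the larger assumed weight.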
Human-in-the-loop reinforcement learning, Interactive reinforcement learning, Multiple people decision
227 - 242
Guo, Zhaori
d339a997-b5bc-46bf-a9cf-bc7726db96f1
Norman, Timothy
663e522f-807c-4569-9201-dc141c8eb50d
Gerding, Enrico
d9e92ee5-1a8c-4467-a689-8363e7743362
16 November 2022
Guo, Zhaori, Norman, Timothy and Gerding, Enrico (2022) MTIRL: Multi-trainer interactive reinforcement learning system. Aydoğan, Reyhan, Criado, Natalia, Sanchez-Anguix, Victor, Lang, Jérôme and Serramia, Marc (eds.) In PRIMA 2022: Principles and Practice of Multi-Agent Systems - 24th International Conference, Proceedings. vol. 13753 LNAI. (doi:10.1007/978-3-031-21203-1_14).
Record type:
Conference or Workshop Item
(Paper)
Text
MTIRL_PRIMA
- Accepted Manuscript
Restricted to Repository staff only
More information
e-pub ahead of print date: 12 November 2022
Published date: 16 November 2022
Additional Information:
Publisher Copyright:
© 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
Venue - Dates:
PRIMA 2022: Principles and Practice of Multi-Agent Systems, Valencia, Valencia, Spain, 2022-11-16 - 2022-11-18
Identifiers
Local EPrints ID: 471577
URI: http://eprints.soton.ac.uk/id/eprint/471577
ISSN: 0302-9743
PURE UUID: 347e7e74-7940-487d-bdc5-ffd70db13235
Catalogue record
Date deposited: 14 Nov 2022 17:31
Last modified: 06 Jun 2024 01:55
Contributors
Author:
Zhaori Guo
Author:
Timothy Norman
Author:
Enrico Gerding
Editor:
Reyhan Aydoğan
Editor:
Natalia Criado
Editor:
Victor Sanchez-Anguix
Editor:
Jérôme Lang
Editor:
Marc Serramia