MTIRL: Multi-trainer interactive reinforcement learning system

Interactive reinforcement learning can effectively facilitate agent training via human feedback. However, such methods often require the human teacher to know the correct action that the agent should take. In other words, if the human teacher is not always reliable, then they will not be able to guide the agent consistently through its training. In this paper, we propose a more effective interactive reinforcement learning system that introduces multiple trainers, namely Multi-Trainer Interactive Reinforcement Learning (MTIRL), which aggregates the binary feedback from multiple imperfect trainers into a more reliable reward for training an agent in a reward-sparse environment. In particular, our trainer feedback aggregation experiments show that our aggregation method achieves the best accuracy when compared with majority voting, weighted voting, and the Bayesian method. Finally, we conduct a grid-world experiment to show that the policy trained by MTIRL with the review model is closer to the optimal policy than that trained without a review model.

Keywords: Human-in-the-loop reinforcement learning, Interactive reinforcement learning, Multiple people decision

DOI: 10.1007/978-3-031-21203-1_14

ISSN: 0302-9743

Pages: 227-242

Authors: Guo, Zhaori; Norman, Timothy; Gerding, Enrico

Editors: Aydoğan, Reyhan; Criado, Natalia; Sanchez-Anguix, Victor; Lang, Jérôme; Serramia, Marc

16 November 2022

Guo, Zhaori, Norman, Timothy and Gerding, Enrico (2022) MTIRL: Multi-trainer interactive reinforcement learning system. In Aydoğan, Reyhan, Criado, Natalia, Sanchez-Anguix, Victor, Lang, Jérôme and Serramia, Marc (eds.) PRIMA 2022: Principles and Practice of Multi-Agent Systems - 24th International Conference, Proceedings. vol. 13753 LNAI, pp. 227-242. (doi:10.1007/978-3-031-21203-1_14).

Record type: Conference or Workshop Item (Paper)

Text

MTIRL_PRIMA - Accepted Manuscript

Restricted to Repository staff only

More information

e-pub ahead of print date: 12 November 2022

Published date: 16 November 2022

Venue - Dates: PRIMA 2022: Principles and Practice of Multi-Agent Systems, Valencia, Spain, 2022-11-16 - 2022-11-18
