University of Southampton Institutional Repository

MTIRL: Multi-trainer interactive reinforcement learning system


Guo, Zhaori, Norman, Timothy and Gerding, Enrico (2022) MTIRL: Multi-trainer interactive reinforcement learning system. Aydoğan, Reyhan, Criado, Natalia, Sanchez-Anguix, Victor, Lang, Jérôme and Serramia, Marc (eds.) In PRIMA 2022: Principles and Practice of Multi-Agent Systems - 24th International Conference, Proceedings. vol. 13753 LNAI, 227-242. (doi:10.1007/978-3-031-21203-1_14).

Record type: Conference or Workshop Item (Paper)

Abstract

Interactive reinforcement learning can effectively facilitate agent training via human feedback. However, such methods often require the human teacher to know the correct action that the agent should take. In other words, if the human teacher is not always reliable, they will not be able to guide the agent consistently through its training. In this paper, we propose a more effective interactive reinforcement learning system that introduces multiple trainers, namely Multi-Trainer Interactive Reinforcement Learning (MTIRL), which aggregates the binary feedback from multiple imperfect trainers into a more reliable reward for training an agent in a reward-sparse environment. In particular, our trainer feedback aggregation experiments show that our aggregation method achieves the best accuracy when compared with majority voting, weighted voting, and the Bayesian method. Finally, we conduct a grid-world experiment to show that the policy trained by MTIRL with the review model is closer to the optimal policy than that trained without a review model.
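
For readers unfamiliar with the baselines named in the abstract, the following is a minimal sketch of how binary trainer feedback might be combined by majority voting, weighted voting, and a simple Bayesian rule. It is not the MTIRL aggregation method, which this record does not describe. The function names, the +1/-1 feedback encoding, the trainer weights, and the assumption of independent trainers with known accuracies are all illustrative assumptions.

```python
import math
from typing import Sequence


def _sign(x: float) -> int:
    """Return +1, -1, or 0 (tie) for a summed vote."""
    return (x > 0) - (x < 0)


def majority_vote(feedback: Sequence[int]) -> int:
    """Aggregate feedback in {+1, -1} by simple majority."""
    return _sign(sum(feedback))


def weighted_vote(feedback: Sequence[int], weights: Sequence[float]) -> int:
    """Aggregate feedback with one (hypothetical) trust weight per trainer."""
    return _sign(sum(w * f for f, w in zip(feedback, weights)))


def bayesian_vote(
    feedback: Sequence[int], accuracies: Sequence[float], prior: float = 0.5
) -> float:
    """Posterior probability that the action was correct.

    Assumes trainers err independently and that each trainer's accuracy
    (strictly between 0 and 1) is known; both are simplifying assumptions.
    """
    log_odds = math.log(prior / (1.0 - prior))
    for f, p in zip(feedback, accuracies):
        llr = math.log(p / (1.0 - p))  # log-likelihood ratio of one vote
        log_odds += llr if f > 0 else -llr
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    votes = [1, 1, -1]  # two trainers approve, one disapproves
    print(majority_vote(votes))
    print(weighted_vote(votes, [0.2, 0.2, 0.9]))
    print(bayesian_vote(votes, [0.6, 0.6, 0.9]))
```

In this toy case the three rules can disagree: the majority approves, while a single highly trusted dissenting trainer can outweigh two weakly trusted ones under the weighted and Bayesian rules.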

Text
MTIRL_PRIMA - Accepted Manuscript
Restricted to Repository staff only

More information

e-pub ahead of print date: 12 November 2022
Published date: 16 November 2022
Additional Information: Publisher Copyright: © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
Venue - Dates: PRIMA 2022: Principles and Practice of Multi-Agent Systems, Valencia, Spain, 2022-11-16 - 2022-11-18
Keywords: Human-in-the-loop reinforcement learning, Interactive reinforcement learning, Multiple people decision

Identifiers

Local EPrints ID: 471577
URI: http://eprints.soton.ac.uk/id/eprint/471577
ISSN: 0302-9743
PURE UUID: 347e7e74-7940-487d-bdc5-ffd70db13235
ORCID for Zhaori Guo: orcid.org/0000-0002-1957-7059
ORCID for Timothy Norman: orcid.org/0000-0002-6387-4034
ORCID for Enrico Gerding: orcid.org/0000-0001-7200-552X

Catalogue record

Date deposited: 14 Nov 2022 17:31
Last modified: 17 Mar 2024 04:04

Contributors

Author: Zhaori Guo
Author: Timothy Norman
Author: Enrico Gerding
Editor: Reyhan Aydoğan
Editor: Natalia Criado
Editor: Victor Sanchez-Anguix
Editor: Jérôme Lang
Editor: Marc Serramia
