University of Southampton Institutional Repository

Multi-advisor sequential decision-making without ground truth


Guo, Zhaori (2023) Multi-advisor sequential decision-making without ground truth. University of Southampton, Doctoral Thesis, 121pp.

Record type: Thesis (Doctoral)

Abstract

Decision-making from potentially unreliable advice is an important problem in many settings, such as lending, investment, ensemble machine learning, and crowd-sourcing. In such settings, advice can often be elicited from multiple advisors and aggregated to make a more reliable decision, especially when the decisions have important consequences. Moreover, similar decisions are often made over time using the same set of advisors, so the reliability, or trustworthiness, of advisors can be learned, updated over time, and used to improve decision accuracy. However, this is challenging when there is no access to the ground truth, i.e., when there is no information about the true or ideal decision, even after the fact, or when this information is only available after a considerable delay (e.g., in the case of a loan default). While there is extensive work on decision-making from multiple advisors, existing work focuses on single-shot, static decision-making and does not account for the sequential nature of decisions. To address this gap, this thesis studies settings where multiple decisions are made sequentially over time, without access to the ground truth and without prior information about advisors' trustworthiness. We refer to this as the multi-advisor sequential decision-making problem. To address this problem, we first propose the Multi-Advisor Binary Sequential Decision-Making method (MABSDM). In this setting, a decision-maker must decide on a sequence of problems, each of which contains the essential factors for making a decision. For each problem, a set of advisors each recommends one of two options, and the decision-maker must aggregate their advice to reach a decision. Specifically, MABSDM (1) models the advisors' trustworthiness sequentially without prior information, and (2) makes optimal decisions from the advice and trustworthiness of multiple imperfect advisors without ground truth.
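To make the idea concrete, here is a minimal sketch of trust-aware binary aggregation without ground truth — not the thesis's actual MABSDM update, but an illustrative simplification: a trust-weighted log-odds vote, with Beta-style trust updates that use agreement with the aggregated decision as a noisy proxy for correctness. The function names (`aggregate`, `update_trust`) and the Beta(2, 1) prior are our assumptions, not the thesis's.

```python
import math

def aggregate(advice, trust):
    """Trust-weighted log-odds vote over binary advice (0 or 1)."""
    score = sum((1 if a == 1 else -1) * math.log(t / (1 - t))
                for a, t in zip(advice, trust))
    return 1 if score > 0 else 0

def update_trust(advice, decision, alpha, beta):
    """Beta-style update with no ground truth: agreement with the
    aggregated decision stands in for being correct."""
    for i, a in enumerate(advice):
        if a == decision:
            alpha[i] += 1
        else:
            beta[i] += 1
    return [al / (al + be) for al, be in zip(alpha, beta)]

# Three advisors with a weak optimistic prior (Beta(2, 1), i.e. trust 2/3);
# the third advisor disagrees with the other two on every problem.
alpha, beta = [2.0, 2.0, 2.0], [1.0, 1.0, 1.0]
trust = [a / (a + b) for a, b in zip(alpha, beta)]
decisions = []
for advice in [(1, 1, 0), (0, 0, 1), (1, 1, 1)]:
    d = aggregate(advice, trust)
    trust = update_trust(advice, d, alpha, beta)
    decisions.append(d)
print(decisions, [round(t, 2) for t in trust])  # [1, 0, 1] [0.83, 0.83, 0.5]
```

After three rounds the two consistent advisors have earned higher trust than the dissenting one, so later votes weight them more heavily — the self-reinforcing dynamic that makes the no-ground-truth setting both tractable and delicate.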
Our results show that MABSDM achieves higher decision accuracy than benchmarks built from state-of-the-art models, including Bayesian aggregation, weighted voting, and a Beta-distribution trustworthiness model. Moreover, MABSDM outperforms the benchmarks at modelling the trustworthiness of advisors in most of our experiments. Second, we apply MABSDM to an interactive reinforcement learning setting by proposing the Multi-Advisor Interactive Reinforcement Learning system (MAIRL). Interactive reinforcement learning is an effective way to accelerate an agent's learning through feedback from human advisors, but unreliable human feedback often hinders the agent's training. To address this problem, we introduce multiple advisors, turning the problem into a multi-advisor binary sequential decision-making problem. Specifically, MAIRL uses MABSDM to aggregate binary feedback from multiple imperfect advisors into a reliable reward for agent training in a reward-sparse environment. In addition, a review model in MAIRL can correct unreliable rewards derived from advisors. Our experiments comparing feedback forms show that binary feedback outperforms the alternatives, including ranking feedback, scaling feedback, and state-value feedback. Finally, grid-world experiments show that the policy trained by MAIRL with the review model is closer to the optimal policy than the one trained without it. Third, we propose a utility-maximization method based on MABSDM, namely Multi-Advisor Dynamic Decision-Making (MADDM). In practice, a correct decision often yields a large reward while a failed decision incurs a significant cost, and gathering advice from a set of advisors is itself costly. We therefore balance the value of decisions against the cost of querying advisors in the multi-advisor binary sequential decision-making problem.
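The interactive-RL idea described above — aggregated binary feedback standing in for a sparse environment reward — might be sketched as follows. This is a simplification under our own assumptions (a plain trust-weighted vote rather than the thesis's MABSDM-based aggregation, and hypothetical names `feedback_reward`, `q_update`), shown only to illustrate where the aggregated signal enters a tabular Q-learning update.

```python
def feedback_reward(feedback, trust):
    """Map binary good/bad feedback from several advisors to a scalar
    reward via a trust-weighted vote (a simplification, for illustration)."""
    score = sum((1 if f else -1) * t for f, t in zip(feedback, trust))
    return 1.0 if score > 0 else -1.0

def q_update(Q, s, a, s_next, reward, lr=0.1, gamma=0.95):
    """One tabular Q-learning step, driven by advisor feedback rather
    than the environment's sparse reward."""
    best_next = max(Q[s_next].values(), default=0.0)
    Q[s][a] += lr * (reward + gamma * best_next - Q[s][a])

# Two of three advisors judge the action good; their combined trust wins.
Q = {0: {"right": 0.0}, 1: {}}
r = feedback_reward([True, True, False], [0.9, 0.7, 0.6])
q_update(Q, 0, "right", 1, r)
print(r, Q[0]["right"])  # 1.0 0.1
```

A review model of the kind the thesis describes would sit between these two steps, revising rewards that later evidence suggests were wrong.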
The challenge is thus to find an advisor selection strategy that retrieves reliable advice while maximizing the overall utility, defined as the expected return of the decision-making process. To address this challenge, MADDM selects advisors by balancing the advisors' costs, their trustworthiness, and the value of the problem, and then uses MABSDM to make the optimal decision. We evaluate our algorithm through several numerical experiments; the results show that our approach outperforms two other methods that combine state-of-the-art models. Finally, we extend MABSDM to a general method, namely Multi-Advisor Sequential Decision-Making (MASDM), which can decide among multiple options rather than only binary ones. We evaluate MASDM through extensive experiments in simulated environments and apply it to ensemble machine learning in experiments on the MNIST database. The results show that MASDM achieves better decision accuracy and trustworthiness assessment than five benchmarks built from state-of-the-art methods, with a maximum improvement of 22% in accuracy over Bayesian aggregation.

Text
Doctoral_Thesis_Zhaori_Guo_PDFA - Version of Record
Available under License University of Southampton Thesis Licence.
Download (9MB)
Text
Final-thesis-submission-Examination-Mr-Zhaori-Guo
Restricted to Repository staff only
Available under License University of Southampton Thesis Licence.

More information

Published date: 9 November 2023

Identifiers

Local EPrints ID: 484248
URI: http://eprints.soton.ac.uk/id/eprint/484248
PURE UUID: ca2d0fa3-307e-4d65-ad73-a1d06d07e2e4
ORCID for Zhaori Guo: orcid.org/0000-0002-1957-7059
ORCID for Tim Norman: orcid.org/0000-0002-6387-4034

Catalogue record

Date deposited: 13 Nov 2023 18:49
Last modified: 18 Mar 2024 03:59
Contributors

Author: Zhaori Guo
Thesis advisor: Tim Norman
Thesis advisor: Nir Oren

Download statistics

Downloads from ePrints over the past year. Other digital versions may also be available to download e.g. from the publisher's website.

Contact ePrints Soton: eprints@soton.ac.uk

ePrints Soton supports OAI 2.0 with a base URL of http://eprints.soton.ac.uk/cgi/oai2