Multi-advisor sequential decision-making without ground truth

Guo, Zhaori (2023) Multi-advisor sequential decision-making without ground truth. University of Southampton, Doctoral Thesis, 121pp.

Record type: Thesis (Doctoral)

Abstract

Decision-making from potentially unreliable advice is an important problem in many settings, such as lending, investment, ensemble machine learning, and crowd-sourcing. In such settings, advice can often be elicited from multiple advisors and aggregated to make a more reliable decision, especially when the decisions have important consequences. Moreover, similar decisions are often made over time using the same set of advisors. The reliability or trustworthiness of advisors can therefore be learned and updated over time and used to improve decision accuracy. However, this is challenging, especially when there is no access to the ground truth, i.e., when there is no information about the true or ideal decision, even after the fact, or when this information is only available after a considerable delay (e.g., in the case of a loan default). While there is extensive work on decision-making from multiple advisors, existing work focuses on single-shot static decision-making and does not account for the sequential nature of decisions. To fill this gap, this thesis addresses settings where multiple decisions are made sequentially over time, without access to the ground truth, and where we have no prior information about advisors' trustworthiness. We refer to this as the multi-advisor sequential decision-making problem. To address this problem, first, we propose the Multi-Advisor Binary Sequential Decision-Making method (MABSDM). In this setting, a decision-maker needs to make decisions on a sequence of problems, each of which includes the essential factors for making the decision. For each problem, a set of advisors each recommends one of two options, and the decision-maker needs to aggregate their advice to make a decision. To be specific, MABSDM (1) models the advisors' trustworthiness sequentially without prior information, and (2) makes optimal decisions from the advice and trustworthiness of multiple imperfect advisors without ground truth.
Our results show that MABSDM achieves higher decision accuracy than benchmarks built on state-of-the-art models, including Bayesian aggregation, weighted voting, and the Beta distribution trustworthiness model. In most results, MABSDM also outperforms the benchmarks in modelling the trustworthiness of advisors. Second, we apply MABSDM to an interactive reinforcement learning setting and propose the Multi-Advisor Interactive Reinforcement Learning system (MAIRL). Interactive reinforcement learning is an effective way to accelerate agent learning through feedback from human advisors. However, feedback from a human advisor who is not always reliable often hinders the agent's training. To address this problem, we introduce multiple advisors, which turns it into a multi-advisor binary sequential decision-making problem. Specifically, in MAIRL, we use MABSDM to aggregate the binary feedback from multiple imperfect advisors into a reliable reward for agent training in a reward-sparse environment. In addition, the review model in MAIRL can correct unreliable rewards from advisors. Our experiments comparing feedback forms show that binary feedback outperforms the other forms, including ranking feedback, scaling feedback, and state value feedback. Finally, grid-world experiments show that the policy trained by MAIRL with the review model is closer to the optimal policy than the policy trained without it. Third, we propose a utility maximization method based on MABSDM, namely Multi-Advisor Dynamic Decision-Making (MADDM). In practice, a correct decision often brings a large reward while a wrong decision carries a significant cost, and gathering advice from a set of advisors is itself costly. We account for the trade-off between the value of decisions and the cost of querying advisors in the multi-advisor binary sequential decision-making problem.
The challenge is therefore to find an advisor selection strategy that retrieves reliable advice and maximizes the overall utility, i.e., the expected return of decision-making. To address this challenge, MADDM selects advisors by balancing their costs, their trustworthiness, and the value of the problem, and then uses MABSDM to make the optimal decision. We evaluate the algorithm through several numerical experiments, and the results show that it outperforms two other methods that combine state-of-the-art models. Finally, we extend MABSDM to a general method, namely Multi-Advisor Sequential Decision-Making (MASDM), which can decide among multiple options rather than only two. We evaluate MASDM through extensive experiments in simulated environments and apply it to ensemble machine learning in experiments on the MNIST database. The results show that MASDM achieves better decision accuracy and trustworthiness assessment than five benchmarks that use state-of-the-art methods, with a maximum accuracy improvement of 22% over Bayesian aggregation methods.
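
To make the binary setting in the abstract concrete, the following Python sketch shows one generic way to aggregate binary advice and update trust estimates sequentially when no ground truth is ever observed. It is not MABSDM, whose details the abstract does not give. It is a plain log-odds weighted vote with soft Beta-style pseudo-counts, closer in spirit to the benchmark models the abstract names. The function names (`aggregate`, `update_trust`, `simulate`), the prior counts, and the simulated advisor accuracies are all assumptions made for illustration.

```python
import math
import random


def aggregate(advice, trust):
    """Log-odds weighted vote over binary advice (each item is 0 or 1).

    Returns (decision, p_one), where p_one is the estimated probability
    that option 1 is correct, assuming advisors err independently.
    """
    log_odds = 0.0
    for a, t in zip(advice, trust):
        t = min(max(t, 1e-6), 1 - 1e-6)  # keep the logarithm finite
        weight = math.log(t / (1 - t))
        log_odds += weight if a == 1 else -weight
    p_one = 1 / (1 + math.exp(-log_odds))
    return (1 if p_one >= 0.5 else 0), p_one


def update_trust(counts, advice, p_one):
    """Soft update of [correct, wrong] pseudo-counts for each advisor.

    With no ground truth, each advisor is credited by the aggregated
    belief that the option it recommended was the right one.
    """
    for i, a in enumerate(advice):
        p_correct = p_one if a == 1 else 1 - p_one
        counts[i][0] += p_correct
        counts[i][1] += 1 - p_correct


def simulate(true_acc, n_rounds, seed=0):
    """Run the loop on simulated advisors with hidden accuracies true_acc.

    The hidden truth is used only to generate advice and to score the
    decisions afterwards; the learner itself never sees it.
    """
    rng = random.Random(seed)
    # Assumed mildly optimistic prior (trust 0.6), so initial weights are positive.
    counts = [[1.5, 1.0] for _ in true_acc]
    n_correct = 0
    for _ in range(n_rounds):
        truth = rng.randint(0, 1)
        advice = [truth if rng.random() < acc else 1 - truth for acc in true_acc]
        trust = [c / (c + w) for c, w in counts]
        decision, p_one = aggregate(advice, trust)
        update_trust(counts, advice, p_one)
        n_correct += int(decision == truth)
    return n_correct / n_rounds, [c / (c + w) for c, w in counts]
```

Calling, for example, `simulate([0.9, 0.8, 0.6], 500)` returns the fraction of correct decisions and the learned trust estimates. Because trust is updated from the aggregated decision itself, a scheme like this can reinforce its own early mistakes, which is one reason a more careful treatment of the no-ground-truth setting is needed.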

Text

Doctoral_Thesis_Zhaori_Guo_PDFA - Version of Record

Available under License University of Southampton Thesis Licence.

Text

Final-thesis-submission-Examination-Mr-Zhaori-Guo

Restricted to Repository staff only

Available under License University of Southampton Thesis Licence.