University of Southampton Institutional Repository

Explaining the future context of deep reinforcement learning agents’ decision-making

Towers, Mark (2025) Explaining the future context of deep reinforcement learning agents’ decision-making. University of Southampton, Doctoral Thesis, 180pp.

Record type: Thesis (Doctoral)

Abstract

Deep reinforcement learning has achieved superhuman performance in numerous environments. Despite these advances, there are limited tools for understanding why agents make the decisions they do. A central issue is how specific actions enable agents to collect rewards or achieve goals far in the future. Understanding this future context of an agent's decision-making is critical to explaining its choices. To date, however, little research has explored such temporal explanations. We therefore investigate how to explain the future context of agents' decision-making, both for pretrained agents, using a memory of past behaviour, and for architecturally modified agents that explicitly output their next $N$ expected rewards. We evaluate these explanations with user surveys in Atari environments, finding them preferred over, and more effective than, baseline algorithms.

We develop three novel video-based explanations for pretrained agents. Two of these require no domain knowledge, as is common in prior work, while the third incorporates limited domain knowledge. These approaches are the first local explanations to use a memory of how an agent acted in the past to explain its current decision-making. We collect similar decisions from past states or skills and showcase them to users to help visualise an action's possible future outcomes.
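
The abstract does not describe the retrieval mechanism, so the following Python sketch is only illustrative of the general idea: storing past decisions alongside pointers into recorded episodes, then retrieving the most similar past states in which the same action was chosen. The DecisionMemory class, the embedding-based similarity, and the (episode_id, timestep) clip references are assumptions made for illustration, not the author's implementation.

    import numpy as np

    class DecisionMemory:
        """Hypothetical memory of past agent decisions.

        Each entry stores a state embedding, the action taken, and a pointer
        (episode_id, timestep) into recorded gameplay, so a short video clip
        of what happened next can be replayed to the user.
        """

        def __init__(self):
            self.embeddings = []   # 1-D numpy arrays summarising past states
            self.actions = []      # action chosen in each past state
            self.clip_refs = []    # (episode_id, timestep) into stored videos

        def add(self, embedding, action, episode_id, timestep):
            self.embeddings.append(np.asarray(embedding, dtype=np.float32))
            self.actions.append(action)
            self.clip_refs.append((episode_id, timestep))

        def similar_decisions(self, embedding, action, k=3):
            """Return clip references for the k most similar past states
            in which the agent chose the same action."""
            query = np.asarray(embedding, dtype=np.float32)
            candidates = [i for i, a in enumerate(self.actions) if a == action]
            if not candidates:
                return []
            dists = [np.linalg.norm(self.embeddings[i] - query) for i in candidates]
            nearest = np.argsort(dists)[:k]
            return [self.clip_refs[candidates[i]] for i in nearest]

The returned clip references would then be rendered as short videos of how similar past decisions played out, giving users a visual sense of the action's possible future outcomes.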

We identify that deep reinforcement learning agents implicitly compute their beliefs about the future when predicting their rewards (i.e., Q-value or State-value). From this, we prove that an agent's Q-value can be transformed into computing the expected reward for each future timestep. This opens up the opportunity to explain an agent's confidence and decision-making for individual future timesteps. This innovation allows us to propose a novel training algorithm referred to as Temporal Reward Decomposition, where agents output their expected rewards for the next N timesteps. From this, we pioneer three novel explanations for users with a strong understanding of reinforcement learning. For non-technical users, we propose a fourth explanation using Large Language Models to summarise the future rewards in natural language.
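
As a sketch of the kind of decomposition described (using standard reinforcement learning notation not given in the abstract: discount factor $\gamma$ and reward $r_{t+i}$ received $i$ steps after time $t$), the Q-value is an expectation over a discounted sum of future rewards, and by linearity of expectation it splits into per-timestep expected rewards plus a tail term. The exact form used by Temporal Reward Decomposition, including how rewards beyond the $N$-th timestep are handled, is defined in the thesis itself:

    Q^{\pi}(s_t, a_t)
      = \mathbb{E}_{\pi}\!\left[ \sum_{i=0}^{\infty} \gamma^{i} r_{t+i} \,\middle|\, s_t, a_t \right]
      = \sum_{i=0}^{N-1} \gamma^{i}\, \mathbb{E}_{\pi}\!\left[ r_{t+i} \mid s_t, a_t \right]
        \;+\; \gamma^{N}\, \mathbb{E}_{\pi}\!\left[ \sum_{i=N}^{\infty} \gamma^{i-N} r_{t+i} \,\middle|\, s_t, a_t \right].

An agent trained in this way would output the $N$ per-timestep terms $\mathbb{E}_{\pi}[r_{t+i} \mid s_t, a_t]$ individually rather than only their sum, which is what enables explanations of expected reward and confidence at individual future timesteps.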

We conduct two user surveys to evaluate our temporal explanations against two baseline algorithms. In the second, we propose a novel evaluation methodology inspired by debugging, where users must identify an unknown agent's goal from an explanation of its decision-making. We find that in both user surveys, our temporal explanations were preferred and, in the second, were significantly more effective for determining an agent's goal.

Text: archival phd_thesis (Version of Record)
Available under License University of Southampton Thesis Licence.
Download (9MB)

Text: Final-thesis-submission-Examination-Mr-Mark-Towers
Restricted to Repository staff only

More information

Published date: 2025
Keywords: Explainable Reinforcement Learning

Identifiers

Local EPrints ID: 502074
URI: http://eprints.soton.ac.uk/id/eprint/502074
PURE UUID: d0f52859-23c5-4527-a42b-de0d9fc5b223
ORCID for Mark Towers: orcid.org/0000-0002-2609-2041
ORCID for Tim Norman: orcid.org/0000-0002-6387-4034
ORCID for Chris Freeman: orcid.org/0000-0003-0305-9246

Catalogue record

Date deposited: 16 Jun 2025 16:38
Last modified: 11 Sep 2025 03:18

Contributors

Author: Mark Towers
Thesis advisor: Tim Norman
Thesis advisor: Yali Du
Thesis advisor: Chris Freeman
