The University of Southampton
University of Southampton Institutional Repository

Explaining an agent’s future beliefs through temporally decomposing future reward estimators


Towers, Mark, Du, Yali, Freeman, Chris and Norman, Tim (2024) Explaining an agent’s future beliefs through temporally decomposing future reward estimators. Endriss, Ulle, Melo, Francisco S., Bach, Kerstin, Bugarín-Diz, Alberto, Alonso-Moral, José M., Barro, Senén and Heintz, Fredrik (eds.) In ECAI 2024: 27th European Conference on Artificial Intelligence, 19–24 October 2024, Santiago de Compostela, Spain - Including 13th Conference on Prestigious Applications of Intelligent Systems (PAIS 2024). vol. 392, IOS Press. pp. 2790-2797. (doi:10.3233/FAIA240814).

Record type: Conference or Workshop Item (Paper)

Abstract

Future reward estimators, i.e., Q-value and state-value functions, are a core component of reinforcement learning agents, predicting an agent’s sum of future rewards. Their scalar output, however, obscures when the agent expects to receive individual future rewards and what those rewards are. We address this by modifying an agent’s future reward estimator to predict its next N expected rewards, an approach referred to as Temporal Reward Decomposition (TRD). This unlocks novel explanations of agent behaviour. Through TRD we can: estimate when an agent may expect to receive a reward, the value of the reward and the agent’s confidence in receiving it; measure an input feature’s temporal importance to the agent’s action decisions; and predict the influence of different actions on future rewards. Furthermore, we show that DQN agents trained on Atari environments can be efficiently retrained to incorporate TRD with minimal impact on performance.
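As a rough illustration of the idea in the abstract, the sketch below splits one reward sequence into its first N per-step terms plus a single remainder term, so that the components sum back to the usual scalar return. It is not the authors' implementation: the function names, the choice to discount each component, the extra remainder element, and the use of one fixed reward sequence in place of a learned expectation are all assumptions made for this example.

```python
from typing import List


def discounted_return(rewards: List[float], gamma: float) -> float:
    """Scalar sum of discounted rewards, the quantity a Q-value estimates."""
    return sum((gamma ** k * r for k, r in enumerate(rewards)), 0.0)


def temporal_decomposition(rewards: List[float], gamma: float, n: int) -> List[float]:
    """Hypothetical decomposition: n per-step discounted rewards plus a remainder.

    Assumption for this sketch: element k (k < n) is gamma**k * r_k, and the
    final element collects every discounted reward from step n onwards.
    """
    head = [gamma ** k * r for k, r in enumerate(rewards[:n])]
    head += [0.0] * (n - len(head))  # pad when the sequence is shorter than n
    tail = sum((gamma ** k * r for k, r in enumerate(rewards) if k >= n), 0.0)
    return head + [tail]


if __name__ == "__main__":
    rewards = [0.0, 0.0, 1.0, 0.0, 2.0]
    components = temporal_decomposition(rewards, gamma=0.5, n=3)
    print(components)       # [0.0, 0.0, 0.25, 0.125]
    print(sum(components))  # matches discounted_return(rewards, 0.5)
```

Read this way, the position of a non-zero component indicates when a reward is expected and its magnitude indicates how much, which is the information a single scalar estimate hides.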

Text: 2408.08230v1 - Author's Original (816kB). Available under License Other.
Text: FAIA-392-FAIA240814 - Version of Record (1MB).

More information

Published date: 19 October 2024
Additional Information: 7 pages + 3 pages of supplementary material. Published at ECAI 2024
Keywords: cs.AI, cs.LG

Identifiers

Local EPrints ID: 495494
URI: http://eprints.soton.ac.uk/id/eprint/495494
PURE UUID: 8927bdae-03c7-4397-9ecf-123ff7967011
ORCID for Mark Towers: orcid.org/0000-0002-2609-2041
ORCID for Chris Freeman: orcid.org/0000-0003-0305-9246
ORCID for Tim Norman: orcid.org/0000-0002-6387-4034

Catalogue record

Date deposited: 14 Nov 2024 18:06
Last modified: 11 Dec 2024 02:39

Contributors

Author: Mark Towers
Author: Yali Du
Author: Chris Freeman
Author: Tim Norman
Editor: Ulle Endriss
Editor: Francisco S. Melo
Editor: Kerstin Bach
Editor: Alberto Bugarín-Diz
Editor: José M. Alonso-Moral
Editor: Senén Barro
Editor: Fredrik Heintz
