Explaining an agent’s future beliefs through temporally decomposing future reward estimators
Towers, Mark, Du, Yali, Freeman, Chris and Norman, Tim (2024) Explaining an agent’s future beliefs through temporally decomposing future reward estimators. In Endriss, Ulle, Melo, Francisco S., Bach, Kerstin, Bugarín-Diz, Alberto, Alonso-Moral, José M., Barro, Senén and Heintz, Fredrik (eds.) ECAI 2024: 27th European Conference on Artificial Intelligence, 19–24 October 2024, Santiago de Compostela, Spain - Including 13th Conference on Prestigious Applications of Intelligent Systems (PAIS 2024). Vol. 392, IOS Press, pp. 2790-2797. (doi:10.3233/FAIA240814)
Record type: Conference or Workshop Item (Paper)
Abstract
Future reward estimation is a core component of reinforcement learning agents, i.e., Q-value and state-value functions that predict an agent’s sum of future rewards. Their scalar output, however, obscures when and what individual future rewards the agent expects to receive. We address this by modifying an agent’s future reward estimator to predict its next N expected rewards, referred to as Temporal Reward Decomposition (TRD). This unlocks novel explanations of agent behaviour. Through TRD we can: estimate when an agent expects to receive a reward, the value of that reward, and the agent’s confidence in receiving it; measure an input feature’s temporal importance to the agent’s action decisions; and predict the influence of different actions on future rewards. Furthermore, we show that DQN agents trained on Atari environments can be efficiently retrained to incorporate TRD with minimal impact on performance.
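To make the idea concrete, below is a minimal sketch of a TRD-style network head in PyTorch. It is an illustration based only on the abstract, not the authors' code: the class name TRDQNetwork, the layer sizes, and the reward-vector length (n_rewards, the N of the abstract) are all assumptions. The head predicts a vector of the next N expected rewards per action, and a conventional scalar Q-value can be recovered as a discounted sum over that vector.

import torch
import torch.nn as nn

class TRDQNetwork(nn.Module):
    # Hypothetical DQN head modified for Temporal Reward Decomposition:
    # instead of one scalar Q-value per action, it predicts the expected
    # reward at each of the next `n_rewards` future timesteps.
    def __init__(self, obs_dim: int, n_actions: int, n_rewards: int = 40,
                 gamma: float = 0.99):
        super().__init__()
        self.n_actions = n_actions
        self.n_rewards = n_rewards
        self.gamma = gamma
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, n_actions * n_rewards),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Shape (batch, n_actions, n_rewards): per-action expected reward
        # at each future timestep, enabling "when/what reward" explanations.
        return self.net(obs).view(-1, self.n_actions, self.n_rewards)

    def q_values(self, obs: torch.Tensor) -> torch.Tensor:
        # Collapse the decomposition back into scalar Q-values via a
        # discounted sum, so standard greedy action selection still works.
        discounts = self.gamma ** torch.arange(self.n_rewards, dtype=torch.float32)
        return (self.forward(obs) * discounts).sum(dim=-1)

Inspecting forward(obs)[0, action] then shows, step by step, when and how much reward the agent expects after taking that action, which is the kind of temporal explanation the paper describes.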
Text: 2408.08230v1 - Author's Original. Available under License Other.
Text: FAIA-392-FAIA240814 - Version of Record.
More information
Published date: 19 October 2024
Additional Information: 7 pages + 3 pages of supplementary material. Published at ECAI 2024.
Keywords: cs.AI, cs.LG
Identifiers
Local EPrints ID: 495494
URI: http://eprints.soton.ac.uk/id/eprint/495494
PURE UUID: 8927bdae-03c7-4397-9ecf-123ff7967011
Catalogue record
Date deposited: 14 Nov 2024 18:06
Last modified: 11 Dec 2024 02:39
Contributors
Author: Mark Towers
Author: Yali Du
Author: Chris Freeman
Author: Tim Norman
Editor: Ulle Endriss
Editor: Francisco S. Melo
Editor: Kerstin Bach
Editor: Alberto Bugarín-Diz
Editor: José M. Alonso-Moral
Editor: Senén Barro
Editor: Fredrik Heintz