ME: modelling ethical values for value alignment
Rigley, Eryn, Chapman, Adriane, Evers, Christine and McNeill, Will (2025) ME: modelling ethical values for value alignment. In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, pp. 27608-27616. AAAI Press. (doi:10.1609/aaai.v39i26.34974)
Record type: Conference or Workshop Item (Paper)
Abstract
Value alignment, at the intersection of moral philosophy and AI safety, is dedicated to ensuring that artificially intelligent (AI) systems align with a certain set of values. One challenge facing value alignment researchers is accurately translating these values into a machine-readable format. In the case of reinforcement learning (RL), a popular method within value alignment, this requires designing a reward function that accurately defines the value of all state-action pairs. It is common for programmers to hand-set and manually tune these values. In this paper, we examine the challenges of hand-programming values into reward functions for value alignment, and propose mathematical models as an alternative grounding for reward function design in ethical scenarios. Experimental results demonstrate that our modelled-ethics approach offers a more consistent alternative and outperforms our hand-programmed reward functions.
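The abstract contrasts hand-set, manually tuned reward values with rewards grounded in an explicit mathematical model of value. The sketch below is purely illustrative and is not the paper's actual formulation: it assumes a toy value model (expected harm avoided weighted against effort cost), and all state, action, and parameter names are hypothetical.

```python
# Illustrative sketch only: hand-tuned reward table vs. a reward derived from
# a simple (assumed) mathematical model of value. Not the authors' model.
from dataclasses import dataclass

# Hand-programmed approach: every state-action value is set and tuned manually.
HAND_TUNED_REWARD = {
    ("bystander_at_risk", "intervene"): 1.0,   # tuned by trial and error
    ("bystander_at_risk", "ignore"): -5.0,
    ("no_one_at_risk", "intervene"): -0.1,
    ("no_one_at_risk", "ignore"): 0.0,
}

@dataclass
class State:
    harm_avoided: float  # expected harm prevented by acting, in [0, 1]
    effort_cost: float   # cost to the agent of acting, in [0, 1]

def modelled_reward(state: State, act: bool, harm_weight: float = 2.0) -> float:
    """Reward grounded in an explicit (toy) value model rather than per-pair
    hand tuning: value = weighted harm avoided minus effort cost."""
    if not act:
        # Failing to act forgoes the weighted benefit of preventing harm.
        return -harm_weight * state.harm_avoided
    return harm_weight * state.harm_avoided - state.effort_cost

if __name__ == "__main__":
    s = State(harm_avoided=0.8, effort_cost=0.2)
    print(HAND_TUNED_REWARD[("bystander_at_risk", "intervene")])  # hand-set: 1.0
    print(modelled_reward(s, act=True))    # follows from the model: 1.4
    print(modelled_reward(s, act=False))   # follows from the model: -1.6
```

Under a modelled approach of this kind, changing an ethical assumption means changing one model parameter (here, the hypothetical harm weight) rather than re-tuning every state-action entry by hand, which is the consistency argument the abstract gestures at.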
Text: Rigley_Submission96-2 - Accepted Manuscript. Available under License Other.
More information
Accepted/In Press date: 14 December 2024
Published date: 11 April 2025
Identifiers
Local EPrints ID: 501673
URI: http://eprints.soton.ac.uk/id/eprint/501673
PURE UUID: 80b52fae-c8fa-4ab3-af8d-981b5a3a73fe
Catalogue record
Date deposited: 05 Jun 2025 16:51
Last modified: 03 Sep 2025 02:03
Contributors
Author:
Eryn Rigley
Author:
Adriane Chapman
Author:
Christine Evers
Author:
Will McNeill