ME: modelling ethical values for value alignment
Rigley, Eryn, Chapman, Adriane, Evers, Christine and McNeill, Will (2025) ME: modelling ethical values for value alignment. In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, pp. 27608-27616. AAAI Press. (doi:10.1609/aaai.v39i26.34974)
Record type: Conference or Workshop Item (Paper)
Abstract
Value alignment, at the intersection of moral philosophy and AI safety, is dedicated to ensuring that artificially intelligent (AI) systems align with a certain set of values. One challenge facing value alignment researchers is accurately translating these values into a machine-readable format. In the case of reinforcement learning (RL), a popular method within value alignment, this requires designing a reward function that accurately defines the value of all state-action pairs. It is common for programmers to hand-set and manually tune these values. In this paper, we examine the challenges of hand-programming values into reward functions for value alignment, and propose mathematical models as an alternative grounding for reward function design in ethical scenarios. Experimental results demonstrate that our modelled-ethics approach offers a more consistent alternative and outperforms our hand-programmed reward functions.
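The abstract contrasts hand-set, manually tuned reward values with rewards grounded in an explicit mathematical model of value. The sketch below is purely illustrative and is not the paper's actual formulation: it assumes a toy value model (expected harm avoided weighted against effort cost), and all state, action, and parameter names are hypothetical.

```python
# Illustrative sketch only: hand-tuned reward table vs. a reward derived from
# a simple (assumed) mathematical model of value. Not the authors' model.
from dataclasses import dataclass

# Hand-programmed approach: every state-action value is set and tuned manually.
HAND_TUNED_REWARD = {
    ("bystander_at_risk", "intervene"): 1.0,   # tuned by trial and error
    ("bystander_at_risk", "ignore"): -5.0,
    ("no_one_at_risk", "intervene"): -0.1,
    ("no_one_at_risk", "ignore"): 0.0,
}

@dataclass
class State:
    harm_avoided: float  # expected harm prevented by acting, in [0, 1]
    effort_cost: float   # cost to the agent of acting, in [0, 1]

def modelled_reward(state: State, act: bool, harm_weight: float = 2.0) -> float:
    """Reward grounded in an explicit (toy) value model rather than per-pair
    hand tuning: value = weighted harm avoided minus effort cost."""
    if not act:
        # Failing to act forgoes the weighted benefit of preventing harm.
        return -harm_weight * state.harm_avoided
    return harm_weight * state.harm_avoided - state.effort_cost

if __name__ == "__main__":
    s = State(harm_avoided=0.8, effort_cost=0.2)
    print(HAND_TUNED_REWARD[("bystander_at_risk", "intervene")])  # hand-set: 1.0
    print(modelled_reward(s, act=True))    # follows from the model: 1.4
    print(modelled_reward(s, act=False))   # follows from the model: -1.6
```

Under a modelled approach of this kind, changing an ethical assumption means changing one model parameter (here, the hypothetical harm weight) rather than re-tuning every state-action entry by hand, which is the consistency argument the abstract gestures at.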
Text: Rigley_Submission96-2 - Accepted Manuscript. Available under License Other.
More information
Accepted/In Press date: 14 December 2024
Published date: 11 April 2025
Identifiers
Local EPrints ID: 501673
URI: http://eprints.soton.ac.uk/id/eprint/501673
PURE UUID: 80b52fae-c8fa-4ab3-af8d-981b5a3a73fe
Catalogue record
Date deposited: 05 Jun 2025 16:51
Last modified: 03 Sep 2025 02:03
Contributors
Author:
Eryn Rigley
Author:
Adriane Chapman
Author:
Christine Evers
Author:
Will McNeill