The University of Southampton
University of Southampton Institutional Repository

ME: modelling ethical values for value alignment

Rigley, Eryn, Chapman, Adriane, Evers, Christine and McNeill, Will (2025) ME: modelling ethical values for value alignment. In Proceedings of the AAAI Conference on Artificial Intelligence. vol. 39, AAAI Press. pp. 27608-27616. (doi:10.1609/aaai.v39i26.34974).

Record type: Conference or Workshop Item (Paper)

Abstract

Value alignment, at the intersection of moral philosophy and AI safety, is dedicated to ensuring that artificially intelligent (AI) systems align with a certain set of values. One challenge facing value alignment researchers is accurately translating these values into a machine-readable format. In the case of reinforcement learning (RL), a popular method within value alignment, this requires designing a reward function that accurately defines the value of all state-action pairs. It is common for programmers to hand-set and manually tune these values. In this paper, we examine the challenges of hand-programming values into reward functions for value alignment, and propose mathematical models as an alternative grounding for reward function design in ethical scenarios. Experimental results demonstrate that our modelled-ethics approach offers a more consistent alternative and outperforms our hand-programmed reward functions.

Text
Rigley_Submission96-2 - Accepted Manuscript
Available under License Other.

More information

Accepted/In Press date: 14 December 2024
Published date: 11 April 2025

Identifiers

Local EPrints ID: 501673
URI: http://eprints.soton.ac.uk/id/eprint/501673
PURE UUID: 80b52fae-c8fa-4ab3-af8d-981b5a3a73fe
ORCID for Eryn Rigley: orcid.org/0000-0003-2475-6307
ORCID for Adriane Chapman: orcid.org/0000-0002-3814-2587
ORCID for Christine Evers: orcid.org/0000-0003-0757-5504
ORCID for Will McNeill: orcid.org/0000-0002-3647-0720

Catalogue record

Date deposited: 05 Jun 2025 16:51
Last modified: 03 Sep 2025 02:03

Contributors

Author: Eryn Rigley
Author: Adriane Chapman
Author: Christine Evers
Author: Will McNeill
