Social Reward Shaping in the Prisoner's Dilemma
Babes, Monica
0d3f1209-d47d-4cc5-89d2-19055c3bf0b0
Munoz de Cote, Enrique
0b38ed33-005a-44e5-aa5d-cae0474039ae
Littman, Michael L.
22c4a5b6-d6c6-4f81-abb9-342a28c6009f
2008
Babes, Monica, Munoz de Cote, Enrique and Littman, Michael L. (2008) Social Reward Shaping in the Prisoner's Dilemma. Proceedings of the International Joint Conference on Autonomous Agents and Multi Agent Systems (AAMAS).
Record type: Conference or Workshop Item (Poster)
Abstract
Reward shaping is a well-known technique applied to help reinforcement-learning agents converge more quickly to near-optimal behavior. In this paper, we introduce social reward shaping, which is reward shaping applied in the multiagent-learning framework. We present preliminary experiments in the iterated Prisoner's Dilemma setting showing that agents that use social reward shaping appropriately can behave more effectively than other classical learning and non-learning strategies. In particular, we show that these agents can both lead (encourage adaptive opponents to stably cooperate) and follow (adopt a best-response strategy when paired with a fixed opponent), where better-known approaches achieve only one of these objectives.
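The idea in the abstract can be sketched in code. The following is an illustrative sketch only, with assumed details (payoff values, potential function, learning parameters, and the Tit-for-Tat opponent are all choices made here for illustration, not taken from the paper): a Q-learning agent plays the iterated Prisoner's Dilemma, and a potential-based shaping term F(s, s') = γΦ(s') − Φ(s) is added to each payoff to bias learning toward mutual cooperation.

```python
import random

# Illustrative sketch (assumed details, not the paper's exact algorithm):
# a Q-learning agent with potential-based reward shaping plays the iterated
# Prisoner's Dilemma against Tit-for-Tat. State = both players' previous moves.

COOP, DEFECT = 0, 1
PAYOFF = {(COOP, COOP): 3, (COOP, DEFECT): 0,
          (DEFECT, COOP): 5, (DEFECT, DEFECT): 1}  # row player's payoff
GAMMA, ALPHA, EPS = 0.95, 0.1, 0.1

def potential(state):
    # Assumed shaping potential: favor the mutual-cooperation state.
    return 3.0 if state == (COOP, COOP) else 0.0

def run(episodes=300, rounds=50, seed=0):
    rng = random.Random(seed)
    q = {}  # Q-values keyed by (state, action)
    for _ in range(episodes):
        state = (COOP, COOP)  # assumed cooperative opening
        for _ in range(rounds):
            if rng.random() < EPS:  # epsilon-greedy exploration
                a = rng.choice((COOP, DEFECT))
            else:
                a = max((COOP, DEFECT), key=lambda x: q.get((state, x), 0.0))
            opp = state[0]  # Tit-for-Tat repeats our previous move
            next_state = (a, opp)
            # Potential-based shaping: add F = gamma*phi(s') - phi(s) to the payoff.
            r = PAYOFF[(a, opp)] + GAMMA * potential(next_state) - potential(state)
            best_next = max(q.get((next_state, x), 0.0) for x in (COOP, DEFECT))
            old = q.get((state, a), 0.0)
            q[(state, a)] = old + ALPHA * (r + GAMMA * best_next - old)
            state = next_state
    return q

q = run()
```

Because the shaping term is potential-based, it changes the speed of learning but not which policy is optimal; after training, the agent prefers cooperating in the mutual-cooperation state when facing Tit-for-Tat.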
Text: BML08AAMAS.pdf - Version of Record
More information
Published date: 2008
Venue - Dates:
Proceedings of the International Joint Conference on Autonomous Agents and Multi Agent Systems (AAMAS), 2008-01-01
Keywords:
Reinforcement Learning, leader/follower strategies, iterated prisoner's dilemma, game theory, subgame perfect equilibrium
Organisations:
Electronics & Computer Science
Identifiers
Local EPrints ID: 266919
URI: http://eprints.soton.ac.uk/id/eprint/266919
PURE UUID: 5b83322b-57a2-4794-9444-59a3857930e0
Catalogue record
Date deposited: 17 Nov 2008 14:36
Last modified: 14 Mar 2024 08:39
Contributors
Author: Monica Babes
Author: Enrique Munoz de Cote
Author: Michael L. Littman