Rational communication for the coordination of multi-Agent systems
Increasingly, complex real-world problems (including distributed sensing, air-traffic control, disaster response, network routing, space exploration and unmanned aerial vehicles) are being tackled by teams of software agents, rather than by more traditional centralised systems. Whilst this approach has many benefits in terms of creating robust solutions, it creates a new challenge: how to flexibly coordinate the actions of the agent teams to solve the problem efficiently. Here, coordination is viewed as the problem of managing the interactions of these autonomous entities so that they do not disrupt each other, can take proactive actions to help each other, and can take multiple actions at the same time when this is required to solve the problem.
In this context, communication underpins most solutions to the coordination problem. That is, if the agents communicate their state and intentions to each other then they can coordinate their actions. Unfortunately, in many real-world problems, communication is a scarce resource. Specifically, communication has limited bandwidth, is not always available and may be expensive to utilise. In such circumstances, typical coordination mechanisms break down because the agents can no longer accurately model the state of the other agents. Given this, in this thesis, we consider how to coordinate when communication is a restricted resource. Specifically, we argue for a rational approach to communication. Since communication has a cost, we should likewise be able to calculate the value of sending any given communication. Once we have these costs and values, we can use standard decision-theoretic models to choose whether to send a communication and, in fact, to generate a plan which utilises communications and other actions efficiently.
In this research we explore ways to value communications in several contexts. Within the framework of decentralised Partially Observable Markov Decision Processes (POMDPs) we develop a simple information-theoretic valuation function (based on Kullback–Leibler (KL) Divergence). This technique allows agents to coordinate in large problems such as RoboCupRescue, where teams of ambulances must save as many civilians as possible after an earthquake. We found that, in this task, valuing communications before deciding whether to send them results in a level of performance which is higher than not communicating, and close to that of a model which utilises a free communication medium to communicate all the time. Furthermore, this model is robust to increasing communication restrictions, whereas simple communication policies are not.
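The idea of valuing a message with KL divergence before deciding whether to send it can be sketched as follows. This is a minimal illustration, not the valuation function from the thesis: the function names, the example beliefs, the cost, and the assumption that value and cost are expressed in comparable units are all hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions over the same states."""
    return sum(
        pi * math.log(pi / max(qi, eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )


def should_communicate(own_belief, teammate_estimate, cost):
    """Assumed decision rule: send only if the message's value exceeds its cost.

    The value is taken to be how far the agent's own belief has drifted from
    what it estimates its teammates currently believe.
    """
    value = kl_divergence(own_belief, teammate_estimate)
    return value > cost, value


# Hypothetical two-state example: the agent is fairly sure of state 0,
# while teammates are assumed to still hold a uniform belief.
own_belief = [0.9, 0.1]
teammate_estimate = [0.5, 0.5]
send, value = should_communicate(own_belief, teammate_estimate, cost=0.2)
```

When the two beliefs coincide the divergence is zero, so under this rule an agent with nothing new to report stays silent for any positive cost.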
We then extend this framework to value communications based on a technique from the field of Machine Learning, namely Reward Shaping, which allows the decentralised POMDP to be transformed into individual agent POMDPs that can be solved more easily. This approach can use a heuristic transformation, which allows it to work in large problems like RoboCupRescue or the Multi–Agent Tiger problem, where it outperforms the current state of the art. In addition, it can use an exact reward shaping function to generate a bounded approximation of the intractable optimal decentralised solution in slightly smaller problems.
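For readers unfamiliar with reward shaping, the standard potential-based form from the machine learning literature adds the term γΦ(s') − Φ(s) to each reward. The sketch below illustrates that general form only; whether the thesis uses exactly this shaping function is not stated in the abstract, and the potentials, rewards, and discount factor are hypothetical values.

```python
def shaped_reward(reward, phi_s, phi_next, gamma):
    """Original reward plus the potential-based term gamma * phi(s') - phi(s)."""
    return reward + gamma * phi_next - phi_s


def discounted_return(rewards, gamma):
    """Discounted sum of a finite reward sequence."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


gamma = 0.9
rewards = [1.0, 0.0, 5.0]          # hypothetical rewards for steps 0..2
potentials = [0.0, 1.0, 3.0, 2.0]  # hypothetical phi values for states s_0..s_3

shaped = [
    shaped_reward(r, potentials[t], potentials[t + 1], gamma)
    for t, r in enumerate(rewards)
]
plain_return = discounted_return(rewards, gamma)
shaped_return = discounted_return(shaped, gamma)

# The shaping terms telescope, leaving only the first and last potentials.
offset = gamma ** len(rewards) * potentials[-1] - potentials[0]
```

Because the shaped and unshaped returns differ only by a term that depends on the start and end states, this form of shaping changes how quickly a good policy can be found without changing which policies are best in the single-agent setting.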
Finally, we show how, if we restrict our attention to problems that are more static (i.e. the problem does not change without an agent doing something) than those for which the reward shaping technique was designed, we can generate an optimal solution to decentralised control based on communication valuations. In more detail, we extend the class of Bayesian coordination games to include explicit observation and communication actions. By so doing, the value of observation and exchange can be derived using the concept of opportunity costs. This is a natural way of measuring the effect of communication and information gathering on an agent’s utility, and removes the need to introduce arbitrary penalties for communicating (which is what most existing approaches do). Furthermore, this approach allows us to show that the optimal communication policy is a Nash equilibrium, and to exploit the fact that there exist many efficient algorithms for finding such equilibria in a local fashion. Specifically, we provide a complete analysis of this model for two–state problems, and illustrate how the analysis can be carried out for larger domains making use of explicit information gathering strategies. Finally, we develop a procedure for finding the optimal communication and search policy as a function of the partial observability of the state and payoffs of the underlying game (which we demonstrate in the canonical Multi–Agent Tiger problem).
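To give a feel for how an observation can be valued by what an agent forgoes by acting without it, the following sketch computes a textbook value-of-information figure for a single agent in a two-state, Tiger-style problem. It is not the game-theoretic model developed in the thesis: the payoffs (−100 for the tiger, +10 for the treasure), the 0.85 observation accuracy, and all function names are assumptions chosen for illustration.

```python
def best_expected_payoff(p_left, payoff_tiger=-100.0, payoff_treasure=10.0):
    """Best expected payoff of opening a door now, given P(tiger is left)."""
    open_left = p_left * payoff_tiger + (1.0 - p_left) * payoff_treasure
    open_right = p_left * payoff_treasure + (1.0 - p_left) * payoff_tiger
    return max(open_left, open_right)


def value_of_observation(p_left, accuracy=0.85):
    """Expected gain from one noisy observation before acting.

    Equivalently, the opportunity cost of acting without observing.
    """
    # Likelihood of each observation under (tiger-left, tiger-right).
    likelihoods = {
        "hear_left": (accuracy, 1.0 - accuracy),
        "hear_right": (1.0 - accuracy, accuracy),
    }
    informed = 0.0
    for like_left, like_right in likelihoods.values():
        p_obs = like_left * p_left + like_right * (1.0 - p_left)
        if p_obs == 0.0:
            continue  # this observation cannot occur under the current belief
        posterior_left = like_left * p_left / p_obs  # Bayes' rule
        informed += p_obs * best_expected_payoff(posterior_left)
    return informed - best_expected_payoff(p_left)


voi_uncertain = value_of_observation(0.5)  # maximally uncertain agent
voi_certain = value_of_observation(1.0)    # agent that already knows the state
```

Under these assumed numbers, an observation is worth a great deal to an agent with a uniform belief and nothing to an agent that already knows the state, so any cost attached to observing or communicating can be weighed against a derived value rather than an arbitrary penalty.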
In performing all of this work, we demonstrate how communication can be managed locally by accurately placing a value on the cost and benefit of using a restricted communication resource. This allows agents to coordinate efficiently in many interesting problem domains, where existing approaches perform badly. We contribute to the field of rational communication by providing several algorithms for utilising costly communication under different domain conditions. Our reward shaping approaches are highly scalable in problems with large state spaces and come with sound theoretical guarantees on the optimality of the solution they find.
Williamson, Simon Andrew
December 2009
Jennings, Nicholas
Gerding, Enrico
Williamson, Simon Andrew (2009) Rational communication for the coordination of multi-Agent systems. University of Southampton, School of Electronics and Computer Science, Doctoral Thesis, 155pp.
Record type: Thesis (Doctoral)
More information
Published date: December 2009
Organisations: University of Southampton
Identifiers
Local EPrints ID: 72138
URI: http://eprints.soton.ac.uk/id/eprint/72138
PURE UUID: 436972a9-7c8b-465a-aae0-b3084a94bc3d
Catalogue record
Date deposited: 27 Jan 2010
Last modified: 14 Mar 2024 02:50
Contributors
Author: Simon Andrew Williamson
Thesis advisor: Nicholas Jennings
Thesis advisor: Enrico Gerding