Williamson, Simon Andrew
Rational communication for the coordination of multi-Agent systems.
University of Southampton, School of Electronics and Computer Science,
Increasingly, complex real-world problems (including distributed sensing, air-traffic control, disaster response, network routing, space exploration and unmanned aerial vehicles) are being tackled by teams of software agents, rather than the more traditional centralised systems. Whilst this approach has many benefits in terms of creating robust solutions, it creates a new challenge — how to flexibly coordinate the actions of the agent teams to solve the problem efficiently. In more detail, coordination is here viewed as the problem of managing the interactions of these autonomous entities so that they do not disrupt each other, can take proactive actions to help each other, and take multiple actions at the same time when this is required to solve the problem.
In this context, communication underpins most solutions to the coordination problem. That is, if the agents communicate their state and intentions to each other then they can coordinate their actions. Unfortunately, however, in many real-world problems, communication is a scarce resource. Specifically, communication has limited bandwidth, is not always available and may be expensive to utilise. In such circumstances, typical coordination mechanisms break down because the agents can no longer accurately model the state of the other agents. Given this, in this thesis, we consider how to coordinate when communication is a restricted resource. Specifically, we argue for a rational approach to communication. Since communication has a cost, we should similarly be able to calculate the value of sending any given communication. Once we have these costs and values, we can use standard decision-theoretic models to choose whether to send a communication and, in fact, to generate a plan which utilises communications and other actions efficiently.
In this research we explore ways to value communications in several contexts. Within the framework of decentralised Partially Observable Markov Decision Processes (POMDPs), we develop a simple information-theoretic valuation function (based on Kullback–Leibler (KL) divergence). This technique allows agents to coordinate in large problems such as RoboCupRescue, where teams of ambulances must save as many civilians as possible after an earthquake. We found that, in this task, valuing communications before deciding whether to send them results in a level of performance which is higher than not communicating, and close to that of a model which utilises a free communication medium to communicate all the time. Furthermore, this model is robust to increasing communication restrictions, whereas simple communication policies are not.
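As a rough illustration of valuing a message before sending it, the sketch below compares a KL-divergence score against a communication cost. It is a minimal sketch under stated assumptions, not the valuation function developed in the thesis: the function names, the `scale` parameter that converts divergence into reward units, and the example beliefs are all hypothetical.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions over the same states."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def should_communicate(own_belief, teammate_model, cost, scale=1.0):
    """Hypothetical decision rule: send only if the estimated value exceeds the cost.

    own_belief     -- the agent's current belief over world states
    teammate_model -- the belief the agent assumes its teammates currently hold
    cost           -- cost of one message, in the same units as the scaled value
    scale          -- assumed conversion from nats to reward units
    """
    value = scale * kl_divergence(own_belief, teammate_model)
    return value > cost, value

# Hypothetical two-state example: the agent is fairly sure of the state,
# while it assumes its teammates are still undecided.
send, value = should_communicate([0.9, 0.1], [0.5, 0.5], cost=0.1)
print(send, round(value, 4))
```

The intuition is that a message is worth more the further the sender's belief has drifted from what it assumes its teammates believe; when the two beliefs coincide the divergence is zero and no message is sent.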
We then extend this framework to value communications based on a technique from the field of Machine Learning, namely Reward Shaping, which allows the decentralised POMDP to be transformed into individual agent POMDPs that can be solved more easily. This approach can use a heuristic transformation to work in large problems like RoboCupRescue or the Multi-Agent Tiger problem, where it outperforms the current state of the art. In addition, it can use an exact reward shaping function to generate a bounded approximation of the intractable optimal decentralised solution in slightly smaller problems.
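For readers unfamiliar with reward shaping, the sketch below shows the standard potential-based form from the reinforcement learning literature, in which a term gamma * phi(s') - phi(s) is added to each reward. This is an assumption made for illustration only: the abstract does not state which shaping function the thesis uses, and the potential function, states and rewards here are hypothetical.

```python
def shaping_term(potential, state, next_state, gamma):
    """Potential-based shaping term: gamma * phi(s') - phi(s)."""
    return gamma * potential(next_state) - potential(state)

def plain_return(rewards, gamma):
    """Discounted return of the original rewards."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

def shaped_return(rewards, states, potential, gamma):
    """Discounted return with shaping added; states has len(rewards) + 1 entries."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += gamma ** t * (r + shaping_term(potential, states[t], states[t + 1], gamma))
    return total

# Hypothetical chain of states 0..3 with a potential equal to the state index.
states = [0, 1, 2, 3]
rewards = [0.0, 0.0, 1.0]
gamma = 0.9
phi = lambda s: float(s)

base = plain_return(rewards, gamma)
shaped = shaped_return(rewards, states, phi, gamma)
# The shaping terms telescope to gamma^T * phi(s_T) - phi(s_0),
# a quantity that depends only on the first and last states.
print(shaped - base)
```

Because the added terms telescope, the difference between the shaped and unshaped returns depends only on the start and end states of a trajectory, which is why this form of shaping can guide a learner without changing which policies are optimal.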
Finally, we show how, if we restrict our attention to problems that are more static (i.e. the problem does not change without an agent doing something) than those for which the reward shaping technique was designed, we can generate an optimal solution to decentralised control based on communication valuations. In more detail, we extend the class of Bayesian coordination games to include explicit observation and communication actions. By doing so, the value of observation and exchange can be derived using the concept of opportunity costs. This is a natural way of measuring the effect of communication and information gathering on an agent's utility, and removes the need to introduce arbitrary penalties for communicating (which is what most existing approaches do). Furthermore, this approach allows us to show that the optimal communication policy is a Nash equilibrium, and to exploit the fact that there exist many efficient algorithms for finding such equilibria in a local fashion. Specifically, we provide a complete analysis of this model for two-state problems, and illustrate how the analysis can be carried out for larger domains making use of explicit information gathering strategies. Finally, we develop a procedure for finding the optimal communication and search policy as a function of the partial observability of the state and payoffs of the underlying game (which we demonstrate in the canonical Multi-Agent Tiger problem).
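To make the Nash equilibrium condition concrete, the sketch below enumerates the pure-strategy equilibria of a small two-player game by checking that neither player can gain by deviating unilaterally. It is only an illustration under stated assumptions: the payoff matrix is hypothetical, and the Bayesian coordination game with observation and communication actions studied in the thesis is not reproduced here.

```python
from itertools import product

def pure_nash_equilibria(payoff_a, payoff_b):
    """Return all (row, column) pairs where neither player gains by deviating alone.

    payoff_a[i][j] -- payoff to the row player when row i and column j are played
    payoff_b[i][j] -- payoff to the column player for the same joint action
    """
    n_rows, n_cols = len(payoff_a), len(payoff_a[0])
    equilibria = []
    for i, j in product(range(n_rows), range(n_cols)):
        # Row player cannot do better by switching rows while column j is fixed.
        row_best = all(payoff_a[i][j] >= payoff_a[k][j] for k in range(n_rows))
        # Column player cannot do better by switching columns while row i is fixed.
        col_best = all(payoff_b[i][j] >= payoff_b[i][k] for k in range(n_cols))
        if row_best and col_best:
            equilibria.append((i, j))
    return equilibria

# Hypothetical common-payoff coordination game: both agents receive the same
# reward, and matching actions pays more than mismatching.
team_payoff = [[2, 0],
               [0, 1]]
print(pure_nash_equilibria(team_payoff, team_payoff))  # [(0, 0), (1, 1)]
```

In a common-payoff game such as this one, both matching joint actions are equilibria even though one yields a higher team reward, which illustrates why selecting among equilibria matters for coordination.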
In performing all of this work, we demonstrate how communication can be managed locally by accurately placing a value on the cost and benefit of using a restricted communication resource. This allows agents to coordinate efficiently in many interesting problem domains, where existing approaches perform badly. We contribute to the field of rational communication by providing several algorithms for utilising costly communication under different domain conditions. Our reward shaping approaches are highly scalable in problems with large state spaces and come with sound theoretical guarantees on the optimality of the solution they find.