Tran-Thanh, Long (2010) Multi-Armed Bandit Models for Efficient Long-Term Information Collection in Wireless Sensor Networks (In Press)
Abstract
We are entering a new age in the evolution of computer systems, in which pervasive computing technologies seamlessly interact with human users. These technologies serve people in their everyday lives at home and at work by functioning invisibly in the background, creating a smart environment around them, such as an intelligent building or a smart traffic control system. Since such smart environments need information about their surroundings to function effectively, they rely first and foremost on sensory data from the real world. This data is typically provided by wireless sensor networks, which are networks of small, autonomous sensor devices. The advantages of wireless sensor networks, such as flexibility, low cost and ease of deployment, have attracted significant attention from both researchers and manufacturers. However, due to the resource constraints of such sensors (e.g. hardware limitations, low computational capacity, or a limited energy budget), a number of significant and specific research challenges remain to be addressed in this domain. To overcome these challenges, we believe an efficient solution for long-term information collection in wireless sensor networks should fulfil the following requirements: (i) adaptivity to environmental changes; (ii) robustness and flexibility; (iii) computational feasibility; and (iv) limited use of communication. In more detail, since wireless sensor networks are typically deployed in dynamic environments, a solution must take environmental changes into account and be able to adapt to them. Furthermore, since future changes of the environment are typically unknown a priori and cannot be accurately predicted, a good solution must be on-line, so that it can react quickly to those changes as they occur. It must also cope with topological and physical changes (e.g. node or communication failures). Finally, due to the limited resources of the sensors, the communication and computational cost of a solution should remain small relative to the size of the network.

Previous work on information collection in wireless sensor networks has typically focused on optimising data sampling, routing, information valuation and energy management in order to achieve efficient information collection. However, it usually fails to satisfy all of the aforementioned requirements. Specifically, existing solutions are typically not designed for long-term operation, since they cannot adapt to environmental changes; that is, they cannot modify their behaviour to match new characteristics of the environment. Other algorithms rely on a centralised control mechanism (i.e. a central unit is responsible for all the calculations and decision making). These solutions, however, are neither robust nor flexible, since the central unit may become a computational bottleneck. Against this background, this transfer report focuses on the challenge of developing decentralised, adaptive, on-line algorithms for efficient long-term information collection in wireless sensor networks. In particular, we focus on developing energy management and information-centric data routing policies that adapt their behaviour according to the energy that is harvested, in order to achieve efficient performance. To this end, we introduce two new energy management techniques, based on multi-armed bandit learning, that allow each sensor to adaptively allocate its energy budget across the tasks of data sampling, receiving and transmitting. These approaches are designed for two different situations: (i) when the sensors can harvest energy from the environment; and (ii) when energy harvesting from the environment is not possible.
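For the harvesting case, one way to picture bandit-based energy allocation is to treat each candidate split of a sensor's per-period energy budget across sampling, receiving and transmitting as an arm, and the information value collected in that period as the reward. The Python sketch below uses Exp3, a standard algorithm for non-stochastic (adversarial) multi-armed bandits. The three splits in `ALLOCATIONS`, the reward function `observed_reward` and `gamma=0.1` are hypothetical choices for illustration, not the settings or the exact algorithm used in this report.

```python
import math
import random

# Hypothetical arms: fractions of one period's energy budget assigned to
# (sampling, receiving, transmitting). Illustrative values only.
ALLOCATIONS = [
    (0.6, 0.2, 0.2),
    (0.4, 0.3, 0.3),
    (0.2, 0.4, 0.4),
]


class Exp3:
    """Exp3 for non-stochastic bandits; rewards must be scaled into [0, 1]."""

    def __init__(self, n_arms, gamma, seed=0):
        self.gamma = gamma
        self.weights = [1.0] * n_arms
        self.probs = [1.0 / n_arms] * n_arms
        self.rng = random.Random(seed)

    def select(self):
        k = len(self.weights)
        total = sum(self.weights)
        # Mix normalised exponential weights with uniform exploration.
        self.probs = [(1 - self.gamma) * w / total + self.gamma / k
                      for w in self.weights]
        return self.rng.choices(range(k), weights=self.probs)[0]

    def update(self, arm, reward):
        k = len(self.weights)
        # Importance-weighted estimate of the pulled arm's reward.
        estimate = reward / self.probs[arm]
        self.weights[arm] *= math.exp(self.gamma * estimate / k)
        # Rescale to avoid overflow; this leaves the distribution unchanged.
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]


def observed_reward(arm, t):
    """Hypothetical normalised information value collected in period t.
    The drifting sin term mimics a time-varying environment; arm 2 is best."""
    base = [0.3, 0.5, 0.8][arm]
    return min(1.0, max(0.0, base + 0.1 * math.sin(t / 50)))


learner = Exp3(len(ALLOCATIONS), gamma=0.1, seed=42)
counts = [0] * len(ALLOCATIONS)
for t in range(2000):
    arm = learner.select()
    counts[arm] += 1
    learner.update(arm, observed_reward(arm, t))
```

Exp3 makes no statistical assumption about how rewards are generated, which suits harvested energy whose profile changes over time. The `gamma` term keeps a floor of uniform exploration, so a sensor can still notice when a previously poor allocation becomes better.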
Using these approaches, each sensor can learn the optimal energy budget settings that give it efficient information collection in the long run. In addition, we propose a novel decentralised algorithm for information-centric routing. In more detail, we first tackle the energy management problem with energy-harvesting sensors from a multi-armed bandit perspective; that is, we reduce it to a non-stochastic multi-armed bandit model. Through extensive simulations, we then demonstrate that this approach outperforms other state-of-the-art non-learning algorithms. For energy management with non-harvesting sensors, we show that existing multi-armed bandit models are not suitable for modelling the problem. Given this, we introduce a new bandit model, the budgeted multi-armed bandit with pulling cost, to tackle this energy management problem efficiently. We then propose an epsilon-first approach for this new bandit problem, in which the first epsilon portion of the total budget is allocated to exploration (i.e. learning which actions are the most efficient). Finally, for routing, we introduce an information-centric routing problem, the maximal information throughput routing problem, which existing routing algorithms are not suitable for solving. We therefore devise a simple but provably optimal decentralised algorithm that maximises the information throughput in the network.
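The epsilon-first idea for a budgeted bandit with pulling costs can be sketched as follows. The sketch assumes that each arm has a known, positive pulling cost (for example, the energy spent on one action) and that rewards come from a hypothetical oracle `pull`. Arms are pulled round-robin until the exploration share `epsilon * budget` is spent. The rest of the budget then goes greedily to the arm with the highest estimated reward per unit cost. That exploitation rule, the costs, the Bernoulli reward means and all names here are illustrative assumptions, not the report's specification.

```python
import random


def epsilon_first(costs, pull, budget, epsilon, seed=0):
    """Epsilon-first sketch for a budgeted bandit with pulling costs.

    costs[i] is the (assumed known, positive) cost of pulling arm i, and
    pull(i, rng) is a hypothetical oracle returning a reward for arm i.
    """
    rng = random.Random(seed)
    k = len(costs)
    totals = [0.0] * k
    explore_counts = [0] * k
    exploit_counts = [0] * k
    total_reward = 0.0

    # Exploration: pull arms round-robin, never exceeding epsilon * budget.
    explore_budget = epsilon * budget
    spent = 0.0
    a = 0
    while any(spent + c <= explore_budget for c in costs):
        if spent + costs[a] <= explore_budget:
            r = pull(a, rng)
            totals[a] += r
            explore_counts[a] += 1
            total_reward += r
            spent += costs[a]
        a = (a + 1) % k
    remaining = budget - spent

    # Exploitation (assumed rule): rank arms by estimated reward per unit
    # cost and pull each as often as the remaining budget allows.
    means = [totals[i] / explore_counts[i] if explore_counts[i] else 0.0
             for i in range(k)]
    order = sorted(range(k), key=lambda i: means[i] / costs[i], reverse=True)
    for i in order:
        while costs[i] <= remaining:
            total_reward += pull(i, rng)
            exploit_counts[i] += 1
            remaining -= costs[i]
    return total_reward, explore_counts, exploit_counts, remaining


COSTS = [1, 2, 4]        # hypothetical pulling costs (energy units)
MEANS = [0.3, 0.9, 1.0]  # hypothetical Bernoulli reward means


def bernoulli_pull(arm, rng):
    return 1.0 if rng.random() < MEANS[arm] else 0.0


reward, explored, exploited, left = epsilon_first(
    COSTS, bernoulli_pull, budget=1000, epsilon=0.1, seed=7)
```

Arms are ranked by reward per unit cost rather than by raw reward because an expensive arm with a high mean can still return less total reward per unit of budget than a cheaper arm. Here arm 2 has the highest mean, but arm 1 has the best expected reward per unit cost.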