Gebrekidan, Tesfay Zemuy (2024) Deep reinforcement learning for online combinatorial resource allocation with arbitrary state and action spaces. University of Southampton, Doctoral Thesis, 152pp.
Abstract
Online combinatorial resource allocation is the process of dynamically assigning limited resources to tasks that arrive arbitrarily. The allocation of resources is done without complete knowledge of future resource demands. Conventional resource allocation algorithms, such as mathematical optimization, are inefficient for online resource allocation problems: relevant information about the problem is not available in advance; they cannot predict and adapt to dynamic changes in the problem; they have high online computational costs, making them impractical for real-time decisions; and they are inefficient for non-convex optimization problems. Real-time optimization using artificial intelligence (AI) and machine learning (ML) algorithms is the state of the art in online resource allocation, and the use of AI/ML is a key component in the evolution from 5G to 6G. Deep reinforcement learning (DRL) is a subfield of ML that integrates reinforcement learning (RL) and deep learning (DL), both of which are components of AI. Because it can make sequential online decisions in dynamic and uncertain contexts, learn from experience, and keep online computational costs low, DRL is a commonly used solution for online resource allocation problems.

Compared to other application areas, DRL encounters unique challenges in online combinatorial resource allocation problems. The resource allocation problem often involves elements that are of varying sizes and have no specific order, henceforth referred to as arbitrarily sized and orderless (ASO) elements. In mobile and cloud computing, for example, the problem can include tasks of varying number and size, as well as a varying number of user devices (UDs). However, existing DRL algorithms use standard deep neural networks (DNNs) as function approximators. Each input neuron of a DNN is tied to specific information at a specific index of the input, whereas the order of the tasks in the input of a combinatorial resource allocation problem does not matter; a DNN cannot generalize knowledge learned from one permutation of an input to a different permutation of the same input. Furthermore, the number of UDs can vary, but the number of input neurons in a standard DNN is fixed. Additionally, existing DRL algorithms make decisions by selecting a single action sequentially, or a fixed number of actions at a time, for the ASO input, whereas online combinatorial resource allocation needs to select an arbitrary number of actions based on the resource constraint. Sequential action selection leads to increased dimensionality, suboptimal convergence in training, and greater computational complexity; arbitrary action spaces in particular are understudied in DRL. Moreover, existing DRL algorithms for resource allocation consider homogeneous constraints on either the UDs or the server side, while resource allocation problems usually include heterogeneous resource constraints: continuous-valued resource constraints on the UDs, a discrete-valued number of channels on the communication network, and combinatorial competition among UDs for the server's limited storage. These challenges can be summarized as handling an arbitrary state space for an ASO input, an arbitrary action space for an ASO output, and heterogeneous constraints on the UDs and the server.
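To make the ASO-input challenge concrete, the following minimal Python sketch (an illustration, not from the thesis; the toy data and single random dense layer are assumptions) shows that a standard fixed-input DNN is sensitive to task order and cannot accept a different number of tasks, whereas a permutation-invariant summary such as mean/max pooling, one common way to obtain a fixed-size, order-independent representation, handles both:

    import numpy as np

    rng = np.random.default_rng(0)

    # A toy "standard DNN": one dense layer over a flattened, fixed-size input.
    W = rng.normal(size=(6, 4))  # expects exactly 3 tasks x 2 features = 6 inputs

    def mlp_score(tasks):
        return np.tanh(tasks.reshape(-1) @ W).sum()

    tasks = rng.normal(size=(3, 2))     # 3 tasks, 2 features each
    shuffled = tasks[[2, 0, 1]]         # same tasks, different order

    print(mlp_score(tasks) == mlp_score(shuffled))  # False: order changes the output
    # mlp_score(rng.normal(size=(7, 2))) would simply fail: the input size is fixed.

    # A permutation-invariant summary ignores task order and accepts any
    # number of tasks (mean/max pooling over the task dimension).
    def invariant_summary(tasks):
        return np.concatenate([tasks.mean(axis=0), tasks.max(axis=0)])

    print(np.allclose(invariant_summary(tasks), invariant_summary(shuffled)))  # True
    print(invariant_summary(rng.normal(size=(7, 2))).shape)  # (4,): works for 7 tasks too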
The objective of this research is to advance DRL algorithms for online combinatorial resource allocation problems so that they can effectively handle arbitrary state and action spaces for ASO inputs and outputs, and so that they consider heterogeneous resource constraints on the UDs and the server. Consequently, we propose three solutions. 1) A novel DRL algorithm with coalition action selection for online combinatorial resource allocation. Coalition action selection enables DRL to simultaneously select an arbitrary number of actions without updating the state multiple times. By reducing the state space and the depth of decision, it provides better performance, faster convergence, and lower execution complexity than conventional sequential action selection approaches, where the state is updated for every action taken. 2) A novel DRL algorithm with a computationally efficient stationary ASO input transformation for online combinatorial resource allocation problems. By using a set of equations to transform the ASO input into a fixed-size vector, the stationary ASO input transformation provides better convergence and lower computational cost than a transformer-neural-network-based transformation, the state-of-the-art technique for handling ASO inputs in existing combinatorial optimization problems. The reason for this efficiency is that the transformer is designed to learn contextual relationships between words in a sequence for natural language processing (NLP), whereas the ASO inputs in resource allocation are numerical and do not exhibit relationships as significant as those between words. 3) By applying coalition action selection to the multiagent deep deterministic policy gradient (MADDPG), we propose a combinatorial client-master multiagent DRL (CCM_MADRL) algorithm for task offloading in mobile edge computing (CCM_MADRL_MEC) to handle heterogeneous resource constraints.

The efficiency of the proposed solutions is assessed against state-of-the-art approaches on online resource allocation problems with arbitrary task arrivals and heterogeneous resource constraints. The DRL algorithm with coalition action selection is evaluated on an online resource allocation problem with an arbitrary number of tasks. It outperformed existing sequential action selection approaches in proximity to offline optimal solutions, speed of convergence, and computational cost. Coalition action selection stays close to offline optimal performance across different task arrival rates, whereas sequential action selection degrades when the task arrival rate is high. The DRL algorithm with coalition action selection is implemented using the encoder of the transformer neural network. We evaluated the stationary ASO input transformation on the same problem: it outperformed the transformer-based transformation at various task arrival rates and yields lower computational complexity than the transformer across those rates. The difference in computational cost between the transformer and the stationary ASO input transformation is greater under sequential action selection than under coalition action selection.
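As an illustration of the coalition idea, the following hypothetical Python sketch admits a whole coalition of tasks in a single decision step, subject to a resource budget, given per-task scores from a learned critic or policy; the function name, greedy admission rule, and toy data are assumptions for illustration, and the thesis's exact selection mechanism may differ:

    import numpy as np

    def coalition_select(scores, demands, budget):
        """Select an arbitrary-size coalition of tasks in one decision step:
        rank tasks by their learned score and admit each task whose resource
        demand still fits in the remaining budget (a hypothetical greedy rule)."""
        order = np.argsort(-scores)        # highest-scoring tasks first
        chosen, remaining = [], budget
        for i in order:
            if demands[i] <= remaining:
                chosen.append(i)
                remaining -= demands[i]
        return chosen                      # arbitrary number of actions at once

    rng = np.random.default_rng(1)
    scores = rng.normal(size=5)            # e.g. per-task values from a DRL critic
    demands = rng.uniform(1, 4, size=5)    # per-task resource demand
    print(coalition_select(scores, demands, budget=6.0))

The contrast with sequential selection is that no state update or additional forward pass is needed between individual admissions; the whole coalition is committed at once.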
Lastly, the CCM_MADRL algorithm is evaluated on a task-offloading problem in mobile edge computing with different constraints, such as battery level, task deadline, transmission power, computational resources, server storage capacity, and number of communication channels. By combining the complementary advantages of policy iteration, the value function, and the coalition action selection approach, it demonstrated better convergence than existing MADDPG and heuristic algorithms.
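A minimal sketch of a client-master decision step under the server-side constraints named above (hypothetical names and a greedy admission rule; the thesis's CCM_MADRL_MEC mechanism may differ): each client (UD) actor emits a bid for offloading its task, and the master admits tasks while respecting the server's storage capacity and the discrete number of communication channels:

    import numpy as np

    def master_select(bids, storage, capacity, channels):
        """Master-agent step (hypothetical): admit the best-bidding tasks
        subject to the server's storage capacity and the discrete number
        of communication channels; unadmitted tasks compute locally."""
        order = np.argsort(-bids)
        chosen, used = [], 0.0
        for i in order:
            if len(chosen) < channels and used + storage[i] <= capacity:
                chosen.append(i)
                used += storage[i]
        return chosen

    rng = np.random.default_rng(2)
    bids = rng.normal(size=6)                # per-UD action values from client actors
    storage = rng.uniform(0.5, 2.0, size=6)  # storage each task needs on the server
    print(master_select(bids, storage, capacity=4.0, channels=3))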