University of Southampton Institutional Repository

Deep reinforcement learning for online combinatorial resource allocation with arbitrary state and action spaces


Gebrekidan, Tesfay Zemuy (2024) Deep reinforcement learning for online combinatorial resource allocation with arbitrary state and action spaces. University of Southampton, Doctoral Thesis, 152pp.

Record type: Thesis (Doctoral)

Abstract

Online combinatorial resource allocation is the process of dynamically assigning limited resources to tasks that arrive arbitrarily. The allocation is made without complete knowledge of future resource demands. Conventional resource allocation algorithms, such as mathematical optimization, are ill-suited to online resource allocation problems: relevant information about the problem is not available in advance; they cannot predict and adapt to dynamic changes in the problem; they incur high online computational costs, making them impractical for real-time decisions; and they perform poorly on non-convex optimization problems. Real-time optimization using artificial intelligence (AI) and machine learning (ML) algorithms is the state of the art in online resource allocation, and the use of AI/ML is a key component in the evolution from 5G to 6G. Deep reinforcement learning (DRL) is a subfield of ML that integrates reinforcement learning (RL) and deep learning (DL), both of which are components of AI. Because it can make sequential online decisions in dynamic and uncertain contexts, learn from experience, and keep online computational costs low, DRL is a commonly used solution for online resource allocation problems. Compared to other application areas, however, DRL faces unique challenges in online combinatorial resource allocation. The problem often involves elements that vary in size and have no specific order, henceforth referred to as arbitrarily sized and orderless (ASO) elements. In mobile and cloud computing, for example, the problem can include tasks of varying number and size, as well as a varying number of user devices (UDs). Yet existing DRL algorithms use standard deep neural networks (DNNs) as function approximators. Each input neuron of a DNN is bound to a specific index of the input, whereas the order of the tasks in a combinatorial resource allocation input does not matter; a DNN therefore cannot generalize knowledge learned from one permutation of an input to a different permutation of the same input. Furthermore, the number of UDs can vary, but the input layer of a standard DNN has a fixed number of neurons. Additionally, existing DRL algorithms make decisions by selecting a single action sequentially, or a fixed number of actions at a time, for the ASO input, whereas online combinatorial resource allocation needs to select an arbitrary number of actions based on the resource constraints. Sequential action selection leads to increased dimensionality, suboptimal convergence in training, and greater computational complexity; arbitrary action spaces in particular are understudied in DRL. Finally, existing DRL algorithms for resource allocation consider homogeneous constraints on either the UD or the server side, while real resource allocation problems usually combine several kinds of constraints: continuous-valued resource constraints on the UDs, a discrete-valued number of channels on the communication network, and combinatorial competition among UDs for the server's storage. These challenges can be summarized as handling an arbitrary state space for an ASO input, an arbitrary action space for an ASO output, and heterogeneous constraints on the UDs and the server.
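To make the permutation and fixed-size problems concrete, the following minimal sketch (ours, not from the thesis; the shapes and feature counts are illustrative assumptions) shows that a dense layer bound to input indices reacts to task order, while a symmetric pooling of per-task features does not:

import numpy as np

rng = np.random.default_rng(0)
tasks = rng.random((5, 3))        # 5 tasks, 3 features each (illustrative)
W = rng.random((5 * 3, 8))        # dense layer fixed to exactly 5 tasks

flat = tasks.reshape(-1)                          # index-bound encoding
shuffled = tasks[rng.permutation(5)].reshape(-1)  # same tasks, new order

print(np.allclose(flat @ W, shuffled @ W))   # False: output depends on task order
print(np.allclose(tasks.sum(axis=0),
                  tasks[rng.permutation(5)].sum(axis=0)))  # True: pooling is order-free
# W cannot be applied at all if a 6th task arrives: (6*3,) @ (15, 8) fails.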
The objective of this research is to advance DRL algorithms for online combinatorial resource allocation so that they can handle arbitrary state and action spaces for ASO inputs and outputs, and account for heterogeneous resource constraints on the UDs and the server. We propose three solutions. 1) A novel DRL algorithm with coalition action selection for online combinatorial resource allocation. Coalition action selection enables DRL to select an arbitrary number of actions simultaneously, without updating the state multiple times. By reducing the state space and the depth of the decision sequence, it provides better performance, faster convergence, and lower execution complexity than conventional sequential action selection, where the state is updated after every action taken. 2) A novel DRL algorithm with a computationally efficient stationary ASO input transformation for online combinatorial resource allocation. By using a set of equations to transform the ASO input into a fixed-size vector, the stationary transformation achieves better convergence and lower computational cost than a transformer neural network-based transformation, the state-of-the-art technique for handling ASO inputs in existing combinatorial optimization work. The efficiency gain arises because the transformer is designed to learn contextual relationships between words in natural language processing (NLP), whereas ASO inputs in resource allocation are numerical and lack comparably significant inter-element relationships. 3) A combinatorial client-master multiagent DRL (CCM_MADRL) algorithm for task offloading in mobile edge computing (CCM_MADRL_MEC), obtained by applying coalition action selection to the multiagent deep deterministic policy gradient (MADDPG) to handle heterogeneous resource constraints. The proposed solutions are evaluated against state-of-the-art approaches on online resource allocation problems with arbitrary task arrivals and heterogeneous resource constraints. The DRL algorithm with coalition action selection is evaluated on an online resource allocation problem with an arbitrary number of tasks; it outperforms existing sequential action selection approaches in proximity to offline-optimal solutions, speed of convergence, and computational cost. Coalition action selection remains close to offline-optimal performance across different task arrival rates, whereas sequential action selection degrades when the task arrival rate is high. The DRL with coalition action selection is implemented using the encoder of the transformer neural network. We evaluated the stationary ASO input transformation on the same problem: it outperforms the transformer-based transformation across various task arrival rates and yields lower computational complexity than the transformer. The gap in computational cost between the transformer and the stationary transformation is larger under sequential action selection than under coalition action selection.
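As an illustration of solution 2, here is a minimal sketch of a stationary ASO input transformation. The thesis specifies "a set of equations" but they are not reproduced here, so this sketch substitutes common permutation-invariant statistics (count, sum, mean, max, min) as an assumed stand-in; they share the key properties of a fixed output size and insensitivity to task order, so an ordinary DNN can consume the result regardless of how many tasks arrived.

import numpy as np

def stationary_transform(tasks: np.ndarray) -> np.ndarray:
    # tasks: (n, d) array with an arbitrary number n of task rows.
    # Returns a fixed-size vector of length 1 + 4*d, independent of n
    # and of the row order. The chosen statistics are illustrative,
    # not the thesis's exact equations.
    count = np.array([float(tasks.shape[0])])
    return np.concatenate([count, tasks.sum(0), tasks.mean(0),
                           tasks.max(0), tasks.min(0)])

rng = np.random.default_rng(1)
a = stationary_transform(rng.random((3, 4)))    # 3 tasks, 4 features
b = stationary_transform(rng.random((12, 4)))   # 12 tasks, same feature count
assert a.shape == b.shape == (17,)              # same state size either way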
Lastly, the CCM_MADRL algorithm is evaluated on a task-offloading problem in mobile edge computing with diverse constraints, including battery level, task deadline, transmission power, computational resources, server storage capacity, and the number of communication channels. By combining the complementary advantages of policy iteration and value-function methods with coalition action selection, it demonstrates better convergence than existing MADDPG and heuristic baselines.
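For intuition about coalition action selection under a shared server-side constraint, the following sketch (ours; the greedy admission rule, function name, and numbers are illustrative assumptions, not the thesis's exact mechanism) selects an arbitrary number of tasks in a single decision step, with no per-action state update:

import numpy as np

def coalition_select(scores, storage_need, capacity):
    # scores: per-task values, assumed to come from the learned policy/critic.
    # Tasks are admitted greedily by score until the shared storage budget
    # is exhausted, yielding an arbitrary-size coalition in one shot.
    chosen = np.zeros(len(scores), dtype=bool)
    used = 0.0
    for i in np.argsort(-np.asarray(scores)):    # best-scored tasks first
        if used + storage_need[i] <= capacity:
            chosen[i] = True
            used += storage_need[i]
    return chosen

mask = coalition_select(np.array([0.9, 0.2, 0.7]),
                        np.array([5.0, 4.0, 3.0]), capacity=8.0)
print(mask)   # [ True False  True ]: two actions chosen simultaneously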

Text: Tesfay_Final_PhD_Thesis_PDFA - Version of Record
Available under License University of Southampton Thesis Licence.
Download (3MB)

Text: Final-thesis-submission-Examination-Mr-Tesfay-Gebrekidan
Restricted to Repository staff only
Available under License University of Southampton Thesis Licence.

More information

Published date: June 2024

Identifiers

Local EPrints ID: 491435
URI: http://eprints.soton.ac.uk/id/eprint/491435
PURE UUID: ec381c83-2f8a-4530-8a22-2a5e002d516b
ORCID for Tesfay Zemuy Gebrekidan: orcid.org/0000-0002-0182-0997
ORCID for Sebastian Stein: orcid.org/0000-0003-2858-8857
ORCID for Tim Norman: orcid.org/0000-0002-6387-4034

Catalogue record

Date deposited: 24 Jun 2024 16:33
Last modified: 25 Jun 2024 01:58

Contributors

Author: Tesfay Zemuy Gebrekidan
Thesis advisor: Sebastian Stein
Thesis advisor: Tim Norman
