Dynamic DNNs meet runtime resource management for efficient heterogeneous computing
Deep Neural Network (DNN) inference is increasingly being deployed on edge devices, driven by the advantages of lower latency and enhanced privacy. However, the deployment of these models on such platforms poses considerable challenges due to the intensive computation and memory access requirements. While various static model compression techniques have been proposed, they often struggle when adapting to the dynamic computing environments of modern heterogeneous platforms. The two main challenges we focus on in our research are: (1) Dynamic Hardware and Runtime Conditions: Modern edge devices are equipped with heterogeneous computing resources, including CPUs, GPUs, NPUs, and FPGAs. Their availability and performance can change dynamically during runtime, influenced by factors such as device state, power constraints, and thermal conditions. Moreover, DNN models may need to share resources with other applications or models, introducing an additional layer of complexity to the quest for consistent performance and efficiency. (2) Dynamic Application Requirements: The same DNN model can be used in a variety of applications, each with unique and potentially fluctuating performance requirements.
In this poster, we will explore the world of dynamic neural networks, with a particular focus on their role in efficient model deployment in dynamic computing environments. Our system leverages runtime trade-offs in both algorithms and hardware to optimize DNN performance and energy efficiency. A cornerstone of our system is the Dynamic-OFA, a dynamic version of the 'once-for-all network', designed to efficiently scale the ConvNet architecture to fit the dynamic application requirements and hardware resources. It exhibits strong generalization across different model architectures, such as Transformers. We will also discuss the benefits of integrating algorithmic techniques with hardware opportunities, including Dynamic Voltage and Frequency Scaling (DVFS) and task mapping. Our experimental results, using ImageNet on a Jetson Xavier NX, reveal that the Dynamic-OFA outperforms state-of-the-art dynamic DNNs, offering up to 3.5x (CPU) and 2.4x (GPU) speedups at similar ImageNet Top-1 accuracy, or a 3.8% (CPU) and 5.1% (GPU) increase in accuracy at similar latency.
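The runtime trade-off described above can be illustrated with a minimal sketch: a runtime manager holds a profiled table of sub-network configurations (each with a measured latency and accuracy) and, given the current latency budget, selects the most accurate sub-network that still meets it. The configuration names, latencies, and accuracies below are purely illustrative assumptions, not figures from the poster.

```python
# Illustrative sketch of runtime sub-network selection for a dynamic DNN.
# PROFILE entries are (config_name, latency_ms, top1_accuracy); the values
# are hypothetical, standing in for a per-platform profiling pass.
PROFILE = [
    ("small",  12.0, 74.1),
    ("medium", 20.0, 77.3),
    ("large",  35.0, 79.8),
]

def select_subnetwork(latency_budget_ms: float):
    """Return the highest-accuracy configuration meeting the latency budget."""
    feasible = [cfg for cfg in PROFILE if cfg[1] <= latency_budget_ms]
    if not feasible:
        # No configuration fits the budget: fall back to the fastest one.
        return min(PROFILE, key=lambda cfg: cfg[1])
    return max(feasible, key=lambda cfg: cfg[2])
```

As application requirements or resource availability change at runtime (e.g. due to DVFS, contention, or thermal throttling), the manager simply re-runs the selection with the new budget, switching sub-networks without retraining or redeploying the model.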
Xun, Lei
d30d0c37-7c17-4eed-b02c-1a0f81844f17
Hare, Jonathon
65ba2cda-eaaf-4767-a325-cd845504e5a9
Merrett, Geoff
89b3a696-41de-44c3-89aa-b0aa29f54020
September 2023
Xun, Lei, Hare, Jonathon and Merrett, Geoff
(2023)
Dynamic DNNs meet runtime resource management for efficient heterogeneous computing.
Workshop on Novel Architecture and Novel Design Automation (NANDA), Imperial College London, London, United Kingdom.
11 - 12 Sep 2023.
1 pp.
Record type:
Conference or Workshop Item
(Poster)
Text: Dynamic DNNs Meet Runtime Resource Management for Efficient Heterogeneous Computing - Accepted Manuscript
More information
Accepted/In Press date: 11 September 2023
Published date: September 2023
Venue - Dates:
Workshop on Novel Architecture and Novel Design Automation (NANDA), Imperial College London, London, United Kingdom, 2023-09-11 - 2023-09-12
Identifiers
Local EPrints ID: 481848
URI: http://eprints.soton.ac.uk/id/eprint/481848
PURE UUID: 6b095223-aad2-407f-bdbc-29c7ec6ad6cc
Catalogue record
Date deposited: 11 Sep 2023 17:00
Last modified: 18 Mar 2024 03:03
Contributors
Author:
Lei Xun
Author:
Jonathon Hare
Author:
Geoff Merrett