University of Southampton Institutional Repository

Dynamic DNNs meet runtime resource management for efficient heterogeneous computing


Xun, Lei, Hare, Jonathon and Merrett, Geoff (2023) Dynamic DNNs meet runtime resource management for efficient heterogeneous computing. Workshop on Novel Architecture and Novel Design Automation (NANDA), Imperial College London, London, United Kingdom. 11-12 Sep 2023. 1 pp.

Record type: Conference or Workshop Item (Poster)

Abstract

Deep Neural Network (DNN) inference is increasingly being deployed on edge devices, driven by the advantages of lower latency and enhanced privacy. However, the deployment of these models on such platforms poses considerable challenges due to the intensive computation and memory access requirements. While various static model compression techniques have been proposed, they often struggle when adapting to the dynamic computing environments of modern heterogeneous platforms. The two main challenges we focus on in our research are: (1) Dynamic Hardware and Runtime Conditions: Modern edge devices are equipped with heterogeneous computing resources, including CPUs, GPUs, NPUs, and FPGAs. Their availability and performance can change dynamically during runtime, influenced by factors such as device state, power constraints, and thermal conditions. Moreover, DNN models may need to share resources with other applications or models, introducing an additional layer of complexity to the quest for consistent performance and efficiency. (2) Dynamic Application Requirements: The same DNN model can be used in a variety of applications, each with unique and potentially fluctuating performance requirements.

In this poster, we explore dynamic neural networks, with a particular focus on their role in efficient model deployment in dynamic computing environments. Our system leverages runtime trade-offs in both algorithms and hardware to optimize DNN performance and energy efficiency. A cornerstone of our system is Dynamic-OFA, a dynamic version of the 'once-for-all' network, designed to efficiently scale the ConvNet architecture to fit dynamic application requirements and hardware resources; the approach also generalizes to other model architectures, such as Transformers. We also discuss the benefits of integrating algorithmic techniques with hardware opportunities, including Dynamic Voltage and Frequency Scaling (DVFS) and task mapping. Our experimental results, using ImageNet on an NVIDIA Jetson Xavier NX, show that Dynamic-OFA outperforms state-of-the-art dynamic DNNs, offering up to 3.5x (CPU) and 2.4x (GPU) speed improvements at similar ImageNet Top-1 accuracy, or a 3.8% (CPU) and 5.1% (GPU) increase in accuracy at similar latency.
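The runtime trade-off described above can be illustrated with a minimal sketch: a controller that, given a latency budget, selects the most accurate candidate sub-network whose measured latency fits that budget. This is not the authors' implementation; the sub-network names, accuracy figures, and latencies below are hypothetical placeholders chosen only to show the selection logic.

```python
# Illustrative sketch of runtime sub-network selection, in the spirit of
# Dynamic-OFA. All configs below are hypothetical, not measured results.

# Each candidate sub-network: (name, top1_accuracy_pct, latency_ms)
SUBNETS = [
    ("small",  74.6, 12.0),
    ("medium", 76.9, 21.0),
    ("large",  79.1, 38.0),
]

def select_subnet(latency_budget_ms):
    """Return the most accurate sub-network meeting the latency budget.

    Falls back to the fastest sub-network when no candidate fits,
    so the system degrades gracefully under tight budgets.
    """
    feasible = [s for s in SUBNETS if s[2] <= latency_budget_ms]
    if not feasible:
        return SUBNETS[0]  # fastest option as a fallback
    return max(feasible, key=lambda s: s[1])  # maximize accuracy
```

In a full system, the latency budget itself would change at runtime with application requirements and hardware state (e.g. DVFS level or contention from co-running workloads), triggering re-selection.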

Text: Dynamic DNNs Meet Runtime Resource Management for Efficient Heterogeneous Computing - Accepted Manuscript (1MB)
Available under License Creative Commons Attribution.

More information

Accepted/In Press date: 11 September 2023
Published date: September 2023
Venue - Dates: Workshop on Novel Architecture and Novel Design Automation (NANDA), Imperial College London, London, United Kingdom, 2023-09-11 - 2023-09-12

Identifiers

Local EPrints ID: 481848
URI: http://eprints.soton.ac.uk/id/eprint/481848
PURE UUID: 6b095223-aad2-407f-bdbc-29c7ec6ad6cc
ORCID for Jonathon Hare: orcid.org/0000-0003-2921-4283
ORCID for Geoff Merrett: orcid.org/0000-0003-4980-3894

Catalogue record

Date deposited: 11 Sep 2023 17:00
Last modified: 18 Mar 2024 03:03


Contributors

Author: Lei Xun
Author: Jonathon Hare
Author: Geoff Merrett



