University of Southampton Institutional Repository

Dynamic DNNs meet runtime resource management for efficient heterogeneous computing


Xun, Lei, Hare, Jonathon and Merrett, Geoff (2023) Dynamic DNNs meet runtime resource management for efficient heterogeneous computing. Workshop on Novel Architecture and Novel Design Automation (NANDA), Imperial College London, London, United Kingdom. 11-12 Sep 2023. 1 pp.

Record type: Conference or Workshop Item (Poster)

Abstract

Deep Neural Network (DNN) inference is increasingly being deployed on edge devices, driven by the advantages of lower latency and enhanced privacy. However, the deployment of these models on such platforms poses considerable challenges due to the intensive computation and memory access requirements. While various static model compression techniques have been proposed, they often struggle when adapting to the dynamic computing environments of modern heterogeneous platforms. The two main challenges we focus on in our research are: (1) Dynamic Hardware and Runtime Conditions: Modern edge devices are equipped with heterogeneous computing resources, including CPUs, GPUs, NPUs, and FPGAs. Their availability and performance can change dynamically during runtime, influenced by factors such as device state, power constraints, and thermal conditions. Moreover, DNN models may need to share resources with other applications or models, introducing an additional layer of complexity to the quest for consistent performance and efficiency. (2) Dynamic Application Requirements: The same DNN model can be used in a variety of applications, each with unique and potentially fluctuating performance requirements.

In this poster, we explore dynamic neural networks, with a particular focus on their role in efficient model deployment in dynamic computing environments. Our system leverages runtime trade-offs in both algorithms and hardware to optimize DNN performance and energy efficiency. A cornerstone of our system is Dynamic-OFA, a dynamic version of the 'once-for-all' network, designed to efficiently scale the ConvNet architecture to fit dynamic application requirements and hardware resources; the approach also generalizes to other model architectures, such as Transformers. We also discuss the benefits of integrating algorithmic techniques with hardware opportunities, including Dynamic Voltage and Frequency Scaling (DVFS) and task mapping. Our experimental results, using ImageNet on an NVIDIA Jetson Xavier NX, show that Dynamic-OFA outperforms state-of-the-art dynamic DNNs, offering up to 3.5x (CPU) and 2.4x (GPU) speed improvements at similar ImageNet Top-1 accuracy, or a 3.8% (CPU) and 5.1% (GPU) increase in accuracy at similar latency.
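The runtime trade-off described above can be illustrated with a minimal sketch: a controller that, given a latency budget, selects the most accurate candidate sub-network whose measured latency fits that budget. This is not the authors' implementation; the sub-network names, accuracy figures, and latencies below are hypothetical placeholders chosen only to show the selection logic.

```python
# Illustrative sketch of runtime sub-network selection, in the spirit of
# Dynamic-OFA. All configs below are hypothetical, not measured results.

# Each candidate sub-network: (name, top1_accuracy_pct, latency_ms)
SUBNETS = [
    ("small",  74.6, 12.0),
    ("medium", 76.9, 21.0),
    ("large",  79.1, 38.0),
]

def select_subnet(latency_budget_ms):
    """Return the most accurate sub-network meeting the latency budget.

    Falls back to the fastest sub-network when no candidate fits,
    so the system degrades gracefully under tight budgets.
    """
    feasible = [s for s in SUBNETS if s[2] <= latency_budget_ms]
    if not feasible:
        return SUBNETS[0]  # fastest option as a fallback
    return max(feasible, key=lambda s: s[1])  # maximize accuracy
```

In a full system, the latency budget itself would change at runtime with application requirements and hardware state (e.g. DVFS level or contention from co-running workloads), triggering re-selection.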

Text: Dynamic DNNs Meet Runtime Resource Management for Efficient Heterogeneous Computing - Accepted Manuscript (1MB)
Available under License Creative Commons Attribution.

More information

Accepted/In Press date: 11 September 2023
Published date: September 2023
Venue - Dates: Workshop on Novel Architecture and Novel Design Automation (NANDA), Imperial College London, London, United Kingdom, 2023-09-11 - 2023-09-12

Identifiers

Local EPrints ID: 481848
URI: http://eprints.soton.ac.uk/id/eprint/481848
PURE UUID: 6b095223-aad2-407f-bdbc-29c7ec6ad6cc
ORCID for Jonathon Hare: orcid.org/0000-0003-2921-4283
ORCID for Geoff Merrett: orcid.org/0000-0003-4980-3894

Catalogue record

Date deposited: 11 Sep 2023 17:00
Last modified: 18 Mar 2024 03:03


Contributors

Author: Lei Xun
Author: Jonathon Hare
Author: Geoff Merrett



