The University of Southampton
University of Southampton Institutional Repository

Dynamic DNNs and runtime management for efficient inference on mobile/embedded devices

Xun, Lei
Hare, Jonathon
Merrett, Geoff V.

Xun, Lei, Hare, Jonathon and Merrett, Geoff V. (2024) Dynamic DNNs and runtime management for efficient inference on mobile/embedded devices. Design, Automation and Test in Europe Conference: PhD Forum, Valencia, Spain. 25 - 27 Mar 2024. 3 pp. (In Press)

Record type: Conference or Workshop Item (Poster)

Abstract

Deep neural network (DNN) inference is increasingly executed on mobile and embedded platforms because of key advantages in latency, privacy and always-on availability. However, limited computing resources make efficient DNN deployment on these platforms challenging. Although previous work has proposed many hardware accelerators and static model compression methods, at system runtime multiple applications typically execute concurrently and compete for hardware resources. This raises two main challenges: Runtime Hardware Availability and Runtime Application Variability. Previous work has addressed these challenges through either dynamic neural networks, which contain sub-networks with different performance trade-offs, or runtime hardware resource management. In this thesis, we propose a combined method: a system for DNN performance trade-off management that combines the runtime trade-off opportunities in both algorithms and hardware to meet dynamically changing application performance targets and hardware constraints in real time. We co-designed novel Dynamic Super-Networks to maximise runtime system-level performance and energy efficiency on heterogeneous hardware platforms. Compared with the state of the art (SOTA), our experimental results using ImageNet on the GPU of the Jetson Xavier NX show that our model is 2.4x faster at similar ImageNet Top-1 accuracy, or achieves 5.1% higher accuracy at similar latency. We also designed a hierarchical runtime resource manager that tunes both dynamic neural networks and dynamic voltage and frequency scaling (DVFS) at runtime. Compared with the Linux DVFS governor schedutil, our runtime approach achieves up to a 19% energy reduction and a 9% latency reduction in a single-model deployment scenario, and an 89% energy reduction and a 23% latency reduction in a scenario with two concurrently deployed models.

Text
DATE2024_PhD_Forum - Accepted Manuscript
Available under License Creative Commons Attribution.
Download (415kB)

More information

Accepted/In Press date: 12 January 2024
Venue - Dates: Design, Automation and Test in Europe Conference: PhD Forum, Valencia, Spain, 2024-03-25 - 2024-03-27

Identifiers

Local EPrints ID: 486303
URI: http://eprints.soton.ac.uk/id/eprint/486303
PURE UUID: 5645bd44-d228-42f8-816b-e376ee2c8496
ORCID for Jonathon Hare: orcid.org/0000-0003-2921-4283
ORCID for Geoff V. Merrett: orcid.org/0000-0003-4980-3894

Catalogue record

Date deposited: 17 Jan 2024 17:30
Last modified: 20 Jan 2024 02:43

Contributors

Author: Lei Xun
Author: Jonathon Hare
Author: Geoff V. Merrett

Contact ePrints Soton: eprints@soton.ac.uk

