Runtime Algorithm and Hardware Management for Efficient DNN Inference on Mobile/Embedded Platforms
Xun, Lei (2025) Runtime Algorithm and Hardware Management for Efficient DNN Inference on Mobile/Embedded Platforms. University of Southampton, Doctoral Thesis, 150pp.
Record type: Thesis (Doctoral)
Abstract
Deep neural network (DNN) inference is increasingly being executed on mobile and embedded platforms due to enhanced privacy, reduced latency, and improved energy efficiency. Efficient DNN deployment on these platforms is challenging because of their limited computing resources. Although many static DNN model compression approaches have been proposed, they rely on prior knowledge of application performance requirements and hardware resource availability to determine the compression ratio. However, because both of these factors vary at runtime, statically compressed models cannot maintain consistent performance.

Prior work has addressed this issue through algorithmic approaches (e.g., DNN model switching or dynamic DNNs) or runtime hardware resource management. However, there is limited literature on integrating the advantages of both algorithms and hardware at runtime. In this thesis, we investigate runtime DNN algorithm and hardware management, and develop a runtime system that optimises DNN performance as well as power and energy efficiency by exploiting the trade-off opportunities offered by both algorithms and hardware platforms.

First, our study finds that earlier dynamic DNN models suffer from significant memory overhead, a limited runtime model compression ratio, and a narrow range of dynamic performance trade-offs. To address these issues, we propose a dynamic DNN approach that uses incremental training and group convolution pruning. In this approach, the channels of each DNN convolutional layer are divided into groups, which are then trained incrementally. At runtime, these pre-trained groups can be pruned to reduce latency and energy consumption, or added back to recover accuracy, all in real time and without any retraining. At the same compression ratio, our dynamic DNN model achieves a 2.4× reduction in memory footprint compared to prior work. In addition, we combine dynamic voltage and frequency scaling (DVFS) and task mapping with the model, enabling fine-grained and wide-ranging dynamic performance trade-offs.

Next, we identify three common issues with existing dynamic DNN approaches: (1) significant training time, (2) incompatibility with state-of-the-art Neural Architecture Search (NAS) deployment pipelines, and (3) suboptimal inference on heterogeneous hardware platforms. To address these problems, we propose the Dynamic Super-Network, a novel dynamic DNN approach designed specifically for NAS models. Unlike traditional, resource-intensive approaches that train dynamic DNN models, this approach pre-samples diverse sub-networks from a NAS super-network, eliminating the need for additional training. By sampling a separate sub-network library for each type of heterogeneous hardware resource (e.g., CPU and GPU) on modern SoCs, one backbone super-network can efficiently scale across all hardware resources. On an Nvidia Jetson Xavier NX platform using the ImageNet dataset, our approach outperforms state-of-the-art work by achieving up to 3.5× (CPU) and 2.4× (GPU) faster inference at similar Top-1 accuracy, or delivering 3.8% (CPU) and 5.1% (GPU) higher accuracy at similar latency.

Finally, to exploit opportunities in both algorithms and hardware platforms, we propose a hierarchical runtime resource management approach that adjusts dynamic DNN models and DVFS to meet application- and user-level performance requirements (e.g., accuracy and latency) while respecting hardware constraints (e.g., power consumption). Compared with the Linux schedutil governor, our approach achieves a 13.7% reduction in energy consumption and a 6.5% reduction in latency when deploying a single DNN model, and up to a 47.2% reduction in energy consumption and a 19% reduction in latency when deploying two DNN models concurrently.
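To illustrate the group-convolution pruning idea described in the abstract, the following is a minimal sketch in PyTorch, not the thesis implementation: the output channels of a convolutional layer are split into groups, and a runtime knob selects how many pre-trained groups are active, so compute can be reduced or accuracy recovered without retraining. The class name, layer sizes, and the active_groups attribute are illustrative assumptions.

# Minimal sketch (assumed names and sizes): channels of a convolutional layer
# are split into G groups; at runtime only the first `active_groups` groups are
# used, trading latency/energy against accuracy without any retraining.
import torch
import torch.nn.functional as F

class GroupPrunableConv2d(torch.nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size, groups=4, padding=1):
        super().__init__()
        assert out_ch % groups == 0
        self.groups = groups
        self.group_size = out_ch // groups
        self.weight = torch.nn.Parameter(
            torch.randn(out_ch, in_ch, kernel_size, kernel_size) * 0.01)
        self.bias = torch.nn.Parameter(torch.zeros(out_ch))
        self.padding = padding
        self.active_groups = groups  # runtime knob: prune or re-add channel groups

    def forward(self, x):
        # use only the first `active_groups` channel groups; pruned groups are
        # simply skipped, so switching configurations needs no retraining
        k = self.active_groups * self.group_size
        return F.conv2d(x, self.weight[:k], self.bias[:k], padding=self.padding)

conv = GroupPrunableConv2d(3, 64, 3, groups=4)
x = torch.randn(1, 3, 32, 32)
print(conv(x).shape)      # full model: 64 output channels
conv.active_groups = 2    # runtime pruning: half the groups
print(conv(x).shape)      # compressed model: 32 output channels

In a full network, the next layer would slice its input channels to match, and each group would be trained incrementally while previously trained groups stay frozen.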
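The Dynamic Super-Network idea of pre-sampling per-hardware sub-network libraries can be sketched as follows, under stated assumptions: sub-network configurations are sampled from a super-network's search space, scored for accuracy and latency on each target device, and only the latency/accuracy Pareto front is kept per device. The helpers estimate_accuracy and measure_latency are hypothetical stand-ins (e.g., an accuracy predictor and on-device profiling), not the thesis code or any specific NAS framework API.

# Minimal sketch: build one sub-network library per heterogeneous hardware
# resource (e.g., CPU and GPU) from a single backbone super-network.
import random

def sample_subnet(rng):
    # a sub-network is described by per-stage depth, width and kernel choices
    return {
        "depths": [rng.choice([2, 3, 4]) for _ in range(5)],
        "widths": [rng.choice([0.5, 0.75, 1.0]) for _ in range(5)],
        "kernels": [rng.choice([3, 5, 7]) for _ in range(5)],
    }

def estimate_accuracy(cfg):
    # stand-in: a real system would use an accuracy predictor or validation run
    return 0.60 + 0.10 * sum(cfg["widths"]) / len(cfg["widths"]) + random.uniform(0, 0.02)

def measure_latency(cfg, device):
    # stand-in: a real system would time the sub-network on the target device
    scale = 1.0 if device == "gpu" else 3.0
    return scale * sum(d * w * k for d, w, k in zip(cfg["depths"], cfg["widths"], cfg["kernels"]))

def build_library(device, num_samples=500, seed=0):
    rng = random.Random(seed)
    points = []
    for _ in range(num_samples):
        cfg = sample_subnet(rng)
        points.append((measure_latency(cfg, device), estimate_accuracy(cfg), cfg))
    points.sort(key=lambda p: p[0])            # ascending latency
    library, best_acc = [], float("-inf")
    for lat, acc, cfg in points:               # keep the latency/accuracy Pareto front
        if acc > best_acc:
            library.append({"latency": lat, "accuracy": acc, "config": cfg})
            best_acc = acc
    return library

# one super-network, one library per heterogeneous resource on the SoC
libraries = {device: build_library(device) for device in ("cpu", "gpu")}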
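Finally, the hierarchical runtime management idea, which combines an algorithm knob (which sub-network to run) with a hardware knob (DVFS), can be sketched as a simple feasibility search: prefer the most accurate sub-network, then the lowest-power frequency that still meets the latency target within the power budget. All numbers and the latency model below are illustrative assumptions, not measurements or the controller from the thesis.

# Minimal sketch: pick a (sub-network, DVFS level) operating point that meets a
# latency target within a power budget, preferring accuracy, then low power.
SUBNETS = [        # (name, relative compute cost, top-1 accuracy) - illustrative
    ("large", 1.00, 0.76),
    ("medium", 0.60, 0.74),
    ("small", 0.35, 0.70),
]
DVFS_LEVELS = [    # (frequency in GHz, power in W) - illustrative
    (1.9, 10.0),
    (1.4, 6.5),
    (0.9, 4.0),
]

def predict_latency_ms(cost, freq_ghz, base_ms=40.0):
    # toy model: latency scales with compute cost and inversely with frequency
    return base_ms * cost / freq_ghz

def choose_operating_point(latency_target_ms, power_budget_w):
    # outer loop: most accurate sub-network first (algorithm level);
    # inner step: lowest-power frequency that still meets latency (hardware level)
    for name, cost, acc in SUBNETS:
        feasible = [(power, freq) for freq, power in DVFS_LEVELS
                    if power <= power_budget_w
                    and predict_latency_ms(cost, freq) <= latency_target_ms]
        if feasible:
            power, freq = min(feasible)        # lowest feasible power wins
            return {"subnet": name, "accuracy": acc, "freq_ghz": freq, "power_w": power}
    return None  # no feasible point: the caller must relax its requirements

print(choose_operating_point(latency_target_ms=25.0, power_budget_w=7.0))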
Text: Lei Xun - Runtime Algorithm and Hardware Management for Efficient DNN Inference on Mobile and Embedded Platforms - Version of Record
Text: Final-thesis-submission-Examination-Mr-Lei-Xun - Restricted to Repository staff only
More information
Published date: 2025
Identifiers
Local EPrints ID: 502355
URI: http://eprints.soton.ac.uk/id/eprint/502355
PURE UUID: df70a10f-7f67-4f39-9204-6610fcf6b526
Catalogue record
Date deposited: 24 Jun 2025 16:35
Last modified: 11 Sep 2025 02:14
Contributors
Author: Lei Xun
Thesis advisor: Geoff Merrett
Thesis advisor: Jonathon Hare
Thesis advisor: Bashir Al-Hashimi