README file for 'Dataset for "Dynamic-OFA: Runtime DNN Architecture Switching for Performance Scaling on Heterogeneous Embedded Platforms"'

Dataset DOI: https://doi.org/10.5258/SOTON/D1804

ReadMe Author: Wei Lou, University of Southampton

This dataset supports the publication:
AUTHORS: Wei Lou, Lei Xun, Amin Sabet, Jia Bi, Jonathon Hare, Geoff V. Merrett
TITLE: Dynamic-OFA: Runtime DNN Architecture Switching for Performance Scaling on Heterogeneous Embedded Platforms
CONFERENCE: Efficient Deep Learning for Computer Vision Workshop at CVPR Conference 2021
PAPER DOI IF KNOWN:

This dataset contains:
Data for Figures 1, 3, 4, 5, 6, 7 and 8 (a minimal example of loading and plotting the data is given at the end of this file).

The figures are as follows:

Figure 1: Experimental results illustrating how inference latency constraints characterised at design time can be violated at runtime by changes in available hardware resources and executing tasks. Pink bars denote latency on the CPU, while blue bars denote latency on the GPU. The tables above indicate the combinations of workload and frequency.

Figure 3: The accuracy and latency of potential sub-network architectures. The optimal architectures lie on the Pareto curve.

Figure 4: Experimental results of Dynamic-OFA's accuracy-latency trade-offs on the a) GPU and b) CPU of an Nvidia Jetson Xavier NX. State-of-the-art approaches (shown in different colours) are also plotted, including static and dynamic DNNs. Dynamic-OFA is 2.4x (GPU) and 3.5x (CPU) faster (at similar accuracy), or has 5.1% (GPU) and 3.8% (CPU) higher Top-1 ImageNet accuracy (at similar latency), than AutoSlim-MNasNet.

Figure 5: Experimental results of Dynamic-OFA's accuracy-FLOPs trade-offs. Compared with state-of-the-art approaches, Dynamic-OFA achieves up to 50% FLOPs reduction (at similar accuracy) and 2.95% higher Top-1 ImageNet accuracy (at similar FLOPs) than AutoSlim-MNasNet.

Figure 6: Runtime management of Dynamic-OFA to meet dynamic latency constraints. Latency constraints are denoted by dashed lines. Sub-networks with different accuracies are denoted using different colours.

Figure 7: At t=0, a single DNN is executing. After ~2.5 s, a second application begins executing, reducing the available GPU resources and impacting the DNN inference latency. The runtime adapts, reducing the DNN accuracy until the performance constraint is met.

Figure 8: Runtime management of two concurrently executing Dynamic-OFA models, each with its own latency constraint (dashed lines).

Date of data collection: July - March 2020

Information about geographic location of data collection: Southampton, UK

Licence: CC BY

Related projects: International Centre for Spatial Computation

Date that the file was created: April 2021
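
Example: loading and plotting the data (illustrative only). The Python sketch below shows one way the per-figure data might be read and plotted. The file name 'figure4_gpu.csv' and the column names 'latency_ms' and 'top1_accuracy' are hypothetical placeholders, not the actual names in the archive; substitute the real file and column names after downloading.

    # Minimal sketch: plot an accuracy-latency trade-off curve (Figure 4 style).
    # NOTE: 'figure4_gpu.csv', 'latency_ms' and 'top1_accuracy' are hypothetical
    # placeholders; replace them with the actual file/column names in the dataset.
    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.read_csv("figure4_gpu.csv")  # hypothetical file name

    fig, ax = plt.subplots()
    ax.plot(df["latency_ms"], df["top1_accuracy"], marker="o", label="Dynamic-OFA (GPU)")
    ax.set_xlabel("Latency (ms)")
    ax.set_ylabel("Top-1 ImageNet accuracy (%)")
    ax.legend()
    plt.show()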
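
Note on the runtime behaviour shown in Figures 6-8: the captions describe a runtime that switches between sub-networks of different accuracy to meet a latency constraint. The sketch below is a simplified, hypothetical illustration of such a control loop, not the authors' implementation; 'sub_networks' (assumed ordered from most to least accurate), 'run_inference' and the 0.8 hysteresis factor are placeholder assumptions.

    import time

    # Hypothetical sketch of latency-constrained sub-network switching.
    # 'sub_networks' is assumed to be ordered from highest to lowest accuracy
    # (and so, typically, from slowest to fastest inference).
    def run_with_constraint(sub_networks, run_inference, latency_constraint_s):
        level = 0  # start from the most accurate sub-network
        while True:
            start = time.perf_counter()
            run_inference(sub_networks[level])  # one inference pass
            latency = time.perf_counter() - start
            if latency > latency_constraint_s and level < len(sub_networks) - 1:
                level += 1  # constraint violated: switch to a faster sub-network
            elif latency < 0.8 * latency_constraint_s and level > 0:
                level -= 1  # headroom (0.8 factor is arbitrary): switch back up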