READ ME File For 'Data for Efficient Deep Learning Inference at the Edge'

ReadMe Author: Sulaiman Sadiq, University of Southampton

This dataset supports the thesis:
AUTHOR: Sulaiman Sadiq
ACADEMIC SUPERVISORS: Geoff Merrett, Jonathon Hare
INDUSTRIAL SUPERVISORS: Partha Maji, Simon Craske
TITLE: Efficient Deep Learning Inference at the Edge
JOURNAL: IEEE Internet of Things Journal
DOI: https://doi.org/10.5258/SOTON/D3550

This dataset contains:

---- Figure 3.1 ----
Data used in Figure 3.1 to draw the sketch of the loss landscape of the cross-entropy loss
Filename: 'fig_3_1_X.csv', 'fig_3_1_Y.csv', 'fig_3_1_Z.csv'
Data: The X, Y and Z data points used to plot the cross-entropy loss landscape

---- Figure 3.2a ----
Data used in Figure 3.2a to draw the sketch of the loss landscape of the compute cost
Filename: 'fig_3_2_a.csv'
Data: The Z data points used to plot the compute cost

---- Figure 3.2b ----
Data used in Figure 3.2b to draw the sketch of the loss landscape of the performance loss
Filename: 'fig_3_2_b.csv'
Data: The Z data points used to plot the performance loss

---- Figure 3.3 ----
Data used in Figure 3.3 to draw the relation between the compute cost and the cross-entropy loss
Filename: 'fig_3_3.csv'
Data: The cross-entropy loss after training and the compute cost of 5 different architectures

---- Figure 3.4a ----
Data used in Figure 3.4a to draw the relation between the compute cost and the cross-entropy loss when the compute cost is modulated with a hyper-parameter
Filename: 'fig_3_4a.csv'
Data: The cross-entropy loss after training and the modulated compute cost of 5 different architectures

---- Figure 3.4b ----
Data used in Figure 3.4b to draw the relation between the compute cost and the cross-entropy loss when the compute cost is modulated with a hyper-parameter
Filename: 'fig_3_4b.csv'
Data: The cross-entropy loss after training and the modulated compute cost of 5 different architectures

---- Figure 3.5a ----
Data used in Figure 3.5a to draw the sketch of the loss landscape of the compute cost modulated by a hyper-parameter
Filename: 'fig_3_5_a.csv'
Data: The Z data points used to plot the compute cost

---- Figure 3.5b ----
Data used in Figure 3.5b to draw the sketch of the loss landscape of the performance loss modulated by a hyper-parameter
Filename: 'fig_3_5_b.csv'
Data: The Z data points used to plot the performance loss

---- Figure 3.6a ----
Data used in Figure 3.6a to draw the sketch of the loss landscape of the compute cost modulated by a hyper-parameter
Filename: 'fig_3_6_a.csv'
Data: The Z data points used to plot the compute cost

---- Figure 3.6b ----
Data used in Figure 3.6b to draw the sketch of the loss landscape of the performance loss modulated by a hyper-parameter
Filename: 'fig_3_6_b.csv'
Data: The Z data points used to plot the performance loss

---- Figure 3.7a ----
Data used in Figure 3.7a to plot the cross-entropy training loss throughout training
Filename: 'fig_3_7_ce_loss_train0.01.csv', 'fig_3_7_ce_loss_train0.02.csv', 'fig_3_7_ce_loss_train0.04.csv'
Data: The cross-entropy loss on the training data for different values of the hyper-parameter gamma throughout training

---- Figure 3.7b ----
Data used in Figure 3.7b to plot the cross-entropy validation loss throughout training
Filename: 'fig_3_7_ce_loss_valid0.01.csv', 'fig_3_7_ce_loss_valid0.02.csv', 'fig_3_7_ce_loss_valid0.04.csv'
Data: The cross-entropy loss on the validation data for different values of the hyper-parameter gamma throughout training
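Example usage: the sketch below shows one way the Figure 3.7a training curves could be plotted in Python. It assumes each CSV holds a single column of loss values in step order with no header row; the loader should be adapted if the files are laid out differently.

    # Minimal sketch (assumes one loss value per row, no header row).
    import numpy as np
    import matplotlib.pyplot as plt

    for gamma in ('0.01', '0.02', '0.04'):
        loss = np.loadtxt(f'fig_3_7_ce_loss_train{gamma}.csv', delimiter=',')
        plt.plot(loss, label=f'gamma = {gamma}')

    plt.xlabel('Training step')
    plt.ylabel('Cross-entropy loss (training data)')
    plt.legend()
    plt.show()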
---- Figure 3.8a ----
Data used in Figure 3.8a to plot the compute cost throughout training
Filename: 'fig_3_8_op_loss_train0.01.csv', 'fig_3_8_op_loss_train0.02.csv', 'fig_3_8_op_loss_train0.04.csv'
Data: The compute cost on the training data for different values of the hyper-parameter gamma throughout training

---- Figure 3.8b ----
Data used in Figure 3.8b to plot the compute cost throughout training
Filename: 'fig_3_8_op_loss_valid0.01.csv', 'fig_3_8_op_loss_valid0.02.csv', 'fig_3_8_op_loss_valid0.04.csv'
Data: The compute cost on the validation data for different values of the hyper-parameter gamma throughout training

---- Figure 3.9a ----
Data used in Figure 3.9a to plot the performance loss on the training data throughout training
Filename: 'fig_3_9_total_loss_train0.01.csv', 'fig_3_9_total_loss_train0.02.csv', 'fig_3_9_total_loss_train0.04.csv'
Data: The performance loss on the training data for different values of the hyper-parameter gamma throughout training

---- Figure 3.9b ----
Data used in Figure 3.9b to plot the performance loss on the validation data throughout training
Filename: 'fig_3_9_total_loss_valid0.01.csv', 'fig_3_9_total_loss_valid0.02.csv', 'fig_3_9_total_loss_valid0.04.csv'
Data: The performance loss on the validation data for different values of the hyper-parameter gamma throughout training

---- Figure 3.10a ----
Data used in Figure 3.10a to plot the cost of the normal cell that is derived throughout training
Filename: 'fig_3_10_total_loss_train0.01.csv', 'fig_3_10_total_loss_train0.02.csv', 'fig_3_10_total_loss_train0.04.csv'
Data: The cost of the normal cell for different values of the hyper-parameter gamma throughout training

---- Figure 3.10b ----
Data used in Figure 3.10b to plot the cost of the reduction cell that is derived throughout training
Filename: 'fig_3_10_total_loss_valid0.01.csv', 'fig_3_10_total_loss_valid0.02.csv', 'fig_3_10_total_loss_valid0.04.csv'
Data: The cost of the reduction cell for different values of the hyper-parameter gamma throughout training

---- Figure 4.4 ----
Data used in Figure 4.4 to compare the latency and power consumption of deploying models onto different hardware using the external and internal memory configurations
Filename: 'fig_4_4.csv'
Data: The latency and power consumption of deploying different models on different hardware

---- Figure 4.8 ----
Data used in Figure 4.8 to demonstrate how overlaying different data during inference reduces the inference latency and internal memory usage
Filename: 'fig_4_8.csv'
Data: The latency and internal memory usage when different types of data are overlaid, including tiny tensors, filters, biases and quantisation parameters

---- Figure 4.9 ----
Data used in Figure 4.9 to demonstrate how the TinyOps framework can adaptively reduce memory usage to accelerate inference on different hardware
Filename: 'fig_4_9.csv'
Data: The latency and internal memory usage when the same model is deployed within the internal memory constraints of different devices
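Example usage: a minimal sketch of how the Figure 4.9 latency/memory data might be inspected with pandas. The column names 'internal_memory_kb' and 'latency_ms' are illustrative assumptions; the actual header of 'fig_4_9.csv' should be checked first.

    # Minimal sketch; column names below are assumptions, not the
    # confirmed header of fig_4_9.csv.
    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.read_csv('fig_4_9.csv')
    print(df.columns)  # confirm the real field names before plotting

    ax = df.plot.scatter(x='internal_memory_kb', y='latency_ms')
    ax.set_xlabel('Internal memory usage (KB)')
    ax.set_ylabel('Inference latency (ms)')
    plt.show()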
---- Figure 5.3 ----
Data used to draw the accuracy-MACs Pareto frontier and the accuracy-latency Pareto frontier on different devices, and to compare models designed for internal memory with the Pareto frontier
Filename: 'fig_5_3.csv'
Data: The accuracy, multiply-accumulate operations (MACs) and latency of different models used to approximate the Pareto frontier, and of state-of-the-art internal memory models

---- Figure 5.5 ----
Data used in Figure 5.5 to benchmark the throughput and compare the efficiency of different operations in convolutional neural networks when deployed on devices with or without cache
Filename: 'fig_5_5_a.csv', 'fig_5_5_b.csv'
Data: The latency and MACs of different operations in convolutional neural networks when deployed on devices with or without cache

---- Figure 5.6 ----
Data used in Figure 5.6 to compare the efficiency of different backbone models when deployed on devices with or without cache
Filename: 'fig_5_6_1.csv', 'fig_5_6_2.csv', 'fig_5_6_3.csv'
Data: The latency and MACs of different models

---- Figure 5.7 ----
Data used in Figure 5.7 to demonstrate how width scaling skews the distribution of computation in the model
Filename: 'fig_5_7.csv'
Data: The MACs and latency when different models are deployed with varying width and resolution

---- Figure 5.10 ----
Data used in Figure 5.10 to draw the Pareto frontier of different sub-networks in the supernetwork
Filename: 'fig_5_10.csv'
Data: The accuracy and MACs of different sub-networks in the model

---- Figure 5.11 ----
Data used to draw the accuracy of the early-exit neural networks and compare it with the Pareto frontier
Filename: 'fig_5_11.csv'
Data: The accuracy and MACs of the derived early-exit neural networks and the Pareto frontier

---- Figure 5.12a ----
Data used to draw the accuracy of the models derived via neural architecture search and the Pareto frontier
Filename: 'fig_5_12a.csv'
Data: The accuracy and MACs of models derived via neural architecture search and the Pareto frontier

---- Figure 5.12b ----
Data used to draw the accuracy of the models derived via neural architecture search and the Pareto frontier
Filename: 'fig_5_12b.csv'
Data: The accuracy and MACs of models derived via neural architecture search and the Pareto frontier
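Example usage: the sketch below extracts an accuracy/MACs Pareto frontier of the kind drawn in Figures 5.10 to 5.12. It assumes 'fig_5_10.csv' has columns named 'macs' and 'accuracy'; the real header may differ.

    # Minimal sketch (assumes 'macs' and 'accuracy' columns). After
    # sorting by MACs, a sub-network is Pareto-optimal if no cheaper
    # sub-network is more accurate.
    import pandas as pd

    df = pd.read_csv('fig_5_10.csv').sort_values('macs')
    frontier = df[df['accuracy'] >= df['accuracy'].cummax()]
    print(frontier)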
---- Table 2.1 ----
Data used in Table 2.1 that shows the hardware characteristics of platforms used in the deployment of deep neural networks, including memory, power and compute resources
Filename: 'table_2_1.csv'
Data: Memory, power and compute resources available on different hardware used in deep learning inference

---- Table 3.1 ----
Data in Table 3.1 that shows the CPU cycles taken for low-level operations used in DEff-ARTS
Filename: 'table_3_1.csv'
Data: The CPU cycles on a Texas Instruments C64x+ digital signal processor for operations including comparison, addition, multiplication and division

---- Table 3.2 ----
Data used in Table 3.2 which shows the compute cost of candidate operations included in the DEff-ARTS search space
Filename: 'table_3_2.csv'
Data: The cost in CPU cycles of the seven candidate operations used in the search space of the DEff-ARTS neural architecture search algorithm

---- Table 3.3 ----
Data used to create Table 3.3 which records how the cross-entropy loss changes with respect to the compute cost
Filename: 'table_3_3.csv'
Data: The compute cost and cross-entropy loss of two architectures after the models have been trained

---- Table 3.4 ----
Data used in results Table 3.4
Filename: 'table_3_4.csv'
Data: The test error, parameters, search cost, number of ops in the search space, search method, MACs and compute cost of a number of architectures including NASNet, AmoebaNet, DARTS and DEff-ARTS

---- Table 4.1 ----
Data for Table 4.1 that shows the MACs along with the RAM usage, Flash usage and accuracy of a number of models including MCUNet, MnasNet, ProxylessNAS and MobileNetV3
Filename: 'table_4_1.csv'
Data: The MACs, RAM usage and Flash usage, along with the accuracy of models when deployed with quantisation on hardware with TensorFlow Lite Micro

---- Table 4.2 ----
Data for Table 4.2 that shows the hardware characteristics of different MCU platforms and how they compare against each other
Filename: 'table_4_2.csv'
Data: The architecture, core type and clock speed, along with the internal and external memory available on different MCU platforms

---- Table 4.3 ----
Data in Table 4.3 that demonstrates how Flash and RAM usage varies with changes in the width and resolution of a backbone model
Filename: 'table_4_3.csv'
Data: The Flash and RAM usage of models with different width and resolution

---- Table 4.4 ----
Data in Table 4.4 that shows the number, usage and size of the buffers utilised by the TinyOps framework
Filename: 'table_4_4.csv'
Data: The buffer IDs followed by a descriptor denoting their usage and the size in bytes of the buffers used for allocation

---- Table 4.5 ----
Data in Table 4.5 that compares the power consumption, latency and energy per inference of deploying models under the different memory configurations on different devices
Filename: 'table_4_5.csv'
Data: The power consumption, latency and energy per inference when models are deployed using the different memory configurations on the different devices

---- Table 4.6 ----
Data in Table 4.6 that compares the statistics of models deployed with the external memory design space or the TinyOps design space
Filename: 'table_4_6.csv'
Data: The model name, MACs, parameters, accuracy, design space, and latency (ms) of models deployed from the TinyOps and external memory design spaces

---- Table 5.1 ----
Data in Table 5.1 that compares the statistics of models deployed with the external memory design space or the TinyOps design space
Filename: 'table_5_1.csv'
Data: The model name, MACs, parameters, accuracy, design space, and latency (ms) of models deployed from the TinyOps and external memory design spaces

---- Table 5.2 ----
Data in Table 5.2 that shows the characteristics and performance of the models derived via the NAS algorithm
Filename: 'table_5_2.csv'
Data: The width, resolution, parameters, MACs, accuracy and latency of models derived via neural architecture search when deployed on two different devices

---- Table 5.3 ----
Data in Table 5.3 that compares the end-to-end cost of the different neural architecture search algorithms
Filename: 'table_5_3.csv'
Data: The search methods followed by their search cost, training cost and total cost

---- Table C.1 ----
Data in Table C.1 that compares the performance of models derived from DEff-ARTS with models derived via manual scaling
Filename: 'table_c_1.csv'
Data: The derived models followed by their test error, parameters, search cost, number of operations, search method and MACs
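Example usage: all table CSVs can be loaded in one pass, as in the sketch below. It assumes the files sit in the working directory and carry a header row; both assumptions should be verified against the files.

    # Minimal sketch: load every table into a dict keyed by filename stem.
    from pathlib import Path
    import pandas as pd

    tables = {p.stem: pd.read_csv(p) for p in Path('.').glob('table_*.csv')}
    print(tables['table_2_1'])  # hardware characteristics of the platforms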
Date of data collection: 2025

Licence: CC BY

Related projects:
This work was supported by the UK Research and Innovation (UKRI) Centre for Doctoral Training in Machine Intelligence for Nano-electronic Devices and Systems [EP/S024298/1], the Engineering and Physical Sciences Research Council (EPSRC) International Centre for Spatial Computational Learning [EP/S030069/1] and ARM Limited. The authors also acknowledge the use of the IRIDIS High Performance Computing Facility, and associated support services at the University of Southampton, in the completion of this work.

For the purpose of open access, the author has applied a Creative Commons Attribution (CC BY) licence to any Author Accepted Manuscript version arising.

Date that the file was created: June 2025