Inter-cluster thread-to-core mapping and DVFS on heterogeneous multi-cores

Heterogeneous multi-core platforms that contain different types of cores organized as clusters, such as ARM's big.LITTLE architecture, are emerging. These platforms often execute multiple applications concurrently, each with different performance requirements. Because the applications share resources, the resulting workloads are varying and mixed (e.g. compute- and memory-intensive). Run-time management is required to adapt to these performance requirements and workload variations while achieving energy efficiency. The management becomes more challenging when the applications are multi-threaded and the heterogeneity needs to be exploited. Existing run-time management approaches do not efficiently exploit cores situated in different clusters simultaneously (referred to as inter-cluster exploitation) together with the dynamic voltage and frequency scaling (DVFS) potential of the cores; addressing this is the aim of this paper. Such exploitation can help satisfy the performance requirements while also achieving energy savings. We therefore propose a run-time management approach that first selects a thread-to-core mapping based on the performance requirements and resource availability. It then applies online adaptation by adjusting the voltage-frequency (V-f) levels to optimize energy without trading off application performance. The thread-to-core mapping uses offline profiled results, which contain the performance and energy characteristics of applications executed on the heterogeneous platform using different types of cores in various possible combinations. For each application, the thread-to-core mapping process defines the number and type of cores used, which may be situated in different clusters. The online adaptation process classifies the inherent workload characteristics of concurrently executing applications, and incurs a lower overhead than existing learning-based approaches, as demonstrated in this paper. The workload is classified using the metric Memory Reads Per Instruction (MRPI). The adaptation process proactively selects an appropriate V-f pair for a predicted workload. It then monitors the workload prediction error and the performance loss, quantified by instructions per second (IPS), and adjusts the chosen V-f pair to compensate. We validate the proposed run-time management approach on a hardware platform, the Odroid-XU3, with various combinations of multi-threaded applications from the PARSEC and SPLASH benchmarks. Results show an average improvement in energy efficiency of up to 33% compared to existing approaches while meeting the performance requirements.
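The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the profile entries, MRPI thresholds, frequency levels, and all function names are hypothetical, and the rule that a higher MRPI (a more memory-bound workload) maps to a lower frequency is an assumption made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ProfileEntry:
    """One offline-profiled mapping of an application (hypothetical layout)."""
    big: int          # number of big cores used
    little: int       # number of LITTLE cores used
    ips: float        # profiled instructions per second
    energy_j: float   # profiled energy in joules


def select_mapping(profile: List[ProfileEntry], required_ips: float,
                   free_big: int, free_little: int) -> Optional[ProfileEntry]:
    """Pick the lowest-energy profiled mapping that meets the IPS requirement
    and fits in the currently free cores; None if nothing qualifies."""
    feasible = [e for e in profile
                if e.ips >= required_ips
                and e.big <= free_big and e.little <= free_little]
    return min(feasible, key=lambda e: e.energy_j) if feasible else None


def mrpi(memory_reads: int, instructions: int) -> float:
    """Memory Reads Per Instruction over one sampling interval."""
    return memory_reads / instructions if instructions else 0.0


# Assumed frequency levels (MHz) and MRPI class boundaries; illustrative only.
FREQ_LEVELS = [800, 1200, 1600, 2000]
MRPI_BINS = [(0.01, 2000), (0.03, 1600), (0.05, 1200)]


def select_frequency(predicted_mrpi: float) -> int:
    """Map a predicted MRPI class to a frequency (assumed rule: the more
    memory-bound the workload, the lower the frequency)."""
    for upper_bound, freq in MRPI_BINS:
        if predicted_mrpi < upper_bound:
            return freq
    return FREQ_LEVELS[0]


def compensate(freq: int, measured_ips: float, required_ips: float) -> int:
    """Step up one frequency level when measured IPS misses the requirement."""
    if measured_ips < required_ips:
        idx = FREQ_LEVELS.index(freq)
        return FREQ_LEVELS[min(idx + 1, len(FREQ_LEVELS) - 1)]
    return freq
```

In this sketch, a caller would first call `select_mapping` once per application and then, at every sampling interval, call `select_frequency` on the predicted MRPI and `compensate` on the measured IPS. How the MRPI is predicted and how the counters are read are left out because the abstract does not specify them.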

Heterogeneous multi-cores, Multi-threaded applications, Run-time management, Performance, Energy consumption

10.1109/TMSCS.2017.2755619

2332-7766

1-14

Reddy, Basireddy Karunakar

5bfb0b2e-8242-499a-a52b-e813d9a90889

Singh, Amit

bb67d43e-34d9-4b58-9295-8b5458270408

Biswas, Dwaipayan

bc8a9147-64df-451f-b00b-e1265087b6f3

Merrett, Geoff

89b3a696-41de-44c3-89aa-b0aa29f54020

Al-Hashimi, Bashir

0b29c671-a6d2-459c-af68-c4614dce3b5d