The University of Southampton
University of Southampton Institutional Repository

Inter-cluster thread-to-core mapping and DVFS on heterogeneous multi-cores


Reddy, Basireddy Karunakar, Singh, Amit, Biswas, Dwaipayan, Merrett, Geoff and Al-Hashimi, Bashir (2017) Inter-cluster thread-to-core mapping and DVFS on heterogeneous multi-cores. IEEE Transactions on Multiscale Computing Systems, 1-14. (doi:10.1109/TMSCS.2017.2755619).

Record type: Article

Abstract

Heterogeneous multi-core platforms that contain different types of cores organized as clusters, such as ARM’s big.LITTLE architecture, are emerging. These platforms often have to run multiple applications concurrently, each with different performance requirements. Because the applications share resources, the resulting workloads are varying and mixed (e.g. compute- and memory-intensive). Run-time management is required to adapt to such performance requirements and workload variability and to achieve energy efficiency. The management becomes more challenging when the applications are multi-threaded and the heterogeneity needs to be exploited. Existing run-time management approaches do not efficiently exploit cores situated in different clusters simultaneously (referred to as inter-cluster exploitation) or the DVFS potential of the cores; exploiting both is the aim of this paper. Such exploitation might help to satisfy the performance requirement while achieving energy savings at the same time. We therefore propose a run-time management approach that first selects a thread-to-core mapping based on the performance requirements and resource availability. It then applies online adaptation, adjusting the voltage-frequency (V-f) levels to optimize energy without trading off application performance. The thread-to-core mapping uses offline profiled results, which contain the performance and energy characteristics of applications executed on the heterogeneous platform using different types of cores in various possible combinations. For each application, the thread-to-core mapping process defines the number and type of cores used, which are situated in different clusters. The online adaptation process classifies the inherent workload characteristics of concurrently executing applications, incurring a lower overhead than existing learning-based approaches, as demonstrated in this paper. The workload is classified using the metric Memory Reads Per Instruction (MRPI). The adaptation process proactively selects an appropriate V-f pair for a predicted workload. It then monitors the workload prediction error and the performance loss, quantified by instructions per second (IPS), and adjusts the chosen V-f pair to compensate. We validate the proposed run-time management approach on a hardware platform, the Odroid-XU3, with various combinations of multi-threaded applications from the PARSEC and SPLASH benchmarks. Results show an average improvement in energy efficiency of up to 33% compared to existing approaches while meeting the performance requirements.
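
The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Mapping` record, the frequency levels, the MRPI class boundaries, and the one-step compensation rule are all assumptions chosen for illustration, and the abstract does not specify any of them. The sketch picks the lowest-energy profiled core combination that meets a performance requirement, then chooses a V-f level from a predicted MRPI and raises it by one level if the measured IPS falls short of the target.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """Hypothetical offline-profile row for one core combination."""
    big: int        # number of big cores used
    little: int     # number of LITTLE cores used
    ips: float      # profiled performance, instructions per second
    energy: float   # profiled energy, joules


def select_mapping(profile, required_ips, free_big, free_little):
    """Lowest-energy profiled mapping that meets the requirement and
    fits in the currently free cores; None if nothing qualifies."""
    feasible = [m for m in profile
                if m.ips >= required_ips
                and m.big <= free_big
                and m.little <= free_little]
    return min(feasible, key=lambda m: m.energy, default=None)


# Assumed values, for illustration only.
FREQS_MHZ = [600, 1000, 1400, 1800]   # available frequency levels
MRPI_BOUNDS = [0.01, 0.03, 0.06]      # boundaries between workload classes


def mrpi(memory_reads, instructions):
    """Memory Reads Per Instruction over one sampling interval."""
    return memory_reads / instructions


def classify(predicted_mrpi):
    """Map a predicted MRPI to an index into FREQS_MHZ.
    Higher MRPI (more memory-bound) selects a lower frequency."""
    memory_class = sum(predicted_mrpi > b for b in MRPI_BOUNDS)
    return len(FREQS_MHZ) - 1 - memory_class


def adapt(predicted_mrpi, measured_ips, target_ips):
    """Choose a V-f index proactively, then compensate by one level
    if the measured IPS is below the target."""
    idx = classify(predicted_mrpi)
    if measured_ips < target_ips and idx < len(FREQS_MHZ) - 1:
        idx += 1
    return idx
```

In this sketch a memory-intensive interval is given a lower frequency on the assumption that the cores would otherwise stall on memory, and the IPS check acts as a simple safeguard against mispredictions.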

Text Accepted_TMSCS - Accepted Manuscript
Download (1MB)

More information

Accepted/In Press date: 6 September 2017
e-pub ahead of print date: 26 September 2017
Keywords: Heterogeneous multi-cores, Multi-threaded applications, Run-time management, Performance, Energy consumption

Identifiers

Local EPrints ID: 414715
URI: https://eprints.soton.ac.uk/id/eprint/414715
ISSN: 2332-7766
PURE UUID: 25e62fa7-37d6-4a17-a4b9-b8fab972a5b2
ORCID for Basireddy Karunakar Reddy: orcid.org/0000-0001-9755-1041
ORCID for Geoff Merrett: orcid.org/0000-0003-4980-3894

Catalogue record

Date deposited: 07 Oct 2017 16:31
Last modified: 06 Jun 2018 12:42
