University of Southampton Institutional Repository

Nucleus: finding the sharing limit of heterogeneous cores


Vougioukas, Ilias, Sandberg, Andreas, Diestelhorst, Stephan, Al-Hashimi, Bashir and Merrett, Geoffrey (2017) Nucleus: finding the sharing limit of heterogeneous cores. ACM Transactions on Embedded Computing Systems, 16 (5s). (doi:10.1145/3126544).

Record type: Article

Abstract

Heterogeneous multi-processors are designed to bridge the gap between performance and energy efficiency in modern embedded systems. They do so by pairing Out-of-Order (OoO) cores, which deliver performance through aggressive speculation and latency masking, with In-Order (InO) cores, which save energy through a simpler design. By migrating between the two core types, workloads can select the best setting for any given energy/delay envelope. However, migrations introduce execution overheads that can hurt performance if they happen too frequently. Finding the optimal migration frequency is therefore critical to maximizing energy savings while maintaining acceptable performance. We develop a simulation methodology that can 1) isolate the hardware effects of migrations from the software, 2) directly compare the performance of different core types, 3) quantify the performance degradation, and 4) calculate the cost of migrations for each case. To showcase our methodology we run MiBench, a microbenchmark suite, and show that migrations can happen as frequently as every 100k instructions with little performance loss. We also show that, contrary to numerous recent studies, hypothetical designs do not need to share all of their internal components to be able to migrate at that frequency. Instead, we propose a feasible system, sharing level 2 caches and a translation lookaside buffer, that matches performance and efficiency. Our results show that, for up to 10% of phases, a workload running on the InO core can migrate to the OoO core and gain performance without any additional energy cost, and that, for up to 6% of phases, a migration to the InO core can save energy without affecting performance. When considering a policy that focuses on improving the energy-delay product, results show that on average 66% of the phases can be migrated to deliver equal or better system operation without having to aggressively share the entire memory system or to resort to migration periods finer than 100k instructions.

Text 38_Vougioukas - Accepted Manuscript

More information

Accepted/In Press date: 30 June 2017
e-pub ahead of print date: 10 October 2017
Published date: October 2017

Identifiers

Local EPrints ID: 412917
URI: http://eprints.soton.ac.uk/id/eprint/412917
ISSN: 1539-9087
PURE UUID: 94471bd3-75b4-49a2-9c0c-52ba8db2dba4
ORCID for Ilias Vougioukas: orcid.org/0000-0003-1444-4326
ORCID for Geoffrey Merrett: orcid.org/0000-0003-4980-3894

Catalogue record

Date deposited: 08 Aug 2017 16:31
Last modified: 01 Nov 2017 17:32

Contributors

Author: Ilias Vougioukas
Author: Andreas Sandberg
Author: Stephan Diestelhorst
Author: Geoffrey Merrett
