The University of Southampton
University of Southampton Institutional Repository

Hardware and software innovations in energy-efficient system-reliability monitoring

Tenentes, Vasileios, Leech, Charles, Bragg, Graeme, Merrett, Geoffrey, Al-Hashimi, Bashir, Amrouch, Hussam, Henkel, Jörg and Das, Shidhartha (2017) Hardware and software innovations in energy-efficient system-reliability monitoring. In IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems. IEEE. 5 pp.

Record type: Conference or Workshop Item (Paper)

Abstract

Many threats to the reliability of a system can be identified at design time, while others emerge only during its online operation. As the availability of system-monitoring sensors and run-time software increases in heterogeneous platforms, there is a demand for a novel platform-independent framework that can holistically capture and deliver system-level self-assessment and adaptation capabilities at run time. In this paper, two academic groups and one industrial group present three contributions. First, system reliability is considered from the perspective of novel timing-guardband designs for aging mitigation: effective timing-guardband models are presented from the physical to the system level, targeting multiple wear-out mechanisms. Second, a technique is presented for correlating complex software and micro-architectural events with loss of power integrity. It uses an embedded voltage-noise sensor, a power-network model and a genetic algorithm to identify workloads that trigger power-network resonances, which can ultimately lead to system failures. Third, the cross-layer ‘PRiME’ programming framework is presented, which unites available sensors and dynamic voltage and frequency scaling (DVFS) actuators with learning-based run-time process mapping and scheduling algorithms. Scenarios exploring the energy efficiency and reliability of heterogeneous platforms with run-time software derived from this framework are also reviewed.
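The abstract only names the ingredients of the power-integrity technique (a voltage-noise sensor, a power-network model and a genetic algorithm). The following is a minimal, hypothetical Python sketch of the general idea, not the authors' implementation: a genetic algorithm evolves a workload, encoded as a sequence of high-power and low-power code blocks, to maximise supply droop. Everything specific here is an assumption: the lumped series R-L supply feeding an on-die capacitance, the normalised parameter values, the bit encoding, and the names `droop` and `evolve`. In particular, the simulated droop stands in for the embedded voltage-noise sensor measurements that the real technique relies on.

```python
import random
from typing import List, Tuple

# All values below are hypothetical, normalised units chosen for illustration.
VDD = 1.0                  # nominal supply voltage
R, L, C = 0.1, 1.0, 1.0    # series resistance, package inductance, on-die capacitance
DT = 0.05                  # integration time step
I_HIGH, I_LOW = 1.0, 0.2   # load current of a high-power / low-power code block
STEPS_PER_BIT = 8          # simulation steps that one code block lasts
GENOME_BITS = 64           # code blocks per candidate workload


def droop(genome: List[int]) -> float:
    """Worst-case supply droop (VDD - min v) of a lumped R-L-C network
    driven by the load-current pattern that `genome` encodes."""
    currents = [I_HIGH if bit else I_LOW for bit in genome]
    i_l = currents[0]            # start in DC steady state for the first block
    v = VDD - R * i_l
    v_min = v
    for i_load in currents:
        for _ in range(STEPS_PER_BIT):
            # Semi-implicit Euler: update inductor current, then node voltage.
            i_l += (VDD - R * i_l - v) / L * DT
            v += (i_l - i_load) / C * DT
            v_min = min(v_min, v)
    return VDD - v_min


def evolve(pop_size: int = 40, generations: int = 60,
           p_mut: float = 0.02, seed: int = 0) -> Tuple[List[int], float]:
    """Genetic algorithm that searches for the workload maximising droop()."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(GENOME_BITS)]
           for _ in range(pop_size)]

    for _ in range(generations):
        scored = sorted(((droop(g), g) for g in pop),
                        key=lambda s: s[0], reverse=True)

        def tournament() -> List[int]:
            # Binary tournament selection: the fitter of two random genomes.
            a, b = rng.sample(scored, 2)
            return a[1] if a[0] >= b[0] else b[1]

        next_pop = [scored[0][1][:]]  # elitism: carry the best genome over
        while len(next_pop) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, GENOME_BITS)        # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - bit if rng.random() < p_mut else bit
                     for bit in child]                  # bit-flip mutation
            next_pop.append(child)
        pop = next_pop

    best = max(pop, key=droop)
    return best, droop(best)


if __name__ == "__main__":
    best, worst = evolve()
    print(f"constant high-power workload droop: {droop([1] * GENOME_BITS):.3f}")
    print(f"GA-evolved workload droop:          {worst:.3f}")
    print("evolved block pattern:", "".join(map(str, best)))
```

With these assumed values the network's resonant period, 2π√(LC), is about 6.3 time units (roughly 126 simulation steps, or 16 code blocks). A search like this would therefore be expected to favour alternating runs of high-power and low-power blocks near that period, which is the kind of resonance-triggering workload the abstract describes. A constant workload, by contrast, only produces the resistive drop R·I.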

Text: Camera-Ready-DFT17 (Accepted Manuscript), restricted to repository staff only

More information

Accepted/In Press date: 2 August 2017

Identifiers

Local EPrints ID: 413034
URI: http://eprints.soton.ac.uk/id/eprint/413034
PURE UUID: 2c1046ab-6f02-49bd-ba65-16a337e830f7
ORCID for Charles Leech: orcid.org/0000-0002-2403-3873
ORCID for Graeme Bragg: orcid.org/0000-0002-5201-7977
ORCID for Geoffrey Merrett: orcid.org/0000-0003-4980-3894

Catalogue record

Date deposited: 14 Aug 2017 16:30
Last modified: 09 Nov 2017 17:30

Contributors

Author: Vasileios Tenentes
Author: Charles Leech
Author: Graeme Bragg
Author: Geoffrey Merrett
Author: Hussam Amrouch
Author: Jörg Henkel
Author: Shidhartha Das
