Hardware and software innovations in energy-efficient system-reliability monitoring
Tenentes, Vasileios, Leech, Charles, Bragg, Graeme, Merrett, Geoffrey, Al-Hashimi, Bashir, Amrouch, Hussam, Henkel, Jörg and Das, Shidhartha (2017) Hardware and software innovations in energy-efficient system-reliability monitoring. In IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems. IEEE. 5 pp. (In Press) (doi:10.1109/DFT.2017.8244435).
Record type: Conference or Workshop Item (Paper)
Abstract
Many threats to a system's reliability arise at design time, while others appear only during its online operation. As system-monitoring sensors and run-time software become more widely available on heterogeneous platforms, there is demand for a novel platform-independent framework that can holistically capture and deliver system-level self-assessment and adaptation capabilities at run-time. In this paper, two groups from academia and one from industry present three contributions. First, system reliability is considered from the perspective of novel timing guardband designs for aging mitigation. Effective timing guardband models are presented from the physical to the system level, targeting multiple wear-out mechanisms. Second, a technique is presented for correlating complex software and micro-architectural events with power integrity loss. The technique uses an embedded voltage noise sensor, a power-network model and a genetic algorithm to identify workloads that trigger power-network resonances, which can ultimately lead to system failures. Third, the ‘PRiME’ cross-layer programming framework is presented, which unites available sensors and dynamic voltage and frequency scaling actuators with learning-based run-time process mapping and scheduling algorithms. Scenarios that explore the energy efficiency and reliability of heterogeneous platforms using run-time software derived from the developed framework are also reviewed.
Text: Camera-Ready-DFT17 - Accepted Manuscript (restricted to repository staff only)
More information
Accepted/In Press date: 2 August 2017
Identifiers
Local EPrints ID: 413034
URI: http://eprints.soton.ac.uk/id/eprint/413034
PURE UUID: 2c1046ab-6f02-49bd-ba65-16a337e830f7
Catalogue record
Date deposited: 14 Aug 2017 16:30
Last modified: 16 Mar 2024 04:29
Contributors
Author: Vasileios Tenentes
Author: Charles Leech
Author: Graeme Bragg
Author: Geoffrey Merrett
Author: Bashir Al-Hashimi
Author: Hussam Amrouch
Author: Jörg Henkel
Author: Shidhartha Das