The University of Southampton
University of Southampton Institutional Repository

Hardware performance counters for system reliability monitoring

As technology scaling reaches the nanometre regime,
the error rate due to variations in temperature and voltage,
single event effects and component degradation increases, making
components less reliable. To ensure that a system continues
to function correctly despite these known reliability issues, the
system must have the means to detect errors caused by the
presence of faults. A system that
behaves normally (no error detected in the system) exhibits a
profile, and any deviation from this profile indicates that there
is an anomaly in the system. In this paper, we propose using
hardware performance counters (HPCs) to measure events that
occur during the execution of a program. We explore the
various available counters that could be used to identify
anomalous behaviour in the system and develop a methodology
for observing anomalies with HPCs by creating a fault-free
pattern and observing any subsequent changes in that
pattern. We evaluate the proposed technique using GemFI, an
architectural simulator based on Gem5 with additional fault
injection capabilities. We compare the results obtained at the
end of the execution with data collected during a time interval.
Our results show that HPCs can be used to identify anomalous
behaviour in a system that would lead to failure.
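
The idea of a fault-free profile and deviations from it can be illustrated with a minimal Python sketch. This is not the authors' implementation. The counter names, the function names `build_profile` and `find_anomalies`, the k-sigma threshold and the `min_tolerance` floor are all assumptions made for illustration; the abstract does not specify how the profile is built or how deviations are scored. A sample may be either the counter totals at the end of execution or the values collected during one time interval.

```python
from statistics import mean, pstdev


def build_profile(golden_runs):
    """Summarise fault-free runs as (mean, std deviation) per counter.

    golden_runs is a list of dicts mapping a counter name to its value.
    The counter names used below are hypothetical.
    """
    profile = {}
    for name in golden_runs[0]:
        values = [run[name] for run in golden_runs]
        profile[name] = (mean(values), pstdev(values))
    return profile


def find_anomalies(profile, sample, k=3.0, min_tolerance=1.0):
    """Return counters whose value deviates from the fault-free profile.

    A counter is flagged when |value - mean| exceeds k standard
    deviations. min_tolerance is an assumed floor so that counters
    with zero variance still tolerate tiny differences.
    """
    anomalies = {}
    for name, (mu, sigma) in profile.items():
        tolerance = max(k * sigma, min_tolerance)
        deviation = abs(sample[name] - mu)
        if deviation > tolerance:
            anomalies[name] = deviation
    return anomalies


# Hypothetical counter values from three fault-free runs.
golden = [
    {"instructions": 1000, "branch_mispredicts": 20, "dcache_misses": 50},
    {"instructions": 1000, "branch_mispredicts": 22, "dcache_misses": 48},
    {"instructions": 1000, "branch_mispredicts": 21, "dcache_misses": 52},
]
profile = build_profile(golden)

# Hypothetical run with an injected fault that changes the instruction count.
faulty = {"instructions": 1400, "branch_mispredicts": 21, "dcache_misses": 51}
print(find_anomalies(profile, faulty))
```

Applying `find_anomalies` to per-interval samples instead of end-of-run totals would use the same comparison, with one profile entry per interval.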
IEEE
Woo, Lai Leng
Halak, Basel
Zwolinski, Mark

Woo, Lai Leng, Halak, Basel and Zwolinski, Mark (2017) Hardware performance counters for system reliability monitoring. In 2nd International Verification and Security Workshop: IVSW 2017. IEEE. (doi:10.1109/IVSW.2017.8031548).

Record type: Conference or Workshop Item (Paper)

More information

Published date: 3 July 2017

Identifiers

Local EPrints ID: 412724
URI: http://eprints.soton.ac.uk/id/eprint/412724
PURE UUID: 6b80a268-d336-42ba-a930-cf9e5bfde3bb
ORCID for Lai Leng Woo: orcid.org/0000-0003-3313-6177
ORCID for Basel Halak: orcid.org/0000-0003-3470-7226
ORCID for Mark Zwolinski: orcid.org/0000-0002-2230-625X

Catalogue record

Date deposited: 27 Jul 2017 16:30
Last modified: 16 Mar 2024 04:07

Contributors

Author: Lai Leng Woo
Author: Basel Halak
Author: Mark Zwolinski
