The University of Southampton
University of Southampton Institutional Repository

Predicting potential failure in real-time through monitoring and detection of anomalous behaviour using hardware performance counters

Woo, Lai Leng
ee042648-77bc-4b5d-979e-a44b302a7ad9
Zwolinski, Mark
adfcb8e7-877f-4bd7-9b55-7553b6cb3ea0

Woo, Lai Leng (2019) Predicting potential failure in real-time through monitoring and detection of anomalous behaviour using hardware performance counters. University of Southampton, Doctoral Thesis, 185pp.

Record type: Thesis (Doctoral)

Abstract

Safety-critical embedded systems are found in many application areas, such as automotive control systems, medical devices, and nuclear systems. Failure in these systems can have catastrophic consequences for human life and the surrounding environment. Variations in temperature and voltage, single event effects, and component degradation are among the contributors that cause faults in these systems. Existing research into techniques for handling errors caused by faults has mostly focused on replicating hardware components, information redundancy, or adding components that perform self-testing. However, these techniques either have high overheads or are resource-intensive. This thesis presents a detection method that predicts potential failure in real time by detecting a change in system behaviour using hardware performance counters, which are readily available in a processor. The early detection and prediction algorithm consists of two main stages: one-step-ahead prediction and anomaly classification. The algorithm was evaluated on benchmarks perturbed by single bit-flip faults.
The analysis of the early detection algorithm shows that it achieves 99.7% accuracy, with the earliest detection recorded at 325 µs, well below the typical time to failure of about 4,000 µs. The proof-of-concept results show that the detector identifies when the system has started to behave anomalously and can stop execution before the system encounters a critical failure. Analyses of the performance and size of the detector show that it can be realised with minimal computational time and resources.
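
The two stages named in the abstract, one-step-ahead prediction followed by anomaly classification, can be illustrated with a minimal sketch. The abstract does not say which predictor or classifier the thesis uses, so everything here is an assumption made for illustration: exponential smoothing stands in for the predictor, a fixed multiple of the running mean absolute residual stands in for the classifier, and the names `OneStepDetector`, `first_anomaly`, `alpha`, `threshold`, and `warmup` are hypothetical. The input is assumed to be a sequence of per-interval readings from one hardware performance counter.

```python
from dataclasses import dataclass


@dataclass
class OneStepDetector:
    """Illustrative two-stage detector (not the thesis's actual algorithm)."""

    alpha: float = 0.3      # smoothing factor for the predictor (assumed value)
    threshold: float = 4.0  # residual limit, in units of mean deviation (assumed)
    warmup: int = 5         # samples used for calibration only (assumed)
    level: float = 0.0      # current one-step-ahead prediction
    deviation: float = 0.0  # running mean absolute residual
    seen: int = 0           # number of samples consumed so far

    def update(self, sample: float) -> bool:
        """Consume one counter reading; return True if it looks anomalous."""
        if self.seen == 0:
            # The first reading only initialises the prediction.
            self.level = sample
            self.seen = 1
            return False

        # Stage 1: compare the reading with the one-step-ahead prediction.
        residual = abs(sample - self.level)

        # Stage 2: classify the residual against the learned deviation.
        limit = self.threshold * max(self.deviation, 1e-9)
        anomalous = self.seen >= self.warmup and residual > limit

        if not anomalous:
            # Only normal readings update the model, so one outlier
            # does not drag the prediction towards faulty behaviour.
            self.level += self.alpha * (sample - self.level)
            self.deviation += self.alpha * (residual - self.deviation)
        self.seen += 1
        return anomalous


def first_anomaly(samples, detector=None):
    """Return the index of the first anomalous reading, or None."""
    detector = detector or OneStepDetector()
    for index, sample in enumerate(samples):
        if detector.update(sample):
            return index
    return None


if __name__ == "__main__":
    # Hypothetical counter trace with an injected spike at index 8.
    trace = [100, 102, 99, 101, 100, 98, 101, 100, 250, 100]
    print(first_anomaly(trace))
```

In a real-time setting, the index returned by `first_anomaly` would correspond to the sampling interval at which execution could be halted; the detection time reported in the abstract depends on the thesis's own sampling period and algorithm, which this sketch does not reproduce.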

Text
Predicting Potential Failure in Real-Time through Monitoring and Detection of Anomalous Behaviour using Hardware Performance Counters - Version of Record
Available under License University of Southampton Thesis Licence.

More information

Published date: 9 December 2019

Identifiers

Local EPrints ID: 480830
URI: http://eprints.soton.ac.uk/id/eprint/480830
PURE UUID: 3789a29c-f103-4a3e-854f-0d387d84db42
ORCID for Lai Leng Woo: orcid.org/0000-0003-3313-6177
ORCID for Mark Zwolinski: orcid.org/0000-0002-2230-625X

Catalogue record

Date deposited: 10 Aug 2023 16:31
Last modified: 17 Mar 2024 02:35

Contributors

Author: Lai Leng Woo
Thesis advisor: Mark Zwolinski
