Predicting potential failure in real-time through monitoring and detection of anomalous behaviour using hardware performance counters
Safety-critical embedded systems are found in many application areas, such as automotive control systems, medical devices, and nuclear systems. Failure in these systems can have catastrophic results and devastating effects on human lives and the surrounding environment. Variations in temperature and voltage, single event effects, and component degradation are just some of the contributors to faults in these systems. Existing research into techniques for dealing with errors caused by faults has mostly focused on replication of hardware components, information redundancy, or the inclusion of additional components to perform self-testing. However, these techniques either have high overheads or are resource-intensive. This thesis presents a detection method that can predict potential failure in real time by detecting a change in system behaviour using hardware performance counters that are readily available in a processor. The early detection and prediction algorithm consists of two main stages: one-step-ahead prediction and anomaly classification. The algorithm was evaluated on benchmarks perturbed by single-bit-flip faults.
The analysis of the early detection algorithm shows that it achieves 99.7% accuracy, with the earliest detection time recorded at 325 µs, which is less than a typical time to failure of about 4,000 µs. The proof-of-concept results show that the detector detects when the system has started to behave anomalously and is able to stop execution before the system encounters a critical failure. Analyses of the performance and size of the detector show that it can be realised with minimal computational time and resources.
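As a rough illustration of the two stages named in the abstract, the sketch below pairs a one-step-ahead predictor with a threshold-based anomaly classifier over a stream of counter readings. It is not the thesis's algorithm: the exponentially weighted moving average predictor, the residual threshold rule, the class name `EwmaDetector`, the function `first_anomaly`, and every parameter value are assumptions chosen only to make the idea concrete.

```python
from dataclasses import dataclass


@dataclass
class EwmaDetector:
    """Hypothetical detector: EWMA one-step-ahead predictor plus residual threshold."""

    alpha: float = 0.3      # smoothing factor (assumed value)
    threshold: float = 4.0  # cut-off, in multiples of the smoothed residual (assumed)
    floor: float = 1.0      # minimum residual scale, avoids a zero threshold (assumed)
    warmup: int = 5         # samples observed before classification starts (assumed)
    mean: float = 0.0       # current one-step-ahead prediction
    dev: float = 0.0        # smoothed absolute prediction error
    seen: int = 0           # number of samples consumed so far

    def update(self, sample: float) -> bool:
        """Consume one counter reading; return True if it is classified as anomalous."""
        if self.seen == 0:
            # The first reading only initialises the predictor.
            self.mean = sample
            self.seen = 1
            return False

        # Stage 1: compare the reading with the one-step-ahead prediction.
        residual = abs(sample - self.mean)

        # Stage 2: classify the reading from the size of the prediction error.
        limit = self.threshold * max(self.dev, self.floor)
        anomalous = self.seen >= self.warmup and residual > limit

        if not anomalous:
            # Learn only from readings that were classified as normal.
            self.mean += self.alpha * (sample - self.mean)
            self.dev += self.alpha * (residual - self.dev)
        self.seen += 1
        return anomalous


def first_anomaly(samples, detector=None):
    """Return the index of the first anomalous reading, or None if there is none."""
    detector = detector or EwmaDetector()
    for index, sample in enumerate(samples):
        if detector.update(sample):
            return index
    return None
```

In a real system the readings would come from the processor's performance counters sampled at a fixed interval, and a returned index would be the point at which execution is halted; both of those parts are outside this sketch.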
University of Southampton
Woo, Lai Leng
9 December 2019
Zwolinski, Mark
Woo, Lai Leng (2019) Predicting potential failure in real-time through monitoring and detection of anomalous behaviour using hardware performance counters. University of Southampton, Doctoral Thesis, 185pp.
Record type: Thesis (Doctoral)
Text: Predicting Potential Failure in Real-Time through Monitoring and Detection of Anomalous Behaviour using Hardware Performance Counters (Version of Record)
More information
Published date: 9 December 2019
Identifiers
Local EPrints ID: 480830
URI: http://eprints.soton.ac.uk/id/eprint/480830
PURE UUID: 3789a29c-f103-4a3e-854f-0d387d84db42
Catalogue record
Date deposited: 10 Aug 2023 16:31
Last modified: 17 Mar 2024 02:35
Contributors
Author: Lai Leng Woo
Thesis advisor: Mark Zwolinski