The University of Southampton
University of Southampton Institutional Repository

Software modification aided transient error tolerance for embedded systems

Shafik, Rishad Ahmed, Rauwerda, Gerard, Potman, Jordy, Sunesen, Kim, Pradhan, Dhiraj K., Mathew, Jimson and Sourdis, Ioannis (2013) Software modification aided transient error tolerance for embedded systems. 16th Euromicro Conference on Digital System Design (Euromicro DSD/SEAA 2013), Santander, Spain. 04-06 Sep 2013. 8 pp.

Record type: Conference or Workshop Item (Paper)

Abstract

Commercial off-the-shelf (COTS) components are increasingly being employed in embedded systems because they offer high performance at low cost. With emerging reliability requirements, designing these components with traditional hardware redundancy incurs large overheads and requires time-consuming re-design and validation. To reduce design time under shorter time-to-market requirements, software-only reliable design techniques can provide an effective, low-cost alternative. This paper presents SMART (Software Modification Aided transient eRror Tolerance), a novel, architecture-independent software modification tool for effective error detection and tolerance. To detect transient errors in the processor datapath, control flow and memory at reasonable system overheads, the tool incorporates selective, non-intrusive data duplication and dynamic signature comparison. To mitigate the impact of the detected errors, it further modifies the software to implement software-based checkpointing. Because the source-to-source modification is automatic and tailored to a given reliability requirement, the tool requires no re-design effort and no hardware- or compiler-level intervention. We evaluate the effectiveness of the tool using a Xentium® processor based system as a case study of COTS-based systems. Using various benchmark applications with a single-event upset (SEU) based error model, we show that up to 91% of the errors can be detected or masked with reasonable performance, energy and memory footprint overheads.
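The abstract names three software-only mechanisms: selective data duplication, dynamic signature comparison for control flow, and software-based checkpointing. As a rough illustration only, the C sketch below hand-applies simplified versions of the first two to a summation loop and uses plain re-execution as a stand-in for checkpoint rollback. It is not SMART's output or its transformation scheme; the function names `sum_hardened` and `sum_with_retry`, the one-hot signature constants, and the retry policy are all hypothetical, chosen for this example.

```c
#include <stdint.h>

/* Hypothetical per-block signatures (one-hot, chosen for this sketch). */
#define SIG_ENTRY 0x1u
#define SIG_HEAD  0x2u
#define SIG_BODY  0x4u

/* Hardened form of:  s = 0; for (i = 0; i < n; i++) s += a[i];
 * Sets *ok to 0 if a duplicate mismatch or an illegal control-flow
 * transition is detected, and to 1 otherwise. */
int32_t sum_hardened(const int32_t *a, int n, int *ok) {
    uint32_t sig = SIG_ENTRY;          /* run-time control-flow signature */
    int32_t s = 0;
    volatile int32_t s_dup = 0;        /* shadow copies; volatile keeps the */
    int i = 0;                         /* compiler from merging them with   */
    volatile int i_dup = 0;            /* the original variables            */
    *ok = 1;

    for (;;) {
        /* Loop header: legal predecessors are the entry block and the body. */
        if (sig != SIG_ENTRY && sig != SIG_BODY) { *ok = 0; break; }
        sig = SIG_HEAD;
        /* Compare duplicates before the branch condition depends on them. */
        if (i != i_dup) { *ok = 0; break; }
        if (i >= n) break;

        /* Loop body: the only legal predecessor is the header. */
        if (sig != SIG_HEAD) { *ok = 0; break; }
        sig = SIG_BODY;
        s += a[i];
        s_dup += a[i_dup];
        i++;
        i_dup++;
    }

    /* Exit: the only legal predecessor is the header. */
    if (sig != SIG_HEAD) *ok = 0;
    /* Compare duplicates before the result leaves the function. */
    if (s != s_dup) *ok = 0;
    return s;
}

/* Simplified recovery: the inputs are read-only, so the function entry
 * acts as the checkpoint and recovery is re-execution from it. */
int32_t sum_with_retry(const int32_t *a, int n, int max_tries, int *ok) {
    int32_t r = 0;
    *ok = 0;
    for (int t = 0; t < max_tries && !*ok; t++)
        r = sum_hardened(a, n, ok);
    return r;
}
```

In a fault-free run, calling `sum_with_retry` on `{1, 2, 3, 4, 5}` returns 15 with `*ok` set to 1. A real tool would choose which variables to duplicate and where to place signature checks based on the reliability target, which is where the "selective" trade-off against overhead comes in.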

Text
dsd2013-camera-ready.pdf - Accepted Manuscript

More information

Published date: 4 September 2013
Venue - Dates: 16th Euromicro Conference on Digital System Design (Euromicro DSD/SEAA 2013), Santander, Spain, 2013-09-04 - 2013-09-06
Keywords: fault tolerance, error detection, reliable computing, embedded systems
Organisations: Electronic & Software Systems

Identifiers

Local EPrints ID: 355294
URI: http://eprints.soton.ac.uk/id/eprint/355294
PURE UUID: 944e1348-5cab-42bc-9478-6d12b5cc3ea3

Catalogue record

Date deposited: 28 Aug 2013 11:16
Last modified: 14 Mar 2024 14:31

Contributors

Author: Rishad Ahmed Shafik
Author: Gerard Rauwerda
Author: Jordy Potman
Author: Kim Sunesen
Author: Dhiraj K. Pradhan
Author: Jimson Mathew
Author: Ioannis Sourdis
