The University of Southampton
University of Southampton Institutional Repository

Fault tolerance & error monitoring techniques for cost constrained systems

Fault tolerance & error monitoring techniques for cost constrained systems
Fault tolerance & error monitoring techniques for cost constrained systems
With technology scaling, the reliability of circuits is becoming a growing concern. The appearance of logic errors in-the-field caused by faults escaping manufacturing testing, single-event upsets, aging, or process variations is increasing. Traditional techniques for online testing and circuit protection often require a high design effort or result in high area overhead and power consumption and are unsuitable for low cost systems.

This thesis presents three original contributions in the form of low cost techniques for online error detection and protection in cost constrained systems. The first contribution consists on low cost fault tolerance design technique that protects the most susceptible workload on the most susceptible logic cones of a circuit, by targeting both timing independent and timing-dependent errors. The susceptible workload is protected by a partial Triple Modular Redundancy (TMR) scheme. Protecting the 32 most susceptible patterns, an average error coverage improvement of 63.5% and 58.2% against errors induced by stuck-at and transition faults is achieved, respectively, compared an unranked pattern selection and protection. Additionally, this technique produces an average error coverage improvement of 163% and 96% against temporary erroneous output transition and errors induced by bit-flips, respectively. These error coverage improvements incur in an area/power cost in the range of 18.0-54.2%, a 145.8-182.0% reduction compared to TMR. The second contribution proposes a low cost probabilistic online error monitoring technique that produces an alarm signal when systematic erroneous behaviour has occurred over a pre-defined time interval. To detect systematic erroneous behaviour, the collected data is compared on-chip against the signature of error-free behaviour. Results demonstrate on the largest circuits, an average error coverage of 84.4% and 73.1% of errors induced by bit-flips and stuck-at faults, respectively, with an average area cost of 1.66%. The final contribution consists of a circuit approximation technique that can be used for low cost non-intrusive fault tolerance and concurrent error detection, based on finding functionality at the logic level that behaves similarly to single logic gates or constant values. An algorithm is proposed to select the input subsets to approximate.

Results show an average coverage of 33.59% of all the input space with an average 7.43% area cost. Using this approximate circuits in a reduced TMR scheme results in significant area cost reductions compared to existing techniques.
University of Southampton
Gutierrez Alcala, Mauricio Daniel
29838e60-f993-48e7-8133-0d0ff32129fa
Gutierrez Alcala, Mauricio Daniel
29838e60-f993-48e7-8133-0d0ff32129fa
Kazmierski, Tomasz
a97d7958-40c3-413f-924d-84545216092a

Gutierrez Alcala, Mauricio Daniel (2017) Fault tolerance & error monitoring techniques for cost constrained systems. University of Southampton, Doctoral Thesis, 132pp.

Record type: Thesis (Doctoral)

Abstract

With technology scaling, the reliability of circuits is becoming a growing concern. The appearance of logic errors in-the-field caused by faults escaping manufacturing testing, single-event upsets, aging, or process variations is increasing. Traditional techniques for online testing and circuit protection often require a high design effort or result in high area overhead and power consumption and are unsuitable for low cost systems.

This thesis presents three original contributions in the form of low cost techniques for online error detection and protection in cost constrained systems. The first contribution consists on low cost fault tolerance design technique that protects the most susceptible workload on the most susceptible logic cones of a circuit, by targeting both timing independent and timing-dependent errors. The susceptible workload is protected by a partial Triple Modular Redundancy (TMR) scheme. Protecting the 32 most susceptible patterns, an average error coverage improvement of 63.5% and 58.2% against errors induced by stuck-at and transition faults is achieved, respectively, compared an unranked pattern selection and protection. Additionally, this technique produces an average error coverage improvement of 163% and 96% against temporary erroneous output transition and errors induced by bit-flips, respectively. These error coverage improvements incur in an area/power cost in the range of 18.0-54.2%, a 145.8-182.0% reduction compared to TMR. The second contribution proposes a low cost probabilistic online error monitoring technique that produces an alarm signal when systematic erroneous behaviour has occurred over a pre-defined time interval. To detect systematic erroneous behaviour, the collected data is compared on-chip against the signature of error-free behaviour. Results demonstrate on the largest circuits, an average error coverage of 84.4% and 73.1% of errors induced by bit-flips and stuck-at faults, respectively, with an average area cost of 1.66%. The final contribution consists of a circuit approximation technique that can be used for low cost non-intrusive fault tolerance and concurrent error detection, based on finding functionality at the logic level that behaves similarly to single logic gates or constant values. An algorithm is proposed to select the input subsets to approximate.

Results show an average coverage of 33.59% of all the input space with an average 7.43% area cost. Using this approximate circuits in a reduced TMR scheme results in significant area cost reductions compared to existing techniques.

Text
Final Thesis - Version of Record
Available under License University of Southampton Thesis Licence.
Download (6MB)

More information

Published date: September 2017

Identifiers

Local EPrints ID: 415792
URI: http://eprints.soton.ac.uk/id/eprint/415792
PURE UUID: 0d2b6498-67fa-4d5f-a757-e4ae0dac1105

Catalogue record

Date deposited: 24 Nov 2017 17:30
Last modified: 13 Mar 2019 19:12

Export record

Download statistics

Downloads from ePrints over the past year. Other digital versions may also be available to download e.g. from the publisher's website.

View more statistics

Atom RSS 1.0 RSS 2.0

Contact ePrints Soton: eprints@soton.ac.uk

ePrints Soton supports OAI 2.0 with a base URL of http://eprints.soton.ac.uk/cgi/oai2

This repository has been built using EPrints software, developed at the University of Southampton, but available to everyone to use.

We use cookies to ensure that we give you the best experience on our website. If you continue without changing your settings, we will assume that you are happy to receive cookies on the University of Southampton website.

×