Hardware Assisted Fault Tolerance Using Reconfigurable Logic


Hardware Assisted Fault Tolerance Using Reconfigurable Logic Ayse K. Coskun CSE 237A - Project 06.12.2004

Project Outline
  - Motivation
  - Hardware Fault Tolerance Techniques
  - Fault Tolerance Design Flow
  - Example Circuitry for Implementation
  - Redundancy
      - Fault Masking, Error Detection, Diagnosis
  - Reconfiguration
      - Eliminating faulty blocks
  - Results & Discussion

Motivation
  - Fault resilience is required at certain levels in each circuit
      - High fault rates in VDSM & nanoscale devices
  - Fault masking
      - Transient (temporary) errors, single event upsets
      - High clock rates, fast propagation of faults
  - Further solutions needed for:
      - Eliminating defects to increase manufacturing yield
      - Eliminating permanent faults (run-time)
      - Increasing device life-time

HW Assisted Fault Tolerance
  Pros
  - Fast detection and recovery
  - Transparent to user
  - No other way at present to build circuits with high reliability
  Cons
  - Area overhead
  - Timing overhead may be a problem
      - Hard real-time applications
  - Considerable design time & effort

Goals
  - Study fault tolerance techniques and design flow
  - Implement a redundancy based circuit
      - Fault masking
      - Error detection and diagnosis
      - Recovery
  - Practice reconfiguration on the circuit
      - Mark off faulty blocks and reconfigure (off-line)
      - Dynamic Reconfiguration - canceled

Fault Tolerance Design Flow

Example Circuitry for Implementation (VHDL)

Redundancy
  - Triple Modular Redundancy (TMR)
      - Fast fault masking & recovery for hard real-time and safety-critical applications
      - Place voters at the outputs of every clocked block (no voting scheme for combinational circuits)
  - Masking ability
      - Only one copy is faulty
      - More than one copy is faulty but errors are at different register locations
      - Duplication can detect errors but cannot mask them
  - Diagnosis and recovery
      - Additional circuitry added for diagnosis and recovery
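
The masking cases listed on this slide can be checked with a small software model. The sketch below is not the project's VHDL; it assumes a bitwise 2-of-3 majority voter over register values, and the names majority, duplex_mismatch and the 8-bit value golden are made up for illustration.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote over three register values."""
    return (a & b) | (a & c) | (b & c)


def duplex_mismatch(a: int, b: int) -> bool:
    """Duplication only: a mismatch reveals an error but not which copy is right."""
    return a != b


golden = 0b1011_0010  # hypothetical fault-free register contents

# Case 1: only one copy is faulty (bit 3 flipped in the second copy).
one_faulty = majority(golden, golden ^ 0b0000_1000, golden)

# Case 2: two copies are faulty, but at different register (bit) locations.
different_bits = majority(golden ^ 0b0000_0001, golden ^ 0b1000_0000, golden)

# Case 3: two copies are faulty at the same bit location, so the vote is wrong.
same_bit = majority(golden ^ 0b0000_0100, golden ^ 0b0000_0100, golden)

print(one_faulty == golden)      # masked
print(different_bits == golden)  # masked
print(same_bit == golden)        # not masked
print(duplex_mismatch(golden, golden ^ 0b0000_1000))  # detected, not masked
```

In this model the vote is taken per bit, which is why two faulty copies are still masked as long as their errors fall on different bit positions.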

TMR with Roll-Forward Recovery
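
The figure for this slide is not part of the transcript. As a rough illustration only, the sketch below assumes that roll-forward recovery means writing the voted value back into all three copies each clock cycle, so a corrupted copy is repaired without rolling back. The function names and the 8-bit counter used as the replicated block are hypothetical.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote."""
    return (a & b) | (a & c) | (b & c)


def tmr_step(states, next_state):
    """One clock cycle of a TMR block with roll-forward recovery (assumed scheme).

    Each copy computes its next state, the voter picks the majority, and the
    voted value is loaded into all three registers so execution continues
    from a consistent state.
    """
    candidates = [next_state(s) for s in states]
    voted = majority(*candidates)
    return voted, [voted, voted, voted]


def counter(s: int) -> int:
    """Hypothetical replicated logic: an 8-bit counter."""
    return (s + 1) & 0xFF


# The second copy holds a corrupted state (bit 4 flipped).
states = [5, 5 ^ 0x10, 5]
voted, states = tmr_step(states, counter)
print(voted, states)
```

After one step all three copies agree again, which is the property the roll-forward scheme is meant to provide in this model.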

Fault Insertion & Diagnosis
  - Adding MUXes at several points to force lines to faulty values
  - ModelSim verification
  - Diagnosis:
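
The diagnosis details on this slide did not survive in the transcript. The sketch below models the MUX-based fault insertion described above and, as an assumption, diagnoses a faulty copy by comparing each copy's output with the voted output. All names (fault_mux, diagnose) and values are illustrative.

```python
def fault_mux(value: int, inject: bool, faulty_value: int) -> int:
    """2-to-1 MUX on a line: pass the real value, or force a faulty one."""
    return faulty_value if inject else value


def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote."""
    return (a & b) | (a & c) | (b & c)


def diagnose(copies):
    """Return the indices of copies whose output disagrees with the vote.

    Assumed diagnosis scheme; it only points at the right copy while the
    vote itself is still correct.
    """
    voted = majority(*copies)
    return [i for i, c in enumerate(copies) if c != voted]


correct = 0xA5
# Force the output of copy 1 to all zeros; leave copies 0 and 2 alone.
copies = [
    fault_mux(correct, False, 0x00),
    fault_mux(correct, True, 0x00),
    fault_mux(correct, False, 0x00),
]
print(diagnose(copies))
```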

Xilinx RTL Schematic - Top level

Controller - RTL schematic

Reconfiguration
  - Dynamic reconfiguration:
      - Needs interface to load different configurations online to the chip
      - Canceled because of complexity
  - Off-line reconfiguration:
      - Xilinx Area Constraints Editor
      - Edit *.ucf file
      - AREA_GROUP: includes selected instances of circuit

    INST "instance" AREA_GROUP="GROUP1";
    ...
    AREA_GROUP "GROUP1" RANGE=SLICE_X0Y0:SLICE_X7Y35;
    AREA_GROUP "GROUP1" GROUP=CLOSED;
    AREA_GROUP "GROUP1" PLACE=OPEN;
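
Marking off a faulty region comes down to choosing a new RANGE for the area group. The sketch below is a hypothetical helper, not part of the Xilinx tools: it assumes faults are recorded per slice column, picks the widest contiguous fault-free run of columns, and prints constraint lines in the same form as the slide. The function names and the instance name "voter_a" are made up.

```python
def fault_free_range(x_min, x_max, y_min, y_max, faulty_x):
    """Return a SLICE range covering the widest run of fault-free columns."""
    best = None
    start = None
    # Iterate one past x_max so the final run is closed off.
    for x in range(x_min, x_max + 2):
        ok = x <= x_max and x not in faulty_x
        if ok and start is None:
            start = x
        elif not ok and start is not None:
            run = (start, x - 1)
            if best is None or run[1] - run[0] > best[1] - best[0]:
                best = run
            start = None
    if best is None:
        raise ValueError("no fault-free column in the given range")
    return f"SLICE_X{best[0]}Y{y_min}:SLICE_X{best[1]}Y{y_max}"


def area_group_lines(group, instances, slice_range):
    """Format UCF lines that assign instances to a group and set its range."""
    lines = [f'INST "{inst}" AREA_GROUP="{group}";' for inst in instances]
    lines.append(f'AREA_GROUP "{group}" RANGE={slice_range};')
    return lines


# Original region from the slide, with slice column 2 marked faulty.
new_range = fault_free_range(0, 7, 0, 35, faulty_x={2})
for line in area_group_lines("GROUP1", ["voter_a"], new_range):
    print(line)
```

The generated lines would then be pasted into the *.ucf file before re-running the implementation steps.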

Reconfiguration cont’d

Reconfiguration cont’d
  - Common reconfiguration approaches:
      - Tile based
      - Column-based
      - Hierarchical (column & row) based
  - Xilinx design flow for reconfiguration:
      VHDL -> Synthesis -> Translate -> Map -> Place&Route -> Edit Floorplan / *.ucf file -> Back to Translate -> ...

Before & After Reconfiguration

Evaluation and Discussion
  - TMR with roll-forward provides effective fault masking and recovery
  - Column-based reconfiguration does not add significant area overhead to the TMR circuit
  - Fault-tolerant design takes considerable design time and effort
      - Development of an automated FT design flow is needed
  - Fault diagnosis is also a bottleneck for large-scale circuits

Summary
  - Fault tolerance design flow
  - Redundancy methods: Fault Masking, Fault Detection, Recovery
      - TMR roll-forward implementation
  - Reconfiguration: Dynamic / Off-line
      - Off-line column-based implementation