Nov. 2006 Reconfiguration and Voting Slide 1: Fault-Tolerant Computing, Hardware Design Methods



Slide 2: About This Presentation
This presentation has been prepared for the graduate course ECE 257A (Fault-Tolerant Computing) by Behrooz Parhami, Professor of Electrical and Computer Engineering at University of California, Santa Barbara. The material contained herein can be used freely in classroom teaching or any other educational setting. Unauthorized uses are prohibited. © Behrooz Parhami
Edition: First. Released: Nov.

Slide 3: Reconfiguration and Voting

Slide 4

Slide 5: Multilevel Model of Dependable Computing
[Figure: multilevel model. Levels: Component, Logic, Information, System, Service, Result. States: Ideal, Defective, Faulty, Erroneous, Malfunctioning, Degraded, Failed. Impairment classes: Unimpaired, Low-Level Impaired, Mid-Level Impaired, High-Level Impaired. Legend: Entry, Deviation, Remedy, Tolerance.]

Slide 6: Requirements for Reconfiguration
A system consists of modular resources (processors, memory banks, disk storage, ...) and interconnects.
Redundant resources can mitigate the effect of module failures.
The main challenge of reconfiguration is dealing with interconnects.
Assumption: Module and interconnect failures are promptly diagnosed.
Overcoming the effect of interconnect failures requires the availability of multiple paths from each source to every possible destination.
In graph-theoretic terms, we need "edge-disjoint" paths.
Existence of k edge-disjoint paths between two nodes provides the means for tolerating k – 1 link failures.
[Figure: torus interconnection with a source node S and a destination node D.]
This particular interconnection scheme (torus) is 4-connected and tolerates 3 link/node losses without becoming disconnected.
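The edge-disjoint-path claim can be checked numerically. The sketch below is an illustration, not part of the original slides: it assumes a 4 x 4 torus, and the helper names `torus_arcs` and `edge_disjoint_paths` are hypothetical. It counts edge-disjoint paths with a unit-capacity max-flow computation using breadth-first augmenting paths.

```python
from collections import deque

def torus_arcs(rows, cols):
    """Directed arcs of a rows x cols torus; each link appears as two arcs."""
    arcs = set()
    for r in range(rows):
        for c in range(cols):
            u = (r, c)
            # Link each node to its southern and eastern neighbor (with wraparound).
            for v in (((r + 1) % rows, c), (r, (c + 1) % cols)):
                arcs.add((u, v))
                arcs.add((v, u))
    return arcs

def edge_disjoint_paths(arcs, source, dest):
    """Number of edge-disjoint source-dest paths (unit-capacity max flow)."""
    cap = {arc: 1 for arc in arcs}
    adj = {}
    for u, v in arcs:
        adj.setdefault(u, []).append(v)
    flow = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and dest not in parent:
            u = queue.popleft()
            for v in adj.get(u, []):
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if dest not in parent:
            return flow
        # Push one unit of flow back along the path found.
        v = dest
        while parent[v] is not None:
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1

paths = edge_disjoint_paths(torus_arcs(4, 4), (0, 0), (2, 2))
```

With k such paths available, any k – 1 link failures leave at least one path intact, which is the property the slide relies on.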

Slide 7: Reconfiguration via Programmable Connections
[Figure: interconnection switch with 4 ports (horizontal lines) and 3 channels (vertical lines).]
Each port can be connected to 2 of the 3 channels.
Connecting each module port to every channel would give maximum flexibility, but it leads to complex hardware and control.
The challenge lies in using more limited connections effectively.

Slide 8: Multiple-Bus Multiprocessors
The vertical channels may be viewed as buses and the heavy dots as controllable bus connections, making this method applicable to fault-tolerant multiprocessors.
Failed units can be isolated from the buses.
No single bus failure can isolate a module from the rest of the system.
[Figure: bus connection logic. Labels: Read/Write data, Write enable, Connection FF, Write path, Read path, Bus.]
If we have extra buses, then failure in the bus connection logic can be tolerated by avoiding the particular bus.
For reliability analysis, lump the failure rate of reconfiguration logic with that of its associated bus.

Slide 9: Row/Column Bypassing in 2D Arrays
[Figure: 2D array with bypass multiplexers. Labels: Row i, Bypass row i, Bypass row i – 1, multiplexer inputs 0 and 1.]
Similar mechanisms are needed for northward links in columns and for the eastward and westward links in rows.
Question: What types of mechanisms do we need at the edges of this array to allow the row and column edge connections to be directed to the appropriate (nonbypassed) rows and columns?

Slide 10: Choosing the Rows/Columns to Bypass
[Figure: array with spare rows and spare columns. Labels: Spare rows, Spare columns, Rows with faults, Columns with faults.]
In the adjacent diagram, can we choose up to 2 rows and 2 columns so that they contain all the faults?
Convert to graph problem (Kuo-Fuchs):
Form a bipartite graph, with nodes corresponding to faulty rows and columns and an edge joining a row node and a column node for each fault.
Find a cover for the bipartite graph (set of nodes that touch every edge).
Question: In a large array, with r spare rows and c spare columns, what is the smallest number of faults that cannot be reconfigured around with row/column bypassing?
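For small arrays, the cover problem can be explored by exhaustive search. The sketch below is an assumption-laden illustration rather than the Kuo-Fuchs algorithm itself: the function name `find_bypass` and the fault coordinates in the example are hypothetical. It tries every choice of at most `spare_rows` faulty rows and checks whether the remaining faults fit in at most `spare_cols` columns.

```python
from itertools import combinations

def find_bypass(faults, spare_rows, spare_cols):
    """Return (rows, cols) to bypass that contain every fault, or None.

    faults is a list of (row, column) positions of faulty cells.
    """
    faulty_rows = sorted({r for r, _ in faults})
    for k in range(min(spare_rows, len(faulty_rows)) + 1):
        for rows in combinations(faulty_rows, k):
            chosen_rows = set(rows)
            # Faults not covered by the chosen rows must be covered by columns.
            needed_cols = {c for r, c in faults if r not in chosen_rows}
            if len(needed_cols) <= spare_cols:
                return chosen_rows, needed_cols
    return None

# Hypothetical fault pattern with 2 spare rows and 2 spare columns.
example = find_bypass([(0, 0), (0, 3), (2, 3), (4, 3), (5, 1)], 2, 2)
```

As a sanity check on the search, a pattern of r + c + 1 faults that all lie in distinct rows and distinct columns cannot be covered, since each bypassed row or column removes only one of them.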

Slide 11: Switch Modules in FPGAs
Interconnection switch with 8 ports and four connection choices for each port:
0 – No connection
1 – Straight through
2 – Right turn
3 – Left turn
8 control bits (why?)

Slide 12: An Array Reconfiguration Scheme
[Figure label: Three-state switch]

Slide 13: One-Track and Two-Track Switching Schemes
[Figure panels: One-track switching model; Two-track switching model.]
Source: S.-Y. Kung, S.-N. Jean, C.-W. Chang, IEEE TC, Vol. 38, pp. …, April 1989

Slide 14: What We Already Know About Voting
Majority voting: Select the value that appears on at least ⌊n/2⌋ + 1 of the n inputs.
Number n of inputs is usually odd, but does not have to be.
Example: vote(1, 2, 3, 2, 2) = 2
[Figure: majority voter with inputs x1, x2, ..., xn and output y.]
Majority voters can be realized by means of comparators and muxes.
[Figure: 3-input voter built from a comparator with a Disagree output and a multiplexer; inputs x1, x2, x3, output y.]
This design assumes that in the case of 3-way disagreement any one of the inputs can be chosen.
Logic can be added to this voter so that it signals three-way disagreement.
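As a software illustration of the majority rule (the slides describe hardware voters; the function name `majority_vote` is hypothetical), the following sketch returns the value found on at least ⌊n/2⌋ + 1 inputs and signals disagreement by returning None.

```python
from collections import Counter

def majority_vote(inputs):
    """Return the value on at least floor(n/2) + 1 inputs, or None if there is none."""
    if not inputs:
        return None
    needed = len(inputs) // 2 + 1
    value, count = Counter(inputs).most_common(1)[0]
    return value if count >= needed else None

result = majority_vote([1, 2, 3, 2, 2])  # the slide's example
```

Returning None in the no-majority case plays the role of the disagreement signal mentioned on the slide; a hardware voter that picks an arbitrary input instead would correspond to returning any element.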

Slide 15: There is More to Voting than Simple Majority
Plurality voting: Select the value that appears on the largest number of inputs.
Example: vote(1, 3, 2, 3, 4) = 3
[Figure: plurality voter with inputs x1, x2, ..., xn and output y.]
What should we take as the result of vote(1.00, 3.00, 0.99, 3.00, 1.01)?
It would be reasonable to take 1.00 as the result, because 3 inputs agree or approximately agree with 1.00, while only 2 agree with 3.00.
Approximate voting and a number of other sophisticated voting schemes will be discussed under software design topics.
Median voting: one way to deal with approximate values.
median(1.00, 3.00, 0.99, 3.00, 1.01) = 1.01
Median value is equal to the majority value when we have majority.
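The three ideas on this slide can be compared side by side in a short sketch. The function names and the tolerance `eps` are assumptions made for illustration; in particular, `approximate_vote` is only one simple way to interpret "approximately agree" and is not taken from the slides.

```python
from collections import Counter
from statistics import median

def plurality_vote(inputs):
    """Return the value that appears on the largest number of inputs."""
    return Counter(inputs).most_common(1)[0][0]

def approximate_vote(inputs, eps):
    """Return the input with the most inputs lying within eps of it (assumed rule)."""
    def support(x):
        return sum(1 for y in inputs if abs(x - y) <= eps)
    return max(inputs, key=support)

readings = [1.00, 3.00, 0.99, 3.00, 1.01]
exact = plurality_vote(readings)                 # exact matching favors 3.00
approx = approximate_vote(readings, eps=0.015)   # tolerance chosen for this example
middle = median(readings)
```

On these readings, exact plurality picks 3.00 because it is the only repeated value, while the approximate rule and the median both land in the cluster around 1.00.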

Slide 16: Threshold Voting and Its Generalizations
Simple threshold (m-out-of-n) voting: Output is 1 if at least m of the n inputs are 1s.
[Figure: threshold gate with inputs x1, x2, ..., xn, threshold m, and output y.]
Majority voting is a special case of threshold voting: (⌊n/2⌋ + 1)-out-of-n voting.
Weighted threshold (w-out-of-Σvi) voting: Output is 1 if Σ vi xi is w or more.
[Figure: threshold gate with inputs x1, x2, ..., xn, weights v1, v2, ..., vn, threshold w, and output y.]
Agreement or quorum sets:
{x1, x2}, {x2, x3}, {x3, x1} – same as 2-out-of-3
{x1, x2}, {x1, x3, x4}, {x2, x3, x4}
The 2nd example above is weighted voting with v1 = v2 = 2, v3 = v4 = 1, and threshold w = 4.
Agreement sets are more general than weighted voting, in the sense that some agreement sets cannot be realized as weighted voting.
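The equivalence between the second agreement-set example and the weighted threshold gate can be verified by enumerating all 16 input combinations. This is a minimal sketch with hypothetical function names; inputs are 0-indexed, so x1 is index 0.

```python
from itertools import product

def weighted_threshold(bits, weights, w):
    """Output 1 if the weighted sum of the input bits is w or more."""
    return int(sum(v * x for v, x in zip(weights, bits)) >= w)

def agreement_sets(bits, sets):
    """Output 1 if every input in at least one agreement set is 1."""
    return int(any(all(bits[i] for i in s) for s in sets))

weights, w = (2, 2, 1, 1), 4
sets = [(0, 1), (0, 2, 3), (1, 2, 3)]   # {x1,x2}, {x1,x3,x4}, {x2,x3,x4}
equivalent = all(
    weighted_threshold(bits, weights, w) == agreement_sets(bits, sets)
    for bits in product((0, 1), repeat=4)
)
```

The same check with unit weights and threshold 2 confirms that the first example is 2-out-of-3 voting.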

Slide 17: Usefulness of Weighted Threshold Voting
Unequal weights allow us to take different levels of input reliabilities into account.
Zero weights can be used to disable or purge some of the inputs (combined switch-voter).
[Figure: threshold gate with inputs x1, x2, ..., xn, weights v1, v2, ..., vn, threshold w, and output y.]
Maximum-likelihood voting:
prob{x1 correct} = 0.9
prob{x2 correct} = 0.9
prob{x3 correct} = 0.8
prob{x4 correct} = 0.7
prob{x5 correct} = 0.6
Assume x1 = x3 = a, x2 = x4 = x5 = b
prob{a correct} = 0.9 × 0.1 × 0.8 × 0.3 × 0.4 = 0.00864
prob{b correct} = 0.1 × 0.9 × 0.2 × 0.7 × 0.6 = 0.00756
Max-likelihood voting can be implemented in hardware via table lookup or approximated by weighted threshold voting (otherwise, we need software).
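The two products in the maximum-likelihood example can be evaluated mechanically. The sketch below is a software illustration with a hypothetical function name; it scores each candidate value by multiplying p for every input that reports it and 1 - p for every input that does not, which is how the slide's products are formed.

```python
from math import prod

def likelihood_scores(inputs, p_correct):
    """Map each candidate value to the product used in max-likelihood voting."""
    return {
        candidate: prod(
            p if x == candidate else 1 - p
            for x, p in zip(inputs, p_correct)
        )
        for candidate in set(inputs)
    }

inputs = ["a", "b", "a", "b", "b"]          # x1 = x3 = a, x2 = x4 = x5 = b
p_correct = [0.9, 0.9, 0.8, 0.7, 0.6]
scores = likelihood_scores(inputs, p_correct)
winner = max(scores, key=scores.get)
```

Here the value a wins even though b appears on three of the five inputs, because the inputs reporting a are the more reliable ones.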

Slide 18: Implementing Weighted Threshold Voters
Example: Implement a 4-input threshold voter with v1 = v2 = 2, v3 = v4 = 1, and threshold w = 4.
Strategy 1: If weights are small integers, fan-out each input an appropriate number of times and use a simple threshold voter.
[Figure: inputs x1, x2, x3, x4 fanned out into a threshold gate with threshold 4 and output y.]
Strategy 2: Use table lookup based on comparison results.
[Table: columns x1 = x2, x1 = x3, x2 = x3, x3 = x4, Result; entries are 0, 1, or x (don't care), with results that include Error.]
Is this table complete? Why, or why not?
Strategy 3: Convert the problem to agreement sets (discussed next).
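Strategy 1 can be sketched in software as follows, assuming multi-valued inputs and small integer weights; the function name `fanout_threshold_vote` is hypothetical. Each input is replicated according to its weight, and a simple w-out-of-Σvi count is applied to the replicated list.

```python
from collections import Counter

def fanout_threshold_vote(inputs, weights, w):
    """Replicate each input weight-many times, then apply a simple threshold vote."""
    replicated = []
    for x, v in zip(inputs, weights):
        replicated.extend([x] * v)
    for value, count in Counter(replicated).items():
        if count >= w:
            return value
    return None  # no value reaches the threshold

weights, w = (2, 2, 1, 1), 4
agreed = fanout_threshold_vote([7, 7, 3, 5], weights, w)   # x1 and x2 agree
```

Because w = 4 exceeds half of the total weight 6, at most one value can reach the threshold, so the order in which values are examined does not matter.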

Slide 19: Implementing Agreement-Set Voters
Example: Implement a voter corresponding to the agreement sets {x1, x2}, {x1, x3, x4}, {x2, x3, x4}.
Strategy 1: Implement as weighted threshold voter, if possible.
Strategy 2: Implement directly.
Find a minimal set of comparators that determine the agreement set.
[Figure: comparators for x1 = x2, x1 = x3, x3 = x4, and x2 = x3 driving a multiplexer (inputs 0 and 1) that selects x1 or x2 as the output y.]
Complete this design by producing the "no agreement" signal.
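A direct implementation can be sketched in software by testing each agreement set in turn. This is an illustration only, not the comparator-and-mux circuit from the slide; the function name is hypothetical, inputs are 0-indexed, and the second return value stands in for the "no agreement" signal.

```python
def agreement_set_voter(inputs, sets):
    """Return (value, True) if all inputs in some agreement set match, else (None, False)."""
    for members in sets:
        values = {inputs[i] for i in members}
        if len(values) == 1:
            return values.pop(), True
    return None, False

sets = [(0, 1), (0, 2, 3), (1, 2, 3)]   # {x1,x2}, {x1,x3,x4}, {x2,x3,x4}
value, agreed = agreement_set_voter([4, 6, 6, 6], sets)
```

Every pair of these agreement sets shares at least one input, so two sets can never agree on different values and the result does not depend on the order in which the sets are checked.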

Slide 20: Implementing Weighted Plurality Voters
Inputs: ⟨data, vote weight⟩ pairs.
Output: Data with maximal vote and its associated vote tally.
Phase 1 (Sorter): Sort by data.
Phase 2 (Combiner): Combine votes.
Phase 3 (Selector): Select max vote.
Example:
Inputs: ⟨5, 1⟩ ⟨4, 2⟩ ⟨5, 3⟩ ⟨7, 2⟩ ⟨4, 1⟩
After phase 1: ⟨7, 2⟩ ⟨5, 1⟩ ⟨5, 3⟩ ⟨4, 2⟩ ⟨4, 1⟩
After phase 2: ⟨7, 2⟩ ⟨5, 4⟩ ⟨5, 0⟩ ⟨4, 3⟩ ⟨4, 0⟩
Output: ⟨5, 4⟩
Source: B. Parhami, IEEE Trans. Reliability, Vol. 40, No. 3, pp. …, August 1991
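The three phases map naturally onto a short software sketch. The function name `weighted_plurality` is hypothetical, and the combiner here merges equal data values into a single pair rather than keeping zero-vote placeholders as the hardware combiner in the example does.

```python
def weighted_plurality(pairs):
    """Return the (data, total_vote) pair with the largest combined vote."""
    # Phase 1: sort by data value (descending, as in the slide's example).
    ordered = sorted(pairs, key=lambda p: p[0], reverse=True)
    # Phase 2: combine the votes of adjacent pairs that carry the same data.
    combined = []
    for data, vote in ordered:
        if combined and combined[-1][0] == data:
            combined[-1] = (data, combined[-1][1] + vote)
        else:
            combined.append((data, vote))
    # Phase 3: select the pair with the maximum vote tally.
    return max(combined, key=lambda p: p[1])

result = weighted_plurality([(5, 1), (4, 2), (5, 3), (7, 2), (4, 1)])
```

Sorting first guarantees that equal data values end up adjacent, which is what lets the combining phase work with purely local comparisons.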

Slide 21: Details of a Sorting-Based Plurality Voter
[Figure: 5-sorter, 5-combiner, and 5-selector cell networks, annotated with stages of delay, processing the inputs ⟨5, 1⟩ ⟨4, 2⟩ ⟨5, 3⟩ ⟨7, 2⟩ ⟨4, 1⟩ into the output ⟨5, 4⟩.]
The first two phases (sorting and combining) can be merged, producing a 2-phase design; the tradeoff is fewer but more complex cells.
Source: B. Parhami, IEEE Trans. Reliability, Vol. 40, No. 3, pp. …, August 1991