Fault-Tolerant Clustering for FPGAs

Fault-Tolerant Clustering for FPGAs
Jason Cong and Brian Tagiku
VLSI CAD Laboratory, Computer Science Department
University of California, Los Angeles
{cong,btagiku}@cs.ucla.edu
http://cadlab.cs.ucla.edu/

Good afternoon, everyone. My name is Brian Tagiku, and I am one of Professor Cong's students at UCLA. Together we've been looking at fault-tolerant methods for two-level hierarchical FPGAs. In particular, we've been considering both fault-tolerant clustering of LUT networks and fault-tolerant reconfiguration.

Outline
Background; Problem Model and Formulation; Fault-Tolerant Clustering; Fault Assignment and Reconfiguration; Future Work

Before we begin, here's a brief overview of what I'll talk about. I'll start by explaining our problem model and other preliminaries. Then I'll talk about our work in fault-tolerant clustering, followed by a short discussion of fault assignment. Finally, I'll wrap up by discussing possibilities for future work.

12/30/2018 UCLA VLSICAD LAB

Previous Work
Flat (non-hierarchical) FPGAs:
Hatori et al. (Toshiba, 1993): spare rows of CLBs
Howard et al. (Univ. of York, 1994): spare "blocks" of CLBs
Hanchek and Dutt (Intel/UIUC, 1996): node covering; each CLB is assigned a node to "cover"
Lach et al. (UCLA, 1998): tiling; the FPGA is partitioned into tiles and alternate configurations for each tile are precomputed
Hierarchical FPGAs:
Lakamraju and Tessier (Univ. of Mass., 2000): spare elements in each block level

Work on redundancy in FPGAs began in 1993, when a group at Toshiba published a paper proposing the use of spare rows of CLBs. In 1994, Howard et al. proposed grouping CLBs not by row but in more general rectangular blocks. This improved on Toshiba's work because it required less area overhead. Hanchek and Dutt introduced the concept of node covering in 1996. Here, each CLB is assigned an adjacent CLB to "cover". When a fault occurs at a CLB, its functionality is shifted down a chain of covers until a spare CLB is reached. The advantage is that this method accommodates dynamic reconfiguration and can tolerate more faults. Lach et al. proposed a tiling-based method in 1998. This method partitions the CLBs into tiles and then precomputes alternate configurations for each tile. Thus, when a failure occurs, the tile can be reconfigured to use a compatible configuration. As for hierarchical FPGAs, very little work has been done. Lakamraju and Tessier proposed a simple fault-tolerant scheme in 2000: they insert a new column of spare elements at each level of the hierarchy. While they do not explore other spare-allocation methodologies, they do illustrate the gains to be made from fault tolerance in hierarchical FPGAs.

Related Work
Fault covering in memory arrays: spare rows and columns are available; spares must cover the entire row or column in which a fault occurs; the difficulty lies in finding a set of covering rows and columns.
Comparison to fault tolerance in FPGAs: a set of spares that covers the faults is easy to find; the difficulty is finding a set that allows a target delay to be met.
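To make the memory-array covering step concrete, here is a minimal brute-force sketch. The function name `repair` and the encoding of faults as (row, column) pairs are illustrative assumptions, not taken from the talk. It tries every choice of at most `spare_rows` faulty rows to replace and checks whether the columns of the remaining faults fit into the spare columns, so its running time grows combinatorially with the number of spare rows.

```python
from itertools import combinations


def repair(faults, spare_rows, spare_cols):
    """Hypothetical brute-force repair for a memory array.

    faults: set of (row, col) positions of faulty cells.
    Returns (rows, cols) to replace so every fault is covered, or None.
    """
    # Replacing a fault-free row never helps, so only faulty rows are candidates.
    faulty_rows = sorted({r for r, _ in faults})
    for n in range(min(spare_rows, len(faulty_rows)) + 1):
        for rows in combinations(faulty_rows, n):
            chosen = set(rows)
            # Every fault not covered by a replaced row needs its column replaced.
            cols = {c for r, c in faults if r not in chosen}
            if len(cols) <= spare_cols:
                return chosen, cols
    return None


print(repair({(0, 0), (0, 3), (2, 3), (5, 1)}, spare_rows=1, spare_cols=2))
```

For faults at (0, 0), (0, 3), (2, 3) and (5, 1) with one spare row and two spare columns, the search replaces row 0 and columns 1 and 3; with only one spare column it finds no repair and returns None.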

Hierarchical FPGAs
Two-level hierarchical circuit logic: level-0 blocks are LUTs; level-1 blocks are clusters of LUTs.
Uses the locality of interconnections to improve circuit performance.

As I said before, we're interested in fault tolerance in two-level hierarchical FPGAs. This means we have level-0 blocks, which we'll assume are simply LUTs, and level-1 blocks, which are clusters of level-0 blocks. The idea is that intracluster interconnects (ones that stay within the same cluster) have a very short propagation delay relative to intercluster edges (ones that cross clusters). So we'd like to exploit the locality of these interconnections to improve circuit performance.
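As a rough illustration of why locality matters, the sketch below counts the largest number of intercluster edges on any path of a LUT DAG for a given clustering, which is the quantity tabulated in the motivational example later in the talk. It is a hypothetical helper, assuming the DAG is given as a successor dictionary and the clustering as a node-to-cluster dictionary; the four-node graph in the usage lines is made up for illustration.

```python
from functools import lru_cache


def max_intercluster_edges(succ, cluster):
    """Largest number of intercluster edges on any path of the DAG.

    succ: dict mapping each node to a list of its successors (hypothetical format).
    cluster: dict mapping every node to the id of the cluster holding it.
    """
    @lru_cache(maxsize=None)
    def longest_from(u):
        # Best count over all paths that start at node u.
        best = 0
        for v in succ.get(u, ()):
            crossing = 1 if cluster[u] != cluster[v] else 0
            best = max(best, crossing + longest_from(v))
        return best

    return max((longest_from(u) for u in cluster), default=0)


succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
print(max_intercluster_edges(succ, {"A": 0, "B": 0, "C": 1, "D": 1}))  # 1
print(max_intercluster_edges(succ, {"A": 0, "B": 1, "C": 1, "D": 0}))  # 2
```

In the first clustering every path crosses between clusters once; in the second, both paths cross twice, so every path pays the intercluster delay two times.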

Redundancy in FPGAs
LUTs can fail with some probability.
Allocate extra components (e.g., LUTs) into the system.
Re-route inputs and outputs to a spare LUT to recover from defects.
Ideally, the spare LUT should be close to the failure so that delay does not increase.

Now, the problem is that manufacturing never goes perfectly, especially if we're working with nanoscale devices. So some of our LUTs will be defective, and this is bad if we've mapped our circuit through a defective LUT, as shown here. However, by allocating spare LUTs within the clusters in a smart way, we will be able to recover from the defect gracefully. Notice that here we've moved the circuitry from this LUT to this one, and each path in the circuit still passes through the same number of intercluster edges. But if our spares are not allocated correctly, our circuit performance can degrade significantly.

Fault-Tolerant Clustering
Inputs: a DAG G (LUT netlist); an HFPGA with k clusters of c LUTs; inter/intracluster edge delays; probability of LUT defects; target delay D.
Goal: map G into the HFPGA to maximize the probability of achieving delay D after reconfiguration around failures.
[Figure: example DAG with nodes A, B, C, D and its mapping into an HFPGA with 2 clusters of 3 LUTs]

So what we should be concerned about is how to place our circuit into the FPGA so that spares are allocated smartly. This is what we call the fault-tolerant clustering problem. The inputs to this problem are a DAG G (such as this graph here), an HFPGA chip with k clusters of c LUTs each (here, the HFPGA has 2 clusters of 3 LUTs), the delays between LUTs, the probability of each LUT having a defect, and a target delay D. The goal is to map our circuit into the FPGA so that, over all possible defect patterns, we maximize the probability that we can reconfigure the circuit and still meet the target delay D. Let's consider the following example.
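One way to write this objective as a formula, under two assumptions the slide does not state explicitly: LUTs fail independently (LUT ℓ with probability p_ℓ), and the admissible reconfigurations of a mapping are captured by an abstract set ℛ(M, F) whose exact definition is left open, since that is precisely the Failure Assignment question. Here \mathcal{L} is the set of all kc LUTs, M ranges over mappings of the nodes of G onto distinct LUTs, F ⊆ \mathcal{L} is the random set of defective LUTs, and a minimum over an empty set is taken to be ∞.

```latex
\max_{M}\;\sum_{F \subseteq \mathcal{L}} \Pr[F]\;
  \mathbf{1}\!\left[\min_{R \in \mathcal{R}(M,F)} \operatorname{delay}(R) \le D\right],
\qquad
\Pr[F] = \prod_{\ell \in F} p_\ell \prod_{\ell \in \mathcal{L} \setminus F} \bigl(1 - p_\ell\bigr)
```

The sum runs over all 2^{kc} defect sets, which is one way to see why evaluating even a single candidate mapping exactly is expensive.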

Motivational Example
Probability of LUT failure = 0.1

Left clustering
Max # intercluster edges along any path | Probability
1 | 0.89
2 | 0.09
failure | 0.02

Right clustering
Max # intercluster edges along any path | Probability
1 | 0.97
2 | 0.01
failure | 0.02

Assume that each LUT can fail with probability 0.1, and consider the following two clusterings. It turns out that with the left clustering, about 89% of the time we can reconfigure the circuit so that each path has only 1 intercluster edge. 9% of the time, we can reconfigure the circuit so that each path has at most 2 intercluster edges. Finally, 2% of the time we can't reconfigure the circuit at all because too many LUTs fail. Notice that the right clustering does significantly better: 97% of the time, each path will have only 1 intercluster edge.
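A quick consistency check on the "failure" row, assuming the example is the four-node DAG placed on 2 clusters of 3 LUTs from the previous slide, and that reconfiguration fails exactly when fewer healthy LUTs remain than there are nodes. Under those assumptions the failure probability depends only on how many of the 6 LUTs fail, which would explain why both clusterings report the same 0.02. The function name is hypothetical.

```python
from math import comb


def prob_too_many_failures(n_luts, n_nodes, p):
    """Probability that fewer than n_nodes of n_luts LUTs survive.

    Assumes each LUT fails independently with probability p, so the number
    of failures is Binomial(n_luts, p). Mapping becomes impossible once the
    failures exceed the n_luts - n_nodes spare LUTs.
    """
    spares = n_luts - n_nodes
    ok = sum(comb(n_luts, f) * p**f * (1 - p) ** (n_luts - f)
             for f in range(spares + 1))
    return 1 - ok


print(f"{prob_too_many_failures(6, 4, 0.1):.3f}")  # 0.016
```

The exact value is 1 - (0.531441 + 0.354294 + 0.098415) = 0.01585, which rounds to the 0.02 shown for both clusterings.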

Dynamic Programming Heuristic
Use a dynamic programming matrix A. Each entry A[i,j,k] stores a clustering solution of node i and its predecessors such that: exactly j clusters are used; the arrival time at i is at most k; the probability of achieving delay k is maximized.
Allows node duplication. Assumes constant fan-in.

To address this problem, we designed a heuristic based on dynamic programming. It stores the "best" clustering solution of each node and its predecessors that uses a specific number of clusters. So the (i,j,k)-th entry is a clustering of node i and its predecessors that uses exactly j clusters, achieves arrival time k or better at node i, and maximizes the probability of achieving arrival time k after reconfiguration. This heuristic assumes that node duplication is allowed and that each node has a constant fan-in.

DP Heuristic Performance
10% failure rate; intracluster edge delay 0; intercluster edge delay 3; LUT delay 1; 8 clusters of 3 LUTs each; target delay 7

We tested this heuristic on a very small example with these parameters.
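Assuming arrival times follow the usual longest-path rule (the slides do not spell out the timing computation), the parameters above can be turned into a small delay evaluator. The constant and function names are hypothetical; each LUT adds 1, an intracluster edge adds 0, and an intercluster edge adds 3.

```python
from functools import lru_cache

# Delay parameters from the slide (names are hypothetical).
LUT_DELAY = 1
INTRA_DELAY = 0
INTER_DELAY = 3


def circuit_delay(pred, cluster):
    """Maximum arrival time over all nodes of the LUT DAG.

    pred: dict mapping each node to a list of its predecessors
          (nodes with no entry are treated as driven by primary inputs).
    cluster: dict mapping every node to its cluster id.
    """
    @lru_cache(maxsize=None)
    def arrival(v):
        # Latest input arrival including edge delay, then the LUT's own delay.
        latest = 0
        for u in pred.get(v, ()):
            edge = INTRA_DELAY if cluster[u] == cluster[v] else INTER_DELAY
            latest = max(latest, arrival(u) + edge)
        return latest + LUT_DELAY

    return max(arrival(v) for v in cluster)


pred = {"b": ["a"], "c": ["b"], "d": ["c"]}
print(circuit_delay(pred, {"a": 0, "b": 0, "c": 1, "d": 1}))  # 7
print(circuit_delay(pred, {"a": 0, "b": 1, "c": 0, "d": 1}))  # 13
```

For a four-LUT chain a → b → c → d split two and two across clusters, the delay is 4 LUT delays plus one intercluster edge, 4 + 3 = 7, exactly the target; alternating clusters gives 4 + 3·3 = 13.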

DP Heuristic Performance
Min-delay clustering: achieves delay 7 with probability ≈ 0.28
DP clustering: achieves delay 7 with probability ≈ 0.39

We got these clusterings. The left clustering is a standard min-delay clustering, whereas the right one is the clustering produced by our heuristic. Note that our heuristic achieves the target delay with probability about 0.39 versus about 0.28, roughly 10 percentage points more often than the min-delay clustering. However, we're still investigating this heuristic's performance on larger circuits.

Difficulties
The best known algorithm for calculating the probability distribution of delays is exponential.
The problem doesn't specify how to reconfigure a circuit.

There are a few difficulties associated with the clustering problem that we should point out. First, given a clustering solution, calculating the probability that the circuit can be reconfigured to achieve a target delay is hard: the best known algorithms for doing this in general take exponential time. Thus, it can be very hard to judge the actual performance of an algorithm on large circuits. Second, this problem addresses the allocation of spares in the FPGA but doesn't specify how to actually reconfigure the circuit around failures. The problem of assigning failures to spare resources is what we call the Failure Assignment problem.
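The exponential cost can be seen directly in a naive exact evaluator: it enumerates every defect set, weights it by its probability, and asks a feasibility predicate whether the circuit can still be reconfigured. This is only a sketch, assuming independent LUT failures with a common probability p; the predicate stands in for whatever reconfiguration rule is chosen, and the example rule below is hypothetical.

```python
from itertools import combinations


def prob_feasible(luts, p, feasible):
    """Exact probability that a random defect set is recoverable.

    luts: list of LUT ids; each fails independently with probability p.
    feasible: predicate taking a frozenset of failed LUTs.
    Enumerates all 2 ** len(luts) defect sets, so the cost is exponential.
    """
    n = len(luts)
    total = 0.0
    for f in range(n + 1):
        weight = p**f * (1 - p) ** (n - f)  # probability of one specific set of size f
        for failed in combinations(luts, f):
            if feasible(frozenset(failed)):
                total += weight
    return total


def at_most_one_per_cluster(failed):
    # Hypothetical rule: each 3-LUT cluster hosts 2 nodes that may not leave it,
    # so a cluster can absorb at most one defect.
    return all(sum(1 for c, _ in failed if c == k) <= 1 for k in range(2))


luts = [(c, i) for c in range(2) for i in range(3)]  # 2 clusters of 3 LUTs
print(prob_feasible(luts, 0.1, at_most_one_per_cluster))  # about 0.9448
```

With 2 clusters of 3 LUTs, p = 0.1, and this hypothetical rule, the result is 0.972² ≈ 0.9448. The loop visits 2^6 = 64 defect sets here, but the count doubles with every additional LUT.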

Failure Assignment
Inputs: a DAG G; an HFPGA with k clusters of c LUTs; inter/intracluster edge delays; target delay D; a mapping of G into the FPGA; a set of failed LUTs.
Goal: reassign failed LUTs to spare LUTs so that delay D is still met.

In the failure assignment problem, we are given, among other things, a mapping of the circuit into the FPGA. We are also given a set of defective LUTs. Our job is to assign the defective LUTs to spare LUTs so that the circuit functions correctly and the target delay is met.
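For very small instances, Failure Assignment can be solved by exhaustive search. The sketch below assumes that nodes on healthy LUTs stay where they are, that only nodes on failed LUTs move, and that a moved node may go to any unused, non-defective LUT; it reuses the delay parameters from the performance slide (LUT 1, intracluster 0, intercluster 3). All names are hypothetical, and the permutation loop is exponential in the number of failures.

```python
from functools import lru_cache
from itertools import permutations

# Delay parameters from the "DP Heuristic Performance" slide (names hypothetical).
LUT_DELAY, INTRA_DELAY, INTER_DELAY = 1, 0, 3


def circuit_delay(pred, cluster):
    """Longest arrival time; pred: node -> predecessors, cluster: node -> cluster id."""
    @lru_cache(maxsize=None)
    def arrival(v):
        latest = 0
        for u in pred.get(v, ()):
            edge = INTRA_DELAY if cluster[u] == cluster[v] else INTER_DELAY
            latest = max(latest, arrival(u) + edge)
        return latest + LUT_DELAY

    return max(arrival(v) for v in cluster)


def assign_failures(pred, placement, lut_cluster, failed, target):
    """Move nodes off failed LUTs onto healthy spares so that delay <= target.

    placement: node -> LUT id; lut_cluster: LUT id -> cluster id.
    Returns a new placement, or None if no assignment meets the target.
    """
    used = set(placement.values())
    spares = [lut for lut in lut_cluster if lut not in used and lut not in failed]
    victims = [v for v, lut in placement.items() if lut in failed]
    # Try every injective map from displaced nodes to healthy spares.
    for choice in permutations(spares, len(victims)):
        trial = dict(placement)
        trial.update(zip(victims, choice))
        cluster = {v: lut_cluster[lut] for v, lut in trial.items()}
        if circuit_delay(pred, cluster) <= target:
            return trial
    return None


lut_cluster = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}  # 2 clusters of 3 LUTs
pred = {"b": ["a"], "c": ["b"], "d": ["c"]}
placement = {"a": 0, "b": 1, "c": 3, "d": 4}        # LUTs 2 and 5 are spares
print(assign_failures(pred, placement, lut_cluster, failed={1}, target=7))
```

The example places a chain a → b → c → d on LUTs 0, 1, 3 and 4, leaving LUTs 2 and 5 as spares. If LUT 1 fails, moving b to LUT 2 keeps the delay at 7; with a target of 6 no assignment works.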

More Difficulties
Failure Assignment is NP-complete: even with fixed cluster sizes c ≥ 4; even if spares are guaranteed to be non-defective; even if we are guaranteed at least m spares per cluster; even if no more than P percent of the LUTs fail.
These results seem to imply that Fault-Tolerant Clustering is also a hard problem.

Unfortunately, the Failure Assignment problem is NP-complete, both in general and in several restricted variants we've considered. So this seems to be a very difficult problem in itself. Moreover, since this problem is closely related to clustering, it seems that clustering may be just as hard, if not harder.

Online Failure Assignment
Defects and faults are announced online (one at a time).
We must assign a fault to a spare when it is announced, and we cannot change our mind later.
Related problems:
Online routing: faults need to be connected to spares; an O(log n)-competitive algorithm is known.
Online bipartite matching: faults need to be matched to spares; an O(log^3 n)-competitive randomized algorithm is known for the metric case.

Despite this, we would like some way to address the problem practically.
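The online constraint can be illustrated with a naive greedy baseline that sends each announced fault to the nearest unused spare and never revisits the decision. This is not the competitive routing or matching algorithm mentioned on the slide, and the one-dimensional distance between positions is a hypothetical stand-in for the delay cost of a reassignment.

```python
def greedy_online(spare_positions, fault_stream):
    """Assign each announced fault to the nearest unused spare, irrevocably.

    spare_positions: spare LUT positions on a line (hypothetical metric).
    fault_stream: fault positions in the order they are announced.
    Returns a list of (fault, spare) pairs; raises if spares run out.
    """
    free = sorted(spare_positions)
    assignment = []
    for fault in fault_stream:
        if not free:
            raise RuntimeError(f"no spare left for fault at {fault!r}")
        # Decision is final: the chosen spare is removed for good.
        best = min(free, key=lambda s: abs(s - fault))
        free.remove(best)
        assignment.append((fault, best))
    return assignment


print(greedy_online([0, 10], [4, 1]))  # [(4, 0), (1, 10)]
```

With spares at positions 0 and 10 and faults announced at 4 and then 1, greedy pairs 4 with 0 and is then forced to pair 1 with 10, for a total distance of 13, whereas pairing 4 with 10 and 1 with 0 costs only 7. Gaps like this are what competitive analysis bounds.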

Future Work
Try to modify the inputs to Failure Assignment so the problem admits a poly-time exact (or approximation) algorithm, e.g., restrict the given mapping to allocate spares in some manner.
Use results from Failure Assignment to guide clustering algorithms.
Generalize the delay model so that LUT placement is also considered.

So where do we go from here? We are currently working on modifying the inputs to the Failure Assignment problem so that we can obtain a polynomial-time solution. One way to do this is to restrict the way the given mapping allocates spares. Next, we can use results from the Failure Assignment problem to guide clustering algorithms. This way, our clustering algorithm will generate solutions that we know are easy to reconfigure. Finally, if you'll recall, our delay model only differentiated between intercluster and intracluster edges. We'd like to eventually generalize our delay model so that LUT positions are considered. Thus, rather than clustering, we would become interested in placement on the FPGA.

Future Applications
Nanoscale FPGAs
Integrate with BIST to make a self-repairing system
Produce profitable yield despite high defect rates