Fault-Tolerant Clustering for FPGAs

Jason Cong and Brian Tagiku
VLSI CAD Laboratory, Computer Science Department
University of California, Los Angeles

Good afternoon, everyone. My name is Brian Tagiku, and I am one of Professor Cong’s students at UCLA. Together we’ve been looking at fault-tolerant methods for two-level hierarchical FPGAs. In particular, we’ve been considering both fault-tolerant clustering of LUT networks and fault-tolerant reconfiguration.

Outline
Background
Problem Model and Formulation
Fault-Tolerant Clustering
Fault Assignment and Reconfiguration
Future Work

Before we begin, here’s a brief overview of what I’ll talk about. I’ll start by explaining our problem model and other preliminaries. Then I’ll talk about our work in fault-tolerant clustering, followed by a short discussion of fault assignment. Finally, I’ll wrap up by discussing possibilities for future work.

12/30/2018 UCLA VLSICAD LAB

Previous Work
Flat (Non-Hierarchical) FPGAs
  Hatori et al. (Toshiba, 1993): spare rows of CLBs
  Howard et al. (Univ. of York, 1994): spare “blocks” of CLBs
  Hanchek and Dutt (Intel/UIUC, 1996): node covering, each CLB assigned a node to “cover”
  Lach et al. (UCLA, 1998): tiling, FPGA partitioned into tiles and alternate configurations for each tile precomputed
Hierarchical FPGAs
  Lakamraju and Tessier (Univ. of Mass., 2000): spare elements in each block level

Redundancy in FPGAs first began in 1993, when a group at Toshiba published a paper proposing the use of spare rows of CLBs. In 1994, Howard et al. proposed grouping CLBs not by row but in more general rectangular blocks. This improved upon Toshiba’s work because it required less area overhead. Hanchek and Dutt came up with the concept of node covering in 1996. In this case, each CLB is assigned an adjacent CLB to “cover”. When a fault occurs at a CLB, its functionality is shifted down a chain of covers until a spare CLB is reached. The advantage here is that this method accommodates dynamic reconfiguration and more faults can be tolerated. Lach et al. proposed a tiling-based method in 1998. This method partitions CLBs into tiles, then precomputes alternate configurations for each tile. Thus, when a failure occurs, the tile can be reconfigured to use a compatible configuration.

As far as hierarchical FPGAs go, very little work has been done. Lakamraju and Tessier proposed a simple fault-tolerant scheme in 2000. Here, they simply propose inserting a new column of spare elements in each level of the hierarchy. While they do not show other spare allocation methodologies, they do illustrate the gains to be made from fault tolerance in hierarchical FPGAs.

Related Work
Fault covering in memory arrays
  Spare rows and columns available
  Must use spares to cover the entire row or column in which faults occur
  Difficulty lies in finding a set of covering rows and columns
Comparison to fault tolerance in FPGAs
  A set of spares to cover faults is easy to find
  Difficulty is finding a set that allows a target delay to be met

Hierarchical FPGAs
Two-level hierarchical circuit logic
  Level 0 blocks: LUTs
  Level 1 blocks: clusters of LUTs
Uses locality of interconnections to improve circuit performance

Now, as I said before, we’re interested in fault tolerance in two-level hierarchical FPGAs. This means we have level 0 blocks, which we’ll assume are simply LUTs, and level 1 blocks, which are clusters of level 0 blocks. The idea here is that intracluster interconnects (ones that remain in the same cluster) have a very short propagation delay relative to intercluster edges (ones that cross clusters). So we’d like to somehow utilize the locality of these interconnections to improve circuit performance.
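The two-level delay model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `circuit_delay`, the dictionary-based netlist encoding, and the default delay values are hypothetical choices made for the example, not part of the talk's tooling.

```python
from functools import lru_cache


def circuit_delay(preds, cluster_of, lut_delay=1, intra_delay=0, inter_delay=3):
    """Longest-path delay of a LUT netlist under a two-level delay model.

    preds      -- dict: node -> list of predecessor nodes (the DAG)
    cluster_of -- dict: node -> id of the cluster the node is mapped to
    The delay values are illustrative defaults (an assumption of this sketch).
    """

    @lru_cache(maxsize=None)
    def arrival(node):
        # Latest input arrival, paying the edge delay that matches whether
        # the edge stays inside one cluster or crosses clusters.
        latest = 0
        for p in preds.get(node, ()):
            same_cluster = cluster_of[p] == cluster_of[node]
            edge = intra_delay if same_cluster else inter_delay
            latest = max(latest, arrival(p) + edge)
        return latest + lut_delay

    return max(arrival(n) for n in cluster_of)


# Hypothetical 4-LUT netlist: A and B feed C, and C feeds D.
netlist = {"C": ["A", "B"], "D": ["C"]}
print(circuit_delay(netlist, {"A": 0, "B": 0, "C": 0, "D": 1}))  # 6
print(circuit_delay(netlist, {"A": 0, "B": 0, "C": 0, "D": 0}))  # 3
```

Moving D into the same cluster as its predecessor removes the only intercluster edge on the critical path, which is the locality effect the clustering tries to exploit.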

Redundancy in FPGAs
LUTs can fail with some probability
Allocate extra components (e.g. LUTs) into the system
Re-route inputs and outputs to a spare LUT to recover from defects
Ideally, want the spare LUT to be close to the failure so that delay does not increase

Now the problem is that manufacturing never goes perfectly, especially if we’re working with nanoscale devices. So some of our LUTs will be defective, and this is bad if we’ve mapped our circuit through a defective LUT, as shown here. However, by allocating spare LUTs within the clusters in a smart way, we will be able to recover from the defect gracefully. Notice here that we’ve moved the circuitry from this LUT to this one, and each path in the circuit still passes through the same number of intercluster edges. But if our spares are not allocated correctly, then our circuit performance can degrade significantly.

Fault Tolerant Clustering
Inputs
  A DAG G (LUT netlist)
  An HFPGA with k clusters of c LUTs
  Inter/intracluster edge delays
  Probability of LUT defects
  Target delay D
Goal
  Map G into the HFPGA to maximize the probability of achieving delay D after reconfiguration of failures

So what we should be concerned about is how to place our circuit into the FPGA so that spares are allocated smartly. This is what we call the fault tolerant clustering problem. The inputs to this problem are a DAG G (such as this graph here), an HFPGA chip with k clusters each of c LUTs (here, this HFPGA has 2 clusters of 3 LUTs), delays between LUTs, the probability of each LUT having a defect, and some target delay D. The goal is to map our circuit into the FPGA so that, over all possible defects, we maximize the probability that we can reconfigure the circuit and still meet the target delay D. Let’s consider this example here.

Motivational Example
Probability of LUT failure = 0.1

Left clustering
Maximum # intercluster edges along path    Probability
1                                          0.89
2                                          0.09
failure                                    0.02

Right clustering
Maximum # intercluster edges along path    Probability
1                                          0.97
2                                          0.01
failure                                    0.02

Assume that each LUT can fail with probability 0.1 and consider the following two clusterings. It turns out that in the left clustering, about 89% of the time we can expect to be able to reconfigure the circuit so that each path has only 1 intercluster edge. 9% of the time, we can reconfigure the circuit so that each path has at most 2 intercluster edges. Finally, 2% of the time we won’t be able to reconfigure the circuit at all due to too many failures. Notice that the right clustering does significantly better, as it can guarantee that 97% of the time each path will have only 1 intercluster edge.

Dynamic Programming Heuristic
Use a dynamic programming matrix A
Each entry A[i,j,k] stores a clustering solution of node i and its predecessors such that
  Exactly j clusters are used
  Arrival time at i is at most k
  The probability of achieving delay k is maximized
Allows node duplication
Assumes constant fan-in

To address this problem, we designed a heuristic that uses a dynamic programming based approach. This heuristic essentially stores the “best” clustering solution of each node and its predecessors that uses a specific number of clusters. So the (i,j,k)-th clustering is a clustering of node i and its predecessors that uses exactly j clusters, achieves arrival time k or better at node i, and maximizes the probability of achieving arrival time k after reconfiguration. This heuristic does assume that node duplication is allowed and that each node has a constant fan-in.

DP Heuristic Performance
10% failure rate
Intracluster edge delay 0
Intercluster edge delay 3
LUT delay 1
8 clusters each of 3 LUTs
Target delay of 7

We tested this heuristic on a very small example with these parameters…

DP Heuristic Performance
Min-delay clustering: achieves delay 7 with probability ≈ 0.28
DP clustering: achieves delay 7 with probability ≈ 0.39

And got these clusterings. The left clustering is a standard min-delay clustering, whereas the right is the clustering given by our heuristic. Note that our heuristic achieves the target delay roughly 10 percentage points more often than the min-delay clustering (about 0.39 versus 0.28). However, we’re still investigating this heuristic’s performance on larger circuits.

Difficulties
Best known algorithm for calculating the probability distribution of delays is exponential
Doesn’t specify how to reconfigure a circuit

There are a few difficulties associated with the clustering problem that we should point out. First, if you are given a clustering solution, calculating the probability that the circuit can be reconfigured to achieve a target delay is hard. The best known algorithms for doing this in general take exponential time. Thus, it can be very hard to judge the actual performance of an algorithm on large circuits. Second, this problem addresses the allocation of spares in the FPGA, but doesn’t specify how to actually reconfigure the circuit around failures. The problem of assigning failures to spare resources is what we call the Failure Assignment problem.
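To make the exponential cost concrete, here is a minimal brute-force sketch that sums the probability of every failure pattern under which a target delay can still be met. It assumes independent LUT failures with a common probability, and it takes the reconfigurability check as a caller-supplied function `can_meet`; both the name and that interface are hypothetical, and this is not claimed to be the method used in the talk.

```python
from itertools import combinations


def success_probability(luts, p_fail, can_meet):
    """Probability that the circuit still meets its target delay.

    luts     -- list of LUT names on the chip
    p_fail   -- independent failure probability of each LUT (assumed equal)
    can_meet -- hypothetical oracle: frozenset of failed LUTs -> bool,
                True if the circuit can be reconfigured within the target
    Enumerates all 2**len(luts) failure patterns, so it is exponential.
    """
    n = len(luts)
    total = 0.0
    for r in range(n + 1):
        # Every pattern with exactly r failed LUTs has the same probability.
        weight = (p_fail ** r) * ((1.0 - p_fail) ** (n - r))
        for failed in combinations(luts, r):
            if can_meet(frozenset(failed)):
                total += weight
    return total


# Toy oracle: a 3-LUT cluster that tolerates at most one failure.
prob = success_probability(["a", "b", "c"], 0.1, lambda failed: len(failed) <= 1)
print(round(prob, 3))  # 0.972
```

Even with a constant-time oracle the loop visits every subset of LUTs, which is why judging a clustering on a large circuit this way quickly becomes impractical.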

Failure Assignment
Inputs
  A DAG G
  An HFPGA with k clusters of c LUTs
  Inter/intracluster edge delays
  Target delay D
  A mapping of G into the FPGA
  A set of failed LUTs
Goal
  Reassign failed LUTs to spare LUTs so that the delay D is still met

In the failure assignment problem, we are given, among other things, a mapping of the circuit into the FPGA. We are also given a set of defective LUTs. Our job is to assign defective LUTs to spare LUTs so that the circuit can function correctly and so that the target delay is met.
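The inputs and goal above can be illustrated with an exhaustive search over spare assignments. This is a sketch for tiny instances only: the names `placement_delay` and `assign_failures`, the dictionary encodings, and the default delay values are assumptions made for the example, and the search is exponential rather than anything proposed in the talk.

```python
from itertools import permutations


def placement_delay(preds, placement, lut_cluster, lut_delay, intra, inter):
    """Longest-path delay of a netlist for a given node -> LUT placement."""
    arrival = {}

    def visit(node):
        if node not in arrival:
            latest = 0
            for p in preds.get(node, ()):
                same = lut_cluster[placement[p]] == lut_cluster[placement[node]]
                latest = max(latest, visit(p) + (intra if same else inter))
            arrival[node] = latest + lut_delay
        return arrival[node]

    return max(visit(n) for n in placement)


def assign_failures(preds, placement, lut_cluster, failed, target,
                    lut_delay=1, intra=0, inter=3):
    """Try every assignment of displaced nodes to working spare LUTs.

    Returns a repaired placement that meets `target`, or None if none exists.
    """
    used = set(placement.values())
    spares = [l for l in lut_cluster if l not in used and l not in failed]
    displaced = [n for n, l in placement.items() if l in failed]
    for choice in permutations(spares, len(displaced)):
        trial = dict(placement)
        trial.update(zip(displaced, choice))
        if placement_delay(preds, trial, lut_cluster,
                           lut_delay, intra, inter) <= target:
            return trial
    return None


# Hypothetical chip: 2 clusters (x, y) of 3 LUTs each; chain A -> B -> C.
chip = {"x0": 0, "x1": 0, "x2": 0, "y0": 1, "y1": 1, "y2": 1}
chain = {"B": ["A"], "C": ["B"]}
mapping = {"A": "x0", "B": "x1", "C": "y0"}
repaired = assign_failures(chain, mapping, chip, failed={"x1"}, target=6)
print(repaired["B"])  # x2
```

In this toy case the spare in the same cluster as the failed LUT keeps the number of intercluster edges on the path unchanged, so the target delay is still met.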

More Difficulties
Failure Assignment is NP-Complete
  Even with fixed cluster sizes c ≥ 4
  Even if spares are guaranteed to be non-defective
  Even if we are guaranteed at least m spares per cluster
  Even if no more than P percent of the LUTs fail
These results seem to imply that Fault Tolerant Clustering is also a hard problem

Unfortunately, the Failure Assignment problem is NP-Complete, both in general and in several variations we’ve considered. So this seems to be a very difficult problem in itself. Moreover, since this problem is very closely related to clustering, it seems that clustering may be just as hard, if not harder.

Online Failure Assignment
Defects and faults announced online (one at a time)
We must assign a fault to a spare when announced
We cannot change our mind at a later time
Related Problems
  Online Routing
    Faults need to be connected to spares
    (log n)-competitive algorithm known
  Online Bipartite Matching
    Faults need to be matched to spares
    O(log^3 n) randomized algorithm for metric cases known

Despite this, we would like some way to address the problem practically.
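The online setting above can be sketched with a naive greedy baseline: each announced fault is bound immediately and irrevocably to a spare, preferring one in the same cluster. This is an illustrative assumption only. It is not one of the competitive algorithms cited on the slide, and the class name `OnlineAssigner` and its interface are hypothetical.

```python
class OnlineAssigner:
    """Greedy, irrevocable assignment of announced faults to spare LUTs."""

    def __init__(self, lut_cluster, spares):
        self.lut_cluster = dict(lut_cluster)  # LUT name -> cluster id
        self.free = list(spares)              # spares not yet used or failed
        self.assignment = {}                  # failed LUT -> replacement LUT

    def announce(self, failed_lut):
        """Handle one fault; return the chosen spare, or None if no spare is used."""
        if failed_lut in self.free:
            # An unused spare failed: it simply leaves the pool.
            self.free.remove(failed_lut)
            return None
        home = self.lut_cluster[failed_lut]
        local = [s for s in self.free if self.lut_cluster[s] == home]
        if local:
            pick = local[0]        # same-cluster spare adds no intercluster edge
        elif self.free:
            pick = self.free[0]    # fall back to a spare in another cluster
        else:
            return None            # out of spares: the fault cannot be repaired
        self.free.remove(pick)
        self.assignment[failed_lut] = pick  # never revisited later
        return pick


# Hypothetical chip: 2 clusters of 3 LUTs, one spare per cluster.
chip = {"x0": 0, "x1": 0, "x2": 0, "y0": 1, "y1": 1, "y2": 1}
assigner = OnlineAssigner(chip, spares=["x2", "y2"])
print(assigner.announce("y0"))  # y2
print(assigner.announce("x0"))  # x2
print(assigner.announce("x1"))  # None
```

Because earlier choices are never undone, a greedy rule like this can strand later faults far from any spare, which is the kind of loss that competitive analysis of online algorithms is meant to bound.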

Future Work
Try to modify inputs to Failure Assignment
  So the problem admits a poly-time exact (or approximation) algorithm
  Restrict the given mapping to allocate spares in some manner
Use results from Failure Assignment to guide clustering algorithms
Generalize delay model so that LUT placement is also considered

So where do we go from here? What we are currently working on is modifying the inputs to the Failure Assignment problem so that we can obtain a polynomial-time solution. One way to do this is to restrict the way the given mapping allocates spares. Next, we can use results from the Failure Assignment problem to guide clustering algorithms. This way, our clustering algorithm will generate clustering solutions that we know are easily reconfigured. Finally, if you’ll recall, our delay model only differentiated between delays over intercluster edges and intracluster edges. We’d like to eventually generalize our delay model so that LUT positions are considered. Thus, rather than clustering, we become interested in placement on the FPGA.

Future Applications
Nanoscale FPGAs
Integrate with BIST to make a self-repairing system
Produce profitable yield despite high defect rates
