Cross-layer Optimized Placement and Routing for FPGA Soft Error Mitigation, Keheng Huang, Yu Hu, and Xiaowei Li


Presentation transcript:

Cross-layer Optimized Placement and Routing for FPGA Soft Error Mitigation
Keheng Huang (1,2), Yu Hu (1), and Xiaowei Li (1)
(1) Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences
(2) Graduate University of Chinese Academy of Sciences

2 Outline: Background; Motivation; Cross-layer optimized placement and routing; Experimental results; Conclusions

3–6 Background: Architecture of SRAM-based FPGAs (routing segments)
– Configuration bits: more than 98% of all SRAM bits, of which routing resources account for 80%
– User bits: less than 2% of all SRAM bits
Because routing dominates the configuration memory, the reliability of routing resources needs to be seriously considered during placement and routing.

7 Reliability-Oriented EDA Flow: design specification → synthesis and mapping → gate-level netlist → placement and routing → bit stream, spanning the application design level and the physical design level. Prior reliability-oriented P&R approaches and their guidance criteria:
– RoRA [TC’06]: TMR designs only
– SEU-Aware P&R [ISQED’07]: dimensions of the bounding box
– Reliability-aware P&R [ITC’10]: dimensions of the bounding box
– SEU-Aware Router [DAC’07]: number of configuration bits

8 Soft Error Rate (SER): Each node contributes its occurrence probability (the physical-level factor, i.e., the node error rate) multiplied by its propagation probability (the application-level factor, i.e., the EPP). The SER evaluation criterion sums these contributions over all nodes: $\mathrm{SER} = \sum_{i} R_{\text{node}}(i) \times \mathrm{EPP}(i)$.
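A minimal sketch of the SER evaluation criterion above, assuming the design has already been reduced to a list of (node error rate, EPP) pairs. The `Node` record, both function names, and the numbers are hypothetical and chosen only to illustrate the arithmetic; the second function shows what a criterion that ignores the application level effectively computes.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """One potential upset site (hypothetical summary record)."""
    error_rate_fit: float  # physical-level factor: node error rate (FIT)
    epp: float             # application-level factor: error propagation probability


def total_ser(nodes: list[Node]) -> float:
    """SER evaluation criterion: sum of node error rate x EPP over all nodes."""
    return sum(n.error_rate_fit * n.epp for n in nodes)


def physical_only_estimate(nodes: list[Node]) -> float:
    """A guidance criterion that ignores the application level (every EPP = 1)."""
    return sum(n.error_rate_fit for n in nodes)


# Illustrative values only, not data from the slides.
nodes = [Node(2.0, 0.5), Node(2.0, 0.05), Node(1.0, 0.9)]
print(total_ser(nodes))               # about 2.0
print(physical_only_estimate(nodes))  # 5.0
```

In this toy case the third node has the smallest physical rate yet contributes almost half of the SER, which a physical-only estimate cannot reveal.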

9 Key Observation: The SER evaluation criterion combines an application-level factor (EPP) with a physical-level factor (node error rate). Prior P&R guidance criteria, however, estimate SER from a physical-level factor alone (bounding-box dimensions or number of configuration bits), which implicitly assumes that all EPPs are equal. Are they?

10–11 Key Observation: The application-level factor (EPP) varies significantly across nodes. Inequality degree by Gini coefficient, from lowest to highest:
– below 0.2: absolutely equal
– relatively equal
– moderately unequal
– the gap is relatively large
– above 0.6: quite unequal
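The slides do not say how the Gini coefficient of the EPPs was computed; a minimal sketch using the standard sorted-rank formula might look like this. The function name `gini` and the sample EPP lists are hypothetical.

```python
def gini(values: list[float]) -> float:
    """Gini coefficient via the sorted-rank formula
    G = 2 * sum(i * x_(i)) / (n * sum(x)) - (n + 1) / n, with i = 1..n ascending.
    0 means all values are equal; values near 1 mean highly unequal."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


print(gini([0.25, 0.25, 0.25, 0.25]))  # 0.0: every node propagates equally
print(gini([0.0, 0.0, 0.0, 1.0]))      # 0.75: one node dominates
```

On this scale, the value of 0.646 reported in the conclusions falls in the "quite unequal" band.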

12–17 Cross-layer Optimized Placement and Routing: Overview

18–19 Cube-based EPP Analysis: Two conventional ways to compute the error propagation probability (EPP):
– Monte Carlo simulation: applies test vectors (high accuracy) but traverses the design N times (high complexity)
– Static analysis: uses signal probabilities and error propagation rules (lower accuracy) but traverses the design only twice per fault (lower complexity)
Is there a method with both high accuracy and low complexity?
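To make the cost of the Monte Carlo baseline concrete, here is a minimal sketch of fault-injection EPP estimation on a hypothetical two-LUT netlist. It assumes a soft error flips a single LUT configuration bit, which corrupts the LUT output only when that truth-table row is addressed. Every name below (`NETLIST`, `simulate`, `monte_carlo_epp`, `exhaustive_epp`) is illustrative. Each fault costs two full simulations per vector, which is why the design must be traversed N times per fault.

```python
import random
from itertools import product

# Hypothetical toy netlist of 2-input LUTs, listed in topological order.
# name -> (input wires, truth table); the first input is the MSB of the row index.
INPUTS = ["a", "b", "c"]
NETLIST = {
    "n1": (["a", "b"], [0, 0, 0, 1]),   # n1 = a AND b
    "n2": (["n1", "c"], [0, 1, 1, 1]),  # n2 = n1 OR c (primary output)
}
OUTPUTS = ["n2"]


def simulate(vector, flip=None):
    """Evaluate the netlist; flip=(lut, row) models an upset in one config bit."""
    values = dict(zip(INPUTS, vector))
    for name, (ins, table) in NETLIST.items():  # dicts keep insertion order
        row = 0
        for wire in ins:
            row = (row << 1) | values[wire]
        bit = table[row]
        if flip == (name, row):  # the corrupted bit only matters when addressed
            bit ^= 1
        values[name] = bit
    return tuple(values[o] for o in OUTPUTS)


def monte_carlo_epp(flip, n_vectors, seed=0):
    """Estimate EPP by simulating random vectors with and without the fault."""
    rng = random.Random(seed)
    mismatches = 0
    for _ in range(n_vectors):
        vec = [rng.randint(0, 1) for _ in INPUTS]
        if simulate(vec) != simulate(vec, flip):
            mismatches += 1
    return mismatches / n_vectors


def exhaustive_epp(flip):
    """Exact EPP over all 2^n input vectors (only feasible for tiny designs)."""
    vecs = list(product((0, 1), repeat=len(INPUTS)))
    return sum(simulate(v) != simulate(v, flip) for v in vecs) / len(vecs)


print(exhaustive_epp(("n1", 3)))          # 0.125: needs a=b=1 and c=0
print(monte_carlo_epp(("n1", 3), 20000))  # close to 0.125
```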

20 Cube-based EPP Analysis: Besides 0 and 1, a don't-care bit "X" is introduced, borrowing the notions of cube and cover from logic synthesis. A cube such as 0XX denotes the set {000, 001, 010, 011}; a cover is a set of cubes, e.g., {0XX, 1XX}.
– Adjoin of covers (set union, ∪): {00X} ∪ {01X} = {0XX}
– Interface of covers (set intersection, ∩): {00X} ∩ {001} = {001}
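The two cover operations can be sketched with cubes stored as strings over {0, 1, X}. This is an illustrative implementation under that assumed representation, not the authors' data structure; the function names are hypothetical. Adjoin merges cubes that differ in a single 0/1 position, which reproduces {00X} ∪ {01X} = {0XX}.

```python
from __future__ import annotations

from itertools import combinations, product
from typing import Optional


def expand(cube: str) -> set[str]:
    """All fully specified vectors in a cube, e.g. '0XX' -> 4 vectors."""
    choices = ["01" if bit == "X" else bit for bit in cube]
    return {"".join(bits) for bits in product(*choices)}


def intersect_cubes(a: str, b: str) -> Optional[str]:
    """Position-wise intersection; None when a 0 meets a 1 (empty cube)."""
    out = []
    for x, y in zip(a, b):
        if x == "X":
            out.append(y)
        elif y == "X" or x == y:
            out.append(x)
        else:
            return None
    return "".join(out)


def interface(c1: list[str], c2: list[str]) -> list[str]:
    """Cover intersection: all non-empty pairwise cube intersections."""
    cubes = {intersect_cubes(a, b) for a in c1 for b in c2}
    cubes.discard(None)
    return sorted(cubes)


def _merge(a: str, b: str) -> Optional[str]:
    """Merge two cubes that differ only in one position holding 0 vs 1."""
    diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diff) == 1 and {a[diff[0]], b[diff[0]]} == {"0", "1"}:
        i = diff[0]
        return a[:i] + "X" + a[i + 1:]
    return None


def adjoin(c1: list[str], c2: list[str]) -> list[str]:
    """Cover union, then repeatedly merge adjacent cubes to keep it compact."""
    cubes = set(c1) | set(c2)
    merged = True
    while merged:
        merged = False
        for a, b in combinations(sorted(cubes), 2):
            m = _merge(a, b)
            if m is not None:
                cubes -= {a, b}
                cubes.add(m)
                merged = True
                break
    return sorted(cubes)


print(sorted(expand("0XX")))        # ['000', '001', '010', '011']
print(adjoin(["00X"], ["01X"]))     # ['0XX']
print(interface(["00X"], ["001"]))  # ['001']
```

This sketch does not remove cubes contained in other cubes (absorption), so covers may be less compact than a full logic-minimization library would make them.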

21–24 Cube-based EPP Analysis
– Forward traversal: compute the vectors that set the logic value of each wire to 0 and to 1, respectively
– Backward traversal: compute the vectors that propagate a fault to the outputs
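The two traversals can be illustrated on a hypothetical fanout-free netlist of 2-input LUTs. For readability this sketch stores vector sets as explicit Python sets of vector indices instead of compressed cube covers, and it assumes every wire has a single fanout, which makes the backward pass exact (reconvergent fanout would need more care). All names are illustrative. A configuration-bit upset matters only for vectors that address its row (from the forward sets) and that propagate the LUT output (from the backward sets).

```python
from itertools import product

INPUTS = ["a", "b", "c"]
N = len(INPUTS)
ALL = frozenset(range(2 ** N))  # vector k gives input i the bit (k >> (N-1-i)) & 1

# Hypothetical fanout-free netlist: name -> (in0, in1, table), table[2*x + y]
# is the LUT output for in0 = x and in1 = y; listed in topological order.
LUTS = {"n1": ("a", "b", [0, 0, 0, 1]),   # AND
        "n2": ("n1", "c", [0, 1, 1, 1])}  # OR
ORDER = ["n1", "n2"]
OUTPUT = "n2"


def forward():
    """sets[w][v] = set of input vectors that drive wire w to value v."""
    sets = {}
    for i, name in enumerate(INPUTS):
        ones = frozenset(k for k in ALL if (k >> (N - 1 - i)) & 1)
        sets[name] = {0: ALL - ones, 1: ones}
    for g in ORDER:
        a, b, table = LUTS[g]
        res = {0: set(), 1: set()}
        for x, y in product((0, 1), repeat=2):
            res[table[2 * x + y]] |= sets[a][x] & sets[b][y]
        sets[g] = {v: frozenset(s) for v, s in res.items()}
    return sets


def backward(sets):
    """prop[w] = vectors for which flipping wire w changes the output."""
    prop = {OUTPUT: ALL}
    for g in reversed(ORDER):
        a, b, table = LUTS[g]
        # flipping `a` reaches g when the other input selects differing rows
        prop[a] = prop[g] & frozenset().union(
            *(sets[b][y] for y in (0, 1) if table[y] != table[2 + y]))
        prop[b] = prop[g] & frozenset().union(
            *(sets[a][x] for x in (0, 1) if table[2 * x] != table[2 * x + 1]))
    return prop


def config_bit_epp(lut, row, sets, prop):
    """EPP of an upset in configuration bit `row` of `lut`."""
    a, b, _ = LUTS[lut]
    x, y = divmod(row, 2)
    care = sets[a][x] & sets[b][y] & prop[lut]
    return len(care) / len(ALL)


sets = forward()
prop = backward(sets)
print(config_bit_epp("n1", 3, sets, prop))  # 0.125
print(sorted(prop["a"]))                    # [2, 6]: b=1 and c=0
```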

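25 Application-Level Factor: The error propagation probability is computed as $\mathrm{EPP} = N_{\text{care-cover}} / N_{\text{inputs}}$, where $N_{\text{care-cover}}$ is the number of vectors stored in the care-cover and $N_{\text{inputs}}$ is the total number of input vectors. Traditional Monte Carlo simulation, by comparison, traverses the design N times for each fault.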
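Evaluating this ratio requires counting how many vectors a cover represents. A minimal sketch, assuming cubes are strings over {0, 1, X} and the input space is exhaustive ($N_{\text{inputs}} = 2^n$ for n inputs, an assumption here): pairwise-disjoint cubes can be counted directly as $\sum 2^{\#X}$ without expansion, and the sketch falls back to explicit expansion when cubes overlap. The function names are hypothetical.

```python
from itertools import combinations, product


def disjoint(a: str, b: str) -> bool:
    """Two cubes share no vector iff some position holds 0 in one and 1 in the other."""
    return any({x, y} == {"0", "1"} for x, y in zip(a, b))


def count_vectors(cover: list[str]) -> int:
    """Number of distinct vectors covered (N_care-cover in the slide's notation)."""
    if all(disjoint(a, b) for a, b in combinations(cover, 2)):
        return sum(2 ** cube.count("X") for cube in cover)  # no expansion needed
    vectors = set()  # overlapping cubes: fall back to explicit (exponential) expansion
    for cube in cover:
        choices = ["01" if ch == "X" else ch for ch in cube]
        vectors.update("".join(p) for p in product(*choices))
    return len(vectors)


def epp(care_cover: list[str], n_input_bits: int) -> float:
    """EPP = N_care-cover / N_inputs, assuming N_inputs = 2^n."""
    return count_vectors(care_cover) / 2 ** n_input_bits


print(epp(["110"], 3))                 # 0.125
print(count_vectors(["0XX", "1X0"]))   # 6 (disjoint: 4 + 2)
print(count_vectors(["0XX", "00X"]))   # 4 (00X lies inside 0XX)
```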

26 Application-Level Factor: Comparison of computational complexity, where N is the number of input vectors, V the number of LUTs, E the number of interconnecting wires, g the number of configuration bits per LUT or wire, and $C_{\text{avg}}$ the average compression ratio of all covers:
– Monte Carlo simulation: $O(N \cdot V \cdot g \cdot (V+E))$
– Static analysis: $O(V^2 \cdot (V+E))$
– Cube-based EPP analysis: $O((N/C_{\text{avg}})^2 \cdot (V+E))$

27–33 Cross-layer Optimized Placement: $\text{Total cost} = a \cdot \text{timing cost} + b \cdot \text{congestion cost} + c \cdot \text{SER cost}$, where $\text{SER cost} = \text{Phy cost} \times \text{App cost}$ (physical-level cost times application-level cost).
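VPR's placer is based on simulated annealing, so a cost of this form would be evaluated for every candidate move. The sketch below is a hypothetical illustration: it assumes the physical-level cost of a net is its bounding-box half-perimeter (the criterion used by earlier SEU-aware placers) and that each net carries a single EPP as its application-level cost. The weights a, b, c and the `Net` record are placeholders, not values from the slides.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Net:
    xs: list[int]   # pin x coordinates after placement (hypothetical)
    ys: list[int]   # pin y coordinates after placement (hypothetical)
    epp: float      # application-level factor of the net (assumed per-net)


def bbox_cost(net: Net) -> int:
    """Physical-level proxy: half-perimeter of the net's bounding box."""
    return (max(net.xs) - min(net.xs)) + (max(net.ys) - min(net.ys))


def ser_cost(nets: list[Net]) -> float:
    """SER cost = Phy cost * App cost, accumulated net by net."""
    return sum(bbox_cost(n) * n.epp for n in nets)


def total_cost(timing: float, congestion: float, nets: list[Net],
               a: float = 1.0, b: float = 1.0, c: float = 1.0) -> float:
    """Total cost = a*timing cost + b*congestion cost + c*SER cost."""
    return a * timing + b * congestion + c * ser_cost(nets)


def accept(delta: float, temperature: float, rng: random.Random) -> bool:
    """Metropolis acceptance rule of simulated annealing."""
    if delta <= 0:
        return True
    return rng.random() < math.exp(-delta / temperature)


# A high-EPP net with a wide bounding box dominates the SER term.
nets = [Net([0, 4], [0, 2], 0.9), Net([1, 2], [1, 1], 0.1)]
print(ser_cost(nets))  # 6 * 0.9 + 1 * 0.1, about 5.5
```

Under this weighting, moves that shrink the bounding boxes of high-EPP nets lower the cost far more than the same moves on low-EPP nets.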

34–39 Cross-layer Optimized Routing: finer-granularity estimate of SER
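The slides give no routing cost function, so the following is only a hypothetical sketch of what a finer-granularity SER term could look like. It assumes the cost of each routing resource is a base cost plus the number of configuration bits it consumes weighted by the net's EPP, and it searches with Dijkstra's algorithm over a toy routing-resource graph. `GRAPH`, `route`, `base_cost`, and `ser_weight` are illustrative names. A low-EPP net takes the shortest path, while a high-EPP net detours through resources that use fewer configuration bits.

```python
import heapq
import math

# Hypothetical routing-resource graph: node -> [(neighbor, config bits used)].
GRAPH = {
    "src": [("w1", 2), ("w2", 8)],
    "w1": [("w3", 2)],
    "w2": [("sink", 1)],
    "w3": [("sink", 2)],
    "sink": [],
}


def route(graph, source, sink, epp, base_cost=1.0, ser_weight=1.0):
    """Dijkstra search; each hop costs base_cost + ser_weight * bits * epp."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == sink:
            break
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        for nxt, bits in graph[node]:
            nd = d + base_cost + ser_weight * bits * epp
            if nd < dist.get(nxt, math.inf):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    path = [sink]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], dist[sink]


print(route(GRAPH, "src", "sink", epp=0.0))  # (['src', 'w2', 'sink'], 2.0)
print(route(GRAPH, "src", "sink", epp=1.0))  # (['src', 'w1', 'w3', 'sink'], 9.0)
```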

40 Experimental Setup: MCNC benchmark set → Berkeley ABC mapper → gate-level netlist → VPR (academic FPGA placement and routing tool) → bit stream.
– Logic resources: four 6-input LUTs per CLB
– Routing channel width: 30% above the minimum

41 Experimental Results
– Comparison of EPP accuracy and run time: Monte Carlo simulation; DCOW (partial Monte Carlo simulation); cube-based EPP analysis
– Comparison of SER mitigation: original VPR; P&R guided by the physical-level factor only (PPL); the cross-layer optimized placement and routing algorithm (COPAR)

42 Comparison of EPP Accuracy and Run Time: Monte Carlo simulation (golden model); DCOW, partial Monte Carlo simulation (DAC’10); cube-based analysis (our method).
– $N_{\text{cube}}$: number of test vectors computed by cube-based analysis
– $N_{\text{sim}}$: number of test vectors computed by Monte Carlo simulation
– $N_{\text{inputs}}$: total number of input vectors

43–45 Table: Comparison of EPP Accuracy and Run Time for 20 MCNC circuits (including clma, pdc, seq, spla, bigkey, des, diffeq, dsip, elliptic, ex5p, frisc, and tseng), with a geometric-mean row. Columns: Monte Carlo simulation ($N_{\text{sim}}$ in units of $10^5$, time in s); cube-based analysis ($N_{\text{cube}}$ in units of $10^5$, time in s); gap in units of $10^{-4}$ ($G_{\text{cube}}$, $G_{\text{DCOW}}$).

46 Table: Comparison of SER Mitigation for the 20 benchmark circuits, with a geometric-mean row. Columns: VPR baseline SER (FIT, failures per $10^9$ device-hours); PPL SER (FIT) and ratio to baseline; COPAR SER (FIT) and ratio to baseline.

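47 Comparison of SER Mitigation: Normalized to the VPR baseline (100%), the geometric-mean SER is 93.53% for PPL and 85.61% for COPAR, i.e., reductions of 6.47% and 14.39%, respectively.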

48 Conclusions
– We observe a gap between the SER evaluation criterion and the P&R guidance criterion used for soft error mitigation: EPPs are quite unequal (Gini coefficient = 0.646)
– We introduce cube-based EPP analysis to compute the application-level factor (gap < 1%)
– We propose a cross-layer optimized placement and routing algorithm (SER reduction > 14%)

49 Thank You for Your Attention. Questions?