Min Li and Azadeh Davoodi


A Hybrid Approach for Fast and Accurate Trace Signal Selection for Post-Silicon Debug
Min Li and Azadeh Davoodi
Department of Electrical and Computer Engineering, University of Wisconsin-Madison
WISCAD Electronic Design Automation Lab, http://wiscad.ece.wisc.edu/

Comparison of Verification Methods

| Approach | Throughput (Hz) |
|---|---|
| System simulation | ~$10^3$ |
| RTL simulation | $10^1$ to $10^3$ |
| Gate simulation | $10^{-1}$ to $10^1$ |
| Emulation | ~$10^5$ |
| FPGA prototyping | ~$10^6$ |
| Silicon | $10^7$ to $10^9$ |

[Table from Aitken et al., DAC'10]

Simulation is too slow: it is 4 to 8 orders of magnitude slower than silicon. For example, for the Pentium IV, 2 years of simulation correspond to about 2 minutes of silicon operation.
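As a quick sanity check on the Pentium IV figure, the ratio between 2 years and 2 minutes can be computed directly. This is a minimal arithmetic sketch that assumes 365-day years; it only confirms that the example falls inside the stated 4 to 8 order-of-magnitude range.

```python
import math

# Pentium IV example from the slide: 2 years of simulation ~ 2 minutes on silicon.
MINUTES_PER_YEAR = 365 * 24 * 60          # assumption: 365-day years, no leap days
sim_minutes = 2 * MINUTES_PER_YEAR
silicon_minutes = 2

slowdown = sim_minutes / silicon_minutes  # how many times slower simulation is
orders = math.log10(slowdown)             # orders of magnitude
print(f"slowdown ~ {slowdown:.3g}x, i.e. ~{orders:.1f} orders of magnitude")
```

The slowdown comes out at roughly $5 \times 10^5$, i.e., between 5 and 6 orders of magnitude.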

Post-Silicon Debug
- The Post-Silicon Debug (PSD) stage comes after the initial chip tape-out and before the final release of the product.
- It involves finding the errors that cause malfunctions:
  - bugs are found by operating a few manufactured chips in real time with real-world stimulus;
  - bugs are fixed through multiple rounds of silicon steppings.
- PSD has become significantly expensive and challenging, mainly because of the poor visibility of the internal signals inside the chip.

Embedded Logic Analyzer (ELA)
- An on-chip ELA is used to increase the visibility of internal signals.
- It captures the values of a few flipflops (the trace signals) in real time and stores them in the trace buffer.
- The traced data are then extracted off-chip and analyzed to restore as many of the remaining internal signals as possible.
[Figure: ELA block diagram with a Control Unit, Trigger Unit, Sampling Unit, Offload Unit, Assertion Checker, and Trace Buffer; labeled signals include trigger signals, trigger condition, trace signals, traced data, assertion flags, synchronization data, and off-chip analysis]

Overview of Trace Buffer
- The trace buffer is an on-chip buffer of size $B \times M$:
  - B is the buffer bandwidth, i.e., the number of signals that can be traced;
  - M is the depth of the buffer, equal to the number of clock cycles over which tracing is applied.
- Due to the limited on-chip area, the trace buffer is small, e.g., B is 8 to 32 signals and M is 1K to 8K cycles.
- Terminology:
  - the "capture window" has a size of $B \times M$;
  - the "observation window" has a size of $B \times N$, where $N \ll M$.
[Figure: trace buffer drawn as B rows (signals $S_0, S_1, \dots, S_{B-1}$) by M columns (cycles $0, 1, \dots, M-1$)]
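The $B \times M$ geometry can be made concrete with a toy software model. This is a minimal sketch, not the ELA hardware: it assumes a circular buffer in which each cycle stores one B-bit sample and, once M samples are held, the oldest one is overwritten. The class name `TraceBuffer` and its methods are hypothetical.

```python
from collections import deque
from typing import Deque, List, Sequence, Tuple


class TraceBuffer:
    """Toy B x M trace buffer (hypothetical model).

    Each cycle stores one B-bit sample of the traced flipflops; once M samples
    are held, the oldest is overwritten (assumed circular behavior)."""

    def __init__(self, width: int, depth: int) -> None:
        self.width = width                                   # B: traced signals
        self.samples: Deque[Tuple[int, ...]] = deque(maxlen=depth)  # M: cycles

    def capture(self, values: Sequence[int]) -> None:
        """Store this cycle's values of the B trace signals."""
        if len(values) != self.width:
            raise ValueError(f"expected {self.width} trace bits, got {len(values)}")
        self.samples.append(tuple(values))

    def offload(self) -> List[Tuple[int, ...]]:
        """Return the capture window, oldest cycle first, for off-chip analysis."""
        return list(self.samples)


buf = TraceBuffer(width=2, depth=4)
for cycle in range(6):                    # 6 cycles into a 4-deep buffer
    buf.capture([cycle % 2, (cycle // 2) % 2])
print(buf.offload())                      # only the last 4 cycles survive
```

With depth 4, the samples of cycles 0 and 1 are overwritten, so the capture window holds cycles 2 to 5.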

Restoration Using Trace Signals
- Restoration is performed using "X-Simulation": at each cycle of the capture window, forward and backward restoration steps are applied iteratively until no more signals can be restored.
[Figure: example circuit with flipflops f1 to f5 and one traced flipflop, showing forward and backward restoration, and a DFF-by-cycle table (F1 to F5, cycles 1 to 3) with restored 0/1 values and remaining X entries]
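The forward/backward iteration can be sketched on a time-unrolled netlist with three-valued logic (0, 1, X). This is a minimal sketch under a strong simplifying assumption: each flipflop's next state is a single AND, OR, or BUF gate over the current values of other flipflops, so real multi-level combinational logic is not modeled. The netlist format and the function name `x_restore` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Val = Optional[int]   # 0, 1, or None for the unknown value X
# Hypothetical netlist format: flipflop -> (gate type, input flipflops). The
# flipflop's value at cycle t+1 is the gate applied to its inputs at cycle t.
# Only "AND", "OR", and "BUF" are supported.
Netlist = Dict[str, Tuple[str, List[str]]]


def x_restore(netlist: Netlist, traced: Dict[str, List[int]],
              cycles: int) -> Dict[Tuple[str, int], Val]:
    """Apply forward and backward restoration until no more values change."""
    names = set(netlist) | set(traced)
    for _, ins in netlist.values():
        names.update(ins)
    vals: Dict[Tuple[str, int], Val] = {
        (n, t): (traced[n][t] if n in traced else None)
        for n in names for t in range(cycles)}

    def assign(key: Tuple[str, int], v: int) -> bool:
        if vals[key] is None:            # only fill X entries; never overwrite
            vals[key] = v
            return True
        return False

    changed = True
    while changed:
        changed = False
        for out, (op, ins) in netlist.items():
            for t in range(cycles - 1):
                in_vals = [vals[(i, t)] for i in ins]
                o = (out, t + 1)
                if op == "BUF":          # buffer: copy the value in either direction
                    if in_vals[0] is not None:
                        changed |= assign(o, in_vals[0])
                    elif vals[o] is not None:
                        changed |= assign((ins[0], t), vals[o])
                    continue
                ctrl = 0 if op == "AND" else 1   # controlling value (AND: 0, OR: 1)
                # Forward: one controlling input, or all inputs known, fixes the output
                if ctrl in in_vals:
                    changed |= assign(o, ctrl)
                elif None not in in_vals:
                    changed |= assign(o, 1 - ctrl)
                # Backward: a non-controlled output forces every input
                if vals[o] == 1 - ctrl:
                    for i in ins:
                        changed |= assign((i, t), 1 - ctrl)
                # Backward: a controlled output with exactly one X input and all
                # others non-controlling forces that input to the controlling value
                elif vals[o] == ctrl and ctrl not in in_vals and in_vals.count(None) == 1:
                    changed |= assign((ins[in_vals.index(None)], t), ctrl)
    return vals


# Hypothetical 4-flipflop example: only f2 is traced, for 3 cycles
net: Netlist = {"f2": ("AND", ["f1", "f3"]), "f4": ("BUF", ["f2"])}
vals = x_restore(net, {"f2": [0, 1, 0]}, cycles=3)
restored = sorted(k for k, v in vals.items() if v is not None and k[0] != "f2")
print(restored)
```

Here f2 = 1 at cycle 1 restores f1 and f3 at cycle 0 (backward), and the buffer restores f4 at cycles 1 and 2 (forward). f2 = 0 at cycle 2 restores nothing, because both AND inputs are still X.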

Restoration Using Traced Signals
- The quality of restoration is measured by the State Restoration Ratio (SRR), computed within a capture window ($B \times M$):
$$\mathrm{SRR} = \frac{B \times M + \#\text{restored signals}}{B \times M}$$
- In the example, $\mathrm{SRR} = (4 + 6)/4 = 2.5$.
- SRR reflects the amount of restoration per trace signal per clock cycle.
[Figure: DFF-by-cycle table (F1 to F5, cycles 1 to 3) highlighting the restored signals]
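The SRR formula can be evaluated directly from a restoration table. This is a minimal sketch: the table maps each flipflop to its per-cycle value, with `None` marking entries that are still X, and the example window is hypothetical. It is chosen to reproduce the slide's arithmetic of $(4 + 6)/4$ and is not the slide's exact table.

```python
from typing import Dict, List, Optional, Set


def state_restoration_ratio(values: Dict[str, List[Optional[int]]],
                            traced: Set[str]) -> float:
    """SRR = (B*M + #restored) / (B*M) over one capture window.

    values maps each flipflop to its per-cycle value (None = still X)."""
    cycles = len(next(iter(values.values())))
    traced_bits = len(traced) * cycles                       # B x M
    restored = sum(1 for f, row in values.items() if f not in traced
                   for v in row if v is not None)            # restored untraced bits
    return (traced_bits + restored) / traced_bits


# Hypothetical window: F1 traced for 4 cycles, 6 values restored elsewhere
window = {
    "F1": [0, 1, 1, 0],
    "F2": [1, None, 0, None],
    "F3": [None, 1, None, 1],
    "F4": [0, 0, None, None],
}
print(state_restoration_ratio(window, {"F1"}))   # (4 + 6) / 4 = 2.5
```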

Trace Signal Selection Problem
- Challenges of PSD using trace buffers:
  - due to the small trace buffer size, the capture window is small;
  - different selections of the B trace signals can result in significantly different SRR.
- Trace signal selection problem: given a trace buffer of size $B \times M$, select B flipflops for tracing such that as many of the remaining internal signals as possible can be restored during the M cycles of the capture window, i.e., maximize the State Restoration Ratio (SRR).

Existing Trace Selection Algorithms
- Forward greedy: start with an empty trace set; in each iteration, select the one trace that leads to the largest SRR; terminate once B traces have been selected.
- Backward pruning: start with all flipflops included as traces; in each iteration, prune the one trace whose removal leads to the smallest SRR loss; terminate once B traces are left.
- Representative works: Ko & Nicolici [DATE'08], Liu & Xu [DATE'09], Prabhakar & Xiao [ATS'09], Basu & Mishra [VLSI'11], Chatterjee & Bertacco [ICCAD'11].
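Both strategies fit in a few lines once SRR is available as a function of the trace set. This is a minimal sketch in which `srr` is a hypothetical oracle (in practice a metric or an X-simulation). The toy additive objective in the usage lines is also hypothetical: real SRR has strong interactions between traces, which is why the two strategies can disagree.

```python
from typing import Callable, List, Sequence

SrrFn = Callable[[List[str]], float]   # hypothetical SRR oracle for a trace set


def forward_greedy(cands: Sequence[str], B: int, srr: SrrFn) -> List[str]:
    """Start empty; each iteration add the flipflop giving the largest SRR."""
    selected: List[str] = []
    while len(selected) < B and len(selected) < len(cands):
        pool = [f for f in cands if f not in selected]
        selected.append(max(pool, key=lambda f: srr(selected + [f])))
    return selected


def backward_pruning(cands: Sequence[str], B: int, srr: SrrFn) -> List[str]:
    """Start with all flipflops; each iteration drop the one whose removal
    leaves the highest SRR (i.e., costs the least), until B remain."""
    kept = list(cands)
    while len(kept) > B:
        kept.remove(max(kept, key=lambda f: srr([g for g in kept if g != f])))
    return kept


weights = {"a": 1.0, "b": 3.0, "c": 2.0}


def toy_srr(sel: List[str]) -> float:
    """Hypothetical additive stand-in for SRR."""
    return sum(weights[f] for f in sel)


print(forward_greedy(["a", "b", "c"], 2, toy_srr))    # ['b', 'c']
print(backward_pruning(["a", "b", "c"], 2, toy_srr))  # ['b', 'c']
```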

Existing Trace Selection Algorithms
Existing algorithms can also be categorized by the way SRR is approximated:
- Metric-based: uses quick metrics to approximate SRR, with fast runtime but high error. Ko & Nicolici [DATE'08], Liu & Xu [DATE'09], Prabhakar & Xiao [ATS'09], Basu & Mishra [VLSI'11], Davoodi & Shojaei [ICCAD'10].
- Simulation-based: uses X-Simulation with a backward-pruning traversal to measure SRR accurately, but still has a very long runtime. Chatterjee & Bertacco [ICCAD'11].

Simulation-Based Trace Selection
- Much more accurate than metric-based selection:
  - simulation can directly consider signal correlations;
  - simulation accounts for the fact that a flipflop may be restored to different values within the observation window.
- Much slower than metric-based selection: the restoration of each gate is evaluated using X-Simulation for each clock cycle.
[Figure: DFF-by-cycle X-Simulation table (F1 to F5, cycles 1 to 3)]

Contributions
- A hybrid trace signal selection algorithm that blends simulation and metrics:
  - we propose a new set of metrics to quickly find a small number of top trace signal candidates at each step of the algorithm;
  - next, among these few top candidates, X-Simulation is used to accurately evaluate the SRR and select the best one.
- We show that our method achieves the same or better solution quality than the simulation-based approach, with runtime as fast as the metric-based approaches.

Overview of Our Algorithm
- Based on forward-greedy trace signal selection.
- Proposed metrics:
  - Reachability List of a flipflop f: a small subset of flipflops that are good candidates to be restored by f.
  - Restorability Rate: the rate at which each flipflop is restored using the trace signals selected so far.
  - Restoration Demand of flipflop i from flipflop f, where f is a candidate for the next trace signal.
  - Impact Weight of flipflop f: how much f can restore the untraced flipflops after accounting for restoration by the already-selected trace signals.
- Flow: initialize the metrics; compute the fast metrics to find a small number of top candidates for tracing; use a small number of X-Simulations to identify the best candidate (the next trace) among them; update the metrics; repeat until B traces have been selected, then terminate.

“Reachability List”
- $L_f^v$: the reachability list of flipflop f taking value v, defined for all flipflops f and values $v \in \{0, 1\}$.
- It is the set of flipflops that can be restored by f taking value v (without the help of any other flipflop).
- When evaluating how much a candidate trace signal f can restore other flipflops, only the elements of $L_f^v$ are considered. This significantly reduces the algorithm runtime.
- It is computed once, as a pre-processing step before the selection starts.
- Example (circuit with flipflops f1 to f5): $L_2^0 = \{f_1, f_5\}$, $L_2^1 = \{f_1, f_3\}$.
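Computing $L_f^v$ amounts to asking which flipflops a single known value can pin down on its own. This is a minimal sketch under simplifying assumptions. It uses a hypothetical netlist format in which each flipflop's next state is one AND/OR/BUF gate over other flipflops, and it only follows one gate in each direction (no transitive implications). The 5-flipflop circuit is also hypothetical: it was constructed so that its lists match the slide's example, and it is not the slide's actual circuit.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical format: flipflop -> (gate type, input flipflops of its next state)
Netlist = Dict[str, Tuple[str, List[str]]]


def controlling(op: str) -> int:
    """Controlling input value of a gate (AND: 0, OR: 1)."""
    return 0 if op == "AND" else 1


def reachability_list(netlist: Netlist, f: str, v: int) -> Set[str]:
    """Flipflops restored by f = v alone, following one gate in either direction."""
    reached: Set[str] = set()
    # Forward: f = v feeds a gate and on its own fixes that gate's output
    for out, (op, ins) in netlist.items():
        if f in ins and (op == "BUF" or len(ins) == 1 or v == controlling(op)):
            reached.add(out)
    # Backward: f = v at a gate output on its own fixes every gate input
    if f in netlist:
        op, ins = netlist[f]
        if op == "BUF" or len(ins) == 1 or v != controlling(op):
            reached.update(ins)
    reached.discard(f)
    return reached


# Hypothetical circuit chosen to reproduce the slide's example lists
NET: Netlist = {
    "f1": ("BUF", ["f2"]),
    "f2": ("AND", ["f1", "f3"]),
    "f3": ("OR", ["f4", "f5"]),
    "f4": ("BUF", ["f3"]),
    "f5": ("AND", ["f2", "f4"]),
}
print(sorted(reachability_list(NET, "f2", 0)))   # ['f1', 'f5']
print(sorted(reachability_list(NET, "f2", 1)))   # ['f1', 'f3']
```

A flipflop for which both lists come back empty is what the paper calls an "island".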

“Restorability Rate”
- $r_f$: the restorability rate of flipflop f, defined for every untraced flipflop f at each iteration.
- It is the probability that f can be restored using the trace signals identified so far.
- Computing it for all untraced flipflops requires only one round of X-Simulation within a small observation window (see Algorithm 3 in the paper for details).
- Example (DFF-by-cycle table, F1 to F5): $r_3 = 2/4$.
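Given the result of that single X-Simulation round, $r_f$ is simply the fraction of observation-window cycles in which f ends up with a known value. This is a minimal sketch; the window below is hypothetical and is chosen so that F3 matches the slide's $r_3 = 2/4$. It is not the paper's Algorithm 3.

```python
from typing import Dict, List, Optional, Set


def restorability_rates(window: Dict[str, List[Optional[int]]],
                        traced: Set[str]) -> Dict[str, float]:
    """r_f = fraction of observation-window cycles in which untraced f is restored
    (None marks a value that X-simulation left unknown)."""
    return {f: sum(v is not None for v in row) / len(row)
            for f, row in window.items() if f not in traced}


# Hypothetical X-simulation result over a 4-cycle observation window
window = {"F1": [0, 1, 1, 0], "F3": [None, 1, None, 0]}
print(restorability_rates(window, {"F1"}))   # {'F3': 0.5}
```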

“Restoration Demand”
- $d_{i,f}^v$: the restoration demand of flipflop i from flipflop f, where i must be in the reachability list of f:
$$d_{i,f}^v \approx \min\left(1 - r_i,\; a_f^v\right), \quad \forall\, i \in L_f^0 \cup L_f^1$$
- $1 - r_i$ is the "remaining" restoration demand of i.
- $a_f^v$ is the probability that f takes value v, i.e., the maximum that f can offer toward restoring i.
- Example, with f2 as the potentially-traced flipflop: $d_{3,2}^1 \approx \min(1 - r_3,\; a_2^1)$.
- This expression is only an upper-bound approximation of the actual demand; however, it can be evaluated very quickly.
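The demand estimate is a single `min`, so its cost is negligible next to a simulation. This is a minimal sketch. The helper `value_probability` is a hypothetical way to obtain $a_f^v$ from sampled values of f, for example from random simulation; the slides do not say how $a_f^v$ is estimated.

```python
from typing import Sequence


def value_probability(samples: Sequence[int], v: int) -> float:
    """Assumed estimate of a_f^v: fraction of sampled cycles in which f equals v."""
    return sum(1 for s in samples if s == v) / len(samples)


def restoration_demand(r_i: float, a_fv: float) -> float:
    """d_{i,f}^v ~ min(1 - r_i, a_f^v): the demand i still has, capped by how
    often f actually takes the value v that restores i."""
    return min(1.0 - r_i, a_fv)


# Hypothetical numbers for the slide's d_{3,2}^1 example
a_2_1 = value_probability([1, 0, 1, 1], 1)     # 0.75
print(restoration_demand(0.5, a_2_1))           # min(0.5, 0.75) = 0.5
```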

“Impact Weight”
$$w_f = \sum_{v=0,1} \; \sum_{\forall i \in L_f^v} d_{i,f}^v$$
- Defined for every untraced flipflop f.
- At each iteration of our algorithm, the untraced flipflops with the highest impact weights are selected as the top candidates. The number of top candidates is set to only 5% of the number of flipflops.
- Example (circuit with flipflops f1 to f5), with $L_2^0 = \{f_1, f_5\}$ and $L_2^1 = \{f_1, f_3\}$: $w_2 = d_{1,2}^0 + d_{5,2}^0 + d_{1,2}^1 + d_{3,2}^1$.
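The impact weight sums the approximate demands over both reachability lists, and the shortlist keeps roughly the top 5% of flipflops. This is a minimal sketch. All numeric values in the example are hypothetical, and rounding $0.05 \times \#\text{flipflops}$ to the nearest integer (at least 1) is an assumption. Traced flipflops can be given $r_i = 1$ so that they contribute zero demand.

```python
from typing import Dict, List, Set, Tuple


def impact_weight(f: str, reach: Dict[Tuple[str, int], Set[str]],
                  r: Dict[str, float], a: Dict[Tuple[str, int], float]) -> float:
    """w_f = sum over v in {0, 1} and i in L_f^v of min(1 - r_i, a_f^v)."""
    return sum(min(1.0 - r[i], a[(f, v)]) for v in (0, 1) for i in reach[(f, v)])


def top_candidates(weights: Dict[str, float], total_ffs: int,
                   frac: float = 0.05) -> List[str]:
    """Keep the highest-weight untraced flipflops, about frac of all flipflops."""
    k = max(1, round(frac * total_ffs))        # assumed rounding rule
    return sorted(weights, key=weights.get, reverse=True)[:k]


# Hypothetical values for the slide's w_2 example
reach = {("f2", 0): {"f1", "f5"}, ("f2", 1): {"f1", "f3"}}
r = {"f1": 0.5, "f3": 0.5, "f5": 0.0}
a = {("f2", 0): 0.4, ("f2", 1): 0.6}
w2 = impact_weight("f2", reach, r, a)   # 0.4 + 0.4 + 0.5 + 0.5 = 1.8
print(round(w2, 6))
```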

Trace Selection Process
- Method (i), applied at each iteration: identify the top candidates using Impact Weights, then select the next trace among them using a small number of X-Simulations.
- Method (ii), applied after every 8 selected traces: consider adding an "island" flipflop.
  - Flipflop f is an island if $L_f^0 = L_f^1 = \emptyset$.
  - Island flipflops will never be selected as trace signals by Method (i).
  - X-Simulation is used to measure SRR and identify the best island. Only a few simulations are needed because the number of islands is small (17% of the flipflops for S5378).
- Flow: initialize the metrics; select the next trace signal using Method (i), or also consider an island via Method (ii) once a multiple of 8 traces has been selected; update the metrics; terminate once B traces have been selected.
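The two methods can be combined into one forward-greedy loop. This is a minimal sketch under stated assumptions, not the paper's implementation:
- `impact_weight` and `sim_srr` are hypothetical callbacks standing in for the metric update and for X-Simulation;
- the shortlist size is about 5% of the flipflops;
- Method (ii) is read as "after every 8 traces, untraced islands also compete for the next slot on simulated SRR". That reading of "consider adding" is itself an assumption.

```python
from typing import Callable, List, Sequence


def hybrid_select(flipflops: Sequence[str], B: int,
                  impact_weight: Callable[[str, List[str]], float],
                  sim_srr: Callable[[List[str]], float],
                  islands: Sequence[str],
                  top_frac: float = 0.05, period: int = 8) -> List[str]:
    """Forward-greedy selection: fast metrics shortlist, simulation decides."""
    island_set = set(islands)
    k = max(1, round(top_frac * len(flipflops)))     # ~5% of all flipflops
    selected: List[str] = []
    while len(selected) < B:
        # Method (i): shortlist untraced, non-island flipflops by Impact Weight
        pool = [f for f in flipflops if f not in selected and f not in island_set]
        options = sorted(pool, key=lambda f: impact_weight(f, selected),
                         reverse=True)[:k]
        # Method (ii), assumed reading: after every `period` traces, untraced
        # islands also compete for this slot on simulated SRR
        if selected and len(selected) % period == 0:
            options += [f for f in islands if f not in selected]
        if not options:
            break
        # The few shortlisted options are compared with (expensive) simulation
        selected.append(max(options, key=lambda f: sim_srr(selected + [f])))
    return selected


# Hypothetical toy scores: 19 regular flipflops f0..f18 plus one island f19
ffs = [f"f{i}" for i in range(20)]
weight = {f: float(i) for i, f in enumerate(ffs)}
gain = {f: (100.0 if f == "f19" else 1.0) for f in ffs}
picked = hybrid_select(ffs, 9, lambda f, sel: weight[f],
                       lambda sel: sum(gain[g] for g in sel), islands=["f19"])
print(picked)
```

In the toy run, the first 8 traces come from the Impact-Weight shortlist, and the island f19 wins the 9th slot because its simulated gain is larger.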

Simulation Setup
- Evaluation metric: SRR is used to measure the restoration quality.
- Experiments use trace buffers of size (8, 16, 32) × 4K cycles.
- Comparison made with:
  - METR: metric-based [Shojaei et al., ICCAD'10], which has the best reported runtime; mainly used for runtime comparison.
  - SIM: simulation-based [Chatterjee et al., ICCAD'11], which has the best reported solution quality; mainly used to compare solution quality.

Comparison of Runtime

| Circuit | #DFF | #Traces | METR (sec) | SIM* (hr:min:sec) | Ours (sec) |
|---|---|---|---|---|---|
| S5378 | 163 | 8 | ? | 00:06:50 | 5 |
| | | 16 | 27 | 00:06:40 | ? |
| | | 32 | 66 | 00:05:30 | 28 |
| S9234 | 145 | 8 | 6 | 00:07:28 | 26 |
| | | 16 | 17 | 00:06:05 | 84 |
| | | 32 | 38 | 00:04:10 | 86 |
| S35932 | 1728 | 8 | 73 | 07:13:00 | 139 |
| | | 16 | 167 | 07:12:00 | 208 |
| | | 32 | 408 | 07:11:00 | 217 |
| S38417 | 1564 | 8 | 3690 | 50:05:00 | 434 (8X faster than METR) |
| | | 16 | 7620 | 50:04:00 | 2508 (3X faster than METR) |
| | | 32 | 13428 | 50:02:00 | 2521 (5X faster than METR) |
| S38584 | 1166 | 8 | 53 | 16:33:00 | 140 |
| | | 16 | ? | 16:32:00 | 741 |
| | | 32 | 354 | 16:31:00 | 752 |

\* SIM ran on a quad-core machine using up to 8 threads.
- SIM is significantly slower than METR and Ours.
- Ours has comparable or faster runtime than METR.

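Comparison of Solution Quality I

| Circuit | #Traces | SRR (METR) | SRR (SIM) | SRR (Ours) | Improvement of Ours over SIM |
|---|---|---|---|---|---|
| S5378 | 8 | 13.7 | 12.8 | 13.6 | +6.3% |
| | 16 | 8.1 | 7.1 | 8.0 | +12.7% |
| | 32 | 4.1 | 4.4 | 4.2 | -4.5% |
| S9234 | 8 | 8.4 | 9.1 | 9.8 | +4.3% |
| | 16 | 5.8 | 6.6 | 6.8 | +3.0% |
| | 32 | 3.4 | 3.6 | 3.6 | +0.0% |
| S35932 | 8 | 31.1 | 58.1 | 61.4 | +5.7% |
| | 16 | 19.4 | 36.2 | 38.3 | +5.8% |
| | 32 | 11.6 | 23.1 | 23.4 | +1.3% |
| S38417 | 8 | 17.6 | 29.4 | 51.4 | +74.5% |
| | 16 | 13.1 | 17.8 | 30.1 | +12.9% |
| | 32 | 9.7 | 20.0 | 17.5 | -12.5% |
| S38584 | 8 | 13.5 | 14.9 | 24.0 | +31.1% |
| | 16 | 10.8 | 18.1 | 18.5 | +2.2% |
| | 32 | ? | 16.4 | 17.5 | +6.7% |
| Average | | | | | +10.0% |

- On average, Ours improves SRR by 10.0% compared to SIM.
- SIM typically has much higher SRR than METR, especially on larger benchmarks.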

Identification using Impact Weights
- How accurate are the top candidates identified by Impact Weights?
- X-Simulation is used to identify the "actual" top candidates (those resulting in the highest SRR), which serve as the golden case.
- We then identify which of the top candidates obtained using Impact Weights are also top candidates in the golden case.
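The comparison reduces to an overlap between two top-k lists. This is a minimal sketch: the golden ranking is assumed to be given as one X-simulated SRR value per candidate, and the function names and scores are hypothetical.

```python
from typing import Dict, List


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Names of the k highest-scoring flipflops."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def candidate_hit_rate(weights: Dict[str, float],
                       golden_srr: Dict[str, float], k: int) -> float:
    """Fraction of the Impact-Weight top-k that is also in the golden top-k,
    where the golden ranking comes from the X-simulated SRR of each candidate."""
    golden = set(top_k(golden_srr, k))
    return sum(f in golden for f in top_k(weights, k)) / k


# Hypothetical scores for four candidate flipflops
weights = {"a": 0.9, "b": 0.8, "c": 0.1, "d": 0.5}
golden_srr = {"a": 3.0, "b": 1.0, "c": 2.5, "d": 2.0}
print(candidate_hit_rate(weights, golden_srr, k=2))   # only "a" is shared: 0.5
```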

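Comparison of Solution Quality II

| Circuit | #Traces | SRR (Ours-w/o SIM) | SRR (Ours) | Change vs. Ours |
|---|---|---|---|---|
| S5378 | 8 | 13.4 | 13.6 | -1.5% |
| | 16 | 7.9 | 8.0 | -1.3% |
| | 32 | 4.0 | 4.2 | -4.8% |
| S9234 | 8 | 9.4 | 9.8 | -4.1% |
| | 16 | 6.1 | 6.8 | -10.3% |
| | 32 | 3.3 | 3.6 | -8.3% |
| S35932 | 8 | 31.6 | 61.4 | -48.5% |
| | 16 | 18.9 | 38.3 | -50.7% |
| | 32 | 11.3 | 23.4 | -51.7% |
| S38417 | 8 | 18.1 | 51.4 | -64.8% |
| | 16 | 10.3 | 30.1 | -65.8% |
| | 32 | 5.9 | 17.5 | -66.3% |
| S38584 | 8 | 18.3 | 24.0 | -23.8% |
| | 16 | 14.8 | 18.5 | -20.0% |
| | 32 | 10.7 | 17.5 | -38.9% |

- Ours-w/o SIM: our algorithm when the next trace is simply the candidate with the highest Impact Weight, i.e., X-Simulation is not used to find the best candidate.
- This experiment shows that X-Simulation is necessary.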

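Comparison of Solution Quality III

| Circuit | #Traces | SRR (Ours-w/o Islands) | SRR (Ours) | Change vs. Ours |
|---|---|---|---|---|
| S5378 | 8 | 12.5 | 13.6 | -8.1% |
| | 16 | 7.8 | 8.0 | -2.5% |
| | 32 | 4.1 | 4.2 | -2.4% |
| S9234 | 8 | 8.1 | 9.8 | -17.3% |
| | 16 | 6.5 | 6.8 | -4.4% |
| | 32 | 3.5 | 3.6 | -2.8% |
| S35932 | 8 | 61.4 | 61.4 | +0.0% |
| | 16 | 38.3 | 38.3 | +0.0% |
| | 32 | 23.4 | 23.4 | +0.0% |
| S38417 | 8 | 48.2 | 51.4 | -6.2% |
| | 16 | 28.7 | 30.1 | -4.7% |
| | 32 | 16.7 | 17.5 | -4.6% |
| S38584 | 8 | 23.9 | 24.0 | -0.4% |
| | 16 | 18.5 | 18.5 | +0.0% |
| | 32 | ? | 17.5 | ? |

- Ours-w/o Islands: our algorithm without Method (ii), i.e., islands are not considered when a multiple of 8 traces has been selected.
- This experiment shows that the solution quality of some benchmarks is influenced by the islands.
- Islands tend to have a larger impact for smaller trace buffer widths.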

Summary
- We presented a new trace signal selection algorithm that:
  - combines a small number of simulations with quickly-evaluated metrics at each iteration;
  - has comparable or better solution quality with respect to a simulation-based algorithm;
  - has similar runtime to a metric-based algorithm.

Thank You! Questions? adavoodi@wisc.edu

Simulation-based Approximation of SRR
- Done using X-Simulation, but over an "observation window" instead of the entire capture window (observation window << capture window).
- For example, Chatterjee et al. [ICCAD'11] show that the SRR computed for an observation window of 64 cycles is sufficiently close to the SRR of a capture window of 4K cycles.

Metric-based Approximation of SRR
- Example: the "Visibility" metric proposed by Liu et al. [DATE'09].
- The visibility of a flipflop represents how much it can be restored using the currently-selected trace signals.
- The sum of the visibilities of all untraced flipflops is used as an estimate of SRR.
- Example (circuit with flipflops f1 to f5, one of them traced): Total Visibility = 2 + 1 + 1 = 4.

Metric-based Approximation of SRR
- Example metric: "Visibility" [Liu et al., DATE'09].
- Two visibility metrics are computed per gate output: $V_0$ / $V_1$ is the probability that the value 0 / 1 is actually restored at the output of the gate.
- They are computed by iteratively traversing the circuit and updating the gate visibilities until convergence.
- Total visibility is the sum of $V_0$ and $V_1$ over all the untraced flipflops.
- This is an inaccurate approximation of SRR because it ignores signal correlations.
- Example (circuit with flipflops f1 to f5, one of them traced, annotated with per-gate values such as $V_0 = 1, V_1 = 1$; $V_0 = 0.25, V_1 = 0.75$; and $V_0 = 0.75, V_1 = 0.25$): Total Visibility = 1 + 1 + 0.25 + 0.75 + 0.75 + 0.25 = 4.

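Comparison of Solution Quality IV

| Circuit | #Traces | SRR (Forward Greedy) | SRR (Ours) | Change vs. Ours |
|---|---|---|---|---|
| S5378 | 8 | 13.5 | 13.6 | -0.7% |
| | 16 | 7.9 | 8.0 | -1.3% |
| | 32 | 4.2 | 4.2 | +0.0% |
| S9234 | 8 | 9.8 | 9.8 | +0.0% |
| | 16 | 5.9 | 6.8 | -13.2% |
| | 32 | 3.5 | 3.6 | -2.8% |
| S35932 | 8 | 59.3 | 61.4 | -3.4% |
| | 16 | 37.4 | 38.3 | -2.3% |
| | 32 | 22.3 | 23.4 | -4.7% |
| S38417 | 8 | 51.5 | 51.4 | ? |
| | 16 | 24.0 | 30.1 | -19.6% |
| | 32 | 16.8 | 17.5 | -4.0% |
| S38584 | 8 | 25.1 | 24.0 | +4.6% |
| | 16 | 20.7 | 18.5 | +11.9% |
| | 32 | 18.0 | 17.5 | +2.9% |

- Forward Greedy: simulation combined with a forward-greedy selection strategy.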

Distribution of Impact Weights
[Figure: Impact Weights of the flipflops in benchmark S38417, one panel per iteration (Itr. 1, Itr. 2, Itr. 3)]
- Over the first three iterations, the Impact Weights of the top candidates are much higher than those of the remaining signals.