Global Critical Path: A Tool for System-Level Timing Analysis


Global Critical Path: A Tool for System-Level Timing Analysis Mihai Budiu May 23, 2007

Based on: "Critical Path: A Tool for System-Level Timing Analysis", Girish Venkataramani, Tiberiu Chelcea, Mihai Budiu, and Seth C. Goldstein, Design Automation Conference (DAC), San Diego, CA, June 4-8, 2007. Girish Venkataramani was a summer intern here in 2005 and is now graduating from CMU. His Ph.D. thesis, "A System Level Timing Analysis and Optimization Methodology for Hardware Compilation", is based on the Global Critical Path.

Critical Path: the longest path between source and sink in a DAG.
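As a rough illustration of this definition, the sketch below finds the longest source-to-sink path in a small weighted DAG. The function name `critical_path`, the adjacency-list encoding, and the example delays are assumptions made for the illustration; they do not come from the talk.

```python
from functools import lru_cache

def critical_path(edges, source, sink):
    """Return (length, path) of the longest source-to-sink path in a DAG.

    edges maps a node to a list of (successor, delay) pairs (hypothetical encoding).
    Returns None if the sink cannot be reached from the source.
    """
    @lru_cache(maxsize=None)
    def best(node):
        if node == sink:
            return (0, (sink,))
        candidates = []
        for nxt, delay in edges.get(node, []):
            sub = best(nxt)
            if sub is not None:
                candidates.append((delay + sub[0], (node,) + sub[1]))
        # Longest continuation wins; None marks a dead end.
        return max(candidates) if candidates else None

    return best(source)

# Made-up delays: src->b->a->sink (1 + 3 + 2 = 6) beats src->a->sink (4).
edges = {
    "src": [("a", 2), ("b", 1)],
    "a": [("sink", 2)],
    "b": [("a", 3), ("sink", 1)],
}
print(critical_path(edges, "src", "sink"))
```

Memoizing on the node keeps the search linear in the number of edges, which only works because the graph is acyclic.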

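Synchronous Combinational Circuits: the critical path is the longest signal-propagation path between two consecutive latches. The clock period must exceed it: clk > crit path. [Figure: two latches, each driven by clk, with the critical path between them.]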

Events: the circuit is a graph (V, E). Events = signal transitions on the edges E. An event is written (n1, t1) → (n2, t2).

Chaining of Events Circuit (V, E)

Timed Graph. Event: a signal from (A, t1) to (B, t3). The length of an event is || (n1, t1) → (n2, t2) || = t2 - t1. Dynamic Critical Path = longest path in the Timed Graph. Note: it is easy to model node computation delay too. [Figure: nodes A and B on a time axis t0, t1, t2, t3.]

Goal: Apply to Real Circuits. This work focuses on asynchronous 4-way handshake circuits. [Figure labels: +, reg, Delay, C, H/S, data, reqi, acki, reqo, acko; numbered steps 1 to 4.]

Model Stages Using Behaviors
[Figure labels: +, reg, Delay, C, H/S, data, reqi, acki, reqo, acko.]
Behavior            | Input transitions (precondition) | Output transitions (postcondition)
Compute             | reqi0↑, reqi1↑, ack0↓            | req0↑, acki↑
Return to zero req  | ack0↑                            | req0↓
Return to zero ack  | reqi0↓, reqi1↓                   | acki↓
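One way to read the behavior table is as a set of firing rules: a behavior fires once every transition in its precondition has arrived, and it then emits its postcondition. The sketch below encodes the three rows that way. The `Behavior` class, the `run` function, and the arrival times are hypothetical; firing at the instant of the last arrival ignores node delay.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Behavior:
    name: str
    inputs: frozenset    # precondition: transitions that must all arrive
    outputs: frozenset   # postcondition: transitions emitted on firing

# The three rows of the table, with signal names copied from the slide.
STAGE = [
    Behavior("compute", frozenset({"reqi0↑", "reqi1↑", "ack0↓"}),
             frozenset({"req0↑", "acki↑"})),
    Behavior("return_to_zero_req", frozenset({"ack0↑"}), frozenset({"req0↓"})),
    Behavior("return_to_zero_ack", frozenset({"reqi0↓", "reqi1↓"}),
             frozenset({"acki↓"})),
]

def run(behaviors, arrivals):
    """Replay (time, transition) arrivals in time order; return the firings.

    Simplification (an assumption of this sketch): a behavior fires at the
    moment its last input arrives.
    """
    pending = set()
    fired = []
    for time, transition in arrivals:
        pending.add(transition)
        for b in behaviors:
            if b.inputs <= pending:
                pending -= b.inputs          # consume the inputs
                fired.append((time, b.name, sorted(b.outputs)))
    return fired

# Made-up arrival times for one full handshake cycle.
arrivals = [(1, "reqi0↑"), (3, "ack0↓"), (4, "reqi1↑"),
            (6, "ack0↑"), (7, "reqi0↓"), (9, "reqi1↓")]
for firing in run(STAGE, arrivals):
    print(firing)
```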

Behaviors can Handle Choice. Deterministic (unique) choice: mux. Nondeterministic choice: arbiter. In the absence of choice and nondeterministic delays, a static analysis can determine the GCP.

Runtime: Locally Critical Events. [Figure: timeline of the Compute behavior, showing the input transitions reqi0↑, reqi1↑, ack0↓ and the output transitions req0↑, acki↑.]
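A minimal sketch of how a node might pick its locally critical event. It assumes that the locally critical input is the precondition transition that arrives last, which is how the timeline reads but is not spelled out on the slide. The function name and the arrival times are hypothetical.

```python
def locally_critical(arrival_times, precondition):
    """Return (transition, time) for the last-arriving input of a behavior.

    arrival_times maps a transition name to its arrival time. Ties are broken
    by name so that the result is deterministic.
    """
    last = max(sorted(precondition), key=lambda tr: arrival_times[tr])
    return last, arrival_times[last]

# Made-up arrival times for the inputs of the Compute behavior.
arrival_times = {"reqi0↑": 1, "ack0↓": 3, "reqi1↑": 4}
print(locally_critical(arrival_times, {"reqi0↑", "reqi1↑", "ack0↓"}))
```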

GCP Computation Algorithm
0. At run time, each node records its locally critical events.
1. Start from the last node executed.
2. Trace back along the locally critical input event.
3. Some transitions are repeated.
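The trace-back can be sketched as a walk over the recorded events. In this sketch each event is a (transition, time) pair, so that a repeated transition stays distinct, and `critical_input` maps an event to the locally critical input event recorded by the node that produced it. Both the encoding and the sample trace are assumptions for illustration.

```python
def global_critical_path(critical_input, last_event):
    """Walk backwards from the last event along locally critical inputs.

    critical_input: dict mapping an event to the locally critical input event
    of the node that produced it; primary inputs have no entry.
    Returns the path in forward (time) order.
    """
    path = [last_event]
    seen = {last_event}
    while True:
        prev = critical_input.get(path[-1])
        if prev is None:
            break                      # reached an event with no recorded cause
        if prev in seen:
            raise ValueError("recorded events form a cycle")
        path.append(prev)
        seen.add(prev)
    path.reverse()
    return path

# Made-up trace; ("reqXY↑", 4) was recorded but lies off the critical path.
critical_input = {
    ("reqMN↑", 9): ("reqGM↑", 7),
    ("reqGM↑", 7): ("reqEG↑", 5),
    ("reqEG↑", 5): ("reqDE↑", 3),
    ("reqDE↑", 3): ("reqAD↑", 1),
    ("reqXY↑", 4): ("reqAD↑", 1),
}
print(global_critical_path(critical_input, ("reqMN↑", 9)))
```

Because every node stores only one predecessor per event, the walk costs time proportional to the length of the path rather than the size of the whole trace.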

Possible Locally Critical Paths. [Figure: four cases, numbered 1 to 4, over the transitions reqi↑, reqi↓, req0↑, req0↓, acko↑, acko↓, acki↑, acki↓.]

Chaining Events Backwards. [Figure: cases 1 to 4 chained backwards through the transitions reqi↑, reqi↓, req0↑, req0↓, acko↑, acko↓, acki↑, acki↓.]

Theorem
PATHdata = [req↑]*
PATHsync = [ack↑ → req↓ → ack↓]*
GCP = [PATHdata → PATHsync]*

What does this mean?
PATHdata = [req↑]* : good, the circuit is waiting for data.
PATHsync = [ack↑ → req↓ → ack↓]* : maybe bad, a synchronization problem.
GCP = [PATHdata → PATHsync]*
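To make the decomposition concrete, the sketch below checks whether a path has the shape stated by the theorem and reports what share of its events are synchronization events. The event encoding (signal name, channel label, arrow), the helper names, and the sample path are assumptions. The regular expression uses the fact that [req↑* (ack↑ req↓ ack↓)*]* accepts the same strings as (req↑ | ack↑ req↓ ack↓)*.

```python
import re

# Hypothetical encoding: "<req|ack><channel label><↑|↓>", e.g. "reqDE↑".
EVENT = re.compile(r"^(req|ack)(\w*)([↑↓])$")

def kinds(path):
    """One letter per event: R = req↑, r = req↓, A = ack↑, a = ack↓."""
    codes = []
    for ev in path:
        m = EVENT.match(ev)
        if m is None:
            raise ValueError(f"not a handshake event: {ev!r}")
        sig, _, direction = m.groups()
        letter = "r" if sig == "req" else "a"
        codes.append(letter.upper() if direction == "↑" else letter)
    return "".join(codes)

# GCP = [PATHdata → PATHsync]*, flattened to (req↑ | ack↑ req↓ ack↓)*.
GCP_SHAPE = re.compile(r"^(?:R|Ara)*$")

def matches_theorem(path):
    return GCP_SHAPE.match(kinds(path)) is not None

def sync_share(path):
    """Fraction of events on the path that belong to PATHsync segments."""
    code = kinds(path)
    if not GCP_SHAPE.match(code):
        raise ValueError("path does not have the shape [PATHdata → PATHsync]*")
    return 3 * code.count("Ara") / len(code) if code else 0.0

# Made-up path: one data event, then a loop body repeated 9 times, then a data tail.
path = (["reqAD↑"]
        + ["reqDE↑", "reqEG↑", "ackGC↑", "reqCE↓", "ackED↓"] * 9
        + ["reqDE↑", "reqEG↑", "reqGM↑", "reqMN↑"])
print(matches_theorem(path), sync_share(path))
```

A large synchronization share is a hint, not a verdict, that the path is dominated by handshake overhead rather than by waiting for data.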

An Example
reqAD↑ → [reqDE↑ → reqEG↑ → ackGJ↑ → reqJA↑]^9 → reqDE↑ → reqEG↑ → reqGM↑ → reqMN↑
reqAD↑ → [reqDE↑ → reqEG↑ → ackGC↑ → reqCE↓ → ackED↓]^9 → reqDE↑ → reqEG↑ → reqGM↑ → reqMN↑

Critical Path Toolflow
[Figure labels: Input data, CASH core, Verilog back-end, Synopsys, Cadence P/R, asynchronous circuit layout, P/R model, ModelSim, PLI calls, Execution trace, GCP extraction, GCP, Feedback path.]
CASH generates Verilog, which is then synthesized using commercial tools in a 180nm technology. The results are evaluated using Verilog-level simulation. We only model the computation and the memory access network; we do not model the memory itself, the cost of the rest of the computation, or the overhead of starting and stopping computation. We do not include the cost of memory in this comparison.

Effectiveness

Conclusions: the Global Critical Path
- Is defined as a path on the timed graph.
- Tracks dependences.
- Can be computed by automatic tools.
- Summarizes concurrent computation bottlenecks.
- Can be incorporated in a feedback loop to drive optimizations and de-optimizations.
- Is a profiling (input-dependent) concept.