EDA (CS286.5b) Day 19 Covering and Retiming


EDA (CS286.5b) Day 19 Covering and Retiming

“Final”
Like Assignment #1
  – longer
  – more breadth
  – focus on material since Assignment #2
  – …but ideas are cumulative
  – open notes, books, etc.
Available before 12:01am Mon. 3/13
Due 11:59pm Wed. 3/15

Previously
Cover (map) LUTs for minimum delay
  – day 3
  – solve optimally
Retiming for minimum clock period
  – day 18
  – solve optimally
Simultaneous cover and 1D placement
  – day 9
  – optimal area cover for trees

Today
Solving cover and retime separately is not optimal
Cover+retime
Review

Example

Example: Retimed

Note: only 4 signals here (2 w/ 2 delays each)

Example 2

Cycle Bound: 2

Example 2: retimed

Cycle Bound: 1

Basic Observation
Registers break up the circuit, limiting coverage
  – fragmentation
  – prevent grouping

Phase Ordering Problem
General problem we’ve seen before
  – e.g. placement: don’t know where connected neighbors will be if they are unplaced…
  – don’t know the effect/results of the other mapping step
Here
  – if retime first: don’t know the delay (what can be packed into a LUT)
  – if do not retime first: fragmentation, with forced breaks at bad places

Observation #1 Retiming flops to input of (fanout free) subgraph is trivial (and always doable)

Observation #1: Consequence
Can cover ignoring flop placement
Then retime flops to the LUT inputs

Fanout Problem? Can I use the same trick here?

Fanout Problem?
Cannot retime without replicating.
Replicating increases I/O (and so cut size).

Different Replication Problem

Can now retime and cover with single LUT.

Replication
Once we add registers
  – can’t just grab max flow and get replication (compare flowmap)
Or, can’t just ignore flop placement when there is reconvergent fanout through a flop

Replication
Key idea:
  – represent timing paths in the graph
  – differentiate based on the number of registers in the path
  – new graph: all paths from a node to the output have the same number of flip-flops
  – label nodes u^d, where d is the number of flip-flops to the output

Deal with Replication
Expanded Graph:
  – start with target output node
  – for each input u^d to the current expanded graph
      grab its input edge (x → u) with weight w(e)
      add node x^(d+w(e)) to the graph (if necessary)
      add edge x^(d+w(e)) → u^d with weight w(e)
  – continue breadth first until have enough for flow cut
      at most k*n node depth required
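The expansion steps above can be sketched in Python. This is a minimal illustration, not the lecture's or the paper's code: the function name `expand`, the `in_edges` dictionary layout, the `max_levels` cutoff, and the edge list of the small example graph are all assumptions made here (the edge list is only a guess that happens to reproduce the node labels from the example slides).

```python
from collections import deque

def expand(in_edges, output, max_levels):
    """Build the expanded graph rooted at `output`, breadth first.

    in_edges maps node u -> list of (x, w), meaning an edge x -> u
    that carries w registers. An expanded node (name, d) stands for
    name^d: d registers lie on every path from it to the output.
    """
    root = (output, 0)
    nodes = {root}
    edges = set()                      # (source, sink, w) triples
    frontier = deque([(root, 0)])      # (expanded node, BFS level)
    while frontier:
        (u, d), level = frontier.popleft()
        if level == max_levels:        # hypothetical depth cutoff
            continue
        for x, w in in_edges.get(u, []):
            xd = (x, d + w)            # node x^(d+w(e))
            edges.add((xd, (u, d), w))
            if xd not in nodes:        # add node only if necessary
                nodes.add(xd)
                frontier.append((xd, level + 1))
    return nodes, edges

# Assumed example: c reads a (0 registers) and b (1 register),
# a reads primary inputs i and j, b reads c.
EXAMPLE = {
    "c": [("a", 0), ("b", 1)],
    "a": [("i", 0), ("j", 0)],
    "b": [("c", 0)],
}
nodes, edges = expand(EXAMPLE, "c", 3)
print(sorted(nodes))
```

Because the graph has a cycle (c → b → c), the expansion would never stop on its own; the level cutoff plays the role of the depth bound mentioned on the slide.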

Example [figure: original graph with nodes labeled a, b, c, i, j; expanded graph starts with c^0]

[figure: expanded graph adds a^0, b^1]

[figure: expanded graph adds i^0, c^1, j^0]

[figure: expanded graph adds a^1, b^2]

Example 2 [figure: original graph with nodes labeled a, b, c, d, e; expanded graph with nodes e^0, c^0, d^0, a^1, a^0, b^0, b^1, i^1, j^1, i^0, j^0]

Expanded Graph
The expanded graph does not have fanout of different flip-flop depths from the same node.
Can now cover ignoring flip-flops and trivially retime.

Labeling
Key idea #1:
  – compute distances/delay like flowmap (dynamic programming)
Key idea #2:
  – count distance from register (like the G-1/c graph)

Labeling: Edge Weights
To target clock period c
  – use graph G-1/c
  – paper: assign weight -c*w(e)+1 (same thing scaled by c and negated)
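A small check of the "same thing scaled by c and negated" remark, assuming every node costs one LUT delay so that an edge with w registers has weight w - 1/c in G-1/c. The function names are made up for this sketch.

```python
from fractions import Fraction

def g_minus_1_over_c_weight(w, c):
    """Edge weight in G-1/c for an edge with w registers (unit LUT delay assumed)."""
    return Fraction(w) - Fraction(1, c)

def paper_weight(w, c):
    """Integer weight as quoted on the slide: -c*w(e) + 1."""
    return -c * w + 1

# Scaling the G-1/c weight by c and negating it gives the paper's weight.
for c in range(1, 6):
    for w in range(0, 4):
        assert paper_weight(w, c) == -c * g_minus_1_over_c_weight(w, c)
```

The integer form avoids fractions; the price is that the min in the labeling turns into a max, which is the sign flip the later slide notes.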

Labeling: Edge Weight Idea
Same idea:
  – will need a register every c LUT delays
  – credit with registers as they are encountered
  – charge a fraction (1/c) for every LUT delay
  – know the net distance at each point
  – if negative (delays > c*registers), cannot distribute registers to achieve c
  – otherwise the labeling tells where to distribute
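The "negative means infeasible" test can be illustrated on cycles with a Bellman-Ford style check. This sketch covers only the plain retiming side (every node is assumed to cost one LUT delay, and no covering is done); weights are scaled by c to stay in integers, so an edge with w registers costs c*w - 1. The function name and the three-node example are assumptions for illustration.

```python
def has_negative_cycle(nodes, edges, c):
    """Return True if some cycle has delays > c * registers.

    edges is a list of (u, v, w): edge u -> v carrying w registers.
    Each edge is charged one unit of delay and credited c per register.
    """
    dist = {v: 0 for v in nodes}       # virtual source reaches every node at 0
    for _ in range(len(nodes)):
        changed = False
        for u, v, w in edges:
            cand = dist[u] + c * w - 1
            if cand < dist[v]:
                dist[v] = cand
                changed = True
        if not changed:
            return False               # distances settled: no negative cycle
    return True                        # still relaxing after n passes

# Loop of three unit-delay nodes with a single register on it.
loop_nodes = ["a", "b", "c"]
loop_edges = [("a", "b", 0), ("b", "c", 0), ("c", "a", 1)]
print(has_negative_cycle(loop_nodes, loop_edges, 2))  # True: 3 delays > 2 * 1 register
print(has_negative_cycle(loop_nodes, loop_edges, 3))  # False: 3 delays <= 3 * 1 register
```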

Labeling: Flow cut
Label node as before (flowmap)
  – L(v) = min{ l(u) + w(e) | e = (u → v) }
  – trivially can be L(v)-1/c == new LUT
      note min vs. max and -1/c vs. +1, due to rescaling to match the retiming formulation and the G-1/c graph
      in this formulation, a combinational circuit of depth 4 would have L(v) = -4/c
  – if can put this node and all nodes with label L(v) in one LUT, its label can be L(v)
      construct and compute flow cut to test
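The depth-4 remark can be reproduced with a deliberately simplified labeling that skips the flow-cut test and always takes the trivial choice (a new LUT at every node, label L(v) - 1/c). It assumes an acyclic graph given in topological order with primary inputs labeled 0; the real procedure also handles cycles and tries to pack nodes into one LUT. All names here are hypothetical.

```python
from fractions import Fraction

def labels_without_packing(order, in_edges, c):
    """Label nodes assuming every node becomes its own LUT.

    order: nodes in topological order.
    in_edges: node v -> list of (u, w), an edge u -> v with w registers.
    """
    label = {}
    for v in order:
        preds = in_edges.get(v, [])
        if not preds:
            label[v] = Fraction(0)                 # primary input
            continue
        big_l = min(label[u] + w for u, w in preds)  # L(v)
        label[v] = big_l - Fraction(1, c)            # trivial case: new LUT
    return label

# Purely combinational chain of depth 4: in -> n1 -> n2 -> n3 -> n4.
chain_order = ["in", "n1", "n2", "n3", "n4"]
chain_edges = {"n1": [("in", 0)], "n2": [("n1", 0)],
               "n3": [("n2", 0)], "n4": [("n3", 0)]}
chain_labels = labels_without_packing(chain_order, chain_edges, c=2)
print(chain_labels["n4"])  # -2, i.e. -4/c with c = 2
```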

LUT Map and Retime
Start with outputs
Cover with LUT based on cut
  – move flip-flops to inputs of LUT
Recursively cover inputs
Use label to retime
  – r(v) = ⌈l(v)⌉ + 1/c

Target Clock Period c As before (retiming) –binary search to find optimal c
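A minimal sketch of the binary search, assuming integer clock periods, a feasibility test that is monotone in c (if c works, so does any larger c), and a known feasible upper bound. The `feasible` callable stands in for the labeling procedure; the toy oracle below is invented purely so the example runs.

```python
def min_feasible_period(feasible, upper):
    """Smallest integer c in [1, upper] with feasible(c) true.

    Assumes feasible(upper) is true and feasibility is monotone in c.
    """
    lo, hi = 1, upper
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(mid):
            hi = mid          # mid works; try smaller periods
        else:
            lo = mid + 1      # mid fails; need a longer period
    return lo

# Toy stand-in oracle: pretend any period of 3 or more is achievable.
best = min_feasible_period(lambda c: c >= 3, upper=10)
print(best)  # 3
```

Each probe runs one full labeling pass, so the search costs about log2(upper) labelings.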

Variations
Relaxation/Iteration
  – original computed labels iteratively
Flow cover
  – Cong+Wu/ICCAD96 showed can use flowmap-style min-cut
Find all k-cuts first
  – Pan+Liu/FPGA’98

Summary
Can optimally solve
  – LUT map for delay
  – retiming for minimum clock period
But solving them separately does not give an optimal solution to the combined problem
Account for registers on paths
Label based on register placement and (flow) cover ignoring registers
Labeling gives delay, covering, retiming

Today’s Big Ideas
Exploit freedom
Cost of decomposition
  – benefit of composite solution
Technique:
  – dynamic programming
  – network flow

This Class: Decomposition
Scheduling
Logic Optimization
  – sequential, two-level, multi-level
Covering/gate-mapping
Retiming
Partitioning
Placement
Routing

This Class: Techniques
Dynamic Programming
Linear Programming (LP, ILP)
Graph Algorithms
Greedy Algorithms
Randomization
Search
Heuristics
Approximation Algorithms
Iterative/Relaxation

Spring Quarter
Primarily Project
  – select problem
  – formulate
  – survey attacks, similar problems, brainstorm
  – implement
  – benchmark/analyze
Few lectures on additional topics
  – most geared to projects
  – time to discuss, present project