Temporal Logic Replication for Dynamically Reconfigurable FPGA Partitioning Wai-Kei Mak Dept. of Computer Science and Engineering University of South Florida.

Presentation transcript:

Temporal Logic Replication for Dynamically Reconfigurable FPGA Partitioning Wai-Kei Mak Dept. of Computer Science and Engineering University of South Florida Evangeline F.Y. Young Dept. of Computer Science and Engineering The Chinese University of Hong Kong

Outline
I. Dynamically reconfigurable FPGA
II. Temporal partitioning = Conventional partitioning?
III. Temporal logic replication: What? Why? How?
IV. Experimental results
V. Conclusions

Dynamically Reconfigurable FPGA
• Store multiple contexts on chip.
• Reuse logic blocks and wire segments dynamically.
• The stored contexts can correspond to the multiple stages of a large circuit.

Temporal Circuit Partitioning
• Temporal partitioning: multiple stages execute sequentially.
• Spatial partitioning: multiple components execute concurrently.

Temporal Logic Replication
• Can reduce the buffering requirement.
• Effectively utilizes available slack logic capacity.

Temporal Constraints
• For a net n = (v_1, {v_2, …, v_p}), require s(v_1) ≤ s(v_j), j = 2, …, p, if v_1 is a combinational node.

Temporal Constraints (Cont’d)
• Require s(v_j) ≤ s(v_1), j = 2, …, p, if v_1 is a flip-flop node.
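The two constraints can be checked mechanically. The sketch below is a minimal illustration, not the authors' code: it assumes s(v) is the stage index assigned to node v, reads the relation on both slides as "≤", and uses hypothetical names (`violated_pairs`, `stage`, `flip_flops`) that do not appear in the presentation.

```python
from typing import Dict, List, Set, Tuple

# A net is (source node v_1, list of sink nodes v_2..v_p).
Net = Tuple[str, List[str]]

def violated_pairs(nets: List[Net], stage: Dict[str, int],
                   flip_flops: Set[str]) -> List[Tuple[str, str]]:
    """Return every (source, sink) pair that breaks a temporal constraint."""
    bad = []
    for src, sinks in nets:
        for snk in sinks:
            if src in flip_flops:
                # Flip-flop source: sinks must not be in a later stage.
                ok = stage[snk] <= stage[src]
            else:
                # Combinational source: sinks must not be in an earlier stage.
                ok = stage[src] <= stage[snk]
            if not ok:
                bad.append((src, snk))
    return bad

# Hypothetical 3-stage assignment: "f" is a flip-flop, the rest combinational.
nets = [("a", ["b", "c"]), ("f", ["a"])]
print(violated_pairs(nets, {"a": 1, "b": 1, "c": 2, "f": 3}, {"f"}))
print(violated_pairs(nets, {"a": 2, "b": 1, "c": 2, "f": 3}, {"f"}))
```

A partition is temporally feasible under this reading exactly when the returned list is empty.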

Temporal Partitioning with Replication
Problem: Partition the given circuit into a predefined number of stages satisfying all temporal constraints.
Objective: Minimize the buffers required between stages.
Proposal: Utilize available slack logic capacity to reduce signal buffering.
Solution: An effective 2-step approach.

2-Step Approach
Step 1: Compute a temporal partition without replication.
Step 2: Repeatedly identify the bottleneck stage and apply replication for that stage.

Advantages of 2-Step Approach
• Will not replicate unnecessarily.
• All temporal constraints are already satisfied when replicating.

Min-Area Min-Cut Replication
Let stage i be the bottleneck stage.
• Min-Cut Replication: Compute a subset of nodes R_i in stage i for replication into stage i+1 to maximally reduce the communication cost at stage i.
• Min-Area Min-Cut Replication: Compute a minimum subset of nodes R_i in stage i for replication into stage i+1 to maximally reduce the communication cost at stage i.

Optimal Solution for Min-Area Min-Cut Replication
Let V_i = set of nodes in stage i.
Observation 1: The min-cut replication problem can be solved by computing a minimum cut (V_i − R_i, R_i) in stage i.
Observation 2: The min-area min-cut replication problem can be solved by computing a minimum cut (V_i − R_i, R_i) in stage i such that |R_i| is minimized.
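Observation 2 can be read as a lexicographic objective: first minimize the cut size, then minimize |R_i|. The sketch below spells that out by exhaustive enumeration on a tiny directed network. It is only a reference illustration for small inputs, not the max-flow method used in the presentation; the source `s`, sink `t`, arc capacities, and the function name are all assumptions made for the example.

```python
from itertools import combinations

def min_area_min_cut_bruteforce(nodes, arcs, s, t):
    """Enumerate every cut that keeps s on the source side and t on the
    sink side; return (cut size, R) minimizing cut size first, |R| second.
    R is the sink side without the artificial sink t."""
    inner = [n for n in nodes if n not in (s, t)]
    best_key, best_r = None, None
    for k in range(len(inner) + 1):
        for chosen in combinations(inner, k):
            sink_side = set(chosen) | {t}
            # Cut size = total capacity of arcs going source side -> sink side.
            cut = sum(c for u, v, c in arcs
                      if u not in sink_side and v in sink_side)
            if best_key is None or (cut, k) < best_key:
                best_key, best_r = (cut, k), set(chosen)
    return best_key[0], best_r

# Two cuts of size 1 exist here: R = {a, b} and R = {b}. The smaller R wins.
arcs = [("s", "a", 1), ("a", "b", 1), ("b", "t", 2)]
print(min_area_min_cut_bruteforce(["s", "a", "b", "t"], arcs, "s", "t"))
```

The enumeration is exponential in the number of nodes, so it is useful only as a check against a flow-based solver on small cases.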

Example
• A pre-partition, followed by computing a minimum cut in stage 2. [Figures not preserved in the transcript.]

Example (Cont’d)
• Computed R_2 = {j}.

Network Modeling
• Need to ensure that cut size = buffer requirement.
• For a net (v_1, {v_2, …, v_p}),

The Case of Limited Slack Logic Capacity
• The solution of min-area min-cut replication suffices if the slack logic capacity is sufficiently large.
• Otherwise, if |R_i| exceeds the slack, use a heuristic to reduce R_i.
• Use a repeated max-flow min-cut heuristic to reduce R_i gradually (so that the cut size increases only gradually).
• H. Yang, D.F. Wong, “Efficient Network Flow based Min-Cut Balanced Partitioning”, ICCAD’94.

Algorithm
Input: Stage area bound A.
1. Network modeling for bottleneck stage i.
2. Compute min-cut (V_i − R_i, R_i) such that |R_i| is minimized.
3. If |V_{i+1}| + |R_i| ≤ A, stop and return R_i.
4. Collapse a node in R_i with all nodes in V_i − R_i, go to 2.
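The loop in steps 2 to 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the flow network is assumed to be given already as directed arcs between an artificial source `s` and sink `t` (the network-modeling step is not reproduced), the area test is read as "≤", the node to collapse is picked arbitrarily, and collapsing is imitated by tying that node to the source with a very large capacity `BIG`. All function and variable names are hypothetical.

```python
from collections import deque

BIG = 10**9  # stands in for an arc that must never be cut

def smallest_sink_side_cut(arcs, s, t):
    """Max-flow by BFS augmenting paths. Returns (cut size, sink side), where
    the sink side is the set of nodes that can still reach t in the residual
    network, i.e. the smallest sink side over all minimum cuts."""
    cap = {s: {}, t: {}}
    for u, v, c in arcs:
        cap.setdefault(u, {}).setdefault(v, 0)
        cap.setdefault(v, {}).setdefault(u, 0)  # reverse residual arc
        cap[u][v] += c
    flow = 0
    while True:
        parent, queue = {s: None}, deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break  # no augmenting path left: flow is maximum
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= push
            cap[v][u] += push
        flow += push
    side, queue = {t}, deque([t])
    while queue:  # backward search from t over residual arcs
        v = queue.popleft()
        for u in cap[v]:
            if u not in side and cap[u][v] > 0:
                side.add(u)
                queue.append(u)
    return flow, side

def replicate_within_slack(arcs, s, t, next_stage_size, area_bound):
    """Repeat min-cut until |V_{i+1}| + |R_i| <= A, shrinking R_i each round."""
    arcs = list(arcs)
    while True:
        cut, side = smallest_sink_side_cut(arcs, s, t)
        r = side - {t}
        if next_stage_size + len(r) <= area_bound or not r:
            return cut, r
        victim = min(r)  # arbitrary pick; the real heuristic may choose differently
        arcs.append((s, victim, BIG))  # force victim onto the source side

arcs = [("s", "a", 1), ("a", "b", 2), ("b", "c", 3), ("c", "t", 4)]
print(replicate_within_slack(arcs, "s", "t", next_stage_size=2, area_bound=4))
```

In this toy chain the first cut has size 1 with three candidate nodes, which exceeds the bound, so one node is forced back and the next cut has size 2 with two nodes, showing the gradual trade of cut size for area.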

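Experimental Results
Table columns: Circuit | #buf w/o rep. | #buf w/ rep. | Imprv. % | Rep. %
[Table rows (four C-prefixed and five S-prefixed circuits) not preserved in the transcript.]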

Conclusions
• Proposed temporal logic replication to reduce the buffering requirement in DRFPGA partitioning.
• Presented an effective 2-step approach.
• Formulated and optimally solved the min-area min-cut replication problem.
• Extended the approach to the case of limited slack logic capacity.
• In the paper, a new timing-driven temporal partitioning algorithm was introduced to compute the pre-partition.