Temperature-Aware Resource Allocation and Binding in High Level Synthesis
Authors: Rajarshi Mukherjee, Seda Ogrenci Memik, and Gokhan Memik

Presentation transcript:

1 Temperature-Aware Resource Allocation and Binding in High Level Synthesis
Authors: Rajarshi Mukherjee, Seda Ogrenci Memik, and Gokhan Memik
Presented by: Nivya Papakannu, ECE Department, UMASS Amherst

2 Overview:
Introduction
Temperature-Aware High Level Synthesis
–Temperature Model and Assumptions
–Resource Allocation and Binding
Experimental Setup & Results
Conclusions

3 Introduction:
Continuous technology scaling following Moore’s Law
Billion-transistor ICs
Massive computational power
Increase in power density
–Increase in temperature
One of the biggest challenges in VLSI design

4 Need for Thermal Awareness:
Functional Incorrectness
–Carrier mobility decrease
–Interconnect resistance increase
Reliability Issues
–Electro-migration
–Transient and permanent faults
Thermal Considerations
–10 °C rise: component failure rate doubles
Non-uniform distribution
–“HOTSPOTS”
Leakage Power
–Dominant in current technologies
–Increasing with future technologies
–Exponential dependence on temperature
Higher Temperature → Higher Power

5 Thermal Awareness in HLS:
Can we prevent high temperatures in the first place?
Will power optimization help?
–Not always
–No individual consideration
Incorporate physical phenomena in all stages of the design flow
–Thermal-driven floorplanning and placement
–Thermal-aware high level synthesis

6 Temperature Model & Leakage Model:
Thermal Model
–Analogy between heat transfer and RC circuits
–ΔT_tot is the temperature contribution due to power dissipation
–P_tot = P_switch + P_leakage
–Δt = one clock cycle duration
Temperature Variation
–Modeled as exponential transient behavior, analogous to the electrical time constant RC
–R: thermal resistance, C: thermal capacitance
Leakage Model
–Leakage has exponential dependence on threshold voltage V_th and temperature T
–4th-order polynomial to represent P_leakage
–At 180nm: 15% of dynamic power at ambient temperature, doubling every 25 °C
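A minimal sketch of how the RC-style thermal update and the leakage rule of thumb from this slide might be coded. The function names, the 25 °C ambient default, the steady-state form `ambient + P_tot * R`, and the use of the "doubles every 25 °C" rule in place of the 4th-order polynomial are all assumptions made for illustration, not the authors' implementation.

```python
import math


def leakage_power(p_dynamic, temp_c, ambient_c=25.0):
    """Leakage estimate: 15% of dynamic power at ambient, doubling every 25 C.

    Stand-in for the 4th-order polynomial mentioned on the slide (assumption).
    """
    return 0.15 * p_dynamic * 2.0 ** ((temp_c - ambient_c) / 25.0)


def step_temperature(temp_c, p_switch, r_th, c_th, dt, ambient_c=25.0):
    """Advance a module's temperature by one clock cycle of length dt.

    r_th: thermal resistance, c_th: thermal capacitance (lumped RC analogy).
    """
    p_tot = p_switch + leakage_power(p_switch, temp_c, ambient_c)
    steady = ambient_c + p_tot * r_th        # temperature if p_tot were held forever
    decay = math.exp(-dt / (r_th * c_th))    # exponential transient, time constant R*C
    return steady + (temp_c - steady) * decay


# Hypothetical numbers: a module dissipating 1 W of switching power.
t = 25.0
for _ in range(3):
    t = step_temperature(t, p_switch=1.0, r_th=10.0, c_th=0.01, dt=0.01)
```

Because leakage is recomputed from the current temperature at every step, the sketch reproduces the feedback the slides point to: a hotter module leaks more, which in turn heats it further.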

7 Temperature Aware Resource Allocation and Binding:
Scheduled Data Flow Graph
–Allocation and Binding
Compatibility graph for each operation type
–Operations are vertices
–Edges labeled with switched capacitance
Two modes for optimizing temperature
–Temperature constrained resource minimization (TC)
–Resource constrained temperature minimization (RC)

8 DFG & Compatibility Graph:
For k resources, find k paths such that the sum of edge weights is minimum
[Figure: Data Flow Graph and Compatibility Graph, with resources R1 and R2]

9 Relaxation:
Relaxation
–Determine the predecessor (parent) of each vertex
–Relaxation idea based on Dijkstra’s shortest path algorithm
–For each vertex, determine the best parent through which the vertex can be reached, relaxing vertices according to the constraint criteria
Temperature Constrained (TC)
–Relax vertices that do not violate the temperature constraint
Resource Constrained (RC)
–Relax the vertex with the minimum rise in temperature
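The temperature-constrained relaxation can be sketched roughly as follows. This is an illustrative reading of the slide, not the authors' algorithm: the function name `relax_tc`, the `heat` callback (a hypothetical stand-in for the thermal model), the tie-breaking rule, and the example graph are all assumptions.

```python
def relax_tc(order, edges, t_start, t_max, heat):
    """Temperature-constrained relaxation over a compatibility graph.

    order : operations sorted by schedule time (a topological order)
    edges : {(u, v): switched_capacitance} for compatible pairs, u before v
    heat  : hypothetical model, heat(temp, sw) -> temperature after the next op
    Returns the longest chain found whose temperature never exceeds t_max.
    """
    length = {v: 1 for v in order}        # ops on the best chain ending at v
    temp = {v: t_start for v in order}    # resource temperature at the end of that chain
    parent = {v: None for v in order}

    for i, v in enumerate(order):
        for u in order[:i]:               # earlier vertices already have final labels
            if (u, v) not in edges:
                continue
            t = heat(temp[u], edges[(u, v)])
            if t > t_max:
                continue                  # reaching v through u would violate the constraint
            longer = length[u] + 1 > length[v]
            cooler = length[u] + 1 == length[v] and t < temp[v]
            if longer or cooler:
                length[v], temp[v], parent[v] = length[u] + 1, t, u

    # Select the longest chain (coolest on ties) and walk the parents back.
    end = max(order, key=lambda v: (length[v], -temp[v]))
    path = []
    while end is not None:
        path.append(end)
        end = parent[end]
    return path[::-1]


# Hypothetical 4-operation example; heat() simply adds the switched capacitance.
edges = {("a", "b"): 1.0, ("a", "c"): 1.0, ("b", "c"): 5.0,
         ("b", "d"): 1.0, ("c", "d"): 4.0}
path = relax_tc(["a", "b", "c", "d"], edges, t_start=40.0, t_max=43.0,
                heat=lambda t, sw: t + sw)
print(path)  # ['a', 'b', 'd']
```

Keeping a single label per vertex makes this a heuristic: a shorter but cooler partial chain that is discarded early could have allowed a longer chain later. The resource-constrained mode would differ mainly in the selection rule, preferring the extension with the smallest temperature rise rather than filtering against `t_max`.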

10 Temperature Aware Resource Allocation and Binding
Determine the parent of each vertex
–Relaxation
[Figure: compatibility graph on vertices a to g with edges sw_ab, sw_bc, sw_cd, sw_de, sw_ef, sw_fg]

11 Temperature Constrained Resource Allocation and Binding
Determine the parent of each vertex
–Relaxation
[Figure: same compatibility graph on vertices a to g]

12 Temperature Constrained Resource Allocation and Binding
Determine the parent of each vertex
–Relaxation
[Figure: same graph; label T_1; candidate: a; temperature of R1: T_a]

13 Temperature Constrained Resource Allocation and Binding
Determine the parent of each vertex
–Relaxation
[Figure: same graph; labels T_1, T_2; parent label (a); candidates: ab, ac, ag; temperature of R1: T_ab, T_ac, T_ag]

14 Temperature Constrained Resource Allocation and Binding
Determine the parent of each vertex
–Relaxation
[Figure: same graph; labels T_1, T_2, T_3; parent labels (a), (b); candidates with temperature of R1: T_abd, T_abe, T_ac; one extension marked X]

15 Temperature Constrained Resource Allocation and Binding
Determine the parent of each vertex
–Relaxation
[Figure: same graph; labels T_1 to T_4; parent labels (a), (b), (d); candidates with temperature of R1: T_abdg, T_abe, T_ac; one extension marked X]

16 Temperature Constrained Resource Allocation and Binding
Determine the parent of each vertex
–Relaxation
–Select the longest path
–Bind to a resource
–Shown for temperature constrained binding
[Figure: path a, b, d, g bound to Resource 1 (R1); edges sw_ab, sw_bd, sw_dg; labels T_1 to T_4]

17 Resource Constrained Allocation and Binding
Determine the parent of each vertex
–Relaxation
[Figure: compatibility graph on vertices a to g; label T_1; candidate: a; temperature on R1: T_a]

18 Resource Constrained Allocation and Binding
Determine the parent of each vertex
–Relaxation
[Figure: same graph; labels T_1, T_2; parent label (a); candidate: ab; temperature on R1: T_ab]

19 Resource Constrained Allocation and Binding
Determine the parent of each vertex
–Relaxation
[Figure: same graph; labels T_1 to T_3; parent labels (a), (b); candidate: abd; temperature on R1: T_abd]

20 Resource Constrained Allocation and Binding
Determine the parent of each vertex
–Relaxation
[Figure: same graph; labels T_1 to T_4; parent labels (a), (b), (d); candidate: abdf; temperature on R1: T_abdf]

21 Resource Constrained Allocation and Binding
Determine the parent of each vertex
–Relaxation
[Figure: same graph; labels T_1 to T_5; parent labels (a), (b), (d), (f); candidate: abdfg; temperature on R1: T_abdfg]

22 Resource Constrained Allocation and Binding
Determine the parent of each vertex
–Relaxation
–Returns the longest path
–Bind to a resource
–Shown for resource constrained binding
[Figure: path a, b, d, f, g bound to Resource 1 (R1); edges sw_ab, sw_bd, sw_fd, sw_fg; labels T_1 to T_5]

23 Temperature Aware Resource Allocation and Binding
Determine the parent of each vertex
–Relaxation
–Select the longest path
–Bind the operations to a resource
–Remove the operations from the compatibility graph
–Build a new compatibility graph
–Continue until all operations are bound to a resource
[Figure: remaining vertices c, e, f with edges sw_ce, sw_ef]
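The extract, bind, and rebuild loop on this slide can be sketched as below. The chain finder here is a deliberately simple greedy stand-in (it follows the smallest switched capacitance as a proxy for the smallest temperature rise) rather than the relaxation procedure itself; the function names and the example graph are hypothetical.

```python
def greedy_chain(ops, edges):
    """Grow a chain from the earliest operation, always following the compatible
    successor with the smallest switched capacitance (assumed proxy for the
    minimum temperature rise)."""
    chain = [ops[0]]
    while True:
        cur = chain[-1]
        successors = [(sw, v) for (u, v), sw in edges.items() if u == cur]
        if not successors:
            return chain
        chain.append(min(successors)[1])


def bind_all(ops, edges):
    """Repeat: find a chain, bind it to a new resource, remove its operations,
    rebuild the compatibility graph, until every operation is bound."""
    binding = []
    remaining = list(ops)                  # kept in schedule order
    while remaining:
        live = {(u, v): sw for (u, v), sw in edges.items()
                if u in remaining and v in remaining}
        chain = greedy_chain(remaining, live)
        binding.append(chain)              # one chain corresponds to one resource
        remaining = [op for op in remaining if op not in chain]
    return binding


# Hypothetical compatibility graph; every edge points forward in schedule order.
ops = ["a", "b", "c", "d", "e", "f", "g"]
edges = {("a", "b"): 1.0, ("a", "c"): 2.0, ("b", "c"): 3.0, ("b", "d"): 1.0,
         ("d", "e"): 2.0, ("d", "g"): 1.0, ("c", "e"): 1.0, ("c", "f"): 2.0,
         ("e", "f"): 1.0, ("f", "g"): 1.0}
binding = bind_all(ops, edges)
print(binding)  # [['a', 'b', 'd', 'g'], ['c', 'e', 'f']]
```

Each pass yields one resource, so the number of passes is the number of allocated resources. The sketch does not enforce a temperature or resource limit; in the flow described on the slides that role belongs to the TC or RC relaxation, followed by post-processing.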

24 Temperature Aware Resource Allocation and Binding
Successive paths from relaxation represent binding of the operations to a new resource
Post-Processing
–Merging/dividing resources
[Figure: Resource 1 (R1): a, b, d, g; Resource 2 (R2): c, f; edges sw_ab, sw_bd, sw_dg, sw_cf, sw_ef; labels T_i+1; parent labels (a), (b), (c), (d)]

25 Experimental Flow:
[Flowchart; recoverable labels listed below]
Applications in C: DFGs of popular DSP algorithms
SUIF, Scheduler, DFGs
Min-Cost Flow Binding
–Binding with optimal switching
Temperature-Aware Allocation & Binding (TC)
–Min resource binding under TC
Temperature-Aware Binding (RC)
–Min temperature binding under RC
Synopsys DC for capacitance extraction
ModelSim simulation for switching activity
Compare with low power binding

26 Resource Overhead
Benchmark      SW_OPT [MUL, ALU]   TC_R_MIN [MUL, ALU]
ewf            3, 5                4, 8
arf            4, 2                5, 4
jctrans_1      2, 3                2, 7
jctrans_2      0, 4                0, 6
jdmerge1       3, 6                3, 7
jdmerge2       3, 6                3, 9
jdmerge3       3, 6                3, 9
jdmerge4       3, 5                5, 9
motion_2       4, 6                6, 8
motion_3       4, 6                6, 8
noise_est_2    3, 4                4, 7
28% increase in MULs
54% increase in ALUs
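As a quick arithmetic check, the overhead figures can be recomputed from the table, reading each row as (benchmark, SW_OPT [MUL, ALU], TC_R_MIN [MUL, ALU]). That row layout and the choice to aggregate over all benchmarks (rather than average per-benchmark ratios) are assumptions about how the slide's percentages were obtained.

```python
# (benchmark, SW_OPT (MUL, ALU), TC_R_MIN (MUL, ALU)) as read from the slide
rows = [
    ("ewf",         (3, 5), (4, 8)),
    ("arf",         (4, 2), (5, 4)),
    ("jctrans_1",   (2, 3), (2, 7)),
    ("jctrans_2",   (0, 4), (0, 6)),
    ("jdmerge1",    (3, 6), (3, 7)),
    ("jdmerge2",    (3, 6), (3, 9)),
    ("jdmerge3",    (3, 6), (3, 9)),
    ("jdmerge4",    (3, 5), (5, 9)),
    ("motion_2",    (4, 6), (6, 8)),
    ("motion_3",    (4, 6), (6, 8)),
    ("noise_est_2", (3, 4), (4, 7)),
]


def percent_increase(rows, col):
    """Aggregate increase of TC_R_MIN over SW_OPT for one resource type."""
    base = sum(r[1][col] for r in rows)
    new = sum(r[2][col] for r in rows)
    return 100.0 * (new - base) / base


mul_increase = percent_increase(rows, 0)   # multipliers: 32 -> 41
alu_increase = percent_increase(rows, 1)   # ALUs: 53 -> 82
print(f"MUL: {mul_increase:.1f}%  ALU: {alu_increase:.1f}%")
```

Under this reading the totals give about 28% more multipliers and just under 55% more ALUs, in line with the 28% and 54% quoted on the slide.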

27 Experimental Results – Temperature
Maximum temperature reached by ALUs
[Chart annotations: 11.9 °C, 3.6 °C, 19.2 °C, 11.2 °C]

28 Experimental Results – Temperature
Maximum temperature reached by multipliers
[Chart annotations: 7.6 °C, 2.7 °C, 10.3 °C, 18.9 °C]

29 Experimental Results – Leakage Power
Normalized leakage power consumption of the three techniques at 180nm
[Chart annotations: 9%, 2%, 2.18]

30 Experimental Results – Total Power
Normalized total power consumption of the three techniques at 180nm
[Chart annotations: 34%, 5%, 2.38]

31 Conclusions:
Introduced resource binding techniques to create temperature awareness in HLS
Temperature-aware resource allocation and binding
Effectively minimized the maximum temperature reached by a module
–Temperature constrained
–Resource constrained
Leakage and total power savings in future technologies
A reliability-driven methodology can leverage this mechanism to prevent or reduce the likelihood of hotspots on a chip