A Thermal-Aware Mapping Algorithm for Reducing Peak Temperature of an Accelerator Deployed in a 3D Stack A Thermal-Aware Mapping Algorithm for Reducing.

Slides:



Advertisements
Similar presentations
A reconfigurable system featuring dynamically extensible embedded microprocessor, FPGA, and customizable I/O Borgatti, M. Lertora, F. Foret, B. Cali, L.
Advertisements

National Tsing Hua University Po-Yang Hsu,Hsien-Te Chen,
Performance Evaluations of Finite Difference Applications Realized on a Single Flux Quantum Circuits-Based Reconfigurable Accelerator Hiroaki Honda 1,
Kyushu University KL, Malaysia Hardware and Software Requirements for Implementing a High-Performance Superconductivity Circuits-Based Accelerator Farhad.
Embedded Software Optimization for MP3 Decoder Implemented on RISC Core Yingbiao Yao, Qingdong Yao, Peng Liu, Zhibin Xiao Zhejiang University Information.
Addressing the System-on-a-Chip Interconnect Woes Through Communication-Based Design N. Vinay Krishnan EE249 Class Presentation.
Module R R RRR R RRRRR RR R R R R Technion – Israel Institute of Technology The Era of Many-Module SoC: Revisiting the NoC Mapping Problem Isask’har (Zigi)
Spring 07, Jan 16 ELEC 7770: Advanced VLSI Design (Agrawal) 1 ELEC 7770 Advanced VLSI Design Spring 2007 Introduction Vishwani D. Agrawal James J. Danaher.
Trevor Burton6/19/2015 Multiprocessors for DSP SYSC5603 Digital Signal Processing Microprocessors, Software and Applications.
Thermal-Aware SoC Test Scheduling with Test Set Partitioning and Interleaving Zhiyuan He 1, Zebo Peng 1, Petru Eles 1 Paul Rosinger 2, Bashir M. Al-Hashimi.
HW/SW Co-Synthesis of Dynamically Reconfigurable Embedded Systems HW/SW Partitioning and Scheduling Algorithms.
Thermal Aware Resource Management Framework Xi He, Gregor von Laszewski, Lizhe Wang Golisano College of Computing and Information Sciences Rochester Institute.
Mahdi Hamzeh, Aviral Shrivastava, and Sarma Vrudhula School of Computing, Informatics, and Decision Systems Engineering Arizona State University June 2013.
1 Miodrag Bolic ARCHITECTURES FOR EFFICIENT IMPLEMENTATION OF PARTICLE FILTERS Department of Electrical and Computer Engineering Stony Brook University.
Sensor-Based Fast Thermal Evaluation Model For Energy Efficient High-Performance Datacenters Q. Tang, T. Mukherjee, Sandeep K. S. Gupta Department of Computer.
Architectures for mobile and wireless systems Ese 566 Report 1 Hui Zhang Preethi Karthik.
Software Pipelining for Stream Programs on Resource Constrained Multi-core Architectures IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEM 2012 Authors:
LOPASS: A Low Power Architectural Synthesis for FPGAs with Interconnect Estimation and Optimization Harikrishnan K.C. University of Massachusetts Amherst.
SoC TAM Design to Minimize Test Application Time Advisor Dr. Vishwani D. Agrawal Committee Members Dr. Victor P. Nelson, Dr. Adit D. Singh Apr 9, 2015.
A RISC ARCHITECTURE EXTENDED BY AN EFFICIENT TIGHTLY COUPLED RECONFIGURABLE UNIT Nikolaos Vassiliadis N. Kavvadias, G. Theodoridis, S. Nikolaidis Section.
TSV-Aware Analytical Placement for 3D IC Designs Meng-Kai Hsu, Yao-Wen Chang, and Valerity Balabanov GIEE and EE department of NTU DAC 2011.
Abhishek Pandey Reconfigurable Computing ECE 506.
System-level, Unified In-band and Out-of-band Dynamic Thermal Control Dong LiVirginia Tech Rong GeMarquette University Kirk CameronVirginia Tech.
Thermal-aware Steiner Routing for 3D Stacked ICs M. Pathak and S.K. Lim Georgia Institute of Technology ICCAD 07.
1 Customer-Aware Task Allocation and Scheduling for Multi-Mode MPSoCs Lin Huang, Rong Ye and Qiang Xu CHhk REliable computing laboratory (CURE) The Chinese.
A Combined Analytical and Simulation-Based Model for Performance Evaluation of a Reconfigurable Instruction Set Processor Farhad Mehdipour, H. Noori, B.
Generating and Executing Multi-Exit Custom Instructions for an Adaptive Extensible Processor Hamid Noori †, Farhad Mehdipour ‡, Kazuaki Murakami †, Koji.
Background Gaussian Elimination Fault Tolerance Single or multiple core failures: Single or multiple core additions: Simultaneous core failures and additions:
L11: Lower Power High Level Synthesis(2) 성균관대학교 조 준 동 교수
Thermal-aware Issues in Computers IMPACT Lab. Part A Overview of Thermal-related Technologies.
An Integrated Temporal Partitioning and Mapping Framework for Handling Custom Instructions on a Reconfigurable Functional Unit Farhad Mehdipour †, Hamid.
Test Architecture Design and Optimization for Three- Dimensional SoCs Li Jiang, Lin Huang and Qiang Xu CUhk Reliable Computing Laboratry Department of.
Software Architecture for Dynamic Thermal Management in Datacenters Tridib Mukherjee Graduate Research Assistant IMPACT Lab ( Department.
A Hybrid Design Space Exploration Approach for a Coarse-Grained Reconfigurable Accelerator Farhad Mehdipour, Hamid Noori, Hiroaki Honda, Koji Inoue, Kazuaki.
COARSE GRAINED RECONFIGURABLE ARCHITECTURES 04/18/2014 Aditi Sharma Dhiraj Chaudhary Pruthvi Gowda Rachana Raj Sunku DAY
Run-time Adaptive on-chip Communication Scheme 林孟諭 Dept. of Electrical Engineering National Cheng Kung University Tainan, Taiwan, R.O.C.
Kyushu University Las Vegas, June 2007 The Effect of Nanometer-Scale Technologies on the Cache Size Selection for Low Energy Embedded Systems.
Design Space Exploration for a Coarse Grain Accelerator Farhad Mehdipour, Hamid Noori, Morteza Saheb Zamani*, Koji Inoue, Kazuaki Murakami Kyushu University,
Panorama Space Modeling Method for Observing an Object Jungil Jung, Heunggi Kim, Jinsoo Cho Gachon Univ., Bokjeong-dong, Sujeong-gu, Seongnam-si, Gyeonggi-do,
Developing an Architecture for a Single-Flux Quantum Based Reconfigurable Accelerator F. Mehdipour, Hiroaki Honda *, H. Kataoka, K. Inoue and K. Murakami.
Improving Energy Efficiency of Configurable Caches via Temperature-Aware Configuration Selection Hamid Noori †, Maziar Goudarzi ‡, Koji Inoue ‡, and Kazuaki.
1 Copyright  2001 Pao-Ann Hsiung SW HW Module Outline l Introduction l Unified HW/SW Representations l HW/SW Partitioning Techniques l Integrated HW/SW.
Reduction of Register File Power Consumption Approach: Value Lifetime Characteristics - Pradnyesh Gudadhe.
Teaching The Principles Of System Design, Platform Development and Hardware Acceleration Tim Kranich
By P.-H. Lin, H. Zhang, M.D.F. Wong, and Y.-W. Chang Presented by Lin Liu, Michigan Tech Based on “Thermal-Driven Analog Placement Considering Device Matching”
High Performance, Low Power Reconfigurable Processor for Embedded Systems Farhad Mehdipour, Hamid Noori, Koji Inoue, Kazuaki Murakami Kyushu University,
Temperature-Sensitive Loop Parallelization for Chip Multiprocessors Sri HK Narayanan, Guilin Chen, Mahmut Kandemir, Yuan Xie Embedded Mobile Computing.
ROUTING ARCHITECTURE AND ALGORITHMS FOR A SUPERCONDUCTIVITY CIRCUITS-BASED COMPUTING HARDWARE Farhad Mehdipour, Hiroaki Honda, Hiroshi Kataoka, Koji Inoue,
Let’s Open Up New Fields for Next 10X! Koji Inoue Kyushu University, Japan
Xi He Golisano College of Computing and Information Sciences Rochester Institute of Technology Rochester, NY THERMAL-AWARE RESOURCE.
1 MOTIVATION AND OBJECTIVE  Discrete Signal Transforms (DSTs) –DFT, DCT: major performance component in many applications –Hardware accelerated but at.
ADAPTIVE CACHE-LINE SIZE MANAGEMENT ON 3D INTEGRATED MICROPROCESSORS Takatsugu Ono, Koji Inoue and Kazuaki Murakami Kyushu University, Japan ISOCC 2009.
Task Mapping and Partition Allocation for Mixed-Criticality Real-Time Systems Domițian Tămaș-Selicean and Paul Pop Technical University of Denmark.
CS203 – Advanced Computer Architecture
NCTU, CS VLSI Information Processing Research Lab 研究生 : ABSTRACT Introduction NEW Recursive DFT/IDFT architecture Low computation cycle  1/2: Chebyshev.
Distributed Sequencing for Resource Sharing in Multi-Applicative Heterogeneous NoC Platforms 林鼎原 Department of Electrical Engineering National Cheng Kung.
Thermal Aware EM Computation
Thermal-aware Task Placement in Data Centers (part 4)
ELEC 7770 Advanced VLSI Design Spring 2016 Introduction
Hamid Noori*, Farhad Mehdipour†, Norifumi Yoshimastu‡,
Methodology of a Compiler that Compresses Code using Echo Instructions
ELEC 7770 Advanced VLSI Design Spring 2014 Introduction
NETWORK-ON-CHIP HARDWARE ACCELERATORS FOR BIOLOGICAL SEQUENCE ALIGNMENT Author: Souradip Sarkar; Gaurav Ramesh Kulkarni; Partha Pratim Pande; and Ananth.
EPIMap: Using Epimorphism to Map Applications on CGRAs
Multiprocessor network topologies
ELEC 7770 Advanced VLSI Design Spring 2012 Introduction
ELEC 7770 Advanced VLSI Design Spring 2010 Introduction
An Automated Design Flow for 3D Microarchitecture Evaluation
Die Stacking (3D) Microarchitecture -- from Intel Corporation
Communication Driven Remapping of Processing Element (PE) in Fault-tolerant NoC-based MPSoCs Chia-Ling Chen, Yen-Hao Chen and TingTing Hwang Department.
Presentation transcript:

A Thermal-Aware Mapping Algorithm for Reducing Peak Temperature of an Accelerator Deployed in a 3D Stack A Thermal-Aware Mapping Algorithm for Reducing Peak Temperature of an Accelerator Deployed in a 3D Stack Farhad Mehdipour 1, Krishna Chaitanya Nunna 2, Lovic Gauthier 2, Koji Inoue 2, Kazuaki Murakami 2 1 E-JUST Center, Kyushu University, JAPAN 2 Department of Advanced Informatics, Kyushu University, JAPAN Farhad Mehdipour 1, Krishna Chaitanya Nunna 2, Lovic Gauthier 2, Koji Inoue 2, Kazuaki Murakami 2 1 E-JUST Center, Kyushu University, JAPAN 2 Department of Advanced Informatics, Kyushu University, JAPAN Thermal management is one of the main concerns in three-dimensional integration due to difficulty of dissipating heat through the stack of the integrated circuit. In a 3D stack involving a data-path accelerator, a base processor and memory components, peak temperature reduction is targeted in this paper. A mapping algorithm has been devised in order to distribute operations of data flow graphs evenly over the processing elements of the target accelerator in two steps: involving thermal-aware partitioning of input data flow graphs, and thermal-aware mapping of the partitions onto the processing elements. Uniform distribution of operations based on their power density leads to lower peak temperatures. ABSTRACT  A mapping algorithm for placing data flow graphs (DFGs) on a certain accelerator architecture is devised so that the peak temperature is reduced through evenly temperature distribution.  Further the accelerator is deployed in a 3D stack along with other components on a plane.  In the target platform accelerator is integrating to a base processor, memory components and etc. in a 3D stacking (Fig. 1(a)).  The accelerator is a pipelined data-path processing architecture, comprising a matrix of processing elements (PEs) as well as multiplexers among them to provide unidirectional flow of data (Fig. 1(b)).  A mapping algorithm for placing data flow graphs (DFGs) on a certain accelerator architecture is devised so that the peak temperature is reduced through evenly temperature distribution.  Further the accelerator is deployed in a 3D stack along with other components on a plane.  In the target platform accelerator is integrating to a base processor, memory components and etc. in a 3D stacking (Fig. 1(a)).  The accelerator is a pipelined data-path processing architecture, comprising a matrix of processing elements (PEs) as well as multiplexers among them to provide unidirectional flow of data (Fig. 1(b)). Fig. 1: (a) an integration of a DFG accelerator in a 3D stack (b) detailed architecture of the pipelined data-path accelerator  Data flow graphs (DFGs) extracted from various applications are mapped onto the accelerator for the sake of gaining higher performance.  A non-uniform distribution of DFG operations can be easily observed in the mapping result.  This may cause hot spots in condensed area of the accelerator as displayed in Fig. 2(c) which is indicating the peak temperature around 90 o C and minimum temperature of 56 o C for unused spaces.  Data flow graphs (DFGs) extracted from various applications are mapped onto the accelerator for the sake of gaining higher performance.  A non-uniform distribution of DFG operations can be easily observed in the mapping result.  This may cause hot spots in condensed area of the accelerator as displayed in Fig. 2(c) which is indicating the peak temperature around 90 o C and minimum temperature of 56 o C for unused spaces. The proposed mapping algorithm is a partitioning-based one and performs in two steps (Fig. 3 and Fig. 4): (a)thermal-aware partitioning of input DFG and creating a number of partitions associated with PE stripes, and (a)placing the DFG operations of each partition onto the PEs of corresponding PE stripe considering temperature distribution. The proposed mapping algorithm is a partitioning-based one and performs in two steps (Fig. 3 and Fig. 4): (a)thermal-aware partitioning of input DFG and creating a number of partitions associated with PE stripes, and (a)placing the DFG operations of each partition onto the PEs of corresponding PE stripe considering temperature distribution. Fig. 3: First step of the proposed algorithm, thermal- aware partitioning DFGs Fig. 2: (a) A sample DFG, (b) its corresponding map on the PE-array with non-uniform distribution (c) thermal map of the accelerator after mapping. INTRODUCTION MOTIVATION THERMAL-AWARE MAPPING ALGORITHM RESULTS CONCLUSION & FUTURE WORK The proposed thermal-aware partitioning and mapping algorithm is effective in reducing the peak temperature of the accelerator. We intend: to expand this algorithm for a 3D mutli-tier accelerator and take the effect of interconnections into account in thermal evaluations due to substantial share of interconnections on power consumption. The proposed thermal-aware partitioning and mapping algorithm is effective in reducing the peak temperature of the accelerator. We intend: to expand this algorithm for a 3D mutli-tier accelerator and take the effect of interconnections into account in thermal evaluations due to substantial share of interconnections on power consumption. This research was supported in part by New Energy and Industrial Technology Development Organization and Core Research for Evolutional Science and Technology (CREST) of Japan Science and Technology Corporation (JST). 1.K. Banerjee, S. J. Souri, P. Kapur and K. C. Saraswat, “3-D ICs: A novel chip design for improving deep submicrometer interconnect performance and system-on-chip integration,” Proc. of IEEE, 89(5), pp , May S. Im and K. Banerjee, “Full chip thermal analysis of planar (2-D) and vertically integrated (3-D) high performance ICs,” in IEDM Tech. Dig., pp. 727–730, F. Mehdipour, H. Honda, K. Inoue, H. Kataoka, and K. Murakami, A design scheme for a reconfigurable accelerator implemented by single-flux quantum circuits, J. Syst. Architect. Embedded Systems Design 57(1), pp , Jan K. Banerjee, S. J. Souri, P. Kapur and K. C. Saraswat, “3-D ICs: A novel chip design for improving deep submicrometer interconnect performance and system-on-chip integration,” Proc. of IEEE, 89(5), pp , May S. Im and K. Banerjee, “Full chip thermal analysis of planar (2-D) and vertically integrated (3-D) high performance ICs,” in IEDM Tech. Dig., pp. 727–730, F. Mehdipour, H. Honda, K. Inoue, H. Kataoka, and K. Murakami, A design scheme for a reconfigurable accelerator implemented by single-flux quantum circuits, J. Syst. Architect. Embedded Systems Design 57(1), pp , Jan REFERENCES Fig. 4: Thermal maps of the accelerator after and before applying the thermal-aware mapping algorithm (a) (b) Fig. 4: Second step of the proposed algorithm, assigning operations to PE stripes