Energy saving in multicore architectures


Energy saving in multicore architectures

Assoc. Prof. Adrian FLOREA, PhD
Prof. Lucian VINTAN, PhD, Research chair
Lecturer Arpad GELLERT, PhD
Horia CALBOREAN, PhD

Advanced Computer Architecture & Processing Systems Research Lab

- Anticipatory Techniques in Advanced Processor Architectures (superscalar, SMT)
- An Automatic Design Space Exploration Framework for Multicore Architecture Optimizations

Computing hardware

- 14 Intel compute nodes (2-processor HS21 blades with quad-core Intel Xeon)
- 2 Cell compute nodes (2-processor QS22 blades with IBM PowerXCell 8i processor)

Anticipatory Techniques in Advanced Processor Architectures (superscalar, SMT)

Issue bottleneck (data-flow): conventional processing models are limited in their processing speed by the dynamic program's critical path (Amdahl).

2 solutions:
- Dynamic Instruction Reuse (DIR), a non-speculative technique.
- Value Prediction (VP), a speculative technique.

Common issue: value locality

Challenges:
- Selective Instruction Reuse (MUL & DIV)
- Selective Load Value Prediction ("Critical Loads")
- Exploiting IR & VP in a Superscalar / Simultaneous Multithreaded (SMT) architecture to anticipate long-latency instructions

Results
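As an illustration of selective instruction reuse, the following minimal sketch caches results only for long-latency operations (MUL and DIV, as named on the slide) and looks them up by program counter and operand values. The class name `ReuseBuffer`, its capacity, and the oldest-first eviction policy are assumptions made for this example, not details taken from the presentation.

```python
from typing import Callable, Dict, Tuple

# Operation table for the toy model; integer semantics are an assumption.
OPS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
    "div": lambda a, b: a // b,
}

# Only these long-latency operations are candidates for reuse.
SELECTIVE_OPS = {"mul", "div"}


class ReuseBuffer:
    """Toy reuse buffer indexed by (pc, operand1, operand2)."""

    def __init__(self, capacity: int = 4) -> None:
        self.capacity = capacity
        self.entries: Dict[Tuple[int, int, int], int] = {}
        self.lookups = 0
        self.hits = 0

    def execute(self, pc: int, op: str, a: int, b: int) -> int:
        if op not in SELECTIVE_OPS:
            # Short-latency instructions bypass the buffer entirely.
            return OPS[op](a, b)

        key = (pc, a, b)
        self.lookups += 1
        if key in self.entries:
            # Same instruction, same inputs: the stored result is exact,
            # so reuse is non-speculative and needs no recovery path.
            self.hits += 1
            return self.entries[key]

        result = OPS[op](a, b)
        if len(self.entries) >= self.capacity:
            # Evict the oldest entry (dicts preserve insertion order).
            del self.entries[next(iter(self.entries))]
        self.entries[key] = result
        return result
```

Because the lookup matches on the actual operand values, a hit never returns a wrong result, which is what distinguishes reuse from value prediction.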

Exploiting Selective Value Prediction in Superscalar and SMT Architectures

Traditional value prediction techniques have been increasingly challenged by the advent of mobile, battery-operated devices because of their significant energy consumption. This is essentially due to the on-chip memory required for computing the prediction and the overall number of accesses to the predictor itself. We introduce and analyze a selective value predictor which is triggered only during specific cache miss events.

Advantages:
- Reduces the overall number of accesses and the energy consumption of the on-chip memory and logic reserved for value speculation.
- Improves over traditional value predictors in terms of performance and energy consumption.
- Creates room for a reduction of the data-cache size while preserving performance, thus enabling a reduction of the system cost.
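The effect of triggering the predictor only on cache misses can be sketched with a toy last-value predictor that counts its own table accesses. Everything here is an assumption chosen for illustration: the last-value scheme, the confidence threshold, the trace format `(pc, value, cache_miss)`, and the choice to train only on misses. The presentation does not specify the predictor's internals.

```python
from typing import Dict, Iterable, List, Optional, Tuple


class LastValuePredictor:
    """Toy last-value predictor with a saturating confidence counter."""

    def __init__(self, threshold: int = 2, max_conf: int = 3) -> None:
        self.threshold = threshold
        self.max_conf = max_conf
        self.table: Dict[int, List[int]] = {}  # pc -> [last_value, confidence]
        self.accesses = 0  # every table read or write costs energy

    def predict(self, pc: int) -> Optional[int]:
        self.accesses += 1
        entry = self.table.get(pc)
        if entry is not None and entry[1] >= self.threshold:
            return entry[0]
        return None  # not confident enough to speculate

    def update(self, pc: int, actual: int) -> None:
        self.accesses += 1
        entry = self.table.get(pc)
        if entry is None:
            self.table[pc] = [actual, 0]
        elif entry[0] == actual:
            entry[1] = min(entry[1] + 1, self.max_conf)
        else:
            entry[0], entry[1] = actual, 0


def run(trace: Iterable[Tuple[int, int, bool]], selective: bool) -> Tuple[int, int]:
    """Return (predictor accesses, correct predictions) for a load trace."""
    predictor = LastValuePredictor()
    correct = 0
    for pc, value, cache_miss in trace:
        if selective and not cache_miss:
            continue  # cache hit: the predictor is not touched at all
        if predictor.predict(pc) == value:
            correct += 1
        predictor.update(pc, value)
    return predictor.accesses, correct
```

On a trace where half of the loads hit in the cache, the selective variant performs half as many predictor accesses; this models only the access count, not the energy per access.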

Tools, Metrics and some Results

The M-SIM Simulator

[Figure: simulator block diagram with the labels Cycle-Level Performance Simulator, Hardware Configuration, SPEC Benchmark, Power Models, Hardware Access Counts, Performance Estimation, Power Estimation]
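One common way to turn hardware access counts into a power figure is to multiply each unit's access count by an energy-per-access value and divide the total by the execution time. The sketch below assumes that approach; the function name, the unit names, and the per-access energies are hypothetical, and static (leakage) power is ignored. It is not a description of M-SIM's actual power models.

```python
from typing import Dict


def estimate_dynamic_power(
    access_counts: Dict[str, int],
    energy_per_access_nj: Dict[str, float],
    cycles: int,
    clock_hz: float,
) -> float:
    """Return average dynamic power in watts.

    access_counts: accesses per hardware unit, as reported by a simulator.
    energy_per_access_nj: assumed energy cost of one access, in nanojoules.
    """
    total_energy_nj = sum(
        count * energy_per_access_nj[unit]
        for unit, count in access_counts.items()
    )
    seconds = cycles / clock_hz
    return total_energy_nj * 1e-9 / seconds
```

For example, 1000 data-cache accesses at 0.5 nJ plus 500 predictor accesses at 0.2 nJ give 600 nJ; over 1,000,000 cycles at 1 GHz (1 ms) that is 0.6 mW.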

Design space exploration (DSE) of a Selective Load Value Prediction scheme suitable for energy-aware Simultaneous MultiThreaded (SMT) architectures

[Figure panels: a) Superscalar, b) SMT]

Automatic Design Space Exploration Framework for Multicore Architecture Optimizations

Multiobjective optimization of advanced computer architectures using experts' domain knowledge

- HUGE design space (>19 parameters): M-SIM 2 – 2.5 million billion configurations (10^15)
- Manual design space exploration is impossible
- Multi-objective optimization (processing performance, power consumption, integration area, thermal dissipation) makes the problem even harder
- Solution: heuristic algorithms (genetic algorithms, bio-inspired algorithms)
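Two ideas from this slide can be made concrete in a few lines: the design-space size is the product of the number of values each parameter can take, and with several objectives the goal is a set of non-dominated (Pareto-optimal) configurations rather than a single best one. The parameter names and values below are hypothetical, and all objectives are assumed to be minimized.

```python
from math import prod
from typing import Dict, List, Sequence, Tuple


def design_space_size(param_values: Dict[str, Sequence]) -> int:
    """Number of configurations = product of per-parameter value counts."""
    return prod(len(values) for values in param_values.values())


def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """True if a is no worse than b in every objective and better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(
        x < y for x, y in zip(a, b)
    )


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the points that no other point dominates."""
    return [
        p
        for p in points
        if not any(dominates(q, p) for q in points if q != p)
    ]
```

With only three hypothetical parameters of 3, 2, and 4 values the space already holds 24 configurations; around twenty parameters with several values each quickly reach sizes that rule out exhaustive simulation, which is why heuristic search is used.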

Framework for Automatic Design Space Exploration (FADSE)

It must:
- Simulate many individuals (i.e., architectural configurations). Slow! (24 hours/generation on 96 cores, one generation = 100 individuals)
- Implement reliability mechanisms (bounded wait for client, resending individuals, checkpointing, etc.)

Accelerating the process:
- Simulate fewer configurations (database integration with up to 67% reuse; evaluate only 2500 configurations)
- Parallelize (distributed evaluation)
- Add computer architecture domain knowledge (constraints, hierarchical parameters, fuzzy rules)

[Figure: results after 30 generations]
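The two acceleration ideas, reusing stored results and evaluating in parallel, can be sketched as follows. This is a simplified stand-in, not FADSE's implementation: an in-memory dictionary plays the role of the database, local threads play the role of distributed clients, and `CachedEvaluator` and the `simulate` callback are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Config = Dict[str, int]
Objectives = Tuple[float, ...]


class CachedEvaluator:
    """Evaluate a generation, simulating each distinct configuration once."""

    def __init__(self, simulate: Callable[[Config], Objectives], workers: int = 4):
        self.simulate = simulate
        self.workers = workers
        self.db: Dict[tuple, Objectives] = {}  # stand-in for the database
        self.simulated = 0
        self.reused = 0

    def evaluate_generation(self, individuals: List[Config]) -> List[Objectives]:
        # A configuration is identified by its sorted (parameter, value) pairs.
        keys = [tuple(sorted(ind.items())) for ind in individuals]
        # Distinct configurations that have never been simulated before.
        new_keys = [k for k in dict.fromkeys(keys) if k not in self.db]

        # Simulate the new configurations in parallel.
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            results = list(pool.map(lambda k: self.simulate(dict(k)), new_keys))

        self.db.update(zip(new_keys, results))
        self.simulated += len(new_keys)
        self.reused += len(keys) - len(new_keys)
        return [self.db[k] for k in keys]
```

Since genetic algorithms tend to regenerate the same individuals across generations, the share of lookups answered from the store grows over time; `reused / (reused + simulated)` gives the reuse rate for a run.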