Rakesh Kumar, Keith Farkas, Norman P. Jouppi, Partha Ranganathan, Dean M. Tullsen. University of California, San Diego. MICRO 2003. Speaker: Chun-Chung Chen. Single-ISA.

Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction
Rakesh Kumar, Keith Farkas, Norman P. Jouppi, Partha Ranganathan, Dean M. Tullsen
University of California, San Diego
MICRO 2003
Speaker: Chun-Chung Chen

Outline
Introduction
Motivation
Architecture
Experiment Results
Conclusion

Motivation
Processors are projected to consume 300 W by 2015.
Existing CMP designs use only homogeneous cores.
Wide cores can exploit applications with high ILP, but applications with low ILP use less power on narrower cores with little loss in performance.
There is no need to design cores from scratch, because existing Alpha cores implement practically the same ISA.

General Idea
Single-ISA heterogeneous multi-core architecture
–A mechanism to reduce power dissipation
System software dynamically chooses the most power-efficient core that still satisfies a given performance constraint
→ Power efficiency

Modeling of CPU Cores
EV4: Alpha 21064
EV5: Alpha 21164
EV6: Alpha 21264
EV8-: single-threaded version of Alpha 21464
Assumptions
–Only one application runs at a time, on only one core
–Unused cores are completely powered down (and therefore have no leakage)

Cores, cont.
All cores are assumed to be implemented in 0.10 micron technology.
The four cores are assumed to have private L1 data and instruction caches and to share a common L2 cache, phase-locked loop circuitry, and pins.
All cores run at 2.1 GHz.
ISA differences are resolved in one of two ways: either programs are compiled to the least common denominator (the EV4), or software traps are used on the older cores.

Modeling of Power

Core Switching
Switching is done at the operating system level.
An OS-level switch involves flushing the cache and saving and restoring user state between the cores.
A core is estimated to power up in ~1000 cycles at 2.1 GHz.
Switching overhead turns out to be negligible (~1%).
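
To put the power-up estimate in wall-clock terms, here is a minimal Python sketch that converts the cycle count into time. The helper name `cycles_to_seconds` is made up for illustration; only the 1000-cycle and 2.1 GHz figures come from the slide.

```python
def cycles_to_seconds(cycles: int, clock_hz: float) -> float:
    """Convert a cycle count to wall-clock time at a given clock frequency."""
    return cycles / clock_hz

CLOCK_HZ = 2.1e9        # core clock quoted on the slide
POWER_UP_CYCLES = 1000  # approximate power-up cost quoted on the slide

power_up_ns = cycles_to_seconds(POWER_UP_CYCLES, CLOCK_HZ) * 1e9
print(f"power-up latency: about {power_up_ns:.0f} ns")  # about 476 ns
```

This covers only the power-up step. The cache flush and the state save and restore add to it, and the slide does not give cycle counts for those.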

Variation in Power & Performance
Benchmark: applu
Relative performance of the cores varies between phases.

Switching Algorithms: Oracle-Based Dynamic Switching Using the Energy Heuristic
With oracle knowledge of each core's power requirements and performance potential, choose the core that would have the lowest energy consumption, as long as it performs within 10% of EV8-.
Benchmark shown: applu
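
The energy heuristic can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the function name, the `(ips, watts)` representation, and every number in `example` are hypothetical. Energy for a fixed amount of work is taken as proportional to watts divided by instructions per second.

```python
def pick_core_energy(stats, baseline="EV8-", max_slowdown=0.10):
    """Pick the lowest-energy core that stays within max_slowdown of the baseline.

    stats maps a core name to (ips, watts) for the coming interval,
    which is exactly the knowledge an oracle is assumed to have.
    """
    base_ips = stats[baseline][0]
    # Energy per instruction is watts / ips; keep only cores fast enough.
    eligible = {
        core: watts / ips
        for core, (ips, watts) in stats.items()
        if ips >= (1.0 - max_slowdown) * base_ips
    }
    return min(eligible, key=eligible.get)

# Hypothetical per-interval numbers, invented for this example.
example = {
    "EV4": (0.8e9, 4.0),
    "EV5": (1.2e9, 8.0),
    "EV6": (1.85e9, 18.0),
    "EV8-": (2.0e9, 60.0),
}
print(pick_core_energy(example))  # EV6
```

With these made-up numbers EV4 has the lowest energy per instruction, but it is too slow to qualify, so the choice falls to EV6. The baseline core always passes the filter, so the candidate set is never empty.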

Switching Algorithms: Oracle-Based Dynamic Switching Using the Energy-Delay Heuristic
The oracle chooses the core with the lowest energy-delay product, that is, the core that maximizes IPS²/Watt, as long as it performs within 50% of EV8-.
Benchmark shown: applu
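
A short derivation shows why lowest energy-delay product and highest IPS²/Watt select the same core. The symbols are assumptions introduced here: N is a fixed instruction count, P the average power in watts, D the execution time, and E the energy.

```latex
\begin{aligned}
D &= \frac{N}{\mathrm{IPS}}, \qquad E = P \cdot D = \frac{P\,N}{\mathrm{IPS}}, \\
E \cdot D &= \frac{P\,N^{2}}{\mathrm{IPS}^{2}} = \frac{N^{2}}{\mathrm{IPS}^{2}/P}.
\end{aligned}
```

Because N is the same on every core, the energy-delay product is inversely proportional to IPS²/Watt, so the two criteria rank the cores identically.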

Switching Algorithms: Realistic Dynamic Switching
Every 100 time intervals, one or more cores are sampled for five intervals each.
Neighbor
–One of the neighboring cores is chosen at random to be sampled
Neighbor-global
–Similar to neighbor, except that selection is based on the accumulated energy-delay product
Random
–A core is chosen at random to be sampled
All
–All cores are sampled
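
The four sampling policies can be sketched as follows. Several details are assumptions rather than statements from the slides: the cores are ordered EV4, EV5, EV6, EV8- and "neighboring" means adjacent in that order, the currently running core is never sampled, and a switch happens when a sampled core shows a lower energy-delay product than the current one. The function names are hypothetical.

```python
import random

# Assumed ordering of the cores; "neighbors" are adjacent entries.
CORES = ["EV4", "EV5", "EV6", "EV8-"]

def cores_to_sample(policy, current, rng=random):
    """Return the list of cores to sample in one sampling round."""
    i = CORES.index(current)
    others = [c for c in CORES if c != current]
    if policy in ("neighbor", "neighbor-global"):
        # Same candidates for both; they differ in the metric compared later.
        neighbors = [CORES[j] for j in (i - 1, i + 1) if 0 <= j < len(CORES)]
        return [rng.choice(neighbors)]
    if policy == "random":
        return [rng.choice(others)]
    if policy == "all":
        return others
    raise ValueError(f"unknown policy: {policy}")

def next_core(current, current_ed, sampled_ed):
    """Assumed rule: move to the best sampled core if it beats the current one.

    current_ed is the current core's energy-delay product; sampled_ed maps
    each sampled core to its measured energy-delay product.
    """
    best = min(sampled_ed, key=sampled_ed.get)
    return best if sampled_ed[best] < current_ed else current
```

For neighbor-global, the values passed to `next_core` would be accumulated energy-delay products rather than per-sample ones. How that accumulation is maintained is not described in the slides, so it is left out of the sketch.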

Realistic Dynamic Switching Results
Results are shown normalized to EV8- performance.
Performance degradation of the realistic schemes is less than in the oracle-based schemes.
The realistic schemes resulted in more core switching.

Conclusion
Realistic dynamic switching algorithms show a decrease in energy and energy-delay with only a small decrease in performance.
Single-ISA heterogeneous multi-core processors using existing technology may be a way to curb power usage.