Critical Power Slope Understanding the Runtime Effects of Frequency Scaling Akihiko Miyoshi, Charles Lefurgy, Eric Van Hensbergen Ram Rajamony Raj Rajkumar.

Slides:



Advertisements
Similar presentations
Predicting Performance Impact of DVFS for Realistic Memory Systems Rustam Miftakhutdinov Eiman Ebrahimi Yale N. Patt.
Advertisements

Reducing Network Energy Consumption via Sleeping and Rate- Adaption Sergiu Nedevschi, Lucian Popa, Gianluca Iannaccone, Sylvia Ratnasamy, David Wetherall.
Energy Efficiency through Burstiness Athanasios E. Papathanasiou and Michael L. Scott University of Rochester, Computer Science Department Rochester, NY.
CMSC 611: Advanced Computer Architecture Cache Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from.
Real- time Dynamic Voltage Scaling for Low- Power Embedded Operating Systems Written by P. Pillai and K.G. Shin Presented by Gaurav Saxena CSE 666 – Real.
Power Reduction Techniques For Microprocessor Systems
Evaluating an Adaptive Framework For Energy Management in Processor- In-Memory Chips Michael Huang, Jose Renau, Seung-Moon Yoo, Josep Torrellas.
1 MemScale: Active Low-Power Modes for Main Memory Qingyuan Deng, David Meisner*, Luiz Ramos, Thomas F. Wenisch*, and Ricardo Bianchini Rutgers University.
Power Management in Cloud Computing using Green Algorithm -Kushal Mehta COP 6087 University of Central Florida.
TU/e Processor Design 5Z032 1 Processor Design 5Z032 The role of Performance Henk Corporaal Eindhoven University of Technology 2009.
Instruction Set Architecture (ISA) for Low Power Hillary Grimes III Department of Electrical and Computer Engineering Auburn University.
Chapter 4 Assessing and Understanding Performance
Cache Oblivious Search Trees via Binary Trees of Small Height
Optimization Of Power Consumption For An ARM7- BASED Multimedia Handheld Device Hoseok Chang; Wonchul Lee; Wonyong Sung Circuits and Systems, ISCAS.
1 Lecture 10: Uniprocessor Scheduling. 2 CPU Scheduling n The problem: scheduling the usage of a single processor among all the existing processes in.
ECE 510 Brendan Crowley Paper Review October 31, 2006.
Power-aware Computing n Dramatic increases in computer power consumption: » Some processors now draw more than 100 watts » Memory power consumption is.
Scheduling for Reduced CPU Energy M. Weiser, B. Welch, A. Demers, and S. Shenker.
1 Lecture 2: Metrics to Evaluate Systems Topics: Power and technology trends wrap-up, benchmark suites, performance equation, summarizing performance with.
Folklore Confirmed: Compiling for Speed = Compiling for Energy Tomofumi Yuki INRIA, Rennes Sanjay Rajopadhye Colorado State University 1.
VOLTAGE SCHEDULING HEURISTIC for REAL-TIME TASK GRAPHS D. Roychowdhury, I. Koren, C. M. Krishna University of Massachusetts, Amherst Y.-H. Lee Arizona.
Abhilash Thekkilakattil, Radu Dobrin, Sasikumar Punnekkat Mälardalen Real-time Research Center, Mälardalen University Västerås, Sweden Preemption Control.
CuMAPz: A Tool to Analyze Memory Access Patterns in CUDA
Computer Performance Computer Engineering Department.
November , 2009SERVICE COMPUTATION 2009 Analysis of Energy Efficiency in Clouds H. AbdelSalamK. Maly R. MukkamalaM. Zubair Department.
1 Overview 1.Motivation (Kevin) 1.5 hrs 2.Thermal issues (Kevin) 3.Power modeling (David) Thermal management (David) hrs 5.Optimal DTM (Lev).5 hrs.
1 EE5900 Advanced Embedded System For Smart Infrastructure Energy Efficient Scheduling.
Dynamic Slack Reclamation with Procrastination Scheduling in Real- Time Embedded Systems Paper by Ravindra R. Jejurikar and Rajesh Gupta Presentation by.
A S CHEDULABILITY A NALYSIS FOR W EAKLY H ARD R EAL - T IME T ASKS IN P ARTITIONING S CHEDULING ON M ULTIPROCESSOR S YSTEMS Energy Reduction in Weakly.
Energy Savings with DVFS Reduction in CPU power Extra system power.
1 University of Maryland Linger-Longer: Fine-Grain Cycle Stealing in Networks of Workstations Kyung Dong Ryu © Copyright 2000, Kyung Dong Ryu, All Rights.
Egle Cebelyte. Random Access Memory is simply the storage area where all software is loaded and works from; also called working memory storage.
Abhilash Thekkilakattil, Radu Dobrin, Sasikumar Punnekkat Mälardalen Real-time Research Center, Mälardalen University Västerås, Sweden Towards Preemption.
1 Scheduling The part of the OS that makes the choice of which process to run next is called the scheduler and the algorithm it uses is called the scheduling.
1 Tuning Garbage Collection in an Embedded Java Environment G. Chen, R. Shetty, M. Kandemir, N. Vijaykrishnan, M. J. Irwin Microsystems Design Lab The.
Power-Aware Scheduling of Virtual Machines in DVFS-enabled Clusters
Data Replication and Power Consumption in Data Grids Susan V. Vrbsky, Ming Lei, Karl Smith and Jeff Byrd Department of Computer Science The University.
VGreen: A System for Energy Efficient Manager in Virtualized Environments G. Dhiman, G Marchetti, T Rosing ISLPED 2009.
Meikel Poess Oracle Corporation. Analytical Power Consumption Model Based on nameplate power consumption Nameplate power is conservative estimate Model.
Dynamic Voltage Frequency Scaling for Multi-tasking Systems Using Online Learning Gaurav DhimanTajana Simunic Rosing Department of Computer Science and.
Understanding Performance, Power and Energy Behavior in Asymmetric Processors Nagesh B Lakshminarayana Hyesoon Kim School of Computer Science Georgia Institute.
An Energy-efficient Task Scheduler for Multi-core Platforms with per-core DVFS Based on Task Characteristics Ching-Chi Lin Institute of Information Science,
Rassul Ayani 1 Performance of parallel and distributed systems  What is the purpose of measurement?  To evaluate a system (or an architecture)  To compare.
F A S T Frequency-Aware Static Timing Analysis
Lev Finkelstein ISCA/Thermal Workshop 6/ Overview 1.Motivation (Kevin) 2.Thermal issues (Kevin) 3.Power modeling (David) 4.Thermal management (David)
Critical Power Slope: Understanding the Runtime Effects of Frequency Scaling Akihiko Miyoshi †,Charles Lefurgy ‡, Eric Van Hensbergen ‡, Ram Rajamony ‡,
June 30 - July 2, 2009AIMS 2009 Towards Energy Efficient Change Management in A Cloud Computing Environment: A Pro-Active Approach H. AbdelSalamK. Maly.
Real Time Operating Systems Schedulability - Part 2 Course originally developed by Maj Ron Smith 12/20/2015Dr Alain Beaulieu1.
Outline Models Design of experiments Current Scheduler Completely Fair Scheduler(CFS) ◦ Since Linux ◦ /kernel/sched.c ◦ Maintain balance (fairness)
FPGA-Based System Design: Chapter 6 Copyright  2004 Prentice Hall PTR Topics n Low power design. n Pipelining.
An Integrated GPU Power and Performance Model (ISCA’10, June 19–23, 2010, Saint-Malo, France. International Symposium on Computer Architecture)
Shouqing Hao Institute of Computing Technology, Chinese Academy of Sciences Processes Scheduling on Heterogeneous Multi-core Architecture.
E-MOS: Efficient Energy Management Policies in Operating Systems
Equalizer: Dynamically Tuning GPU Resources for Efficient Execution Ankit Sethia* Scott Mahlke University of Michigan.
Determining Optimal Processor Speeds for Periodic Real-Time Tasks with Different Power Characteristics H. Aydın, R. Melhem, D. Mossé, P.M. Alvarez University.
1 of 14 Lab 2: Design-Space Exploration with MPARM.
1 Improved Policies for Drowsy Caches in Embedded Processors Junpei Zushi Gang Zeng Hiroyuki Tomiyama Hiroaki Takada (Nagoya University) Koji Inoue (Kyushu.
Rakesh Kumar Keith Farkas Norman P Jouppi,Partha Ranganathan,Dean M.Tullsen University of California, San Diego MICRO 2003 Speaker : Chun-Chung Chen Single-ISA.
Distributed Process Scheduling- Real Time Scheduling Csc8320(Fall 2013)
Overview Motivation (Kevin) Thermal issues (Kevin)
Crusoe Processor Seminar Guide: By: - Prof. H. S. Kulkarni Ashish.
Green cloud computing 2 Cs 595 Lecture 15.
Experimental Calibration and Validation of a Speed Scaling Simulator
Flavius Gruian < >
Zhen Xiao, Qi Chen, and Haipeng Luo May 2013
TDC 311 Process Scheduling.
Dynamic Voltage Scaling
CPU SCHEDULING.
FAST: Frequency-Aware Static Timing Analysis
Presentation transcript:

Critical Power Slope Understanding the Runtime Effects of Frequency Scaling Akihiko Miyoshi, Charles Lefurgy, Eric Van Hensbergen Ram Rajamony Raj Rajkumar

Motivation Power management algorithms implicitly assume that lower performance points are more energy efficient that higher points This paper shows that this assumption is not always valid Also helps decide which operating points of a processor should be considered by an power management algorithm

Motivation  < : not always true  How do we choose which operating points to use? Evaluation of frequency scaling, clock throttling and dynamic voltage scaling on three existing processors Analytical model: Critical Power Slope Analysis on voltage scaling systems Conclusion Outline Watts

Techniques of Power Management Frequency scaling  Processor clock is reduced  Processor consumes less energy at the expense of reduced performance Clock throttling  Clock runs at original frequency  Clock signal is gated/disabled for some cycles at regular intervals Dynamic voltage scaling  Reduces power consumed by lowering the operating voltage  Advantageous because E ∝ V 2

Linux on Pentium Dell Inspiron 8000 laptop with 850 MHz PIII processor with 512Mb of RAM running Linux Processor runs at 8 different performance states  100% 87.5% 75% 62.5% 50% 37.5% 25% 12.5% Effect is evaluated by throttling the clock The following micro benchmarks were considered  Access to register  L1 cache (read)  L1 cache (write)  Access to memory (read)  Access to memory (write)  Disk Read

Micro benchmark performance - Pentium

Power usage in idle mode- Pentium Linux scheduler puts the processor into C1 or C2 sleep state Idle state power is considered to be a constant

Simple benchmark which exercises the CPU while changing the performance state from 100% % As performance is lowered system power usage decreases linearly Power measurements at different performance states - Pentium

Energy consumption Energy required to complete the benchmark – E active + E idle Compare energy used to execute same load at the same time interval at different operating points  The time interval does not end at E active since the system is kept on until next request arrives Idle time = Time to run the benchmark at a particular operating point – Time to run the benchmark at lowest performance states Idle power is known, hence E idle can be calculated

E active + E idle decreases slightly as performance state increases The benchmarks suggest we should run this system at the highest performance state possible

Linux on PowerPC PowerPC 405GP microprocessor, 8KB of D cache 16KB of I cache, 32MB RAM with Linux Frequency of the processor and processor local bus (PLB) can be changed directly affecting memory speed

PowerPC: Power measurements

PowerPC: Energy consumption Total energy = Eactive + Eidle Eactive = Ecpu + ESDRAM +Eother By lowering frequency, total energy used by the system descreases Results contrary to the Pentium based system

Characterization of the two systems Bimodal behavior – system will either be in active or idle mode Performance ∝ frequency P idle will be considered constant for all frequencies Consider CPU intensive workload W, lowest frequency f min  At f min utilization of the system is 1 and W takes T f min units of time to complete  (-eq. 1)  At frequency f (f> fmin) (E f = E active + E idle ) ( - eq. 2)

Critical Power Slope As power ∝ frequency and constant at idle state (from the graph) Substituting P f in eq. 2 (-eq. 3) There should be a slope m where energy usage at all frequencies is equal - critical power slope m critical Equating eq. 1 and eq.3 we get

If  Energy efficient to run at higher freq.  Pentium If  Energy efficient to run at lower freq.  PowerPC Implications of CPS

Non linear power savings : P ∝ V 2 Look at every operating point at frequency If  Energy efficient at higher frequency than If  Energy efficient at lower frequency than CPS for voltage scaling system

Analysis on SA-1100 A StrongARM processor (SA-1100) is considered Above 74MHz At 74MHz Below 74MHz Energy Inefficient below 74MHz! No incentive to operative between 74MHz and 59 MHz using voltage scaling

Critical Power slope in Realistic workload Static page requests on a web server Apache 1.3, Pentium based laptop At 100% performance – 1500 requests/sec At 62.5% performance – 700 requests/sec Energy increases linearly as request rate increases More energy efficient to run at higher performance Consistent with previous Pentium system analysis

Conclusion This paper shows the assumption that lower performance points are more energy efficient that higher performance points is not valid This paper helps decide which operating point to choose in a power management scheme

Questions?