Krisztián Flautner - Automatic Performance Setting for Dynamic Voltage Scaling

Presentation transcript:

Slide 1: Automatic Performance Setting for Dynamic Voltage Scaling
Krisztián Flautner, Steve Reinhardt, Trevor Mudge

Slide 2: Overview
A mechanism for quantifying the user experience.
  – Metric: response time.
  – Automatic, no user program modifications required.
  – Run-time feedback to the kernel.
Guiding performance setting of DVS processors.
  – For interactive episodes: slow down processor to save energy when response times are fast enough.
  – For periodic events: track periodicity, utilization and inter-task communication to establish necessary performance.
Simulated and experimental results.

Slide 3: Dynamic Voltage Scaling
Voltage is proportional to the frequency.
Reduce f and v to match performance demands.
Reduced frequency implies longer execution time.
$\text{Power} = \text{Capacitance} \times \text{voltage}^2 \times \text{frequency}$
$\text{Energy} \sim \text{voltage}^2$
Execute only as fast as necessary to meet deadlines.
Running fast and idling is not energy efficient.
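The relations on this slide can be illustrated with a small worked example. The sketch below is only an illustration: it assumes dynamic power is the only power drawn and that an idle processor consumes nothing. The two operating points are the Intel XScale minimum and maximum from the processor table on slide 7; the capacitance, cycle count, deadline and function names are hypothetical.

```python
def dynamic_power(capacitance, voltage, frequency):
    """Dynamic power in watts: P = C * V^2 * f."""
    return capacitance * voltage ** 2 * frequency


def task_energy(cycles, capacitance, voltage, frequency):
    """Energy in joules to execute `cycles` cycles at one operating point."""
    run_time = cycles / frequency  # seconds spent executing
    return dynamic_power(capacitance, voltage, frequency) * run_time


# Operating points (volts, hertz) from the Intel XScale column on slide 7.
SLOW = (0.75, 150e6)
FAST = (1.5, 800e6)

CAPACITANCE = 1e-9  # hypothetical switched capacitance in farads
CYCLES = 15e6       # hypothetical task length in cycles
DEADLINE = 0.1      # hypothetical deadline in seconds

time_slow = CYCLES / SLOW[1]  # stretches the task out to the deadline
time_fast = CYCLES / FAST[1]  # finishes early, then the processor idles
energy_slow = task_energy(CYCLES, CAPACITANCE, *SLOW)
energy_fast = task_energy(CYCLES, CAPACITANCE, *FAST)
energy_ratio = energy_slow / energy_fast  # reduces to (0.75 / 1.5) ** 2

print(f"slow: {time_slow:.5f} s, fast: {time_fast:.5f} s")
print(f"slow run uses {energy_ratio:.0%} of the fast run's energy")
```

Because run time is cycles / f, the frequency cancels and the energy per task depends only on the voltage squared, which is why running fast and idling costs more than running just fast enough.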

Slide 4: Why bother?
[Chart: Max Power (Watts) by processor; labels include Pentium(R) MMX, Pentium Pro(R), Pentium II(R) and a final "?". Source: Intel]
Higher performance = increased power consumption.

Slide 5: Power Density!
[Chart with reference points: hot plate, nuclear reactor, rocket nozzle, Sun's surface, "?". Source: Intel]

Slide 6: Small performance reduction = big energy savings
20% performance reduction = 32% energy reduction.
40% performance reduction = 55% energy reduction.
Graph based on Intel XScale data.

Slide 7: Processors supporting DVS
Processor               Min.                        Max.
lpARM                   8 MHz, 1.1 V, 1.8 mW        100 MHz, 3.3 V, 220 mW
Intel SA-1100           59 MHz, 0.79 V, 106 mW      251 MHz, 1.65 V, 964 mW
Transmeta Crusoe 5600   500 MHz, 1.2 V, ~1 W        700 MHz, 1.6 V, ~2 W
Intel XScale            150 MHz, 0.75 V, 40 mW      800 MHz, 1.5 V, 900 mW
Intel XScale Demo       150 MHz, 0.75 V, 40 mW      1000 MHz, 1.75 V, 1.45 W
Further rows on the slide: Process, Max/min energy (values not present in the transcript).

Slide 8: Some recent desktop processors
Processors: Intel Pentium IV, Intel Pentium III, AMD Athlon Model 4, MPC 7450
Core: 1.7V 1.35V 1.65V 1.75V 1.75V 1.8V 1.8V
I/O: 400MHz 100MHz, 133MHz 3.3V 200MHz, 266MHz 1.6V 133MHz 1.8V-2.5V
Process: 0.18 µm
Max. Power: 66.3W 12W 19.1W 38W 66W 17W 19.1W

Slide 9: Performance setting algorithms
Programmer specified
  – Works well but requires explicit specification of deadlines.
Interval based algorithms
  – Use the ratio of idle to busy time to guide DVS.
  – Only work well if processor utilization is regular.
  – No service quality guarantees.
Ours: episode classification based
  – Find important execution episodes and predict their performance.
  – Works with existing user programs.
  – Works well with irregular workloads.
  – Uses information in kernel to derive deadlines automatically.
  – Impact on response time is automatically quantified. Performance can be adapted to the user's preference.

Slide 10: Episode classification
Interactive episodes
  – When the user is waiting for the computer to respond.
Periodic episodes
  – Producer (e.g. MP3 player).
  – Consumer (e.g. sound daemon).

Slide 11: A utilization trace
Each horizontal quantum is a millisecond; height corresponds to the utilization in that quantum.

Slide 12: Episode classification
Interactive (Acrobat Reader), Producer (MP3 playback), and Consumer (esd sound daemon) episodes.

Slide 13: Mouse movement
X server updates screen every ~10ms.
Update takes ~0.25ms.

Slide 14: Interactive episodes

Slide 15: Interactive episodes can include idle time
Waiting for data from the network during a run of Netscape.
Page rendering starts after 250ms.

Slide 16: Finding interactive episodes
One way: mouse click indicates start, idle time indicates end.
  – Inaccurate, latency in finding the end of the episode.
Our approach: track inter-task communication.
  – Start of an interactive episode: X server sends a message to another task.
  – During interactive episode:
      Keep track of communicating tasks (episode's task set).
      Compute desired metrics.
  – Conditions for ending the episode (applied to tasks in task set):
      No tasks are executing.
      Data written by the tasks have been consumed.
      No task was preempted the last time it ran.
      No tasks are blocked on I/O.
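The end-of-episode conditions on this slide can be read as a single predicate over the episode's task set. The following is a minimal sketch under that reading, not the authors' kernel code: the `TaskState` record and its field names are hypothetical stand-ins for whatever per-task bookkeeping the kernel actually keeps.

```python
from dataclasses import dataclass


@dataclass
class TaskState:
    """Hypothetical per-task bookkeeping for one member of the task set."""
    executing: bool = False           # task is currently running
    unconsumed_bytes: int = 0         # data written but not yet read by a peer
    preempted_last_run: bool = False  # task lost the CPU involuntarily last time
    blocked_on_io: bool = False       # task is waiting for I/O to complete


def episode_has_ended(task_set):
    """True when all four end conditions hold for every task in the set."""
    return all(
        not task.executing
        and task.unconsumed_bytes == 0
        and not task.preempted_last_run
        and not task.blocked_on_io
        for task in task_set
    )


# Example: an X server and a client that is still waiting on the network.
x_server = TaskState()
client = TaskState(blocked_on_io=True)
print(episode_has_ended([x_server, client]))  # False: client is blocked on I/O
client.blocked_on_io = False
print(episode_has_ended([x_server, client]))  # True: nothing left pending
```

The blocked-on-I/O condition is what lets an episode span idle time, as in the Netscape example where rendering starts only after the network data arrives.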

Slide 17: Characteristics of Interactive Episodes
Faster is not necessarily better.
  – Human perception has finite resolution.
  – Perception threshold is ~50ms.
  – The goal is to run fast enough to meet the perception threshold; there is no point in running any faster.
Many interactive episodes are already fast enough.
More will be imperceptible in the near future.
  – A 200ms perception threshold today approximates the work done in 50ms 3 years from now.
Slow down the processor!

Slide 18: Time above the perception threshold
Time above the perception threshold is given as a percentage of time spent in all interactive episodes.

Slide 19: The key: performance-setting algorithm
Use episode detection and classification.
  – Interactive episodes.
  – Periodic episodes (producer and consumer).
Performance-setting on a per episode basis.
Stretch episodes to their deadlines.
  – Interactive episode: perception threshold.
  – Stretch producer to consumer.
No modification of existing programs needed.
Works with irregular processor utilization and multiprogramming.

Slide 20: Cumulative interactive episode length distribution
[Figure: FrameMaker. X-axis: episode length (sec). Curves: cumulative number, cumulative time. Annotations: "Minimum performance level sufficient", "Max. performance".]

Slide 21: Performance-setting strategy for interactive episodes
Predict the performance factor that would be correct most of the time (not for most events).
  – Based on past optimal performance factors.
Limit worst case impact on response time.
  – Run at full performance after PanicThreshold is reached.

Slide 22: Performance-setting for interactive episodes
At the beginning of the episode:
  – Wait 5ms before transition to ignore short episodes.
  – Switch to predicted performance level.
During the episode:
  – If episode duration reaches PanicThreshold, switch to maximum performance.
At the end of the episode:
  – Estimate full performance episode duration.
  – Compute optimum performance level for past episode.
  – Compute new prediction based on optimum settings.
PanicThreshold = PerceptionThreshold × (1 + PerformanceFactor)
Predicted PerformanceFactor is the average of past optimum settings, weighted by the corresponding episode lengths.
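The two formulas on this slide can be sketched in a few lines. This is a hedged illustration, not the authors' implementation. It assumes the performance factor is a fraction of peak performance (1.0 = full speed), that running at factor p stretches an episode to duration / p, and that the optimum factor for a past episode is the lowest one that would have finished at the perception threshold. The function names, the `min_factor` floor and the choice to weight by the full-speed duration estimate are all hypothetical.

```python
def panic_threshold(perception_threshold, performance_factor):
    """PanicThreshold = PerceptionThreshold * (1 + PerformanceFactor)."""
    return perception_threshold * (1 + performance_factor)


def optimum_factor(full_speed_duration, perception_threshold, min_factor=0.2):
    """Lowest factor that stretches the episode to the perception threshold.

    Assumption: duration at factor p is full_speed_duration / p.
    The result is clamped to [min_factor, 1.0].
    """
    factor = full_speed_duration / perception_threshold
    return max(min_factor, min(1.0, factor))


def predict_factor(history):
    """Average of past optimum factors, weighted by episode length.

    `history` is a list of (optimum_factor, episode_length) pairs.
    """
    total_length = sum(length for _, length in history)
    if total_length == 0:
        return 1.0  # no data yet: fall back to full performance (assumption)
    return sum(factor * length for factor, length in history) / total_length


PERCEPTION = 0.05  # 50 ms perception threshold, in seconds
durations = [0.02, 0.2]  # hypothetical full-speed episode durations (seconds)
history = [(optimum_factor(d, PERCEPTION), d) for d in durations]
predicted = predict_factor(history)
print(f"predicted factor: {predicted:.3f}")
print(f"panic threshold: {panic_threshold(PERCEPTION, predicted) * 1000:.1f} ms")
```

Weighting by episode length means a single long episode pulls the prediction toward full performance more strongly than many short ones, which matches the stated aim of being correct most of the time rather than for most events.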

Slide 23: Performance-setting algorithm
Periodic activity detected:
  – Enter period-sampling mode.
  – Switch to maximum performance.
  – Establish base performance level.
  – Exit period-sampling mode.
Start of interactive episode:
  – If not in period-sampling mode, apply interactive episode performance-setting policy.
End of interactive episode:
  – Update interactive episode statistics.
  – Switch to base performance level, if there is periodic activity on the machine.
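The three events on this slide suggest a small event-driven controller. The sketch below is one possible reading, not the authors' code: the class, method and attribute names are hypothetical, period sampling is split into a start and a finish callback, and the measured base level and predicted interactive level are simply passed in as arguments.

```python
class PerformanceSetter:
    """Hypothetical event-driven controller following the slide's three events."""

    MAX_LEVEL = 1.0

    def __init__(self):
        self.level = self.MAX_LEVEL
        self.base_level = None        # set once periodic activity is sampled
        self.period_sampling = False
        self.episodes_seen = 0        # stand-in for interactive statistics

    def on_periodic_activity_detected(self):
        # Enter period-sampling mode and measure at maximum performance.
        self.period_sampling = True
        self.level = self.MAX_LEVEL

    def on_period_sampling_done(self, measured_base_level):
        # Establish the base performance level and leave sampling mode.
        self.base_level = measured_base_level
        self.period_sampling = False

    def on_interactive_start(self, predicted_level):
        # The interactive policy applies only outside period-sampling mode.
        if not self.period_sampling:
            self.level = predicted_level

    def on_interactive_end(self):
        # Update statistics; fall back to the base level if periodic
        # activity has been observed on the machine.
        self.episodes_seen += 1
        if self.base_level is not None:
            self.level = self.base_level


setter = PerformanceSetter()
setter.on_periodic_activity_detected()
setter.on_interactive_start(predicted_level=0.4)   # ignored while sampling
setter.on_period_sampling_done(measured_base_level=0.6)
setter.on_interactive_start(predicted_level=0.4)
setter.on_interactive_end()
print(setter.level)
```

The slide does not say what level to use right after sampling ends or after an episode ends with no periodic activity, so this sketch leaves the level unchanged in both cases.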

Slide 24: Performance-setting during the Acrobat Reader benchmark (200ms p.t.)
[Figure: performance factor vs. time (sec).]
Transitions to maximum performance level are due to reaching the PanicThreshold.

Slide 25: Performance-setting during the Acrobat Reader + MP3 benchmark (200ms p.t.)
[Figure: performance factor vs. time (sec). Annotations: "Transitions due to PanicThreshold", "Full performance for periodic activity."]

Slide 26: Hardware assumptions
Minimum: 0.75V
Maximum: 1.75V
PLL resynch time (stalls execution): 0.02ms
Voltage transition time: 1ms
Assumptions based on Intel XScale.
We assume that the processor switches to sleep mode when it is not executing an episode.

Slide 27: Energy factors (no MP3)

Slide 28: Energy factors with MP3 playback

Slide 29: Changes in cumulative episode lengths as the result of performance scaling (Xemacs 50ms p.t.)
[Figure: cumulative percentage of time vs. episode length (sec), before and after performance scaling.]

Slide 30: Vertigo
A DVS implementation for Linux 2.4 kernel.
Currently runs on Transmeta Crusoe.
  – Test machine: Sony PictureBook (PCG-C1VN) using TM5600 processor (300MHz-600MHz).
Goals:
  – Robust implementation.
  – Evaluate our algorithms on computers with DVS.
  – Contrast with conventional DVS algorithm (LongRun).

Slide 31: Vertigo implementation
Some kernel modification required (~20 lines):
  – Socket, inode, task_struct data structures, task create/exit notification.
Episode detection done in kernel module.
System calls dynamically patched through syscall table.
[Diagram components]
Vertigod daemon: user-mode process; implements DVS policy.
User processes: monitored through kernel hooks; can specify hints.
Kernel hooks: system calls, task switch/create/exit.
Vertigo module: episode detection & tracking; comm. with policy daemon; event tracing; /proc interface.

Slide 32: Vertigo implementation issues
Need millisecond resolution timer interrupts.
  – Linux has 10ms resolution.
  – Generate extra "fake" interrupts from kernel hooks.
Need constant-rate timestamp counter.
  – Always count at peak frequency rate, even when asleep.
  – Transmeta Crusoe does this.
  – Intel XScale does not. Need to query external clock.
Policy implemented in user-mode process.
  – Flexible, can do floating point arithmetic.
  – Communication cost very platform dependent. Pentium II: ~6,000 cycles, Crusoe: >60,000 cycles.
  – API designed to minimize communication.
  – Move communication off the critical path (to after critical episodes).

Slide 33: Vertigo vs. LongRun
LongRun: implemented as part of the processor.
  – Interval based algorithm (guided by busy vs. idle time).
  – Min. and max. range is controllable in software.
Vertigo: implemented in OS kernel.
  – Classification based algorithm.
  – Distinguishes important from unimportant parts of execution.
  – Takes the quality of the user experience into account.
Qualitative comparison on following graphs.
  – The two runs of the benchmarks are close but not identical. A human repeated the runs of the benchmark.
  – Transitions to sleep are not shown.
  – Same perceived interactive performance.

Slide 34: No user activity
[Figure: performance level vs. time (s), two panels: LongRun and Vertigo. Annotation: frequency range of the TM5600 processor.]
50% = 1.3V, 100% = 1.6V.
Max. energy savings that should be expected on this processor is ~34%.
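The ~34% figure on this slide is consistent with the earlier relation that energy scales with the square of the voltage. A quick check, assuming that relation is the only effect and using the two voltages quoted above (the function name is made up for the example):

```python
def max_energy_savings(v_min, v_max):
    """Fraction of energy saved per unit of work at v_min vs. v_max (E ~ V^2)."""
    return 1.0 - (v_min / v_max) ** 2


savings = max_energy_savings(1.3, 1.6)
print(f"{savings:.0%}")  # prints 34%
```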

Slide 35: Emacs
[Figure: performance level vs. time (s), two panels: LongRun and Vertigo.]

Slide 36: Acrobat Reader
[Figure: performance level vs. time (s), two panels: LongRun and Vertigo.]

Slide 37: Acrobat Reader with sleep transitions
[Figure: performance level vs. time (s), two panels: LongRun and Vertigo. Annotations: "Frequent transitions to/from sleep mode.", "Longer durations without sleeping."]

Slide 38: Desired improvements
Processor parameters are good enough.
  – Faster voltage transitions would help a little.
  – As peak performance gets higher, lower minimum performance is desirable.
More sophisticated prediction algorithms.
  – Distinguish between episode instances, not just episode types.
Larger performance range for DVS processor.
  – Puts more pressure on performance-setting algorithm.
  – More opportunity for energy savings.

Slide 39: Conclusions
Many interactive episodes are already fast enough.
  – More will be fast enough in the near future.
  – Use Dynamic Voltage Scaling to save energy.
Episode classification based on inter-task communication.
  – Fast, accurate, no user program modifications required.
Performance-setting based on episode classification.
  – Works well with multiprogramming, irregular processor utilization.
  – Ensures high quality interactive performance.
  – Significant energy savings (10%-80%).

Slide 40: Future work
Evaluate our algorithms on real hardware.
  – Processors are slowly becoming available.
  – Impact on interactive performance.
An API to specify episodes.
  – Light-weight: specify hints, not complete information.
  – Works in concert with existing detection mechanism.
Apply episode detection to other problems.
  – Scheduler: can real-time deadlines be detected automatically?

Slide 41: fin.

Slide 42: Response time
The time it takes for the computer to respond to user initiated events.
Faster is not always better.
  – Fundamental limit to what is perceptible to humans.
      Movies: frames per second.
      Perceptual causality: 50ms-100ms.
      Dragging objects on screen: 200ms.
      Non-continuous operation: 1-2sec.
The goal is to run fast enough to meet the perception threshold; there is no point in running any faster.

Slide 43: The performance gap

Slide 44: Cumulative interactive episode length distribution
[Figure: Xemacs. X-axis: episode length (sec). Curves: cumulative number, cumulative time. Annotations: "Minimum performance level sufficient", "Max. performance".]

Slide 45: Communication between tasks

Slide 46: Producer and consumer episodes
Example: MP3 playback through esd sound daemon.
Monitor communications to/from sound daemon.
Distance between producer and consumer episodes determines necessary performance level.