Automatic Performance Setting for Dynamic Voltage Scaling

Presentation transcript:

Automatic Performance Setting for Dynamic Voltage Scaling. Krisztián Flautner (manowar@engin.umich.edu), Steve Reinhardt, Trevor Mudge.

Overview: A mechanism for quantifying the user experience. Metric: response time. Automatic, no user program modifications required. Run-time feedback to the kernel. Guiding performance setting of DVS processors. For interactive episodes: slow down the processor to save energy when response times are fast enough. For periodic events: track periodicity, utilization, and inter-task communication to establish the necessary performance. Simulated and experimental results.

Dynamic Voltage Scaling: Execute only as fast as necessary to meet deadlines. Running fast and idling is not energy efficient. Power = Capacitance · voltage² · frequency. Voltage is proportional to the frequency. Reduce f and v to match performance demands. Reduced frequency implies longer execution time. Energy ~ voltage².
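As a rough numerical illustration of the relations on this slide, the sketch below evaluates Power = Capacitance · voltage² · frequency and the energy needed for a fixed amount of work. The function names, the normalized capacitance, and the cycle counts are illustrative assumptions, not part of the original talk.

```python
def dynamic_power(capacitance, voltage, frequency):
    """Power = Capacitance * voltage^2 * frequency (relation from the slide)."""
    return capacitance * voltage ** 2 * frequency


def energy_for_work(capacitance, voltage, frequency, cycles):
    """Energy to execute a fixed number of cycles at one operating point.

    Assumption: execution time is cycles / frequency, so the frequency
    cancels out and the energy scales with voltage^2 only.
    """
    seconds = cycles / frequency
    return dynamic_power(capacitance, voltage, frequency) * seconds


# Normalized example: halve both voltage and frequency.
full_speed = energy_for_work(1.0, 1.0, 1.0e9, 1.0e6)
half_speed = energy_for_work(1.0, 0.5, 0.5e9, 1.0e6)
energy_ratio = half_speed / full_speed
```

Under these assumptions, halving both voltage and frequency cuts power to one eighth, but the work takes twice as long, so the energy for the same work drops to about one quarter.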

Why bother? Higher performance = increased power consumption. [Figure: maximum power (watts, logarithmic y axis) versus process generation, from 1.5µ down to 0.13µ, for Intel processors from the 386 to the Pentium II. Source: Intel.]

Power Density! [Figure: processor power density compared with reference levels: hot plate, nuclear reactor, rocket nozzle, Sun's surface. Source: Intel.]

Small performance reduction = big energy savings. Graph based on Intel XScale data: a 20% performance reduction gives a 32% energy reduction; a 40% performance reduction gives a 55% energy reduction.

Processors supporting DVS

Processor               Min.                   Max.                     Process (µm)   Max/min energy
lpARM                   8MHz, 1.1V, 1.8mW      100MHz, 3.3V, 220mW      0.6            9
Intel SA-1100           59MHz, 0.79V, 106mW    251MHz, 1.65V, 964mW     0.35           4.4
Transmeta Crusoe 5600   500MHz, 1.2V, ~1W      700MHz, 1.6V, ~2W        0.18           1.8
Intel XScale            150MHz, 0.75V, 40mW    800MHz, 1.5V, 900mW      0.18           4
Intel XScale Demo       150MHz, 0.75V, 40mW    1000MHz, 1.75V, 1.45W    0.18           5.4
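The Max/min energy figures on this slide appear to follow from Energy ~ voltage², that is, the square of the ratio between the highest and lowest supply voltage. The sketch below recomputes them from the voltages listed on the slide. The dictionary and function names are illustrative, and the sketch assumes that the XScale Demo shares the 0.75V minimum of the XScale, since the slide lists only one minimum for the two.

```python
# (minimum voltage, maximum voltage) per processor, taken from the slide.
# Assumption: the XScale Demo uses the same 0.75V minimum as the XScale.
VOLTAGE_RANGE = {
    "lpARM": (1.1, 3.3),
    "Intel SA-1100": (0.79, 1.65),
    "Transmeta Crusoe 5600": (1.2, 1.6),
    "Intel XScale": (0.75, 1.5),
    "Intel XScale Demo": (0.75, 1.75),
}


def max_min_energy_ratio(v_min, v_max):
    """Energy ~ voltage^2, so the ratio is (v_max / v_min)^2."""
    return (v_max / v_min) ** 2


# Ratios rounded to one decimal place, as on the slide.
RATIOS = {
    name: round(max_min_energy_ratio(v_min, v_max), 1)
    for name, (v_min, v_max) in VOLTAGE_RANGE.items()
}
```

Rounded to one decimal place, these ratios match the values on the slide (9, 4.4, 1.8, 4, 5.4).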

Some recent desktop processors

Processor            Core                              I/O                     Process (µm)   Max. power
Intel Pentium IV     1.4GHz @ 1.7V                     400MHz                  0.18           66.3W
Intel Pentium III    500MHz @ 1.35V, 733MHz @ 1.65V    100MHz, 133MHz; 3.3V    0.18           12W, 19.1W
AMD Athlon Model 4   650MHz @ 1.75V, 1.2GHz @ 1.75V    200MHz, 266MHz; 1.6V    0.18           38W, 66W
MPC 7450             533MHz @ 1.8V, 667MHz @ 1.8V      133MHz; 1.8V-2.5V       0.18           17W

Performance setting algorithms: Programmer specified: works well but requires explicit specification of deadlines. Interval-based algorithms: use the ratio of idle to busy time to guide DVS; only work well if processor utilization is regular; no service quality guarantees. Ours, episode-classification based: find important execution episodes and predict their performance. Works with existing user programs. Works well with irregular workloads. Uses information in the kernel to derive deadlines automatically. Impact on response time is automatically quantified. Performance can be adapted to the user’s preference.

Episode classification: Interactive episodes: when the user is waiting for the computer to respond. Periodic episodes: producer (e.g. MP3 player) and consumer (e.g. sound daemon).

A utilization trace: Each horizontal quantum is a millisecond; height corresponds to the utilization in that quantum.

Episode classification: Interactive (Acrobat Reader), producer (MP3 playback), and consumer (esd sound daemon) episodes.

Mouse movement: The X server updates the screen every ~10ms. An update takes ~0.25ms.

Interactive episodes

Interactive episodes can include idle time: Waiting for data from the network during a run of Netscape. Page rendering starts after 250ms.

Finding interactive episodes: One way: a mouse click indicates the start and idle time indicates the end. This is inaccurate and adds latency in finding the end of the episode. Our approach: track inter-task communication. Start of an interactive episode: the X server sends a message to another task. During an interactive episode: keep track of communicating tasks (the episode’s task set) and compute the desired metrics. Conditions for ending the episode (applied to tasks in the task set): no tasks are executing; data written by the tasks have been consumed; no task was preempted the last time it ran; no tasks are blocked on I/O.
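The start, tracking, and end conditions above can be sketched as a small bookkeeping structure. This is a minimal illustration, not the kernel implementation from the talk: the `Task` fields, the `EpisodeTracker` class, and the rule that a receiver joins the task set when it gets a message from a task already in the set are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical per-task state that a kernel could expose."""
    name: str
    executing: bool = False
    unconsumed_bytes: int = 0        # data written but not yet read
    preempted_last_run: bool = False
    blocked_on_io: bool = False


@dataclass
class EpisodeTracker:
    x_server_name: str = "X"
    active: bool = False
    task_set: dict = field(default_factory=dict)

    def on_message(self, sender: Task, receiver: Task) -> None:
        """Record one inter-task message."""
        if not self.active and sender.name == self.x_server_name:
            # Start of an episode: the X server messages another task.
            self.active = True
            self.task_set = {receiver.name: receiver}
        elif self.active and sender.name in self.task_set:
            # Assumed rule: tasks reached from the task set join it.
            self.task_set[receiver.name] = receiver

    def episode_ended(self) -> bool:
        """All four end conditions hold for every task in the task set."""
        if not self.active:
            return False
        return all(
            not t.executing
            and t.unconsumed_bytes == 0
            and not t.preempted_last_run
            and not t.blocked_on_io
            for t in self.task_set.values()
        )
```

Because the end test looks at task state rather than waiting for an idle period, the end of the episode can be recognized as soon as the last task finishes, which is the latency advantage the slide claims over the mouse-click and idle-time approach.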

Characteristics of Interactive Episodes: Faster is not necessarily better. Human perception has finite resolution; the perception threshold is ~50ms. The goal is to run fast enough to meet the perception threshold; there is no point in running any faster. Many interactive episodes are already fast enough, and more will be imperceptible in the near future: a 200ms perception threshold today approximates the work done during 50ms 3 years from now. Slow down the processor!

Time above the perception threshold: A 200ms perception threshold today approximates a 50ms perception threshold on a processor 3 years from now (performance doubles every ~1.5 years). Time above the perception threshold is given as a percentage of the time spent in all interactive episodes.
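The 200ms-to-50ms equivalence follows from the stated doubling period: two doublings in 3 years give a 4x speedup. The sketch below works through that arithmetic and shows one possible reading of the "time above the threshold" metric, in which only the portion of each episode beyond the threshold is counted. That reading and the function names are assumptions, not definitions from the talk.

```python
def equivalent_threshold_ms(threshold_ms, years_ahead, doubling_period_years=1.5):
    """Threshold on a future processor that covers the same work as today's."""
    speedup = 2.0 ** (years_ahead / doubling_period_years)
    return threshold_ms / speedup


def percent_time_above(episode_lengths_ms, threshold_ms):
    """Share of all interactive-episode time that lies beyond the threshold.

    Assumed interpretation: for each episode, only the part that exceeds
    the threshold counts as time above it.
    """
    total = sum(episode_lengths_ms)
    if total == 0:
        return 0.0
    above = sum(max(0.0, length - threshold_ms) for length in episode_lengths_ms)
    return 100.0 * above / total
```

For example, `equivalent_threshold_ms(200, 3)` evaluates to 50.0, matching the slide.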

The key: performance-setting algorithm. Use episode detection and classification: interactive episodes and periodic episodes (producer and consumer). Performance-setting on a per-episode basis. Stretch episodes to their deadlines: for an interactive episode, the perception threshold; for periodic episodes, stretch the producer to the consumer. No modification of existing programs needed. Works with irregular processor utilization and multiprogramming.

Cumulative interactive episode length distribution (FrameMaker): Most time is spent in very few of the episodes. For most of the episodes the minimum performance level (assuming 5x performance scaling) is sufficient. [Figure: cumulative number and cumulative time versus episode length (sec), with regions marked "Minimum performance level sufficient" and "Max. performance".]

Performance-setting strategy for interactive episodes: Predict the performance factor that would be correct most of the time (not for most events), based on past optimal performance factors. Limit the worst-case impact on response time: run at full performance after PanicThreshold is reached.

Performance-setting for interactive episodes: At the beginning of the episode: wait 5ms before the transition to ignore short episodes, then switch to the predicted performance level. During the episode: if the episode duration reaches PanicThreshold, switch to maximum performance. At the end of the episode: estimate the full-performance episode duration, compute the optimum performance level for the past episode, and compute a new prediction based on the optimum settings. PanicThreshold = PerceptionThreshold · (1 + PerformanceFactor). The predicted PerformanceFactor is the average of past optimum settings, weighted by the corresponding episode lengths.
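The policy above can be sketched as a small predictor. The slide does not say how the full-performance duration or the optimum level is computed, so this sketch makes several labeled assumptions: performance factors are fractions of maximum performance, work scales linearly with the factor, the optimum factor is the one that stretches the episode exactly to the perception threshold, and the weights are the estimated full-performance durations. The class and method names and the 0.15 minimum factor (150MHz out of 1000MHz) are illustrative.

```python
from dataclasses import dataclass


@dataclass
class InteractivePredictor:
    perception_threshold: float = 0.050  # seconds
    start_delay: float = 0.005           # wait 5ms to ignore short episodes
    min_factor: float = 0.15             # assumed: 150MHz / 1000MHz
    weighted_sum: float = 0.0            # sum of optimum factor * length
    total_length: float = 0.0            # sum of episode lengths

    def predicted_factor(self) -> float:
        """Length-weighted average of past optimum factors."""
        if self.total_length == 0.0:
            return 1.0  # no history yet: stay at full performance
        return self.weighted_sum / self.total_length

    def panic_threshold(self) -> float:
        """PanicThreshold = PerceptionThreshold * (1 + PerformanceFactor)."""
        return self.perception_threshold * (1.0 + self.predicted_factor())

    def level_at(self, elapsed: float, current: float) -> float:
        """Performance factor to use `elapsed` seconds into an episode."""
        if elapsed < self.start_delay:
            return current
        if elapsed >= self.panic_threshold():
            return 1.0
        return self.predicted_factor()

    def end_of_episode(self, segments) -> float:
        """Update the prediction from (seconds, factor) segments of one episode."""
        # Assumed estimate: time spent at factor f counts as f * time at full speed.
        full_speed = sum(seconds * factor for seconds, factor in segments)
        # Assumed optimum: slowest factor that still meets the threshold.
        optimum = full_speed / self.perception_threshold
        optimum = min(1.0, max(self.min_factor, optimum))
        self.weighted_sum += optimum * full_speed
        self.total_length += full_speed
        return optimum
```

Weighting by episode length makes long episodes dominate the prediction, which fits the stated goal of choosing a factor that is correct most of the time rather than for most events.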

Performance-setting algorithm: Periodic activity detected: enter period-sampling mode, switch to maximum performance, establish the base performance level, exit period-sampling mode. Start of interactive episode: if not in period-sampling mode, apply the interactive episode performance-setting policy. End of interactive episode: update the interactive episode statistics and, if there is periodic activity on the machine, switch to the base performance level.

Performance-setting during the Acrobat Reader benchmark (200ms p.t.): Transitions to the maximum performance level are due to reaching the PanicThreshold. [Figure: performance factor versus time (sec).]

Performance-setting during the Acrobat Reader + MP3 benchmark (200ms p.t.): Full performance for periodic activity. Transitions are due to the PanicThreshold. [Figure: performance factor versus time (sec).]

Hardware assumptions: Minimum performance: 150MHz @ 0.75V. Maximum performance: 1000MHz @ 1.75V. PLL resynch time (stalls execution): 0.02ms. Voltage transition time: 1ms. Assumptions are based on the Intel XScale. We assume that the processor switches to sleep mode when it is not executing an episode.

Energy factors (no MP3): This is the projected energy savings assuming that the processor goes into idle mode whenever it is not active. This is a conservative assumption: if the processor runs more, there are more energy savings due to voltage scaling. The boxed area shows the energy factor range that we can expect on today’s high-end processors.

Energy factors with MP3 playback

Changes in cumulative episode lengths as the result of performance scaling (Xemacs, 50ms p.t.). [Figure: cumulative percentage of time versus episode length (sec), before and after performance scaling.]

Vertigo: A DVS implementation for the Linux 2.4 kernel. Currently runs on Transmeta Crusoe. Test machine: Sony PictureBook (PCG-C1VN) using the TM5600 processor (300MHz-600MHz). Goals: a robust implementation; evaluate our algorithms on computers with DVS; contrast with a conventional DVS algorithm (LongRun).

Vertigo vs. LongRun: LongRun is implemented as part of the processor. It is an interval-based algorithm (guided by busy vs. idle time), and its min. and max. range is controllable in software. Vertigo is implemented in the OS kernel. It is a classification-based algorithm that distinguishes important from unimportant parts of execution and takes the quality of the user experience into account. Qualitative comparison on the following graphs: the two runs of each benchmark are close but not identical, because a human repeated the runs. Transitions to sleep are not shown. Same perceived interactive performance.

No user activity: Frequency range of the TM5600 processor: 50% = 300MHz @ 1.3V, 100% = 600MHz @ 1.6V. The maximum energy savings that should be expected on this processor is ~34%. [Figure: performance level versus time (s) for LongRun and Vertigo.]
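The ~34% bound is consistent with Energy ~ voltage² applied to the two TM5600 operating points given on this slide. The short check below is a sketch under that assumption; the function name is illustrative.

```python
def max_energy_savings(v_low, v_high):
    """Fraction of energy saved for the same work when running at v_low
    instead of v_high, assuming energy scales with voltage^2."""
    return 1.0 - (v_low / v_high) ** 2


# TM5600 operating points from the slide: 1.3V at 300MHz, 1.6V at 600MHz.
TM5600_SAVINGS = max_energy_savings(1.3, 1.6)
```

With 1.3V and 1.6V this gives roughly 0.34, in line with the figure quoted on the slide.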

Emacs. [Figure: performance level versus time (s) for LongRun and Vertigo.]

Acrobat Reader. [Figure: performance level versus time (s) for LongRun and Vertigo.]

Acrobat Reader with sleep transitions: Frequent transitions to/from sleep mode. Longer durations without sleeping. [Figure: performance level versus time (s) for LongRun and Vertigo.]

Desired improvements: Processor parameters are good enough. Faster voltage transitions would help a little. As peak performance gets higher, a lower minimum performance is desirable. More sophisticated prediction algorithms: distinguish between episode instances, not just episode types. Larger performance range for DVS processors: puts more pressure on the performance-setting algorithm, but gives more opportunity for energy savings.

Conclusions: Many interactive episodes are already fast enough. More will be fast enough in the near future. Use Dynamic Voltage Scaling to save energy. Episode classification based on inter-task communication: fast, accurate, no user program modifications required. Performance-setting based on episode classification: works well with multiprogramming and irregular processor utilization, and ensures high-quality interactive performance. Significant energy savings (10%-80%).

Future work: Evaluate our algorithms on real hardware: processors are slowly becoming available; measure the impact on interactive performance. An API to specify episodes: light-weight (specify hints, not complete information); works in concert with the existing detection mechanism. Apply episode detection to other problems. Scheduler: can real-time deadlines be detected automatically?

fin.

Response time: The time it takes for the computer to respond to user-initiated events. Faster is not always better: there is a fundamental limit to what is perceptible to humans. Movies: 20-30 frames per second. Perceptual causality: 50ms-100ms. Dragging objects on screen: 200ms. Non-continuous operation: 1-2sec. The goal is to run fast enough to meet the perception threshold; there is no point in running any faster.

The performance gap

Cumulative interactive episode length distribution (Xemacs). [Figure: cumulative number and cumulative time versus episode length (sec), with regions marked "Minimum performance level sufficient" and "Max. performance".]

Communication between tasks

Producer and consumer episodes: Example: MP3 playback through the esd sound daemon. Monitor communications to/from the sound daemon. The distance between producer and consumer episodes determines the necessary performance level.
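One way to read "distance determines the necessary performance level" is that the producer episode can be stretched to fill the gap before the consumer needs the data. The sketch below illustrates that reading only. The function name, the linear scaling of run time with the performance factor, and the 0.15 minimum factor (150MHz out of 1000MHz) are assumptions, not details given in the talk.

```python
def producer_performance_factor(work_s, gap_s, min_factor=0.15):
    """Slowest performance factor that still finishes before the consumer runs.

    work_s: producer episode length at full performance (seconds).
    gap_s:  idle distance between the end of the producer episode and the
            start of the consumer episode at full performance (seconds).
    Assumes run time scales as work_s / factor.
    """
    if work_s <= 0:
        raise ValueError("work_s must be positive")
    stretched = work_s + max(0.0, gap_s)
    return max(min_factor, work_s / stretched)
```

For instance, a producer that needs 2ms at full performance and finishes 8ms before the consumer runs could, under these assumptions, run at about one fifth of full performance.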