
1 Automatic Monitoring for Interactive Performance and Power Reduction
Krisztián Flautner
manowar@engin.umich.edu

2 Overview
A mechanism for quantifying the user experience.
–Metric: response time.
–Automatic, no user program modifications required.
–Run-time feedback to the kernel.
Multiprocessing to improve response times.
Slow down processor to save energy when response times are fast enough.

3 Research contributions
A metric (TLP) and a portable methodology for quantifying the amount of concurrency in a multiprocessor system.
An automatic technique for detecting execution episodes that directly impact the user-perceived response times of interactive applications.
Quantifying how much multiprocessing improves the responsiveness of interactive applications.
An automatic mechanism for setting the optimum performance level of processors that support dynamic voltage scaling.

4 Response time
The time it takes for the computer to respond to user-initiated events.
Faster is not always better.
–Fundamental limit to what is perceptible to humans.
–Movies: 20-30 frames per second.
–Perceptual causality: 50ms-100ms.
–Dragging objects on screen: 200ms.
–Non-continuous operation: 1-2sec.
The goal is to run fast enough to meet the perception threshold; there is no point in running any faster.

5 Episode classification
Interactive episodes
–When the user is waiting for the computer to respond.
Periodic episodes
–Producer (e.g. MP3 player).
–Consumer (e.g. sound daemon).

6 A utilization trace
Each horizontal quantum is a millisecond; height corresponds to the utilization in that quantum.

7 Episode classification
Interactive (Acrobat Reader), Producer (MP3 playback), and Consumer (esd sound daemon) episodes.

8 Mouse movement
X server updates screen every ~10ms.
Update takes ~0.25ms.

9 Interactive episodes

10 Interactive episodes can include idle time
Waiting for data from the network during a run of Netscape.
Page rendering starts after 250ms.

11 Finding interactive episodes
One way: mouse click indicates start, long idle time indicates end.
–Not always accurate.
–Not all episodes are initiated by mouse click.
–Latency in finding the ends of episodes.
Our approach: track inter-task communication.
–Accurate.
–Finds all interactive episodes.
–No latency.
–No program modifications required.

12 Tracking interactive episodes
Start of an interactive episode:
–X server sends a message to another task.
During interactive episode:
–Keep track of communicating tasks (episode’s task set).
–Compute desired metrics.
Conditions for ending the episode (applied to tasks in the episode’s task set):
–No tasks are executing.
–Data written by the tasks have been consumed.
–No task was preempted the last time it ran.
–No tasks are blocked on I/O.
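The tracking rules above can be sketched as a small state machine. This is only an illustration, not the kernel implementation from the talk: the `Task`, `Episode`, and `EpisodeTracker` names, the per-task fields, and the choice to seed the task set with the receiver of the X server's message are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical per-task state a kernel monitor might keep."""
    name: str
    running: bool = False             # currently executing
    unconsumed_bytes: int = 0         # data written but not yet read by a peer
    preempted_last_run: bool = False  # was preempted the last time it ran
    blocked_on_io: bool = False


@dataclass
class Episode:
    start_ms: float
    tasks: dict = field(default_factory=dict)  # the episode's task set

    def add(self, task: Task) -> None:
        self.tasks[task.name] = task

    def is_finished(self) -> bool:
        # All four end conditions must hold for every task in the set.
        return all(
            not t.running
            and t.unconsumed_bytes == 0
            and not t.preempted_last_run
            and not t.blocked_on_io
            for t in self.tasks.values()
        )


class EpisodeTracker:
    X_SERVER = "X"  # assumed name of the well-known X server task

    def __init__(self):
        self.current = None
        self.response_times_ms = []

    def on_message(self, sender: Task, receiver: Task, now_ms: float) -> None:
        if self.current is None:
            # An episode starts when the X server messages another task.
            if sender.name == self.X_SERVER:
                self.current = Episode(start_ms=now_ms)
                self.current.add(receiver)
        elif sender.name in self.current.tasks:
            # Communication from a member pulls the receiver into the set.
            self.current.add(receiver)

    def poll(self, now_ms: float):
        """Close the episode if every end condition holds; return its length."""
        if self.current is not None and self.current.is_finished():
            length = now_ms - self.current.start_ms
            self.response_times_ms.append(length)
            self.current = None
            return length
        return None
```

A real monitor would update the task fields from scheduler and IPC hooks; here they are set by hand, and `poll` stands in for whatever event triggers the end-of-episode check.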

13 Communication between tasks

14 Does multiprocessing improve interactive performance?
Metrics: response time, thread-level parallelism (TLP).
–Response time: duration of interactive episode.
–Machine is idle when all processors are idle.
–TLP: machine utilization when machine is not idle.
Results relevant to SMT, CMP processors.
Workloads: interactive desktop applications.
OS: Linux 2.3.99-pre3, Mandrake 7.1, glibc 2.1.3, XFree86 3.3.6.
Hardware: Dell Precision 410 Workstation: dual Pentium II 450MHz, 512MB RAM, Matrox Millennium II AGP 4MB.
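A minimal sketch of the TLP metric, assuming it is computed as the average number of busy processors over the sampling quanta in which at least one processor is busy. The trace format (one busy-processor count per quantum) and the function names are assumptions for illustration.

```python
def tlp(busy_counts):
    """Average number of busy processors over the non-idle quanta.

    busy_counts holds one integer per sampling quantum: the number of
    processors executing a task in that quantum (0 means machine idle).
    """
    non_idle = [c for c in busy_counts if c > 0]
    if not non_idle:
        return 0.0
    return sum(non_idle) / len(non_idle)


def utilization(busy_counts, n_cpus):
    """Plain machine utilization, idle quanta included."""
    return sum(busy_counts) / (n_cpus * len(busy_counts))


# Hypothetical dual-processor trace: mostly idle, briefly concurrent.
trace = [0, 0, 1, 1, 2, 0, 1, 0, 0, 0]
print(tlp(trace), utilization(trace, n_cpus=2))
```

Because interactive workloads are mostly idle, plain utilization is dominated by the idle quanta, while TLP reports concurrency only over the time the machine was actually working.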

15 Why use TLP?
Machine utilization only quantifies concurrency if there is no idle time during execution.

16 Initial results
Surveyed >50 desktop applications.
–BeOS, Linux and Windows NT.
Lots of threads, but limited concurrency.
–Multimedia, web: TLP 1.2~1.4.
–TLP is workload dependent. Photoshop: 1.23-2.36 TLP.
–Java apps similar to Windows apps.
Lots of idle time (often >80% of execution time).
A 4-processor machine is overkill (for apps other than make -j and a parallel MPEG player).
Does TLP translate into improved response times?

17 Workloads and TLP results
Benchmark | Version | Description | TLP_ie | TLP_run | Idle_run (dual processor / uniprocessor)
Acroread | 4.0 | PDF viewer | 1.20 | 1.19 | 88% / 87%
FrameMaker | 5.5.6b | Document editor | 1.35 | 1.33 | 93%
Ghostview | 3.5.8 | PS and PDF viewer | 1.42 | 1.39 | 84%
GIMP | 1.1.22 | Image editor | 1.26 | 1.24 | 88% / 84%
Netscape | 4.7 | Web browser | 1.34 | 1.28 | 90% / 89%
Xemacs | 21.1p8 | Text editor | 1.26 | 1.21 | 93% / 92%
Average | | | 1.31 | 1.27 | 89% / 88%

18 Methodology
All benchmarks run by a human.
–Non-intrusive automation is difficult.
Repeated runs of the same workload are not identical.
–Inexact repeat of mouse movement.
–Different amounts of idle time between episodes.
–Background activity.
Average results of seven runs in each configuration.
–Mouse clicks used to synchronize traces.
–TLP identical, response time variance <3%.

19 Response time improvement over uniprocessor
Benchmark | TLP_ie | Response-time (T_R) improvement
Acroread | 1.20 | 15%
FrameMaker | 1.35 | 22%
Ghostview | 1.42 | 34%
GIMP | 1.26 | 19%
Netscape | 1.34 | 21%
Average | 1.32 | 22%
Very little idle time (<1%) during interactive episodes.
Max. possible response-time improvement is 50% on a dual-processor.
T_R improvement = 1 - T_R(DP) / T_R(UP) (expected to be close to 1 - 1 / TLP)
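The "expected to be close to 1 - 1 / TLP" estimate can be checked against the measured column with a few lines. The numbers are copied from the slide; the function and variable names are illustrative only.

```python
# (TLP_ie, measured response-time improvement) as listed on the slide.
RESULTS = {
    "Acroread": (1.20, 0.15),
    "FrameMaker": (1.35, 0.22),
    "Ghostview": (1.42, 0.34),
    "GIMP": (1.26, 0.19),
    "Netscape": (1.34, 0.21),
}


def expected_improvement(tlp: float) -> float:
    """Response-time improvement predicted from TLP alone: 1 - 1/TLP."""
    return 1.0 - 1.0 / tlp


for name, (tlp_ie, measured) in RESULTS.items():
    print(f"{name:10s} TLP={tlp_ie:.2f} "
          f"expected={expected_improvement(tlp_ie):.1%} measured={measured:.0%}")
```

A TLP of 2.0 gives an expected improvement of exactly 50%, which matches the stated ceiling for a dual-processor machine.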

20 Background activity: MP3 playback
Metric | No MP3 | MP3
Avg. T_R improvement on dual-processor | 22% | 29%
TLP_ie | 1.31 | 1.36
TLP_run | 1.27 | 1.23
Metric | Uniprocessor | Dual-processor
Avg. T_R increase due to MP3 playback | 14% | 4%

21 Time above the perception threshold
Time above the perception threshold is given as a percentage of time spent in all interactive episodes.
Data is from the uniprocessor runs.

22 Characteristics of Interactive Episodes
Many interactive episodes are already fast enough.
More will be imperceptible in the near future.
–Work that takes 200ms (the perception threshold) today is estimated to take 50ms 3 years from now.
Faster is not necessarily better.
–Human perception has finite resolution.
Slow down the processor!

23 Why bother?
Higher performance = increased power consumption.
[Figure: max power (watts; axis ticks at 1, 10, 100) across Intel processors: 386, 486, Pentium(R) MMX, Pentium Pro(R), Pentium II(R), followed by "?". Source: Intel.]

24 Power Density!
[Figure: processor power density compared with a hot plate, a nuclear reactor, a rocket nozzle, and the Sun’s surface, followed by "?". Source: Intel.]

25 Dynamic Voltage Scaling
Voltage is proportional to the frequency.
Reduce frequency (and corresponding voltage) to match performance demands.
Since reduced frequency implies increased execution time, energy is proportional to voltage^2.
Power = Capacitance × voltage^2 × frequency
Energy ~ voltage^2
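The step from the power formula to the energy claim can be written out explicitly. The derivation below assumes a task that needs a fixed number of cycles N, a symbol introduced here for illustration; C, V, and f are capacitance, voltage, and frequency.

```latex
\begin{align*}
P &= C\,V^{2} f \\
t &= \frac{N}{f} \\
E &= P\,t = C\,V^{2} f \cdot \frac{N}{f} = C\,V^{2} N \;\propto\; V^{2}
\end{align*}
```

Under this model, lowering the frequency alone leaves the energy per task unchanged; the savings come from the lower voltage that the lower frequency permits.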

26 Processors supporting DVS
Processor | lpARM | Intel SA-1100 | Transmeta Crusoe 5600 | Intel XScale | Intel XScale Demo
Min. | 8MHz, 1.1V, 1.8mW | 59MHz, 0.79V, 106mW | 500MHz, 1.2V, ~1W | 150MHz, 0.75V, 40mW | 150MHz, 0.75V, 40mW
Max. | 100MHz, 3.3V, 220mW | 251MHz, 1.65V, 964mW | 700MHz, 1.6V, ~2W | 800MHz, 1.5V, 900mW | 1000MHz, 1.75V, 1.45W
Process (µm) | 0.6 | 0.35 | 0.18
Max/min energy | 9 | 4.4 | 1.8 | 4 | 5.4
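The max/min energy row is consistent with the energy ~ voltage^2 relation applied to the Min. and Max. voltages listed above. A short check, with the dictionary and function names chosen only for this example:

```python
# (min volts, max volts) copied from the Min. and Max. rows.
DVS_RANGE = {
    "lpARM": (1.1, 3.3),
    "Intel SA-1100": (0.79, 1.65),
    "Transmeta Crusoe 5600": (1.2, 1.6),
    "Intel XScale": (0.75, 1.5),
    "Intel XScale Demo": (0.75, 1.75),
}


def energy_ratio(v_min: float, v_max: float) -> float:
    """Ratio of per-operation energy at max vs. min voltage, E ~ V^2."""
    return (v_max / v_min) ** 2


for name, (v_min, v_max) in DVS_RANGE.items():
    print(f"{name:22s} {energy_ratio(v_min, v_max):.1f}")
```

The ratio depends only on the voltage range, which is why the Crusoe 5600, with its narrow 1.2V to 1.6V range, has the smallest energy headroom in the table.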

27 Some recent desktop processors
Processor | Intel Pentium IV | Intel Pentium III | AMD Athlon Model 4 | MPC 7450
Core | 1.4GHz @ 1.7V | 500MHz @ 1.35V, 733MHz @ 1.65V | 650MHz @ 1.75V, 1.2GHz @ 1.75V | 533MHz @ 1.8V, 667MHz @ 1.8V
I/O | 400MHz | 100MHz, 133MHz, 3.3V | 200MHz, 266MHz, 1.6V | 133MHz, 1.8V-2.5V
Process (µm) | 0.18
Max. power | 66.3W | 12W, 19.1W | 38W, 66W | 17W, 19.1W

28 Small performance reduction = big energy savings
20% performance reduction = 32% energy reduction.
40% performance reduction = 55% energy reduction.
Graph based on Intel XScale data.

29 The key: performance-setting algorithm
Use episode detection and classification.
–Interactive episodes.
–Periodic episodes (producer and consumer).
Performance-setting on a per-episode basis.
Stretch episodes to their deadlines.
–Interactive episode: perception threshold.
–Stretch producer to consumer.
No modification of existing programs needed.
Works with irregular processor utilization and multiprogramming.

30 Producer and consumer episodes
Example: MP3 playback through esd sound daemon.
Monitor communications to/from sound daemon.
Distance between producer and consumer episodes determines necessary performance level.

31 Cumulative interactive episode length distribution: FrameMaker
[Figure: cumulative number and cumulative time of interactive episodes versus episode length (sec), with regions marked "Minimum performance level sufficient" and "Max. performance".]

32 Cumulative interactive episode length distribution: Xemacs
[Figure: cumulative number and cumulative time of interactive episodes versus episode length (sec), with regions marked "Minimum performance level sufficient" and "Max. performance".]

33 Performance-setting strategy for interactive episodes
Predict the performance factor that would be correct most of the time (not for most events).
–Based on past optimal performance factors.
Limit worst case impact on response time.
No need to predict episode length.
–Performance factors have smaller range.

34 Performance-setting for interactive episodes
At the beginning of the episode:
–Wait 5ms before transition to ignore short episodes.
–Switch to predicted performance level.
During the episode:
–If episode duration reaches PanicThreshold, switch to maximum performance.
At the end of the episode:
–Estimate full performance episode duration.
–Compute optimum performance level for past episode.
–Compute new prediction based on optimum settings.
PanicThreshold = PerceptionThreshold × (1 + PerformanceFactor)
Predicted PerformanceFactor is the average of past optimum settings, weighted by the corresponding episode lengths.
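The prediction step can be sketched with two counters, one for the length-weighted sum of optimum factors and one for the total length. Everything beyond the PanicThreshold formula and the length-weighted average is an assumption for illustration: the class name, the halving rule used for rescaling, weighting by full-performance episode length, and computing the optimum factor as full-performance duration divided by the perception threshold (clamped to a minimum of 0.15, which corresponds to a 150MHz floor on a 1000MHz part).

```python
class InteractivePredictor:
    """Length-weighted average of past optimum performance factors."""

    def __init__(self, perception_threshold_ms=50.0, min_factor=0.15,
                 rescale_limit_ms=10_000.0):
        self.perception_threshold_ms = perception_threshold_ms
        self.min_factor = min_factor              # assumed hardware floor
        self.rescale_limit_ms = rescale_limit_ms  # assumed rescaling trigger
        self.weighted_sum = 0.0   # sum of optimum_factor * episode_length
        self.total_length = 0.0   # sum of episode_length

    def predicted_factor(self) -> float:
        if self.total_length == 0.0:
            return 1.0  # no history yet: run at full performance
        return self.weighted_sum / self.total_length

    def panic_threshold_ms(self) -> float:
        # PanicThreshold = PerceptionThreshold * (1 + PerformanceFactor)
        return self.perception_threshold_ms * (1.0 + self.predicted_factor())

    def optimum_factor(self, full_speed_ms: float) -> float:
        # Slowest setting that would still finish by the perception
        # threshold, assuming run time scales linearly with 1/factor.
        factor = full_speed_ms / self.perception_threshold_ms
        return max(self.min_factor, min(1.0, factor))

    def record_episode(self, full_speed_ms: float) -> None:
        self.weighted_sum += self.optimum_factor(full_speed_ms) * full_speed_ms
        self.total_length += full_speed_ms
        # Halve both counters so old episodes gradually lose influence.
        if self.total_length > self.rescale_limit_ms:
            self.weighted_sum /= 2.0
            self.total_length /= 2.0


predictor = InteractivePredictor(perception_threshold_ms=50.0)
predictor.record_episode(10.0)   # short episode: could have run at 0.2
predictor.record_episode(100.0)  # long episode: needed full performance
print(predictor.predicted_factor(), predictor.panic_threshold_ms())
```

Weighting by length means a few long episodes dominate many short ones, which fits the stated goal of being correct most of the time rather than for most events.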

35 Performance-setting algorithm
Periodic activity detected:
–Enter period-sampling mode.
–Switch to maximum performance.
–Establish base performance level.
–Exit period-sampling mode.
Start of interactive episode:
–If not in period-sampling mode, apply interactive episode performance-setting policy.
End of interactive episode:
–Update interactive episode statistics.
–Switch to base performance level, if there is periodic activity on the machine.
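The three events above map naturally onto event handlers. The sketch below is one possible reading, not the author's implementation: the class and method names are hypothetical, the interactive policy is passed in as a plain callable, and the performance level is left unchanged when sampling ends because the slide does not say what it should be at that point.

```python
class PerformanceSetter:
    MAX = 1.0  # performance factor for maximum performance

    def __init__(self, interactive_policy):
        self.interactive_policy = interactive_policy  # returns a factor
        self.level = self.MAX
        self.base_level = None   # set once periodic activity is sampled
        self.sampling = False    # True while in period-sampling mode
        self.episodes_seen = 0   # stand-in for episode statistics

    def on_periodic_activity_detected(self) -> None:
        self.sampling = True
        self.level = self.MAX

    def on_period_sampled(self, base_level: float) -> None:
        # Base level established; leave period-sampling mode.
        self.base_level = base_level
        self.sampling = False

    def on_interactive_start(self) -> None:
        if not self.sampling:
            self.level = self.interactive_policy()

    def on_interactive_end(self) -> None:
        self.episodes_seen += 1
        if self.base_level is not None:
            self.level = self.base_level


setter = PerformanceSetter(interactive_policy=lambda: 0.4)
setter.on_interactive_start()
print(setter.level)
```

Keeping the policy as a callable makes it easy to plug in a predictor for interactive episodes without changing the event logic.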

36 Advantages
Automatic.
Impact on response time is quantifiable.
–Performance can be adapted to the user’s preference.
Works well in the presence of multiprogramming.
Irregular processor utilization is not a problem.
Implementation requires very little state.
–Weighted average: two counters.
–Rescale to adapt to temporal variations.
Existing interval-based schemes:
–No feedback about service quality.
–Only work well if processor utilization is regular.

37 Performance-setting during the Acrobat Reader benchmark (200ms perception threshold)
[Figure: performance factor versus time (sec).]
Transitions to maximum performance level are due to reaching the PanicThreshold.

38 Performance-setting during the Acrobat Reader + MP3 benchmark (200ms perception threshold)
[Figure: performance factor versus time (sec). Annotations: transitions due to PanicThreshold; full performance for periodic activity.]

39 Hardware assumptions
Minimum performance | 150MHz @ 0.75V
Maximum performance | 1000MHz @ 1.75V
PLL resynch time (stalls execution) | 0.02ms
Voltage transition time | 1ms
Assumptions based on Intel XScale.
We assume that the processor switches to sleep mode when it is not executing an episode.

40 Energy factors (no MP3)

41 Energy factors with MP3 playback

42 Changes in cumulative episode lengths as the result of performance scaling (Xemacs, 50ms perception threshold)
[Figure: cumulative percentage of time versus episode length (sec), before and after performance scaling.]

43 Desired improvements
Processor parameters are good enough.
–Faster voltage transitions would help a little.
–As peak performance gets higher, lower minimum performance is desirable.
More sophisticated prediction algorithms.
–Distinguish between episode instances, not just episode types.

44 Conclusions
Multiprocessing can significantly improve response times.
–Measured 15%-38% improvement (out of possible 50%)!
Many interactive episodes are already fast enough.
–More will be fast enough in the near future.
–Use Dynamic Voltage Scaling to save energy.
Episode classification based on inter-task communication.
–Fast, accurate, no user program modifications required.
Performance-setting based on episode classification.
–Works well with multiprogramming, irregular processor utilization.
–Ensures high quality interactive performance.
–Significant energy savings (10%-80%).

45 Future work
Evaluate our algorithms on real hardware.
–Processors are slowly becoming available.
–Impact on interactive performance.
An API to specify episodes.
–Light-weight: specify hints, not complete information.
–Works in concert with existing detection mechanism.
Apply episode detection to other problems.
–Scheduler: can real-time deadlines be detected automatically?

46 fin.

47 The performance gap

48 Applicability to other environments
Technique exploits information from existing design patterns.
On Linux with X Windows:
–Communication through sockets, pipes, signals.
–Well-known tasks: X server, sound daemon, etc.
–Select syscall used for asynchronous I/O.
–Use of blocking system calls in dedicated threads.
Other systems:
–Adapt to that system’s design patterns and IPC mechanisms.

49 Computing the performance factor for interactive episodes

50 Performance scaling

51 Energy-delay (no MP3)
[Figure: energy factor versus increase of perceptible interactive episode lengths.]

52 Energy-delay (MP3)
[Figure: energy factor versus increase of perceptible interactive episode lengths.]

