Software-Hardware Cooperative Power Management Technique for Main Memory
Hai Huang, Kang G. Shin
University of Michigan

Charles Lefurgy, Karthick Rajamani, Tom Keller, Eric Van Hensbergen, Freeman Rawson
IBM Austin Research Lab

So, today I’m going to be talking about a software-hardware cooperative power management technique for main memory. This work was done at IBM Austin Research Lab during the summer.

Motivation
- High power dissipation causes a lot of problems for many computing systems, especially for large servers
  - High electricity and cooling costs
  - Unreliable electronic components
  - Low rack density
- Intelligent management of system power is important to ensure these systems can continue to function

The motivation of this work is that high power dissipation is causing a lot of heat-related problems for many computing systems, especially for large servers: for example, high electricity and cooling costs, unreliable electronic components, and lower rack density. To alleviate these problems, we need intelligent management of system power.

DRAM: A Power Hog
- Main memory (DRAM) consumes a significant portion of the total power, which makes it a good candidate for power optimization
- E.g., in an IBM mid-range eServer system, around 40% of the total power is consumed by the main memory

The main focus of this work is to reduce the power of the main memory system, because it can consume a very significant portion of the total system power. It has been reported that in an IBM mid-range eServer system, around 40% of the total power is dissipated by the main memory. Therefore, it is definitely a good candidate for power management.

Outline
- Motivation
- Background
- Previous Work
- A Cooperative Approach
- Results
- Conclusion

The outline for the rest of the talk is as follows. Next, I’m going to give a little bit of background information on DRAM, focusing specifically on its power management capabilities. Then I’m going to talk a little about previous work. Then I am going to propose a new cooperative power management technique, followed by some experimental results, and finally we conclude.

Background
- DRAM dissipates power continuously
  - Self-refresh, row/column decoders, amplifiers, data queue, etc.
- DRAM’s power management capabilities
  - Multiple power states
  - Memory controller is used to implement a simple interface to transition between these states
  - Transitions have non-negligible delays
  - Trade-offs between power and performance

DRAMs are simple solid-state devices that consume power continuously. Some of the main energy-consuming components include the self-refresh circuitry, row/column decoders, amplifiers, and data queues. In order to reduce power, we need to power down some of these components. To make things easier for the system programmer, the memory controller implements a simple interface that we can use to transition memory devices to various low-power states, and the memory controller takes care of all the power-up and power-down operations and all the timing constraints. But because the transition delays between the various power states can be non-negligible, we still need to play the usual game of energy-delay tradeoff.

Example: DDR
Registered 512MB DDR module w/ 8 devices per rank
- Read/Write: 779.1 mW
- Standby: 275.0 mW
- Power-down: 150 mW
- Self-refresh: 20.87 mW

[Transition diagram: Standby to Read/Write and back is automatic (auto); resynchronizing to Standby takes 5 ns from Power-down and 1000 ns from Self-refresh.]

To make things more concrete, let’s look at a real example using DDR devices. DDR devices have four major power states defined. In this transition diagram we show the power dissipation of each power state and the transition delays between them. As we can see, the lower the power of a state, the longer the delay to get back out of it. Normally, the device is in Standby mode; it transitions to the Read/Write state automatically when I/O commands are issued, and it transitions back to Standby right after the I/O completes. From the Standby state, we can also manually transition to two low-power states, where Self-refresh dissipates much less power than Power-down, but with a much higher resynchronization delay. These are the power and performance characteristics, and now let’s look at some of the power management techniques that leverage them.
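To put the numbers from this slide side by side, here is a small sketch in Python. The power values are the ones quoted above; the function names are made up for illustration, and the energy estimate deliberately ignores transition delays and any energy spent during the transitions themselves, which the slide does not quantify.

```python
# Power per state in mW for the registered 512MB DDR module quoted on the slide.
POWER_MW = {
    "read_write": 779.1,
    "standby": 275.0,
    "power_down": 150.0,
    "self_refresh": 20.87,
}


def saving_vs_standby(state: str) -> float:
    """Fraction of Standby power saved by idling in `state` instead."""
    return 1.0 - POWER_MW[state] / POWER_MW["standby"]


def idle_energy_uj(state: str, idle_ms: float) -> float:
    """Energy in microjoules for sitting in `state` for `idle_ms` milliseconds.

    mW * ms = uJ. Transition costs are NOT modeled (an assumption of this sketch).
    """
    return POWER_MW[state] * idle_ms


if __name__ == "__main__":
    for state in ("power_down", "self_refresh"):
        print(f"{state}: {saving_vs_standby(state):.1%} below Standby power")
    for state in ("standby", "power_down", "self_refresh"):
        print(f"{state}: {idle_energy_uj(state, 1.0):.2f} uJ over a 1 ms idle period")
```

Under these assumptions, Power-down saves a bit under half of Standby power and Self-refresh saves over 90%, which is why the longer resynchronization delay of Self-refresh can still be worth paying for long idle periods.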

- Software Techniques
- Hardware Techniques
- A Cooperative Approach
- Results
- Conclusion

We first look at software techniques, where power management decisions are made by the operating system, and then we look at hardware techniques, where these decisions are made at a much lower level, usually in the memory controller. We then analyze the advantages and disadvantages of the two approaches, propose a software-hardware cooperative technique, and show why it is superior.

Software Technique
- OS can track each process’ virtual-to-physical memory mappings
- Process i: uses ranks 0 and 2
- Process j: uses rank 3

[Timing diagram: Ranks 0 to 3 over time, each in Standby or Self-refresh, changing as process i and then process j are context-switched in.]

In the software approach, the operating system is in total control of the power. Because the operating system knows everything about a process, including its virtual-to-physical memory mapping, the OS knows exactly which memory regions each process uses and which it does not. At each context switch, it turns off the memory regions not used by the scheduled process, which not only saves energy but also does not affect performance. So let’s look at an example. In this example, process i has pages mapped in ranks 0 and 2; therefore, from the time process i is context-switched in until the time it is context-switched out, the OS can safely turn off ranks 1 and 3 to reduce power without suffering any performance penalty. Then, say the next process only uses rank 3; the OS can turn off ranks 0, 1 and 2, and so on. The advantage of software techniques is that they don’t require complicated hardware modifications and have simple control. However, due to their coarse-grained control, many energy-saving opportunities are lost. For example, if process i uses the pages mapped to rank 2 very rarely, it is not very energy efficient to keep that rank in Standby mode the whole time this process is executing. Now, let’s look at hardware techniques to see how they manage power for the memory.
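The per-context-switch decision in the example above can be sketched in a few lines of Python. This is only an illustration: the `ranks_used` table and the function name are hypothetical stand-ins for what the OS would derive from each process’ virtual-to-physical mappings, and the state names follow the Standby and Self-refresh labels in the slide’s diagram.

```python
from typing import Dict, Set

NUM_RANKS = 4  # the slide's example has ranks 0 to 3

# Hypothetical bookkeeping: ranks in which each process has mapped pages.
# A real OS would derive this from the process' page tables.
ranks_used: Dict[str, Set[int]] = {
    "i": {0, 2},
    "j": {3},
}


def rank_states_on_switch(pid: str) -> Dict[int, str]:
    """Choose a power state for every rank when `pid` is context-switched in.

    Ranks holding pages of the scheduled process stay in Standby; all other
    ranks are put into Self-refresh until the next context switch.
    """
    used = ranks_used[pid]
    return {
        rank: ("standby" if rank in used else "self_refresh")
        for rank in range(NUM_RANKS)
    }


if __name__ == "__main__":
    print("process i:", rank_states_on_switch("i"))
    print("process j:", rank_states_on_switch("j"))
```

Note how coarse the decision is: rank 2 stays in Standby for the whole time slice of process i, no matter how rarely process i actually touches it.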

Hardware Technique
- Allows for much finer-grained control of power
- Monitors each memory access
- Predicts when to transition to lower power modes

[Timing diagram: power over time for one rank. After a read/write the rank is in Standby; if idle time > threshold it drops to Self-refresh, and if idle time < threshold it remains in Standby until the next read/write.]

Hardware techniques allow for much finer-grained control of power because they continuously monitor every memory access and, based on past observations, predict when to transition to lower power states. Again, we use an example to illustrate this fine-grained control mechanism. Each of the blue arrows indicates a memory access. After each memory access completes, the memory controller starts a timer to keep track of idle time, and if this idle time exceeds a dynamically determined threshold, the memory rank is transitioned to a lower power state. If another memory access starts before the idle time exceeds the threshold, we restart the timer. As we can see, such a fine-grained control mechanism can exploit a lot of idle time for energy saving. But it also has a major problem.
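The idle-timer mechanism just described can be sketched as follows. This is a simplified illustration, not the controller from the talk: it uses a fixed threshold where the talk describes a dynamically determined one, treats each access as instantaneous, and ignores the wake-up delay. The function name and parameters are made up for the example.

```python
from typing import List


def time_in_low_power(
    access_times_ns: List[float], threshold_ns: float, end_ns: float
) -> float:
    """Total time one rank spends in the low-power state.

    The timer restarts at every access. If the gap until the next access
    (or until the end of the trace) exceeds the threshold, the rank drops to
    the low-power state for the part of the gap beyond the threshold.
    Assumes a sorted, non-empty access list and a fixed threshold.
    """
    low_power_ns = 0.0
    boundaries = access_times_ns + [end_ns]
    for prev, nxt in zip(boundaries, boundaries[1:]):
        gap = nxt - prev
        if gap > threshold_ns:
            low_power_ns += gap - threshold_ns
    return low_power_ns


if __name__ == "__main__":
    accesses = [0.0, 100.0, 5000.0]
    print(time_in_low_power(accesses, threshold_ns=1000.0, end_ns=10000.0))
```

The threshold is the interesting knob: too small and the rank keeps paying the resynchronization delay on short gaps, too large and idle time is wasted in Standby, which is why a real controller would adapt it from observed access patterns.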

Hardware Technique: Problems
- Hardware techniques can be easily confused by constant context-switching
- Different processes would have different memory access behavior, and it takes time for the memory controller to adapt, readapt, readapt…

[Diagram: memory accesses of process i and process j alternating over time.]
- Imagine hundreds of parallel processes instead of 2!
- Context switching interval ~ 1 msec

The problem is that the hardware technique monitors memory accesses and controls power at such a low level that it doesn’t understand what’s going on at the software layer. But ironically, it is the OS and the user-level processes that are driving all the memory accesses. So, by not knowing this information, it is likely to make wrong power management decisions that negatively affect not only power but performance as well. The one thing that causes the most problems for the hardware is the constant context-switching in the software layer: different processes may have very different memory access behaviors, which means the hardware needs to adapt and readapt over and over again every time we have a context switch, which makes it very inefficient.

Now, let’s look at how we can improve upon these previous techniques with a software-hardware cooperative technique, where the software assists the hardware to better manage the power.

Cooperative Approach
- Improve the hardware technique so we don’t have to readapt, readapt, readapt…
- Need system software cooperation
  - Make the hardware understand the notion of processes
  - At each context switch, OS sends a signal to the memory controller
  - Upon receiving this signal, the memory controller saves and restores its internal registers, which are used for keeping past memory access patterns
- Essentially, we can now manage power for the current process depending solely on this process’ past memory accesses

So what we did was to improve upon the hardware technique so that it doesn’t have to needlessly readapt over and over again every time we context switch. To do this, we need to make the hardware understand the notion of processes, which requires some collaboration from the system software. So, this is how it works: at each context switch, the operating system sends a context-switching signal to the memory controller. Upon receiving this signal, the memory controller saves its internal registers, which are used for keeping past memory access patterns, and associates the saved set with the process being switched out. It then restores into its internal registers the contents that were previously saved in the same manner for the now-scheduled process. This is just like the way we save and restore CPU registers at each context switch. Essentially, the memory controller can now manage power for the currently running process depending solely on this process’s past memory accesses.
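The save-and-restore step can be sketched like this. It is a minimal model under stated assumptions: `PredictorState` stands in for the memory controller’s internal registers and holds only a single threshold value, the default of 1000 ns is arbitrary, and the class and method names are hypothetical rather than taken from the actual design.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PredictorState:
    """Stand-in for the controller's internal registers (assumed contents)."""
    threshold_ns: float = 1000.0  # arbitrary default for a process never seen before


class ContextAwareMemoryController:
    def __init__(self) -> None:
        self.current_pid: Optional[int] = None
        self.registers = PredictorState()
        self.saved: Dict[int, PredictorState] = {}

    def on_context_switch(self, next_pid: int) -> None:
        """Handle the context-switch signal sent by the OS."""
        # Save the live registers under the process being switched out.
        if self.current_pid is not None:
            self.saved[self.current_pid] = self.registers
        # Restore the state saved earlier for the scheduled process,
        # or start from defaults if this process has no saved state yet.
        self.registers = self.saved.pop(next_pid, PredictorState())
        self.current_pid = next_pid


if __name__ == "__main__":
    mc = ContextAwareMemoryController()
    mc.on_context_switch(1)
    mc.registers.threshold_ns = 200.0   # controller adapts to process 1
    mc.on_context_switch(2)
    mc.registers.threshold_ns = 5000.0  # controller adapts to process 2
    mc.on_context_switch(1)
    print(mc.registers.threshold_ns)    # state learned for process 1 is back
```

The point of the design is visible in the last line: when process 1 is scheduled again, the controller resumes from what it had already learned about process 1 instead of starting from whatever process 2 left behind.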

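Context-Aware Memory Controller

[Block diagram: the CPU with its registers, and the memory controller with its registers and threshold predictor. At a context switch, the OS signals the context switch to the memory controller, saves the current process’ CPU context and MC context, and restores the scheduled process’ CPU context and MC context.]

This is a graphical representation of what I just said.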

Cooperative Technique: Per-Process

[Diagram: memory accesses of process i and process j alternating over time.]

We have seen this example before, and we have also seen why the hardware technique is inefficient. With the cooperative technique, the memory controller can quickly adapt its power management strategy to whichever process is currently running.

Now, let’s look at some experiments and compare the results.

Experimental Setup
- Mambo: a full-machine simulator to run various workloads and collect memory traces
- Memsim: trace-driven simulator that produces performance and power results for the main memory
- Workloads:
  - SPECjbb + bzip2 + crafty (low memory-intensive)
  - SPECjbb + art + mcf (high memory-intensive)

As the hardware is not available in any of today’s systems, the next best thing we can do is to implement our system using a machine simulator. We chose Mambo, which we used to run various workloads and collect memory traces. These traces are then fed into a main memory simulator called Memsim, which produces performance and power results. We used two different workloads for this work, which we call the low memory-intensive and the high memory-intensive workload. In the low memory-intensive workload, we used SPECjbb with two of the low memory-intensive workloads from the SPECcpu benchmarks. In the high memory-intensive workload, we used SPECjbb with two of the high memory-intensive workloads from the SPECcpu benchmarks.

Results

[Charts: results for the low memory-intensive workload and for the high memory-intensive workload.]

Here we show the results for our workloads.

Conclusion
- Cooperative technique
  - Uses 72–75% less power than when no power management is applied, with 11–14% slow-down in average response time
  - Uses 14–17% less power than the hardware technique
  - Uses 16–26% less power than the software technique
  - Has a comparable performance to HW and SW techniques
- Future Work
  - Communicate hints directly from user processes to the hardware
