Zehan Cui, Yan Zhu, Yungang Bao, Mingyu Chen Institute of Computing Technology, Chinese Academy of Sciences July 28, 2011.

Presentation transcript:


Outline: Motivation; Design & Implementation; Experiments; Conclusion & Work in Progress


Chart: watts per server [source: The Problem of Power Consumption in Servers, Intel, 2009]. The CPU no longer dominates system power. [source: Barroso et al., The Datacenter as a Computer, 2009]

Measurement is the basis. (Diagram labels: low power, hardware, software, model, measurement.)

Component level: accuracy of the ATX-based method. Some components are directly powered through ATX wires. Modern motherboards mostly have dedicated ATX wires for the processor, but the reading includes VRM (Voltage Regulation Module) loss. Power for other components is usually deduced from multiple ATX wires, which is platform dependent.


Disk & CPU: similar to other ATX-based methods. Memory & add-in card devices: wrapper-based methods. Advantages: accurate (direct measurement); easy to use (no deduction needed); portable (multi-platform). Diagram labels: Power Supply, Current Sensor.

Prototype (photo labels: disk power, CPU power, memory power).


401.bzip2 from SPEC CPU2006

The more frequently we measure power, the more detail we get. Observation: 5,000 samples/s is an appropriate sampling frequency at the component level.
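To illustrate why the sampling rate matters, here is a minimal sketch that downsamples a power trace and compares what each rate can see. The trace is synthetic and the function names are hypothetical; nothing here is measured data from the slides.

```python
def downsample(trace, factor):
    """Keep every `factor`-th sample, mimicking a slower sampler."""
    return trace[::factor]


def energy_joules(trace, sample_rate_hz):
    """Approximate energy as the sum of power samples times the sample period."""
    return sum(trace) / sample_rate_hz


# Synthetic 1-second trace at 5,000 samples/s: a 10 W baseline with one
# 2 ms burst at 30 W. These numbers are invented for illustration only.
RATE_HZ = 5000
trace = [10.0] * RATE_HZ
for i in range(2501, 2511):
    trace[i] = 30.0

slow_factor = 100  # 5,000 / 100 = 50 samples/s
slow_trace = downsample(trace, slow_factor)

peak_fast = max(trace)
peak_slow = max(slow_trace)
energy_fast = energy_joules(trace, RATE_HZ)
energy_slow = energy_joules(slow_trace, RATE_HZ / slow_factor)

print(f"5,000 samples/s: peak {peak_fast} W, energy {energy_fast:.2f} J")
print(f"   50 samples/s: peak {peak_slow} W, energy {energy_slow:.2f} J")
```

In this toy trace the slower sampler never lands on the burst, so it reports neither the peak nor the burst's energy.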

Chart annotations: higher bandwidth (BW) but lower power; lower BW but higher power.

Experiment: malloc 512 MB and access it in different strides. Result: time 6.5 times longer; power slightly lower; energy 5.9 times higher. Two causes: row conflicts (increasing the row buffer hit rate would help) and many TLB misses (large pages may be more efficient). What is the relationship between performance and power?
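The three ratios on this slide can be cross-checked with energy = average power × time. The sketch below assumes the slide's figures are ratios of total run time and total energy between the two runs; the function name is hypothetical.

```python
def average_power_ratio(energy_ratio, time_ratio):
    """Energy = average power * time, so P2/P1 = (E2/E1) / (t2/t1)."""
    return energy_ratio / time_ratio


# Ratios quoted on the slide: 6.5x the time, 5.9x the energy.
ratio = average_power_ratio(energy_ratio=5.9, time_ratio=6.5)
print(f"average power ratio: {ratio:.3f}")
```

Under that assumption the ratio comes out near 0.91, that is, roughly 9% lower average power, which is consistent with "slightly lower".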

Setup: 64 MB of memory, random vs. sequential access. Each access jumps at least 64 B to eliminate cache hits. Large pages (2 MB) eliminate TLB misses. Load/Store Unit % = LSU_stall_time / CPU_Cycle. Observation: it seems that DRAM power is already proportional to bandwidth. But the fact is that …
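The access patterns and the Load/Store Unit metric described on this slide can be sketched as follows. This is only an illustration of the idea, not the authors' benchmark: the function names are hypothetical, and a real test would walk memory from native code rather than build Python lists.

```python
import random

CACHE_LINE_BYTES = 64                 # minimum jump, from the slide
BUFFER_BYTES = 64 * 1024 * 1024       # 64 MB buffer, from the slide


def lsu_utilization(lsu_stall_time, cpu_cycles):
    """Load/Store Unit % = LSU_stall_time / CPU_Cycle, as a fraction."""
    if cpu_cycles <= 0:
        raise ValueError("cpu_cycles must be positive")
    return lsu_stall_time / cpu_cycles


def sequential_offsets(buffer_bytes, stride=CACHE_LINE_BYTES):
    """Byte offsets visited in order, one per cache line."""
    return list(range(0, buffer_bytes, stride))


def random_offsets(buffer_bytes, seed, stride=CACHE_LINE_BYTES):
    """The same cache-line offsets in a seeded random order.

    Each line is visited exactly once, so consecutive accesses are at
    least one stride apart and never reuse a cached line.
    """
    offsets = sequential_offsets(buffer_bytes, stride)
    random.Random(seed).shuffle(offsets)
    return offsets


# Small demonstration on a 1 KB buffer (the slide uses BUFFER_BYTES).
print(sequential_offsets(1024)[:4])
print(random_offsets(1024, seed=1)[:4])
print(lsu_utilization(lsu_stall_time=250, cpu_cycles=1000))
```

Changing `seed` yields a different random pattern over the same set of cache lines.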

Different SEEDs are used to generate different random access patterns; power varies by less than 1.1%. Observation: DRAM power is highly correlated with two factors: Load/Store Unit utilization and whether access is sequential or random. We can build memory power models based on these two factors rather than on bandwidth.
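One possible shape for such a model is a separate straight line per access pattern, with Load/Store Unit utilization as the input. The sketch below is an assumption about the model form, not the authors' model, and the sample points are placeholders invented to exercise the code.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_power_model(samples):
    """samples: iterable of (pattern, lsu_utilization, watts) tuples."""
    grouped = {}
    for pattern, utilization, watts in samples:
        xs, ys = grouped.setdefault(pattern, ([], []))
        xs.append(utilization)
        ys.append(watts)
    return {pattern: fit_line(xs, ys) for pattern, (xs, ys) in grouped.items()}


def predict_watts(model, pattern, utilization):
    intercept, slope = model[pattern]
    return intercept + slope * utilization


# Placeholder points (not measurements): two made-up linear trends.
samples = [
    ("sequential", 0.2, 2.2), ("sequential", 0.4, 2.4), ("sequential", 0.8, 2.8),
    ("random", 0.2, 2.6), ("random", 0.4, 3.2), ("random", 0.8, 4.4),
]
model = fit_power_model(samples)
print(predict_watts(model, "random", 0.5))
```

Keeping the pattern as a separate key, rather than folding it into one regression, mirrors the slide's point that the two factors matter independently of bandwidth.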


We use a hybrid approach: ATX-based for CPU/disk, wrapper card for DRAM/… 5 kHz is an appropriate sampling frequency to disclose fine-grained power behavior. DRAM power is highly correlated with Load/Store Unit utilization rather than bandwidth.

Upgrade the current system: support DDR3; support large memory capacity; support 40 simultaneous measurement channels; use an FPGA to collect measured data. Correlate the measured power data with high-level semantic information.

Thanks & Questions?

Backup

The wrapper card already exists; we made only a few small modifications. Diagram labels: Current Sensor, Power Supply, Signals.

Normal setup (diagram labels: DIMM slot, Motherboard, DIMM). DIMM: Dual In-line Memory Module.

With our initial wrapper card (diagram labels: DIMM slot, Motherboard, DIMM, Wrapper Card).

Diagram labels: Bank 0, Row Decoder, Sense Amps, Column Decoder, ODT, Receivers, Drivers, Registers, Write FIFO. Banks: independent arrays; asynchronous (independent of memory bus speed). I/O circuitry: runs at bus speed; clock sync/distribution; bus drivers and receivers; buffering/queueing. On-die termination (ODT): required by bus electrical characteristics for reliable operation; a resistive element that dissipates power when the bus is active. [Source: H. David et al., Memory Power Management via Dynamic Voltage/Frequency Scaling, ICAC, 2011]

DRAM power can be approximately divided into: background power (considered stable); bank power (activate/precharge, related to the frequency of row operations); I/O power (bursts, proportional to bandwidth); termination power (termination resistors, proportional to bandwidth).
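The four-way split on this slide can be written as a simple additive model. The sketch below is a hedged illustration: the function name, the parameter names, and every coefficient are hypothetical, chosen only to show which terms scale with row operations and which with bandwidth.

```python
def dram_power_breakdown(background_w, row_ops_per_s, joules_per_row_op,
                         bytes_per_s, io_joules_per_byte, term_joules_per_byte):
    """Return each power component in watts plus their total."""
    parts = {
        "background": background_w,                        # treated as constant
        "bank": row_ops_per_s * joules_per_row_op,         # activate/precharge
        "io": bytes_per_s * io_joules_per_byte,            # scales with bandwidth
        "termination": bytes_per_s * term_joules_per_byte, # scales with bandwidth
    }
    parts["total"] = sum(parts.values())
    return parts


# Hypothetical coefficients, for illustration only.
example = dram_power_breakdown(
    background_w=1.0,
    row_ops_per_s=1e6, joules_per_row_op=2e-7,
    bytes_per_s=1e9, io_joules_per_byte=1e-10, term_joules_per_byte=2e-10,
)
for name, watts in example.items():
    print(f"{name}: {watts:.2f} W")
```

Separating the bank term from the bandwidth terms makes it easy to see how two workloads with the same bandwidth can draw different power when their row-operation rates differ.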

P = U * I. DC voltage: does not fluctuate much, less than 2% on our platform. DC current: sensed through a CSA (Current-Sense Amplifier). An ADC or DMM reads the values, and the data goes to a collector (PC).
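A minimal sketch of the P = U * I calculation on the collector side follows. It assumes the common arrangement in which the current-sense amplifier amplifies the voltage drop across a small shunt resistor, and it treats the supply voltage as constant, as the slide suggests. The gain, shunt value, supply voltage, and sample readings are hypothetical.

```python
def current_from_csa(v_out, gain, r_shunt_ohm):
    """Recover current from the amplifier output: I = V_out / (gain * R_shunt)."""
    return v_out / (gain * r_shunt_ohm)


def power_trace(supply_v, csa_samples_v, gain, r_shunt_ohm):
    """Apply P = U * I to each digitized amplifier reading."""
    return [supply_v * current_from_csa(v, gain, r_shunt_ohm)
            for v in csa_samples_v]


# Hypothetical setup: 1.8 V rail, gain of 50, 10 milliohm shunt.
readings_v = [0.5, 1.0, 1.5]   # made-up ADC readings of the CSA output
watts = power_trace(supply_v=1.8, csa_samples_v=readings_v,
                    gain=50, r_shunt_ohm=0.01)
print([round(w, 2) for w in watts])
```

If the rail voltage varied more than the slide's 2%, it would need to be sampled alongside the current instead of being passed in as a constant.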

Possible reason for the non-proportionality of random-access power on slide 17: when bandwidth is low, auto-precharge (caused by refresh) means every access needs an ACTIVE command, so bank power is proportional to bandwidth. When bandwidth is high, some accesses may hit in the row buffer and need fewer ACTIVE commands, so the slope of the bank power increase is lower than before.