System and Circuit Level Power Modeling of Energy-Efficient 3D-Stacked Wide I/O DRAMs Karthik Chandrasekar TU Delft Christian Weis $, Benny Akesson*, Norbert.

Presentation transcript:

System and Circuit Level Power Modeling of Energy-Efficient 3D-Stacked Wide I/O DRAMs
Karthik Chandrasekar (TU Delft), Christian Weis $, Benny Akesson *, Norbert Wehn $ and Kees Goossens #
[Affiliation key for $, * and #: logos not recoverable]

Overview
– Motivation for 3D-stacking of DRAMs
– Problem Statement: Power Modeling
– Circuit-level DRAM architecture & power model
– System-level DRAM power model (DRAMPower)
– Comparison: Results and Analysis
– Summary
19-Mar-13, Karthik Chandrasekar / TU Delft

Motivation: Why 3D-Stacked DRAMs?
The Performance vs. Power Factor
[I/O power per bit: 0.7 mW with TSV vs. 2.3 mW with PoP vs. 4.6 mW off-chip (Samsung)]
Images & Data Courtesy: HMC, JEDEC 42.6, FineTech, Nvidia, Samsung

What's missing? [Problem Statement]
An accurate 3D-DRAM power model for designing DRAM-stacked SoCs.

Approaches to power modeling
Circuit-level Power Model
– Models the DRAM architecture at the circuit level in SPICE
– Pros: Accurate and detailed
– Cons: Slow; requires circuit-level understanding of the DRAM architecture; technology specifications for DRAMs are not publicly available
System-level Power Model (like Micron's)
– Based on vendor-provided datasheet measures and JEDEC specifications
– Pros: Fast, easy to integrate, and employs simple models for memory operations
– Cons: Accuracy is unclear. Not directly applicable to 3D-DRAMs and not verified against circuit-level models or hardware measurements.
Need: Fast, Simple & Accurate Model

What's the solution?
Develop a system-level 3D-DRAM power model that is as accurate as a circuit-level 3D-DRAM power model.

Circuit-Level DRAM Modeling
Baseline DRAM Model (Weis), DATE '11 and DAC '13
– NGSPICE, PTM/BSIM
– 1T1C cell to banks
2D to 3D (New)
– Based on DATE '11 & JEDEC Wide IO
– x512
– 4 Banks/Channel, 4 Channels
– TSV routing: Data, Cmd & Addr; Control, Clock & Power
– No ODT (On-Die Termination): low frequency & IO capacitance
– No DLL (Delay-Locked Loop)
– TSV model from IMEC/GaTech

System-Level Power Model (DRAMPower)
Comparison to Micron model
Problem with Micron's model:
– Not directly applicable to 3D-DRAMs (multiple voltage domains and IO)
– Accuracy is unclear (state transitions not addressed & approximate workload used)
– Not verified against circuit-level models or hardware power measurements
Adapting to 3D-DRAMs:
– Considers multiple voltage domains: (a) Core (b) Derived (Wordline)
– Includes IO power consumption (incl. I/O pads, buffers, bumps, drivers & pins)
– RD operation energy (generic equation): [equation not captured in the transcript]
Modeling for Accuracy:
– Models memory state transitions (from active to power-down)
– Models self-refresh accurately (functional correctness & timing difference)
Most importantly: it is almost as accurate as the circuit-level model.
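The generic RD energy equation itself did not survive in the transcript. As a rough illustration only, the sketch below shows how a datasheet-based read energy could be summed over several voltage domains, with an optional IO term. It assumes the common datasheet-style form (read current minus active-standby current, times supply voltage, times burst duration); the class `VoltageDomain`, the function `read_energy_pj`, the `io_power_mw` parameter and all numeric values are hypothetical and are not taken from the slides or from the DRAMPower code base.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class VoltageDomain:
    """One supply domain (hypothetical container, not a DRAMPower type)."""
    name: str
    voltage_v: float   # supply voltage in volts
    idd4r_ma: float    # current during a read burst, in mA
    idd3n_ma: float    # active-standby (background) current, in mA


def read_energy_pj(domains: Sequence[VoltageDomain],
                   burst_length: int,
                   data_rate: int,
                   t_ck_ns: float,
                   io_power_mw: float = 0.0) -> float:
    """Energy of one read burst in picojoules (assumed form, see lead-in).

    Per domain: (IDD4R - IDD3N) * V * t_burst, summed over all domains,
    plus an optional IO contribution of io_power_mw * t_burst.
    Units: mA * V * ns = pJ and mW * ns = pJ.
    """
    t_burst_ns = burst_length / data_rate * t_ck_ns
    core_pj = sum((d.idd4r_ma - d.idd3n_ma) * d.voltage_v * t_burst_ns
                  for d in domains)
    io_pj = io_power_mw * t_burst_ns
    return core_pj + io_pj


if __name__ == "__main__":
    # Illustrative numbers only; they are not datasheet values.
    example = [
        VoltageDomain("core", voltage_v=1.2, idd4r_ma=100.0, idd3n_ma=20.0),
        VoltageDomain("wordline", voltage_v=1.8, idd4r_ma=10.0, idd3n_ma=5.0),
    ]
    print(read_energy_pj(example, burst_length=4, data_rate=1, t_ck_ns=5.0))
```

Keeping each voltage domain as a separate record mirrors the slide's point that a single-supply formula is not directly applicable to 3D-DRAMs.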

Self-Refresh Operation: Accuracy

Micron model
  Commands:            SREF, NOP ... SREX, NOP
  Timings:             <SREF>        <XSDLL>
  Active current:      (none listed)
  Background current:  IDD6          IDD2N

Actual
  Commands:            SREF, NOP ... SREX, NOP
  Timings:             <RFC-RP>      <RP>          <SREF>   <XS>
  Active current:      IDD5-IDD3N    IDD5-IDD2N
  Background current:  IDD3P0        IDD2P0        IDD6     IDD2N

Annotations on the actual case: actual internal refresh; no DLL.

We furnish new equations in the system-level power model to address such accuracy issues.
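The new equations are not reproduced in the transcript. The sketch below is one possible reading of the two timing rows above, written out as phase-by-phase energy sums so the difference between them is visible. It assumes a single supply voltage, that active and background currents add within a phase, and that the SREF period is counted after the initial internal refresh; the function names, the dictionary layout and all numbers are hypothetical.

```python
from typing import Dict


def self_refresh_energy_actual_pj(idd_ma: Dict[str, float], vdd_v: float,
                                  t_rfc_ns: float, t_rp_ns: float,
                                  t_sref_ns: float, t_xs_ns: float) -> float:
    """Energy over the four phases of the 'Actual' row (assumed reading).

    Each phase contributes (active + background current) * VDD * duration.
    Units: mA * V * ns = pJ.
    """
    phases = [
        # Internal refresh, first part: tRFC - tRP
        (idd_ma["IDD5"] - idd_ma["IDD3N"] + idd_ma["IDD3P0"], t_rfc_ns - t_rp_ns),
        # Internal refresh, precharge part: tRP
        (idd_ma["IDD5"] - idd_ma["IDD2N"] + idd_ma["IDD2P0"], t_rp_ns),
        # Remaining time in self-refresh
        (idd_ma["IDD6"], t_sref_ns),
        # Self-refresh exit (no DLL, so tXS rather than tXSDLL)
        (idd_ma["IDD2N"], t_xs_ns),
    ]
    return sum(current * vdd_v * duration for current, duration in phases)


def self_refresh_energy_micron_pj(idd_ma: Dict[str, float], vdd_v: float,
                                  t_sref_ns: float, t_xsdll_ns: float) -> float:
    """Energy over the two phases of the 'Micron' row (background only)."""
    return (idd_ma["IDD6"] * t_sref_ns + idd_ma["IDD2N"] * t_xsdll_ns) * vdd_v


if __name__ == "__main__":
    # Illustrative currents in mA; they are not datasheet values.
    idd = {"IDD5": 100.0, "IDD3N": 20.0, "IDD2N": 15.0,
           "IDD3P0": 5.0, "IDD2P0": 3.0, "IDD6": 1.0}
    print(self_refresh_energy_actual_pj(idd, 1.2, 100.0, 20.0, 1000.0, 110.0))
    print(self_refresh_energy_micron_pj(idd, 1.2, 1000.0, 512.0))
```

For short self-refresh periods the internal-refresh phases dominate the total, which is consistent with the slide's claim that ignoring them hurts accuracy.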

Comparison: Results & Analysis
Experiment I:
– Different Operations
– Different Granularity
Results:
– Less than 2% difference
– Adapted Micron SR (200): 72% diff.
Experiment II:
– H.263 Encoder & EPIC Encoder
– JPEG Encoder & MPEG2 Decoder
– Different Loads and Power Modes
Results:
– Less than 2% difference
– Adapted Micron: 12% diff. (SR 500MHz)
The 2% difference is due to the use of JEDEC-specified averaged IDD currents.
Shows the accuracy of the system-level power model.

Summary
Key Highlights:
– Presented an accurate datasheet-based system-level power model for Wide I/O 3D-stacked DRAMs.
– Verified the system-level model for accuracy against a detailed SPICE-based circuit-level 3D-DRAM architecture and power model.
– Observed < 2% difference in power and energy estimates for different memory operations and for any variations in memory load.
Other Important Contributions:
– Provided estimates of IDD current measures for different JEDEC 3D-DRAM configurations, in place of the as-yet-unavailable datasheets (in the paper).
– The system-level power model (DRAMPower) has been released online as an open-source 3D-DRAM power estimation tool.
Download link: