Virtual Platforms for Memory Controller Design Space Exploration Matthias Jung, Christian Weis, Norbert Wehn University of Kaiserslautern, Germany.


Microelectronic Systems Design

Standard Memory System
[Figure: memory hierarchy from CORE through L1 private cache (64KB), L2 private cache (256KB), L3 shared cache (8-12MB) and a memory controller with 3 channels to DRAM and HDD/SSD; caches are SRAM; size labels 16B, 64B, 512B; time-scale labels ns, ms, s]
- Pin limitation due to package
- Power hungry I/O transceivers
- Bandwidth requirements
- Memory wall

3D Stacked Wide I/O DRAM
- Stacked DRAM dies
- TSV connections
- Multiple channels
- Increasing bandwidth demand
- Higher available bandwidth
- 1 or 2 channel DDR3
- Memory controller bottleneck
=> A new generation of memory controllers is required
[Figure labels: 3D stacked DRAM, MPSoC]

Design Space Exploration with Virtual Platforms
- Huge design space of 3D-DRAM controllers
- Flexible, cycle-approximate models are needed for fast investigation
- RTL simulation is too slow for system-level analysis
=> TLM-based virtual platforms with Synopsys Platform Architect
=> Speedup of TLM models of up to 377x compared to cycle-accurate (CA) models [1]
=> Simulating in seconds instead of hours
[1] M. Jung et al., "TLM Modelling of 3D Stacked Wide I/O DRAM Subsystems," in Proc. HiPEAC Conference 2013, Berlin.
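The "seconds instead of hours" claim follows directly from the quoted speedup. The sketch below works through the arithmetic; the one-hour reference run time and the function name are assumptions chosen for illustration, and only the 377x upper bound comes from the slide.

```python
def faster_runtime_seconds(reference_seconds: float, speedup: float) -> float:
    """Run time of the faster model, given the reference run time and a speedup factor."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return reference_seconds / speedup


# Hypothetical cycle-accurate run of one hour (assumption, not from the slides).
ONE_HOUR_S = 3600.0
# 377x is the best-case speedup quoted on the slide.
best_case_s = faster_runtime_seconds(ONE_HOUR_S, 377.0)

if __name__ == "__main__":
    print(f"TLM run time at 377x: {best_case_s:.1f} s")
```

Under this assumption an hour-long reference run shrinks to under ten seconds; smaller speedups scale the result proportionally.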

Special TLM DRAM Protocol [1]
- Application-specific phases with DECLARE_EXTENDED_PHASE()
- Phases derived from DRAM commands (JEDEC Wide I/O standard)
- DRAM commands: ACT, PRE, RD, WR, REFA …
- Example: [figure]
[1] M. Jung et al., "TLM Modelling of 3D Stacked Wide I/O DRAM Subsystems," HiPEAC, 2013, Berlin.
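The idea of deriving protocol phases from DRAM commands can be sketched outside SystemC as a small state machine. This is a language-neutral illustration in Python, not the authors' TLM implementation: the `Phase`, `BankState`, `TRANSITIONS` and `run_phases` names are hypothetical, the model covers a single bank, and all timing constraints are omitted.

```python
from enum import Enum


class Phase(Enum):
    """Hypothetical protocol phases, one per DRAM command named on the slide."""
    ACT = "ACT"    # activate: open a row in the bank
    RD = "RD"      # read from the open row
    WR = "WR"      # write to the open row
    PRE = "PRE"    # precharge: close the open row
    REFA = "REFA"  # refresh all banks


class BankState(Enum):
    IDLE = "idle"      # no row open
    ACTIVE = "active"  # a row is open


# Legal (state, phase) pairs and the state each one leads to (simplified).
TRANSITIONS = {
    (BankState.IDLE, Phase.ACT): BankState.ACTIVE,
    (BankState.IDLE, Phase.PRE): BankState.IDLE,
    (BankState.IDLE, Phase.REFA): BankState.IDLE,
    (BankState.ACTIVE, Phase.RD): BankState.ACTIVE,
    (BankState.ACTIVE, Phase.WR): BankState.ACTIVE,
    (BankState.ACTIVE, Phase.PRE): BankState.IDLE,
}


def run_phases(phases, state=BankState.IDLE):
    """Apply a sequence of phases to one bank and return the final state."""
    for phase in phases:
        key = (state, phase)
        if key not in TRANSITIONS:
            raise ValueError(f"{phase.value} is not allowed while the bank is {state.value}")
        state = TRANSITIONS[key]
    return state


if __name__ == "__main__":
    print(run_phases([Phase.ACT, Phase.WR, Phase.PRE]).value)
```

A memory controller model built this way rejects illegal sequences, such as a read to a bank with no open row, before they reach the DRAM model.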

Experiments and Results
- The TLM model was compared with a cycle-accurate SystemC implementation
- Tested with MediaBench and CHStone benchmark traces
- Speedup of up to two orders of magnitude
[Figure label: 1h 41m 42s]

Power Modeling of 3D-DRAM with TLM2.0 [2]
Two parts of power consumption:
1. Background power
2. Command power
=> DRAM power states are accounted for with TLM phases
[Figure: current I over time t for the command sequence ACT, WR, PRE, ACT, RD, WR, PRE]
[2] M. Jung et al., "Power Modelling of 3D-Stacked Memories with TLM2.0," SNUG 2013, Munich.
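The split into background power and command power can be illustrated with a minimal energy accumulator. This is a sketch under stated assumptions, not the model from the cited work: every numeric value, the two-state bank model, and the names `BACKGROUND_POWER_MW`, `COMMAND_ENERGY_NJ` and `total_energy_nj` are hypothetical.

```python
# Hypothetical values for illustration only; not from the slides or any datasheet.
BACKGROUND_POWER_MW = {"idle": 10.0, "active": 25.0}                # power per bank state
COMMAND_ENERGY_NJ = {"ACT": 2.0, "PRE": 1.0, "RD": 3.0, "WR": 3.5}  # energy per command
NEXT_STATE = {"ACT": "active", "PRE": "idle"}  # RD and WR leave the state unchanged


def total_energy_nj(trace, end_ns, state="idle"):
    """Sum background and command energy for a time-sorted list of (time_ns, command)."""
    energy_nj = 0.0
    last_ns = 0.0
    for time_ns, command in trace:
        # Background part: power of the current state times the time spent in it.
        # mW * ns gives pJ, so scale by 1e-3 to get nJ.
        energy_nj += BACKGROUND_POWER_MW[state] * (time_ns - last_ns) * 1e-3
        # Command part: a fixed energy charged when the command is issued.
        energy_nj += COMMAND_ENERGY_NJ[command]
        state = NEXT_STATE.get(command, state)
        last_ns = time_ns
    # Background energy from the last command to the end of the trace.
    energy_nj += BACKGROUND_POWER_MW[state] * (end_ns - last_ns) * 1e-3
    return energy_nj


if __name__ == "__main__":
    # Command sequence from the slide's figure, with made-up timestamps.
    trace = [(0, "ACT"), (10, "WR"), (20, "PRE"), (30, "ACT"),
             (40, "RD"), (50, "WR"), (60, "PRE")]
    print(f"{total_energy_nj(trace, end_ns=70):.2f} nJ")
```

Each command both adds its own energy and moves the bank into a state with a different background power. In a TLM model the phase transitions would drive the same bookkeeping.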

Results (Power Simulation)
- The TLM model was compared with a cycle-accurate SystemC implementation and the standalone power simulator DRAMPower [3]
- Tested with MediaBench and CHStone benchmark traces
- Maximum deviation of 5% from the reference models

Current Work: Thermal Simulation
- Co-simulation with the 3D-ICE simulator [3]
- Traces will be generated from gem5 [4]
- Closed-loop control

Conclusion
- 3D stacked DRAMs are the future technology
- Virtual platforms for DSE of new multi-channel Wide I/O DRAM controllers are mandatory
- A DRAM-specific TLM protocol was introduced (can be used for any kind of DRAM)
- A precise power model was presented
- Early checkpoint for SW implementations
- Current and future work: advanced scheduling and arbitration algorithms

Thank you! Visit my poster!