Presentation transcript:

Disaggregated Memory for Expansion and Sharing in Blade Servers
Kevin Lim*, Jichuan Chang+, Trevor Mudge*, Parthasarathy Ranganathan+, Steven K. Reinhardt*†, Thomas F. Wenisch*
* University of Michigan   + HP Labs   † AMD
June 23, 2009

Motivation: The memory capacity wall
- Memory capacity per core drops ~30% every 2 years
[Figure: projected memory capacity per core over time, illustrating the capacity wall]

Opportunity: Optimizing for the ensemble
- Dynamic provisioning across the ensemble enables cost & power savings
[Figures: intra-server memory-usage variation over time (TPC-H, log scale) and inter-server variation (rendering farm)]

Contributions
Goal: Expand capacity & provision for typical usage
- New architectural building block: memory blade
  − Breaks traditional compute-memory co-location
- Two architectures for transparent memory expansion
- Capacity expansion:
  − 8x performance over provisioning for median usage
  − Higher consolidation
- Capacity sharing:
  − Lower power and costs
  − Better performance / dollar

Outline
- Introduction
- Disaggregated memory architecture
  − Concept
  − Challenges
  − Architecture
- Methodology and results
- Conclusion

Disaggregated memory concept
- Break CPU-memory co-location
- Leverage fast, shared communication fabrics
[Figure: conventional blade systems (CPUs with local DIMMs) vs. blade systems with disaggregated memory, where compute blades share a memory blade across the backplane]

What are the challenges?
- Transparent expansion to application and OS
  − Solution 1: Leverage coherency
  − Solution 2: Leverage hypervisor
- Commodity-based hardware
- Match right-sized, conventional systems
  − Performance
  − Cost
[Figure: compute blade (software stack of app, OS, hypervisor; CPUs and DIMMs) connected across the backplane to the memory blade]

General memory blade design
- Design driven by the key challenges
[Figure: memory blade (enlarged) attached to the backplane, containing a protocol engine, memory controller, address-mapping logic, and DIMMs, shared by the compute blades (CPUs + DIMMs)]
- Transparency: address mapping enforces allocation, isolation, and mapping
- Cost: handles dynamic memory partitioning; leverages the sweet spot of RAM pricing; other optimizations
- Performance: accessed as memory, not swap space
- Commodity: connected via PCIe or HyperTransport
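To make the transparency and partitioning roles concrete, here is a minimal sketch (not from the slides or paper) of how a memory blade could map each client blade's remote addresses onto its local DIMMs; the class, field names, and the 16 MB super-page granularity are illustrative assumptions.

```python
# Sketch only: per-client address maps are what enforce allocation and isolation,
# and handing out super-pages is the dynamic-partitioning mechanism.
SUPERPAGE_SIZE = 16 * 1024 * 1024   # assumed allocation granularity

class MemoryBlade:
    def __init__(self, capacity_bytes):
        self.free_frames = list(range(capacity_bytes // SUPERPAGE_SIZE))
        self.maps = {}   # blade_id -> {remote super-page number -> local frame}

    def allocate(self, blade_id, remote_spn):
        """Dynamically partition capacity by granting one super-page to a client blade."""
        frame = self.free_frames.pop()
        self.maps.setdefault(blade_id, {})[remote_spn] = frame
        return frame

    def translate(self, blade_id, remote_addr):
        """Translate a client's remote address to a local DIMM address, enforcing isolation."""
        spn, offset = divmod(remote_addr, SUPERPAGE_SIZE)
        frame = self.maps[blade_id][spn]   # a miss here means access outside the allocation
        return frame * SUPERPAGE_SIZE + offset
```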

Fine-grained remote access (FGRA)
- Extends the coherency domain
- Connected via coherent fabric to the memory blade (e.g., HyperTransport™)
- Adds minor hardware: a coherence filter (CF) that filters unnecessary traffic; the memory blade doesn't need all coherence traffic
- On access: data transferred at cache-block granularity
[Figure: compute blade (app, OS, CPUs, DIMMs) attached through the coherence filter and HyperTransport across the backplane to the memory blade]
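A rough sketch of the coherence-filter idea, under the assumption that only messages targeting the blade-backed address range need to cross the fabric; the address-range split and message types are illustrative assumptions, not the slides' actual design.

```python
# Illustrative model of the coherence filter (CF): it forwards to the memory blade only
# the messages the blade needs (data movement for remote cache blocks), keeping ordinary
# on-blade coherence traffic local.
CACHE_BLOCK = 64            # bytes moved per remote access (cache-block granularity)
REMOTE_BASE = 64 * 2**30    # assumed start of blade-backed physical addresses

def forward_to_memory_blade(addr, msg_type):
    if addr < REMOTE_BASE:
        return False                          # local memory: normal on-blade coherence
    return msg_type in ("read", "writeback")  # blade needs data movement, not every snoop
```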

Page-swapping remote memory (PS)
- Uses indirection from the hypervisor; leverages the existing remapping between OS and hypervisor
- Connected via commodity fabric to the memory blade (PCI Express)
- On access: data transferred at page (4 KB) granularity; the local data page is swapped with the remote data page
- Performance dominated by transfer latency; insensitive to small changes
[Figure: compute blade (app, OS, hypervisor, CPUs, DIMMs) connected through a PCI Express bridge across the backplane to the memory blade]
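A hedged sketch of what one PS swap could look like inside the hypervisor: an access to a page currently held on the memory blade traps, the hypervisor exchanges it with a local victim page, and only the guest-physical-to-machine mapping changes, so the OS and application remain unmodified. The data structures and names below are assumptions for illustration.

```python
PAGE_SIZE = 4096

def handle_remote_page_fault(p2m, local_frames, blade_pages, guest_pfn, victim_pfn):
    """
    p2m:          guest-physical page -> ('local', frame) or ('remote', slot)
    local_frames: frame -> 4 KB of data held in local DIMMs
    blade_pages:  slot  -> 4 KB of data held on the memory blade
    """
    _, remote_slot = p2m[guest_pfn]     # faulting page lives on the memory blade
    _, victim_frame = p2m[victim_pfn]   # victim page lives locally

    remote_data = blade_pages[remote_slot]                 # 4 KB transfer over PCIe
    blade_pages[remote_slot] = local_frames[victim_frame]  # victim moves to the blade
    local_frames[victim_frame] = remote_data               # hot page becomes local

    # Swap the guest-physical -> machine mappings; the guest OS never sees the indirection.
    p2m[guest_pfn] = ('local', victim_frame)
    p2m[victim_pfn] = ('remote', remote_slot)
```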

Summary: Addressing the challenges

                        FGRA                   PS
Transparent expansion   Extends coherency      Hypervisor indirection
Commodity HW            HyperTransport         PCI Express
High performance        Direct access          Leverages locality
Cost comparable         Shared memory blade infrastructure; right-provisioned memory

Outline
- Introduction
- Disaggregated memory architecture
- Methodology and results
  − Performance
  − Performance-per-cost
- Conclusion

Methodology
- Trace-based
  − Memory traces from detailed simulation: Web 2.0, compute-intensive, server
  − Utilization traces from live data centers: animation, VM consolidation, Web 2.0
- Two baseline memory sizes
  − M-max: sized to the largest workload
  − M-median: sized to the median of the workloads

Simulator parameters:
Remote DRAM      120 ns, 6.4 GB/s
PCIe             120 ns, 1 GB/s
HyperTransport   60 ns, 4 GB/s
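A back-of-the-envelope use of the parameters above (a sketch that ignores protocol, queuing, and software overheads) shows why an FGRA access costs roughly a remote DRAM access, while a PS swap is dominated by the 4 KB transfer, consistent with the earlier note that PS is insensitive to small latency changes.

```python
def transfer_ns(bytes_moved, latency_ns, gb_per_s):
    # one-way latency plus serialization time for the payload, in nanoseconds
    return latency_ns + bytes_moved / (gb_per_s * 1e9) * 1e9

fgra_block = transfer_ns(64,   60, 4.0)   # one cache block over HyperTransport ~ 76 ns
ps_page    = transfer_ns(4096, 120, 1.0)  # one 4 KB page over PCIe ~ 4.2 us (x2 for a swap)

print(f"FGRA cache-block access ~ {fgra_block:.0f} ns")
print(f"PS page transfer        ~ {ps_page / 1000:.1f} us")
```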

Performance
Baseline: M-median local + disk
[Figure: performance relative to the baseline for workloads whose footprint exceeds M-median; annotations mark 8X and 2X improvements]
- Performance 8x higher, close to ideal
- FGRA slower on these memory-intensive workloads
- Locality is most important to performance

Performance / Cost
Baseline: M-max local + disk
[Figure: performance per dollar for workloads whose footprint exceeds M-median; annotations mark 1.3X and 1.4X improvements]
- PS able to provide consistently high performance / $
- M-median has a significant drop-off on large workloads

Conclusions
- Motivation: impending memory capacity wall
- Opportunity: optimizing for the ensemble
- Solution: memory disaggregation
  − Transparent, commodity HW, high performance, low cost
  − Dedicated memory blade for expansion, sharing
  − PS and FGRA provide transparent support
Please see the paper for more details!

Thank you! Any questions?