Min Lee, Vishal Gupta, Karsten Schwan

Slides:



Advertisements
Similar presentations
Virtual Hierarchies to Support Server Consolidation Michael Marty and Mark Hill University of Wisconsin - Madison.
Advertisements

Virtualization Technology
Virtual Switching Without a Hypervisor for a More Secure Cloud Xin Jin Princeton University Joint work with Eric Keller(UPenn) and Jennifer Rexford(Princeton)
Jaewoong Sim Alaa R. Alameldeen Zeshan Chishti Chris Wilkerson Hyesoon Kim MICRO-47 | December 2014.
Dynamic Thread Assignment on Heterogeneous Multiprocessor Architectures Pree Thiengburanathum Advanced computer architecture Oct 24,
1 Lecture 17: Large Cache Design Papers: Managing Distributed, Shared L2 Caches through OS-Level Page Allocation, Cho and Jin, MICRO’06 Co-Operative Caching.
Thread Criticality Predictors for Dynamic Performance, Power, and Resource Management in Chip Multiprocessors Abhishek Bhattacharjee Margaret Martonosi.
Virtual Exclusion: An Architectural Approach to Reducing Leakage Energy in Multiprocessor Systems Mrinmoy Ghosh Hsien-Hsin S. Lee School of Electrical.
Daniel Schall, Volker Höfner, Prof. Dr. Theo Härder TU Kaiserslautern.
Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri , Yoongu Kim,
Difference Engine: Harnessing Memory Redundancy in Virtual Machines by Diwaker Gupta et al. presented by Jonathan Berkhahn.
Smart Refresh: An Enhanced Memory Controller Design for Reducing Energy in Conventional and 3D Die-Stacked DRAMs Mrinmoy Ghosh Hsien-Hsin S. Lee School.
Teaching Old Caches New Tricks: Predictor Virtualization Andreas Moshovos Univ. of Toronto Ioana Burcea’s Thesis work Some parts joint with Stephen Somogyi.
Helper Threads via Virtual Multithreading on an experimental Itanium 2 processor platform. Perry H Wang et. Al.
Techniques for Multicore Thermal Management Field Cady, Bin Fu and Kai Ren.
VSphere vs. Hyper-V Metron Performance Showdown. Objectives Architecture Available metrics Challenges in virtual environments Test environment and methods.
Tiered-Latency DRAM: A Low Latency and A Low Cost DRAM Architecture
KMemvisor: Flexible System Wide Memory Mirroring in Virtual Environments Bin Wang Zhengwei Qi Haibing Guan Haoliang Dong Wei Sun Shanghai Key Laboratory.
1 Ally: OS-Transparent Packet Inspection Using Sequestered Cores Jen-Cheng Huang 1, Matteo Monchiero 2, Yoshio Turner 3, Hsien-Hsin Lee 1 1 Georgia Tech.
1 DIEF: An Accurate Interference Feedback Mechanism for Chip Multiprocessor Memory Systems Magnus Jahre †, Marius Grannaes † ‡ and Lasse Natvig † † Norwegian.
NoHype: Virtualized Cloud Infrastructure without the Virtualization Eric Keller, Jakub Szefer, Jennifer Rexford, Ruby Lee ISCA 2010 Princeton University.
1 Virtual Private Caches ISCA’07 Kyle J. Nesbit, James Laudon, James E. Smith Presenter: Yan Li.
1 Soft Timers: Efficient Microsecond Software Timer Support For Network Processing Mohit Aron and Peter Druschel Rice University Presented By Jonathan.
Kevin Lim*, Jichuan Chang +, Trevor Mudge*, Parthasarathy Ranganathan +, Steven K. Reinhardt* †, Thomas F. Wenisch* June 23, 2009 Disaggregated Memory.
U NIVERSITY OF M ASSACHUSETTS, A MHERST Department of Computer Science Virtualization in Data Centers Prashant Shenoy
1 Lecture 11: Large Cache Design Topics: large cache basics and… An Adaptive, Non-Uniform Cache Structure for Wire-Dominated On-Chip Caches, Kim et al.,
European Organization for Nuclear Research Virtualization Review and Discussion Omer Khalid 17 th June 2010.
Memory Allocation via Graph Coloring using Scratchpad Memory
Networking Virtualization Using FPGAs Russell Tessier, Deepak Unnikrishnan, Dong Yin, and Lixin Gao Reconfigurable Computing Group Department of Electrical.
Dual Stack Virtualization: Consolidating HPC and commodity workloads in the cloud Brian Kocoloski, Jiannan Ouyang, Jack Lange University of Pittsburgh.
2017/4/21 Towards Full Virtualization of Heterogeneous Noc-based Multicore Embedded Architecture 2012 IEEE 15th International Conference on Computational.
Microkernels, virtualization, exokernels Tutorial 1 – CSC469.
Jakub Szefer, Eric Keller, Ruby B. Lee Jennifer Rexford Princeton University CCS October, 2011 報告人:張逸文.
KVM/ARM: The Design and Implementation of the Linux ARM Hypervisor Christoffer Dall Department of Computer Science Columbia University
Improving Network I/O Virtualization for Cloud Computing.
KVM/ARM: The Design and Implementation of the Linux ARM Hypervisor Christoffer Dall Department of Computer Science Columbia University
Politecnico di Torino Dipartimento di Automatica ed Informatica TORSEC Group Performance of Xen’s Secured Virtual Networks Emanuele Cesena Paolo Carlo.
Ioana Burcea * Stephen Somogyi §, Andreas Moshovos*, Babak Falsafi § # Predictor Virtualization *University of Toronto Canada § Carnegie Mellon University.
COMS E Cloud Computing and Data Center Networking Sambit Sahu
Embedded System Lab. 오명훈 Memory Resource Management in VMware ESX Server Carl A. Waldspurger VMware, Inc. Palo Alto, CA USA
Our work on virtualization Chen Haogang, Wang Xiaolin {hchen, Institute of Network and Information Systems School of Electrical Engineering.
Using Prediction to Accelerate Coherence Protocols Authors : Shubendu S. Mukherjee and Mark D. Hill Proceedings. The 25th Annual International Symposium.
Disco: Running Commodity Operating Systems on Scalable Multiprocessors Edouard et al. Madhura S Rama.
Presented by: Reem Alshahrani. Outlines What is Virtualization Virtual environment components Advantages Security Challenges in virtualized environments.
“Trusted Passages”: Meeting Trust Needs of Distributed Applications Mustaque Ahamad, Greg Eisenhauer, Jiantao Kong, Wenke Lee, Bryan Payne and Karsten.
Managing Distributed, Shared L2 Caches through OS-Level Page Allocation Jason Bosko March 5 th, 2008 Based on “Managing Distributed, Shared L2 Caches through.
1 Lecture 13: Cache, TLB, VM Today: large caches, virtual memory, TLB (Sections 2.4, B.4, B.5)
Disco : Running commodity operating system on scalable multiprocessor Edouard et al. Presented by Vidhya Sivasankaran.
Memory System Performance in a NUMA Multicore Multiprocessor Zoltan Majo and Thomas R. Gross Department of Computer Science ETH Zurich 1.
Virtual Hierarchies to Support Server Consolidation Mike Marty Mark Hill University of Wisconsin-Madison ISCA 2007.
Next Generation Operating Systems Zeljko Susnjar, Cisco CTG June 2015.
Authors – Jeahyuk huh, Doug Burger, and Stephen W.Keckler Presenter – Sushma Myneni Exploring the Design Space of Future CMPs.
1 Soft Timers: Efficient Microsecond Software Timer Support For Network Processing Mohit Aron and Peter Druschel Rice University Presented By Oindrila.
Full and Para Virtualization
CMP/CMT Scaling of SPECjbb2005 on UltraSPARC T1 (Niagara) Dimitris Kaseridis and Lizy K. John The University of Texas at Austin Laboratory for Computer.
Technical Reading Report Virtual Power: Coordinated Power Management in Virtualized Enterprise Environment Paper by: Ripal Nathuji & Karsten Schwan from.
Protection of Processes Security and privacy of data is challenging currently. Protecting information – Not limited to hardware. – Depends on innovation.
컴퓨터교육과 이상욱 Published in: COMPUTER ARCHITECTURE LETTERS (VOL. 10, NO. 1) Issue Date: JANUARY-JUNE 2011 Publisher: IEEE Authors: Omer Khan (Massachusetts.
Cache Miss-Aware Dynamic Stack Allocation Authors: S. Jang. et al. Conference: International Symposium on Circuits and Systems (ISCAS), 2007 Presenter:
7/2/20161 Re-architecting VMMs for Multicore Systems: The Sidecore Approach Presented by: Sanjay Kumar PhD Candidate, Georgia Institute of Technology Co-Authors:
An Adaptive Cache Coherence Protocol Optimized for Producer-Consumer Sharing Liquin Cheng, John B. Carter and Donglai Dai cs.utah.edu by Evangelos Vlachos.
vCAT: Dynamic Cache Management using CAT Virtualization
Presented by Yoon-Soo Lee
Understanding Latency Variation in Modern DRAM Chips Experimental Characterization, Analysis, and Optimization Kevin Chang Abhijith Kashyap, Hasan Hassan,
Moinuddin K. Qureshi ECE, Georgia Tech Gabriel H. Loh, AMD
Bank-aware Dynamic Cache Partitioning for Multicore Architectures
Comparison of the Three CPU Schedulers in Xen
Milad Hashemi, Onur Mutlu, Yale N. Patt
CANDY: Enabling Coherent DRAM Caches for Multi-node Systems
Co-designed Virtual Machines for Reliable Computer Systems
Presentation transcript:

Min Lee, Vishal Gupta, Karsten Schwan Software-Controlled Transparent Management of Heterogeneous Memory Resources in Virtualized Systems Min Lee, Vishal Gupta, Karsten Schwan College of Computing Georgia Tech, Atlanta

3D-Stacked Memory Architectures for Multi-core Processors [ISCA’08]

Heterogeneous Memory Capacity constraint on die-stacked memory  Cores Stacked On-chip Memory Off-chip Memory (DDR) Capacity constraint on die-stacked memory Heterogeneous memory organization consisting of on-chip and off-chip memory

Usage Models Large Cache: hardware managed large last-level-cache (L4) Heterogeneous Memory: system-visible heterogeneous memory (our focus)

Why not H/W cache? Advantages: Quickly react to changing access patterns Supports legacy software Challenges: Overhead of tags, requiring significant on-chip resources Extending coherency protocols (design challenges and latency penalty) Diminishing returns No higher-level information

Challenges in Virtualized Systems Managing Heterogeneity: How to handle memory allocation? Access tracking: How to track VM memory activity in the hypervisor? Transparent Page migration: How to implement page migrations without guest VM involvement?

Outline Motivation Memory management mechanisms Experimental Evaluation Conclusions & Future Work

Basic idea Detect Hot pages and put them in the stacked memory Implemented in Xen hypervisor, a popular open-source hypervisor Heterogeneity management is transparent

Memory Access Tracking Use page-table access-bits to build access-bit history Scan page tables every 100 ms and maintain 32-bit access-bit history

Hot Page Management

Reverse-maps for remapping shared pages Handling Shared Pages Reverse-maps for remapping shared pages

Transparent Page Migration Mirror page tables for transparent page-remapping

Accounting in virtual-time instead of real-time for higher accuracy Virtual Time Tracking Accounting in virtual-time instead of real-time for higher accuracy 3 events: VM switch Process switch Timer tick

Memory Allocation Policies Share-based allocation: allocation stacked memory using administrator-defined weights WSS-based allocation: use dynamically determined working-set sizes as the weights for allocation

Experimental Evaluation

Emulation Platform A dual-socket Westmere platform, with memory throttling to emulate heterogeneous memory Socket 0 (8 – cores) Socket 1 (8 – cores) Throttled DRAM DRAM

Impact of Memory Throttling Emulating different memory configurations

Working-set Size Tracking Diverse WSS across applications with time-changing behavior requiring dynamic management

Memory Access Patterns Diverse memory access patterns

Significant performance impact for several applications Impact of Memory Speed Significant performance impact for several applications

Micro-benchmark Results Benchmark: Random access to large area of memory (100MB) All memory initially allocated from slow memory Significant decrease in memory access latency with page-migrations enabled

Conclusions A case for managing heterogeneous memory resources in software Hypervisor-level mechanisms for transparent memory management Future Work Various optimization to reduce page-migration costs Implementation of memory allocation policies

Thank you. Questions or comments?