Virtual Hierarchies to Support Server Consolidation Mike Marty Mark Hill University of Wisconsin-Madison ISCA 2007.

Presentation transcript:

Virtual Hierarchies to Support Server Consolidation Mike Marty Mark Hill University of Wisconsin-Madison ISCA 2007

Mike Marty, University of Wisconsin Virtual Hierarchies to Support Server Consolidation
Motivation: Server Consolidation
[Figure: 64-core CMP; each tile has a Core, L1, and L2 Cache. VMs shown: www server, database server #1, database server #2, middleware server #1]

Motivation: Server Consolidation
[Figure builds: the same 64-core CMP hosting www server, database server #1, database server #2, and middleware server #1, with one memory system goal highlighted per build]
Optimize Performance (data label shown on the figure)
Isolate Performance
Dynamic Partitioning
Inter-VM Sharing (data label shown on the figure)
  VMware's Content-based Page Sharing: up to 60% reduced memory

Executive Summary
Motivation: Server Consolidation
  Many-core CMPs increase opportunities
Goals of Memory System:
  Performance
  Performance Isolation between VMs
  Dynamic Partitioning (VM Reassignment)
  Support Inter-VM Sharing
  Hypervisor/OS Simplicity
Proposed Solution: Virtual Hierarchy
  Overlay 2-level hierarchy on physically-flat CMP
  Harmonize with VM Assignment

Outline
Motivation
  Server consolidation
  Memory system goals
  Non-hierarchical approaches
Virtual Hierarchies
Evaluation
Conclusion

TAG-DIRECTORY
[Figure: CMP with a duplicate tag directory; a core issues Read A for block A]

TAG-DIRECTORY
[Figure: Read A resolved through the duplicate tag directory in three numbered steps; message labels: getM A, fwd, data]

STATIC-BANK-DIRECTORY
[Figure: Read A for block A resolved in three numbered steps; message labels: getM A, fwd, data]

STATIC-BANK-DIRECTORY with hypervisor-managed cache
[Figure: the same three-step Read A flow; message labels: getM A, fwd, data]

Goals
[Table: memory system goals compared across the non-hierarchical protocols]
Goals: Optimize Performance; Isolate Performance; Allow Dynamic Partitioning; Support Inter-VM Sharing; Hypervisor/OS Simplicity
Protocols: TAG-DIRECTORY; STATIC-BANK-DIRECTORY; STATIC-BANK-DIRECTORY w/ hypervisor-managed cache
Cell values as extracted: Yes, ?, Yes, No, Yes; No, Yes

Outline
Motivation
Virtual Hierarchies
Evaluation
Conclusion

Virtual Hierarchies
Key Idea: Overlay 2-level Coherence Hierarchy on CMP
- First level harmonizes with VM/Workload
- Second level allows inter-VM sharing, migration, reconfig

VH: First-Level Protocol
Intra-VM Directory Protocol w/ interleaved directories
Questions:
  How to name directories?
    Dynamic home tile selected by VM Config Table
    Hardware VM Config Table at each tile
    Set by hypervisor during scheduling
  How to name sharers?
    Full bit-vector to track any possible sharer
    Intra-VM broadcast also possible
[Figure labels: getM, INV]

VH: First-Level Protocol
Example:
[Figure: a tile (Core, L1, L2 Cache) with its per-tile, dynamic VM Config Table holding entries p12, p13, p14; an address field labeled "offset 6" indexes the table; Home Tile: p14]
Hypervisor/OS can freely change VM Config Table
  No cache flushes
  No atomic updates
  No explicit movement of directory state
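The home-tile lookup on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the 64-entry table size, the 64-byte block size, the choice of index bits, and the round-robin fill are not given on the slide, and the helper names `build_vm_config_table` and `home_tile` are hypothetical.

```python
# Minimal sketch of a per-tile VM Config Table lookup.
# Assumptions (not from the slides): 64 table entries, 64-byte blocks,
# index taken from the address bits just above the block offset.
TABLE_ENTRIES = 64
BLOCK_OFFSET_BITS = 6


def build_vm_config_table(vm_tiles, entries=TABLE_ENTRIES):
    """Hypervisor side: fill the table round-robin with the VM's tiles."""
    return [vm_tiles[i % len(vm_tiles)] for i in range(entries)]


def home_tile(address, table):
    """Hardware side: pick the dynamic home tile for a block address."""
    index = (address >> BLOCK_OFFSET_BITS) % len(table)
    return table[index]


# A VM scheduled on tiles p12, p13, and p14.
table = build_vm_config_table(["p12", "p13", "p14"])
address = 2 << BLOCK_OFFSET_BITS      # block whose index field is 2
print(home_tile(address, table))

# Rescheduling the VM only rewrites the table; the lookup logic is unchanged.
table = build_vm_config_table(["p20", "p21"])
print(home_tile(address, table))
```

Because the home tile is a pure function of the address and the table contents, changing the VM-to-tile assignment is a table update; correctness during the change is what the second-level protocol is there to provide.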

Virtual Hierarchies
Two Solutions for Global Coherence: VH_A and VH_B
[Figure: CMP with memory controller(s)]

Protocol VH_A
Directory as Second-level Protocol
Any tile can act as first-level directory
How to track and name first-level directories?
  Full bit-vector of sharers to name any tile
  State stored in DRAM
  Possibly cache on-chip
+ Maximum flexibility
- DRAM State
- Complexity
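A full bit-vector of sharers can be sketched as one bit per tile. The class below is an illustrative model only; the 64-tile count follows the 64-core CMP used in the talk, while the class and method names are hypothetical and no claim is made about the actual directory entry layout.

```python
# Minimal sketch: a directory entry that names any tile with a full bit-vector.
NUM_TILES = 64  # one bit per tile of the 64-core CMP


class DirectoryEntry:
    def __init__(self):
        self.sharers = 0  # bit t is set when tile t may hold the block

    @staticmethod
    def _check(tile):
        if not 0 <= tile < NUM_TILES:
            raise ValueError("tile index out of range")

    def add_sharer(self, tile):
        self._check(tile)
        self.sharers |= 1 << tile

    def remove_sharer(self, tile):
        self._check(tile)
        self.sharers &= ~(1 << tile)

    def sharer_tiles(self):
        """List the tiles that would receive a forward or invalidation."""
        return [t for t in range(NUM_TILES) if (self.sharers >> t) & 1]


entry = DirectoryEntry()
for tile in (12, 14, 63):
    entry.add_sharer(tile)
entry.remove_sharer(12)
print(entry.sharer_tiles())
```

The cost side of the trade-off is visible here: every tracked block needs NUM_TILES bits of state, which is the DRAM state listed as a drawback.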

VH_A Example
[Figure: directory/memory controller and tiles holding block A; six numbered steps; message labels: getM A, Fwd, data]

Protocol VH_B
Broadcast as Second-level Protocol
Attach token count for each block [token coherence]
  T tokens for each block. One token to read, all to write
  Allows 1 bit at memory per block
  Eliminates system-wide ACK
+ Minimal DRAM State
+ Enables easier optimizations
- Global coherence requires more activity
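The token-counting rule (T tokens per block, one to read, all to write) can be sketched as follows. This is a simplified model: it ignores data validity and the owner token used in full token coherence, the value T = 64 is an assumption, and the `TokenBlock` class is hypothetical.

```python
# Minimal sketch of the token-counting rule for a single block.
T = 64  # assumption: total tokens per block (for example, one per tile)


class TokenBlock:
    """Tracks which holder has how many tokens of one cache block."""

    def __init__(self, total=T):
        self.total = total
        self.held = {"memory": total}  # all tokens start at memory

    def transfer(self, src, dst, count):
        """Move tokens between holders; tokens are never created or destroyed."""
        if count <= 0 or self.held.get(src, 0) < count:
            raise ValueError("source does not hold enough tokens")
        self.held[src] -= count
        self.held[dst] = self.held.get(dst, 0) + count

    def can_read(self, holder):
        return self.held.get(holder, 0) >= 1

    def can_write(self, holder):
        return self.held.get(holder, 0) == self.total


block = TokenBlock()
block.transfer("memory", "tile3", 1)
block.transfer("memory", "tile7", 1)
readers_ok = block.can_read("tile3") and block.can_read("tile7")
writer_ok_before = block.can_write("tile7")

# To write, tile7 must collect every token, including tile3's.
block.transfer("tile3", "tile7", 1)
block.transfer("memory", "tile7", T - 2)
writer_ok_after = block.can_write("tile7")
print(readers_ok, writer_ok_before, writer_ok_after)
```

Since a writer holding all T tokens knows no other copy can be readable, no separate system-wide acknowledgment round is needed in this model.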

VH_B Example
[Figure: memory controller and tiles holding block A; message labels: getM A, global getM A, Data+tokens (step 4)]

Memory System Goals
  Optimize Performance
  Isolate Performance
  Allow Dynamic Partitioning
  Support Inter-VM Sharing
  Hypervisor/OS Simplicity
VH_A and VH_B: Yes

Outline
Motivation
Virtual Hierarchies
Evaluation
Conclusion

Evaluation: Methods
Wisconsin GEMS
  Full-system, execution-driven simulation
  Based on Virtutech Simics
64-core tiled CMP
  In-order SPARC cores
  512 KB, 16-way L2 cache per tile
  2D mesh interconnect, 16-byte links, 5-cycle link latency
  Four on-chip memory controllers, 275-cycle DRAM latency

Evaluation: Methods
Workloads: OLTP, SpecJBB, Apache, Zeus
  Separate instance of Solaris for each VM
Approximating Virtualization
  Multiple Simics checkpoints interleaved onto CMP
  Assume workloads map to adjacent cores
  Bottom line: No hypervisor simulated

Evaluation: Protocols
TAG-DIRECTORY: 3-cycle central tag directory (1024 ways!)
STATIC-BANK-DIRECTORY: Home tiles interleaved by frame address
VH_A: All data allocates in L2 bank of dynamic home tile
VH_B: Unshared data always allocates in local L2
All Protocols: one L2 copy of block on CMP

Result: Runtime
[Figure: normalized runtime for OLTP, Apache, Zeus, and SpecJBB under TAG-DIR, STATIC-BANK-DIR, VH_A, and VH_B]
Eight VMs x Eight Cores Each = 64 Cores (e.g. eight instances of Apache)

Result: Memory Stall Cycles
[Figure: memory stall cycles for OLTP, Apache, Zeus, and SpecJBB under TAG-DIR, STATIC-BANK-DIR, VH_A, and VH_B]
Eight VMs x Eight Cores Each = 64 Cores (e.g. eight instances of Apache)

Executive Summary
Server Consolidation an Emerging Workload
Goals of Memory System:
  Performance
  Performance Isolation between VMs
  Dynamic Partitioning (VM Reassignment)
  Support Inter-VM Sharing
  Hypervisor/OS Simplicity
Proposed Solution: Virtual Hierarchy
  Overlay 2-level hierarchy on physically-flat CMP
  Harmonize with Workload Assignment

Backup Slides

Omitting 2nd-Level Coherence: Protocol VH_0
Impacts:
  Dynamic Partitioning
  Inter-VM Sharing (VMware's Content-based Page Sharing)
  Hypervisor/OS complexity
Example: Steps for VM Migration from Tiles {M} to {N}
  1. Stop all threads on {M}
  2. Flush {M} caches
  3. Update {N} VM Config Tables
  4. Start threads on {N}
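The four migration steps can be sketched as a hypervisor-side routine. Everything here is a toy model under stated assumptions: the `Tile` class, its fields, the 64-entry table, and the round-robin fill are hypothetical and only illustrate the ordering of the steps when no second-level coherence is available.

```python
# Minimal sketch of VM migration from tiles {M} to tiles {N}
# when there is no second-level coherence to fall back on.
class Tile:
    def __init__(self, name):
        self.name = name
        self.running = False
        self.cache = {}            # block address -> data (toy model)
        self.vm_config_table = []


def migrate_vm(old_tiles, new_tiles, memory, entries=64):
    # 1. Stop all threads on {M}.
    for tile in old_tiles:
        tile.running = False
    # 2. Flush {M} caches so memory holds the latest data.
    for tile in old_tiles:
        memory.update(tile.cache)
        tile.cache.clear()
    # 3. Update {N} VM Config Tables (round-robin fill is an assumption).
    names = [tile.name for tile in new_tiles]
    for tile in new_tiles:
        tile.vm_config_table = [names[i % len(names)] for i in range(entries)]
    # 4. Start threads on {N}.
    for tile in new_tiles:
        tile.running = True


memory = {}
m_tiles = [Tile("p12"), Tile("p13")]
n_tiles = [Tile("p40"), Tile("p41")]
for tile in m_tiles:
    tile.running = True
m_tiles[0].cache[0x1000] = "dirty data"

migrate_vm(m_tiles, n_tiles, memory)
print(memory[0x1000], [t.running for t in n_tiles])
```

The stop and flush steps are the cost being highlighted: the VM is paused while its cached state is written back.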

Omitting 2nd-Level Coherence: Protocol VH_0
Example: Inter-VM Content-based Page Sharing
  Up to 60% reduced memory demand
Is read-only sharing possible with VH_0?
VMware's Implementation:
  Global hash table to store hashes of pages
  Guest pages scanned by VMM, hashes computed
  Full comparison of pages on hash match
Potential VH_0 Implementation:
  How does hypervisor scan guest pages? Are they modified in cache?
  Even read-only pages must initially be written at some point
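The three implementation points listed for content-based page sharing (global hash table, scan and hash, full comparison on a hash match) can be sketched as below. This is an illustration only: the use of SHA-1, the function name `share_pages`, and the in-memory page dictionary are assumptions and do not describe VMware's actual code.

```python
# Minimal sketch of content-based page sharing:
# hash every scanned page, and on a hash match do a full comparison
# before mapping the page onto an existing identical copy.
import hashlib


def share_pages(pages):
    """pages: dict of page id -> bytes. Returns page id -> canonical page id."""
    hash_table = {}   # global table: digest -> canonical page ids
    mapping = {}
    for page_id, content in pages.items():
        digest = hashlib.sha1(content).digest()  # hash choice is an assumption
        candidates = hash_table.setdefault(digest, [])
        for canonical in candidates:
            if pages[canonical] == content:      # full comparison on hash match
                mapping[page_id] = canonical
                break
        else:
            candidates.append(page_id)           # first page with this content
            mapping[page_id] = page_id
    return mapping


pages = {
    "vm1:page0": b"\x00" * 4096,
    "vm2:page7": b"\x00" * 4096,
    "vm2:page8": b"\x01" * 4096,
}
mapping = share_pages(pages)
shared = sum(1 for page, canon in mapping.items() if page != canon)
print(mapping, shared)
```

The scan reads guest page contents from memory, which is why the slide asks whether those pages might still be modified in a cache that the hypervisor cannot see coherently.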

Physical Hierarchy / Clusters
[Figure: clusters of processors (P) with L1 caches (L1 $), each cluster attached to a Shared L2]

Physical Hierarchy / Clusters
[Figure: the same clustered organization with VMs mapped onto it: www server, database server #1, middleware server #1]