Kinshuk Govil, Dan Teodosiu*, Yongqiang Huang, and Mendel Rosenblum


Cellular Disco: Resource management using virtual clusters on shared-memory multiprocessors
Kinshuk Govil, Dan Teodosiu*, Yongqiang Huang, and Mendel Rosenblum
Computer Systems Laboratory, Stanford University
* Xift, Inc., Palo Alto, CA
www-flash.stanford.edu

Motivation
- Why buy a large shared-memory machine? Performance, flexibility, manageability, show-off
- These machines are not being used at their full potential:
  - Operating system scalability bottlenecks
  - No fault containment support
  - Lack of scalable resource management
- Operating systems are too large to adapt

Previous approaches
- Operating system: Hive, SGI IRIX 6.4, 6.5
  - Knowledge of application resource needs
  - Huge implementation cost (a few million lines)
- Hardware: static and dynamic partitioning
  - Cluster-like (fault containment)
  - Inefficient: granularity, OS changes, large applications
- Virtual machine monitor: Disco
  - Low implementation cost (13K lines of code)
  - Cost of virtualization

Questions
- Can virtualization overhead be kept low? Usually within 10%
- Can fault containment overhead be kept low? In the noise
- Can a virtual machine monitor manage resources as well as an operating system? Yes

Overview of virtual machines
- Introduced by IBM in the 1960s
- Trap privileged instructions
- Physical-to-machine address mapping
- No or minor OS modifications
[Diagram: applications and guest OSes run inside virtual machines on top of the virtual machine monitor, which runs on the hardware]
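The physical-to-machine mapping mentioned above can be illustrated with a minimal sketch (all names here are hypothetical, not Cellular Disco's actual structures): the guest OS hands out "physical" pages, while the monitor transparently backs them with machine pages that it remains free to remap.

```python
class VmmMemoryMap:
    """Sketch of a monitor's physical-to-machine page mapping
    (hypothetical names, not Cellular Disco's actual structures)."""

    def __init__(self):
        self.pmap = {}             # guest "physical" page -> machine page
        self.next_machine_page = 0

    def machine_page_for(self, phys_page):
        # Allocate lazily on first touch, as a monitor would on a fault.
        if phys_page not in self.pmap:
            self.pmap[phys_page] = self.next_machine_page
            self.next_machine_page += 1
        return self.pmap[phys_page]

    def remap(self, phys_page, new_machine_page):
        # The monitor may change the backing machine page (e.g. to move
        # memory between NUMA nodes) without the guest OS noticing.
        self.pmap[phys_page] = new_machine_page
```

Because the guest only ever sees physical page numbers, the extra level of indirection is what lets a monitor migrate, share, or rebalance memory with no OS modifications.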

Avoiding OS scalability bottlenecks
[Diagram: many virtual machines, each running an application on its own OS, on top of Cellular Disco, which runs on a 32-processor SGI Origin 2000 connected by the interconnect]

Experimental setup
- IRIX 6.2 on Cellular Disco on a 32-processor Origin 2000, vs. IRIX directly on a 32-processor Origin 2000
- Workloads:
  - Informix TPC-D (decision-support database)
  - Kernel build (parallel compilation of IRIX 5.3)
  - Raytrace (from the Stanford SPLASH suite)
  - SpecWEB (Apache web server)

MP virtualization overheads
[Chart: per-workload overheads of +20%, +10%, +1%, and +4%]
- Worst-case uniprocessor overhead only 9%

Fault containment
[Diagram: a hardware fault in one cell takes down only the virtual machines in that cell; the rest of the system keeps running]
- Requires hardware support, as designed in the FLASH multiprocessor

Fault containment overhead
[Chart: per-workload overheads of 0%, +1%, -2%, +1%, and +1%]
- 1000 fault-injection experiments (SimOS): 100% success

Resource management challenges
- Conflicting constraints: fault containment vs. resource load balancing
- Scalability: decentralized control
- Migrate VMs without OS support

CPU load balancing
[Diagram: Cellular Disco migrating virtual machines between CPUs across the interconnect]

Idle balancer (local view)
- Check neighboring run queues (intra-cell only)
- VCPU migration cost: 37 µs to 1.5 ms
- Cache and node memory affinity: > 8 ms, so back off
- Fast and local
[Diagram: VCPUs A0, A1, B0, B1 queued across CPUs 0-3]
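The intra-cell stealing rule on this slide might look like the following sketch (hypothetical names and data layout; the real balancer's policy is more involved): an idle CPU scans its neighbors' run queues but only takes a VCPU from a CPU in its own cell, so the migration never crosses a fault-containment boundary.

```python
def steal_from_neighbors(cpu, runqueues, cell_of):
    """Idle-balancer sketch: `runqueues[i]` is CPU i's list of runnable
    VCPUs and `cell_of[i]` its cell (hypothetical representation)."""
    for neighbor in (cpu - 1, cpu + 1):
        # Only look at valid neighbors inside the same cell.
        if 0 <= neighbor < len(runqueues) and cell_of[neighbor] == cell_of[cpu]:
            if len(runqueues[neighbor]) > 1:
                return runqueues[neighbor].pop()  # migrate one VCPU
    return None  # nothing worth stealing nearby
```

Checking only adjacent queues keeps the idle path fast and local, matching the slide's emphasis; the backoff for cache and memory affinity would gate how often this scan runs.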

Periodic balancer (global view)
- Check for disparity in the load tree
- Migration decisions weigh cost, affinity loss, and fault dependencies
[Diagram: a load tree over CPUs 0-3 with VCPUs A0, A1, B0, B1; leaf loads 1, 3, 1, 1 sum up the tree, and a fault containment boundary separates the cells]
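The load tree on this slide can be sketched as a binary tree whose leaves are per-CPU run-queue lengths and whose interior nodes sum their children; a disparity check at the root then flags candidate migrations (the threshold and names here are hypothetical, and a power-of-two CPU count is assumed for simplicity).

```python
def build_load_tree(loads):
    # Leaves hold per-CPU run-queue lengths; each interior node sums its
    # two children. Assumes a power-of-two number of CPUs.
    tree = [list(loads)]
    while len(tree[-1]) > 1:
        level = tree[-1]
        tree.append([level[i] + level[i + 1] for i in range(0, len(level), 2)])
    return tree[::-1]  # root level first

def needs_rebalance(tree, threshold=2):
    # A large disparity between the root's two subtrees suggests moving a
    # VCPU, provided the gain outweighs the migration cost, affinity loss,
    # and any new cross-cell fault dependency.
    left, right = tree[1]
    return abs(left - right) >= threshold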

CPU management results
[Chart: overheads of +9% and +0.3%]
- IRIX overhead (13%) is higher

Memory load balancing
[Diagram: Cellular Disco balancing the memory of several VMs across per-cell RAM over the interconnect]

Memory load balancing policy
- Borrow memory before running out
- Allocation preferences for each VM
- Borrow based on:
  - Combined allocation preferences of the VMs
  - Memory availability on other cells
  - Memory usage
- Loan when enough memory is available
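The borrowing decision above could be sketched like this (hypothetical names and threshold, not the paper's exact policy): once local free memory falls below a watermark, a cell picks a lender by combining the resident VMs' allocation preferences with how much memory the candidate cells actually have to spare.

```python
def pick_lender(free_pages, preferences, threshold):
    """Borrowing-policy sketch: `free_pages` maps cell -> free pages,
    `preferences` maps cell -> combined VM allocation preference."""
    # Only cells with enough memory to spare are willing to loan.
    candidates = [c for c, free in free_pages.items() if free > threshold]
    if not candidates:
        return None
    # Prefer the cell the resident VMs favor; break ties by availability.
    return max(candidates, key=lambda c: (preferences.get(c, 0), free_pages[c]))
```

Borrowing proactively, before the cell is actually out of pages, is what keeps an allocation burst from stalling a VM while memory is shuffled between cells.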

Memory management results
- Only +1% overhead
[Diagram: two database VMs on Cellular Disco with 32 CPUs and 3.5 GB, connected by the interconnect]
- Ideally: same run time if memory balancing were perfect

Comparison to related work
- Operating system (IRIX 6.4)
- Hardware partitioning, simulated by disabling inter-cell resource balancing
- Workloads: 16-process Raytrace and TPC-D
[Diagram: Cellular Disco on four 8-CPU cells connected by the interconnect]

Results of comparison
- CPU utilization: 31% (hardware partitioning) vs. 58% (virtual clusters)

Conclusions
- The virtual machine approach adds flexibility to the system at a low development cost
- Virtual clusters address the needs of large shared-memory multiprocessors:
  - Avoid operating system scalability bottlenecks
  - Support fault containment
  - Provide scalable resource management
- Small overheads and low implementation cost