Minimal-overhead Virtualization of a Large Scale Supercomputer
John R. Lange, Kevin Pedretti, Peter Dinda, Chang Bae, Patrick Bridges, Philip Soltero, Alexander Merritt
University of Pittsburgh, Northwestern University, Sandia National Labs, University of New Mexico

Summary
Palacios
– First VMM for scalable HPC
– Open source and available
Kitten
– First open source Lightweight Kernel (LWK) for High Performance Computing (HPC)
– Open source and available
Palacios: A New Open Source Virtual Machine Monitor for Scalable High Performance Computing, Lange, et al (IPDPS 2010)
HPC virtualization at scale
– Performance within 3% of native
– Large scale study of virtualization (4096 nodes)

Outline
Palacios and Kitten
– VMM/OS for HPC virtualization
Large scale test
– Parallel apps running on supercomputer
Minimal overhead techniques
– Passthrough I/O
– Virtual Paging
– Controlled Preemption

Virtualization in HPC
Virtualization benefits applied to HPC
– Fault tolerance
– Broader usage for legacy applications
– Testbeds for future exascale systems
DOE X-Stack project to deploy virtualization on future exascale systems
– UNM, NWU, Pitt, SNL, ORNL
Only if it doesn’t degrade performance…
– Tightly coupled parallel applications
– Petascale and soon exascale

Palacios VMM
OS-independent embeddable virtual machine monitor
Open source and freely available
Virtualization layer for Kitten
– Lightweight supercomputing OS from Sandia National Labs
Successfully used on supercomputers, clusters (Infiniband and Ethernet), and servers

Kitten: An Open Source LWK
Better match for user expectations
– Provides mostly Linux-compatible user environment
    Including threading
– Supports unmodified compiler toolchains and ELF executables
Better match for vendor expectations
– Modern code-base with familiar Linux-like organization
Drop-in compatible with Linux
– Infiniband support

HPC Performance Evaluation
Virtualization is useful for HPC, but…
Only if it doesn’t hurt performance
Virtualized Red Storm with Palacios
– Evaluated with Sandia’s system evaluation benchmarks
Cray XT cores, ~3500 sq ft, 2.5 megawatts, $90 million

Scalability at Large Scale (Weak Scaling)
Catamount guest OS
CTH: multi-material, large deformation, strong shockwave simulation
Within 3% of native
Scalable

Minimal Overhead Virtualization
Passthrough I/O
– Direct I/O access with no virtualization overheads
Optimized virtual paging
– Nested and shadow paging optimizations
Controlled Preemption
– Host OS noise minimization
– Characterizing application sensitivity to OS interference using kernel-level noise injection, Ferreira, et al (Supercomputing 2008)

Passthrough I/O
I/O virtualization significantly degrades performance
Mitigated by hardware support
– SR-IOV/IOMMUs
In HPC we can do better
– Passthrough I/O without any translation overhead

Passthrough I/O architecture
Diagram labels: Host Memory, Guest Memory, PCI DEV, Guest Offset
DMA_Address = Guest_DMA_Address + Guest_Offset
if (DMA_Address > (guest_memory_size + Guest_Offset)) { //error }
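The offset translation on this slide can be sketched in C as follows. This is a minimal illustration, not the Palacios code: the `guest_region` struct, the `guest_dma_to_host` function, and the `len` parameter are hypothetical names introduced here. The sketch also assumes the guest occupies one contiguous host region, and it checks the whole transfer rather than only the start address, which is stricter than the check written on the slide.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one guest's contiguous region in host memory. */
struct guest_region {
    uint64_t guest_offset;      /* host physical address where guest memory starts */
    uint64_t guest_memory_size; /* size of the guest's memory in bytes */
};

/*
 * Translate a guest DMA address into a host DMA address by adding the
 * guest's offset. Returns false, leaving *host_dma untouched, if any byte
 * of the transfer [guest_dma, guest_dma + len) lies outside guest memory.
 */
bool guest_dma_to_host(const struct guest_region *r, uint64_t guest_dma,
                       uint64_t len, uint64_t *host_dma)
{
    /* Reject empty transfers and transfers larger than the guest itself. */
    if (len == 0 || len > r->guest_memory_size)
        return false;

    /* Written as a subtraction so guest_dma + len cannot overflow. */
    if (guest_dma > r->guest_memory_size - len)
        return false;

    *host_dma = guest_dma + r->guest_offset;
    return true;
}
```

A guest driver would call a helper like this wherever it programs a device with a DMA address, which matches the idea of keeping the address calculation in one place in the guest OS.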

Trust
HPC environments run trusted software stacks
– Can rely on guest/VMM cooperation
Guest directly controls DMA operations
– But sets DMA addresses cooperatively with VMM
– The VMM trusts the guest to do DMA correctly
DMA address calculations are centralized in guest OS
– Linux DMA modifications: 20 lines of code

Infiniband on Commodity Linux
Figure: 2-node Infiniband ping-pong bandwidth measurement (Linux guest on IB cluster)

Interrupt Overheads
Figure: MPI ping-pong latency (panels: Polling, Interrupt Driven)

Virtualized Paging
Figure: HPCCG (conjugate gradient solver) with shadow paging (panels: Catamount, Compute Node Linux)
Lange, et al (IPDPS 2010)

Virtual Paging mechanisms
Nested Paging
– No paging exits
– More TLB misses
– Good: concentrated access patterns
– Bad: random access patterns
Shadow Paging
– More paging exits
– Better TLB behavior
– Good: infrequent page table modifications
– Bad: frequent context switches

Improving Nested Paging
Palacios + Kitten makes large pages trivial
Palacios preallocates guest in contiguous host memory
– Kitten ensures large page alignment
Figure panels: Stream, Random Access
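Why contiguity and alignment matter can be shown with a little address arithmetic. The sketch below assumes a 2 MiB large page size (the slide does not state one), and the names `LARGE_PAGE_SIZE`, `large_page_align_up`, and `large_pages_in_region` are hypothetical. It counts how many whole large pages can back a contiguous host region.

```c
#include <stdint.h>

/* Assumed large page size; the slides do not name a specific size. */
#define LARGE_PAGE_SIZE (2ULL * 1024 * 1024)

/* Round addr up to the next large-page boundary. */
uint64_t large_page_align_up(uint64_t addr)
{
    return (addr + LARGE_PAGE_SIZE - 1) & ~(LARGE_PAGE_SIZE - 1);
}

/*
 * Count the whole large pages that fit inside the contiguous region
 * [base, base + size). Bytes before the first aligned boundary and after
 * the last one would have to be mapped with small pages instead.
 */
uint64_t large_pages_in_region(uint64_t base, uint64_t size)
{
    uint64_t start = large_page_align_up(base);
    uint64_t end = (base + size) & ~(LARGE_PAGE_SIZE - 1);

    return end > start ? (end - start) / LARGE_PAGE_SIZE : 0;
}
```

Under these assumptions, a 1 GiB guest that starts on a 2 MiB boundary is covered by 512 large pages, while the same guest shifted by 1 MiB gets only 511 and needs small pages at both ends.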

Selective Virtual Paging
Nested paging does better…
– But shadow paging still performs better with 4KB guest pages
Still need to selectively choose paging approach
Figure panels: Stream, Random Access
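One way to picture a selective choice is a small decision function driven by hints from the guest. This is only a sketch of the trade-offs listed on the paging slides, not the policy Palacios uses: the `guest_hints` struct, its fields, and `choose_paging_mode` are all hypothetical, and the ordering of the checks is an assumption.

```c
#include <stdbool.h>

enum paging_mode { PAGING_NESTED, PAGING_SHADOW };

/* Hypothetical hints a cooperating guest could report to the VMM. */
struct guest_hints {
    bool uses_large_pages;            /* guest maps its memory with large pages */
    bool frequent_context_switches;   /* guest switches address spaces often */
    bool frequent_page_table_updates; /* guest modifies page tables often */
};

/* Pick a paging mode from the trade-offs listed on the slides. */
enum paging_mode choose_paging_mode(const struct guest_hints *h)
{
    /* Shadow paging takes exits on page table changes and context
     * switches, so a guest that does either often is steered to nested. */
    if (h->frequent_context_switches || h->frequent_page_table_updates)
        return PAGING_NESTED;

    /* Per the slide, shadow paging still performs better with 4KB guest pages. */
    if (!h->uses_large_pages)
        return PAGING_SHADOW;

    /* Large pages reduce nested paging's TLB miss cost. */
    return PAGING_NESTED;
}
```

A policy like this depends on the guest reporting its own behavior, which is the kind of guest/VMM cooperation the later slides argue for.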

Controlled Preemption
OS noise generates a large performance penalty at scale
– Timers, competing kernel threads, etc.
– 2.5% overhead leads to order of magnitude application performance drop (Ferreira et al, Supercomputing 2008)
Palacios/Kitten allow per-guest control over scheduling
– VM only yields when appropriate
10x reduction in host overhead compared to minimal configuration of KVM/Linux

Summary
Virtualization can scale
– Near native performance for optimized VMM/guest
VMM and guests need to cooperate
– Bidirectional information sharing is necessary
Symbiotic Virtualization
– A virtual machine interface designed for guest/VMM cooperation
– 2 components
    Guest OS provides internal state to VMM
    Guest OS services requests from VMM
– Interfaces are optional

Conclusion Palacios: V3VEE Project: Kitten:

Symbiotic Virtualization in HPC
HPC environments are well suited to symbiotic techniques
Full trust of the software stack
– Fewer security concerns
Specific hardware configurations
– Limited number of devices
Environments are much smaller
– Internal OS state is simpler than a general purpose OS
At large scale performance impact is dramatic
– Large impetus to optimize VMM and OS

Summary
Virtualization can scale
– Near native performance for optimized VMM/guest
VMM needs to know about guest internals
– Should modify behavior for each guest environment
– Example: Paging method to use depends on guest
Black box inference is not desirable in HPC environment
– Unacceptable performance overhead
– Convergence time
– Mistakes have large consequences
Need guest cooperation
– Guest and VMM relationship should be symbiotic