Introduction to the Palacios VMM and the V3Vee Project John R. Lange Department of Computer Science University of Pittsburgh September 28th, 2010
2 Outline V3Vee Project –Multi-institutional collaboration Palacios –The first VMM developed for scalable High Performance Computing (HPC) Largest-scale study of virtualization –Proved HPC virtualization is effective at scale Current Research –Symbiotic Virtualization –Cloud/HPC integration –New system architectures
Virtuoso Project (virtuoso.cs.northwestern.edu) “Infrastructure as a Service” distributed/grid/cloud virtual computing system –Particularly for HPC and multi-VM scalable apps First adaptive virtual computing system –Drives virtualization mechanisms to increase the performance of existing, unmodified apps running in collections of VMs –Focus on wide-area computation 3 R. Figueiredo, P. Dinda, J. Fortes, A Case for Grid Computing on Virtual Machines, Proceedings of the 23rd International Conference on Distributed Computing Systems (ICDCS 2003), May 2003.
4 Virtuoso: Adaptive Virtual Computing Providers sell computational and communication bandwidth Users run collections of virtual machines (VMs) interconnected by overlay networks A replacement for buying machines that continuously adapts to increase the performance of your existing, unmodified applications and operating systems See virtuoso.cs.northwestern.edu for many papers, talks, and movies
V3VEE Overview Goal: Create an original open source virtual machine monitor (VMM) framework for x86/x64 that permits per-use composition of VMMs optimized for… –High Performance Computing –Computer architecture research –Experimental computer systems research –Teaching Community resource development project (NSF CRI) X-Stack exascale systems software research (DOE) 5
Collaborators 6
7 People University of Pittsburgh –Jack Lange, and more… Northwestern University –Jack Lange, Peter Dinda, Russ Joseph, Lei Xia, Chang Bae, Giang Hoang, Fabian Bustamante, Steven Jaconette, Andy Gocke, Mat Wojick, Peter Kamm, Robert Deloatch, Yuan Tang, Steve Chen, Brad Weinberger, Mahdav Suresh, and more… University of New Mexico –Patrick Bridges, Zheng Cui, Philip Soltero, Nathan Graham, Patrick Widener, and more… Sandia National Labs –Kevin Pedretti, Trammell Hudson, Ron Brightwell, and more… Oak Ridge National Lab –Stephen Scott, Geoffroy Vallee, and more…
Why a New VMM? [diagram] Well-served by existing VMMs: the commercial space (cloud / datacenter) and research in systems, cloud, and datacenters. Ill-served: research in computer architecture, research and use in High Performance Computing, teaching, and an open source community with a broad user and developer base.
Why a New VMM? Code Scale –Compact codebase by leveraging hardware virtualization assistance Code decoupling –Make virtualization support self-contained –Linux/Windows kernel experience unneeded Make it possible for researchers and students to come up to speed quickly Freedom: BSD license Composition –Enable raw performance at large scales 10
11 Palacios VMM OS-independent embeddable virtual machine monitor Developed at Northwestern and University of New Mexico –And now University of Pittsburgh Open source and freely available Users: –Kitten: Lightweight supercomputing OS from Sandia National Labs –MINIX 3 –Modified Linux Successfully used on supercomputers, clusters (Infiniband and Ethernet), and servers
Palacios compiles to a static library Compact host OS interface allows Palacios to be embedded in different Host OSes –Palacios adds virtualization support to an OS –Palacios + lightweight kernel = traditional “type-I” “hypervisor” VMM Current embeddings –Kitten, GeekOS, Minix 3, Linux 12 Embeddability
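The embedding works through a small table of host-provided functions that the VMM core calls instead of any particular kernel's API. A minimal sketch in C, using hypothetical names (`vmm_os_hooks`, `vmm_init`, the `host_*` stubs) rather than Palacios' actual interface:

```c
/* Sketch of an embeddable-VMM host interface. All names here are
 * illustrative, not the real Palacios API. */
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

struct vmm_os_hooks {
    void *(*allocate_pages)(int num_pages);   /* host page allocator */
    void  (*free_pages)(void *ptr, int num);  /* matching free       */
    void  (*print)(const char *fmt, ...);     /* host console output */
};

static struct vmm_os_hooks *os_hooks = NULL;

/* The host OS calls this once at init to hand its services to the VMM. */
void vmm_init(struct vmm_os_hooks *hooks) { os_hooks = hooks; }

/* Inside the VMM, every host service goes through the hook table,
 * keeping the core a self-contained static library. */
void *vmm_alloc_pages(int n) { return os_hooks->allocate_pages(n); }

/* --- A trivial "host OS" embedding, for illustration only --- */
static void *host_alloc(int n) { return calloc((size_t)n, 4096); }
static void host_free(void *p, int n) { (void)n; free(p); }
static void host_print(const char *fmt, ...) {
    va_list ap; va_start(ap, fmt); vprintf(fmt, ap); va_end(ap);
}

struct vmm_os_hooks host_hooks = { host_alloc, host_free, host_print };
```

Any kernel that can fill in such a table (Kitten, GeekOS, Minix 3, Linux) becomes a host for the same VMM library.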
13 Kitten: An Open Source LWK Better match for user expectations –Provides mostly Linux-compatible user environment Including threading –Supports unmodified compiler toolchains and ELF executables Better match for vendor expectations –Modern code-base with familiar Linux-like organization Drop-in compatible with Linux –Infiniband support End-goal is deployment on future capability systems
14 Palacios as an HPC VMM Minimalist interface –Suitable for an LWK Compile and runtime configurability –Create a VMM tailored to specific environments Low noise Contiguous memory pre-allocation Passthrough resources and resource partitioning
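Contiguous memory pre-allocation is what makes passthrough memory cheap: guest-physical to host-physical translation reduces to one bounds check and one add, with no page-granular metadata. A sketch under that assumption (names are illustrative):

```c
/* Sketch: with a contiguous pre-allocated guest memory region,
 * gpa -> hpa translation is O(1) — one reason an HPC VMM
 * pre-allocates contiguous memory instead of paging on demand.
 * Names are illustrative. */
#include <stdint.h>

struct guest_mem_region {
    uint64_t host_base;   /* start of the contiguous host allocation */
    uint64_t size;        /* bytes reserved for the guest            */
};

/* Returns the host-physical address, or 0 for an out-of-range gpa. */
uint64_t gpa_to_hpa(const struct guest_mem_region *r, uint64_t gpa) {
    if (gpa >= r->size)
        return 0;
    return r->host_base + gpa;
}
```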
15 A Compact Type-I VMM Xen: 580k lines (50k–80k core) KVM: 50k–60k lines + kernel dependencies (??) + user-level devices (180k)
16 HPC Performance Evaluation Virtualization is very useful for HPC, but… only if it doesn’t hurt performance Virtualized Red Storm with Palacios –Evaluated with Sandia’s system evaluation benchmarks Red Storm: 17th-fastest supercomputer, Cray XT, ~3500 sq ft, 2.5 megawatts, $90 million
17 Scalability at Small Scales (Catamount) HPCCG: conjugate gradient solver Scalable, within 5% of native
18 Large Scale Study Evaluation on full RedStorm system –12 hours of dedicated system time on full machine –Largest virtualization performance scaling study to date Measured performance at exponentially increasing scales –Up to 4096 nodes Publicity –New York Times –Slashdot –HPCWire –Communications of the ACM –PC World
19 Scalability at Large Scale (Catamount) CTH: multi-material, large deformation, strong shockwave simulation Scalable, within 3% of native
20 Infiniband on Commodity Linux 2 node Infiniband Ping Pong bandwidth measurement (Linux guest on IB cluster)
21 Symbiotic Virtualization Virtualization can scale –Near native performance for optimized VMM/guest (within 5%) VMM needs to know about guest internals –Should modify behavior for each guest environment Symbiotic Virtualization –Design both guest OS and VMM to minimize semantic gap –Bidirectional synchronous and asynchronous communication channels –Interfaces are optional and can be dynamically added by VMM
22 Symbiotic Interfaces SymSpy Passive Interface –Internal state already exists but is hidden –Asynchronous bi-directional communication via shared memory –Structured state information that is easily parsed and semantically rich SymCall Functional Interface –Synchronous upcalls into the guest during exit handling –API: a function call in the VMM, a system call in the guest –A brand new interface construct SymMod Code Injection –Guest OS might lack functionality; VMM loads code directly into the guest –Extend guest OS with arbitrary interfaces/functionality
VNET Overlay Networking for HPC Virtuoso project built “VNET/U” –User-level implementation –Overlay appears to host to be simple L2 network (Ethernet) –Ethernet in UDP encapsulation (among others) –Global control over topology and routing –Works with any VMM –Plenty fast for WAN (200 Mbps), but not HPC V3VEE project is building “VNET/P” –VMM-level implementation in Palacios –Goal: Near-native performance on Clusters and Supers for at least 10 Gbps –Also: easily migrate existing apps/VMs to specialized interconnect networks 23
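The overlay's encapsulation can be sketched as wrapping a raw Ethernet frame in a small tag header before it goes out as a UDP payload. The header layout, magic value, and names below are illustrative, not the actual VNET wire format:

```c
/* Sketch of VNET-style Ethernet-in-UDP encapsulation: the overlay
 * carries whole L2 frames as UDP payloads. The 4-byte tag header
 * is illustrative only. */
#include <stdint.h>
#include <string.h>

#define VNET_MAGIC 0x564E4554u   /* "VNET" — hypothetical tag */

/* Prepend an overlay header to an Ethernet frame, producing the
 * payload handed to the UDP socket. Returns total payload length,
 * or 0 if the output buffer is too small. */
size_t vnet_encap(const uint8_t *frame, size_t frame_len,
                  uint8_t *out, size_t out_len) {
    if (out_len < frame_len + 4)
        return 0;
    uint32_t magic = VNET_MAGIC;
    memcpy(out, &magic, 4);             /* overlay tag  */
    memcpy(out + 4, frame, frame_len);  /* raw L2 frame */
    return frame_len + 4;
}
```

Because the guest sees a simple L2 network, the same VM image runs unmodified whether the overlay rides Ethernet, InfiniBand, or a specialized interconnect.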
Current VNET/P Node Architecture 24 [diagram: Service VM and Application VM on a node]
Conclusion Palacios: original open-source VMM for modern architectures –Small and Compact Easy to understand and extend/modify –Framework for exploring alternative VMM architectures 25 V3VEE Project Collaborators Welcome!
Backup Slides 26
27 Current Research Virtualization can scale –Near native performance for optimized VMM/guest (within 5%) VMM needs to know about guest internals –Should modify behavior for each guest environment –Example: Paging method to use depends on guest Black Box inference is not desirable in HPC environment –Unacceptable performance overhead –Convergence time –Mistakes have large consequences Need guest cooperation –Guest and VMM relationship should be symbiotic (Thesis)
28 Semantic Gap VMM architectures are designed as black boxes –Explicit low-level OS interface (hardware or paravirtual) –Internal OS state is not exposed to the VMM Many uses for internal state –Performance, security, etc... –VMM must recreate that state “Bridging the Semantic Gap” –[Chen: HotOS 2001] Two existing approaches: Black Box and Gray Box –Black Box: Monitor external guest interactions –Gray Box: Reverse engineer internal guest state –Examples Virtuoso Project (Early graduate work) Lycosid, Antfarm, Geiger, IBMon, many others
29 Example: Swapping [diagram: guest application memory, working set, and swapped memory; swap disk] Swap disk: disk storage for expanding physical memory The VMM has only basic knowledge without internal guest state
30 SymSpy Shared memory page between OS and VMM –Global and per-core interfaces Standardized data structures –Shared state information Read and write without exits
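The shared page can be sketched as a fixed C struct that both sides map; updates are plain loads and stores plus a version bump, so no VM exit is ever needed. The field names here are illustrative, not the real SymSpy layout:

```c
/* Sketch of a SymSpy-style shared page: a standardized structure
 * the guest writes and the VMM reads (and vice versa) with plain
 * memory accesses — no exits. Fields are illustrative. */
#include <stdint.h>

struct symspy_page {
    uint32_t version;       /* bumped by the guest after each update */
    uint32_t pid;           /* currently running process id          */
    uint64_t swap_in_cnt;   /* guest swap-in events observed so far  */
};

/* Guest side: update state, then publish with a version bump so a
 * concurrent VMM reader can detect a torn snapshot and retry. */
void symspy_guest_update(volatile struct symspy_page *p,
                         uint32_t pid, uint64_t swaps) {
    p->pid = pid;
    p->swap_in_cnt = swaps;
    p->version++;
}
```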
31 SymCall (Symbiotic Upcalls) Conceptually similar to System Calls –System Calls: Application requests OS services –Symbiotic Upcalls: VMM requests OS services Designed to be architecturally similar –Virtual hardware interface Superset of System Call MSRs –Internal OS implementation Share same system call data structures and basic operations Guest OS configures a special execution context –VMM instantiates that context to execute synchronous upcall –Symcalls exit via a dedicated hypercall
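On the guest side, dispatch can be sketched exactly like a system-call table: the VMM enters the pre-registered context with a symcall number, and the guest routes it through a handler table. Names, numbers, and the `sym_echo` handler below are illustrative; a real implementation would return to the VMM via the dedicated hypercall rather than a plain function return:

```c
/* Sketch of guest-side SymCall dispatch, structured like a
 * system-call table. All names are illustrative. */
#include <stdint.h>

#define SYMCALL_MAX 8
typedef uint64_t (*symcall_fn)(uint64_t a0, uint64_t a1);

static symcall_fn symcall_table[SYMCALL_MAX];

/* Guest OS registers handlers at boot, as it would syscalls. */
int symcall_register(unsigned nr, symcall_fn fn) {
    if (nr >= SYMCALL_MAX) return -1;
    symcall_table[nr] = fn;
    return 0;
}

/* Entry point the VMM drives during exit handling; a real version
 * would end with the dedicated symcall-exit hypercall. */
uint64_t symcall_dispatch(unsigned nr, uint64_t a0, uint64_t a1) {
    if (nr >= SYMCALL_MAX || !symcall_table[nr])
        return (uint64_t)-1;
    return symcall_table[nr](a0, a1);
}

/* Hypothetical sample handler. */
static uint64_t sym_echo(uint64_t a0, uint64_t a1) { return a0 + a1; }
```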
32 SymCall Control Flow [diagram: running in guest → exit → handle exit in VMM → nested exits during the symcall → return to VMM]
33 Implementation Symbiotic Linux guest OS –Exports SymSpy and SymCall interfaces Palacios –Fairly significant modifications to enable nested VM entries Re-entrant exit handlers Serialize subset of guest state out of global hardware structures
34 SwapBypass Purpose: improve performance when swapping –Temporarily expand guest memory –Completely bypass the Linux swap subsystem Enabled by SymCall –Not feasible without symbiotic interfaces VMM detects guest thrashing –Shadow page tables used to prevent it
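The core decision can be sketched as a lookup in the VMM's swap disk cache on a shadow page fault: if the swapped-out page is still cached, splice its host frame directly into the shadow page table; otherwise fall through to the guest's normal swap-in path. The structures and names below are illustrative, not the Palacios implementation:

```c
/* Sketch of the SwapBypass lookup. On a shadow page fault for a
 * swapped-out guest page, consult the VMM's swap disk cache.
 * Names and sizes are illustrative. */
#include <stdint.h>
#include <stddef.h>

#define CACHE_SLOTS 16

struct swap_cache {
    uint64_t tags[CACHE_SLOTS];    /* swap entry ids, 0 = empty      */
    uint64_t frames[CACHE_SLOTS];  /* host frame holding the data    */
};

/* Returns the host frame to map in the shadow page table, or 0 to
 * fall through to the guest's own swap subsystem. */
uint64_t swap_bypass_lookup(const struct swap_cache *c, uint64_t entry) {
    for (size_t i = 0; i < CACHE_SLOTS; i++)
        if (c->tags[i] == entry)
            return c->frames[i];
    return 0;
}
```

The point of the SymCall is to make this lookup safe: only the guest can say which swap entry a faulting virtual address corresponds to and what its permissions are.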
35 Symbiotic Shadow Page Tables [diagram: guest page tables (page directory → page table → physical memory) alongside shadow page tables; a swapped-out guest page in the swap disk cache becomes a Swap Bypass page in the shadow tables]
36 SwapBypass Concept [diagram: Guests 1–3, each with guest physical memory, swapped memory, an application working set, and a swap disk; the VMM's physical memory holds a global swap disk cache shared across guests]
37 Necessary SymCall: query_vaddr() 1. Get current process ID –get_current(); (internal Linux API) 2. Determine presence in Linux swap cache –find_get_page(); (internal Linux API) 3. Determine page permissions for virtual address –find_vma(); (internal Linux API) This information is extremely hard to get otherwise, and must be collected while the exit is being handled
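The shape of the returned information can be sketched as follows. The struct is hypothetical, and the internal Linux lookups named on the slide are stood in for by parameters so the logic is self-contained:

```c
/* Sketch of the result a query_vaddr() SymCall hands back to the
 * VMM: the pieces it cannot cheaply infer from outside the guest.
 * Struct and names are illustrative. */
#include <stdint.h>

struct vaddr_info {
    uint32_t pid;            /* would come from get_current()    */
    uint32_t in_swap_cache;  /* would come from find_get_page()  */
    uint32_t prot;           /* would come from find_vma() flags */
};

/* Guest-side handler; real code would call the internal Linux
 * APIs directly while the VMM waits in the exit handler. */
struct vaddr_info query_vaddr(uint32_t cur_pid,
                              int swap_cached, uint32_t vma_prot) {
    struct vaddr_info info;
    info.pid = cur_pid;
    info.in_swap_cache = swap_cached ? 1 : 0;
    info.prot = vma_prot;
    return info;
}
```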
38 Evaluation Memory system micro-benchmarks –Stream, GUPS, ECT Memperf –Configured to over commit anonymous memory Cause thrashing in the guest OS Overhead isolated to swap subsystem –Ideal swap device implemented as RAM disk I/O occurs at main memory speeds Provides lower bound for performance gains
39 Bypassing Swap Overhead [chart: Stream (a simple vector kernel) performance vs. working set size; performance improves, approaching the ideal I/O improvement]
40 OS Drivers/Modules Access standard OS driver API –Dependent on internal OS implementation
41 Standard Symbiotic Modules Guest exposes standard symbiotic API via SymSpy
42 Secure Symbiotic Modules Symbiotic API, protected from guest –Secure initialization –Virtual memory overlay