Achieving Isolation in Consolidated Environments
Jack Lange, Assistant Professor, University of Pittsburgh

Consolidated HPC Environments
The future is consolidation of commodity and HPC workloads
– HPC users are moving onto cloud platforms
– Dedicated HPC systems are moving towards in-situ processing, consolidated with visualization and analytics workloads
Can commodity OS/Rs effectively support HPC consolidation?
– Commodity design goals:
– Maximized resource utilization
– Fairness
– Graceful degradation under load

Current systems do support this, but…
Interference still exists inside the system software
– An inherent feature of commodity systems
Current approaches emphasize hardware space sharing
[Diagram: hardware partitioning — Socket 1 cores and memory form an HPC partition; Socket 2 cores and memory form a commodity partition]

HPC vs. Commodity Systems
Commodity systems have a fundamentally different focus than HPC systems
– Amdahl's vs. Gustafson's laws
– Commodity: optimized for the common case
– HPC: the common case is not good enough
– At large (tightly coupled) scales, percentiles lose meaning
– Collective operations must wait for the slowest node
– 1% of nodes can make the other 99% suffer
– HPC systems must optimize for outliers (the worst case)
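The two laws named above can be stated side by side. With serial fraction s and n nodes:

```latex
% Amdahl's law: fixed problem size -- speedup saturates at 1/s
S_{\text{Amdahl}}(n) = \frac{1}{s + (1-s)/n}

% Gustafson's law: problem size scales with n -- speedup keeps growing
S_{\text{Gustafson}}(n) = s + (1-s)\,n
```

Amdahl's view explains why commodity "common case" tuning is acceptable at small scale, while Gustafson's scaled-workload view explains why HPC systems must instead attack the slowest-node outliers.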

Multi-stack Approach
Dynamic resource partitions
– Runtime segmentation of underlying hardware resources
– Assigned to specific workloads
Dynamic software isolation
– Prevent interference from other workloads
– Execute on separate system software stacks
– Remove cross-stack dependencies
Implementation
– Independent system software running on isolated resources

Least Isolatable Units
Independently managed sets of isolated hardware resources
Our approach: decompose the system into sets of isolatable components
– Independent resources that do not interfere with other components
Workloads execute on dedicated collections of LIUs
– Units of allocation: CPU, memory, devices
– Each is managed by an independent system software stack
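The LIU idea can be sketched as a plain descriptor: a unit owns cores, a contiguous memory region, and devices outright, and a workload only ever touches resources its own LIU owns. All names below are illustrative, not taken from the actual implementation.

```c
/* Hypothetical sketch of a Least Isolatable Unit (LIU) descriptor.
 * Resources are owned exclusively: no two LIUs share a core, region,
 * or device, which is what removes cross-workload interference. */
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CPUS 64

struct liu {
    uint64_t  cpu_mask;       /* bitmap of CPU cores owned by this LIU   */
    uintptr_t mem_base;       /* base of the contiguous memory region    */
    size_t    mem_len;        /* length of the region, in bytes          */
    int       pci_dev_count;  /* devices assigned exclusively to the LIU */
};

/* A workload may only run on cores its LIU owns. */
static bool liu_owns_cpu(const struct liu *l, int cpu)
{
    return cpu >= 0 && cpu < MAX_CPUS && ((l->cpu_mask >> cpu) & 1);
}
```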

Linux Memory Management
Demand paging
– Primary goal is to optimize memory utilization, not performance
– Reduce overhead of common application behavior (fork/exec)
– Support many concurrent processes
Large pages
– Integrated with the overall demand-paging architecture
Implications for HPC
– Insufficient resource isolation
– System noise
– Linux large-page solutions contribute to these problems
Brian Kocoloski and Jack Lange, "HPMMAP: Lightweight Memory Management for Commodity Operating Systems," IPDPS 2014

Transparent Huge Pages
Transparent Huge Pages (THP)
– Fully automatic large-page mechanism: no system administration or application cooperation required
– (1) The page-fault handler uses large pages when possible
– (2) khugepaged
khugepaged
– Background kernel thread
– Periodically allocates a large page and "merges" it into the address space of any process requesting THP support
– Requires a global page-table lock
– Driven by OS heuristics, with no knowledge of the application workload

Transparent Huge Pages
[Chart: large page faults in green; small faults delayed by merges in blue]
– Generally periodic, but not synchronized
– Variability increases dramatically under additional load

HugeTLBfs
– A RAM-based filesystem supporting large-page allocation
– Requires pre-allocated memory pools reserved by the system administrator
– Access generally managed through libhugetlbfs
Limitations
– Cannot back process stacks and other special regions
– VMA permission/alignment constraints
– Highly susceptible to overhead from system load
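The explicit, pool-based model above is also reachable without mounting the filesystem: MAP_HUGETLB draws from the same administrator-reserved hugepage pool. A hedged sketch (the helper name is ours) that falls back to ordinary 4 KB pages when the pool is empty, which is exactly the situation where the slide's "overhead from system load" bites:

```c
/* Sketch: explicit large-page allocation in the hugetlbfs style.
 * If no hugepages are reserved (or len is not a multiple of the
 * huge page size), the first mmap fails and we fall back to the
 * ordinary small-page fault path. */
#define _GNU_SOURCE
#include <sys/mman.h>
#include <stddef.h>

void *alloc_prefer_huge(size_t len, int *used_huge)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p != MAP_FAILED) {
        *used_huge = 1;             /* backed by the reserved pool */
        return p;
    }

    *used_huge = 0;                 /* pool empty: small pages     */
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```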

HugeTLBfs
The overhead of small page faults increases substantially, due to memory exhaustion
– HugeTLBfs memory is removed from the pools available to the small-page fault handler

HPMMAP Overview
High-Performance Memory Mapping and Allocation Platform
– Lightweight memory management for unmodified Linux applications
– HPMMAP borrows from the Kitten LWK to impose isolated virtual and physical memory management layers
– Provides lightweight versions of the memory management system calls
– Utilizes Linux memory offlining to completely manage large contiguous regions
– Memory is available in no less than 128 MB regions
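The offlining step can be illustrated against Linux's sysfs memory-hotplug interface: each hot-pluggable memory block (typically 128 MB on x86_64, matching HPMMAP's granularity) appears under /sys/devices/system/memory, and writing "offline" to its state file removes it from Linux's page allocator so another manager can claim it. The interface is real; the helper names are ours, and actually offlining a block requires root:

```c
/* Sketch of the Linux memory-offlining interface HPMMAP builds on. */
#include <stdio.h>
#include <stddef.h>

/* Build the sysfs path for memory block `n`; returns buf for chaining. */
const char *memory_block_path(int n, char *buf, size_t len)
{
    snprintf(buf, len, "/sys/devices/system/memory/memory%d/state", n);
    return buf;
}

/* Ask Linux to stop using block `n` (root only; may fail if pages
 * in the block are pinned or otherwise unmovable). */
int offline_memory_block(int n)
{
    char path[128];
    FILE *f = fopen(memory_block_path(n, path, sizeof(path)), "w");
    if (!f)
        return -1;                  /* not root, or block absent */
    int ok = fputs("offline", f) >= 0;
    return (fclose(f) == 0 && ok) ? 0 : -1;
}
```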

HPMMAP Application Integration

Results

Evaluation – Multi-Node Scaling
Sandia cluster (8 nodes, 1 Gb Ethernet)
– One co-located 4-core parallel kernel build per node; no over-committed cores
32-rank improvement: 12% for HPCCG, 9% for miniFE, 2% for LAMMPS
miniFE: network overhead past 4 cores
– Single-node variability translates into worse scaling (3% improvement in the single-node experiment)

HPC in the Cloud
Clouds are starting to look like supercomputers… but we're not there yet
– Noise issues
– Poor isolation
– Resource contention
– Lack of control over topology
Very bad for tightly coupled parallel apps
– They require specialized environments that solve these problems
Approaching convergence
– Vision: dynamically partition cloud resources into HPC and commodity zones

Multi-stack Clouds
With Jiannan Ouyang and Brian Kocoloski
Virtualization overhead is not due to hardware costs
– It results from the underlying host OS/VMM architectures and policies
– Susceptible to performance overhead and interference
Goal: provide isolated HPC VMs on commodity systems
– Each zone optimized for its target applications
[Diagram: shared hardware running the Kitten lightweight kernel with an isolated VM under the Palacios VMM, alongside Linux running commodity VM(s) under KVM]

Multi-OS Architecture
Goals:
– Fully isolated and independent operation
– OS-bypass communication
– No cross-kernel dependencies
Needed modifications:
– A boot process that initializes a subset of offline resources
– Dynamic resource (re)assignment to the Kitten LWK
– Cross-stack shared-memory communication
– A block driver interface
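The cross-stack shared-memory communication goal can be sketched as a single-producer/single-consumer ring buffer living in a region both stacks can map. Neither side ever calls into the other: communication is just loads and stores on shared memory, which is what keeps the kernels free of cross dependencies. The layout and names below are our own illustration, not Kitten code.

```c
/* Illustrative SPSC ring buffer of the kind that could sit in a
 * physically shared region between the Linux and Kitten stacks.
 * Producer writes head; consumer writes tail; no locks, no calls
 * into the other kernel. */
#include <stdint.h>

#define RING_SLOTS 64u              /* power of two */

struct xstack_ring {
    volatile uint32_t head;         /* written by producer only */
    volatile uint32_t tail;         /* written by consumer only */
    uint64_t slot[RING_SLOTS];
};

static int ring_send(struct xstack_ring *r, uint64_t msg)
{
    uint32_t h = r->head;
    if (h - r->tail == RING_SLOTS)
        return -1;                  /* full */
    r->slot[h % RING_SLOTS] = msg;
    __sync_synchronize();           /* publish slot before head */
    r->head = h + 1;
    return 0;
}

static int ring_recv(struct xstack_ring *r, uint64_t *msg)
{
    uint32_t t = r->tail;
    if (t == r->head)
        return -1;                  /* empty */
    *msg = r->slot[t % RING_SLOTS];
    __sync_synchronize();           /* consume slot before tail */
    r->tail = t + 1;
    return 0;
}
```

A real cross-kernel channel would add interrupt-based doorbells and cache-line alignment of head/tail, but the data path is just this.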

Isolatable Hardware
We view system resources as a collection of isolatable units
– In terms of both performance and management
Some hardware makes this easy
– PCI (w/ MSI, MSI-X)
– APIC
Some hardware makes this difficult
– SATA
– IO-APIC
– IOMMU
Some hardware makes this impossible
– Legacy IDE
– PCI (w/ legacy PCI-INTx IRQs)
Some hardware cannot be completely isolated
– SR-IOV PCI devices
– HyperThreaded CPU cores

[Diagram (animation over several slides): sockets of cores and memory partitioned between Linux and an offline region assigned to Kitten; the NIC and SATA stay with Linux, while the Infiniband (PCI) device is handed to Kitten]

Multi-stack Architecture
Allow multiple dynamically created enclaves
– Based on runtime isolation requirements
– Provides the flexibility of fully independent OS/Rs
– Isolated performance and resource management
[Diagram: on shared hardware, Linux runs commodity application(s) and commodity VM(s) under KVM, alongside Kitten LWK instances running an HPC application and an HPC VM under the Palacios VMM]

Performance Evaluation
8-node Infiniband cluster
– Space shared between commodity and HPC workloads
– Commodity: Hadoop; HPC: HPCCG
– Infiniband passthrough for the HPC VM
– 1 Gb Ethernet passthrough for the commodity VM
Compared multi-stack (Kitten + Palacios) vs. a full Linux environment (KVM)
– 10 experiment runs for each configuration
CAVEAT: VM disks were all accessed from the commodity partition
– Suffers significant interference (current work)
[Table: mean runtime (s) and standard deviation for Linux + KVM vs. multi-stack (Kitten + Palacios)]

Conclusion
Commodity systems are not designed to support HPC workloads
– Different requirements and behaviors than commodity applications
A multi-stack approach can provide HPC environments in commodity systems
– HPC requirements can be met without separate physical systems
– HPC and commodity workloads can dynamically share resources
– Isolated system software environments are necessary

Thank you
Jack Lange, Assistant Professor, University of Pittsburgh

Multi-stack Operating Systems
Future exascale systems are moving towards in-situ organization
– Applications have traditionally utilized their own platforms: visualization, storage, analysis, etc.
– Everything must now collapse onto a single platform

Performance Comparison
Linux memory management vs. lightweight memory management
– Occasional outliers (large-page coalescing)
– Low-level noise