© 2011 VMware Inc. All rights reserved HPC Cloud Bad; HPC in the Cloud Good Josh Simons, Office of the CTO, VMware, Inc. IPDPS 2013 Cambridge, Massachusetts.

Presentation transcript:

2 Post-Beowulf Status Quo
[Figure labels: Enterprise IT; HPC IT]

3 Closer to True Scale (NASA)

4 Converging Landscape
[Figure labels: Enterprise IT; HPC IT]
Convergence driven by increasingly shared concerns, e.g.:
• Scale-out management
• Power & cooling costs
• Dynamic resource mgmt
• Desire for high utilization
• Parallelization for multicore
• Big Data Analytics
• Application resiliency
• Low latency interconnect
• Cloud computing

5 Agenda
• HPC and Public Cloud
  - Limitations of the current approach
• Cloud HPC Performance
  - Throughput
  - Big Data / Hadoop
  - MPI / RDMA
• HPC in the Cloud
  - A more promising model

6 Server Virtualization
[Diagram labels: Application; Operating System; Hardware; Without Virtualization; With Virtualization]
• Hardware virtualization presents a complete x86 platform to the virtual machine
• Allows multiple applications to run in isolation within virtual machines on the same physical machine
• Virtualization provides direct access to the hardware resources to give you much greater performance than software emulation

7 HPC Performance in the Cloud

8 Biosequence Analysis: BLAST
Source: C. Macdonell and P. Lu, "Pragmatics of Virtual Machines for High-Performance Computing: A Quantitative Study of Basic Overheads," in Proc. of the High Perf. Computing & Simulation Conf., 2007.

9 Biosequence Analysis: HMMer

10 Molecular Dynamics: GROMACS

11 EDA Workload Example
[Diagram labels: app / operating system / hardware (native); four app / OS pairs on a virtualization layer over hardware (virtual). Results shown: Virtual 6% slower; Virtual 2% faster]

12 Memory Virtualization
[Diagram labels: virtual; physical; machine]
EPT = Intel Extended Page Tables = hardware page table virtualization = AMD RVI
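The three address spaces labeled on this slide (virtual, physical, machine) imply two translation steps: the guest OS maps guest-virtual pages to guest-physical pages, and the hardware-walked nested tables (EPT or RVI) map guest-physical pages to machine pages. The toy sketch below illustrates only that composition. It assumes 4 KiB pages and uses flat dictionaries in place of real multi-level page tables; every name and mapping in it is hypothetical.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages


def translate(addr, table):
    """Translate an address through a flat page map (page number -> frame number)."""
    page, offset = divmod(addr, PAGE_SIZE)
    if page not in table:
        raise KeyError(f"page fault: page {page} is not mapped")
    return table[page] * PAGE_SIZE + offset


def guest_virtual_to_machine(gva, guest_page_table, nested_page_table):
    """Compose the two stages: guest page table first, then the nested (EPT/RVI) table."""
    gpa = translate(gva, guest_page_table)      # stage 1: maintained by the guest OS
    return translate(gpa, nested_page_table)    # stage 2: maintained by the hypervisor


# Hypothetical mappings, chosen only for illustration.
guest_page_table = {0: 7, 1: 3}     # guest-virtual page -> guest-physical page
nested_page_table = {7: 42, 3: 9}   # guest-physical page -> machine page

print(hex(guest_virtual_to_machine(0x10, guest_page_table, nested_page_table)))
```

With hardware support, both stages are walked by the MMU, so the hypervisor does not have to intercept guest page-table updates to keep a combined table in sync.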

13 vNUMA
[Diagram labels: Application; ESXi hypervisor; socket; M M M M]

14 vNUMA Performance Study
Source: Ali, Q., Kiriansky, V., Simons, J., Zaroo, P., "Performance Evaluation of HPC Benchmarks on VMware’s ESX Server," 5th Workshop on System-level Virtualization for High Performance Computing, 2011

15 Compute: GPGPU Experiment
• General Purpose (GP) computation with GPUs
• CUDA benchmarks
• VM Direct Path I/O
• Small kernels: DSP, financial, bioinformatics, fluid dynamics, image processing
• RHEL 6
• nVidia (Quadro 4000) and AMD GPUs
• Generally 98%+ of native performance (worst case was 85%)
• Currently looking at larger-scale financial and bioinformatics applications

16 MapReduce Architecture
[Diagram labels: HDFS; Reduce; HDFS]

17 vHadoop Approaches
• Why virtualize Hadoop?
  - Simplified Hadoop cluster configuration and provisioning
  - Support Hadoop usage in existing virtualized datacenters
  - Support multi-tenant environments
Project Serengeti
[Diagram labels: Node; HDFS; VM; Data Node; Compute Node; CN; M; R]

18 vHadoop Benchmarking Collaboration with AMAX
• Seven-node Hadoop cluster (AMAX ClusterMax)
• Standard tests: PI, DFSIO, Teragen / Terasort
• Configurations:
  - Native
  - One VM per host
  - Two VMs per host
• Details:
  - Two-socket Intel X5650, 96 GB, Mellanox 10 GbE, 12x 7200rpm SATA
  - RHEL 6.1, 6- or 12-vCPU VMs, vmxnet3
  - Cloudera CDH3U0, replication=2, max 40 map and 10 reduce tasks per host
  - Each physical host considered a “rack” in Hadoop’s topology description
  - ESXi 5.0 w/dev Mellanox driver, disks passed to VMs via raw disk mapping (RDM)

19 Benchmarks
• Pi
  - Direct-exec Monte-Carlo estimation of pi
  - # map tasks = # logical processors
  - 1.68 T samples
• TestDFSIO
  - Streaming write and read
  - 1 TB
  - More tasks than processors
• Terasort
  - 3 phases: teragen, terasort, teravalidate
  - 10B or 35B records, each 100 Bytes (1 TB, 3.5 TB)
  - More tasks than processors
  - CPU, networking, and storage I/O
[Figure annotation: π ~ 4*R/(R+G) = 22/7]
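The Pi benchmark's estimator can be sketched in a few lines. The slide gives only the formula 4*R/(R+G); the sketch below assumes R counts sample points that fall inside the unit quarter-circle and G counts those that fall outside, which is the usual reading of that formula. It uses Python's pseudorandom generator on a single core and is not the Hadoop job itself; the function name and seed are hypothetical.

```python
import random


def estimate_pi(samples, seed=0):
    """Monte Carlo estimate of pi from points drawn uniformly in the unit square."""
    rng = random.Random(seed)
    inside = 0  # R (assumed meaning): points with x^2 + y^2 <= 1
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    outside = samples - inside  # G (assumed meaning): points outside the quarter-circle
    # Quarter-circle area over square area is pi/4, hence the factor of 4.
    return 4 * inside / (inside + outside)


print(estimate_pi(100_000))
```

Because each sample is independent, the work splits cleanly across map tasks with almost no I/O, which is why this test stresses CPU rather than storage or networking.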

20 Ratio to Native, Lower is Better
Source: A Benchmarking Case Study of Virtualized Hadoop Performance on VMware vSphere 5
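The "ratio to native" metric used on this and later slides is elapsed time in the virtualized configuration divided by elapsed time on bare metal, so 1.0 means parity and values below 1.0 mean the virtual run was faster. A minimal sketch of that arithmetic follows; the function names are hypothetical and the timings are made-up illustrative values, not measurements from the study.

```python
def ratio_to_native(virtual_seconds, native_seconds):
    """Elapsed-time ratio: values above 1.0 mean the virtualized run was slower."""
    if native_seconds <= 0:
        raise ValueError("native_seconds must be positive")
    return virtual_seconds / native_seconds


def describe(ratio):
    """Render a ratio as a percentage difference relative to native."""
    percent = abs(ratio - 1.0) * 100
    if ratio > 1.0:
        return f"{percent:.0f}% slower"
    if ratio < 1.0:
        return f"{percent:.0f}% faster"
    return "same as native"


# Illustrative timings only (hypothetical values).
print(describe(ratio_to_native(106.0, 100.0)))
print(describe(ratio_to_native(98.0, 100.0)))
```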

21 Kernel Bypass Model
[Diagram labels, native: application; sockets; tcp/ip; driver; hardware; rdma; user; kernel. Virtualized: application; sockets; tcp/ip; driver; guest kernel; vmkernel; hardware; rdma]

22 Virtual Infrastructure RDMA
• Distributed services within the platform, e.g.
  - vMotion (live migration)
  - Inter-VM state mirroring for fault tolerance
  - Virtually shared, DAS-based storage fabric
• All would benefit from:
  - Decreased latency
  - Increased bandwidth
  - CPU offload

23 vMotion/RDMA Performance
[Charts: Total vMotion Time (sec); Pre-copy bandwidth (Pages/sec); Destination CPU Utilization and Source CPU Utilization versus Time (s)]

24 Guest OS RDMA
• RDMA access from within a virtual machine
• Scale-out middleware and applications increasingly important in the Enterprise
  - memcached, redis, Cassandra, mongoDB, …
  - GemFire Data Fabric, Oracle RAC, IBM pureScale, …
• Big Data an important emerging workload
  - Hadoop, Hive, Pig, etc.
• And, increasingly, HPC

25 SR-IOV Virtual Function VM DirectPath I/O
• Single-Root IO Virtualization (SR-IOV): PCI-SIG standard
• Physical (IB/RoCE/iWARP) HCA can be shared between VMs or by the ESXi hypervisor
  - Virtual Functions direct assigned to VMs
  - Physical Function controlled by hypervisor
• Still VM DirectPath, which is incompatible with several important virtualization features
[Diagram labels: Guest OS; OFED Stack; RDMA HCA VF Driver; Virtualization Layer; PF Device Driver; I/O MMU; VF; PF; SR-IOV RDMA HCA]

26 Paravirtual RDMA HCA (vRDMA) offered to VM
• New paravirtualized device exposed to Virtual Machine
  - Implements “Verbs” interface
• Device emulated in ESXi hypervisor
  - Translates Verbs from Guest to Verbs to ESXi “OFED Stack”
  - Guest physical memory regions mapped to ESXi and passed down to physical RDMA HCA
  - Zero-copy DMA directly from/to guest physical memory
  - Completions/interrupts “proxied” by emulation
• “Holy Grail” of RDMA options for vSphere VMs
[Diagram labels: Guest OS; OFED Stack; vRDMA HCA Device Driver; vRDMA Device Emulation; ESXi “OFED Stack”; I/O Stack; Physical RDMA HCA Device Driver; Physical RDMA HCA]

27 InfiniBand Bandwidth with VM DirectPath I/O
Source: RDMA Performance in Virtual Machines using QDR InfiniBand on VMware vSphere 5, April

28 Latency with VM DirectPath I/O (RDMA Read, Polling)
[Table columns: MsgSize (bytes); Native; ESXi ExpA]

29 Latency with VM DirectPath I/O (Send/Receive, Polling)
[Table columns: MsgSize (bytes); Native; ESXi ExpA]

30 Intel 2009 Experiments
• Hardware
  - Eight two-socket 2.93GHz X5570 (Nehalem-EP) nodes, 24 GB
  - Dual-ported Mellanox DDR InfiniBand adaptor
  - Mellanox 36-port switch
• Software
  - vSphere 4.0 (current version is 5.1)
  - Platform Open Cluster Stack (OCS) 5 (native and guest)
  - Intel compilers 11.1
  - HPCC
  - STAR-CD V _x86

31 HPCC Virtual to Native Run-time Ratios (Lower is Better)
Data courtesy of Marco Righini, Intel Italy

32 Point-to-point Message Size Distribution: STAR-CD Source:

33 Collective Message Size Distribution: STAR-CD Source:

34 STAR-CD Virtual to Native Run-time Ratios (Lower is Better)
Data courtesy of Marco Righini, Intel Italy

35 Software Defined Networking (SDN) Enables Network Virtualization
[Diagram labels: Networking; Telephony; Identifier = Location (shown twice); Wireless Telephony; VXLAN]

36 Data Center Networks – Traffic Trends
[Diagram labels: WAN/Internet; NORTH / SOUTH; EAST / WEST]

37 Data Center Networks – the Trend to Fabrics
[Diagram label: WAN/Internet]

38 Network Virtualization and RDMA
• SDN
  - Decouple logical network from physical hardware
  - Encapsulate Ethernet in IP → more layers
  - Flexibility and agility are primary goals
• RDMA
  - Directly access physical hardware
  - Map hardware directly into userspace → fewer layers
  - Performance is primary goal
• Is there any hope of combining the two?
  - Converged datacenter supporting both SDN management and decoupling along with RDMA
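To make "Encapsulate Ethernet in IP → more layers" concrete, the sketch below builds and parses the 8-byte VXLAN header that precedes the inner Ethernet frame, using the layout from RFC 7348 (one flags byte with the VNI-valid bit 0x08, 24 reserved bits, a 24-bit VNI, 8 reserved bits). The outer Ethernet, IP, and UDP headers that a real packet also carries are omitted, and the function names are hypothetical.

```python
import struct

VXLAN_UDP_PORT = 4789        # IANA-assigned destination port (RFC 7348)
VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field is valid


def vxlan_header(vni):
    """Pack the 8-byte VXLAN header for a 24-bit VXLAN Network Identifier."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags byte followed by 24 reserved bits.
    # Word 2: 24-bit VNI followed by 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def encapsulate(inner_frame, vni):
    """Prefix an inner Ethernet frame with a VXLAN header (outer headers omitted)."""
    return vxlan_header(vni) + inner_frame


def decapsulate(payload):
    """Return (vni, inner_frame) from a VXLAN payload, checking the VNI-valid flag."""
    flags_word, vni_word = struct.unpack("!II", payload[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag not set")
    return vni_word >> 8, payload[8:]


vni, frame = decapsulate(encapsulate(b"inner ethernet frame", 5000))
print(vni, frame)
```

Every packet on the logical network pays for this extra header plus the outer ones, which is the layering cost the slide contrasts with RDMA's goal of removing layers between the application and the hardware.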

39 Secure Private Cloud for HPC
[Diagram labels: Users; IT; Research Group 1 to Research Group m; Public Clouds; User Portals; Programmatic Control and Integrations; VMware vCloud API; Security; VMware vShield; Catalogs; VMware vCloud Director; VMware vCenter Server (shown three times); VMware vSphere; Research Cluster 1 to Research Cluster n]

40 Massive Consolidation

41 Run Any Software Stacks
[Diagram labels: App A / OS A; App B / OS B; three hosts, each with a virtualization layer over hardware]
Support groups with disparate software requirements
Including root access

42 Separate workloads
[Diagram labels: App A / OS A; App B / OS B; three hosts, each with a virtualization layer over hardware]
Secure multi-tenancy
Fault isolation
…and sometimes performance

43 Live Virtual Machine Migration (vMotion)

44 Use Resources More Efficiently
[Diagram labels: App A / OS A; App B / OS B; App A / OS A; App C / OS B; App C / OS A; three hosts, each with a virtualization layer over hardware]
Avoid killing or pausing jobs
Increase overall throughput

45 Workload Agility
[Diagram labels: app / operating system / hardware; app; two hosts, each with a virtualization layer over hardware]

46 Multi-tenancy with resource guarantees
[Diagram labels: App A / OS A; App B / OS B; App A / OS A; App C / OS B; App C / OS A; App A / OS A; App B / OS B; three hosts, each with a virtualization layer over hardware]
Define policies to manage resource sharing between groups
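One common way to express such a sharing policy is a guaranteed reservation per group plus relative shares that divide whatever capacity is left. The sketch below illustrates that general idea only; it is not vSphere's actual scheduling algorithm, and the group names, units, and numbers are hypothetical.

```python
def allocate(capacity, groups):
    """Split capacity among groups given {name: (reservation, shares)}.

    Each group first receives its reservation; the remaining capacity is
    divided in proportion to shares.
    """
    reserved = sum(reservation for reservation, _ in groups.values())
    if reserved > capacity:
        raise ValueError("reservations exceed capacity")
    total_shares = sum(shares for _, shares in groups.values())
    spare = capacity - reserved
    allocation = {}
    for name, (reservation, shares) in groups.items():
        extra = spare * shares / total_shares if total_shares else 0.0
        allocation[name] = reservation + extra
    return allocation


# Hypothetical policy: 100 units of CPU shared by two research groups.
policy = {"group_a": (20, 1), "group_b": (0, 3)}
print(allocate(100, policy))
```

The reservation gives each group a floor it can count on, while shares decide who benefits when the cluster is not fully reserved.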

47 Protect Applications from Hardware Failures
[Diagram labels: App A / OS (shown twice); three hosts, each with a virtualization layer over hardware]
Reactive Fault Tolerance: “Fail and Recover”

48 Protect Applications from Hardware Failures
[Diagram labels: MPI-0 / OS; MPI-1 / OS; MPI-2 / OS; three hosts, each with a virtualization layer over hardware]
Proactive Fault Tolerance: “Move and Continue”

49 Unification of IT Infrastructure

50 HPC in the (Mainstream) Cloud
[Figure labels: Throughput; MPI / RDMA]

51 Summary
• HPC Performance in the Cloud
  - Throughput applications perform very well in virtual environments
  - MPI / RDMA applications will experience small to very significant slowdowns in virtual environments, depending on scale and message traffic characteristics
• Enterprise and HPC IT requirements are converging
  - Though less so with HEC (e.g. Exascale)
• Vendor and community investments in Enterprise solutions eclipse those made in HPC due to market size differences
  - The HPC community can benefit significantly from adopting Enterprise-capable IT solutions
  - And working to influence Enterprise solutions to more fully address HPC requirements
• Private and community cloud deployments provide significantly more value than cloud bursting from physical infrastructure to public cloud