Presentation transcript:

Performance Analysis of Virtualization for High Performance Computing
A Practical Evaluation of Hypervisor Overheads
Matthew Cawood
Supervised by: Dr. Simon Winberg
University of Cape Town

Overview
1. Background
2. Research Objectives
3. HPC
4. Virtualization
5. Performance Tuning
6. The Research Cluster
7. Benchmark Selection
8. Results
9. Conclusions

1. Background
- BSc (Eng) final year research project
- Based in the CHPC's Advanced Computer Engineering (ACE) Lab
- Access to a research cluster currently being commissioned
- Project focused on evaluating cluster hardware and software

2. Research Objectives
1. Present an in-depth report on the current technologies being developed in the field of High Performance Computing.
2. Provide a quantitative performance analysis of the costs associated with virtualization, specifically in the field of HPC.

3. High Performance Computing
- HPC data centres are rapidly growing in size and complexity
- Current emphasis is on improving efficiency and utilization
- Wide range of applications and requirements: bioinformatics, astrophysics, simulation, modelling

4. Virtualization
[Six figure-only slides; diagrams not captured in the transcript]

5. Performance Optimizations
- Host memory reservation of Linux huge pages
- KVM vCPU pinning to improve NUMA cell awareness (see the sketch below)
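
Both optimizations are applied on the host. The sketch below is a minimal illustration only, not the project's actual tooling; it assumes a libvirt-managed KVM guest named "node01" (a hypothetical name) and a two-socket host whose first eight CPUs form one NUMA cell.

# Minimal sketch, not the project's actual tooling: reserve 2 MiB huge pages on
# the host and pin guest vCPUs to one NUMA cell. Assumes a libvirt-managed KVM
# guest named "node01" (hypothetical) and a 16-core host whose CPUs 0-7 sit on
# NUMA node 0.
import libvirt

# Reserve huge pages for guest memory (equivalent to: sysctl vm.nr_hugepages=...).
NR_HUGEPAGES = 65536          # 65536 x 2 MiB = 128 GiB
with open("/proc/sys/vm/nr_hugepages", "w") as f:
    f.write(str(NR_HUGEPAGES))

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("node01")

# Pin vCPU i to physical CPU i so the guest stays inside NUMA node 0 and its
# memory accesses remain local to that cell.
HOST_CPUS = 16
for vcpu in range(8):
    cpumap = tuple(cpu == vcpu for cpu in range(HOST_CPUS))
    dom.pinVcpu(vcpu, cpumap)

conn.close()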

6. The Research Cluster
Compute nodes:
- 2x Intel Xeon E5-2690, 20 MB L3 cache, 2.90 GHz
- 256 GB DDR3-1600, CL11
- Mellanox ConnectX-3 VPI FDR 56 Gb/s HCA
- Gigabit Ethernet NIC
Switch infrastructure:
- Mellanox SX6036 36-port FDR InfiniBand switch
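
For context in the HPL results that follow, the theoretical double-precision peak of one node can be estimated from these specifications. The sketch below assumes the E5-2690 (Sandy Bridge, 8 cores per socket) sustains 8 DP FLOPs per cycle per core with AVX; the Rmax value is hypothetical and used only to show how efficiency is computed.

# Back-of-the-envelope theoretical peak for one compute node. Assumes the Xeon
# E5-2690 (Sandy Bridge, 8 cores) sustains 8 double-precision FLOPs per cycle
# per core with AVX.
sockets = 2
cores_per_socket = 8
clock_ghz = 2.9
flops_per_cycle = 8            # 4-wide AVX add + 4-wide AVX multiply

rpeak_gflops = sockets * cores_per_socket * clock_ghz * flops_per_cycle
print(f"Rpeak per node: {rpeak_gflops:.1f} GFLOP/s")       # 371.2 GFLOP/s

# HPL efficiency is the measured Rmax divided by this Rpeak.
rmax_gflops = 300.0            # hypothetical measured value, illustration only
print(f"Efficiency: {rmax_gflops / rpeak_gflops:.1%}")     # ~80.8%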

6. The Research Cluster
Software stack:
- CentOS 6.4
- OFED 2.0 (with SR-IOV)
- OpenNebula 4.2

7. Performance Benchmarks
HPC Challenge:
- HPLinpack
- MPI Random Access
- STREAM (triad kernel sketched below)
- Effective bandwidth & latency
OpenFOAM:
- 7 million cell, 5 millisecond transient simulation
- snappyHexMesh
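
To make the STREAM figure concrete, the NumPy sketch below approximates the benchmark's triad kernel, a = b + scalar*c. Array size and repetition count are chosen for illustration and are not the benchmark configuration used in the project.

# Rough NumPy approximation of the STREAM "triad" kernel (a = b + scalar*c).
# Sizes and repetition count are illustrative, not the benchmark's configuration,
# and NumPy evaluates the expression in two passes with a temporary, so the
# reported bandwidth only approximates the fused C/Fortran loop of real STREAM.
import time
import numpy as np

N = 50_000_000                  # ~400 MB per array, far larger than L3 cache
scalar = 3.0
b = np.random.rand(N)
c = np.random.rand(N)
a = np.empty_like(b)

best = float("inf")
for _ in range(5):
    t0 = time.perf_counter()
    a[:] = b + scalar * c
    best = min(best, time.perf_counter() - t0)

# The triad nominally moves three arrays of 8-byte doubles: read b, read c, write a.
bytes_moved = 3 * N * 8
print(f"Approximate triad bandwidth: {bytes_moved / best / 1e9:.1f} GB/s")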

8. Results

8.1 Software Comparison
Figure: HPLinpack throughput comparison across compiler selections

8.2 Single Node Evaluation
Figures: HPLinpack throughput efficiency of virtual machines; MPI Random Access performance; STREAM memory bandwidth

8.3 Cluster Evaluation
Figure: HPLinpack throughput efficiency of virtual machines

8.3 Cluster Evaluation
Figure: OpenFOAM runtime efficiency of virtual machines

8.4 Interconnect Evaluation
Native Verbs vs. IP over InfiniBand
Figures: typical Verbs latency of virtual machines; typical IPoIB latency of virtual machines
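
Latency comparisons like this one are typically produced with a ping-pong microbenchmark. The mpi4py sketch below illustrates the approach; it is not the tool used in the project, and Verbs-level measurements would more likely come from perftest utilities such as ib_send_lat.

# Minimal MPI ping-pong latency sketch (illustrative only; Verbs-level numbers
# would typically come from perftest tools such as ib_send_lat).
# Run across two nodes with: mpirun -np 2 python pingpong.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
buf = np.zeros(1, dtype=np.byte)     # 1-byte message
reps = 1000

comm.Barrier()
t0 = MPI.Wtime()
for _ in range(reps):
    if rank == 0:
        comm.Send(buf, dest=1)
        comm.Recv(buf, source=1)
    else:
        comm.Recv(buf, source=0)
        comm.Send(buf, dest=0)
elapsed = MPI.Wtime() - t0

# Half of the average round-trip time approximates the one-way latency.
if rank == 0:
    print(f"One-way latency: {elapsed / (2 * reps) * 1e6:.2f} us")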

8.5 Supplementary Tests
Figure: HPLinpack throughput with Intel Hyper-Threading

9. Conclusions
- KVM provides good performance for HPC
- Tuning is necessary to further improve performance
- Efficiency is highly application dependent
- SR-IOV for InfiniBand effectively reduced I/O virtualization overheads
- Synthetic and real-world results often contradict each other

Questions?