Virtualizing Modern High-Speed Interconnection Networks with Performance and Scalability
Bo Li, Zhigang Huo, Panyong Zhang, Dan Meng { leo, zghuo, zhangpanyong,
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Presenter: Xiang Zhang

Introduction
Virtualization is now one of the enabling technologies of cloud computing. Many HPC providers now use their systems as platforms for cloud/utility computing; these HPC-on-Demand offerings include:
– Penguin's POD
– IBM's Computing On Demand service
– R Systems' dedicated hosting service
– Amazon's EC2

Introduction: Virtualizing HPC Clouds?
Pros:
– good manageability
– proactive fault tolerance
– performance isolation
– online system maintenance
Cons:
– performance gap: lack of low-latency interconnects, which matters for tightly-coupled MPI applications
VMM-bypass I/O has been proposed to address this concern.

Introduction: VMM-bypass I/O Virtualization
– The Xen split device driver model is used only to set up the necessary user access points
– Data communication on the critical path bypasses both the guest OS and the VMM
Figure: VMM-Bypass I/O (courtesy [7])

Introduction: InfiniBand Overview
InfiniBand is a popular high-speed interconnect:
– OS-bypass / RDMA
– latency: ~1 us
– bandwidth: ~3300 MB/s
~41.4% of Top500 systems now use InfiniBand as the primary interconnect
Figure: interconnect family share of Top500 systems, June 2010

Introduction: InfiniBand Scalability Problem
Reliable Connection (RC)
– Queue Pair (QP): each QP consists of a Send Queue (SQ) and a Receive Queue (RQ)
– QPs require memory
– Shared Receive Queue (SRQ)
– connections per process: (N-1)×C
eXtensible Reliable Connection (XRC)
– XRC domain & SRQ-based addressing
– connections per process: (N-1)
(N: node count, C: cores per node)

Problem Statement
Does a scalability gap exist between native and virtualized environments? (Cv: cores per VM)

          Transport    QPs per Process     QPs per Node
  Native  RC           (N-1)×C             (N-1)×C²
          XRC          (N-1)               (N-1)×C
  VM      RC           (N-1)×C             (N-1)×C²
          XRC          (N-1)×(C/Cv)        (N-1)×(C²/Cv)

A scalability gap exists!
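To make the gap concrete, here is a worked instance of these formulas; the numbers are illustrative and match the 16-cores/node scaling scenario used later in the evaluation.

```latex
% Worked instance (illustrative: N = 4096 nodes, C = 16 cores/node, C_V = 1 core/VM):
\begin{align*}
\text{Native XRC:} \quad & \text{QPs/process} = N-1 = 4095\\
\text{VM XRC, } C_V = 1: \quad & \text{QPs/process} = (N-1)\times\frac{C}{C_V} = 4095 \times 16 = 65{,}520
\end{align*}
% With one core per VM, XRC loses its node-level sharing inside the VM and scales
% like RC: a factor of C/C_V (here 16x) more connections per process.
```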

Presentation Outline
Introduction
Problem Statement
Proposed Design
Evaluation
Conclusions and Future Work

Proposed Design: VM-proof XRC Design
Design goal: eliminate the scalability gap
– connections per process: (N-1)×(C/Cv) → (N-1)

Proposed Design: Design Challenges
VM-proof sharing of the XRC domain
– a single XRC domain must be shared among different VMs within a physical node
VM-proof connection management
– with a single XRC connection, P1 is able to send data to all processes in another physical node (P5~P8), no matter which VMs those processes reside in

Proposed Design: Implementation
VM-proof sharing of the XRCD
– the XRCD is shared by opening the same XRCD file
– because guest domains and the IDD have dedicated, non-shared filesystems, a pseudo XRCD file (in the guest) stands in for the real XRCD file (in the IDD)
VM-proof connection management (CM)
– traditionally an IP address/hostname was used to identify a node
– the LID of the HCA is used instead
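The slides give no code, but the two implementation points can be illustrated with a minimal sketch, assuming the upstream libibverbs XRCD API (the OFED 1.x-era implementation behind the slides used an earlier XRC extension API); the backing-file path is hypothetical.

```c
/* Minimal sketch (not the authors' code): every process that opens the same
 * backing file gets a handle to the same XRC domain, and the HCA port LID
 * serves as a VM-independent node identifier.
 * Assumes the upstream libibverbs XRCD API (ibv_alloc_xrcd); the file path
 * below is hypothetical. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <infiniband/verbs.h>

int main(void)
{
    int num_devices;
    struct ibv_device **dev_list = ibv_get_device_list(&num_devices);
    if (!dev_list || num_devices == 0) {
        fprintf(stderr, "no RDMA devices found\n");
        return 1;
    }

    struct ibv_context *ctx = ibv_open_device(dev_list[0]);
    if (!ctx) {
        fprintf(stderr, "failed to open device\n");
        return 1;
    }

    /* Processes (from any VM mapped onto this HCA) that open the same file
     * share one XRC domain; using a different file per MPI job keeps jobs
     * isolated from each other. */
    int fd = open("/tmp/job42.xrcd", O_CREAT | O_RDONLY, 0600);
    if (fd < 0) {
        perror("open xrcd file");
        return 1;
    }

    struct ibv_xrcd_init_attr xrcd_attr = {
        .comp_mask = IBV_XRCD_INIT_ATTR_FD | IBV_XRCD_INIT_ATTR_OFLAGS,
        .fd        = fd,
        .oflags    = O_CREAT,
    };
    struct ibv_xrcd *xrcd = ibv_alloc_xrcd(ctx, &xrcd_attr);
    if (!xrcd) {
        fprintf(stderr, "ibv_alloc_xrcd failed\n");
        return 1;
    }

    /* The port LID identifies the physical HCA/node regardless of which VM
     * the process runs in, so it can replace IP/hostname during connection
     * management. */
    struct ibv_port_attr port_attr;
    if (ibv_query_port(ctx, 1, &port_attr) == 0)
        printf("sharing XRCD on node with LID 0x%x\n", port_attr.lid);

    ibv_close_xrcd(xrcd);
    ibv_close_device(ctx);
    ibv_free_device_list(dev_list);
    return 0;
}
```

In the slides' Xen design the real XRCD file resides in the IDD and guests see only a pseudo XRCD file, so the guest-side open would be forwarded through the split driver rather than hitting a local file as in this single-OS sketch.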

Proposed Design: Discussions
Safe XRCD sharing
– unauthorized applications from other VMs may try to share the XRCD; isolation of XRCD sharing can be guaranteed by the IDD
– isolation between VMs running different MPI jobs: by using different XRCD files, different jobs (or VMs) share different XRCDs and run without interfering with each other
XRC migration
– main challenge: an XRC connection is a process-to-node communication channel; left as future work

Presentation Outline
Introduction
Problem Statement
Proposed Design
Evaluation
Conclusions and Future Work

Evaluation: Platform
Cluster configuration:
– 128-core InfiniBand cluster
– quad-socket, quad-core Barcelona 1.9 GHz nodes
– Mellanox DDR ConnectX HCA, 24-port MT47396 InfiniScale-III switch
Implementation:
– Xen 3.4 with Linux
– OpenFabrics Enterprise Distribution (OFED)
– MVAPICH-1.1.0

Evaluation: Microbenchmarks
– The bandwidth results are nearly the same
– Virtualized IB performs ~0.1 us worse when using the BlueFlame mechanism, due to the memory copy of the send data into the HCA's BlueFlame page
Figures: IB verbs latency using doorbell; IB verbs latency using BlueFlame; MPI latency using BlueFlame
Explanation: under virtualization, memory copy operations involve interactions between the guest domain and the IDD.

Evaluation: VM-proof XRC Evaluation
Configurations:
– Native-XRC: native environment running XRC-based MVAPICH
– VM-XRC (Cv=n): VM-based environment running unmodified XRC-based MVAPICH; the parameter Cv denotes the number of cores per VM
– VM-proof XRC: VM-based environment running MVAPICH with our VM-proof XRC design

Evaluation: Memory Usage
A fully connected cluster with 16 cores/node (the X-axis of the figure denotes the process count); ~12 KB of memory per QP
16x less memory usage:
– 64K processes would consume 13 GB/node with the VM-XRC (Cv=1) configuration
– the VM-proof XRC design reduces the memory usage to only 800 MB/node
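The per-node figures follow directly from the QP-count formulas in the problem statement, taking 64K processes at 16 cores/node (so N = 4096) and ~12 KB per QP:

```latex
% 64K processes at 16 cores/node  =>  N = 4096 nodes, C = 16, ~12 KB per QP.
\begin{align*}
\text{VM-XRC } (C_V = 1): \quad & \text{QPs/node} = (N-1)\times\frac{C^2}{C_V}
    = 4095 \times 256 \approx 1.05\times 10^{6}\\
& \text{memory} \approx 1.05\times 10^{6} \times 12\ \text{KB} \approx 13\ \text{GB/node}\\
\text{VM-proof XRC:} \quad & \text{QPs/node} = (N-1)\times C = 4095 \times 16 = 65{,}520\\
& \text{memory} \approx 65{,}520 \times 12\ \text{KB} \approx 800\ \text{MB/node}
\end{align*}
```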

Evaluation: MPI Alltoall
– a total of 32 processes
– VM-proof XRC shows a 10%~25% improvement for messages < 256 B
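The slides do not show their benchmark source; the sketch below illustrates how an Alltoall latency test of this kind is typically structured. The message size, warm-up count, and iteration count are illustrative choices, not the authors' settings.

```c
/* Minimal MPI_Alltoall timing loop (illustrative sketch, not the benchmark
 * used in the slides).  Reports the average time per Alltoall for a given
 * per-peer message size. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int msg_size = 128;   /* bytes per peer (illustrative) */
    const int warmup   = 100;
    const int iters    = 1000;

    char *sendbuf = calloc((size_t)msg_size * size, 1);
    char *recvbuf = calloc((size_t)msg_size * size, 1);

    /* Warm-up iterations so connection setup and caching effects do not
     * pollute the measurement. */
    for (int i = 0; i < warmup; i++)
        MPI_Alltoall(sendbuf, msg_size, MPI_CHAR,
                     recvbuf, msg_size, MPI_CHAR, MPI_COMM_WORLD);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++)
        MPI_Alltoall(sendbuf, msg_size, MPI_CHAR,
                     recvbuf, msg_size, MPI_CHAR, MPI_COMM_WORLD);
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("%d procs, %d B/peer: %.2f us per Alltoall\n",
               size, msg_size, (t1 - t0) / iters * 1e6);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}
```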

Evaluation: Application Benchmarks
– VM-proof XRC performs nearly the same as Native-XRC (except for BT and EP); both are better than VM-XRC
– little variation across different Cv values; Cv=8 is an exception, because NUMA-aware memory allocation is not guaranteed in that configuration

Evaluation: Application Benchmarks (Cont'd)
Table: connection statistics for FT and IS under each configuration (VM-XRC with Cv=1/2/4/8, VM-proof XRC, Native-XRC), reporting communication peers, average and maximum QPs per process, and average QPs per node
– FT: ~15.9x fewer connections with VM-proof XRC
– IS: ~14.7x fewer connections with VM-proof XRC

Conclusion and Future Work
The VM-proof XRC design converges two technologies:
– VMM-bypass I/O virtualization
– eXtensible Reliable Connection (XRC) in modern high-speed interconnection networks (InfiniBand)
With our VM-proof XRC design, VMs achieve the same raw performance and scalability as in a native, non-virtualized environment
– ~16x scalability improvement is seen in 16-core/node clusters
Future work:
– evaluations on different platforms at increased scale
– add VM migration support to our VM-proof XRC design
– extend our work to the new SR-IOV-enabled ConnectX-2 HCAs

Questions? {leo, zghuo, zhangpanyong,

Backup Slides

OS-bypass of InfiniBand
Figure: OpenIB Gen2 stack