
AIST Super Green Cloud: Lessons Learned from the Operation and the Performance Evaluation of HPC Cloud
Ryousei Takano, Yusuke Tanimura, Akihiko Oota, Hiroki Oohashi, Keiichi Yusa, Yoshio Tanaka
National Institute of Advanced Industrial Science and Technology, Japan
ISGC 2015, 20 March

This talk is about… the AIST Super Green Cloud (ASGC).

Introduction
The HPC cloud is a promising HPC platform, and virtualization is its key technology.
– Pros: a customized software environment, elasticity, etc.
– Con: a large overhead that spoils I/O performance.
VMM-bypass I/O technologies, e.g., PCI passthrough and SR-IOV, can significantly mitigate this overhead (a configuration sketch follows).
“99% of HPC jobs running on US NSF computing centers fit into one rack.” -- M. Norman, UCSD
Current virtualization technologies are mature enough to support systems of this scale.
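As an illustration of the VMM-bypass idea, here is a minimal libvirt guest-XML fragment for PCI passthrough under KVM; the PCI address is a placeholder, and on ASGC this is driven by the CloudStack extensions rather than by hand-editing XML.

```xml
<!-- Hand the physical device (e.g., an InfiniBand HCA) at the given host
     PCI address straight to the guest, bypassing the VMM's emulated and
     paravirtual I/O paths. The address below is a placeholder. -->
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
  </source>
</hostdev>
```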

LINPACK on ASGC
Efficiency (Rmax/Rpeak) on 128 nodes:
– Physical cluster: 90%
– Virtual cluster: 84%
Performance degradation: roughly 7% (the efficiency drop from 90% to 84%). Reported at IEEE CloudCom 2014.

Introduction (cont’d)
HPC clouds are heading toward hybrid-cloud and multi-cloud systems, where users can execute their applications anytime and anywhere they want.
Vision of the AIST HPC Cloud: “Build once, run everywhere.”
AIST Super Green Cloud (ASGC): a fully virtualized HPC system.

Outline
– AIST Super Green Cloud (ASGC) and the HPC Cloud service
– Lessons learned from the first six months of operation
– Experiments
– Conclusion

Vision of AIST HPC Cloud: “Build Once, Run Everywhere”
Virtual cluster templates are deployed as virtual clusters on academic, private, or commercial clouds.
– Feature 1: Create a customized virtual cluster easily.
– Feature 2: Build a virtual cluster once, and run it everywhere on clouds.

Usage Model of AIST Cloud
Allow users to customize their virtual clusters (HPC + ease of use):
1. Select a template of a virtual machine (Web apps, BigData, HPC), and launch a virtual machine when necessary (log in and use it).
2. Install the required software packages.
3. Save the user-customized template in the VM template repository (deploy / take snapshots).
A hypothetical CLI sketch of this workflow follows.
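The sgc-tools command names below are hypothetical placeholders, since the talk does not show the actual CLI; the sketch only mirrors the three steps above.

```sh
# Hypothetical CLI walk-through of the three steps above.
sgc-tools list-templates                    # 1. pick a base VM template (Web apps / BigData / HPC)
sgc-tools launch --template hpc my-vm      #    launch a VM when necessary and log in to use it
ssh my-vm 'sudo yum install -y mypackage'  # 2. install the required software
sgc-tools snapshot my-vm --save-as my-hpc-template   # 3. save the customized template
```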

Elastic Virtual Cluster
– On ASGC (in operation): sgc-tools on the login node asks the cloud controller to create a virtual cluster from a virtual cluster template in the image repository. The cluster consists of a frontend node (NFS server, job scheduler) and compute nodes over InfiniBand/Ethernet; users submit jobs to the frontend, and the cluster can scale in/out.
– On a public cloud (under development): templates can be imported/exported between image repositories, so the same virtual cluster (frontend plus compute nodes, over Ethernet) can run there as well.

ASGC Hardware Spec.
Compute node (x 155):
– CPU: Intel Xeon E5-2680 v2 / 2.8 GHz (10 cores) x 2
– Memory: 128 GB DDR3
– InfiniBand: Mellanox ConnectX-3 (FDR)
– Ethernet: Intel X520-DA2 (10 GbE)
– Disk: Intel SSD DC S-series
Network switches:
– InfiniBand: Mellanox SX6025
– Ethernet: Extreme BlackDiamond X8
The 155-node cluster consists of Cray H2312 blade servers. The theoretical peak performance is about 69 TFLOPS (155 nodes x 2 CPUs x 10 cores x 2.8 GHz x 8 FLOP/cycle).

ASGC Software Stack
Management stack:
– CentOS 6.5 (QEMU/KVM)
– Apache CloudStack + our extensions
  - PCI passthrough/SR-IOV support (KVM only)
  - sgc-tools: virtual cluster construction utility
– RADOS cluster storage
HPC stack (virtual cluster):
– Intel Cluster Studio SP
– Mellanox OFED 2.1
– TORQUE job scheduler (a sample job script follows)
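Since the virtual cluster runs TORQUE, users submit work with ordinary PBS job scripts. A minimal sketch, assuming standard TORQUE semantics; the node count, application binary, and mpirun flags are illustrative, not taken from the talk.

```sh
#!/bin/bash
#PBS -N hpc-job
#PBS -l nodes=4:ppn=20      # 4 VMs x 20 cores, matching the 2 x 10-core nodes
#PBS -j oe                  # merge stdout and stderr
cd $PBS_O_WORKDIR
# TORQUE lists the allocated hosts in $PBS_NODEFILE.
mpirun -np 80 -machinefile $PBS_NODEFILE ./my_mpi_app
```

Submitted with `qsub job.sh` on the virtual cluster's frontend node.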

Storage Architecture
No shared storage/filesystem.
– VMDI (Virtual Machine Disk Image) storage: a RADOS storage cluster with a RADOS gateway (RGW) and an NFS secondary staging server.
– Primary storage: local SSD on each compute node.
– Secondary storage: the S3-compatible VMDI store (an illustrative upload follows).
– User-attached storage (x N): holds user data, served over NFS on the data network.
Networks: compute network (InfiniBand FDR), management network (10/1 GbE), and data network (10 GbE); the 155 compute nodes sit behind the BlackDiamond X8.
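Because the secondary storage is S3-compatible (RADOS gateway), a disk image template can in principle be pushed with any S3 client; this sketch uses s3cmd with a made-up bucket and file name, purely to illustrate the data path (client -> RGW -> RADOS).

```sh
# Illustrative only: upload a VM disk image template to the S3-compatible
# VMDI secondary storage. Bucket and file names are placeholders.
s3cmd put centos-hpc-template.qcow2 s3://vmdi-secondary/templates/
```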

Zabbix Monitoring System
Zabbix dashboards track CPU usage and power usage across the system.

Outline
– AIST Super Green Cloud (ASGC) and the HPC Cloud service
– Lessons learned from the first six months of operation
  - CloudStack on a supercomputer
  - Cloud service for HPC users
  - Utilization
– Experiments
– Conclusion

Overview of ASGC Operation
– The operation started in July 2014.
– Accounts: 30+; the main users are material scientists and genome scientists.
– Utilization: < 70%. 95% of the total usage time is consumed by running HPC VM instances.
– Hardware failures: 19 (memory, motherboard, power supply).

CloudStack on a Supercomputer
A supercomputer is not designed for cloud computing; in particular, its cluster management software gets in the way. Even so, we could launch a highly productive system in a short development time by leveraging open-source system software.
Software maturity of CloudStack:
– Our storage architecture is slightly uncommon: we use local SSDs as primary storage and an S3-compatible object store as secondary storage.
– We discovered and resolved several serious bugs (see the next slide).

Software Maturity
CloudStack issues we hit (our action; status):
– cloudstack-agent jsvc gets too large a virtual memory space (patch; fixed)
– listUsageRecords generates NullPointerExceptions for expunging instances (patch; fixed)
– Duplicate usage records when listing a large number of records / small page sizes return duplicate results (backporting; fixed)
– Public key content is overridden by the template's metadata when creating an instance (bug report; fixed)
– Migration of a VM with volumes in local storage to another host in the same cluster fails (backporting; fixed)
– Negative ref_cnt of template(snapshot/volume)_store_ref results in an out-of-range error in MySQL (patch, not merged; fixed)
– [S3] Parallel deployment makes the reference count of a cache in the NFS secondary staging store negative (-1) (patch, not merged; unresolved)
– Can't create a proper template from a VM in an S3 secondary storage environment (patch; fixed)
– Fails to attach a volume (made from a snapshot) to a VM when using local storage as primary storage (bug report; unresolved)

Cloud Service for HPC Users
SaaS is the best fit when the target application is clear. IaaS is quite flexible, but it is difficult for application users to manage an HPC environment from scratch. To bridge this gap, sgc-tools is introduced on top of the IaaS service. We believe it works well, although some minor problems remain. To make VM templates easier to maintain, the idea of “infrastructure as code” can help; a sketch follows.
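A minimal sketch of the infrastructure-as-code idea for template maintenance, using an Ansible playbook: the template's software stack becomes a declarative, version-controlled description that can be replayed, instead of a chain of hand-edited snapshots. The host group and package names are illustrative.

```yaml
# Rebuild the HPC VM template's software stack declaratively.
- hosts: vm_template
  become: yes
  tasks:
    - name: Install the HPC user environment (illustrative package set)
      yum:
        name:
          - environment-modules
          - torque-client
          - openmpi
        state: present
```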

Utilization
Efficient use of limited resources is required, but a virtual cluster holds its dedicated resources whether or not the user fully utilizes them. sgc-tools does not support system-wide queuing, so users have to check resource availability themselves. Introducing a global scheduler, e.g., the HTCondor VM universe, could solve this problem; a sketch follows.
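A sketch of what a virtual cluster node could look like as an HTCondor VM-universe job, so that whole VMs are queued and scheduled globally. File names and sizes are placeholders, and the submit-file keywords should be checked against the HTCondor manual.

```text
# condor_submit description for one KVM instance (illustrative values).
universe   = vm
vm_type    = kvm
vm_memory  = 4096
vm_disk    = hpc-node.qcow2:vda:w:qcow2
executable = hpc-node-vm    # a label for the VM job, not a real binary
log        = vm.log
queue
```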

Outline
– AIST Super Green Cloud (ASGC) and the HPC Cloud service
– Lessons learned from the first six months of operation
– Experiments
  - Deployment time
  - Performance evaluation of SR-IOV
– Conclusion

Virtual Cluster Deployment
Deployment-time breakdown (seconds):
– Device attach (before OS boot): 90
– OS boot: 90
– Filesystem creation (mkfs): 90
– Transfer from RADOS to the secondary staging server (SS), then from the SS to the local node (the image path runs RADOS -> NFS/SS -> compute node -> VM).

Benchmark Programs
Micro benchmark:
– Intel MPI Benchmarks (IMB)
  - Point-to-point
  - Collectives: Allgather, Allreduce, Alltoall, Bcast, Reduce, Barrier
Application-level benchmark:
– LAMMPS Molecular Dynamics Simulator, version 28 June 2014
  - EAM benchmark, 100x100x100 atoms
Typical invocations are sketched below.
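Typical invocations, for reference; the process counts and paths are illustrative, though IMB-MPI1 does take benchmark names as arguments and the LAMMPS EAM benchmark reads its input from stdin.

```sh
# IMB: point-to-point plus the six collectives listed above.
mpirun -np 64 ./IMB-MPI1 PingPong Allgather Allreduce Alltoall Bcast Reduce Barrier
# LAMMPS EAM benchmark (in.eam scaled up to the 1M-atom problem size).
mpirun -np 128 ./lmp < in.eam
```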

MPI Point-to-Point Communication (IMB)
Measured peak bandwidths: 5.72 GB/s and 5.73 GB/s. The overhead is less than 5% for large messages, though it is up to 30% for small messages.

MPI Collectives
The benchmarked collectives are Reduce, Bcast, Allreduce, Allgather, Alltoall, and Barrier. For Barrier, the measured latencies in microseconds (normalized to bare metal) are:
– Bare metal: 6.87 (1.00)
– PCI passthrough: 8.07 (1.17)
– SR-IOV: 9.36 (1.36)
The performance of SR-IOV is comparable to that of PCI passthrough, while unexpected performance degradation is often observed.

LAMMPS: MD Simulator
EAM benchmark with a fixed problem size (1M atoms), varying the number of processes.
– VCPU pinning reduces performance fluctuation (a pinning sketch follows).
– The performance overhead of PCI passthrough and SR-IOV is about 13%.
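A sketch of vCPU pinning with plain libvirt tooling; the domain name and core IDs are placeholders (on ASGC this is handled by the CloudStack extensions rather than manually).

```sh
# Pin each guest vCPU to a dedicated physical core to reduce jitter.
virsh vcpupin hpc-vm 0 0
virsh vcpupin hpc-vm 1 1
# ...one line per vCPU; persistently, the same mapping goes in the guest
# XML as <cputune><vcpupin vcpu='0' cpuset='0'/></cputune>.
```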

Findings
– The performance of SR-IOV is comparable to that of PCI passthrough, while unexpected performance degradation is often observed.
– VCPU pinning improves performance for HPC applications.

Outline
– AIST Super Green Cloud (ASGC) and the HPC Cloud service
– Lessons learned from the first six months of operation
– Experiments
– Conclusion

Conclusion and Future Work
ASGC is a fully virtualized HPC system. We could launch a highly productive system in a short development time by leveraging state-of-the-art open-source system software.
– Extensions: PCI passthrough/SR-IOV support, sgc-tools
– Bug fixes
Future research direction: data movement is key.
– Efficient data management and transfer methods
– Federated identity management

Questions?
Thank you for your attention!
Acknowledgments: This work was partly supported by a JSPS KAKENHI grant.

Motivating Observation
Performance evaluation of the HPC cloud:
– (Para-)virtualized I/O incurs a large overhead.
– PCI passthrough significantly mitigates the overhead.
(Figure: the overhead of I/O virtualization on the NAS Parallel Benchmarks, class C, 64 processes; BMM = bare metal machine. With KVM (virtio), guest I/O traverses the guest driver, the VMM, and the physical driver of a 10 GbE NIC; with KVM (IB), the guest OS drives the IB QDR HCA directly, bypassing the VMM, and PCI passthrough improves performance accordingly. A contrasting libvirt fragment follows.)
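For contrast with the passthrough fragment shown earlier, this is the paravirtual path in libvirt terms: a virtio NIC whose traffic traverses the guest driver, the VMM, and the host's physical driver, which is where the overhead above comes from. The bridge name is a placeholder.

```xml
<!-- Paravirtual (virtio) NIC: I/O goes through the VMM. -->
<interface type='bridge'>
  <source bridge='br0'/>
  <model type='virtio'/>
</interface>
```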