The advances in IHEP Cloud facility

Slides:



Advertisements
Similar presentations
© 2012 IBM Corporation Architecture of Quantum Folsom Release Yong Sheng Gong ( 龚永生 ) gongysh #openstack-dev Quantum Core developer.
Advertisements

Cloud Computing Open source cloud infrastructures Keke Chen.
System Center 2012 R2 Overview
1 Security on OpenStack 11/7/2013 Brian Chong – Global Technology Strategist.
IHEP Site Status Jingyan Shi, Computing Center, IHEP 2015 Spring HEPiX Workshop.
“It’s going to take a month to get a proof of concept going.” “I know VMM, but don’t know how it works with SPF and the Portal” “I know Azure, but.
6/2/20071 Grid Computing Sun Grid Engine (SGE) Manoj Katwal.
 Cloud computing  Workflow  Workflow lifecycle  Workflow design  Workflow tools : xcp, eucalyptus, open nebula.
Get More out of SQL Server 2012 in the Microsoft Private Cloud environment Guy BowermanMadhan Arumugam DBI208.
BESIII physical offline data analysis on virtualization platform Qiulan Huang Computing Center, IHEP,CAS CHEP 2015.
+ CS 325: CS Hardware and Software Organization and Architecture Cloud Architectures.
BESIII distributed computing and VMDIRAC
EXPOSE GOOGLE APP ENGINE AS TASKTRACKER NODES AND DATA NODES.
Presented by: Sanketh Beerabbi University of Central Florida COP Cloud Computing.
YAN, Tian On behalf of distributed computing group Institute of High Energy Physics (IHEP), CAS, China CHEP-2015, Apr th, OIST, Okinawa.
Grid Workload Management & Condor Massimo Sgaravatto INFN Padova.
608D CloudStack 3.0 Omer Palo Readiness Specialist, WW Tech Support Readiness May 8, 2012.
Eucalyptus 3 (&3.1). Eucalyptus 3 Product Overview – Govind Rangasamy.
BESIII Production with Distributed Computing Xiaomei Zhang, Tian Yan, Xianghu Zhao Institute of High Energy Physics, Chinese Academy of Sciences, Beijing.
GLIDEINWMS - PARAG MHASHILKAR Department Meeting, August 07, 2013.
IHEP(Beijing LCG2) Site Report Fazhi.Qi, Gang Chen Computing Center,IHEP.
NTU Cloud 2010/05/30. System Diagram Architecture Gluster File System – Provide a distributed shared file system for migration NFS – A Prototype Image.
20409A 7: Installing and Configuring System Center 2012 R2 Virtual Machine Manager Module 7 Installing and Configuring System Center 2012 R2 Virtual.
Scaling the CERN OpenStack cloud Stefano Zilli On behalf of CERN Cloud Infrastructure Team 2.
Vignesh Ravindran Sankarbala Manoharan. Infrastructure As A Service (IAAS) is a model that is used to deliver a platform virtualization environment with.
OpenStack Chances and Practice at IHEP Haibo, Li Computing Center, the Institute of High Energy Physics, CAS, China 2012/10/15.
Building Virtual Scientific Computing Environment with Openstack Yaodong Cheng, CC-IHEP, CAS ISGC 2015.
Virtual Cluster Computing in IHEPCloud Haibo Li, Yaodong Cheng, Jingyan Shi, Tao Cui Computer Center, IHEP HEPIX Spring 2016.
SEMINAR ON.  OVERVIEW -  What is Cloud Computing???  Amazon Elastic Cloud Computing (Amazon EC2)  Amazon EC2 Core Concept  How to use Amazon EC2.
Distributed computing and Cloud Shandong University (JiNan) BESIII CGEM Cloud computing Summer School July 18~ July 23, 2016 Xiaomei Zhang 1.
PaaS services for Computing and Storage
Md Baitul Al Sadi, Isaac J. Cushman, Lei Chen, Rami J. Haddad
OpenStack.
CEPC software & computing study group report
Security on OpenStack 11/7/2013
Cloud Technology and the NGS Steve Thorn Edinburgh University (Matteo Turilli, Oxford University)‏ Presented by David Fergusson.
Deploying Web Application
AWS Integration in Distributed Computing
Elastic Computing Resource Management Based on HTCondor
Introduction to Distributed Platforms
Dag Toppe Larsen UiB/CERN CERN,
Research on an universal Openstack upgrade solution
How to connect your DG to EDGeS? Zoltán Farkas, MTA SZTAKI
Integration of Openstack Cloud Resources in BES III Computing Cluster
Dag Toppe Larsen UiB/CERN CERN,
ATLAS Cloud Operations
Yaodong CHENG Computing Center, IHEP, CAS 2016 Fall HEPiX Workshop
StratusLab Final Periodic Review
StratusLab Final Periodic Review
Provisioning 160,000 cores with HEPCloud at SC17
BDII Performance Tests
SCD Cloud at STFC By Alexander Dibbo.
Cloud Computing Platform as a Service
Usage of Openstack Cloud Computing Architecture in COE Seowon Jung Systems Administrator, COE
Discussions on group meeting
Introduction to Cloud Computing
The Scheduling Strategy and Experience of IHEP HTCondor Cluster
OpenStack Ani Bicaku 18/04/ © (SG)² Konsortium.
Upgrade Nebula to OpenStack Newton
Cloud Technology Group
Management of Virtual Execution Environments 3 June 2008
2017 Real Questions
Xiaomei Zhang On behalf of CEPC software & computing group Nov 6, 2017
20409A 7: Installing and Configuring System Center 2012 R2 Virtual Machine Manager Module 7 Installing and Configuring System Center 2012 R2 Virtual.
Outline Virtualization Cloud Computing Microsoft Azure Platform
OpenStack-alapú privát felhő üzemeltetés
* Introduction to Cloud computing * Introduction to OpenStack * OpenStack Design & Architecture * Demonstration of OpenStack Cloud.
Future Internet: Infrastructures and Services
Zhihui Sun , Fazhi Qi, Tao Cui
Managing allocatable resources
Presentation transcript:

The advances in IHEP Cloud facility Tao Cui, Yaodong Cheng, CC-IHEP, CAS IHEP, 19B Yuquan Road, Beijing, 100049 China Cuit@ihep.ac.cn

Outline Background The advances in IHEPCloud future plan Conclusion private cloud upgrade advance of Virtual Computing Cluster future plan Conclusion Is cloud suitable for scientific computing? Virtual Cluster Computing in IHEPCloud Li Haibo at Hepix 2016 Spring

The group of cloud computing founded in May 2014 The arms Computing resource virtualization and flexible virtual resource scheduling Hierarchical and heterogeneous virtual system architecture Simplified work nodes environment, supported multiple experiments and completely transparent to users Hierakikou hitre genis

IHEPCloud: Three use scenarios User self-Service virtual machine platform (IaaS) Launched in Nov 2014 User from IHEP unified authentication can launch or destroy VM on-demand Provide UI and testing vm A dynamic Virtual Computing Cluster Launched in May 2015 Job will be allocated to virtual queue automatically when physical queue is busy A distributed computing system Work as a cloud site: Dirac or other applications call cloud interface to start or stop virtual work nodes

Status of IHEP Cloud Version Openstack Kilo CPU ~1K cores ~2K cores recently Storage 180TB provided by EOS, GlusterFS 900TB recently Users ~ 200 Experiments BESIII, LHAASO, JUNO and CPEC

Private Cloud upgrade

Private Cloud old infrastructure Version: Openstack Icehouse Storage: Local Storage Network: Nuetron + OVS + 802.1Q. Authentication: user data comes from IHEP unified certification, Users and projects one by one in order to control the quota Others: modified dashoard, Limited operating privilege, only vm operation, no volume

Network of private cloud Independent network connects to computing network with l3 routing and connects to campus network with access list controlled To open rules in secgroup and vm ip address is accessible in physical network 802.1Q Computing network Openstack Cloud SW Access list Campus network

Authentication in private cloud Synchronic Openstack Ldap from IHEP unified certification database The issues User information depends on IHEP unified certification database more user info not belong cloud appeared No local management privilege Nova VM VM VM API Dashboard Keystone Ldap database IHEP unified certication

Private Cloud Upgrade item Old Cloud New Cloud Upgrade request Maintain virtual machines and users attributes Upgrade from Icehouse to Kilo Improve authentication mechanism Shared storage supported item Old Cloud New Cloud Version Openstack Icehouse Openstack Kilo Storage Local Storage for image and instance Shared storage with GlusterFS Network Neutron+OVS+802.1Q Authentication Keystone(Ldap) Agent mechanism Keystone(Mysql) Local Users table and Agent dashboard modified and IHEP unified certification supported through Ldap synchronization Modified and IHEP unified certification through local users table function Just virtual machine operation Comparing between old cloud and new cloud

Improvement of authentication in private cloud Agent running on the controller Agent responsible for replacing password and comparing user name from Mysql database white list to protect local user such as admin and services … Nova API VM VM VM Keystone Mysql database Dashboard Agent Index Name passwd reserv 1 Admin @#^$## yes 2 user1 @#%#^^ No 3 User2 %$@%$ IHEP unified certication

Openstack pgrade approach The approach Deployed a new Openstack Kilo based on SL7 Configure New environment such as image, flavor Same network information between old and new one Migrate projects/users and their attributes and dashboard modified for this changes IHEP unified certification Local database and local projects/users Migrate virtual machines based on users Migrate only increased images but change backing file location Migration

The result OS from SL6.5 to SL7 Openstack from Icehouse to Kilo To migrate 135 vms and ~100 users Shared storage with GlusterFS 45 hours cost

Advacnes of Virtual Computing Cluster ----VPManager component Is cloud suitable for scientific computing?

IHEP Cloud –Virtual Computing Cluster Launched in May 2015 based on Openstack Kilo Support BESIII, JUNO, LHAASO, CEPC,… The arms Providing virtual machine as work nodes to computing system Supporting multiple experiments and dynamic virtual resources scheduling. Completely transparent to the computing service and end users Status ~700 CPU cores 4 experiments

Infrastructure of Virtual Computing Cluster VPManager virtual resource pool manager PBS/HTCondor Dedicated SGE working physical nodes Openstack Physical machines Virtual machines Virtual resource Pool Manager RMS VPManager hypervisor hypervisor hypervisor

VPManager(Virtual resource Pool Manager) Openstack 1 VM Quota Interface (Socket) API VCondor VPBS Virtual Job Scheduler BES CEPC JUNO LHAASO Application Openstack 2 VM Node Manager Server Accounting System NETDB Get Quota Info VM Pool Create/Delete VM VM Node Agent Get VM Status, Decide to be deleted Image Mngt. NMS/DNS/…

Scheduler and architecture PBS deprecated Migrating from PBS to HTCondor Virtual Computing Cluster resources scheduler based on HTCondor Experiments BESIII JUNO LHAASO Other Scheduler Resources VM VM VM VM VM Physical machines

VPManager(Virtual resource Pool Manager) JUNO Queue Jobs scheduling resources scheduling resources Pool Physical resources HTCondor LHAASO Queue Submit Jobs BESII Queue 1 2 3 VCondor VM Quota VMCtrl 5 4 Accounting Controller VM VM VM NetDB Agent Virtual computing cluster 4 NetDB Agent

VPManager components VCondor VMQuota VMCtrl NetDB Agent A resources scheduler which Increasing or decreasing virtual resources to a queue VMQuota Responsible for resources quota control, vCondor allocates or recycles resources based on maximum or minimum resource limitation. VMCtrl To change virtual machine’s queue attribution according to vCondor’s request NetDB Agent A Collector which collects info of virtual resources A interface which provides resources status to vCondor Accounting system keeps all the usage information of each virtual machine and generate bills to user

VCondor VCondor Resource allocation as demand Based on HTCondor Queue length HTCondor Resources status Openstack Agent Resources allocation policy VMQuota Based on HTCondor Resource available VCondor policy Resource reserved lhaaso Apply resource Resource status VMQuota Openstack Job queuing juno OpenNebula VM starts or deletes … Submit job Github URL: https://github.com/hep-gnu/VCondor

VMQuota JSON VMQuota consists of a web interface and a database BES Queue Name Min Max Available Reserve time(s) BES 100 400 200 600 JUNO 300 VMQuota Client [{“ResID”:”juno”}] [{“ResID”:”juno”,”MIN”:100,”AVAILABLE”:50}] VMQuota Server VMQuota JSON Socket Request Response

VMCtrl VMCtrl function component To increase or decrease HTCondor queue’s computing resources To accept virtual machines info from NetDB Agent To accept queue modification request from Vcondor To aaccept query from virtual machines and reconfigure condor client queue name component A web service interface A mysql database Vmctrl table Vms table A linux script running on virtual machine

VCondor scheduling ~50% Computing Resource Uitlity Resource minimum limit Job queued, automatically create virtual machines Resource maximum limit Histogram

The advance of virtual computing cloud In the last six months Resources dynamic scheduling Experiment resource adjustment Virtual machines amount Hierarchical and heterogeneous system architecture a stable system maintain experience

Virtual machine performance testing BES simulation job ~2% to ~3% time penalties BES reconstruction job ~3% to 6 % time penalties CEPC job MPI Parallel Computing job Very small amount of data and network communication Running well but the performance is 50% of the physical machine

Future plans find a solution to scale the virtual computing cloud system China HEP Community Cloud Plan Sharing resource across different sites improve resource utilization Different data management solution Same storage / file system view across different sites Data access transparently in every site Streaming and cache data mechanism EOS or LEAF (an ongoing system) Leisure site Job EC2,aliyun,… Commercial cloud Unified Storage SiteA SiteB

Conclusion IHEPCloud aims at providing self-service virtual machine platform and virtual computing environment VPManager components provides a dynamic and flexible virtual resources scheduling architecture and the system work well under the architecture More resources will be added to IHEPCloud this year

Thank you! Any Questions?