Building Virtual Scientific Computing Environment with OpenStack
Yaodong Cheng, CC-IHEP, CAS (chyd@ihep.ac.cn)
ISGC 2015
Contents
- Requirements of scientific computing
- IHEP cloud platform
- Virtual machine types
- Virtual computing cluster
- DIRAC distributed computing
- Conclusion
Large Science Facilities
- IHEP is the largest fundamental research center in China and serves as the backbone of China's large science facilities:
- Beijing Electron Positron Collider (BEPCII) / BESIII
- Yangbajing Cosmic Ray Observatory: ASγ & ARGO
- Daya Bay Neutrino Experiment
- China Spallation Neutron Source (CSNS)
- Hard X-ray Modulation Telescope (HXMT)
- Accelerator-Driven Sub-critical System (ADS)
- Jiangmen Underground Neutrino Observatory (JUNO)
- Under planning: BAPS, LHAASO, XTP, HERD, ...
BEPCII/BESIII
- 36 institutions from China, the US, Germany, Russia, Japan, ...
- > 5 PB of data expected in the next 5 years
- ~5,000 CPU cores for simulation, reconstruction, analysis, ...
- Long-term data preservation
- Data sharing between partners
Other Experiments
- Daya Bay Neutrino Experiment: ~200 TB per year
- JUNO (Jiangmen Underground Neutrino Observatory): ~500 TB per year
- LHAASO: 2 PB per year after 2017, accumulating 20+ PB in 10 years
- ATLAS and CMS Tier-2 site: 940 TB disk, 1,088 CPU cores
- CSNS, HXMT, ...
- In total: ~5 PB of new data per year!
Computing Resources Status
- ~12,000 CPU cores in ~50 queues, managed by Torque/PBS; resources are difficult to share between queues
- ~5 PB disk: Lustre, GlusterFS, dCache/DPM, ...
- ~5 PB LTO4 tape in two IBM 3584 tape libraries, managed by a modified CERN CASTOR 1.7
- (Photos: PC farm built with blade servers; tape libraries)
In the Future ...
- More HEP experiments will mean managing twice as many servers as today, or more
- But there is no possibility of a significant increase in staff numbers
- Is cloud a good solution? Is cloud suitable for scientific computing?
- Time to change the IT strategy!
What is Cloud?
The NIST definition (http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf):
- Essential characteristics: on-demand self-service, broad network access, resource pooling, rapid elasticity, measured service
- Service models: IaaS, PaaS, SaaS
- Deployment models: public, private, hybrid
Is cloud beneficial to scientific computing?
Easy to Maintain
- Hardware: services become independent of the underlying physical machines
- Cloud services: a single set of services for managing access to computing resources
- Scientific platforms: become a separate layer deployed, controlled, and managed by domain experts
Customized Environment
- Operating system suited to your application
- Your applications preinstalled and preconfigured
- CPU, memory, and swap sized to your needs
Dynamic Provisioning
- New storage and compute resources in minutes (or less)
- Resources freed just as quickly to facilitate sharing
- Temporary platforms created for variable workloads
IHEPCloud: A Private IaaS Platform
- Launched in May 2014 (http://cloud.ihep.ac.cn)
- Three use scenarios:
  - Self-service virtual machine platform: users register and destroy VMs on demand
  - Virtual computing cluster: jobs are allocated to a virtual queue automatically when the physical queue is busy
  - Distributed computing: IHEPCloud works as a cloud site; DIRAC calls the cloud interface to start or stop virtual worker nodes
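For illustration, a minimal sketch of the kind of cloud call the third scenario implies: booting and deleting a worker VM through the OpenStack Compute API with python-novaclient. The credentials, image name, and flavor are placeholders; the actual DIRAC integration is not shown on the slides.

```python
# Minimal sketch: start and stop a virtual worker node through the
# OpenStack Compute API (python-novaclient, Icehouse era).
# Credentials, image name, and flavor are hypothetical placeholders.
from novaclient import client

nova = client.Client("2", "dirac", "secret", "ihepcloud",
                     "http://cloud.ihep.ac.cn:5000/v2.0")

def start_worker(name):
    image = nova.images.find(name="SL65-worker")   # assumed image name
    flavor = nova.flavors.find(name="m1.medium")   # assumed flavor
    return nova.servers.create(name=name, image=image, flavor=flavor)

def stop_worker(server):
    nova.servers.delete(server)                    # free the resources

vm = start_worker("vnode-001")
print(vm.id, vm.status)
```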
IHEPCloud Services
- Who can use it? Any user who has an IHEP email account
- Resource quota: by default, each user gets 3 CPU cores and 15 GB of memory
- VM types:
  - Testing machine: full root privilege, no public storage
  - UI node: AFS authentication, no root privilege, public storage, and none of the limitations (memory, CPU time, processes) imposed on the physical login nodes
- OS types: SL 5.5, SL 5.8, SL 6.5, SL 7 (64-bit), SL 6.5 (32-bit), Windows 7 (64-bit); new types are added depending on user requirements
- VM IP address: an internal IP address (192.168.*.*) is allocated automatically; a public IP address (202.122.35.*) needs the approval of an administrator
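The per-user default above maps naturally onto OpenStack's quota mechanism. A hedged sketch of setting such defaults with the novaclient quota API; the tenant ID is a placeholder, and the slides do not state how IHEPCloud actually applies quotas.

```python
# Sketch: apply the default per-user quota (3 cores, 15 GB RAM)
# through the OpenStack quota API. The tenant/project ID is a
# placeholder; IHEPCloud's real quota workflow is not shown.
from novaclient import client

nova = client.Client("2", "admin", "secret", "admin",
                     "http://cloud.ihep.ac.cn:5000/v2.0")

nova.quotas.update("TENANT_ID",     # hypothetical tenant ID
                   cores=3,         # 3 CPU cores per user
                   ram=15 * 1024)   # 15 GB of memory, in MB
```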
Why Does an End User Need IHEPCloud?
- Virtual testing machine: develop programs or run tests
  - A VM is generated in a few minutes
  - Log in to the VM via SSH, VNC, or remote desktop
- Virtual UI node: debug programs in the computing environment
  - The physical login nodes (lxslcxx.ihep.ac.cn) impose limitations on memory, CPU time, user processes, ... (if cputime > 45 min and %CPU > 60%: KILL it!), and users are affected by other users
  - A VM is owned by a single user and has no such limitations
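The kill rule quoted above is the kind of policy a simple watchdog enforces on shared login nodes. A minimal sketch of such a watchdog, assuming the rule exactly as stated on the slide; the real IHEP tool is not shown.

```python
# Sketch of the login-node watchdog policy quoted above:
# kill any process with cputime > 45 min AND %CPU > 60%.
# Parses `ps` output; IHEP's actual implementation is not shown.
import os
import signal
import subprocess

def parse_cputime(t):
    """Convert ps cputime ([[dd-]hh:]mm:ss) to minutes."""
    days, _, rest = t.partition("-") if "-" in t else ("0", "", t)
    parts = [int(x) for x in rest.split(":")]
    while len(parts) < 3:
        parts.insert(0, 0)
    h, m, s = parts
    return int(days) * 1440 + h * 60 + m + s / 60.0

out = subprocess.check_output(
    ["ps", "-eo", "pid=,pcpu=,cputime="]).decode()
for line in out.splitlines():
    pid, pcpu, cputime = line.split()
    if parse_cputime(cputime) > 45 and float(pcpu) > 60.0:
        os.kill(int(pid), signal.SIGKILL)   # KILL it!
```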
Virtual Computing Cluster
- If a job queue is busy, jobs can be allocated to a virtual queue
- Plan to run the service this year: junoq, 128 CPU cores
- Workflow: jobs are submitted; the cloud scheduler checks the queue load, starts or stops VMs on IHEPCloud, and forwards jobs to the virtual queue
- Details: see Haibo's talk
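A hedged sketch of the scheduler loop described above, assuming Torque's qstat for the queue load and the hypothetical start_worker()/stop_worker() helpers from the earlier novaclient sketch; the thresholds are illustrative, not IHEP's actual settings.

```python
# Sketch of the cloud-scheduler loop: if the physical queue has a
# backlog, start virtual worker nodes; when it drains, stop them.
# MAX_VMS and the polling interval are illustrative assumptions.
import subprocess
import time

QUEUE = "junoq"
MAX_VMS = 32          # assumed cap on virtual worker nodes
workers = []          # VMs started by this scheduler

def queued_jobs(queue):
    """Count queued (state Q) jobs in a Torque queue via qstat."""
    out = subprocess.check_output(["qstat", queue]).decode()
    return sum(1 for line in out.splitlines()
               if line.split()[-2:-1] == ["Q"])

while True:
    backlog = queued_jobs(QUEUE)
    if backlog > 0 and len(workers) < MAX_VMS:
        workers.append(start_worker("vnode-%03d" % len(workers)))
    elif backlog == 0 and workers:
        stop_worker(workers.pop())   # free resources when idle
    time.sleep(60)
```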
Distributed Computing
- Distributed computing has integrated cloud resources using a pilot schema, implementing dynamic scheduling
- The cloud resources in use can shrink and grow dynamically according to job requirements
- VM lifecycle: when users submit jobs, the distributed computing system creates VMs in the cloud; the VMs fetch and run jobs; when no jobs are left, the VMs are deleted
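A toy sketch of that pilot-side lifecycle: the VM pulls jobs until none are left, then releases itself. The classes and callbacks are stand-ins; DIRAC's real pilot framework is considerably more elaborate.

```python
# Sketch of the pilot lifecycle on a cloud worker VM: fetch jobs while
# any exist, then delete the VM. TaskQueue and delete_vm are toy
# stand-ins; DIRAC's actual pilot framework is far more elaborate.
import collections

class TaskQueue(object):
    def __init__(self, jobs):
        self.jobs = collections.deque(jobs)
    def get_job(self):
        # Returns None when the queue is empty ("No Jobs")
        return self.jobs.popleft() if self.jobs else None

def pilot_loop(queue, delete_vm):
    while True:
        job = queue.get_job()
        if job is None:           # no work left -> release the VM
            break
        print("running", job)     # stand-in for executing the payload
    delete_vm()                   # "Delete": free the cloud resources

pilot_loop(TaskQueue(["Job1", "Job2", "Job3"]),
           lambda: print("VM deleted"))
```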
Cloud Sites
- 5 cloud sites from Torino, JINR, CERN, and IHEP have been set up and connected to the distributed computing system
- About 320 CPU cores, 400 GB of memory, and 10 TB of disk in total
Cloud Tests
- More than 4,500 jobs have been run, with a 96% success rate
- The main failure reason was lack of disk space; disk space will be extended in IHEPCloud
Performance and Physics Validation
- Performance tests (simulation, reconstruction, downloading random-trigger data) have shown that running times at the cloud sites are comparable with other production sites
- Physics validation has shown that physics results are highly consistent between the clusters and the cloud sites
Architecture
(Architecture diagram: OpenStack Dashboard and API at the core, with CEPH as the backend storage; NetworkDB holds VM network information and drives DNS registration; LDAP authentication against UMT (IHEP email), with interoperation with UMT (CAS cloud); DIRAC and the virtual cluster interact through the API; VMs register with Puppet for configuration management and with DNS and Nagios; log analysis plus host and service monitoring.)
IHEPCloud Components
- Core middleware: OpenStack, the most popular open-source cloud management system
- Configuration management: Puppet, used to create VM images, manage applications in VMs, and keep VMs consistent with the computing environment
- Authentication: IHEP email account and password; AFS authentication for UI nodes
- Network management: a central NetworkDB records MAC, IP, hostname, user, ...; each VM gets a hostname under *.v.ihep.ac.cn; network traffic accounting
- External storage: currently, images and instances are stored on local disk; CEPH is being evaluated to back Glance, Nova, and Cinder
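Authentication against the email accounts is typically an LDAP bind; a minimal sketch assuming python-ldap and a hypothetical directory layout, since the slides do not give the actual server or DN schema.

```python
# Sketch: authenticate a user against an LDAP directory by binding
# with their email account and password. The server URL and DN layout
# are hypothetical; the slides do not specify IHEP's actual schema.
import ldap

def authenticate(username, password):
    conn = ldap.initialize("ldap://ldap.ihep.ac.cn")          # assumed server
    dn = "uid=%s,ou=people,dc=ihep,dc=ac,dc=cn" % username    # assumed DN
    try:
        conn.simple_bind_s(dn, password)   # bind succeeds iff credentials are valid
        return True
    except ldap.INVALID_CREDENTIALS:
        return False
    finally:
        conn.unbind_s()
```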
Network in IHEPCloud
- Multiple IP subnets on one physical machine
- VLAN mode (L2 only, no router) in OpenStack Neutron; IP gateways and 802.1Q handled by the hardware switches
- Problems: trunk configuration, large MAC tables, pre-configuration required, risk of broadcast storms
- Future network: VXLAN (hardware-based) at the core layer, OpenStack VLAN mode at the access layer
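Under VLAN mode, each subnet maps onto an 802.1Q tag carried by the switches. A hedged sketch of creating such a provider network with python-neutronclient; all names, the VLAN ID, and the CIDR are placeholders.

```python
# Sketch: create a VLAN provider network in Neutron, mapping a subnet
# to an 802.1Q tag carried by the hardware switches. All names, the
# VLAN ID, and the CIDR below are illustrative placeholders.
from neutronclient.v2_0 import client

neutron = client.Client(username="admin", password="secret",
                        tenant_name="admin",
                        auth_url="http://cloud.ihep.ac.cn:5000/v2.0")

net = neutron.create_network({"network": {
    "name": "vm-net-101",
    "provider:network_type": "vlan",        # L2 only, no Neutron router
    "provider:physical_network": "physnet1",
    "provider:segmentation_id": 101}})      # 802.1Q VLAN tag

neutron.create_subnet({"subnet": {
    "network_id": net["network"]["id"],
    "ip_version": 4,
    "cidr": "192.168.101.0/24",             # internal VM subnet
    "gateway_ip": "192.168.101.1"}})        # gateway lives on the switch
```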
IHEPCloud Current Status
- Released on 18 November 2014
- Built on OpenStack Icehouse
- 1 control node and 7 computing nodes: 112 physical CPU cores / 224 virtual CPU cores and 896 GB of memory in total
- 96 active VMs using 172 CPU cores and 628 GB of memory as of 11 March 2015
Conclusion
- Cloud computing is widely accepted in both industry and science, and scientific computing is preparing the move to the cloud
- IHEPCloud aims at providing a self-service virtual machine platform for IHEP users; it also supports the virtual computing cluster and distributed computing
- A small cloud platform has been built and is open to IHEP users free of charge
- More resources (1,000+ CPU cores) will be added to IHEPCloud this year
- Shibboleth is being investigated to build a federated cloud
Thank you! Any Questions?