Cloud Computing Wrap-Up
Fernando Barreiro Megino (CERN IT-ES), Sergey Panitkin (BNL)

Presentation transcript:

Cloud Computing Wrap-Up
Fernando Barreiro Megino (CERN IT-ES), Sergey Panitkin (BNL)
Summary of presentations by Tim Bell, Doug Benjamin, Michael Ernst, Joanna Huang & Ryan Taylor

Motivation for Cloud Computing
- HEP is no longer a special case for compute
- Several sites/national projects are starting to adopt a tool-chain model built from existing open-source tools and to set up cloud instances to expand their computing resources
  - Ease of use, flexibility for facility management and VO usage
- Renting CPU/storage from commercial providers is still prohibitively expensive, but we get frequent possibilities to spill over onto external resources
  - Research clouds: FutureGrid, StratusLab
  - Projects/grants on commercial resources: HelixNebula, Amazon, Google Compute Engine
  - Hopefully the experiment HLT farms
- Several facilities at all levels (T0/1/2/3) presented their plans to move or expand towards a cloud infrastructure during the jamboree

CERN Agile Infrastructure Overview

CERN AI OpenStack: Plans
- Production preparation
  - High availability of OpenStack
  - Monitoring
- Ongoing extensions
  - Usage accounting
  - Quota management
  - External block storage (e.g. Gluster or NetApp)
- Plans
  - Production service in the Meyrin data centre
  - Extension to Budapest at the start of 2013: aim to run 90% of services virtualised
  - Target: 100K-300K VMs on 15K hypervisors by 2015
- Migration
  - Existing services will increasingly run on virtual machines
  - Legacy tools (e.g. Quattor) to be replaced over the next 2 years
  - The CERN Virtualisation Infrastructure (CVI) and the Lxcloud test bed will be migrated to the CERN private cloud
- New centre in Budapest
  - Additional 2.7 MW of usable power
  - Hands-off facility
  - 200 Gbit/s network to CERN

Experiment Tests on the CERN AI
- Experiment tests started at the end of October
- Successful setup after ~2 weeks, including training of a technical student (Katarzyna Kucharczyk)
- PanDA production jobs in 12 hours

Virtualization at BNL and US Facilities
- Open-source framework selection at BNL: OpenStack, Puppet, BoxGrinder
- BNL_CLOUD: standard production PanDA site
  - WAN I/O: the queue can be extended transparently to Amazon or other public/academic clouds
  - Steadily running ~200 production jobs on auto-built VMs for months
  - HammerCloud tests, with auto-exclude enabled
  - Performance actually better than the main BNL production site
  - Ran a hybrid OpenStack/EC2 cluster for a week
- US cloud resources to grow in the near future
  - BNL to 1k-2k VMs in the next months
  - Cloud resources at one US Tier-2 in Q3/
- Prerequisites
  - Scalable network architecture
  - Install everything programmatically (a boot sketch follows below)
- HammerCloud (ATLASG4_trf_7.2...): BNL_CLOUD faster than BNL_CVMFS_1 (wall clock 1740 s vs. 1960 s)
  - Setup time (no AFS)? No shared filesystem?
  - Similar spread for other tests (e.g. PFT Evgen)
  - Anticipate using Ilija's HC framework to conduct more tests
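The "install everything programmatically" point can be pictured as booting worker VMs through the EC2-compatible API that OpenStack exposes, passing a contextualisation script as user data. The sketch below uses boto purely as an illustration; the endpoint, credentials, image ID, CVMFS repositories and HTCondor pool address are placeholders and not the actual BNL configuration.

```python
# Illustrative only: boot a batch of auto-configured worker VMs via the
# EC2-compatible API of an OpenStack cloud. All identifiers are placeholders.
import boto
from boto.ec2.regioninfo import RegionInfo

USER_DATA = """#!/bin/sh
# first-boot contextualisation: mount CVMFS and join the HTCondor pool
yum -y install cvmfs condor
echo 'CVMFS_REPOSITORIES=atlas.cern.ch,atlas-condb.cern.ch' >> /etc/cvmfs/default.local
echo 'CONDOR_HOST = condor.example.org' >> /etc/condor/condor_config.local
service autofs restart
service condor start
"""

region = RegionInfo(name="openstack", endpoint="cloud.example.org")
conn = boto.connect_ec2(aws_access_key_id="ACCESS_KEY",
                        aws_secret_access_key="SECRET_KEY",
                        region=region, port=8773,
                        path="/services/Cloud", is_secure=False)

# launch ten identical workers from a pre-built (e.g. BoxGrinder) image
reservation = conn.run_instances(image_id="ami-00000001",
                                 min_count=10, max_count=10,
                                 instance_type="m1.large",
                                 user_data=USER_DATA)
for instance in reservation.instances:
    print("%s %s" % (instance.id, instance.state))
```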

NeCTAR - National eResearch Collaboration Tools and Resources
- $47M Australian government funded project
- Builds new infrastructure specifically for Australian researchers
- Augment the Australian ATLAS Tier-2 capacity
- Build a federated Tier-3 for high-throughput data analysis
- 25,000-core IaaS setup spanning 8 locations, based on OpenStack

NeCTAR T2 System Framework
- PanDA production and analysis queues: Australia-NECTAR and ANALY_NECTAR
- 160 cores running ATLAS production jobs

NeCTAR Federated T3

UVic Experience
- One of the pioneers running production queues on cloud CPUs
- Early tests in Nov. 2011, standard operation since April 2012

UVic CloudScheduler
- The UVic team can help you enable your cloud resources and manage them through CloudScheduler (see Ryan's presentation for the steps and alternatives); a simplified scheduling loop is sketched below
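Cloud Scheduler's core idea is to watch an HTCondor queue and boot or retire VMs on the attached clouds to match demand. The loop below is only a simplified illustration of that idea, not Cloud Scheduler's actual code: idle jobs are counted with condor_q, and boot_vm/shutdown_vm are hypothetical stand-ins for the real cloud API calls.

```python
# Simplified illustration of the Cloud Scheduler idea: grow or shrink a pool of
# cloud worker VMs to match the number of idle jobs in an HTCondor queue.
import subprocess
import time

MAX_VMS = 50          # assumed per-cloud quota
running_vms = []      # handles returned by the (placeholder) boot call


def idle_jobs():
    """Count idle jobs via condor_q; JobStatus == 1 means idle."""
    out = subprocess.check_output(["condor_q", "-format", "%d\n", "JobStatus"])
    return sum(1 for line in out.splitlines() if line.strip() == b"1")


def boot_vm():
    # Placeholder for the real cloud API call (OpenStack, EC2, ...).
    print("booting a worker VM")
    return object()


def shutdown_vm(vm):
    # Placeholder for retiring and terminating an idle worker VM.
    print("shutting down a worker VM")


while True:
    idle = idle_jobs()
    if idle > 0 and len(running_vms) < MAX_VMS:
        running_vms.append(boot_vm())       # demand exceeds supply: grow
    elif idle == 0 and running_vms:
        shutdown_vm(running_vms.pop())      # queue drained: shrink
    time.sleep(60)
```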

Dynamic Provisioning in APF (AutoPyFactory)
- Under development at BNL; prototype by the end of February
- Full support for the VM lifecycle
- Flexible algorithms to boot/terminate running VMs
- Cascading hierarchies: programmatic scheduling on local → private → commercial clouds based on job priority and cloud cost (illustrated below)
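As a rough illustration of the cascading hierarchy, the sketch below fills the cheapest tier first and lets only sufficiently high-priority work spill over to commercial clouds. Tier names, costs, capacities and the priority-to-cost mapping are invented for the example and are not taken from the APF design.

```python
# Toy model of cascading spill-over: place each pending job on the cheapest
# cloud tier it can afford, given its priority. All numbers are made up.

TIERS = [                       # ordered from cheapest to most expensive
    {"name": "local",      "cost": 0.00, "capacity": 500},
    {"name": "private",    "cost": 0.05, "capacity": 300},
    {"name": "commercial", "cost": 0.50, "capacity": 1000},
]


def plan_boots(pending_priorities, cost_ceiling):
    """Return how many VMs to boot on each tier.

    pending_priorities: one priority value per pending job (higher = more urgent).
    cost_ceiling: callable mapping a priority to the cost we are willing to pay.
    """
    plan = {tier["name"]: 0 for tier in TIERS}
    free = {tier["name"]: tier["capacity"] for tier in TIERS}
    for priority in sorted(pending_priorities, reverse=True):  # urgent jobs first
        for tier in TIERS:                                     # cheapest tier first
            if tier["cost"] <= cost_ceiling(priority) and free[tier["name"]] > 0:
                plan[tier["name"]] += 1
                free[tier["name"]] -= 1
                break
    return plan


# Example: 900 normal-priority jobs plus 50 urgent ones that justify commercial prices.
jobs = [1] * 900 + [10] * 50
print(plan_boots(jobs, lambda p: 1.0 if p >= 10 else 0.1))
```

In this toy run the urgent jobs and as many normal jobs as fit land on the local and private tiers, while the 150 normal jobs that do not fit are left to wait rather than pay commercial rates.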

Experiment HLT Farms
- LS1 opens the opportunity to use the HLT farms to extend offline computing resources by several thousand cores
- ATLAS and CMS plan to set up an OpenStack cloud; too early to confirm most of the details
- GridPP built a cloud instance at Imperial College
  - Test bed for CMS in preparation for the HLT cloud
  - UK ATLAS usage of this cloud to be decided: also an HLT test bed? Evaluate root access to/from the cloud and storage at other UK sites?

T3 Analysis Clusters
- US ATLAS users need to move towards central resources
  - Beyond-pledge resources
  - New technologies on the horizon: cloud computing and virtual Tier-3s, federated storage and caching to reduce data management
- Sergey and Doug work independently on data access/handling using XRootD and the federation
  - WAN data access: user code must be improved to reduce latencies (see the read sketch below)
  - Decouples storage and CPU
- Test bed: scrounging of resources, by necessity and not necessarily by choice
  - Public clouds (EC2, Google)
  - Research clouds (FutureGrid)
  - A stable private cloud test bed would be appreciated
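For the WAN data access studies mentioned above, a direct read from the XRootD federation amounts to opening a root:// URL against a federation redirector. A minimal PyROOT sketch; the redirector host, dataset path and tree name are placeholders:

```python
# Minimal sketch of a direct WAN read through an XRootD federation redirector.
# Hostname, file path and tree name are placeholders.
import ROOT

url = "root://redirector.example.org//atlas/user/some_dataset/NTUP.root"
f = ROOT.TFile.Open(url)        # the redirector locates a replica and redirects the client
if f and not f.IsZombie():
    tree = f.Get("physics")     # assumed D3PD tree name
    print("entries: %d" % tree.GetEntries())
    f.Close()
```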

T3 PanDA Clusters
- PanDA analysis clusters on GCE + FutureGrid + AWS (Doug, Henrik, Val)
- Use of Puppet
  - Node configuration (CVMFS, XRootD, Condor, PanDA, etc.)
  - Cluster orchestration (see the sketch after this list)
  - Investigating whether Puppet can also be used for scaling the size of the cluster
- Open questions:
  - What is the scale of the virtual clusters? From a personal analysis cluster to a multi-institution analysis cluster
  - Masterful vs. master-less setups: what is easiest for the researcher?
  - Proxies for PanDA pilots (personal or robotic)
  - How is the data handling and caching done? The initial activity used federation storage as the source
  - What is the latency incurred by this dynamic system?
  - What is the performance penalty of virtualization?
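One way to picture the Puppet-based node configuration and orchestration above is a small launcher that boots each node with a bootstrap script applying a role-specific, masterless Puppet manifest. This is only a sketch: launch_node(), the manifest repository URL and the role names are hypothetical and not taken from the actual clusters.

```python
# Hypothetical orchestration sketch for a Puppet-configured virtual analysis
# cluster (masterless variant): every node boots with a script that installs
# Puppet and applies a role-specific manifest from a git repository.

BOOTSTRAP = """#!/bin/sh
yum -y install puppet git
git clone https://git.example.org/t3-cluster-manifests.git /etc/puppet/t3
puppet apply /etc/puppet/t3/manifests/%(role)s.pp
"""

CLUSTER = [
    ("condor-head", 1),    # HTCondor central manager and PanDA pilot submission
    ("xrootd-proxy", 1),   # local XRootD cache in front of the federation
    ("worker", 32),        # CVMFS + HTCondor startd worker nodes
]


def launch_node(role, user_data):
    # Placeholder for the real cloud call (GCE API, EC2 run_instances, ...).
    print("launching %s node" % role)


for role, count in CLUSTER:
    for _ in range(count):
        launch_node(role, BOOTSTRAP % {"role": role})
```

A master-based setup would replace the `puppet apply` line with a `puppet agent` run pointed at a puppet master, which is exactly the masterful vs. master-less trade-off listed above.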

T3 PROOF Clusters
- T3-type PROOF/XRootD cluster on Google Compute Engine (Sergey)
  - D3PD-based physics analysis
  - 64 nodes with ~500 cores and 180 TB of ephemeral storage
  - Studying data transfer using DQ2, xrdcp and direct read
  - Plan to study the scalability of the PROOF workload distribution system at a very large number of nodes (a minimal PROOF submission sketch follows below)
- Interest from the UT Austin ATLAS group in a personal cluster solution (Peter Onyisi)
  - Recent activity: looking at a very interactive use case (more PROOF than PanDA)
  - TACC Alamo FutureGrid-hosted resources
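For context, driving such a cluster from PyROOT typically means opening a PROOF session and processing a chain through it. A minimal sketch; the master hostname, tree name, file URL and selector are placeholders:

```python
# Minimal sketch of submitting a D3PD analysis to a PROOF cluster from PyROOT.
# Master hostname, tree name, file URL and selector are placeholders.
import ROOT

proof = ROOT.TProof.Open("master.example.org")   # connect to the PROOF master
print("workers attached: %d" % proof.GetParallel())

chain = ROOT.TChain("physics")                   # assumed D3PD tree name
chain.Add("root://redirector.example.org//atlas/some_dataset/NTUP.root")
chain.SetProof()                                 # route Process() through the PROOF session
chain.Process("MySelector.C+")                   # hypothetical TSelector holding the analysis
```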

Common Tools
- OpenStack: currently the cloud solution with the most momentum
  - Starting to appear in the ATLAS environment
  - Room for collaboration: first with the general user community, and also within HEP
  - We do have influence on future developments: Tim Bell appointed by the OpenStack Board of Directors to establish the new User Committee
  - HLT farms during LS1
- Puppet as the preferred configuration management tool
- Most deployments depend on CVMFS

Achievements
- Historically, ATLAS was the first LHC experiment to actively explore the integration of cloud resources
  - Other experiments (CMS, LHCb) have started equivalent activities, opening possibilities for common exploration
- Running PanDA cloud production queues is a standard operation
  - This use case is best suited to the cloud: the jobs have low I/O requirements
  - Examples at UVic, BNL, Australia, CERN…
- Some experience with XRootD for analysis clusters
  - The data model for analysis clusters is still to be defined
  - It would be interesting to run performance benchmarks again
- CloudScheduler, and soon APF, to manage dynamic provisioning

Missing Pieces
- Data management to ensure scalability of the above use cases
  - We have almost no experience and no concrete plan yet
  - BNL announced an effort in cloud storage
  - Waiting for the evolution of the middleware
  - Data management would help to accelerate the analysis use case
- APEL accounting in the cloud
  - A meeting between several interested parties will happen tomorrow
- Configuration and dynamic management of cloud resources
  - What information do we need in AGIS? E.g. SW release publication for queues without a CE
  - Declaration of downtimes for cloud resources, including services such as Cloud Scheduler
  - Interest in creating, managing and deleting all elements (e.g. PanDA sites, DDM endpoints…) programmatically