Cloud Computing R&D Proposal

Slides:



Advertisements
Similar presentations
SLA-Oriented Resource Provisioning for Cloud Computing
Advertisements

 Contributing >30% of throughput to ATLAS and CMS in Worldwide LHC Computing Grid  Reliant on production and advanced networking from ESNET, LHCNET and.
Technical Review Group (TRG)Agenda 27/04/06 TRG Remit Membership Operation ICT Strategy ICT Roadmap.
1 Software & Grid Middleware for Tier 2 Centers Rob Gardner Indiana University DOE/NSF Review of U.S. ATLAS and CMS Computing Projects Brookhaven National.
Workload Management Workpackage Massimo Sgaravatto INFN Padova.
Workload Management Massimo Sgaravatto INFN Padova.
1 Bridging Clouds with CernVM: ATLAS/PanDA example Wenjing Wu
Client/Server Grid applications to manage complex workflows Filippo Spiga* on behalf of CRAB development team * INFN Milano Bicocca (IT)
LHC Experiment Dashboard Main areas covered by the Experiment Dashboard: Data processing monitoring (job monitoring) Data transfer monitoring Site/service.
Assessment of Core Services provided to USLHC by OSG.
1/8 Enhancing Grid Infrastructures with Virtualization and Cloud Technologies Ignacio M. Llorente Business Workshop EGEE’09 September 21st, 2009 Distributed.
October 24, 2000Milestones, Funding of USCMS S&C Matthias Kasemann1 US CMS Software and Computing Milestones and Funding Profiles Matthias Kasemann Fermilab.
LCG Milestones for Deployment, Fabric, & Grid Technology Ian Bird LCG Deployment Area Manager PEB 3-Dec-2002.
CERN - IT Department CH-1211 Genève 23 Switzerland t Monitoring the ATLAS Distributed Data Management System Ricardo Rocha (CERN) on behalf.
CERN IT Department CH-1211 Genève 23 Switzerland t Internet Services Job Monitoring for the LHC experiments Irina Sidorova (CERN, JINR) on.
Service - Oriented Middleware for Distributed Data Mining on the Grid ,劉妘鑏 Antonio C., Domenico T., and Paolo T. Journal of Parallel and Distributed.
Tools for collaboration How to share your duck tales…
Experiment Support ANALYSIS FUNCTIONAL AND STRESS TESTING Dan van der Ster, CERN IT-ES-DAS for the HC team: Johannes Elmsheuser, Federica Legger, Mario.
NA-MIC National Alliance for Medical Image Computing UCSD: Engineering Core 2 Portal and Grid Infrastructure.
Towards a Global Service Registry for the World-Wide LHC Computing Grid Maria ALANDES, Laurence FIELD, Alessandro DI GIROLAMO CERN IT Department CHEP 2013.
A PanDA Backend for the Ganga Analysis Interface J. Elmsheuser 1, D. Liko 2, T. Maeno 3, P. Nilsson 4, D.C. Vanderster 5, T. Wenaus 3, R. Walker 1 1: Ludwig-Maximilians-Universität.
6/23/2005 R. GARDNER OSG Baseline Services 1 OSG Baseline Services In my talk I’d like to discuss two questions:  What capabilities are we aiming for.
Network awareness and network as a resource (and its integration with WMS) Artem Petrosyan (University of Texas at Arlington) BigPanDA Workshop, CERN,
EGEE-III INFSO-RI Enabling Grids for E-sciencE Ricardo Rocha CERN (IT/GS) EGEE’08, September 2008, Istanbul, TURKEY Experiment.
PanDA Status Report Kaushik De Univ. of Texas at Arlington ANSE Meeting, Nashville May 13, 2014.
CMS Usage of the Open Science Grid and the US Tier-2 Centers Ajit Mohapatra, University of Wisconsin, Madison (On Behalf of CMS Offline and Computing Projects)
MND review. Main directions of work  Development and support of the Experiment Dashboard Applications - Data management monitoring - Job processing monitoring.
CERN - IT Department CH-1211 Genève 23 Switzerland t Operating systems and Information Services OIS Proposed Drupal Service Definition IT-OIS.
1 A Scalable Distributed Data Management System for ATLAS David Cameron CERN CHEP 2006 Mumbai, India.
Ian Bird Overview Board; CERN, 8 th March 2013 March 6, 2013
Distributed Physics Analysis Past, Present, and Future Kaushik De University of Texas at Arlington (ATLAS & D0 Collaborations) ICHEP’06, Moscow July 29,
Why a Commercial Provider should Join the Academic Cloud Federation David Blundell Managing Director 100 Percent IT Ltd Simple, Flexible, Reliable.
The GridPP DIRAC project DIRAC for non-LHC communities.
ATLAS Distributed Computing ATLAS session WLCG pre-CHEP Workshop New York May 19-20, 2012 Alexei Klimentov Stephane Jezequel Ikuo Ueda For ATLAS Distributed.
EGI-InSPIRE RI EGI-InSPIRE EGI-InSPIRE RI CERN and HelixNebula, the Science Cloud Fernando Barreiro Megino (CERN IT)
© 2007 IBM Corporation IBM Software Strategy Group IBM Google Announcement on Internet-Scale Computing (“Cloud Computing Model”) Oct 8, 2007 IBM Confidential.
The EPIKH Project (Exchange Programme to advance e-Infrastructure Know-How) gLite Grid Introduction Salma Saber Electronic.
CERN IT Department CH-1211 Genève 23 Switzerland t EGEE09 Barcelona ATLAS Distributed Data Management Fernando H. Barreiro Megino on behalf.
Accounting Review Summary and action list from the (pre)GDB Julia Andreeva CERN-IT WLCG MB 19th April
Scientific Data Processing Portal and Heterogeneous Computing Resources at NRC “Kurchatov Institute” V. Aulov, D. Drizhuk, A. Klimentov, R. Mashinistov,
EGI-InSPIRE RI An Introduction to European Grid Infrastructure (EGI) March An Introduction to the European Grid Infrastructure.
Ian Bird, CERN WLCG Project Leader Amsterdam, 24 th January 2012.
Bob Jones EGEE Technical Director
ITIL: Service Transition
Introduction to Cloud Technology
Review of the WLCG experiments compute plans
Workload Management Workpackage
WLCG Workshop 2017 [Manchester] Operations Session Summary
Organizations Are Embracing New Opportunities
Eric Peirano, Ph.D., TECHNOFI, COO
Computing Operations Roadmap
By: Raza Usmani SaaS, PaaS & TaaS By: Raza Usmani
AWS Integration in Distributed Computing
Virtualization and Clouds ATLAS position
Progress on NA61/NA49 software virtualisation Dag Toppe Larsen Wrocław
Future of WAN Access in ATLAS
ATLAS Cloud Operations
POW MND section.
How to enable computing
Tools and Services Workshop Overview of Atmosphere
PanDA in a Federated Environment
Readiness of ATLAS Computing - A personal view
Systems Analysis and Design
Dagmar Adamova (NPI AS CR Prague/Rez) and Maarten Litmaath (CERN)
WLCG Collaboration Workshop;
Introduction to Cloud Computing
Monitoring of the infrastructure from the VO perspective
Input on Sustainability
Cloud Computing Dr. Sharad Saxena.
Future Data Architectures Big Data Workshop – April 2018
Presentation transcript:

Cloud Computing R&D Proposal F. H. Barreiro Megino, K. De. D. van der Ster, R. Walker on behalf of ATLAS ADC ATLAS Software and Computing Workshop, 5 April 2011 9/19/2018 ATLAS SW&C Week – April 2011

Outline Introduction Goals and scope of the project Project Roadmap Proposed development areas Conclusions UPDATE THIS SLIDE WITH NEW STRUCTURE This proposal is meant to start the discussion, and will evolve rapidly as the group starts working (after all this is R&D project) 9/19/2018 ATLAS SW&C Week – April 2011

Introduction ATLAS Computing Model designed around grid computing with >100 WLCG sites Over a year in full operation coping with data-taking requirements Cloud Computing: efficiently share bare hardware resources without sacrificing flexibility in services offered to applications new and increasingly competitive market has emerged to offer these cost effective computing resources It is of interesting for ATLAS Distributed Computing to investigate opportunities for adapting its existing services to make use of cloud computing resources. Design and implement cloud-awareness in the Distributed Data Management (DDM) system, the Production and Distributed Analysis (PanDA) system, and in related tools and services. 9/19/2018 ATLAS SW&C Week – April 2011

Cloud Computing Proposal Project description is available now: “Proposal for Cloud Computing R&D in ATLAS Distributed Computing” https://twiki.cern.ch/twiki/bin/viewauth/Atlas/ TaskForcesAndRnD#Cloud_Computing_RD This talk summarizes the proposal document. 9/19/2018 ATLAS SW&C Week – April 2011

Goals and Scope Primary goals: to evaluate available cloud technologies in relation to the use-cases presented by ATLAS data management, processing and analysis to design a model for transparently integrating cloud computing resources with the ADC software and service stack to implement the ATLAS cloud computing model into DDM, PanDA and related tools and services. Collaborate with related groups in WLCG, CERN-IT and other LHC experiments: Consolidate efforts and leverage commonalities Profit from existing experiences 9/19/2018 ATLAS SW&C Week – April 2011

Potential Use-Cases Potential ADC use-cases: Monte Carlo simulation on the cloud with stage-out to traditional grid storage or long-term storage on the cloud Data reprocessing in the cloud (with a strong caveat related to cost; see next slide) Distributed analysis on the cloud using data which is accessed remotely to the grid sites, or analysis of data which is located in the cloud Resource capacity bursting which is managed centrally (e.g. to handle urgent reprocessing tasks) or regionally (e.g. to handle urgent local analysis requests) In general, all ATLAS use cases should be possible, making cloud computing sites equivalent to grid sites 9/19/2018 ATLAS SW&C Week – April 2011

Cost Awareness Need to pay special attention to the cost of cloud resources: Some will be provided by our collaborators and other resources will be commercial “Free” cloud resources at academic and research institutions: A cloud API (e.g. OpenStack, etc…) would be an alternative to grid middleware and enables sites to better manage local resources ADC should be able to use a WLCG site that makes resources available via a cloud API Commercial cloud resources: These resources (e.g. Amazon, Rackspace, etc…) present a mechanism to rapidly scale up the overall ADC computing capacity, but come with financial costs The ADC cloud computing model will incorporate strategies to optimize cost effectiveness 9/19/2018 ATLAS SW&C Week – April 2011

Roadmap Overall roadmap: Exploratory phase to start now and finish around end of the summer Conclude with a Cloud Computing Model document Development phase to start in fall 9/19/2018 ATLAS SW&C Week – April 2011

Roadmap (1) Basic Research Review the work already carried out within ATLAS, CERN IT, WLCG, sites, EGI and OSG If possible, organize workshop to collect information about existing cloud computing activities in collaborating organisations Implement primitive data management and job execution on the cloud Virtual machines CERNVM as a platform, which gives access to ATLAS tools and software Evaluate potential resources Various sites (e.g. Magellan at ANL, lxcloud at CERN, BNL cloud, other cloud infrastructures related to WLCG (e.g. in Canada), commercial clouds, etc…) Various cloud APIs (Amazon (EC2, S3, etc.), OpenStack, Nimbus, etc…). Need to understand the long-term sustainability of the APIs. Implement primitive functionalities Move data in and out of the cloud Execute basic jobs on the cloud Use-cases study Evaluate the ADC use-cases in relation to both commercial cloud computing. Estimate costs for various models. Explore new use-cases presented by cloud computing. 9/19/2018 ATLAS SW&C Week – April 2011

Roadmap (2) Design of the Cloud Computing Model Development Testing The goal of this document will be to: describe how ATLAS can make use of both pledged and chargeable cloud resources present strategies to minimally impact the existing services so that the usage of cloud resources would be transparent to end-user physicists present cost-effective models for the various ADC use-cases on commercial clouds incorporate legal and security considerations Development Initial ideas are detailed in the section below. Testing Integration of the cloud with existing monitoring services, functional tests (DDM, analysis, production) for stability and reliability evaluations Perform stress tests of the cloud solutions to study performance 9/19/2018 ATLAS SW&C Week – April 2011

Development: DDM Need to implement plugin libraries that support cloud I/O for Site Services and DQ2 clients Probably need to support different cloud APIs depending on the available resources May create requirements for FTS Bookkeeping of location information for ATLAS datasets stored in the cloud Investigate the usage of cloud-based content delivery solutions to automatically address “hot” data use-cases. 9/19/2018 ATLAS SW&C Week – April 2011

Development: Workload Management PanDA pilot needs a mover module which can stage-in (e.g. wget) cloud resident data stage-out and register output files Investigate Condor glide-in technology for pilot factory (send one glide-in – get n pilots) PanDA server: start with a set of cloud-based PanDA sites/queues in later phases evaluate cost-aware job brokerage Automatic cloud resource provisioning: Panda could request new/larger cloud capacity subject to global workload and cost considerations A tool which enables automatic resource provisioning would need to be developed 9/19/2018 ATLAS SW&C Week – April 2011

Development: Integration with monitoring and information systems Add metadata about cloud-based resources into the ADC monitoring and information systems Dashboards, BDII, AGIS, PanDA schedconfig, TiersOfATLAS… This point depends fully on the design of the ADC Cloud Computing Model 9/19/2018 ATLAS SW&C Week – April 2011

Conclusions Comments about the proposal document are welcome We expect development requirements will be placed on many areas in and outside ADC collaboration with the larger team of ADC developers will be necessary collaboration with other groups in WLCG will be emphasized in order to find solutions which can be maintained and therefore sustained by a larger community Next step: Wait for Cloud Computing Workshop announcement. 9/19/2018 ATLAS SW&C Week – April 2011