Evolution of WLCG infrastructure
Ian Bird, CERN
Overview Board, CERN, 30th September 2011
Accelerating Science and Innovation

Background
Consider that:
Computing models have evolved
Far better understanding of requirements now than 10 years ago
– Even evolved since the large-scale challenges
Experiments have developed (different!) workarounds to manage weaknesses in middleware
Pilot jobs and central task queues are (almost) ubiquitous
Operational effort often too high; many services were not designed for redundancy, fail-over, etc.
Technology evolves rapidly, and the rest of the world also does (large-scale) distributed computing – we don't need entirely home-grown solutions
Must be concerned about long-term support and where it will come from

Although …
But remember: whatever we do, we must evolve whilst not disrupting the ongoing operation
We have a grid for a very good reason – we need to integrate the resources provided to us – but we can make use of other technologies
We have the world's largest (only?) worldwide trust federation and a single sign-on scheme covering both authentication and authorization
We have also developed a very strong set of policies that are exemplars for all other communities trying to use distributed computing
In parallel we have developed the operational security teams that have brought real benefit to the HEP community
We have also developed the operational and support frameworks and tools that are able to manage this large infrastructure

Strategy
WLCG must have an agreed, clear, and documented vision for the future, to:
Better communicate needs to EMI/EGI, OSG, …
Be able to improve our middleware stack to address the concerns
Attempt to re-build common solutions where possible
– Between experiments and between grids
Take into account lessons learned (functional, operational, deployment, management, …)
Understand the long-term support needs
Focus our efforts where we must (e.g. data management), use off-the-shelf solutions where possible
Must balance the needs of the experiments and the sites

TEG: Mandate
To reassess the implementation of the grid infrastructures that we use in the light of the experience with LHC data and of technology evolution, never forgetting the important successes and lessons, and ensuring that any evolution does not disrupt our successful operation.
The work should:
– Document a strategy for evolution of the technical implementation of the WLCG distributed computing infrastructure;
– This strategy should provide a clear statement of needs for WLCG, which can also be used to provide input to any external middleware and infrastructure projects.
The work should, in each technical area, take into account the current understanding of:
– Experiment and site needs in the light of experience with real data, the operational environment (effort, functionality, security, etc.), and constraints;
– Lessons learned over several years in terms of deployability of software;
– Evolution of technology over the last several years.
It should also consider issues of:
– Long-term support and sustainability of the solutions;
– Achieving commonalities between experiments where possible;
– Achieving commonalities across all WLCG supporting infrastructures (EGI-related, OSG, NDGF, etc.).
Deliverables:
– Assessment of the current situation with middleware, operations, and support structure.
– Strategy document setting out a plan and needs for the next 2–5 years.

Structure
The MB manages the process directly: it appoints several TWGs, one for each area to be addressed, monitors the results of the technical analysis, and takes final decisions on the proposals.
Each TWG should be co-chaired by one experiment and one site representative.
The chairs and the group members are to be proposed and agreed by the MB, perhaps with the assistance of a small steering group.
The steering group is to define the scope, mandate, and membership of each TWG (to be agreed by the MB), and to monitor progress and have editorial ownership of the process.

Some general comments
Outcome should be a documented strategy
Not everything needs to go back to square one!
– Some things work!
– Some work has already been (is being) done (e.g. data management)
– But the strategy needs to be documented
The work must include consideration/models for long-term support and sustainability

Data Management
Distributed data management and tools
– Follow on from work on the Amsterdam demonstrators, etc.
– Data placement, caching (see the sketch below)
– FTS re-write/replace
– "xrootd"
– Transfer protocols
– Data access security needs / access control
– Continued use of POOL? (only ATLAS? for how long?)
– Interaction between DM tools and ROOT, PROOF?
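Not part of the original slides: a minimal, purely illustrative sketch of the kind of data-placement/caching decision this topic covers, i.e. picking the "cheapest" replica of a dataset, preferring a cached disk copy at the local site. The catalogue contents, site names, and cost weights are all invented; a real system would sit behind FTS, xrootd, or an experiment catalogue.

```python
# Hypothetical sketch: choose which replica of a dataset a job should read,
# preferring cached disk copies and the local site. All names are illustrative.

# toy replica catalogue: dataset -> list of (site, is_cached_on_disk)
replica_catalogue = {
    "data11_7TeV.AOD.run0001": [("CERN-EOS", True), ("FNAL-dCache", False), ("RAL-Castor", False)],
}

def site_cost(site, cached, local_site):
    """Lower is better: cached disk copies beat remote/tape, local beats remote."""
    cost = 0 if cached else 10
    cost += 0 if site == local_site else 5
    return cost

def choose_replica(dataset, local_site):
    replicas = replica_catalogue.get(dataset, [])
    if not replicas:
        raise LookupError(f"no replica registered for {dataset}")
    return min(replicas, key=lambda r: site_cost(r[0], r[1], local_site))

if __name__ == "__main__":
    site, cached = choose_replica("data11_7TeV.AOD.run0001", local_site="RAL-Castor")
    print(f"read from {site} (cached on disk: {cached})")
```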

Storage Management
More to address site-related services
– Separation of archives and disk pools/caches
– Future storage system needs: e.g. how should dCache, DPM, EOS, etc. evolve?
– What are the storage system interfaces? SRM, S3, … (see the sketch below)
– Filesystems / protocols: NFS 4.x, S3, …
– Security/access controls
– These are site-run services: management interfaces? Monitoring? …
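To make the "SRM vs. S3" question concrete, here is a rough sketch of what an S3-style object interface looks like from a client, assuming a modern S3 client library (boto3); the endpoint, credentials, bucket, and key are hypothetical, and this is an illustration of the interface style rather than an endorsement of it.

```python
# Illustrative only: write and read one object through an S3-style interface.
# Endpoint, credentials, bucket, and key are made up for this sketch.
import boto3  # third-party library: pip install boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://storage.example-site.org",  # hypothetical site endpoint
    aws_access_key_id="EXAMPLE_KEY",
    aws_secret_access_key="EXAMPLE_SECRET",
)

# store a local file as an object
with open("hist.root", "rb") as f:
    s3.put_object(Bucket="example-scratch", Key="user/jdoe/hist.root", Body=f)

# read it back
obj = s3.get_object(Bucket="example-scratch", Key="user/jdoe/hist.root")
data = obj["Body"].read()
print(f"read back {len(data)} bytes")
```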

Workload management
Encompasses a broad range of topics
– Pilots and pilot frameworks (see the sketch below)
– What is needed from a CE now?
– Is a generic WMS required by WLCG?
– Use cases for clouds
– Use of virtualisation
  Experiment expectations for using CernVM, arbitrary images? (re. HEPiX work)
  Site expectations for use of virtualisation in managing sites
– Security model (MUPJs, etc.)
– Whole-node scheduling
– How to ask for/use GPUs (is it required?)
– Information services – what is needed, what tools?
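For readers unfamiliar with the pilot model mentioned above, this is a minimal, hypothetical sketch of the pattern: a pilot starts on a worker node and repeatedly pulls real payloads from a central task queue until none remain. The queue URL and payload format are invented; real frameworks (PanDA, DIRAC, glideinWMS, etc.) do far more (validation, credentials, monitoring, reporting).

```python
# Minimal pilot-job sketch (illustrative only). A real pilot framework also
# validates the environment, handles proxies, monitors payloads, and reports back.
import json
import subprocess
import urllib.request

TASK_QUEUE = "https://taskqueue.example.org/next-job"  # hypothetical central task queue

def fetch_payload():
    """Ask the central queue for the next payload; return None when the queue is empty."""
    try:
        with urllib.request.urlopen(TASK_QUEUE, timeout=30) as resp:
            job = json.load(resp)
    except Exception:
        return None
    return job if job else None

def run_pilot():
    while True:
        job = fetch_payload()
        if job is None:
            break  # nothing left to do: the pilot exits and frees the slot
        # the queue is assumed to return records like {"id": ..., "cmd": [...]}
        result = subprocess.run(job["cmd"], capture_output=True)
        print(f"job {job['id']} finished with exit code {result.returncode}")

if __name__ == "__main__":
    run_pilot()
```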

Databases
Scope should be the use of "grid" database services
– Not the use of a DB within an arbitrary application
– LFC? What is still required? Deployment model?
– Frontier, squids, etc. vs. 3D/Streams, GoldenGate, Data Guard, etc. (see the sketch below)
– What long-term support is needed for CORAL, COOL?
– Work started in the database workshop in June
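As a rough illustration of why the Frontier-plus-squid approach is attractive for read-mostly conditions data: the client expresses its query as an ordinary HTTP request, so standard web caches at the site can answer repeated identical requests instead of the central database. The URLs and the query encoding below are invented; real Frontier clients and squid topologies are configured per experiment.

```python
# Illustrative only: the Frontier idea is "database query over HTTP", so that
# identical requests from thousands of jobs can be served by a local squid cache.
import urllib.request

FRONTIER_PROXY = "http://squid.example-site.org:3128"              # hypothetical site squid
FRONTIER_URL = "http://frontier.example.org/query?sql=SELECT..."   # hypothetical encoded query

# route the request through the local cache
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({"http": FRONTIER_PROXY})
)
with opener.open(FRONTIER_URL, timeout=60) as resp:
    payload = resp.read()
print(f"received {len(payload)} bytes (possibly served from the squid cache)")
```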

Security model
Should review the risk analysis – what are the real threats now? Where should we focus?
Is the trust model still appropriate?
– E.g. can we simplify the "glexec" issue? Do we still need open WNs?
X509/VOMS/IGTF have been essential in enabling world-wide use of resources
– But there are problems associated with proxies (see the sketch below)
Can/should other federated ID management systems be integrated?
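One of the practical problems associated with proxies is their short lifetime: every job and pilot has to check and renew them. Below is a minimal sketch of such a check, assuming the proxy is a PEM file at the conventional /tmp/x509up_u&lt;uid&gt; location and using the pyca/cryptography library (recent versions); the path and threshold are illustrative, and real tools such as voms-proxy-info also verify the VOMS attributes and the certificate chain.

```python
# Illustrative only: warn when an X.509 proxy certificate is close to expiry.
import datetime
from cryptography import x509  # third-party library: pip install cryptography

PROXY_PATH = "/tmp/x509up_u1000"            # conventional location; the uid is illustrative
MIN_REMAINING = datetime.timedelta(hours=2)

with open(PROXY_PATH, "rb") as f:
    cert = x509.load_pem_x509_certificate(f.read())  # first PEM block is the proxy itself

remaining = cert.not_valid_after - datetime.datetime.utcnow()
print(f"proxy subject : {cert.subject.rfc4514_string()}")
print(f"time remaining: {remaining}")
if remaining < MIN_REMAINING:
    print("proxy is about to expire - renew it before submitting work")
```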

Operations tools and services
Need to document all the other pieces that we need supported
– Monitoring (SAM, Nagios, etc.); better monitoring needs? Ability to analyse monitoring data? (see the sketch below)
– Support tools (APEL, GGUS, etc.)
– Underlying services (ActiveMQ, etc.)
– Operational requirements on middleware
– Application software management (e.g. CernVM-FS)
– Software management of middleware
– Deployment management
– Configuration management
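As one concrete example of the "underlying services (ActiveMQ, etc.)" point: monitoring results are typically shipped as small messages through a broker. The sketch below shows the general shape of such a publisher, assuming the third-party stomp.py library (version 4 or later); the broker host, topic name, and record layout are invented for illustration.

```python
# Illustrative only: publish one site-availability test result to a message
# broker over STOMP. Broker, topic, and record layout are hypothetical.
import json
import time
import stomp  # third-party library: pip install stomp.py

BROKER = ("msg-broker.example.org", 61613)   # hypothetical ActiveMQ broker
TOPIC = "/topic/grid.probe.results"          # hypothetical destination

record = {
    "site": "EXAMPLE-T2",
    "service": "CE",
    "probe": "org.example.WN-basic",
    "status": "OK",
    "timestamp": int(time.time()),
}

conn = stomp.Connection([BROKER])
conn.connect(wait=True)                      # anonymous connect; real brokers need credentials
conn.send(destination=TOPIC, body=json.dumps(record))
conn.disconnect()
print("published:", record)
```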

Status
Buy-in to the process by experiments and sites – at the WLCG collaboration meeting in July, with details agreed in MBs over the summer
Nominations for chairs received and approved
Chairs being contacted now
Work to start asap; initial reports hopefully by end 2011 / early 2012
– Frequent updates in MB/GDB to ensure broad dissemination of what is discussed