Progress in Computing, Ian Bird, ICHEP 2010, 28th July 2010, Paris


Outline
Progress in Computing (for LHC)
– Where we are today & how we got here
– Achievements in the experiments' computing
– Some representative statistics and plots from the parallel sessions
… and the outlook?
– Grids → clouds? Sustainability?
Thanks to:
– Contributors to the track on “Advances in Instrumentation and Computing for HEP”; various slides are used to illustrate points

Progress...
Overall progress is clear: physics output in a very short time.
A huge effort: the combination of experiment software & computing and the grid infrastructures.
And a lot of testing!

Worldwide resources
Today: >140 sites, ~150k CPU cores, >50 PB disk.

WLCG Collaboration Status
Tier 0; 11 Tier 1s; 64 Tier 2 federations.
Tier 0 and Tier 1 sites: CERN (Tier 0); Lyon/CCIN2P3, Barcelona/PIC, DE-FZK, US-FNAL, CA-TRIUMF, NDGF, US-BNL, UK-RAL, Taipei/ASGC, Amsterdam/NIKHEF-SARA, Bologna/CNAF (Tier 1s).
Today we have 49 MoU signatories, representing 34 countries: Australia, Austria, Belgium, Brazil, Canada, China, Czech Rep., Denmark, Estonia, Finland, France, Germany, Hungary, Italy, India, Israel, Japan, Rep. Korea, Netherlands, Norway, Pakistan, Poland, Portugal, Romania, Russia, Slovenia, Spain, Sweden, Switzerland, Taipei, Turkey, UK, Ukraine, USA.

From testing to data
Independent experiment data challenges: e.g. DC04 (ALICE, CMS, LHCb) / DC2 (ATLAS) in 2004 saw the first full chain of the computing models on grids.
Service Challenges, proposed in 2004, to demonstrate the service aspects:
– Data transfers for weeks on end
– Data management
– Scaling of job workloads
– Security incidents (“fire drills”)
– Interoperability
– Support processes
SC1, SC2: basic transfer rates
SC3: sustained rates, data management, service reliability
SC4: nominal LHC rates, disk → tape tests, all Tier 1s, some Tier 2s
CCRC’08: readiness challenge, all experiments, ~full computing models
STEP’09: scale challenge, all experiments, full computing models, tape recall + analysis
Focus on real and continuous production use of the service over several years (simulations since 2003, cosmic ray data, etc.) – data and service challenges to exercise all aspects of the service: not just data transfers, but workloads, support structures, etc.

Experiment models have evolved
The models are all ~based on the MONARC tiered model of 10 years ago, with several significant variations, however.

Data transfer
Data transfer capability today is able to manage much higher bandwidths than expected/feared/planned.
Fibre cut during STEP’09: redundancy meant no interruption.
What data transfer involves:
– SW: GridFTP, FTS (interacts with the endpoints, handles recovery), the experiment layer (sketched below)
– HW: light paths, routing, coupling to storage
– Operational: monitoring
+ the academic/research networks for Tier 1/2!
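
To make the software layer above a bit more concrete: a minimal sketch (not from the talk) of the retry/recovery idea that FTS provides as a managed service, written in Python around the GridFTP command-line client globus-url-copy; the URLs, stream count and retry policy are illustrative assumptions only, and a real production transfer would be submitted to FTS rather than scripted like this.

```python
import subprocess
import time

def transfer_with_retry(src_url, dst_url, max_attempts=3, backoff_s=60):
    """Copy one file over GridFTP, retrying on failure.

    FTS offers this (plus queuing, channel management, monitoring and
    recovery) as a service; this sketch only shows the basic retry idea
    for a single file.
    """
    for attempt in range(1, max_attempts + 1):
        # globus-url-copy performs the actual GridFTP transfer; a valid
        # grid proxy is assumed to exist in the environment.
        result = subprocess.run(
            ["globus-url-copy", "-p", "4", src_url, dst_url],
            capture_output=True, text=True,
        )
        if result.returncode == 0:
            return True
        print(f"attempt {attempt} failed: {result.stderr.strip()}")
        time.sleep(backoff_s)
    return False

if __name__ == "__main__":
    # Hypothetical source/destination URLs, for illustration only.
    ok = transfer_with_retry(
        "gsiftp://tier1.example.org/data/run001/file.root",
        "gsiftp://tier2.example.org/cache/run001/file.root",
    )
    print("transfer", "succeeded" if ok else "failed")
```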

In terms of data transfers...
(Throughput plot periods: final readiness test (STEP’09); preparation for LHC startup; LHC physics data. 2009: STEP’09 + preparation for data; with physics data, nearly 1 petabyte/week.)
Traffic on the OPN up to 70 Gb/s! – ATLAS reprocessing campaigns.
Tier 0 traffic: > 4 GB/s input, > 13 GB/s served.
Data export during data taking: according to expectations on average.

Data distribution
– ATLAS: total throughput T0-T1, T1-T1, T1-T2 (G. Stewart, 1225)
– CMS: T0-T1 (M. Klute, 1223)
– LHCb: T0-T1 (M. Adinolfi, 1221)

Data distribution for analysis
For all experiments, early data has been available for analysis within hours of data taking.

Use of CPU...
Peaks of 1M jobs/day now, using the equivalent of ~100k cores.
Tier 2s heavily used with respect to Tier 1s.
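
A back-of-envelope check (not from the slides, and mixing a peak rate with an average core count) relating the two headline numbers: at ~100k busy cores, 1M jobs/day corresponds to an average job length of roughly 2.4 hours.

```python
# Back-of-envelope check relating the slide's headline numbers
# (assumed round values; peaks and averages are not strictly comparable).
cores = 100_000           # ~100k cores in use
jobs_per_day = 1_000_000  # peak of ~1M jobs/day
seconds_per_day = 24 * 3600

core_seconds_per_day = cores * seconds_per_day
avg_job_wallclock_s = core_seconds_per_day / jobs_per_day
print(f"average job length ~ {avg_job_wallclock_s / 3600:.1f} hours")
# -> about 2.4 hours per job if all cores were busy with these jobs
```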

Processing & re-processing
ATLAS processing: P. Onyisi, 1197.
The first data has been reprocessed many times; we are reaching the point where this is slowing down now.

Sometimes it takes operational effort...
Daily operations meetings – all experiments, many sites – address exactly these kinds of problems.
A significant level of manual intervention and coordination is still required, but it is now at a level that is sustainable – in CCRC’08 and even STEP’09 this was not clear.

Site availability and readiness

Analysis & users
– LHCb: CPU at Tier 1s is 60% user analysis and 40% reconstruction; ~200 users, ~30k jobs/day.
– ALICE: >250 users, ~1300 jobs on average over 4 months.

Analysis & users – 2
A significant fraction of the results shown here is based on samples including data taken last week! (CMS and ATLAS plots.)

Resource Evolution
Expected needs in 2011 & 2012, compared with the TDR requirement for Tier 0+1 CPU and disk for the 1st nominal year.
NB: in 2005 only 10% of the 2008 requirement was available. The ramp-up has been enormous!

Prospects for the next few years
We have an infrastructure demonstrated to be able to support LHC data processing and analysis.
Significant science grid (e-science/cyberscience) infrastructures have been spun off and are used to provide support:
– These are now evolving: EGI in Europe, OSG phase 2, etc.
This is not just software – there are significant operational infrastructures behind it:
– Worldwide trust → single authentication/authorization (illustrated in the sketch below)
– Coordinated security policies and operational response
– Operational processes, monitoring, alarms, reporting, ...
We must be able to evolve the technical implementation (i.e. the grid middleware) without breaking the overall infrastructure.
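
To make the “worldwide trust, single authentication/authorization” point concrete: WLCG identity is based on X.509 certificates issued by national certificate authorities that all sites agree to trust. A minimal sketch, assuming the third-party Python cryptography package and a PEM-encoded certificate at a hypothetical path, of inspecting the identity and validity that services check:

```python
import datetime

from cryptography import x509
from cryptography.hazmat.backends import default_backend

# Hypothetical path to a user or proxy certificate in PEM format.
CERT_PATH = "/tmp/x509up_u1000"

with open(CERT_PATH, "rb") as f:
    cert = x509.load_pem_x509_certificate(f.read(), default_backend())

# The subject DN is the globally unique identity the sites agree on;
# the issuer DN identifies the (mutually trusted) certificate authority.
print("subject:", cert.subject.rfc4514_string())
print("issuer: ", cert.issuer.rfc4514_string())

# Validity window: services reject expired credentials.
if datetime.datetime.utcnow() > cert.not_valid_after:
    print("credential has expired")
else:
    print("valid until", cert.not_valid_after.isoformat())
```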

Evolution and sustainability
Need to adapt to changing technologies:
– Major re-think of storage and data access
– Use of many-core CPUs (and other processor types?)
– Virtualisation as a solution for job management: brings us in line with industrial technology; integration with public and commercial clouds
Network infrastructure:
– This is the most reliable service we have
– Invest in networks and make full use of the distributed system
Grid middleware:
– Complexity of today’s middleware compared to the actual use cases
– Evolve by using more “standard” technologies: e.g. message brokers and monitoring systems are first steps (see the sketch below)
But: retain the WLCG infrastructure:
– Global collaboration, service management, operational procedures, support processes, etc.
– Security infrastructure – a significant achievement, both the global A&A service and trust network (X.509) and the operational security & policy frameworks
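
As an illustration of the “standard technologies” direction, monitoring records can be published to a message broker instead of being reported point-to-point. A minimal sketch, assuming the third-party stomp.py client and a hypothetical ActiveMQ endpoint and topic name (none of these details come from the talk):

```python
import json
import socket
import time

import stomp  # third-party STOMP client, assumed to be installed

# Hypothetical broker endpoint and topic; real deployments define their own.
BROKER = ("mq.example.org", 61613)
TOPIC = "/topic/wlcg.site.metrics"

def publish_site_metric(name, value):
    """Publish one monitoring record as a JSON message to the broker."""
    record = {
        "site": socket.gethostname(),
        "metric": name,
        "value": value,
        "timestamp": int(time.time()),
    }
    conn = stomp.Connection([BROKER])
    conn.connect(wait=True)
    conn.send(destination=TOPIC, body=json.dumps(record))
    conn.disconnect()

if __name__ == "__main__":
    publish_site_metric("running_jobs", 1234)
```

The broker decouples the many producers (sites, services) from the consumers (dashboards, alarms), which is exactly the simplification the slide is pointing at.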

Evolution of Data Management
1st workshop held in June:
– Recognition that the network, as a very reliable resource, can optimize the use of the storage and CPU resources; the strict hierarchical MONARC model is no longer necessary
– Simplification of the use of tape and of the interfaces
– Use disk resources more as a cache
– Recognize that not all data has to be local at a site for a job to run – allow remote access, or fetch to a local cache (see the sketch below); it is often faster to fetch a file from a remote site than from local tape
Data management software will evolve:
– A number of short-term prototypes have been proposed
– Simplify the interfaces where possible; hide details from end-users
Experiment models will evolve:
– To accept that information in a distributed system cannot be fully up-to-date; use remote access to data and caching mechanisms to improve overall robustness
Timescale: the 2013 LHC run
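
A minimal sketch of the “fetch to a local cache, otherwise read remotely” idea described above, using only the Python standard library; the cache directory, URL scheme and file names are hypothetical, and real implementations live inside the experiments’ data management and I/O layers (with dedicated remote-access protocols rather than plain HTTP).

```python
import os
import shutil
import urllib.request

# Hypothetical local cache area; real sites manage this in their storage systems.
CACHE_DIR = "/tmp/wlcg-cache"

def open_data_file(logical_name, remote_base="https://storage.example.org/data"):
    """Return a local path for a data file.

    If the file is already in the local cache, use it; otherwise fetch it
    from a remote site into the cache first. This mirrors the idea that a
    job need not have all of its input data local in advance.
    """
    local_path = os.path.join(CACHE_DIR, logical_name)
    if os.path.exists(local_path):
        return local_path  # cache hit: no network access needed

    os.makedirs(os.path.dirname(local_path), exist_ok=True)
    remote_url = f"{remote_base}/{logical_name}"
    # Stream the remote copy into the cache (a real system would also
    # verify checksums and handle cache eviction).
    with urllib.request.urlopen(remote_url) as src, open(local_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return local_path

if __name__ == "__main__":
    path = open_data_file("run001/events_0001.root")
    print("reading", path)
```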

Some observations
– Experiments have truly distributed models; this needs a lot of support and interaction with the sites – heavy but supportable.
– Network traffic is far in excess of what was anticipated, but it is supportable at the moment; we must plan for the future.
– The limited amount of data has allowed many reprocessings – LHCb are already on their nominal model...
– Today resources are plentiful, and not yet full. This will surely change...
– Significant numbers of people are successfully doing analysis.

Conclusions
Distributed computing for LHC is a reality and enables physics output in a very short time.
Experience with real data and real users suggests areas for improvement; the infrastructure of WLCG can support evolution of the technology.