Perspectives on LHC Computing
José M. Hernández (CIEMAT, Madrid)
On behalf of the Spanish LHC Computing community
Jornadas CPAN 2013, Santiago de Compostela

The LHC Computing Challenge
- The Large Hadron Collider (LHC) delivered billions of recorded collisions to the experiments during Run 1
  - ~100 PB of data stored on tape at CERN
- The Worldwide LHC Computing Grid (WLCG) provides compute and storage resources for data processing, simulation and analysis
  - ~300k cores, ~200 PB disk, ~200 PB tape
- The computing challenge was met with great success
  - Unprecedented data volumes analyzed in record time, delivering major scientific results (e.g. the Higgs boson discovery)

Global effort, global success

Computing is part of the global effort
(figure; slide footer: "CMS Computing Upgrade and Evolution", 28 October 2013, Seoul, Korea)

WLCG (initial) computing model
- Distributed computing resources managed using Grid technologies that needed to be developed
- Centers interconnected via private and national high-capacity networks
- Centers provide mass storage (disk/tape servers) and CPU resources (x86 CPUs)
- Hierarchical tiered structure (roles sketched below)
  - Prompt reconstruction and calibration of detector data at the Tier-0 at CERN
  - Data-intensive processing at Tier-1s
  - User analysis and simulation production at Tier-2s (LHCb: simulation only)
  - Data tape archival at Tier-0 and Tier-1s
  - Data caches at Tier-2s (except LHCb)
- All available WLCG resources have been used intensively during LHC Run 1
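A minimal sketch of the role assignments in this initial hierarchical model, as a lookup table. The tier names and roles follow the slide; the data structure and function are purely illustrative, not any experiment's actual workload-management code.

```python
# Illustrative mapping of the initial WLCG tier roles (names follow the slide text).
TIER_ROLES = {
    "Tier-0": {"prompt_reconstruction", "calibration", "tape_archive"},
    "Tier-1": {"data_intensive_processing", "tape_archive"},
    "Tier-2": {"user_analysis", "simulation", "disk_cache"},
}

def tiers_for(task: str) -> list[str]:
    """Return the tier levels that host a given workflow type in the initial model."""
    return [tier for tier, roles in TIER_ROLES.items() if task in roles]

if __name__ == "__main__":
    print(tiers_for("simulation"))    # ['Tier-2']
    print(tiers_for("tape_archive"))  # ['Tier-0', 'Tier-1']
```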

ATLAS Computing scale in LHC Run 1
- 150k slots continuously utilized
- ~1.4M jobs/day completed
- More than 5 GB/s transfer rate worldwide (converted to an annual volume below)
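To put the sustained transfer rate in context, a back-of-the-envelope conversion, assuming the rate were held all year (illustrative arithmetic only, not an official figure):

```python
# What a sustained 5 GB/s worldwide transfer rate implies per year.
rate_gb_per_s = 5
seconds_per_year = 365 * 24 * 3600
petabytes_per_year = rate_gb_per_s * seconds_per_year / 1e6  # 1 PB = 1e6 GB
print(f"{petabytes_per_year:.0f} PB/year")  # ~158 PB/year if sustained continuously
```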

CMS Computing scale in LHC Run 1
- ~100 PB transferred between sites
  - ~2/3 of it for data analysis at Tier-2s
- Resource usage saturation; in 2012:
  - 70k slots continuously utilized
  - ~500k jobs/day completed

Computing challenges for Run 2
- Computing in LHC Run 1 was very successful, but Run 2 (from 2015) poses new challenges
- Increased energy and luminosity delivered by the LHC in Run 2
  - More complex events to process
  - Longer event reconstruction time (CMS: ~2x)
- Higher output rate to record
  - Maintain similar trigger thresholds and sensitivity to Higgs physics and to potential new physics
  - ATLAS and CMS event rate to storage: ~2.5x
- A substantial increase of computing resources is needed that we probably cannot afford (a rough scaling estimate follows below)
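A naive scaling estimate built only from the two factors quoted on the slide; it is illustrative, not an official resource projection (real requests also cover simulation, reprocessing and analysis).

```python
# Naive scaling of prompt-reconstruction CPU needs, Run 2 vs Run 1.
rate_increase = 2.5        # ATLAS/CMS event rate to storage
reco_time_increase = 2.0   # CMS per-event reconstruction time
cpu_factor = rate_increase * reco_time_increase
print(f"Prompt reconstruction CPU would grow by roughly x{cpu_factor:.0f}")  # ~x5
```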

Upgrading LHC Computing in LS1
- The shutdown period is a valuable opportunity to assess
  - Lessons and operational experience from Run 1
  - The computing demands of Run 2
  - The technical and cost evolution of computing
- Undertake intensive planning and development to prepare LHC Computing for 2015 and beyond
  - While sustaining steady-state, full-scale operations
  - Under the assumption of constrained funding
- This has been happening internally within the experiments and collaboratively with CERN IT, WLCG and common software and computing projects
- The computing upgrade proceeds in parallel to the accelerator and detector upgrades to push the frontiers of HEP

Computing strategy for Run 2
- Increase resources in WLCG as much as possible
  - Try to conform to the constrained budget situation
- Make a more efficient and flexible use of the available resources
- Reduce CPU and storage needs
  - Fewer reprocessing passes, fewer simulated events, more compact data formats, reduced data replication factor
- Intelligent dynamic data placement (a heuristic sketch follows below)
  - Automatic replication of hot data and deletion of cold data
- Break down the boundaries between the computing tiers
  - Run reconstruction, simulation and analysis at Tier-1s and Tier-2s indistinctly
  - Tier-1s become an extension of the Tier-0
  - Keep a higher service level and custodial tape storage at Tier-1s
- Centralized production of group analysis datasets
  - Shrink 'chaotic analysis' to only what really is user specific
- Remove redundancies in processing and storage, reducing operational workloads while improving turnaround for users
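A minimal sketch of the popularity-driven placement idea (replicate hot data, delete cold cached copies). The thresholds, the helper pick_site_with_space and the in-memory bookkeeping are assumptions for illustration; this is not the experiments' actual data-management implementation.

```python
from collections import Counter

access_counts = Counter()   # dataset -> accesses in the last N days (fed by monitoring)
replicas = {}               # dataset -> set of sites currently holding a replica

HOT_THRESHOLD = 100         # accesses that trigger an extra replica (assumed value)
COLD_THRESHOLD = 1          # accesses below which cached replicas may be deleted
MIN_REPLICAS = 1            # never drop below the custodial/primary copy

def pick_site_with_space(sites):
    # trivial stand-in: a real implementation would check site quotas and free space
    return next(iter(sites), None)

def rebalance(candidate_sites):
    """candidate_sites: set of site names eligible to receive extra replicas."""
    for dataset, count in access_counts.items():
        sites = replicas.setdefault(dataset, set())
        if count >= HOT_THRESHOLD:
            # replicate hot data to one additional site with free space
            target = pick_site_with_space(candidate_sites - sites)
            if target:
                sites.add(target)
        elif count <= COLD_THRESHOLD and len(sites) > MIN_REPLICAS:
            # delete one cached replica of cold data, keeping the custodial copy
            sites.remove(next(iter(sites)))
```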

Access to new resources for Run 2
- Access to opportunistic resources
  - HPC clusters, academic or commercial clouds, volunteer computing
  - Significant increase in capacity at low cost (to satisfy capacity peaks)
- Use the HLT farm for offline data processing
  - A significant resource (>10k slots)
  - Available during extended periods with no data taking, and even during inter-fill periods
- Adopt advanced architectures
  - Processing in Run 1 was done under Enterprise Linux on x86 CPUs
  - Many-core processors, low-power CPUs, GPU environments
  - A challenging heterogeneous environment
  - Parallelization of the processing applications will be key

Computing resources increase
- Preliminary resource requests for Run 2 correspond to ~25% yearly growth (compounded below)
- Benefit from technology evolution to buy more capacity with the same money
(figure: requested CPU in HS06 and storage in PB versus year)
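As a quick sanity check of what such a growth rate implies over a multi-year running period, a minimal illustrative calculation (not an official projection):

```python
# Compounding ~25% yearly growth of the resource requests.
growth = 1.25
for years in (1, 2, 3):
    print(f"after {years} year(s): x{growth**years:.2f} the starting capacity")
# after 3 years the requested capacity is roughly doubled (x1.95)
```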

Processing evolution
- Throughput growth is now sustained not by ever faster processors but by more cores, co-processors and concurrency features
- New environment: high concurrency, modest memory per core, GPUs
- Multi-core now, many-core soon: finer-grained parallelism is needed (a simple example below)
- Many if not most of our codes require extensive overhauls
  - Being adapted: Geant4, ROOT, reconstruction code, experiment frameworks
(figure: transistor count growth is holding up, but clock speed growth suffered a heat death)
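A minimal sketch of event-level parallelism on a multi-core node. The reconstruct function and the toy events are placeholders; real experiment frameworks use multi-threaded C++ scheduling, so this only illustrates the concept of filling many cores with independent events.

```python
from concurrent.futures import ProcessPoolExecutor

def reconstruct(event):
    # stand-in for the per-event reconstruction work
    return sum(hit * hit for hit in event)

def process_events(events, workers=8):
    # one worker process per core; modest memory per core favours sharing read-only data
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(reconstruct, events, chunksize=64))

if __name__ == "__main__":
    fake_events = [[i, i + 1, i + 2] for i in range(10_000)]
    print(len(process_events(fake_events)))
```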

Data Management: where is the LHC in Big Data terms?
(figure, "Big Data in 2012", Wired Magazine 4/2013: business emails sent ~3000 PB/year, though not managed as a coherent data set so they "don't count"; Google search ~100 PB; Facebook uploads ~180 PB/year; digital health ~30 PB; LHC data ~15 PB/year; YouTube ~15 PB/year; plus the US Census, Library of Congress, climate databases and Nasdaq)
- We are big: the current LHC data set, all data products included, is ~250 PB
- For comparison, the reputed capacity of the NSA's new Utah data center is 5000 PB ($2 billion)

Data Management evolution
- Data access model during LHC Run 1
  - Pre-place and replicate data at sites, send jobs to the data
- We need more efficient distributed data handling, lower disk storage demands and better use of the available CPU resources
- The network has been very reliable and has experienced a large increase in bandwidth
- (Aspire to) send only the data you need, only where you need it, and cache it when it arrives
  - Towards transparent distributed data access enabled by the network
  - Industry has followed this approach for years with content delivery networks
- There were already successful examples of this approach during Run 1…

Data Management evolution in Run 1
- Scalable access to conditions data
  - Frontier for scalable distributed database access
  - Caching web proxies provide hierarchical, highly scalable cache-based data access (idea sketched below)
- Experiment software provisioning to the worker nodes
  - CERNVM File System (CVMFS)
- Evolve towards a distributed data federation…
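A minimal sketch of the read-through caching idea behind Frontier-style access to conditions data. The origin URL and the in-memory cache are stand-ins for the central database front-end and a site-local squid proxy; this is not the actual Frontier implementation.

```python
import urllib.request

class ReadThroughCache:
    def __init__(self, origin_url):
        self.origin_url = origin_url  # assumed HTTP front-end of the central database
        self.cache = {}               # in-memory stand-in for a site-local caching proxy

    def get(self, key):
        if key not in self.cache:     # cache miss: fetch once from the origin
            with urllib.request.urlopen(f"{self.origin_url}/{key}") as response:
                self.cache[key] = response.read()
        return self.cache[key]        # cache hit: served locally, no WAN traffic

# Thousands of worker-node jobs asking for the same calibration payload then hit
# the local cache instead of the central database.
```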

Data Management evolution
- Distributed data federation
  - A collection of disparate storage resources transparently accessible across a wide area via a common namespace (CMS AAA, ATLAS FAX)
  - Needs efficient remote I/O
  - CMS has invested heavily in I/O optimizations within the application, allowing efficient reading of data over the (high-latency) network with the xrootd technology while maintaining high CPU efficiency
  - Extending the initial use cases: fallback on local access failure (sketched below), overflow from busy sites, interactive access to data, diskless sites
- Interesting approach: the ATLAS event service
  - Ask for exactly what you need, and have it delivered by a service that knows how to get it to you efficiently
  - Return the outputs in a roughly steady stream, so that a worker node can be lost with little processing lost
  - Well suited to transient opportunistic resources and volunteer computing, where preemption cannot be avoided
  - Well suited for high-CPU, low-I/O workflows
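A minimal sketch of the "fallback on local access failure" federation use case. The redirector and storage-element URLs are invented examples, and open_root_file is a hypothetical stand-in for the framework's xrootd-aware file open; only the fallback logic itself is the point.

```python
FEDERATION_REDIRECTOR = "root://federation.example.org/"  # assumed global redirector

def open_with_fallback(lfn, open_root_file):
    local_url = f"root://local-se.example.org/{lfn}"      # assumed site storage element
    try:
        return open_root_file(local_url)                  # first try the local replica
    except IOError:
        # local replica missing or unreadable: fall back to a remote read via the federation
        return open_root_file(FEDERATION_REDIRECTOR + lfn)
```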

From Grid to Clouds
- Turning computing into a utility: infrastructure as a service
- Clouds evolve, complement and extend the Grid
  - Decrease the heterogeneity seen by the user (hardware virtualization)
  - VMs provide a uniform interface to resources
  - Integrate diverse resources manageably
  - Isolate software from the physical hardware
  - Dynamic provisioning of resources (sketched below)
  - New resources (commercial and research clouds)
  - Huge community behind cloud software
- A grid of clouds is already used by the LHC experiments
  - Several sites provide a cloud interface
  - ATLAS ran ~450k production jobs on Google resources over a few weeks
  - Tests with Amazon EC2 spot pricing look roughly economically viable
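A minimal sketch of an elastic provisioning loop against a cloud interface. The launch_vm, terminate_vm and pending_jobs callables, image and flavor names, and the scaling thresholds are all hypothetical stand-ins for a real cloud API (e.g. EC2 or OpenStack) and the experiment's workload management system.

```python
import time

TARGET_JOBS_PER_VM = 8   # assumed number of queued jobs that justifies one more VM
MAX_VMS = 100            # assumed cap on rented capacity

def provisioning_loop(pending_jobs, launch_vm, terminate_vm, running_vms):
    while True:
        # scale the VM pool to the current job backlog (ceil division)
        wanted = min(MAX_VMS, -(-pending_jobs() // TARGET_JOBS_PER_VM))
        if wanted > len(running_vms):
            running_vms.append(launch_vm(image="experiment-worker", flavor="8-core"))
        elif wanted < len(running_vms):
            terminate_vm(running_vms.pop())  # scale down when the queue drains
        time.sleep(60)                       # re-evaluate once a minute
```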

Conclusions
- LHC computing performed extremely well at all levels in Run 1
  - We know how to deliver, adapting where necessary
  - Excellent networks and flexible, adaptable computing models and software systems paid off in exploiting the resources
- LHC computing faces new challenges for Run 2
  - A large increase in computing resources is required from 2015
  - Live within constrained budgets: use the resources we own as fully and efficiently as possible
  - Support the major development programme required
  - Access opportunistic and cloud resources, explore new computing and processing architectures
  - Evolve towards dynamic data access and distributed parallel computing
- The explosive growth in data and (highly granular) processors in the wider world gives us powerful ground for success along our evolution path
- Evolve towards a more dynamic, efficient and flexible system