Computing at CERN
Maite Barroso Lopez, IT Deputy Department Head
CERN: Accelerating Science and Innovation
Large Hadron Collider: Accelerating Science and Innovation
Data acquisition
100 million channels
40 million pictures a second
Synchronise signals from all detector parts
Store data on the detector in memory pipelines
Pick the interesting events
40 million events per second: fast, simple information (muon tracks, energy deposits); hardware trigger decides in a few microseconds
100 thousand events per second: fast algorithms in a local computer farm; software trigger decides in under 1 second
A few hundred events per second are recorded for study
Pick the interesting events: data size
~1 Petabyte per second? We cannot afford to store it: one year's worth of LHC data at 1 PB/s would cost a few hundred trillion dollars/euros
We have to filter in real time to keep only the "interesting" data: we keep roughly 1 event in a million (yes, 99.9999% is thrown away), leaving ~1 Gigabyte per second
40 million events per second: fast, simple information; hardware trigger in a few microseconds
100 thousand events per second: fast algorithms in computers; software trigger
A few hundred events per second recorded for study
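A quick back-of-the-envelope check of these reduction factors, as a minimal Python sketch: all numbers are the rounded values quoted on the slide, and the 400 Hz output rate is an assumed stand-in for "a few hundred events per second".

```python
# Back-of-the-envelope check of the trigger reduction factors quoted on this
# slide. Rates are the rounded values from the talk; 400 Hz is an assumption
# for "a few hundred events per second".
COLLISION_RATE_HZ = 40e6      # bunch-crossing "pictures" per second
HW_TRIGGER_OUT_HZ = 100e3     # events surviving the hardware (Level-1) trigger
SW_TRIGGER_OUT_HZ = 400       # events written to storage (assumed)

RAW_RATE_BYTES_PER_S = 1e15   # ~1 PB/s produced by the detectors
KEPT_RATE_BYTES_PER_S = 1e9   # ~1 GB/s actually recorded

print(f"Hardware trigger keeps 1 event in {COLLISION_RATE_HZ / HW_TRIGGER_OUT_HZ:,.0f}")
print(f"Overall, about 1 event in {COLLISION_RATE_HZ / SW_TRIGGER_OUT_HZ:,.0f} is kept")
print(f"Data volume shrinks by a factor of {RAW_RATE_BYTES_PER_S / KEPT_RATE_BYTES_PER_S:,.0f}")
```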
Data path
LHC & Experiments → Level-1 Trigger → Readout → Event Builder → Filter Farm (HLT) → Tier 0 & Grid
LHC Data processing
The experiments send over 30 petabytes per year to the CERN data centre, the equivalent of more than 5 kilometers of DVDs stacked on top of each other
The LHC data are aggregated in the CERN data centre
ALICE: ~1.25 GB/s (ions); 2015: ~4 GB/s
ATLAS: ~320 MB/s; 2015: ~1 GB/s
CMS: ~220 MB/s; 2015: ~600 MB/s
LHCb: ~50 MB/s; 2015: ~700 MB/s
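As a rough consistency check of the "over 30 PB per year" figure, a small Python sketch multiplies the per-experiment rates by an assumed live time; the live-time figures and the treatment of ALICE as heavy-ion-running-only are assumptions, not numbers from the talk.

```python
# Rough consistency check: per-experiment 2015 rates vs. ">30 PB per year".
# The live-time figures below are assumptions used only for the arithmetic.
RATES_2015_GB_PER_S = {"ATLAS": 1.0, "CMS": 0.6, "LHCb": 0.7, "ALICE": 4.0}

PP_LIVE_SECONDS = 1e7    # assumed proton-proton data taking per year
ION_LIVE_SECONDS = 1e6   # assumed heavy-ion running (ALICE's peak rate)

pp_pb = sum(r for exp, r in RATES_2015_GB_PER_S.items() if exp != "ALICE") \
        * PP_LIVE_SECONDS / 1e6
ion_pb = RATES_2015_GB_PER_S["ALICE"] * ION_LIVE_SECONDS / 1e6
print(f"~{pp_pb:.0f} PB (pp) + ~{ion_pb:.0f} PB (ions) per year to the data centre")
```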
CERN Data Centre
Built in the 70s on the CERN site (Meyrin, Geneva), 3.5 MW for equipment
New extension located at Wigner (Budapest), 2.7 MW for equipment
Connected to the Geneva data centre with 2x100 Gb/s links (21 and 24 ms RTT)
Hardware generally based on commodity components:
15,000 servers, providing 190,000 processor cores
80,000 disk drives, providing 250,000 TB of disk space
104 tape drives, providing 138,000 TB of tape storage
CERN Data Centre: upgrade after Run 1 and Run 2
CERN private cloud, using OpenStack
To provide the computing infrastructure powering most of the grid services
Open source (90 contributions)
Provide computing resources in a flexible way
Storage:
Disk storage (EOS, open source) scaling to hundreds of PB of disk in a distributed environment
Tape: migration to higher-density cartridges
The experiments optimised their computing models, and the performance and efficiency of their core software
This led to only double the computing/storage being required for Run 2 (the initial estimate was 10 times more!)
Use of volunteer computing (e.g. for ATLAS, equivalent to a full site)
Providing a self-service cloud allows resources to be made available to physicists in the time it takes to get a coffee, rather than waiting weeks for a physical hardware allocation.

OPEN SOURCE note: CERN released the World Wide Web as open source in the 1990s, and with the worldwide LHC computing grid of hundreds of collaborating sites, we have used open source software at large scale for decades. In addition to open source code availability, we look for strong sustainable communities, open design, and an opportunity for CERN to contribute. Around OpenStack, we have found several projects which are part of our production cloud solution. The Puppet configurations for OpenStack ensure that all of our hypervisors, either KVM or Hyper-V, are configured in a consistent way, and these configurations can be dynamically updated as we evolve the cloud. Infrastructure monitoring and management software such as Elasticsearch, Kibana, Jenkins, Rundeck and Foreman are all integrated into the new tool chain.

EOS: EOS started its production phase in 2011 and currently holds 20 PB of data and 158 million files. It is a disk-only storage solution mainly focused on analysis and fast data processing, with very low access latency (ms to s) thanks to multi-replication across nodes and a JBOD disk layout. Fast metadata access is guaranteed by the in-memory, per-instance resident namespace. XRootD is the principal access protocol; protocols such as GridFTP, FUSE mount and HTTP are also supported. Authentication is done by Kerberos/X.509. The four big LHC experiments are using EOS, and a shared instance for non-LHC experiments started its production phase recently.
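To illustrate the self-service model mentioned above, here is a minimal sketch of requesting a virtual machine through the OpenStack API with the openstacksdk Python client; this is not CERN's actual tooling, and the cloud, image, flavour and network names are placeholders.

```python
# Minimal self-service provisioning sketch using openstacksdk. The cloud name,
# image, flavour and network below are placeholders, not CERN's configuration.
import openstack

conn = openstack.connect(cloud="my-cloud")  # credentials read from clouds.yaml

image = conn.compute.find_image("CC7-x86_64")           # placeholder image
flavor = conn.compute.find_flavor("m2.medium")          # placeholder flavour
network = conn.network.find_network("my-project-net")   # placeholder network

server = conn.compute.create_server(
    name="analysis-node-01",
    image_id=image.id,
    flavor_id=flavor.id,
    networks=[{"uuid": network.id}],
)
# Wait until the VM is ACTIVE -- typically minutes, i.e. "the time for a coffee".
server = conn.compute.wait_for_server(server)
print(server.name, server.status)
```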
LHC Data Distribution: WLCG Worldwide LHC Computing Grid
Tier-0 (CERN): initial data reconstruction, data distribution, data recording & archiving
Tier-1 (13 centres): permanent storage, re-processing, analysis
Tier-2 (~140 centres): simulation, end-user analysis
The Worldwide LHC Computing Grid (WLCG) is a global collaboration of 170 data centres around the world, in 42 countries
The CERN data centre (Tier-0) distributes the LHC data worldwide to the other WLCG sites (Tier-1 and Tier-2)
WLCG provides global computing resources to store, distribute and analyse the LHC data
The resources are distributed, for funding and sociological reasons
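The tier model above can be summarised in a small illustrative data structure; the site counts and roles are taken directly from the slide.

```python
# Illustrative summary of the WLCG tier model described on this slide.
WLCG_TIERS = {
    "Tier-0 (CERN)": {
        "sites": 1,
        "roles": ["initial data reconstruction", "data distribution",
                  "data recording & archiving"],
    },
    "Tier-1": {
        "sites": 13,
        "roles": ["permanent storage", "re-processing", "analysis"],
    },
    "Tier-2": {
        "sites": 140,  # approximate
        "roles": ["simulation", "end-user analysis"],
    },
}

for tier, info in WLCG_TIERS.items():
    print(f"{tier}: {info['sites']} site(s); {', '.join(info['roles'])}")
```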
WLCG: Worldwide infrastructure
Facts about WLCG:
A community of 10,000 physicists are the WLCG users
On average, around 250,000 jobs running concurrently
600,000 processing cores
15% of the WLCG computing resources are at CERN's data centre
500 petabytes of storage available worldwide
20-40 Gbit/s optical-fibre links connect CERN to each of the 13 Tier-1 institutes
(M Barroso Lopez & P S Wells)
LHC: Big data
A few PB of raw data becomes ~100 PB!
Duplicate raw data
Simulated data
Many derived data products
Recreated as the software gets improved
Replicated to allow physicists to access it
Some of the world's largest and most interesting data sets (ref. 2013)
LHC data in Run 2: estimated 50 PB per year
1 PB = 1 million Gigabytes
1 Gigabyte: a pickup truck filled with paper, OR a symphony in high-fidelity sound, OR a movie at TV quality
Annual Internet traffic is estimated at between 5 and 8 exabytes; the size of the Internet (understood as storage) is estimated at close to 500 exabytes.
Data: Outlook for HL-LHC
[Chart: projected annual data volume in PB; annotation: "we are here"]
Estimating 400 PB/year for 2023
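A small sketch of the arithmetic behind this outlook: going from roughly 50 PB/year in Run 2 to an estimated 400 PB/year by 2023. The 2015 start year is an assumption used only to show the implied growth rate.

```python
# Implied growth rate between ~50 PB/year (Run 2, ~2015, assumed start year)
# and the estimated ~400 PB/year by 2023 quoted on this slide.
run2_pb_per_year = 50
hl_lhc_pb_per_year = 400
years = 2023 - 2015

growth_factor = hl_lhc_pb_per_year / run2_pb_per_year
annual_growth = growth_factor ** (1 / years) - 1
print(f"Overall increase: x{growth_factor:.0f}")
print(f"Implied average growth: ~{annual_growth:.0%} per year")  # roughly 30%/year
```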
CPU: Online + Offline
Very rough estimate of new CPU requirements (in MHS06) for online and offline processing per year of data taking, using a simple extrapolation of Run 1 performance scaled by the number of events
~50x today's levels
Historical growth of 25%/year
Room for improvement
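To see why this is a challenge, a minimal sketch of the gap between the two numbers on this slide: how long the historical ~25%/year growth would take to deliver a 50x increase in capacity.

```python
# How many years of ~25%/year growth are needed to reach a 50x capacity increase?
import math

ANNUAL_GROWTH = 0.25   # historical growth quoted on the slide
TARGET_FACTOR = 50     # "50x today's levels"

years_needed = math.log(TARGET_FACTOR) / math.log(1 + ANNUAL_GROWTH)
print(f"~{years_needed:.0f} years at {ANNUAL_GROWTH:.0%}/year to reach x{TARGET_FACTOR}")
# => roughly 18 years, hence the emphasis on "room for improvement" in software
```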
What’s next in computing
Of course: larger amounts of data, larger computing resources needed, faster, cheaper
For the LHC upgrades
For the next generation of accelerators and detectors
Software complexity and performance
Modern CPU architectures require significant software re-engineering
Existing computing models will not scale on the 10-year timescale
Must live within ~flat budgets
Perhaps one of the biggest challenges is data preservation
"We are nonchalantly throwing all of our data into what could become an information black hole without realising it." (Vint Cerf, vice president of Google and an early internet pioneer, February 2015)
How to ensure that all the data collected and published is still readable by the next generations... and how to make sense of it
CERN is leading a global effort for HEP that others will inevitably face sooner or later
Could the Web or the Grid have originated elsewhere?
Perhaps, but it remains that we are often faced with challenges 5-10 years before others, pushed by the physics needs:
The need for collaboration tools for global science led to the Web
The need to pool computing resources for the global LHC led to Grid Computing
The need to share the results led CERN to pave the way to open access to documents, and now to data
All of this has been openly released (for free) for the benefit of others as well
He described CERN as being ahead of the curve, and said the technologies and processes developed, as well as the lessons learned, at CERN can be applied in other fields. However, he emphasised that CERN's advanced capabilities are not acquired by happenstance: the organisation spends a great deal of effort growing the skills needed to develop cutting-edge data solutions. "Education is a key element of CERN's mission. For those working at CERN, we have technical and management training programmes and a series of computing seminars, as well as the CERN School of Computing. We are constantly recruiting young scientists, engineers and technicians who also bring new skills and ideas into CERN's environment. Engagement with leading IT companies through CERN openlab has been a source of many new developments and helps train successive generations of personnel in the latest techniques," he said.
How is data processed?
[Diagram labels: Online; Repeated from time to time; Repeated very frequently]
(Anna Sfyrla, CERN/ATLAS)
How to make a discovery? Analysis + Simulation (Computing)
[Diagram: Accelerator → Experiment → Analysis + Simulation (Grid/Cloud, Big Data) → Discovery]
Run 1 → magnet splice update → Run 2 at ~full design energy → Phase I upgrades (injectors) → Run 3 at original design luminosity → Phase II upgrades (final focus) → HL-LHC at ten times the design luminosity
Full exploitation of the LHC is the top priority in Europe & the US for high-energy physics
Operate the HL-LHC at 5 (nominal) to 7.5 (ultimate) x 10^34 cm^-2 s^-1 to collect 3000 fb^-1 over about ten years.
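A quick consistency check of the HL-LHC figures above: the live time needed at the nominal luminosity to integrate 3000 fb^-1. The effective seconds of physics running per year is an assumed rule-of-thumb value, not a number from the talk.

```python
# Live time needed at nominal HL-LHC luminosity to collect 3000 fb^-1.
# 1 fb^-1 corresponds to 1e39 cm^-2.
INST_LUMI_CM2_S = 5e34            # nominal instantaneous luminosity
TARGET_CM2 = 3000 * 1e39          # 3000 fb^-1 expressed in cm^-2
PHYSICS_SECONDS_PER_YEAR = 6e6    # assumed effective live time per year

live_seconds = TARGET_CM2 / INST_LUMI_CM2_S
years = live_seconds / PHYSICS_SECONDS_PER_YEAR
print(f"{live_seconds:.1e} s of live time, i.e. of order {years:.0f} years of running")
```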
Questions? Accelerating Science and Innovation