Big Data for Big Discoveries How the LHC looks for Needles by Burning Haystacks Alberto Di Meglio CERN openlab Head DOI: 10.5281/zenodo.45449, CC-BY-SA,

Slides:



Advertisements
Similar presentations
A successful public- private partnership Alberto Di Meglio CERN openlab Deputy Head.
Advertisements

A successful public- private partnership Alberto Di Meglio CERN openlab Head.
A successful public- private partnership Alberto Di Meglio CERN openlab CTO.
Randall Sobie The ATLAS Experiment Randall Sobie Institute for Particle Physics University of Victoria Large Hadron Collider (LHC) at CERN Laboratory ATLAS.
CERN openlab IT Challenges Workshop Alberto Di Meglio CERN openlab CTO office.
Trigger and online software Simon George & Reiner Hauser T/DAQ Phase 1 IDR.
Information Technology Research Areas Alberto Di Meglio – CERN openlab Data acquisition and filteringComputing platforms, data analysis, simulationData.
Core Skills for a Digital Economy
Big Data Power and Responsibility HEPTech, Budapest, 30 March 2015
Exa-Scale Data Preservation in HEP
Ian Fisk and Maria Girone Improvements in the CMS Computing System from Run2 CHEP 2015 Ian Fisk and Maria Girone For CMS Collaboration.
Niko Neufeld, CERN/PH-Department
Advanced Computing Services for Research Organisations Bob Jones Head of openlab IT dept CERN This document produced by Members of the Helix Nebula consortium.
ALICE Upgrade for Run3: Computing HL-LHC Trigger, Online and Offline Computing Working Group Topical Workshop Sep 5 th 2014.
The LHC Computing Grid – February 2008 The Worldwide LHC Computing Grid Dr Ian Bird LCG Project Leader 25 th April 2012.
ATLAS in LHCC report from ATLAS –ATLAS Distributed Computing has been working at large scale Thanks to great efforts from shifters.
CERN openlab V Technical Strategy Fons Rademakers CERN openlab CTO.
CERN openlab Phase V Objectives CERN openlab Deputy Head 08/07/2014 Alberto Di Meglio.
Tim 18/09/2015 2Tim Bell - Australian Bureau of Meteorology Visit.
6 May 2014 CERN openlab IT Challenges workshop, Kacper Szkudlarek, CERN Manuel.
The LHC Computing Grid – February 2008 The Challenges of LHC Computing Dr Ian Bird LCG Project Leader 6 th October 2009 Telecom 2009 Youth Forum.
Les Les Robertson LCG Project Leader High Energy Physics using a worldwide computing grid Torino December 2005.
CERN – IT Department CH-1211 Genève 23 Switzerland t Working with Large Data Sets Tim Smith CERN/IT Open Access and Research Data Session.
WLCG and the India-CERN Collaboration David Collados CERN - Information technology 27 February 2014.
Technical Workshop 5-6 November 2015 Alberto Di Meglio CERN openlab Head.
Innovation & Entrepreneurship Alberto Di Meglio CERN openlab Head CERN – 26/11/2015 The Globe of Science and Innovation.
The Challenges of Scientific Computing
Niko Neufeld, CERN/PH. ALICE – “A Large Ion Collider Experiment” Size: 26 m long, 16 m wide, 16m high; weight: t 35 countries, 118 Institutes Material.
A successful public- private partnership Alberto Di Meglio CERN openlab Head.
CERN openlab Overview CERN openlab Introduction Alberto Di Meglio.
Future computing strategy Some considerations Ian Bird WLCG Overview Board CERN, 28 th September 2012.
Dr. Andreas Wagner Deputy Group Leader - Operating Systems and Infrastructure Services CERN IT Department The IT Department & The LHC Computing Grid –
Predrag Buncic Future IT challenges for ALICE Technical Workshop November 6, 2015.
Computing for LHC Physics 7th March 2014 International Women's Day - CERN- GOOGLE Networking Event Maria Alandes Pradillo CERN IT Department.
LHC Computing, CERN, & Federated Identities
Computing Issues for the ATLAS SWT2. What is SWT2? SWT2 is the U.S. ATLAS Southwestern Tier 2 Consortium UTA is lead institution, along with University.
tons, 150 million sensors generating data 40 millions times per second producing 1 petabyte per second The ATLAS experiment.
LHCbComputing Computing for the LHCb Upgrade. 2 LHCb Upgrade: goal and timescale m LHCb upgrade will be operational after LS2 (~2020) m Increase significantly.
Ian Bird, CERN WLCG LHCC Referee Meeting 1 st December 2015 LHCC; 1st Dec 2015 Ian Bird; CERN1.
CERN openlab Overview CERN openlab Summer Students 2015 Fons Rademakers.
The Worldwide LHC Computing Grid Frédéric Hemmer IT Department Head Visit of INTEL ISEF CERN Special Award Winners 2012 Thursday, 21 st June 2012.
A successful public- private partnership Fons Rademakers CERN openlab CTO.
16 September 2014 Ian Bird; SPC1. General ALICE and LHCb detector upgrades during LS2  Plans for changing computing strategies more advanced CMS and.
Joint Institute for Nuclear Research Synthesis of the simulation and monitoring processes for the data storage and big data processing development in physical.
Meeting with University of Malta| CERN, May 18, 2015 | Predrag Buncic ALICE Computing in Run 2+ P. Buncic 1.
Perspectives on LHC Computing José M. Hernández (CIEMAT, Madrid) On behalf of the Spanish LHC Computing community Jornadas CPAN 2013, Santiago de Compostela.
IT-DSS Alberto Pace2 ? Detecting particles (experiments) Accelerating particle beams Large-scale computing (Analysis) Discovery We are here The mission.
A successful public- private partnership Alberto Di Meglio CERN openlab Head.
Grid technologies for large-scale projects N. S. Astakhov, A. S. Baginyan, S. D. Belov, A. G. Dolbilov, A. O. Golunov, I. N. Gorbunov, N. I. Gromova, I.
Scientific Computing at Fermilab Lothar Bauerdick, Deputy Head Scientific Computing Division 1 of 7 10k slot tape robots.
Computing infrastructures for the LHC: current status and challenges of the High Luminosity LHC future Worldwide LHC Computing Grid (WLCG): Distributed.
J. Templon Nikhef Amsterdam Physics Data Processing Group Large Scale Computing Jeff Templon Nikhef Jamboree, Utrecht, 10 december 2012.
A successful public- private partnership Maria Girone CERN openlab CTO.
LHC collisions rate: Hz New PHYSICS rate: Hz Event selection: 1 in 10,000,000,000,000 Signal/Noise: Raw Data volumes produced.
Collaboration Board Meeting 11 March 2016 Alberto Di Meglio CERN openlab Head.
A successful public-private partnership
Knowledge Sharing and Innovation
Ian Bird WLCG Workshop San Francisco, 8th October 2016
A successful public-private partnership
evoluzione modello per Run3 LHC
Innovation & Entrepreneurship 5 November 2015
A successful public-private partnership
A successful public-private partnership
FUTURE ICT CHALLENGES IN SCIENTIFIC COMPUTING
for the Offline and Computing groups
Dagmar Adamova, NPI AS CR Prague/Rez
Dagmar Adamova (NPI AS CR Prague/Rez) and Maarten Litmaath (CERN)
Thoughts on Computing Upgrade Activities
An IT R&D Collaboration Framework
Presentation transcript:

Big Data for Big Discoveries How the LHC looks for Needles by Burning Haystacks Alberto Di Meglio CERN openlab Head DOI: /zenodo.45449, CC-BY-SA, images courtesy of CERN

The Large Hadron Collider (LHC) Alberto Di Meglio – CERN LHC - HPC & Big Data 20162

Burning Haystacks The Detectors: 7000 tons “microscopes” 150 million sensors Data generated 40 million times per second Low-level Filters and Triggers 100,000 selections per second High-level filters (HLT) 100 selections per second  Peta Bytes / sec !  Tera Bytes / sec !  Giga Bytes / sec ! Alberto Di Meglio – CERN LHC - HPC & Big Data 20163

Storage, Reconstruction, Simulation, Distribution 6 GB/s Alberto Di Meglio – CERN LHC - HPC & Big Data 20164

Worldwide LHC Computing Grid Alberto Di Meglio – CERN LHC - HPC & Big Data 2016 Tier-0 (CERN): Data recording Initial data reconstruction Data distribution Tier-1 (12 centres): Permanent storage Re-processing Analysis Tier-2 (68 Federations, ~140 centres): Simulation End-user analysis 525,000 cores 450 PB 5

1 PB/s of data generated by the detectors Up to 30 PB/year of stored data A distributed computing infrastructure of half a million cores working 24/7 An average of 40M jobs/month An continuous data transfer rate of 4-6 GB/s across the Worldwide LHC Grid (WLCG) Alberto Di Meglio – CERN LHC - HPC & Big Data

The Higgs Boson completes the Standard Model, but the Model explains only about 5% of our Universe What is the other 95% of the Universe made of? How does gravity really works? Why there is no antimatter in nature? Exotic particles? Dark matter? Extra dimensions? 7

LHC Schedule First runLS1Second runLS2Third runLS3HL-LHC ? … LHC startup 900 GeV 7 TeV L=6x10 33 cm -2 s -1 Bunch spacing = 50 ns Phase-0 Upgrade (design energy, nominal luminosity) 14 TeV L=1x10 34 cm -2 s -1 Bunch spacing = 25 ns Phase-1 Upgrade (design energy, design luminosity) 14 TeV L=2x10 34 cm -2 s -1 Bunch spacing = 25 ns Phase-2 Upgrade (High Luminosity) 14 TeV L=1x10 35 cm -2 s -1 Spacing = 12.5 ns FCC? 50 times more data than today in the next 10 years 50 PB/s out of the detectors 5 PB/day to be stored Alberto Di Meglio – CERN LHC - HPC & Big Data 20168

Information Technology Research Areas Alberto Di Meglio – CERN LHC - HPC & Big Data 2016 Data acquisition and filteringComputing platforms, data analysis, simulationData storage and long-term data preservationCompute provisioning (cloud) Data analytics Networks 9

Flat Budget How can we Address the Challenges? Alberto Di Meglio – CERN LHC - HPC & Big Data

Technology Evolution  More performance, less cost  Redesign of DAQ Less custom hardware, more off-the-shelf CPUs, fast interconnects, optimized SW  Redesign of SW to exploit modern CPU/GPU platforms (vectorization, parallelization)  New architectures (e.g. automata)  More performance, less cost  Redesign of DAQ Less custom hardware, more off-the-shelf CPUs, fast interconnects, optimized SW  Redesign of SW to exploit modern CPU/GPU platforms (vectorization, parallelization)  New architectures (e.g. automata) Alberto Di Meglio – CERN LHC - HPC & Big Data Computing Storage Infrastructure Data Analytics

Technology Evolution  Reduce data to be stored Move computation from offline to online  More tapes, less disks Dynamic data placements  New media Object-disks, shingled  Persistent meta-data, in-memory ops Flash/SSD, NVRAM, NVDIMM  Reduce data to be stored Move computation from offline to online  More tapes, less disks Dynamic data placements  New media Object-disks, shingled  Persistent meta-data, in-memory ops Flash/SSD, NVRAM, NVDIMM Alberto Di Meglio – CERN LHC - HPC & Big Data Computing Storage Infrastructure Data Analytics

Technology Evolution  Economies of scale  Large-scale, joint procurement of services and equipment  Hybrid infrastructure models Allow commercial resource procurement Large-scale cloud procurement models are not yet clear  Economies of scale  Large-scale, joint procurement of services and equipment  Hybrid infrastructure models Allow commercial resource procurement Large-scale cloud procurement models are not yet clear Alberto Di Meglio – CERN LHC - HPC & Big Data Computing Storage Infrastructure Data Analytics

Technology Evolution  Reduce analysis cost, decrease time  More efficient operations of the LHC systems Log analysis, proactive maintenance  Infrastructure monitoring, optimization  Data reduction, event filtering and identification on large data sets (~10PB)  Reduce analysis cost, decrease time  More efficient operations of the LHC systems Log analysis, proactive maintenance  Infrastructure monitoring, optimization  Data reduction, event filtering and identification on large data sets (~10PB) Alberto Di Meglio – CERN LHC - HPC & Big Data Computing Storage Infrastructure Data Analytics

Collaborations Alberto Di Meglio – CERN openlab Industrial Collaboration Frameworks HEP Community Joint Infrastructure Initiatives Laboratories, academia, research 15

New educational requirements Alberto Di Meglio – CERN LHC - HPC & Big Data 2016 Multicore CPU programming, graphical processors (GPU), multithreaded software Software & Computing Engineers Data analysis technologies, tools, data storage, visualization, monitoring, security, etc. Data Scientists Applications of physics to other domains (hadron therapy, etc.), Knowledge transfer Multidisciplinary applications 16

Take-Away Messages  LHC is a long-term project Need to adapt and evolve in time  Look for cost-effective solutions Continuous technology tracking Aim for “just as good as necessary” based on concrete scientific objectives  Collaboration and education People is the most important asset to make this endeavour sustainable over time Alberto Di Meglio – CERN LHC - HPC & Big Data

Alberto Di Meglio – CERN LHC - HPC & Big Data 2016 EXECUTIVE CONTACT Alberto Di Meglio, CERN openlab Head TECHNICAL CONTACT Maria Girone, CERN openlab CTO Fons Rademakers, CERN openlab CRO COMMUNICATION CONTACT Andrew Purcell, CERN openlab Communication Officer Mélissa Gaillard, CERN IT Communication Officer ADMIN CONTACT Kristina Gunne, CERN openlab Administration Officer 18 This work is licensed under a Creative Commons Attribution- ShareAlike 3.0 Unported License. It includes photos, models and videos courtesy of CERN and uses contents provided by CERN and CERN openlab