Data Federation & Data Management for the CMS Experiment at the LHC


Data Federation & Data Management for the CMS Experiment at the LHC
Frank Würthwein, SDSC/UCSD
SC16, November 16, 2016

The LHC hosts four experimental collaborations: ATLAS, CMS, LHCb, and ALICE. I will restrict myself to CMS.
[Aerial view of the LHC ring near Lake Geneva, with Mont Blanc in the background and the ATLAS, CMS, LHCb, and ALICE sites marked.]

“Big bang” in the laboratory
We gain insight by colliding protons at the highest energies possible to measure production rates, masses & lifetimes, and decay rates. From this we derive the “spectroscopy” as well as the “dynamics” of elementary particles.
Progress is made by going to higher energies and more proton-proton collisions per beam crossing: more collisions => increased sensitivity to rare events; more energy => probing higher masses, smaller distances & earlier times.

Data Volume
LHC data taking spans 2010-12, 2015-18, and 2021-23, with the data-taking rate increased by about x3 after Run 1.
CMS is in effect an 80-megapixel camera taking a “picture” every 25 ns, i.e. roughly 10 PB of data produced per second. Out of the 40 MHz event rate, about 1 kHz is kept in Run 2.
~50 PB per year are archived to tape, dominated by physics-quality data & simulation after reconstruction; only primary datasets for the entire collaboration are archived.
Production of physics-quality data is centrally organized for the 2000+ physicists in each collaboration (2000+ physicists from 180 institutions in 40 countries).
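As a rough consistency check on these numbers, a back-of-the-envelope estimate relates the 1 kHz trigger rate to the archived volume. The event size (~1 MB) and live time (~10^7 seconds per year) below are my own illustrative assumptions, not numbers from the talk:

```python
# Back-of-the-envelope estimate relating trigger rate to yearly data volume.
# Event size and live time are illustrative assumptions, not official CMS numbers.
event_rate_hz = 1e3          # ~1 kHz of events kept in Run 2
event_size_bytes = 1e6       # assume an event of order 1 MB
live_seconds_per_year = 1e7  # assume ~10^7 seconds of data taking per year

raw_per_year_pb = event_rate_hz * event_size_bytes * live_seconds_per_year / 1e15
print(f"recorded collision data: ~{raw_per_year_pb:.0f} PB/year")
# ~10 PB/year of recorded collisions; adding reconstructed data tiers and a
# comparable volume of simulation brings the archived total toward the quoted ~50 PB/year.
```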

Data to Manage
Datasets => distributed globally.
Calibration releases => distributed globally.
Software releases => distributed globally.
A typical physicist doing data analysis uses custom software & configs on top of a standardized software release, re-applies some high-level calibrations, does so uniformly across all primary datasets used in the analysis, and produces private secondary datasets.

Global Distribution
The largest national contribution is only 24% of total resources.
[World map of the globally distributed resources, including the Open Science Grid.]

Software & Calibrations
Both are distributed via systems that use Squid caches.
Calibrations: the Frontier system is backed by an Oracle DB.
Software: CVMFS is backed by a filesystem.
Data distribution is achieved via a globally distributed caching infrastructure.
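To illustrate the caching pattern only (this is not the actual Frontier or CVMFS client code): both systems boil down to HTTP requests routed through a nearby Squid cache, so repeated requests for the same calibration or software object are served locally. The proxy address and URL below are hypothetical placeholders.

```python
# Generic illustration of fetching an object through a site-local Squid cache.
# Proxy host and URL are hypothetical; real Frontier/CVMFS clients handle this
# (plus failover and cache validation) internally.
import requests

SQUID_PROXY = "http://squid.mysite.example:3128"            # hypothetical site-local Squid
URL = "http://frontier.example.org/payload/run-range-123"   # hypothetical payload URL

response = requests.get(URL, proxies={"http": SQUID_PROXY}, timeout=30)
response.raise_for_status()
payload = response.content  # served from the Squid cache if another job fetched it recently
```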

CMS Operations over the Last 6 Months
Routine operations across ~50-100 clusters worldwide.
[Plot: daily average number of running cores over the last 6 months, in the range of roughly 100,000 to 180,000 cores.]

Google Compute Engine
At SC16: 153.7k jobs running on Google vs. 124.9k jobs in CMS globally.
500 TB of “PU data” (pileup) staged into GCE.
Simulation, digitization & reconstruction run in one step.
Output files are exported at the end of each job to FNAL via xrdcp (see the sketch below).
For details see Burt Holzman, 3:30pm in the Google booth, and https://cloudplatform.googleblog.com/2016/11/Google-Cloud-HEPCloud-and-probing-the-nature-of-Nature.html
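A minimal sketch of the end-of-job stage-out step, assuming the output is copied back to an XRootD endpoint at Fermilab with xrdcp; the endpoint host and destination path are placeholders, not the actual ones used in the HEPCloud/GCE setup.

```python
# Sketch of staging a job's output file out to an XRootD endpoint via xrdcp.
# Endpoint and destination directory are hypothetical placeholders.
import subprocess

def stage_out(local_file, endpoint="root://eos.example.fnal.gov",
              dest_dir="/store/user/someuser/gce_outputs"):
    """Copy one output file to the remote storage element; raise on failure."""
    destination = f"{endpoint}/{dest_dir}/{local_file}"
    # -f overwrites an existing destination file of the same name
    subprocess.run(["xrdcp", "-f", local_file, destination], check=True)

stage_out("step3_output.root")
```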

Dataset Distribution
~50 PB per year.
Disk space in 2017: 150 petabytes. Tape space in 2017: 246 petabytes.

Dataset Distribution Strategies
Managed pre-staging of datasets to clusters, based either on human intelligence or on “data popularity”.
Data transfer integrated with the processing workflow, with popularity determined dynamically from pending workloads in the WMS.
Remote file open & reads via the data federation.
Dynamic caching, just like for calibrations & software.

“Any Data, Any Time, Anywhere: Global Data Access for Science” (http://arxiv.org/abs/1508.01443) … making the case for WAN reads.

Optimize Data Structure for Partial Reads

Data Federation for WAN Reads
Applications connect to a local/regional redirector; requests are redirected upwards only if the file does not exist in the tree below, minimizing WAN read-access latency.
[Diagram: a global XRootd redirector above US and EU regional redirectors, which sit above local redirectors and data servers at many clusters in the US and EU.]
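As an illustration of how a client interacts with this hierarchy, the XRootD Python bindings can ask a redirector where a file is currently available; the redirection up the tree happens on the server side and is transparent to the client. The redirector host and file path below are hypothetical.

```python
# Ask a (hypothetical) regional redirector which data servers can serve a file.
from XRootD import client
from XRootD.client.flags import OpenFlags

fs = client.FileSystem("root://us-redirector.example.org:1094")
status, locations = fs.locate("/store/data/Run2016/some_dataset/file.root",
                              OpenFlags.REFRESH)
if status.ok:
    for loc in locations:
        print(loc.address)   # data servers (or lower redirectors) holding the file
else:
    print("locate failed:", status.message)
```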

XRootd Data Federation
Servers can be connected into an arbitrary tree structure, and an application can connect at any node in the tree.
The application read pattern is a vector of byte ranges, chosen by the application IO layer for optimized read performance (see the sketch below).
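A hedged sketch of what such a vectored read looks like from the client side, using the XRootD Python bindings; the federation URL, file path, and byte ranges are made up for illustration.

```python
# Open a file through the federation and issue one vectored read of several
# byte ranges, mirroring the access pattern described above.
from XRootD import client
from XRootD.client.flags import OpenFlags

f = client.File()
status, _ = f.open("root://global-redirector.example.org//store/data/file.root",
                   OpenFlags.READ)
assert status.ok, status.message

# Each chunk is an (offset, length) pair chosen by the application's IO layer.
chunks = [(0, 4096), (1_048_576, 65_536), (10_485_760, 131_072)]
status, vector_info = f.vector_read(chunks=chunks)
if status.ok:
    for chunk in vector_info.chunks:
        print(f"offset={chunk.offset} length={chunk.length}")
f.close()
```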

A Distributed XRootd Cache
A cache in front of the global data federation of CMS. Applications can connect at a local or at the top-level cache redirector, so the system can be tested as individual caches or as one joint cache.
Provisioned test systems:
UCSD: 9 systems, each with 12 SATA disks of 2 TB, at 10 Gbps per system.
Caltech: 30 SATA disks of 6 TB and 14 SSDs of 512 GB, at 2x40 Gbps per system.
[Diagram: a top-level cache redirector above the UCSD and Caltech redirectors and their cache servers.]
Production goal: a distributed cache that sustains 10k clients reading simultaneously from the cache at up to 1 MB/s per client without loss of operational robustness.

Caching Behavior
The application client requests a file open; the cache client requests the file open from the higher-level redirector if the file is not in the cache.
The application client then requests a vector of byte ranges to read; the cache provides the subset of bytes that exist in the cache and fetches the rest from remote.
If the number of simultaneous writes is below a configured threshold, the fetched data is written to the cache; otherwise the fetched data stays in RAM, flows through to the application, and gets discarded.
The cache client fills in the missing pieces of the file while the application processes the requested vector of bytes, as long as simultaneous writes stay below the configured threshold (a simplified sketch of this decision follows below).
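A much-simplified sketch of the pass-through-versus-persist decision described above. This is my paraphrase of the logic, not code from the actual XRootD caching proxy; the threshold value and the cache/remote helper objects are hypothetical.

```python
# Simplified sketch of the cache's decision for data fetched from remote:
# persist it to local disk only if the disk is not already too busy with writes.
WRITE_THRESHOLD = 8  # hypothetical limit on simultaneous cache writes

def serve_read(cache, remote, byte_ranges):
    """Serve a vectored read: cached bytes locally, missing bytes from remote."""
    cached_ranges, missing_ranges = cache.split_ranges(byte_ranges)  # hypothetical helper
    result = cache.read(cached_ranges)
    fetched = remote.read(missing_ranges)
    if cache.writes_in_flight() < WRITE_THRESHOLD:
        cache.write(missing_ranges, fetched)  # persist fetched bytes for future readers
    # else: fetched bytes stay in RAM, flow through to the application, get discarded
    result.update(fetched)  # both are {(offset, length): bytes} maps in this sketch
    return result
```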

Initial Performance Tests
[Plot: write and read rates as measured at the NIC, at the level of tens of Gbps (30 Gbps marked), with up to 5000 clients reading from 108 SATA disks across 9 servers.]
Focusing on just one of the servers and comparing disk IO, NIC write, and NIC read: by design the cache does not always involve the disk when the load gets high. NIC write/read >> disk IO, i.e. robust serving of clients is more important than cache hits.

Caching at University Clusters
There are ~80 US universities participating in the LHC, but only about a dozen have clusters dedicated to, and paid for by, LHC project funds. Ideally, many others would use their university shared clusters to do LHC science; data management is the big hurdle to doing so effectively.
Notre Dame was the first to adopt the XRootd caching technology as a solution to this problem: the ND CMS group uses the 25k+ core ND cluster for their science.

Summary & Conclusions
The LHC experiments have changed their data management strategies over time. Initially there was great distrust of global networks, and thus rather static strategies; multiple FTE-years were spent debugging global end-to-end transfer performance. Over time the experiments have become more and more agile, aggressively using caching & remote reads to minimize disk storage costs.