CMS computing: model, status and plans

Presentation transcript:

CMS computing: model, status and plans
C. Charlot / LLR, 2nd LCG-France Colloquium, March 2007
Speaker notes: CMS model (top-down); implementation with CSA06 (bottom-up); resources in France; include the CDF and D0 summary from HCP06; insist on the data model => current samples, CSA06 data streams, PhEDEx; say that we must transfer, to the CC and to GRIF; "French computing facilities" could be changed into an example of a T1 => CC-IN2P3 and an example of a T2 => GRIF.

The problem: data volume
RAW: detector data + L1 and HLT results after online formatting; includes factors for the initially poor understanding of the detector, compression, ...; 1.5 MB/evt @ 150 Hz → ~4.5 PB/year (two copies, one distributed).
RECO: reconstructed objects with their associated hits; 250 kB/evt → ~1.5 PB/year (including 3 reprocessing versions).
AOD: the main analysis format (clusters, tracks, particle ID); 50 kB/evt → ~2 PB/year; a whole copy at each T1 (e.g. CC-IN2P3).
TAG: high-level physics objects, run info (event directory); <10 kB/evt.
FEVT: bundling of RAW+RECO for distribution and storage.
Plus MC data, in an estimated 1:1 ratio with experiment data.
Notes: large detector-commissioning factor (x2.5); uncertainty of about a factor 2 on the raw data size.
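These per-year figures can be reproduced with back-of-the-envelope arithmetic. The sketch below is illustrative only: the 10^7 live seconds per year is taken from the resource-evolution slide further down, and the multiplicities (two RAW copies, a prompt pass plus three reprocessings, one AOD copy at each of the 7 Tier-1s) are assumptions chosen to reproduce the slide's totals.

# Rough reconstruction of the annual data volumes quoted above.
LIVE_SECONDS_PER_YEAR = 1e7   # nominal LHC live time (resource-evolution slide)
RATE_HZ = 150                 # HLT output rate

def petabytes_per_year(event_size_mb, copies=1, passes=1):
    """Annual volume in PB for a given event size (MB), copies and processing passes."""
    bytes_total = event_size_mb * 1e6 * RATE_HZ * LIVE_SECONDS_PER_YEAR
    return bytes_total * copies * passes / 1e15

print(f"RAW : {petabytes_per_year(1.5, copies=2):.1f} PB/year")            # ~4.5 PB
print(f"RECO: {petabytes_per_year(0.25, passes=4):.1f} PB/year")           # ~1.5 PB (prompt + 3 reproc.)
print(f"AOD : {petabytes_per_year(0.05, copies=7, passes=4):.1f} PB/year") # ~2 PB (copy at every T1)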

Data processing
We aim for prompt data reconstruction and analysis: backlogs are the real killer, so prioritisation will be important. At the beginning the computing system will not be at 100%; we must cope with backlogs without delaying critical data, and reserve the possibility of 'prompt calibration' using low-latency data.
Streaming: rule #1 of hadron-collider physics is that understanding your trigger and selection is everything. LHC analyses rarely mix inclusive triggers, so classifying events early allows prioritisation; the crudest example is an express line of 'hot' / calibration events. We propose o(50) 'primary datasets', immutable but allowed to overlap (10% overlap assumed). A sketch of the classification idea follows.
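As an illustration of the early-classification idea (this is not CMS code: the dataset names and trigger paths below are made up), each event is routed to every primary dataset whose trigger paths it fired, which is how immutable datasets can still overlap.

# Illustrative sketch: route each event into the primary datasets whose triggers it fired.
PRIMARY_DATASETS = {              # o(50) datasets in the real model; these are invented
    "Express":   {"HLT_Calib", "HLT_HotEvent"},
    "Muons":     {"HLT_Mu", "HLT_DoubleMu"},
    "Jets":      {"HLT_Jet", "HLT_DiJetAve"},
    "Electrons": {"HLT_Ele", "HLT_DoubleEle"},
}

def classify(fired_paths):
    """Return every primary dataset the event belongs to (possibly several)."""
    return [name for name, paths in PRIMARY_DATASETS.items()
            if paths & set(fired_paths)]

# An event firing both a muon and a jet path ends up in two datasets (overlap):
print(classify({"HLT_Mu", "HLT_Jet"}))   # ['Muons', 'Jets']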

Tier-0 Centre
Prompt reconstruction (24/200), FEVT storage, data distribution. Provided by the IT division.
CPU: 4.6 MSI2K, Disk: 0.4 PB, MSS: 4.9 PB, WAN: >5 Gbps.

Tier-1 Centres
Data storage, heavy processing (re-reco, skims, AOD extraction), raw data access, Tier-2 support.
7 Tier-1s: ASCC, CCIN2P3, FNAL, GridKa, CNAF, PIC, RAL.
Nominally, CPU: 2.5 MSI2K, Disk: 1.2 PB, MSS: 2.8 PB, WAN: >10 Gbps.

Tier-2 Centres
Analysis, MC production, specialised support tasks; local + common use.
Nominally, CPU: 0.9 MSI2K, Disk: 0.2 PB, no MSS, WAN: >1 Gbps.

CMS-CAF
Latency-critical services, analysis, Tier-1 functionality. CERN responsibility, open to all collaborators.
Roughly: Tier-1 MSS + 2 Tier-2s.

Resource evolution
Revised LHC planning: keep the integrated data volume about the same by increasing the trigger rate.

              2008*     2009*     2010
Rate (Hz)       300       200      150
sec/year     5x10^6  7.5x10^6    10^7

Resulting resource needs:

              2008      2009      2010
CPU (MSI2K)   35.3      54.4     104.2
Disk (PB)     11.4      19.2      30.8
Tape (PB)     18.2      32.9      41.6

We should be frightened by these numbers.
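A quick check, using the numbers from the first table above, that the revised planning indeed keeps the integrated number of events, and hence the data volume, roughly constant: the shorter live time in 2008-2009 is compensated by the higher trigger rate.

# rate (Hz) and live seconds per year, from the revised LHC planning table
planning = {2008: (300, 5.0e6), 2009: (200, 7.5e6), 2010: (150, 1.0e7)}

for year, (rate_hz, live_s) in planning.items():
    print(f"{year}: {rate_hz * live_s:.2e} events")   # 1.50e+09 in all three years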

Tier-1/Tier-2 Associations
The associated Tier-1 hosts the MC production of its Tier-2s and is their reference for AOD serving. The full AOD sample is at the Tier-1 (after T1→T1 transfers for re-reconstructed AODs). Stream "allocation" scales with the available disk storage at the centre.
In France: CCIN2P3-AF, GRIF.

Transfer Rates
OPN in: FEVT (T0→T1) + AOD (T1→T1). OPN out: AOD (T1→T1). T2 in: FEVTsim + AODsim (T2→T1). T2 out: FEVT + AOD (T1→T2). Rates in MB/s.
These are raw rates: no catch-up, no overhead. T1→T1: total AOD size over the replication period (currently 14 days). T1→T2: T2 capacity over the refresh period at the T2 (currently 30 days). Average rate; the worst-case peak for a T1 is the sum of the T2 transfer capacities, weighted by the data fraction at that T1.
Notes: note that the exchange of AODs is as costly as the transfer of data from CERN and as the transfer of data to the T2s, for the CC => a consequence of the number of streams (storage) vs the number of T2s.
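A minimal sketch of how such raw rates are obtained: the volume to move divided by the replication or refresh period, with no catch-up or overhead. The 300 TB AOD pass and the 50 TB Tier-2 refresh volume below are illustrative assumptions, not numbers from the slide.

def rate_mb_per_s(volume_tb, period_days):
    # raw rate = volume to move / period, no catch-up or overhead
    return volume_tb * 1e6 / (period_days * 86400.0)

aod_pass_tb = 300   # assumed size of one full AOD pass exchanged between Tier-1s
t2_disk_tb  = 50    # assumed data volume refreshed at one Tier-2

print(f"T1->T1: {rate_mb_per_s(aod_pass_tb, 14):.0f} MB/s")   # 14-day replication period
print(f"T1->T2: {rate_mb_per_s(t2_disk_tb, 30):.0f} MB/s")    # 30-day refresh period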

Tier-0 Status (CSA06)
Prompt reconstruction at 40 Hz; 50 Hz for 2 weeks, then 100 Hz; peak rate >300 Hz for >10 hours; 207M events in total. Uptime: 80% of the best 2 weeks; achieved 100% of 4 weeks.
Frontier was used for DB access to the prompt-reconstruction conditions. The CSA challenge was the first opportunity to test this on a large scale with the developed reconstruction software. Initial difficulties were encountered during commissioning, but patches and reduced logging allowed full inclusion into CSA.

Data Processing & Placement
Reminder: in the CMS model, each Tier-1 gets only a fraction of the total RAW+RECO. Tier-1 destinations were chosen to meet analysis interest while not exceeding site storage capacity or bandwidth from the Tier-0.
(Express stream placement: see M. Kasemann's slides, GDB 21/01/2007.)

Tier-0→Tier-1 Transfers
The goal was to sustain 150 MB/s to the T1s, twice the expected 40 Hz output rate. Last week's averages hit 350 MB/s (daily) and 650 MB/s (hourly), i.e. 2008 levels were exceeded for ~10 days (with some backlog observed).
(Monthly T1 transfer plot: signals start; target rate; min bias only at the start.)
Notes: a difficult start, then we stayed at the objective of 25 MB/s (because the planned additional servers were not operational). T0 rate: 54, 110, 170, 160 Hz.
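Where the 150 MB/s target roughly comes from (a sketch, assuming FEVT is simply the RAW + RECO event sizes from the data-volume slide): the expected 40 Hz output corresponds to about 70 MB/s, and the target was set at roughly twice that.

fevt_mb = 1.5 + 0.25   # assumed FEVT = RAW + RECO per event, MB (data-volume slide)
rate_hz = 40           # expected CSA06 prompt-reconstruction output rate

expected_mb_s = fevt_mb * rate_hz                      # ~70 MB/s
print(f"expected: {expected_mb_s:.0f} MB/s, target: ~{2 * expected_mb_s:.0f} MB/s")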

Tier-1 Transfer Performance goals
6 of the 7 Tier-1s exceeded 90% availability for 30 days. The U.S. Tier-1 (FNAL) hit 2x its goal. 5 sites stored data to MSS (tape).

Tier-1 Skim Jobs
We tested the workflow for reducing primary datasets to manageable sizes for analyses. Computing provided a centralized skim-job workflow at the T1s, with 4 production teams. Secondary datasets are registered into the Dataset Bookkeeping Service and accessed like any other data. Common skim-job tools were prepared based on MC truth or on reconstruction (both types tested).
Overwhelming response from the CSA analysis demos: about 25 filters producing ~37 (+ 21 jet) datasets, with a variety of output formats (FEVT, RECO, AOD, custom). Selected events range from <1% to 100% (for the Jets split); sizes range from <0.001 TB to 2.5 TB (see the sketch below).
Notes: we had a hard time getting the skims => it pays to be directly involved in the production.
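The spread of skim output sizes follows directly from the selection fraction and the output format: output size = parent size x fraction of events kept x output/input format size ratio. The numbers below (a 2.5 TB parent dataset, a 0.2 AOD/FEVT size ratio) are illustrative assumptions, not values from the slide.

def skim_size_tb(parent_tb, selection_fraction, format_ratio=1.0):
    # output size = parent size x fraction of events kept x output/input size ratio
    return parent_tb * selection_fraction * format_ratio

print(f"{skim_size_tb(2.5, 0.001, 0.2):.4f} TB")  # tight filter with an AOD-like output
print(f"{skim_size_tb(2.5, 1.0):.1f} TB")         # 100% selection keeping the full FEVT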

Jobs Execution on the Grid
>50K jobs/day were submitted on all but one day of the final week, including >30K robot jobs/day, with 90% job-completion efficiency. Robot jobs have the same mechanics as user job submissions via CRAB; 2 submission teams were set up. Mostly T2 centres, as expected; OSG carries a large proportion. Scaling issues were encountered but subsequently solved.

CMS Tier-2: data transfers in CSA06
GRIF among 24 Tier-2 sites.
Example dataset transferred: /CSA06-106-os-Jets0-0/RECO/CMSSW_1_0_6-RECO (~3 TB), used for the fake rate of e- from jets.

T1 Re-Reconstruction
Demonstrated re-reconstruction at T1 centres with access to the offline DB using new constants. 4 teams were set up to run 100K events at each T1; re-reconstruction was demonstrated on >100K events at 6 T1s, with 100% efficiency at CCIN2P3 (although on a small sample).
We initially ran into a problem with a couple of reconstruction modules, and had to drop pixel tracks and vertices out of ~100 modules due to a technical issue with getting the products stored in the Event.
For the Tracker and ECAL calibration exercises, new constants inserted into the DB were used for re-reconstruction, and the dataset was published and accessed: a full reprocessing workflow!

2007 MC production
1_2_0 validation production completed; 03/07: production for HLT (1_3_0); 04-05/07: production for physics (1_4_0). Stage-out problems were encountered.

CMS Computing timeline 2007
Computing support for the preparation of the 2008 papers: large-scale MC production, March → May 2007; analysis → autumn 2007; core software final procedures and algorithms → autumn 2007.
Computing, Software and Analysis challenge, CSA07: the computing model at ~50% scale; data production and distribution at the Tier-1s; skimming and re-reco at the Tier-1s, distribution to the Tier-2s; analysis at the Tier-2s together with MC production → July 2007.
Data taking: end of 2007.

Conclusions
CMS is getting ready for data taking. At the Tier-1 level, CC-IN2P3 will refocus on its primary missions: CSA07 is the number-one objective of the first half of the year, together with participation in the MC production.
The emphasis now shifts to the Tier-2s: ramp-up of GRIF (MC production, local analysis) and of the Tier-2 at CC-IN2P3. Significant needs for analysis in Q2-Q3 2007.