CMS computing: model, status and plans
C. Charlot / LLR, 2nd LCG-France Colloquium, March 2007
1 CMS computing: model, status and plans
C. Charlot / LLR
Speaker notes (translated from French): CMS model (top-down); implementation with CSA06 (bottom-up); resources in France. Include the CDF and D0 summary from HCP06. Stress the data model => current samples, CSA06 data streams, PhEDEx. Say that we must transfer!! To the CC and to GRIF. "French Computing facilities" could be changed to: an example of a T1 => CCIN2P3, and an example of a T2 => GRIF.

2 The problem: data volume
RAW: detector data + L1 and HLT results after online formatting. Includes factors for poor understanding of the detector, compression, etc. 150 Hz → ~4.5 PB/year (two copies, one distributed).
RECO: reconstructed objects with their associated hits. 250 kB/evt → ~1.5 PB/year (including 3 reprocessing versions).
AOD: the main analysis format: clusters, tracks, particle ID. 50 kB/evt → ~2 PB/year; whole copy at each T1 (e.g. CC-IN2P3).
TAG: high-level physics objects, run info (event directory), <10 kB/evt.
FEVT: bundling of RAW+RECO for distribution and storage.
Plus MC data in an estimated 1:1 ratio with experiment data.
Speaker notes (translated): large detector-commissioning factor (x2.5); uncertainty of ~a factor 2 on the raw data size.
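As a rough cross-check of these volumes, a back-of-the-envelope sketch in Python, assuming ~1e7 live seconds per year and a RAW event size of ~1.5 MB/evt inferred from the 4.5 PB/year figure (neither number is stated on the slide):

```python
# Back-of-the-envelope check of the yearly volumes quoted above.
RATE_HZ = 150        # trigger rate from the slide
LIVE_SECONDS = 1e7   # assumed LHC live time per year (not on the slide)
EVENTS = RATE_HZ * LIVE_SECONDS  # ~1.5e9 events/year

def pb_per_year(event_size_bytes, copies=1, versions=1):
    """Yearly volume in PB for one data tier."""
    return EVENTS * event_size_bytes * copies * versions / 1e15

print(f"RAW : {pb_per_year(1.5e6, copies=2):.1f} PB/yr")    # ~4.5, matches the slide
print(f"RECO: {pb_per_year(250e3, versions=3):.1f} PB/yr")  # ~1.1; slide quotes ~1.5
print(f"AOD : {pb_per_year(50e3):.3f} PB/yr per copy")      # ~0.075 before T1/MC copies
```

The RECO and AOD lines come out below the slide's quotes, which presumably also fold in MC data and replication across sites.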

3 Data processing We aim for prompt data reconstruction and analysis
Backlogs are the real killer; prioritisation will be important. At the beginning the computing system will not be 100%: cope with backlogs without delaying critical data, and reserve the possibility of 'prompt calibration' using low-latency data.
Streaming. Rule #1 of hadron collider physics: understanding your trigger and selection is everything. LHC analyses rarely mix inclusive triggers, so classifying events early allows prioritisation. The crudest example: an express line of 'hot' / calibration events. Propose O(50) 'primary datasets', immutable but allowed to overlap (10% assumed); see the sketch after this slide.
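To illustrate the streaming idea, a minimal sketch; the trigger paths and the trigger-to-dataset map are hypothetical stand-ins, not real CMS stream definitions:

```python
# Minimal sketch of streaming events into immutable primary datasets.
# Trigger names and the dataset map below are hypothetical.

TRIGGER_TO_DATASETS = {
    "HLT_Mu": ["Muons"],
    "HLT_Ele": ["Electrons"],
    "HLT_Jet": ["Jets"],
    "HLT_MuEle": ["Muons", "Electrons"],  # one event, two datasets
}

def classify(fired_triggers):
    """Return the set of primary datasets an event belongs to."""
    datasets = set()
    for trig in fired_triggers:
        datasets.update(TRIGGER_TO_DATASETS.get(trig, []))
    return datasets or {"Other"}

# An event firing a cross-trigger lands in two datasets; this kind of
# duplication is where the assumed ~10% storage overlap comes from.
print(classify(["HLT_MuEle"]))  # {'Muons', 'Electrons'}
```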

4 Tier-0 Centre Prompt reco (24/200), FEVT storage, data distribution
Provided by the IT division. CPU: 4.6 MSI2K; Disk: 0.4 PB; MSS: 4.9 PB; WAN: >5 Gbps.

5 Tier-1 Centres Data storage, heavy processing (Re-Reco, skim, AOD extraction), raw data access, Tier-2 support 7 Tier-1: ASCC, CCIN2P3, FNAL, GridKa, CNAF, PIC, RAL Nominally, CPU: 2.5MSI2K, Disk: 1.2PB, MSS: 2.8PB, WAN: >10Gbps C. Charlot, 2nd LCG-France Colloquium, mars 2007

6 Tier-2 Centres Analysis, MC production, specialised support tasks
Local + common use. Nominally: CPU: 0.9 MSI2K; Disk: 0.2 PB; no MSS; WAN: >1 Gbps.

7 CMS-CAF Latency critical services, analysis, Tier-1 functionality
CERN responsibility, open to all collaborators. Roughly: a Tier-1 MSS + 2 Tier-2s.

8 Resource evolution
Revised LHC planning: keep the integrated data volume roughly the same by increasing the trigger rate.

              2008*    2009*    2010
Rate (Hz)      300      200      150
sec/year      5x10^6    10^7     10^7

              2008     2009     2010
CPU (MSI2K)   35.3     54.4     104.2
Disk (PB)     11.4     19.2     30.8
Tape (PB)     18.2     32.9     41.6

We should be frightened by these numbers.
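The "integrated volume stays roughly constant" claim can be checked in one loop, using the (rate, live-time) pairs as reconstructed in the table above:

```python
# Events per year under the revised planning: lowering the trigger rate
# compensates the growing live time, keeping the totals within ~30%.
plan = {2008: (300, 5e6), 2009: (200, 1e7), 2010: (150, 1e7)}  # (Hz, sec/year)

for year, (rate_hz, live_s) in plan.items():
    print(f"{year}: {rate_hz * live_s:.1e} events")  # 1.5e9, 2.0e9, 1.5e9
```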

9 Tier-1/Tier-2 Associations
Associated Tier-1: hosting MC production + reference for AOD serving.
Full AOD sample at the Tier-1 (after T1→T1 transfers for re-recoed AODs).
Stream "allocation" ~ available disk storage at the centre.
CCIN2P3-AF, GRIF.

10 Transfer Rates OPN in: FEVT (T0→T1) + AOD (T1→T1); OPN out: AOD (T1→T1)
T2 in: FEVTsim + AODsim (T2→T1); T2 out: FEVT + AOD (T1→T2). These are raw rates (MB/s): no catch-up, no overhead.
T1→T1: total AOD size / replication period (currently 14 days).
T1→T2: T2 capacity / refresh period at the T2 (currently 30 days).
These are average rates; the worst-case peak for a T1 is the sum of its T2s' transfer capacities, weighted by the data fraction at that T1. A sketch of this bookkeeping follows below.
Speaker notes (translated): note that the AOD exchange is as costly as transferring data from CERN and as transferring data to the T2s, for the CC => a consequence of the number of streams (storage) vs the number of T2s.
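As a sketch of this raw-rate bookkeeping: the 14-day and 30-day periods are the slide's, while the AOD volume and T2 disk capacity below are illustrative assumptions only:

```python
# Average rate needed to move a given volume within a refresh period.
DAY = 86400  # seconds

def avg_rate_mb_s(volume_tb, period_days):
    """Average transfer rate in MB/s (1 TB = 1e6 MB here)."""
    return volume_tb * 1e6 / (period_days * DAY)

aod_total_tb = 150  # assumed full AOD sample hosted at a T1
t2_disk_tb = 200    # assumed disk capacity of one associated T2

print(f"T1->T1 AOD replication: {avg_rate_mb_s(aod_total_tb, 14):.0f} MB/s")  # ~124
print(f"T1->T2 refresh:         {avg_rate_mb_s(t2_disk_tb, 30):.0f} MB/s")    # ~77
```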

11 Tier-0 Status (CSA06) Prompt Reconstruction at 40 Hz
50 Hz for 2 weeks, then 100 Hz. Peak rate: >300 Hz for >10 hours. 207M events in total. Uptime: 80% over the best 2 weeks; achieved 100% over 4 weeks.
Frontier was used for DB access to the prompt-reconstruction conditions. The CSA challenge was the first opportunity to test this at large scale with developed reconstruction software. Initial difficulties were encountered during commissioning, but patches and reduced logging allowed full inclusion into CSA.

12 Data Processing & Placement
Reminder: in the CMS model, each Tier-1 gets only a fraction of the total RAW+RECO. Tier-1 destinations were chosen to meet analysis interest while not exceeding site storage capacity or bandwidth from the Tier-0.
[Figure: placement of the primary datasets, including the Express stream; from slides by Kasman, GDB 21/01/2007.]

13 Tier-0→Tier-1 Transfers
The goal was to sustain 150 MB/s to the T1s, i.e. twice the expected 40 Hz output rate. Last week's averages hit 350 MB/s (daily) and 650 MB/s (hourly), i.e. 2008 levels were exceeded for ~10 days (with some backlog observed).
[Plots: monthly T1 transfer rates, marking the signals start, the target rate and the min-bias start; T0 rate in Hz.]
Speaker notes (translated): a difficult start, then we held at the target of 25 MB/s (because the planned additional servers were not operational).
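The 150 MB/s target is internally consistent with the 40 Hz statement; a two-line check shows the implied event size is close to FEVT = RAW + RECO:

```python
# 150 MB/s is stated to be twice the expected 40 Hz output rate,
# which pins down the implied event size.
TARGET_MB_S = 150
RATE_HZ = 40

print(f"Implied event size: {TARGET_MB_S / (2 * RATE_HZ):.2f} MB/evt")  # ~1.88
```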

14 Tier-1 Transfer Performance
6 of 7 Tier-1s exceeded the 90% availability goal over 30 days. The U.S. Tier-1 (FNAL) hit 2x its goal. 5 sites stored data to MSS (tape).

15 Tier-1 Skim Jobs Tested the workflow reducing primary datasets to manageable sizes for analyses
Computing provided a centralized skim-job workflow at the T1s, with 4 production teams. Secondary datasets are registered in the Dataset Bookkeeping Service and accessed like any other data. Common skim-job tools were prepared, based on "MC truth" or reconstruction (both types tested).
Overwhelming response from the CSA analysis demos: about 25 filters producing ~37 (+ 21 jet) datasets! Output formats vary (FEVT, RECO, AOD, custom); selected-event fractions range from <1% to 100% (for the Jets split); sizes range from <0.001 TB to 2.5 TB. A minimal sketch of a skim follows below.
Speaker notes (translated): we had trouble getting the skims => hence the value of being a stakeholder in the production.
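A minimal, framework-free sketch of what a skim does; the line-per-event JSON format, the filter predicate and the bookkeeping step are hypothetical stand-ins (real CMS skims run inside CMSSW and register their output in DBS):

```python
# Run a filter over a primary dataset and write the selected events out
# as a secondary dataset, reporting the selected fraction.
import json

def passes_filter(event):
    """Hypothetical selection: at least two jets with pT > 30 GeV."""
    return sum(1 for pt in event.get("jet_pts", []) if pt > 30.0) >= 2

def skim(input_path, output_path):
    kept = total = 0
    with open(input_path) as src, open(output_path, "w") as dst:
        for line in src:             # assumed: one JSON event record per line
            event = json.loads(line)
            total += 1
            if passes_filter(event):
                dst.write(line)
                kept += 1
    print(f"kept {kept}/{total} events ({100 * kept / max(total, 1):.1f}%)")
    # A real workflow would now register output_path as a secondary dataset.
```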

16 Job Execution on the Grid
>50K jobs/day were submitted on all but one day of the final week, >30K/day of them robot jobs, with 90% job-completion efficiency. Robot jobs have the same mechanics as user job submissions via CRAB; 2 submission teams were set up. Mostly T2 centres, as expected; OSG carries a large proportion. Scaling issues were encountered, but subsequently solved.

17 CMS Tier-2: data transfers
CSA06: GRIF was one of 24 Tier-2 sites receiving data. Example dataset: /CSA os-Jets0-0/RECO/CMSSW_1_0_6-RECO (fake-rate e- from jets, ~3 TB).

18 T1 Re-Reconstruction Demonstrated re-reconstruction at T1 centres with access to the offline DB using new constants
4 teams were set up to run 100K events at each T1; re-reconstruction was demonstrated on >100K events at 6 T1s, with 100% efficiency at CCIN2P3 (although on a small sample). A problem with a couple of reconstruction modules was hit when first attempted: pixel tracks and vertices had to be dropped out of ~100 modules, due to a technical issue with getting the products stored in the Event.
For the Tracker and ECAL calibration exercises, new constants inserted into the DB were used for re-reconstruction, and the datasets were published and accessed. A full reprocessing workflow!

19 2007 MC production 1_2_0 validation production completed
03/07: production for HLT (1_3_0). 04-05/07: production for physics (1_4_0). Stage-out problems were encountered.

20 CMS Computing timeline 2007
Computing support for the preparation of the 2008 papers:
Large-scale MC production: March → May 2007
Analysis → autumn 2007
Core software, final procedures and algorithms → autumn 2007
Computing, Software and Analysis challenge, CSA07: the computing model at ~50% scale; data production and distribution at the Tier-1s; skimming and re-reco at the Tier-1s, distribution to the Tier-2s; analysis at the Tier-2s together with MC production → July 2007
Data taking: end of 2007

21 Conclusions CMS is preparing for data taking
Activities at the Tier-1 CC-IN2P3 will refocus on its primary missions: CSA07 is the number-one objective of the first half of the year, along with participation in MC production.
The emphasis now shifts to the Tier-2s: ramp-up of GRIF (MC production, local analysis) and the Tier-2 at CC-IN2P3. Significant needs for analysis in Q2-Q3 2007.

