Slide 1: Computing tools and analysis architectures: the CMS computing strategy
M. Paganoni, HCP2007, La Biodola, 23/5/2007

Slide 2: Outline
- CMS Computing and Analysis Model
- CMS workflow components
- 25% capacity test (the CSA06 challenge)
- CMSSW validation
- LoadTest07, Site Availability Monitor and the gLite 3.1 Grid
- The goals for 2007:
  - Physics validation with high statistics
  - Full detector readout during commissioning
  - 50% capacity test (the CSA07 challenge)
  - Analysis workflow

Slide 3: CMS schedule
[Timeline, March to November 2007]
1) Detector installation, commissioning & operation: First Global Readout Test; barrel ECAL inserted; tracker inserted; Trigger/DAQ ready for system commissioning; CMS ready to close.
2) Preparation of software, computing and physics analysis: HLT exercise complete; pre-CSA07; Computing, Software and Analysis challenge 2007 (CSA07); 2007 physics analyses completed; all CMS systems ready for global data taking.

Slide 4: The present status of CMS computing
- From development (service/data challenges, both WLCG-wide and experiment-specific, of increasing scale and complexity) to operations: data distribution, MC production, physics analysis.
- Primary needs:
  - Smoothly running Tier-1s and Tier-2s, concurrent with the other experiments
  - Streamlined and automatic operations to ease the operational load
  - Full monitoring, for early detection of Grid and site problems and to reach stability
  - Sustainable operations in terms of data management, workload management, user support, site configuration and availability, under continuous significant load

Slide 5: The CMS Computing Model
- Tier-0: accepts data from the DAQ; prompt reconstruction; data archiving and distribution to the Tier-1s.
- Tier-1s: data and MC archiving; re-processing; skimming and other data-intensive analysis tasks; data serving to the Tier-2s.
- Tier-2s (~30 of them): user data analysis; MC production; calibration/alignment and detector studies.

Slide 6: CMS data formats and data flow
- RAW: ~1.5 MB/ev; 2 copies (1 at the T0, 1 spread over the T1s); 4.5 PB/yr
- RECO: ~250 kB/ev; 1 copy spread over the T1s; 2.1 PB/yr
- AOD: ~50 kB/ev; 1 copy at each T1, data serving to the T2s; 2.6 PB/yr
- TAG: ~1-10 kB/ev
- MC produced in a 1:1 ratio with data
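As a quick cross-check of the volume figures above, the short Python sketch below recomputes the RAW number from the per-event size and copy count. The yearly event count (~1.5e9) is an assumption chosen for illustration; the RECO/AOD totals are not reproduced here because they also fold in re-processing passes and MC.

# Quick cross-check of the RAW volume quoted on this slide.
# The event count (~1.5e9 events/yr) is an assumed, illustrative number chosen
# to match the 4.5 PB/yr figure; the RECO/AOD totals also include re-processing
# passes and MC, so they cannot be reproduced by a single-pass estimate.
events_per_year = 1.5e9          # assumption
raw_size_bytes  = 1.5e6          # ~1.5 MB/ev (from the slide)
copies          = 2              # 1 at the T0 + 1 spread over the T1s

raw_pb_per_year = events_per_year * raw_size_bytes * copies / 1e15
print("RAW: %.1f PB/yr" % raw_pb_per_year)   # -> RAW: 4.5 PB/yr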

Slide 7: The MC production
- Production of 200M events (50M/month) for the HLT exercise and the Physics Notes, started at the T2s with the new MC Production System: less manpower-consuming, better handling of Grid-site unreliability, better use of resources, automatic retries, better error reporting and handling.
- More flexible and automated architecture:
  - ProdManager (PM), plus the policy piece: manages the assignment of requests to one or more ProdAgents and tracks the global completion of the task.
  - ProdAgent (PA): job creation, submission and tracking; management of merges, failures and resubmissions.
[Diagram: policy/scheduling controller and ProdManagers (official and development MC production) driving ProdAgents at the Tier-1/2 sites]
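The division of labour between ProdManager and ProdAgent can be pictured with the minimal Python sketch below; all class names, methods and the retry policy are hypothetical illustrations of the idea, not the actual ProdAgent code.

# Minimal sketch of the ProdManager/ProdAgent split described on this slide.
# Class names, methods and the retry policy are hypothetical, not the real code.
import random

class ProdManager:
    """Assigns slices of a production request to one or more agents."""
    def __init__(self, total_events, events_per_job):
        self.pending = [events_per_job] * (total_events // events_per_job)

    def next_assignment(self, n_jobs):
        chunk, self.pending = self.pending[:n_jobs], self.pending[n_jobs:]
        return chunk

class ProdAgent:
    """Creates, submits and tracks jobs; resubmits failures automatically."""
    MAX_RETRIES = 3

    def run(self, assignment):
        done = 0
        for events in assignment:
            for attempt in range(self.MAX_RETRIES):
                if self.submit_job(events):   # grid submission stub
                    done += events
                    break                     # success: stop retrying
        return done

    def submit_job(self, events):
        return random.random() > 0.1          # pretend 90% of submissions succeed

pm, pa = ProdManager(total_events=10000, events_per_job=500), ProdAgent()
while (chunk := pm.next_assignment(5)):
    print("events produced in this round:", pa.run(chunk))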

Slide 8: CMS Remote Analysis Builder (CRAB)
- CRAB is a user-oriented tool for Grid submission and handling of physics analysis jobs:
  - data discovery (DBS/DLS)
  - interactions with the Grid (including error handling and resubmission)
  - output retrieval
- Routinely used since 2004 on both EGEE and OSG (MTCC, PTDR, CSA06, tracker commissioning, …).
- New client-server architecture to improve scalability and increase automation.
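The steps listed above (data discovery, job splitting, Grid submission with automatic resubmission, output retrieval) can be summarised with the schematic Python sketch below; every function and name is a placeholder, not part of the real CRAB API.

# Schematic view of what a CRAB-style analysis submission does, as listed on
# this slide. All functions and the dataset name are placeholders.

def discover_blocks(dataset):
    """Stand-in for a DBS/DLS query returning file blocks of the dataset."""
    return [{"block": "%s#block%d" % (dataset, i), "events": 50000} for i in range(4)]

def split_into_jobs(blocks, events_per_job):
    jobs = []
    for b in blocks:
        for _ in range(b["events"] // events_per_job):
            jobs.append({"block": b["block"], "status": "created"})
    return jobs

def grid_submit(job):
    return "done"                              # pretend the submission succeeds

def retrieve_output(job):
    pass                                       # would copy the output sandbox back

def submit_and_track(jobs, max_retries=2):
    for job in jobs:
        for _ in range(max_retries + 1):       # automatic resubmission on failure
            job["status"] = grid_submit(job)
            if job["status"] == "done":
                retrieve_output(job)
                break

jobs = split_into_jobs(discover_blocks("/MyDataset/CSA06/RECO"), events_per_job=10000)
submit_and_track(jobs)
print(len(jobs), "jobs processed")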

Slide 9: The data placement system (PhEDEx)
- Data placement system for CMS, in production for more than 3 years:
  - large-scale, reliable dataset/file-block replication
  - multi-hop routing following a transfer topology (T0 → T1s → T2s), data pre-staging from tape, data archiving to tape, monitoring, bookkeeping, priorities and policy, fail-over tactics
- PhEDEx is made of a set of independent agents, integrated with the gLite File Transfer Service (FTS); it works with both EGEE and OSG.
- Automatic subscription to DBS/DLS.
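The multi-hop routing over the T0 → T1 → T2 topology can be illustrated with the small Python sketch below; the site names and the breadth-first route finder are purely illustrative and do not correspond to PhEDEx's actual routing implementation.

# Illustration of multi-hop routing over a T0 -> T1 -> T2 topology, one of the
# PhEDEx features listed above. The topology and the BFS route finder are
# illustrative only.
from collections import deque

TOPOLOGY = {                          # hypothetical transfer links
    "T0_CERN":      ["T1_FNAL", "T1_CNAF"],
    "T1_FNAL":      ["T2_Wisconsin", "T1_CNAF"],
    "T1_CNAF":      ["T2_Pisa"],
    "T2_Wisconsin": [],
    "T2_Pisa":      [],
}

def route(src, dst):
    """Shortest chain of hops from src to dst, or None if unreachable."""
    queue, seen = deque([[src]]), {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in TOPOLOGY[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(route("T0_CERN", "T2_Pisa"))    # ['T0_CERN', 'T1_CNAF', 'T2_Pisa']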

Slide 10: Data processing workflow
[Diagram]

Slide 11: Computing, Software and Analysis challenge 2006 (CSA06)
- The first test of the complete CMS workflow and dataflow: a 60M-event exercise to ramp up to 25% of the 2008 capacity.
- T0: prompt reconstruction; 207M events reconstructed (RECO, AOD), applying alignment/calibration constants from the offline DB; 0.5 PB transferred to 7 T1s.
- T1s: skimming (to get manageable datasets) and re-reconstruction; automatic data serving to the T2s via injection into PhEDEx and registration in DBS/DLS.
- T2s: access to the skimmed data; alignment/calibration jobs; physics analysis jobs; submission of analysis jobs to the Grid with CRAB by single users and groups; insertion of new constants into the offline DB.

Slide 12: CSA06: T0 and T0 → T1
- Prompt T0 reconstruction: peak rate >300 Hz for >10 hours; uptime 100% over 4 weeks; best efficiency 96% (1400 CPUs) for ~12 h.
- T0 → T1 transfers: average rate 250 MB/s; peak rate 650 MB/s.
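A quick consistency check, assuming the four-week duration quoted above: the average T0 → T1 rate accounts for the ~0.5 PB moved to the T1s mentioned on the previous slide (the difference reflects a duty factor below 100%).

# Consistency check of the CSA06 transfer numbers.
avg_rate_mb_s = 250
seconds = 4 * 7 * 24 * 3600
print(avg_rate_mb_s * seconds / 1e9, "PB if sustained continuously")  # ~0.6 PB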

Slide 13: CSA06: job submission
- >50K jobs/day in the final week: ~30K/day robot jobs, plus production jobs managed by the ProdAgent and analysis jobs submitted via CRAB to the Grid.
- 90% job efficiency.
[Plots: a typical CSA06 day; CRAB submissions]

Slide 14: CSA06: calibration
- ECAL calibration using the φ-symmetry of energy deposits in minimum-bias events: needs only a few hours of data.
- Calibration workflow exercised with minimum-bias events, single electrons, W → eν and Z mass reconstruction.
[Diagram: calibration workflow]
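The φ-symmetry idea can be illustrated with the toy Python sketch below: for crystals in the same η ring, the transverse energy summed over many minimum-bias events should be independent of φ, so the ratio of the ring average to each crystal's sum estimates its intercalibration constant. All numbers and array shapes are invented for the example.

# Toy illustration of phi-symmetry intercalibration; not CMS calibration code.
import numpy as np

rng = np.random.default_rng(0)
n_crystals = 360                                    # one eta ring, hypothetical
true_miscalibration = rng.normal(1.0, 0.05, n_crystals)

# Transverse energy accumulated per crystal over many minimum-bias events,
# distorted by the (unknown) miscalibration of each channel.
accumulated_et = 1000.0 * true_miscalibration * rng.normal(1.0, 0.01, n_crystals)

constants = accumulated_et.mean() / accumulated_et  # phi-symmetry estimate
print("residual spread after calibration: %.3f" %
      np.std(constants * true_miscalibration))      # ~0.01, i.e. 5% -> 1%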

Slide 15: CSA06: alignment
- Determine the new alignment: run the HIP algorithm on multiple CPUs over a dedicated alignment skim from the T0 (1M events in ~4 h on 20 CPUs); write the new alignment into the offline DB at the T0; distribute the offline DB to the T1/T2s for re-reconstruction.
- Closing the loop: analysis of the re-reconstructed Z → μμ data at a T1/T2 site.
[Plots: TIB module positions; reconstructed Z mass]

Slide 16: CMSSW validation: tracking
- Reproduce with the CMSSW framework (1.2M lines of simulation, reconstruction and analysis software) the detector performance reported in PTDR vol. 1.
[Plots: muons (CMSSW); CMSSW pixel seeding]

Slide 17: CMSSW validation: electrons
- Already improving on the PTDR results in many areas (forward tracking, electron reconstruction, …).
[Plots: electron classification; momentum at vertex; electron/supercluster matching]

Slide 18: Site Availability Monitor
- Measure site availability by testing analysis submission, production, database caching and data transfer, with the Site Availability Monitor (SAM) infrastructure developed in collaboration with LCG and CERN/IT.
- The goal is 90% availability for the T1s and 80% for the T2s.
- Tests run at each EGEE site every 2 hours; currently 5 CMS-specific tests, with more under development.
- Feedback to site administrators, targeting individual components.
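A SAM-style availability number is essentially the fraction of test rounds in which all CMS-specific tests passed at a site, compared with the targets above; the Python sketch below shows the bookkeeping, with an invented test log and hypothetical site names.

# Sketch of deriving a site-availability figure from periodic test results and
# comparing it to the 90% (T1) / 80% (T2) targets. The test log is invented.
TARGETS = {"T1": 0.90, "T2": 0.80}

# One entry per 2-hour test round: True = all CMS tests passed at that site.
test_log = {
    "T1_FNAL": [True] * 11 + [False],
    "T2_Pisa": [True] * 9 + [False] * 3,
}

for site, results in test_log.items():
    availability = sum(results) / len(results)
    tier = site.split("_")[0]
    status = "OK" if availability >= TARGETS[tier] else "below target"
    print("%s: %.0f%% (%s)" % (site, 100 * availability, status))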

Slide 19: WMS acceptance tests on gLite
- Jobs submitted over 7 days on a single WMS instance: ~16,000 jobs/day, well exceeding the acceptance criteria.
- ~0.3% of jobs with problems, well below the required threshold; recoverable by the user with a proper command.
- The WMS dispatched jobs to the computing elements with no noticeable delay.

Slide 20: CMS LoadTest 2007
- An infrastructure by CMS to help the Tiers exercise transfers, based on a new traffic load generator and coordinated within the CMS Facilities/Infrastructure project.
- Exercises T0 → T1 (tape), T1 → T1, T1 → T2 ('regional') and T1 → T2 ('non-regional') transfers.
- Important achievements: routine transfers (all Tiers report it is useful); higher participation of the Tiers; less effort and improved stability; automatic, streamlined operations; ~2.5 PB transferred in 1.5 months.
[Plot: T0-T1 transfer volume, CMS LoadTest cycles vs. CSA06]
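For scale, the short calculation below converts the quoted volume into an average rate summed over all participating links, assuming the 1.5-month period corresponds to ~45 days.

# Average rate implied by the LoadTest07 figure quoted above (~2.5 PB in ~1.5 months).
volume_pb = 2.5
seconds = 1.5 * 30 * 24 * 3600
print("%.0f MB/s averaged over the period" % (volume_pb * 1e9 / seconds))  # ~640 MB/s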

Slide 21: Goals of Computing in 2007
- Support of global data taking during detector commissioning: commissioning of the end-to-end chain P5 → T0 → T1s (tape); data transfers and access through the complete data management system; 3-4 days every month, starting in May.
- Demonstrate physics analysis performance using the final software with high statistics: a major MC production of up to 200M events started in March; analysis starts in June and finishes by September.
- Ramp up of the distributed computing at scale (CSA07): a challenge at 50% of the 2008 system scale; adding new functionality (HLT farm: DAQ storage manager → T0; T1 → T1 and non-regional T1 → T2 transfers); increased user load for physics analysis.

Slide 22: CSA07 workflow
[Diagram]

Slide 23: CSA07 success metrics
[Table]

Slide 24: CSA07 and Physics Analysis
- We have roughly … T2s with sufficient storage and CPU resources to support multiple datasets; the skims in CSA06 were about ~500 GB, and the largest of the raw samples was ~8 TB.
- Improvements in site availability with SAM; improve non-regional Tier-1 → Tier-2 transfers; publish data hosting proposals for the Tier-1 and Tier-2 sites.
- User analysis: distributed analysis through CRAB at the Tier-2 centres; dynamic use of the Tier-2 storage; calibration workflow activities.

Slide 25: Ingredients for analysis workflows
- Event filters: pre-select the analysis output.
- Event producers: can create new content to be included in the analysis output.
- EDM output configurability: any collection can be kept or dropped (see the configuration sketch below).
- Flexibility in the event content and in the different steps of data reduction; the ingredients can be mixed in any combination within an analysis job.
[Diagram: input → analysis job → output]
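The keep/drop configurability of the EDM output can be illustrated with a CMSSW-style output module configuration. The snippet below uses the later Python configuration syntax (the 2007-era .cfg syntax differs), and the module labels and file name are hypothetical.

# Illustration of EDM output configurability (keep/drop of collections) in the
# CMSSW Python configuration style. Labels and file name are hypothetical.
import FWCore.ParameterSet.Config as cms

analysisOutput = cms.OutputModule("PoolOutputModule",
    fileName = cms.untracked.string("analysisSkim.root"),
    outputCommands = cms.untracked.vstring(
        "drop *",                           # start from an empty event content
        "keep *_muons_*_*",                 # keep a selected reconstructed collection
        "keep *_myEventProducer_*_*"        # plus user-produced content
    )
)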

Slide 26: Analysis workflow at Tier-0/CAF
- HLT output: one in-time processed stream, or the HLT primary streams, containing RECO, AOD and (optionally) RAW.
- Express stream for early discovery; physics data quality monitoring; Standard Model 'candles'; object-ID efficiency; calibration with control samples.
- Dedicated stream(s) for fast calibration (initial fast calibration).
- The actual output of the HLT farm is still to be detailed…

Slide 27: Conclusions
- Commissioning and integration remain major tasks in 2007; balancing the needs of physics, computing and the detector will be a logistics challenge.
- The transition to operations has started: scaling to production level while keeping high efficiency is the critical point, and it is a continuous effort that must be monitored in detail.
- Keep the analysis model as flexible as possible.
- An increasing number of CMS people will be involved in the facilities, commissioning and operations, to prepare for CMS physics analysis.