Heavy Ion Physics Program of CMS: Proposal for Offline Computing

Heavy Ion Physics Program of CMS: Proposal for Offline Computing
Charles F. Maguire, Vanderbilt University
for the US CMS HI Collaboration
April 25, 2009, CMS-HI Meeting at Santa Fe

Outline
- Impact of the CMS-HI Research Plan
- CMS-HI Compute Model
  - Guiding principles
  - Actual implementation
- Computing Requirements (within a known budget constraint)
  - Wide area networking
  - Compute power and local area networking
  - Disk and tape storage
- Capital Cost and Personnel Summary
- Offline Organization, Operations, and Oversight

Impact of the CMS-HI Research Plan

CMS-HI Research Program Goals
- Focus is on the unique advantages of the CMS detector for detecting high-pT jets, Z0 bosons, quarkonia, and D and B mesons
- Early studies at low luminosity will concentrate on establishing the global properties of central heavy ion collisions at the LHC
- Later high-luminosity runs with sophisticated high-level triggering will allow in-depth rare-probe investigations of strongly interacting matter

Projected Luminosity Growth
- Only one HI run is known for certain, at the end of 2010
- After the 2010 run the LHC may shut down for an extended period; conditioning work is needed to achieve the design beam energy
- This proposal assumes a simple three-year luminosity growth model
- Computer hardware resource acquisition is tailored to that model

Impact of the CMS-HI Research Plan

Projected HI Luminosity and Data Acquisition for the LHC, 2010-2012

CMS-HI Run  | Ave. L (cm^-2 s^-1) | Uptime (s) | Events taken | Raw data (TB)
2010 (FY11) | 2.5 x 10^25         | 10^5       | 1.0 x 10^7   | 22
2011 (FY12) | 2.5 x 10^26         | 5 x 10^5   | 2.5 x 10^7   | 110
2012 (FY13) | 5.0 x 10^26         | 10^6       | 5.0 x 10^7   | 225

Caveats
1) First-year running may achieve greater luminosity and uptime, resulting in factors of 2 or 3 more events taken than assumed here
2) Second-year running may not occur in 2011, but may shift to 2012
3) Third-year running is the planned "nominal" year, when the CMS DAQ writes at the design 225 MB/s for the planned 10^6 s of HI running

CMS-HI Compute Model

Guiding Principles
- CMS-HI computing will follow, as much as feasible, the existing design and framework of the CMS Computing TDR (2005)
- The CMS-HI community is much too small to embark on independent software development outside the mainstream of the rest of CMS
- The size of the CMS-HI community also mandates that we adapt the CMS multi-tiered computing grid to be optimal for our work

Implementation
- Raw data are transferred to, and tape-archived at, the Vanderbilt site
- Reconstruction passes will also be done at the Vanderbilt site
- Some reconstruction output will be copied to overseas sites (Moscow, Paris, Budapest, Seoul) as is practical
- Analysis passes will be done by all CMS-HI institutions using CMS's remote job batch submission system (CRAB)
- Simulation production and support will be done at the MIT site

Computing Requirements: Wide Area Networking

Nominal-Year Running Specifications for HI
- CMS DAQ writes at 225 MB/s for 10^6 s = 225 TB
- Calibration and fast reconstruction at the Tier0 = 75 TB
- Total nominal-year data transfer from the Tier0 = 300 TB
- Note: the DAQ may eventually be able to write faster

Nominal-Year Raw Data Transport Scenario
- The Tier0 holds raw data only briefly (days) for calibrations and preliminary reconstruction
- Data are written to the Tier0 tape archive, which is not designed for re-reads
- The above mandates a continuous transfer of the data to a remote site:
  300 TB x 8 bits/Byte / (30 days x 24 hours/day x 3600 sec/hour) = 0.93 Gbps DC rate (no outages)
- Same rate calculation as for pp data, except that pp runs for ~5 months
- A safety margin must be provided for outages, e.g. a 4 Gbps burst capability

Computing Requirements: Wide Area Networking

Nominal-Year Raw Data Transport Scenario
- Tier0 criteria mandate a continuous transfer of the data to a remote site:
  300 TB x 8 bits/Byte / (30 days x 24 hours/day x 3600 sec/hour) = 0.93 Gbps DC rate (no outages)
- A safety margin must be provided for outages, e.g. a 4 Gbps burst capability
- The plan is to transfer this raw data to the Vanderbilt tape archive site

Raw Data Transport Network Proposal for CMS-HI
- CMS-HEP and ATLAS will use USLHCNet to FNAL and BNL
- FNAL estimates that CMS-HI traffic will be ~2% of all USLHCNet traffic
- Propose to use USLHCNet with a modest pro-rated cost to DOE-NP
- Have explored Internet2 alternatives to the use of USLHCNet
  - CMS management strongly discourages use of a new raw data path
  - A new raw data path would have to be supported solely by CMS-HI
  - It is not obvious that there would be any cost savings to DOE-NP
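
The sustained-rate figure above is straightforward arithmetic; a minimal Python sketch of it follows. The 300 TB volume, 30-day window, and 4 Gbps burst provision come from the slides; the function name and the 80%-availability example are illustrative assumptions, not proposal parameters.

    # Sketch of the nominal-year raw data transport arithmetic (assumptions noted above).

    def sustained_rate_gbps(volume_tb: float, days: float) -> float:
        """Average wide-area rate (Gbps) needed to move volume_tb within the given number of days."""
        bits = volume_tb * 1e12 * 8          # TB -> bits (1 TB = 10^12 bytes)
        seconds = days * 24 * 3600           # transfer window in seconds
        return bits / seconds / 1e9          # bits/s -> Gbps

    raw_plus_calib_tb = 225 + 75             # DAQ output + Tier0 calibration/fast-reco products
    dc_rate = sustained_rate_gbps(raw_plus_calib_tb, days=30)
    print(f"DC rate with no outages: {dc_rate:.2f} Gbps")            # ~0.93 Gbps

    # Illustrative check (assumed 80% link availability, not a figure from the proposal):
    print(f"Rate at 80% availability: {dc_rate / 0.8:.2f} Gbps, vs 4 Gbps burst provision")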

Computing Requirements: Annual Compute Power Budget for All Tasks

Annual Compute Tasks for the Available Compute Power
- Reconstruction passes (the CMS standard is 2)
- Analysis passes (scaled to take 50% of the reconstruction time)
- Simulation production and analysis (takes 50% of the reconstruction time)
  - Simulation event samples at 5% of real event totals (RHIC experience)
  - Simulation event generation and processing takes 10x that of a real event

Constraint: Accomplish All Processing in One Year
- Offline processing must keep up with the annual data streams
- Would like to process all the data within one year, on average
- Essential to have analysis guidance for future running

Compute Power 12-Month Allocation
- A single, complete reconstruction pass will take 4 months
- This proposal allows for 1.5 reconstruction passes = 6 months
- Analysis passes of these reconstruction passes in 3 months
- Simulation production and analysis in 3 months

Computing Requirements: Compute Time for a Single Reconstruction Pass

Nominal-year DAQ HI bandwidth partition and associated CPU times

Channel       | BW (MB/s) | Size/Evt (MB) | Rate (Hz) | CPU time/evt (s) | Annual total events | Annual CPU time (s)
Min Bias      | 33.75     | 2.5           | 13.5      | 100              | 1.35 x 10^7         | 1.35 x 10^9
Jet-100       | 24.75     | 5.8           | 4.27      | 450              | 4.27 x 10^6         | 1.92 x 10^9
Jet-75        | 27.00     | 5.7           | 4.74      | 450              | 4.74 x 10^6         | 2.13 x 10^9
Jet-50        | 27.50     | 5.4           | 5.00      | 450              | 5.00 x 10^6         | 2.25 x 10^9
J/ψ           | 67.50     | 4.9           | 13.78     | 1000             | 1.38 x 10^7         | 1.38 x 10^10
Υ             | 2.25      | 4.9           | 0.46      | 1000             | 4.59 x 10^5         | 4.59 x 10^8
e-γ-10        | 40.50     | 5.8           | 6.98      | 450              | 6.98 x 10^6         | 3.14 x 10^9
Ultra-periph. | 2.25      | 1.0           | 2.25      |                  | 2.25 x 10^6         | 1.25 x 10^9
Sum           | 225       |               |           |                  | 51 x 10^6 evts/yr   | 2.53 x 10^10 s/yr

CPU times are from the jet-γ simulation study, scaled to a 1600 SpecInt2000 processor.
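
As a cross-check of the bandwidth-partition table, the sketch below recomputes event rates, annual event counts, and annual CPU time for a few representative channels. The per-channel inputs are the table values (for a 1600 SpecInt2000 reference processor); the dictionary layout is only an illustration.

    # Recompute annual events and CPU time from the DAQ bandwidth partition (illustrative layout).
    UPTIME_S = 1.0e6   # nominal-year HI running time

    # channel: (bandwidth MB/s, event size MB, CPU s/event on a 1600 SI2k processor)
    channels = {
        "Min Bias": (33.75, 2.5, 100),
        "Jet-100":  (24.75, 5.8, 450),
        "J/psi":    (67.50, 4.9, 1000),
    }

    for name, (bw, size, cpu_per_evt) in channels.items():
        rate = bw / size                    # Hz
        events = rate * UPTIME_S            # events per nominal year
        cpu_seconds = events * cpu_per_evt  # annual CPU seconds for this channel
        print(f"{name:8s}: {rate:5.2f} Hz, {events:.2e} events, {cpu_seconds:.2e} CPU-s")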

Computing Requirements: Net Compute Power Requirement

Determination of the Total CPU Number Ncpu
- A complete reconstruction pass takes 25 x 10^9 seconds (scaled to a 1600 SpecInt2000 processor)
- A single reconstruction pass must be completed in 4 months
- Assume an effective duty cycle of 80%

The US CMS-HI Compute Center Is Sized at 3,000 CPUs
- Real data reconstruction and analysis will consume 75% of the annual integrated compute power, i.e. ~2,250 CPUs
- Simulation production and analysis will consume 25% of the annual integrated compute power, i.e. ~750 CPUs
- A satellite simulation center is proposed at the MIT Tier2 CMS facility, taking advantage of its local support and opportunistic cycles
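
The 3,000-CPU sizing follows directly from the three inputs above; here is a minimal sketch of that arithmetic, assuming 30-day months (the slide does not state the exact month length used).

    # Estimate the CPU count needed to finish one reconstruction pass in 4 months.
    RECO_PASS_CPU_SECONDS = 25e9   # one full pass, on 1600 SpecInt2000 processors
    MONTHS = 4
    DUTY_CYCLE = 0.80              # effective fraction of wall time the farm delivers

    wall_seconds = MONTHS * 30 * 24 * 3600          # assumed 30-day months
    n_cpu = RECO_PASS_CPU_SECONDS / (wall_seconds * DUTY_CYCLE)
    print(f"CPUs required: {n_cpu:.0f}")            # ~3,000

    # Split adopted by the proposal: 75% real data, 25% simulation.
    print(f"Real data: {0.75 * 3000:.0f} CPUs, simulation: {0.25 * 3000:.0f} CPUs")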

Computing Requirements: Annual Allocations of CMS-HI Compute Power

Distribution of Compute Power for 3,000 CPUs

Processing Task                        | Duration (months) | Total CPU Use (CPU-months)
Partial reconstruction (50% of events) | 2                 | 6,000
Partial analysis pass (50% of events)  | 1                 | 3,000
Complete reconstruction pass           | 4                 | 12,000
Complete analysis pass                 | 2                 | 6,000
Simulation generation and analysis     | 3                 | 9,000
Total                                  | 12                | 36,000

Simulation generation and analysis can be done asynchronously with the real data analysis, i.e. 750 CPUs x 12 months = 9,000 CPU-months.

Comparison of the CMS-HI compute center with the CMS-HEP compute centers
1) CMS-HI with 3,000 CPUs @ 1600 SpecInt2000 = 4.8 MSpecInt2000
2) CMS-HEP with 7 Tier1 and 36 Tier2 centers had a quota of 49.2 MSpecInt2000 (from the 2005 TDR)
3) The ~10% relative size scales with 10% of the running time and raw data output
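
A quick consistency check of the allocation table and of the comparison with the CMS-HEP grid, using the 1600 SpecInt2000 per-CPU rating quoted above:

    # Verify that the task allocation fills the 3,000-CPU, 12-month budget.
    allocations = {               # task: (months, CPU-months)
        "partial reco":      (2, 6_000),
        "partial analysis":  (1, 3_000),
        "complete reco":     (4, 12_000),
        "complete analysis": (2, 6_000),
        "simulation":        (3, 9_000),
    }
    months = sum(m for m, _ in allocations.values())
    cpu_months = sum(c for _, c in allocations.values())
    assert months == 12 and cpu_months == 3_000 * 12   # 36,000 CPU-months

    # Size relative to the CMS-HEP grid quoted from the 2005 Computing TDR.
    cms_hi_si2k = 3_000 * 1_600 / 1e6                  # 4.8 MSpecInt2000
    print(f"CMS-HI / CMS-HEP capacity: {cms_hi_si2k / 49.2:.0%}")   # ~10%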

Computing Requirements: Local Disk Storage Considerations

Disk Storage Categories (there is never enough disk space)
- Raw data buffer for the transfer from the Tier0, staging to the tape archive
- Reconstruction (RECO) output
- Analysis Object Data (AOD) output
- Physics Analysis Group (PAG) output
- Simulation production (MC)
- Each category is assigned a relative scale factor

Disk Storage Acquisition Timelines
- Model dependent according to the luminosity growth
- Need to minimize tape re-reads; the most "popular" files are kept on disk
- Model dependent according to the pace of publication
- RHIC experience: deadlines by which older data are removed from disk

Computing Requirements: Local Disk Storage Annual Allocations

Annual disk storage allocated to real data reconstruction and analysis

FY   | Events (Millions) | Raw Data Buffer (TB) | RECO (TB) | AOD (TB) | PAG (TB) | Total (TB)
2010 |                   | 20                   |           |          |          |
2011 | 10                | 50                   | 30        | 6        | 3        | 89
2012 | 25                | 50                   | 75        | 21       | 11       | 157
2013 | 51                | 50                   | 153       | 46       | 23       | 271
2014 |                   | 50                   | 153       | 55       | 44       | 302

Assumptions for the allocation growth of real data needs
1) In 2010 there is no real data, but a 20 TB buffer is allocated for transfer testing; all other disk storage needs are met out of the simulation allocations on the next slide
2) Event growth in 2011-2013 follows the luminosity model shown earlier
3) Relative sizes of the RECO, AOD, and PAG categories follow present experience

Computing Requirements: Local Disk Storage Annual Allocations

Annual disk storage allocated to simulation production and analysis

FY   | MC Events (Millions) | GEANT sample (TB) | Raw (TB) | RECO (TB) | AOD (TB) | PAG (TB) | Total (TB)
2010 | 0.5                  | 15                | 3.0      | 1.0       | 0.2      | 0.1      | 19
2011 | 1.0                  | 30                | 5.9      | 2.0       | 0.4      |          | 38
2012 | 1.25                 | 37                | 7.4      | 2.5       | 0.7      |          | 48
2013 | 2.55                 | 75                |          | 5.1       | 1.3      |          | 97
2014 |                      |                   |          |           | 1.6      | 0.8      | 98

Assumptions for the allocation growth of simulation needs
1) The GEANT (MC) event size is taken to be five times a real event size
2) The MC event number scales as 5% of the real event number in 2012 and beyond
3) Relative sizes of the RECO, AOD, and PAG categories follow present experience
4) Total real data + MC storage after 5 years = 400 TB
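
A rough sketch of the scaling behind the two disk tables above. The constants are back-of-envelope values inferred from the tables (about 3 TB of RECO per million real events, MC at 5% of the real event count with events roughly five times larger, and a ~6 MB real raw event), so treat them as illustrative assumptions rather than proposal numbers.

    # Illustrative disk-volume scaling for the real-data and MC allocations (assumed constants).
    RECO_TB_PER_MEVT = 3.0       # ~3 TB of RECO per million real events (inferred from the table)
    MC_FRACTION = 0.05           # MC sample = 5% of the real event count (2012 and beyond)
    MC_SIZE_FACTOR = 5           # a GEANT event is taken to be ~5x a real event

    def disk_estimate(real_events_millions, real_event_mb=6.0):
        """Rough (real RECO TB, MC GEANT-sample TB) for one year's sample; 6 MB/event is an assumption."""
        reco_tb = RECO_TB_PER_MEVT * real_events_millions
        mc_events = MC_FRACTION * real_events_millions * 1e6
        geant_tb = mc_events * MC_SIZE_FACTOR * real_event_mb / 1e6   # MB -> TB
        return reco_tb, geant_tb

    for mevt in (25, 51):        # FY2012-FY2013 real-event samples (millions)
        reco, geant = disk_estimate(mevt)
        print(f"{mevt}M real events: ~{reco:.0f} TB RECO, ~{geant:.0f} TB MC GEANT sample")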

Computing Requirements: Local Disk Storage Allocation Overview

Storage Allocation Comparisons and Risk Analysis
- CMS-HI proposes to have 400 TB of disk after 5 years, when steady-state data production is ~300 TB per year
- The CMS-HEP 2005 TDR proposed 15.6 PB of combined Tier1 + Tier2 disk
  - The ratio is 0.026 for disk storage, as compared with 0.10 for CPUs
- PHENIX had 800 TB of disk storage at the RCF in 2007, when 600 TB of raw data was written, i.e. the same ratio as proposed by CMS-HI
  - Allocating disk space among users and production was very painful
  - PHENIX has access to significant disk space at remote sites (CCJ, CCF, Vanderbilt, ...) for real data production and simulations
  - CMS-HI is not counting on significant disk space at other US sites

Serious Concern: Insufficient Disk Storage Allocation
- May need to change the balance of CPUs/disk/tape in the outer years
- Possibility of significant disk storage overseas (network rates?)
- May have access to other disk resources (REDDNet from NSF)

Computing Requirements: Tape Storage Allocations

Tape Storage Categories and Annual Allocations
- Tape storage largely follows the annual data production statistics
- The first priority is securely archiving the raw data for analysis
- Access to tape will be limited to the archiving and production teams
  - Guidance from experience with HPSS at RHIC: unfettered, unsynchronized read requests to the tape drives lead to strangulation of the system

FY   | Real Evts. (Millions) | Raw (TB) | RECO (TB) | AOD (TB) | PAG (TB) | Real (TB) | MC (TB) | Real+MC cumulative (TB) | Annual Δ (TB)
2010 | 0.0                   |          |           |          |          |           |         | 39                      |
2011 | 10                    | 60       | 30        | 6        | 3        | 99        | 34      | 172                     | 133
2012 | 25                    | 150      | 75        | 15       | 7.5      | 248       | 48      | 467                     | 295
2013 | 51                    | 300      | 115       | 23       | 11.5     | 449       | 97      | 1013                    | 546
2014 |                       |          |           |          |          |           |         | 1559                    |
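
The cumulative column is just a running sum of the per-year archive volumes; a small sketch of that bookkeeping, using the annual Real+MC increments from the table (the FY2014 increment is assumed to repeat the nominal year, since the slide lists only the cumulative 1559 TB):

    # Cumulative tape-archive bookkeeping from the annual Real+MC increments (TB).
    annual_delta = {          # FY: annual Real+MC volume archived to tape, from the table
        2010: 39,             # no real data in 2010, so this is simulation output
        2011: 133,
        2012: 295,
        2013: 546,
        2014: 546,            # assumed repeat of the nominal year
    }

    cumulative = 0
    for fy, delta in annual_delta.items():
        cumulative += delta   # running total archived on tape
        print(f"FY{fy}: +{delta} TB, cumulative {cumulative} TB")   # 39, 172, 467, 1013, 1559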

Capital Cost and Personnel Summary

Real Data Processing and Tape Archiving Center at Vanderbilt

Category         | FY10     | FY11     | FY12     | FY13     | FY14     | Total
CPU cores        | 440      | 456      | 480      | 360      | 288      | 2,024
Total CPUs       |          | 896      | 1,376    | 1,736    |          |
Disk (TB)        | 20       | 70       | 70       | 110      | 30       | 300
Total Disk (TB)  |          | 90       | 160      | 270      |          |
Tape (TB)        | 40       | 130      | 300      | 540      | 560      | 1,570
Total Tape (TB)  |          | 170      | 470      | 1,010    |          |
CPU cost         | $151,800 | $125,400 | $120,000 | $81,000  | $64,800  | $543,000
Disk cost        | $10,000  | $31,500  | $28,000  | $38,500  | $9,000   | $117,000
Tape cost        | $66,500  | $23,085  | $58,449  | $60,720  | $61,650  | $270,404
Total hardware   | $228,300 | $179,985 | $206,449 | $180,220 | $135,450 | $930,404
Staff (DOE cost) | $119,534 | $155,394 | $161,609 | $201,688 | $244,715 | $882,940
Total cost       | $347,834 | $335,379 | $368,059 | $381,908 | $380,165 | $1,813,344
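
The per-year hardware lines roll up cleanly into the "Total hardware" row; a minimal check of that arithmetic, with the cost figures copied from the table above:

    # Roll up the Vanderbilt per-year hardware costs and compare with the "Total hardware" row.
    fys = ["FY10", "FY11", "FY12", "FY13", "FY14"]
    cpu_cost  = [151_800, 125_400, 120_000, 81_000, 64_800]
    disk_cost = [10_000, 31_500, 28_000, 38_500, 9_000]
    tape_cost = [66_500, 23_085, 58_449, 60_720, 61_650]
    total_hw  = [228_300, 179_985, 206_449, 180_220, 135_450]

    for fy, c, d, t, expected in zip(fys, cpu_cost, disk_cost, tape_cost, total_hw):
        assert c + d + t == expected, fy
    print(f"5-year hardware total: ${sum(total_hw):,}")   # $930,404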

Capital Cost and Personnel Summary

Simulation Production and Analysis Center at MIT

Category         | FY10    | FY11    | FY12    | FY13    | FY14    | Total
CPU cores        | 152     | 152     | 160     | 120     | 96      | 680
Total CPUs       |         | 304     | 464     | 584     |         |
Disk (TB)        | 20      | 20      | 10      | 50      | 0       | 100
Total Disk (TB)  |         | 40      |         |         |         |
CPU cost         | $52,440 | $41,800 | $40,000 | $27,000 | $21,600 | $182,000
Disk cost        | $10,000 | $9,000  | $4,000  | $17,500 | $0      | $40,500
Total hardware   | $62,440 | $50,800 | $44,000 | $44,500 | $21,600 | $222,500
Staff (DOE cost) | $37,040 | $37,280 | $39,400 | $40,600 |         | $197,120
Total cost       | $99,480 | $89,080 | $83,400 | $85,100 | $63,400 | $420,460

The MIT group has experience providing simulation production for all CMS-HI institutions.

Capital Cost and Personnel Summary

Cumulative Cost Comparisons

Category                       | FY10     | FY11     | FY12     | FY13     | FY14     | Total
Real data center at Vanderbilt | $347,834 | $335,379 | $368,059 | $381,908 | $380,165 | $1,813,344
Simulation center at MIT       | $99,480  | $89,080  | $83,400  | $85,100  | $63,400  | $420,460
Vanderbilt + MIT total         | $447,314 | $424,459 | $451,459 | $467,008 | $443,565 | $2,249,560
Single center at Vanderbilt    | $440,157 | $417,257 | $440,380 | $460,023 | $436,725 | $2,198,452

The small cost difference between having a single center at Vanderbilt and two separate centers at Vanderbilt and MIT is due to the slight difference in FTE charges at the two institutions.

Offline Organization, Operations, and Oversight

- Impact of the CMS-HI Research Plan
- CMS-HI Compute Model
  - Integration into the existing CMS compute model
  - Service role to the total CMS-HI community
- Computing Requirements
  - Wide area networking
  - Compute power and local area networking
  - Disk and tape storage
- Capital Cost and Personnel Summary
- Offline Organization, Operations, and Oversight