Heavy Ion Physics Program of CMS Proposal for Offline Computing


1 Heavy Ion Physics Program of CMS Proposal for Offline Computing
Charles F. Maguire, Vanderbilt University, for the US CMS-HI Collaboration
April 25, 2009, CMS-HI Meeting at Santa Fe

2 Outline
- Impact of the CMS-HI Research Plan
- CMS-HI Compute Model
  - Guiding principles
  - Actual implementation
- Computing Requirements (within a known budget constraint)
  - Wide area networking
  - Compute power and local area networking
  - Disk and tape storage
- Capital Cost and Personnel Summary
- Offline Organization, Operations, and Oversight

3 Impact of the CMS-HI Research Plan
CMS-HI Research Program Goals
- Focus is on the unique advantages of the CMS detector for detecting high-pT jets, Z0 bosons, quarkonia, and D and B mesons
- Early studies at low luminosity will concentrate on establishing the global properties of central heavy ion collisions at the LHC
- Later high-luminosity runs with sophisticated high-level triggering will allow in-depth rare-probe investigations of strongly interacting matter

Projected luminosity growth
- Only one HI run is known for certain, at the end of 2010
- After the 2010 run the LHC may shut down for an extended period; conditioning work is needed to achieve the design beam energy
- This proposal assumes a simple 3-year luminosity growth model
- Computer hardware resource acquisition is tailored to that model

4 Impact of the CMS-HI Research Plan
Projected HI Luminosity and Data Acquisition for the LHC

CMS-HI Run  | Ave. L (cm^-2 s^-1) | Uptime (s) | Events taken | Raw data (TB)
2010 (FY11) | 2.5 x 10^25         | 10^5       | 1.0 x 10^7   | 22
2011 (FY12) | 2.5 x 10^26         | 5 x 10^5   | 2.5 x 10^7   | 110
2012 (FY13) | 5.0 x 10^26         | 10^6       | 5.0 x 10^7   | 225

Caveats
1) First-year running may achieve greater luminosity and uptime, resulting in factors of 2 or 3 more events taken than assumed here
2) Second-year running may not occur in 2011, but may shift to 2012
3) Third-year running is the planned "nominal" year, when the CMS DAQ writes at the design 225 MB/s for the planned 10^6 s of HI running

5 CMS-HI Compute Model
CMS-HI Compute Model Guiding Principles
- CMS-HI computing will follow, as much as feasible, the existing design and framework of the CMS Computing TDR (2005)
- The CMS-HI community is much too small to embark on independent software development outside the mainstream of the rest of CMS
- The size of the CMS-HI community also mandates that we adapt the CMS multi-tiered computing grid to be optimal for our work

CMS-HI Compute Model Implementation
- Raw data are transferred to and tape-archived at the Vanderbilt site
- Reconstruction passes will also be done at the Vanderbilt site
- Some reconstruction output will be copied to overseas sites (Moscow, Paris, Budapest, Seoul) as is practical
- Analysis passes will be done by all CMS-HI institutions using CMS's remote job batch submission system (CRAB)
- Simulation production and support will be done at the MIT site

6 Computing Requirements: Wide Area Networking
Nominal Year Running Specifications for HI
- CMS DAQ writes at 225 MB/s for 10^6 s = 225 TB
- Calibration and fast reconstruction at the Tier0 = 75 TB
- Total nominal-year data transfer from the Tier0 = 300 TB
- Note: the DAQ may eventually be able to write faster

Nominal Year Raw Data Transport Scenario
- The Tier0 holds raw data only briefly (days): calibrations, preliminary reconstruction
- Data are written to the Tier0 tape archive, which is not designed for re-reads
- The above mandates a continuous transfer of data to a remote site
- 300 TB x 8 bits/byte / (30 days x 24 hours/day x 3600 sec/hour) ≈ 0.93 Gbps DC rate (no outages)
- Same rate calculation as for pp data, except pp runs for ~5 months
- A safety margin must be provided for outages, e.g. a 4 Gbps burst capability
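For reference, the sustained-rate arithmetic can be written out as a minimal Python sketch; the 300 TB volume and 30-day window are the figures quoted above, while the 4x factor is simply the burst headroom suggested on the slide, not a derived requirement:

```python
# Sustained WAN rate needed to move one nominal HI run's data off the Tier0.
# Inputs are taken from the slide; the outage safety factor is illustrative.

TOTAL_TB = 300                 # raw data + Tier0 products for a nominal HI run
WINDOW_DAYS = 30               # transfer must finish in roughly one month
SAFETY_FACTOR = 4              # illustrative burst headroom for outages

bits = TOTAL_TB * 1e12 * 8     # total bits to move (1 TB = 10^12 bytes)
seconds = WINDOW_DAYS * 24 * 3600

dc_rate_gbps = bits / seconds / 1e9
print(f"Sustained (DC) rate : {dc_rate_gbps:.2f} Gbps")                   # ~0.93 Gbps
print(f"Burst provisioning  : {dc_rate_gbps * SAFETY_FACTOR:.1f} Gbps")   # ~3.7 Gbps, i.e. the ~4 Gbps quoted
```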

7 Computing Requirements: Wide Area Networking
Nominal Year Raw Data Transport Scenario
- Tier0 criteria mandate a continuous transfer of data to a remote site
- 300 TB x 8 bits/byte / (30 days x 24 hours/day x 3600 sec/hour) ≈ 0.93 Gbps DC rate (no outages)
- A safety margin must be provided for outages, e.g. a 4 Gbps burst capability
- The plan is to transfer this raw data to the Vanderbilt tape archive site

Raw Data Transport Network Proposal for CMS-HI
- CMS-HEP and ATLAS will use USLHCNet to FNAL and BNL
- FNAL estimates that CMS-HI traffic will be ~2% of all USLHCNet traffic
- We propose to use USLHCNet with a modest pro-rated cost to DOE-NP
- We have explored Internet2 alternatives to the use of USLHCNet
  - CMS management strongly discourages the use of a new raw data path
  - A new raw data path would have to be supported solely by CMS-HI
  - It is not obvious that there would be any cost savings to DOE-NP

8 Computing Requirements: Annual Compute Power Budget for All Tasks
Annual Compute Tasks for Available Compute Power
- Reconstruction passes (the CMS standard is 2)
- Analysis passes (scaled to take 50% of the reconstruction time)
- Simulation production and analysis (takes 50% of the reconstruction time)
  - Simulation event samples at 5% of real event totals (RHIC experience)
  - Simulation event generation and processing takes 10x that of a real event

Constraint: Accomplish All Processing in One Year
- Offline processing must keep up with the annual data streams
- We would like to process all the data within one year, on average
- It is essential to have analysis guidance for future running

Computer Power 12-Month Allocation
- A single, complete reconstruction pass will take 4 months
- This proposal allows for 1.5 reconstruction passes = 6 months
- Analysis passes of these reconstruction passes in 3 months
- Simulation production and analysis in 3 months

9 Computing Requirements: Compute Time for Single Reconstruction Pass
Nominal-Year DAQ HI Bandwidth Partition and Associated CPU Times

Channel  | BW (MB/s) | Size/Evt (MB) | Rate (Hz) | CPU time/evt (s) | Annual total events | Annual CPU time (s)
Min Bias | 33.75     | 2.5           | 13.5      | 100              | 1.35 x 10^7         | 1.35 x 10^9
Jet-100  | 24.75     | 5.8           | 4.27      | 450              | 4.27 x 10^6         | 1.92 x 10^9
Jet-75   | 27.00     | 5.7           | 4.74      |                  | 4.74 x 10^6         | 2.13 x 10^9
Jet-50   | 27.50     | 5.4           | 5.00      |                  | 5.00 x 10^6         | 2.25 x 10^9
J/ψ      | 67.50     | 4.9           | 13.78     | 1000             | 1.38 x 10^7         | 1.38 x 10^10
ϒ        | 2.25      |               | 0.46      |                  | 4.59 x 10^5         | 4.59 x 10^8
e-γ-10   | 40.50     |               | 6.98      |                  | 6.98 x 10^6         | 3.14 x 10^9
Ultra-peripheral | 1.0 |             | 2.25      |                  | 2.25 x 10^6         | 1.25 x 10^9
Sum      | 225       |               |           |                  | 51 x 10^6 evts/yr   | 2.53 x 10^10 s/yr

CPU times are from the jet-γ simulation study, scaled to a 1600 SpecInt2000 processor
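The arithmetic behind the table reduces to rate = bandwidth / event size, annual events = rate x uptime, and annual CPU time = events x CPU time per event. A minimal Python sketch of that chain, listing only the channels whose CPU time per event is quoted in the table and using the nominal-year 10^6 s uptime:

```python
# Reproduce the annual event counts and CPU-time totals for the DAQ
# bandwidth partition.  Only channels with a CPU time/event quoted on the
# slide are listed; CPU times are for a 1600 SpecInt2000 processor.

UPTIME_S = 1.0e6  # nominal HI running time per year

# channel: (bandwidth MB/s, event size MB, CPU time per event in s)
channels = {
    "Min Bias": (33.75, 2.5, 100),
    "Jet-100":  (24.75, 5.8, 450),
    "J/psi":    (67.50, 4.9, 1000),
}

for name, (bw, size, cpu_per_evt) in channels.items():
    rate_hz = bw / size                     # events per second
    events = rate_hz * UPTIME_S             # annual events
    cpu_s = events * cpu_per_evt            # annual CPU seconds
    print(f"{name:8s}  rate {rate_hz:5.2f} Hz  "
          f"events {events:.2e}  CPU {cpu_s:.2e} s")
```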

10 Computing Requirements: Net Compute Power Requirement
Determination of the Total CPU Number Ncpu
- A complete reconstruction pass takes 25 x 10^9 CPU-seconds (scaled to a 1600 SpecInt2000 processor)
- A single reconstruction pass must be completed in 4 months
- Assume an effective duty cycle of 80%

The US CMS-HI Compute Center is Sized at 3,000 CPUs
- Real-data reconstruction and analyses will consume 75% of the annual integrated compute power, i.e. ~2,250 CPUs
- Simulation production and analyses will consume 25% of the annual integrated compute power, i.e. ~750 CPUs
- A satellite simulation center is proposed at the MIT Tier2 CMS facility, taking advantage of that local support and opportunistic cycles
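Putting the three inputs together gives the farm size; a minimal sketch of that calculation (the 75%/25% split is the allocation stated above):

```python
# Size the reconstruction farm: a full pass of 25e9 CPU-seconds
# (1600 SpecInt2000 units) must complete within 4 months at 80% duty cycle.

PASS_CPU_SECONDS = 25e9        # one complete reconstruction pass
WALL_MONTHS = 4                # allowed wall-clock time
DUTY_CYCLE = 0.80              # effective CPU availability

wall_seconds = WALL_MONTHS * 30 * 24 * 3600
n_cpu = PASS_CPU_SECONDS / (wall_seconds * DUTY_CYCLE)
print(f"CPUs needed for reconstruction: {n_cpu:.0f}")    # ~3,000

# The proposal then splits the 3,000-CPU farm 75% / 25% between
# real-data processing and simulation.
print(f"Real data share : {0.75 * 3000:.0f} CPUs")       # 2,250
print(f"Simulation share: {0.25 * 3000:.0f} CPUs")       # 750
```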

11 Computing Requirements: Annual Allocations of CMS-HI Compute Power
Distribution of Compute Power for 3,000 CPUs

Processing Task                        | Duration (months) | Total CPU Use (CPU-months)
Partial reconstruction (50% of events) | 2                 | 6,000
Partial analysis pass (50% of events)  | 1                 | 3,000
Complete reconstruction pass           | 4                 | 12,000
Complete analysis pass                 | 2                 | 6,000
Simulation generation and analysis     | 3                 | 9,000
Total                                  | 12                | 36,000

Simulation generation and analysis can be done asynchronously with real data analysis, i.e. 750 CPUs x 12 months = 9,000 CPU-months

Comparison of the CMS-HI Compute Center with CMS-HEP Compute Centers
1) CMS-HI: 3,000 CPUs x 1600 SpecInt2000 = 4.8 MSpecInt2000
2) CMS-HEP, with 7 Tier1 and 36 Tier2 centers, had a quota of 49.2 MSpecInt2000 (from the 2005 TDR)
3) The ~10% relative size scales with 10% of running time and raw data output
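The table is bookkeeping in CPU-months against the 3,000-CPU, 12-month budget. A quick consistency sketch: the 2-month duration for the complete analysis pass is the value that balances the table (it is not quoted explicitly), and the simulation entry is expressed as 750 CPUs running all year, per the note above:

```python
# Check that the task allocations exhaust the annual budget of
# 3,000 CPUs x 12 months = 36,000 CPU-months.

BUDGET = 3000 * 12

# task: (duration in months, CPUs assumed busy)
tasks = {
    "partial reconstruction (50% of events)": (2, 3000),
    "partial analysis pass (50% of events)":  (1, 3000),
    "complete reconstruction pass":           (4, 3000),
    "complete analysis pass":                 (2, 3000),   # inferred: balances the budget
    "simulation generation and analysis":     (12, 750),   # runs asynchronously all year
}

used = sum(months * cpus for months, cpus in tasks.values())
print(f"CPU-months used: {used}  (budget {BUDGET})")
assert used == BUDGET
```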

12 Computing Requirements: Local Disk Storage Considerations
Disk Storage Categories (there is never enough disk space)
- Raw data buffer for transfers from the Tier0, staging to the tape archive
- Reconstruction (RECO) output
- Analysis Object Data (AOD) output
- Physics Analysis Group (PAG) output
- Simulation production (MC)
- Each category is assigned a relative scale factor

Disk Storage Acquisition Timelines
- Model dependent according to luminosity growth
- Need to minimize tape re-reads; the most "popular" files are kept on disk
- Model dependent according to the pace of publication
- RHIC experience: deadlines by which older data are removed from disk

13 Computing Requirements: Local Disk Storage Annual Allocations
Annual Disk Storage Allocated to Real Data Reconstruction and Analysis

FY   | Events (Millions) | Raw Data Buffer (TB) | RECO (TB) | AOD (TB) | PAG (TB) | Total (TB)
2010 | 0                 | 20                   |           |          |          | 20
2011 | 10                | 50                   | 30        | 6        | 3        | 89
2012 | 25                |                      | 75        | 21       | 11       | 157
2013 | 51                |                      | 153       | 46       | 23       | 271
2014 |                   |                      |           | 55       | 44       | 302

Assumptions for the Allocation Growth of Real Data Needs
1) In 2010 there is no real data, but a 20 TB buffer is allocated for transfer testing; all other disk storage needs are met out of the simulation allocations on the next page
2) Event growth is as per the model on page 3
3) Relative sizes of the RECO, AOD, and PAG categories follow present experience
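A rough sketch of how the real-data disk allocation scales with event count; the per-category unit sizes (TB per million events) are inferred here from the FY2011 row and are assumptions, not numbers stated directly on the slide:

```python
# Rough disk-allocation model for real data, assuming per-category sizes
# scale linearly with the number of events.  The unit sizes below are
# inferred from the FY2011 row (10M events -> RECO 30 TB, AOD 6 TB, PAG 3 TB).

TB_PER_MEVT = {"RECO": 3.0, "AOD": 0.6, "PAG": 0.3}   # inferred, not quoted

def real_data_disk(events_millions, raw_buffer_tb):
    """Return the per-category disk estimate (TB) for one year of data."""
    alloc = {cat: unit * events_millions for cat, unit in TB_PER_MEVT.items()}
    alloc["raw buffer"] = raw_buffer_tb
    alloc["total"] = sum(alloc.values())
    return alloc

# FY2011: 10M events with a 50 TB transfer buffer -> ~89 TB, as in the table
print(real_data_disk(10, 50))
```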

14 Computing Requirements: Local Disk Storage Annual Allocations
Annual Disk Storage Allocated to Simulation Production and Analysis

FY   | Events (Millions) | Event Size (TB) | Raw (TB) | RECO (TB) | AOD (TB) | PAG (TB) | Total (TB)
2010 | 0.5               | 15              | 3.0      | 1.0       | 0.2      | 0.1      | 19
2011 |                   | 30              | 5.9      | 2.0       | 0.4      |          | 38
2012 | 1.25              | 37              | 7.4      | 2.5       | 0.7      |          | 48
2013 | 2.55              | 75              |          | 5.1       | 1.3      |          | 97
2014 |                   |                 |          |           | 1.6      | 0.8      | 98

Assumptions for the Allocation Growth of Simulation Needs
1) The GEANT (MC) event size is taken to be five times the real event size
2) The number of MC events scales as 5% of the number of real events in 2012 and beyond
3) Relative sizes of the RECO, AOD, and PAG categories follow present experience
4) Total real data + MC storage after 5 years = 400 TB

15 Computing Requirements: Local Disk Storage Allocation Overview
Storage Allocation Comparisons and Risk Analysis
- CMS-HI proposes to have 400 TB of disk after 5 years, when steady-state data production is 300 TB per year
- The CMS-HEP 2005 TDR proposed 15.6 PB of combined Tier1 + Tier2 disk
- The ratio is ~0.03 (400 TB / 15.6 PB) for disk storage, as compared with 0.10 for CPUs
- PHENIX had 800 TB of disk storage at the RCF in 2007, when 600 TB of raw data was written, i.e. the same ratio as proposed by CMS-HI
  - Very painful for allocating disk space among users and production
  - PHENIX has access to significant disk space at remote sites (CCJ, CCF, Vanderbilt, ...) for real data production and simulations
  - CMS-HI is not counting on significant disk space at other US sites

Serious Concern of Insufficient Disk Storage Allocation
- May need to change the balance of CPUs/disk/tape in the outer years
- Possibility of significant disk storage overseas (network rates?)
- May have access to other disk resources (REDDNet from NSF)

16 Computing Requirements: Tape Storage Allocations
Tape Storage Categories and Annual Allocations
- Tape storage largely follows the annual data production statistics
- The first priority is securely archiving the raw data for analysis
- Access to tape will be limited to the archiving and production teams
- Guidance from experience with RHIC's HPSS: unfettered, unsynchronized read requests to the tape drives lead to strangulation

FY   | Real Evts. (Millions) | Raw (TB) | RECO (TB) | AOD (TB) | PAG (TB) | Real (TB) | MC (TB) | Real+MC (TB) | Δ (TB)
2010 | 0.0                   |          |           |          |          |           |         | 39           |
2011 | 10                    | 60       | 30        | 6        | 3        | 99        | 34      | 172          | 133
2012 | 25                    | 150      | 75        | 15       | 7.5      | 248       | 48      | 467          | 295
2013 | 51                    | 300      | 115       | 23       | 11.5     | 449       | 97      | 1013         | 546
2014 |                       |          |           |          |          |           |         | 1559         |

(Real+MC is the cumulative archive size; Δ is the annual increment.)
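Since nothing is removed from the archive, the Real+MC column is just a running sum of each year's real-data and simulation volumes. A small sketch using the yearly figures from the table; the FY2010 entry is taken to be simulation only (there is no real data yet), and the one-TB differences from the table come from rounding on the slide:

```python
# Cumulative tape-archive growth: each year's real-data and simulation
# volumes are added to the archive and never removed.  Yearly volumes (TB)
# are taken from the table above.

yearly = [
    ("FY2010",   0,  39),   # no real data yet; simulation only (assumption)
    ("FY2011",  99,  34),
    ("FY2012", 248,  48),
    ("FY2013", 449,  97),
]

archive = 0
for fy, real_tb, mc_tb in yearly:
    delta = real_tb + mc_tb          # annual increment written to tape
    archive += delta                 # cumulative archive size
    print(f"{fy}: +{delta:4d} TB  -> archive {archive:4d} TB")
```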

17 Capital Cost and Personnel Summary
Real Data Processing and Tape Archiving Center at Vanderbilt

Category         | FY10     | FY11     | FY12     | FY13     | FY14     | Total
CPU cores        | 440      | 456      | 480      | 360      | 288      | 2,024
Total CPUs       | 440      | 896      | 1,376    | 1,736    | 2,024    |
Disk (TB)        | 20       | 70       | 70       | 110      | 30       | 300
Total disk (TB)  | 20       | 90       | 160      | 270      | 300      |
Tape (TB)        | 40       | 130      | 300      | 540      | 560      | 1,570
Total tape (TB)  | 40       | 170      | 470      | 1,010    | 1,570    |
CPU cost         | $151,800 | $125,400 | $120,000 | $81,000  | $64,800  | $543,000
Disk cost        | $10,000  | $31,500  | $28,000  | $38,500  | $9,000   | $117,000
Tape cost        | $66,500  | $23,085  | $58,449  | $60,720  | $61,650  | $270,404
Total hardware   | $228,300 | $179,985 | $206,449 | $180,220 | $135,450 | $930,404
Staff (DOE cost) | $119,534 | $155,394 | $161,609 | $201,688 | $244,715 | $882,940
Total cost       | $347,834 | $335,379 | $368,059 | $381,908 | $380,165 | $1,813,344

(The "Total ..." rows are cumulative through each fiscal year; the other hardware rows are annual additions.)
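The hardware cost rows can be cross-checked mechanically; a small sketch that reproduces the yearly hardware totals and the five-year sum from the per-category cost rows above:

```python
# Cross-check the Vanderbilt hardware cost table: the yearly hardware totals
# are the column sums of the CPU, disk, and tape cost rows, and the 5-year
# total is the sum of those columns.  Figures (dollars) are copied from the table.

costs = {
    "CPU":  [151_800, 125_400, 120_000,  81_000,  64_800],
    "Disk": [ 10_000,  31_500,  28_000,  38_500,   9_000],
    "Tape": [ 66_500,  23_085,  58_449,  60_720,  61_650],
}

yearly_hw = [sum(col) for col in zip(*costs.values())]
print("Yearly hardware totals:", yearly_hw)        # 228300, 179985, 206449, 180220, 135450
print("5-year hardware total :", sum(yearly_hw))   # 930404, as in the table
```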

18 Capital Cost and Personnel Summary
Simulation Production and Analysis Center at MIT

Category         | FY10    | FY11    | FY12    | FY13    | FY14    | Total
CPU cores        | 152     | 152     | 160     | 120     | 96      | 680
Total CPUs       | 152     | 304     | 464     | 584     | 680     |
Disk (TB)        | 20      | 20      | 10      | 50      | 0       | 100
Total disk (TB)  | 20      | 40      | 50      | 100     | 100     |
CPU cost         | $52,440 | $41,800 | $40,000 | $27,000 | $21,600 | $182,000
Disk cost        | $10,000 | $9,000  | $4,000  | $17,500 | $0      | $40,500
Total hardware   | $62,440 | $50,800 | $44,000 | $44,500 | $21,600 | $222,500
Staff (DOE cost) | $37,040 | $37,280 | $39,400 | $40,600 |         | $197,120
Total cost       | $99,480 | $89,080 | $83,400 | $85,100 | $63,400 | $420,460

The MIT group has experience providing simulation production for all CMS-HI institutions.

19 Capital Cost and Personnel Summary
Cumulative Cost Comparisons

Category                       | FY10     | FY11     | FY12     | FY13     | FY14     | Total
Real data center at Vanderbilt | $347,834 | $335,379 | $368,059 | $381,908 | $380,165 | $1,813,344
Simulation center at MIT       | $99,480  | $89,080  | $83,400  | $85,100  | $63,400  | $420,460
Vanderbilt + MIT total         | $447,314 | $424,459 | $451,459 | $467,008 | $443,565 | $2,249,560
Single center at Vanderbilt    | $440,157 | $417,257 | $440,380 | $460,023 | $436,725 | $2,198,452

The small cost difference between having a single center at Vanderbilt and two separate centers at Vanderbilt and MIT is due to the slight difference in FTE charges at the two institutions.

20 Organization, Operations, and Oversight
- Impact of the CMS-HI Research Plan
- CMS-HI Compute Model
  - Integration into the existing CMS compute model
  - Service role to the total CMS-HI community
- Computing Requirements
  - Wide area networking
  - Compute power and local area networking
  - Disk and tape storage
- Capital Cost and Personnel Summary
- Offline Organization, Operations, and Oversight

