T1 visit to IN2P3 Computing
Topics:
- Resources
- User Support (-> questionnaire)
- CSA lessons (briefly)
- PADA
- CCRC’08 = CSA08

Matthias Kasemann, November 2007
CMS Computing Organization Chart

- Computing: Matthias Kasemann
- Offline: Patricia McBride
- Resource Board chair: Dave Newbold
- Common coordination:
  - Integration/CSA07: Ian Fisk / Neil Geddes
  - Resource coordination: Lucas Taylor
- Facilities/Infrastructure: Stefano Belforte / Frank Wuerthwein
- Operations Commissioning: Daniele Bonacorsi / Peter Kreuzer
- User Support: Kati Lassila-Perini / Akram Khan
- Data Operations: Christoph Paus / interim: Lothar Bauerdick
- Open items: a 2nd convener has been identified for Facilities Operation (awaiting CMS approval); Data Operations is still looking for a person at FNAL
CMS Computing Resource Requirements

Resource planning (2008 and beyond):
- Resources (CPU, disk, tape) need to be adjusted where possible to match the CMS requirements.
- Adjustments seem feasible, but details have to be optimized and negotiated.
- Currently there is a 30% deficit in tape resources for 2008.
- The resource estimate was recently updated based on the data model and software performance.

Promised resources for 2008:

| Center | CPU (kSI2k) | Disk (TB) | CPU/disk | Tape (TB) | CPU/tape | Expected # streams | Associated T2 |
|--------|-------------|-----------|----------|-----------|----------|--------------------|---------------|
| FZK    | 1200  | 650  | 1.8 | 900  | 1.3 | 5  | German T2, Poland, Switzerland |
| IN2P3  | 1490  | 780  | 1.9 | 1180 |     | 6  | French T2, China, Belgium |
| PIC    | 760   | 350  | 2.2 | 835  | 0.9 | 2  | Spain T2, Portugal |
| CNAF   | 1925  | 875  | 2.3 | 735  | 2.6 | 7  | INFN T2 |
| ASGC   | 1530  | 675  |     | 585  |     |    | Taipei, India, Pakistan, Korea? |
| RAL    | 1330  | 620  | 2.1 | 1280 | 1.0 |    | UK T2, Estonia, Finland |
| FNAL   | 4256  | 1986 |     | 4700 |     | 20 | US T2, Brazil |
| CERN   |       |      |     |      |     |    | Russia, Ukraine |
| Total  | 10610 | 5245 | 2.0 | 9506 | 1.1 | 50 |  |
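As a cross-check, a minimal sketch that recomputes the ratio columns from the pledge numbers copied out of the table above (CERN is omitted, as the slide gives no figures for it; small deviations from the printed ratios would reflect rounding in the original):

```python
# Recompute the CPU/disk and CPU/tape ratio columns of the 2008 pledge table.
# Numbers are copied from the table above; CERN is omitted (no figures given).
pledges = {
    #         CPU kSI2k, Disk TB, Tape TB
    "FZK":   (1200,  650,  900),
    "IN2P3": (1490,  780, 1180),
    "PIC":   ( 760,  350,  835),
    "CNAF":  (1925,  875,  735),
    "ASGC":  (1530,  675,  585),
    "RAL":   (1330,  620, 1280),
    "FNAL":  (4256, 1986, 4700),
}

for site, (cpu, disk, tape) in pledges.items():
    print(f"{site:6s} CPU/disk = {cpu / disk:.1f}  CPU/tape = {cpu / tape:.1f}")
```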
CMS Computing Resources (2008 Pledged)

- CMS needs all the T1 and T2 resources for successful data analysis.
- Total T2: … MSI2k, 4700 TB
  - T2(F): about 4% of CPU and disk
  - T2(Be): about 6% of CPU and disk
  - T2(China): about 3% of CPU, 4% of disk
- For the CMS planning we work with the resource numbers (pledges) from the WLCG MoU.
- CMS recently increased its estimates of the required storage (disk and tape).
- CMS is short of resources at the T1 centres, especially for storage; this risks impacting performance significantly.
Tier-1 Resources Outlook

[Chart: Tier-1 resources required by CMS vs. pledged for CMS, by year]

- The shortfall is largely in disk, which of course has the largest unit cost.
- However, disk is also the most flexible resource, in that we can tune the caching ratio against tape for different data types (a back-of-the-envelope sketch follows). The T1s might like to comment on how they actually plan to achieve such control over caching.
- The main shortfall will be in disk storage (even in 2008); we have to search for flexibility in the model here.
- A substantial increase is required for 2010/2011 (high-luminosity LHC running).
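The caching-ratio flexibility is linear arithmetic; a minimal sketch, using IN2P3's 1180 TB tape pledge from the table earlier — the cache fractions are illustrative assumptions, not CMS planning numbers:

```python
# Illustrative only: disk needed to keep a fraction of tape-resident data cached.
# 1180 TB is IN2P3's 2008 tape pledge from the table earlier; the cache
# fractions are made-up examples, not CMS planning numbers.
TAPE_RESIDENT_TB = 1180.0

def disk_needed(tape_tb: float, cache_fraction: float) -> float:
    """Disk (TB) required to keep `cache_fraction` of the tape data on disk."""
    return tape_tb * cache_fraction

for f in (0.1, 0.3, 0.5):
    print(f"cache {f:.0%} of {TAPE_RESIDENT_TB:.0f} TB on tape "
          f"-> {disk_needed(TAPE_RESIDENT_TB, f):.0f} TB of disk")
```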
Tier-2 Resources Outlook

[Chart: Tier-2 resources required by CMS vs. pledged for CMS, by year]

- We have far more T2 resource than originally foreseen, and it is increasing all the time; we do not foresee a ‘resource problem’ at the T2 level. This of course puts large demands on T1 storage.
- The ‘requirement’ here is artificial, in that the T2 replica factor is tuned to take account of what we actually have. If we chose to keep fewer replicas, the ‘requirement’ would go down, and we could use T2 CPU for other purposes (reco?). However, four copies of each dataset (on average) does not seem an unreasonable assumption (see the sketch below).
- A big increase is required for high-luminosity analysis starting in 2010; the 2010 numbers are not final yet.
- Some T2 pledges are known to be missing or changing.
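The replica-factor tuning is likewise linear; a minimal sketch — the dataset size is a made-up placeholder, and only the four-copy average comes from the slide:

```python
# Illustrative: aggregate T2 disk "required" scales linearly with the average
# number of dataset replicas kept across the T2s. The 1000 TB dataset size is
# a placeholder; only the ~4-copy average comes from the slide.
DATASET_TB = 1000.0

for replicas in (2, 3, 4, 5):
    print(f"{replicas} copies on average -> {DATASET_TB * replicas:.0f} TB of T2 disk")
```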
MoAs for Computing and Offline - Status

- Detailed project plan for MoAs completed July 2007
- Breakdown of all tasks to Level 4
- Resource-loaded with 165 named people and their FTE fractions

(Lucas Taylor, CMS-FB, 19 Sep 07)
User Support (2)

Kati developed a short questionnaire and asks each T1 centre to fill it in (see: …). All T1 centres are asked to complete it, to give an overview of the user-support situation at the remote centres.
CSA07 Goals

- Test and validate the components of the CMS Computing Model in a simultaneous exercise of the Tier-0, Tier-1 and Tier-2 workflows
- Test the CMS software, particularly the reconstruction and HLT packages
- Test the CMS production systems at 50% of the expected 2008 operation scale: workflow management, data management, facilities, transfers
- Test the computing facilities and mass-storage systems; demonstrate that data transfer between production and analysis sites in a timely way
- Test the alignment and calibration stream (AlCaReco)
- Produce, deliver and store AODs + skims for analysis by the physics groups
CSA07 Workflows

[Diagram: HLT output flows into the Tier-0 (prompt reconstruction into CASTOR; CAF for calibration and express-stream analysis); the Tier-0 exports to the Tier-1s at ~300 MB/s for re-reco and skims; the Tier-1s exchange data with the Tier-2s (simulation and analysis) at 20-200 MB/s and ~10 MB/s.]
Preparing for CSA07 (Jul-Sep)

- CMSSW software releases organized by the offline team
- Releases are tested by the data operations teams, then distributed and installed at the sites (not an easy process)
- Steps for preparing data for physics (pre-challenge workflows):
  1. Generation and simulation with Geant4 (at the Tier-2 centres)
  2. Digitization
  3. Digi2RAW: format change to look like data; input to HLT
  4. HLT processing
  5. Data split into 7 Primary Datasets (PDS) based on the HLT information, as sketched below
- The PDS split was a big addition in CSA07: the data samples more accurately reflect what will come from the detector, but are harder to produce.
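A toy sketch of the PDS split in step 5: each event is routed to one of seven primary datasets according to which HLT paths fired. The path prefixes, dataset names and first-match rule are invented for illustration; the real CSA07 assignment came from the trigger menu and could place an event in more than one dataset.

```python
# Toy sketch: route events into primary datasets (PDS) by fired HLT paths.
# The path prefixes, dataset names and first-match rule are invented for
# illustration; the real CSA07 split was defined by the trigger menu.
PDS_ROUTING = [
    ("HLT_Mu",     "Muon"),
    ("HLT_Ele",    "Electron"),
    ("HLT_Photon", "Photon"),
    ("HLT_Jet",    "Jets"),
    ("HLT_MET",    "MET"),
    ("HLT_Tau",    "Tau"),
]
FALLBACK_PDS = "MinBias"  # seventh dataset, catches everything else

def assign_pds(fired_paths):
    """Return the primary dataset for an event; first matching rule wins."""
    for prefix, pds in PDS_ROUTING:
        if any(path.startswith(prefix) for path in fired_paths):
            return pds
    return FALLBACK_PDS

print(assign_pds({"HLT_Mu11", "HLT_Jet50"}))  # -> Muon
print(assign_pds({"HLT_ZeroBias"}))           # -> MinBias
```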
Preparing for CSA07 (Jul-Sep), cont.

Planned workflows for the challenge:
- Reconstruction: HLT + RECO output (~1 MB)
- AOD production (~200 kB)
- Skims for physics analysis at the Tier-1 centres
- Re-reco (and redoing AOD production/skims) at the Tier-1 centres
- Analysis at the Tier-2 centres

Lessons from the CSA07 preparations:
- There was insufficient time for testing the components, since some of them arrived at the last moment.
- For CSA08 we have to devote more time to testing the components.
MC Production Summary

Substantial (… more resources used)
CSA07 Issues and Lessons

There are clearly areas that are going to need development.

Need to work on the CMSSW application:
- Reduce the number of workflows (in releases 1_7_0, 1_8_0 and 2_0_0)
- Reduce the memory footprint, to increase the number of events we can run and the effectively available resources
- Goal: CMSSW applications should stay within 1 GB of memory (see the sketch below)

Several areas should be improved:
- Access and manipulation of IOV constants (over Christmas)
- HLT data model (ongoing)
- A large new increase in memory, seen in 1_7_0 (mainly in the DPGs’ code), to be addressed immediately
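As a sketch of how a job wrapper might watch the 1 GB goal, here is a minimal Linux-only check of the process's resident set size; the /proc approach and threshold handling are assumptions, not the actual CMS tooling:

```python
# Minimal Linux-only sketch: compare this process's resident set size (RSS)
# against the 1 GB goal for CMSSW applications. Not the actual CMS tooling.
def rss_mb() -> float:
    """Current RSS in MB, read from /proc/self/status (Linux only)."""
    with open("/proc/self/status") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) / 1024.0  # value is in kB
    return 0.0

LIMIT_MB = 1024.0  # the "stay in 1 GB" goal
usage = rss_mb()
print(f"RSS = {usage:.0f} MB -> {'within' if usage <= LIMIT_MB else 'over'} the 1 GB goal")
```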
CSA07 Issues and Lessons (cont.)

- Increase the speed of I/O on mass storage; test using the new ROOT version (see the sketch below)
- Improve our testing and validation procedures for the applications and workflows
- Reduce the event size: RAW/DIGI and RECO size, AOD size
- Mini-workshop with the physics and DPG groups in February (CERN); two task forces have been created to prepare this workshop:
  - RECO Task Force (chair: Shahram Rahatlou)
  - Analysis Task Force (chair: Roberto Tenchini)
- Framework support for handling of RAW and RECO versus FEVT (foreseen for release 2_0_0)
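A crude way to quantify the I/O point is to time sequential reads; a minimal sketch in which plain file reads stand in for the real thing — the actual tests would exercise ROOT I/O against the mass-storage systems, and the path is a placeholder:

```python
# Crude sequential-read throughput probe. Plain file reads stand in for
# ROOT I/O against mass storage; the file path below is a placeholder.
import time

def read_throughput_mb_s(path: str, block: int = 1024 * 1024) -> float:
    """Read `path` sequentially and return the observed MB/s."""
    start, nbytes = time.perf_counter(), 0
    with open(path, "rb") as f:
        while chunk := f.read(block):
            nbytes += len(chunk)
    return nbytes / (1024 * 1024) / (time.perf_counter() - start)

# Example (placeholder path):
# print(f"{read_throughput_mb_s('/tmp/sample.root'):.0f} MB/s")
```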
CSA07 Issues and Lessons (cont.)

Need to work on the CMS tools:
- Augment the production tools to better handle continuous operations: roll back to known good points; modify workflows more simply
- Increase the speed of the bookkeeping system under specific load conditions
- Optimize the data transfers in PhEDEx for data availability
- Improve the analysis tool (CRAB)

Planning a workshop in January (Lyon?); it will be announced soon. Goals:
- Review the data and workload management components
- Improve the integration (communication) between the operation and development teams
- Also include the Tier-0 components
- Define the work plan for 2008
CSA07 Issues and Lessons (cont.)

Facility lessons: we learned a lot about operating Castor and dCache under load.
- Need to improve the rate of file opens
- Need to decrease the rate of errors
- Need to improve the scalability of some components
- Need to work on the stability of services at CERN and the Tier-1 centres
- Need to work on transfer quality when the farms are under heavy processing load

General lesson: much work is needed to achieve simultaneous, sustainable and stable operations.
PADA: Processing and Data Access Task Force

Draft mandate: integrate developments and services to bring our centres and services to production quality for processing and analysis.
- The Processing And Data Access (PADA) Task Force is an initiative within the Integration Program
- Designed to transition services developed in Offline to Operations: elements of integration and testing for the production, analysis and data management tools
- Designed to ensure that services and sites used in operations are production quality: elements of the commissioning program for links and sites; verify that items identified in CSA07 are solved
- Development work is primarily in Offline, but verification is in Integration

The plan:
- Build on the expertise of the distributed MC production teams and extend its scope; we need expertise close to the centres to help us here
- For 2008 we want to make this a recognized service contribution in the MoA scheme
- Initial time frame: 1 year, until we have seen the first data
- We need to define steps and milestones and recruit people; we hope for MC-OPS, DDT, ...
Final Check before Data Taking Starts: CCRC’08 = CSA08

- A combined challenge by all experiments must be used to demonstrate the readiness of the WLCG computing infrastructure before the start of data taking, at a scale comparable to the data taking of 2008.
- CMS fully supports the plan to execute this CCRC in two phases:
  - a set of functional tests in February 2008
  - the final challenge in May 2008 at 100% scale, starting with the readout of the experiment
- We must do this challenge as a WLCG collaboration: centres and experiments together.
- Combined planning has started: a mailing list has been created, agenda pages are set up, a phone conference is held every Monday afternoon (a difficult time for APR…), and there is a monthly session in the pre-GDB meeting.
CCRC’08 Schedule

Phase 1, February 2008:
- Possible scenario: blocks of functional tests, trying to reach 2008 scale for tests at…

Phase 2, May 2008:
- Full workflows at all centres, executed simultaneously by all 4 LHC experiments
- Use data from the cosmics data run; add artificial load to reach 100%
- Duration of the challenge: 1 week setup, 4 weeks challenge
V36 Schedule (Nov’07), Aug 2007 - May 2008, two parallel tracks:

1) Detector installation, commissioning & operation:
- Cooldown of magnet: test
- Tracker insertion
- Last heavy element lowered
- Test magnet at low current
- Beam-pipe closed and baked out
- 1 EE endcap installed, pixels installed
- Master contingency: 2nd ECAL endcap ready for installation end Jun’08

2) Preparation of software, computing & physics analysis:
- S/w release 1_6 (CSA07); CSA07
- S/w release 1_7 (CCR_0T, HLT validation)
- 2007 physics analyses: first results out
- CMS cosmic run CCR_0T (several short periods, Dec-Mar)
- S/w release 1_8 (lessons of ’07)
- Functional tests CSA08 (CCRC)
- S/w release 2_0 (CCR_4T, production startup MC samples)
- MC production for startup
- Cosmic run CCR_4T
- CSA08 (CCRC) Combined Computing Readiness Challenge
CCRC’08 Phase 1: February 2008

Possible scenario: blocks of functional tests, trying to reach 2008 scale for tests at:
- CERN: data recording, processing, CAF, data export
- Tier-1s: data handling (import, mass storage, export), processing, analysis
- Tier-2s: data analysis, Monte Carlo, data import and export

Proposed goals for CMS:
- Verify (not simultaneously) solutions to the CSA07 issues and lessons, and attempt to reach ’08 scale on individual tests
- A computing & software challenge: no physics delivery is attached to the CCRC’08/1 tests; the cosmics run and MC production have priority if possible
- Tests should be as independent from each other as possible, so they can be done in parallel
- An individual test is successful if sustained for n days
- If full ’08 scale is not possible (hardware), scale down to the hardware limit
CCRC’08/1: Proposed Scope

CERN: data recording, processing, CAF, data export
- Data recording at 250 Hz: from P5, HLT, streams, SM to T0, repacking, CASTOR
- Processing at 250 Hz at T0: CASTOR, CMSSW x.x, 20 output streams, CASTOR
- CAF: to be defined
- CERN data export: 600 MB/s aggregate to all T1 MSS

Tier-1s: data handling (import, mass storage, export), processing, analysis
- Data import: T0 to T1 MSS at full ’08 scale, to tape
- Data handling: I/O for processing and skimming at full ’08 scale, from tape
- Processing: re-reconstruction (incl. output streams) at full ’08 scale, from tape
- Skimming: develop an executable able to run with >20 skims; run it at the T1s
- Data export: T1 to all T1s at full ’08 scale, from tape/disk
- Data export: T1 to >5 T2s at full ’08 scale, from tape/disk
- Jobs: handle 50k jobs/day
- Data import: >5 T2s to T1 at 20 MB/s, to tape

Tier-2s: data analysis, Monte Carlo, data import and export
- Links commissioned: 40 T2s with at least 1 commissioned up- and downlink; 30 T2s with at least 3 (or 5) commissioned up- and downlinks
- Data transfer: import data from 3 T1s at 20 MB/s
- Data transfer: export data to 2 T1s at 10 MB/s
- Data analysis: handle 150k jobs/day (… hard to reach)

Reminder: the IN2P3 T1 is ~15% of the CMS T1 total (see the sketch below).
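Given the ~15% reminder, the per-site share of an aggregate target is one multiplication; a minimal sketch — note that only the 600 MB/s CERN export is explicitly labelled an aggregate on the slide, so applying the share to other bullets would be a further assumption:

```python
# Scale an aggregate CCRC'08 target down to IN2P3's ~15% share of the T1 total.
# Only the 600 MB/s CERN export is explicitly an aggregate on the slide.
IN2P3_SHARE = 0.15
CERN_EXPORT_MB_S = 600  # "600 MB/s aggregate to all T1 MSS"

print(f"IN2P3 import from CERN ~ {CERN_EXPORT_MB_S * IN2P3_SHARE:.0f} MB/s sustained")
```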
Summary (1/2)

In CSA07 a lot was learned and a lot was achieved.
- We hit most of the metrics, but separately and intermittently: several steps were accomplished simultaneously, and many workflow steps hit their metric routinely. The work now is to accomplish all steps simultaneously, and to provide stability in a sustainable way.
- Global connectivity between T1 and T2 sites is still an important issue. The DDT task force has been successful in increasing the number of working links; this effort must continue, and work must be done to automate the process of testing/commissioning the links.
- We still have to increase the number of people involved in facilities, commissioning and operations. Some recent actions:
  - A new (2nd) L2 appointed to lead facility operations (based at CERN)
  - The new Production And Data Access (PADA) Task Force is starting; it will include some of the people from the DDT task force and the MC production teams.
Summary (2/2)

- ~200M events processed and re-processed: calibration, MC production, reconstruction, skimming and merging were all tested successfully. We still need time to test the analysis model.
- The CSA07 goals for providing data for physics will be accomplished, albeit delayed due to schedule slips. Processing continues to complete the data samples for physics and detector studies.
- We are keeping the challenge infrastructure alive and trying to keep it stable going forward, and we continue to support global detector commissioning and physics studies.
- We have to prepare for the Combined Computing Readiness Challenge, CCRC = CSA08: without testing the software and infrastructure we are not prepared.
- We depend on the support of France, and on the success of IN2P3 and the French + Belgian + Chinese T2s, for the success of CMS computing!