T1 visit to IN2P3 Computing


T1 visit to IN2P3 Computing
Matthias Kasemann, November 28, 2007

Topics here:
- Resources
- User Support (-> questionnaire)
- CSA lessons (briefly)
- PADA
- CCRC’08 = CSA08

CMS Computing Organization Chart
- Computing: Matthias Kasemann; Offline: Patricia McBride
- Resource Board chair: Dave Newbold
- Common Coordination:
  - Integration/CSA07: Ian Fisk / Neil Geddes
  - Resource Coordination: Lucas Taylor
- Computing Facilities / Infrastructure Operations: Stefano Belforte / Frank Wuerthwein (2nd convener identified for Facilities Operations, awaiting CMS approval)
- Commissioning: Daniele Bonacorsi / Peter Kreuzer
- User Support: Kati Lassila-Perini / Akram Khan
- Data Operations: Christoph Paus / interim - Lothar Bauerdick (looking for a person at FNAL)

CMS Computing Resource requirements
Resource planning (2008 and beyond):
- Resources (CPU, disk, tape) need to be adjusted where possible to match the CMS requirements. Adjustments seem feasible, but details have to be optimized and negotiated.
- Currently: 30% deficit in tape resources for 2008.
- The resource estimate was recently updated based on data-model and software performance.

Promised resources for 2008:

Center   CPU (kSI2K)   Disk (TB)   CPU/disk   Tape (TB)   CPU/tape   Expected # streams   Associated T2
FZK      1200          650         1.8        900         1.3        5                    German T2, Poland, Switzerland
IN2P3    1490          780         1.9        1180                   6                    French T2, China, Belgium
PIC      760           350         2.2        835         0.9        2                    Spain T2, Portugal
CNAF     1925          875         2.3        735         2.6        7                    INFN T2
ASGC     1530          675                    585                                         Taipei, India, Pakistan, Korea?
RAL      1330          620         2.1        1280        1.0                             UK T2, Estonia, Finland
FNAL     4256          1986                   4700                   20                   US T2, Brazil
CERN                                                                                      Russia, Ukraine
Total    10610         5245        2.0        9506        1.1        50
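The ratio columns above follow directly from the pledged CPU, disk and tape numbers. As a quick cross-check (my recomputation, not part of the slide), a minimal Python sketch:

```python
# Sketch (not from the talk): recompute the ratio columns of the pledge table
# from the CPU, disk and tape numbers quoted above.
pledges = {
    #  site:   (CPU kSI2K, disk TB, tape TB)
    "FZK":   (1200,  650,  900),
    "IN2P3": (1490,  780, 1180),
    "PIC":   ( 760,  350,  835),
    "CNAF":  (1925,  875,  735),
    "ASGC":  (1530,  675,  585),
    "RAL":   (1330,  620, 1280),
    "FNAL":  (4256, 1986, 4700),
}

for site, (cpu, disk, tape) in pledges.items():
    print(f"{site:6s}  CPU/disk = {cpu / disk:4.1f}   CPU/tape = {cpu / tape:4.1f}")

cpu_tot  = sum(c for c, _, _ in pledges.values())
disk_tot = sum(d for _, d, _ in pledges.values())
tape_tot = sum(t for _, _, t in pledges.values())
print(f"Total   CPU/disk = {cpu_tot / disk_tot:4.1f}   CPU/tape = {cpu_tot / tape_tot:4.1f}")
```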

CMS computing resources (2008 pledged)
- CMS needs all the T1 and T2 resources for successful data analysis.
- Total T2 is: 18500 MSI2k, 4700 TB.
  - T2(F): about 4% of CPU and disk
  - T2(Be): about 6% of CPU and disk
  - T2(China): about 3% of CPU (4% of disk)
- For the CMS planning we work with the resource numbers (pledges) from the WLCG MoU.
- CMS recently increased the estimates for the required storage (disk and tape).
- CMS is short of resources at T1 centres, especially for storage. This risks impacting performance significantly.

Tier-1 Resources Outlook
- The shortfall is largely in disk, which of course has the largest unit cost. However, disk is also the most flexible resource, in that we can tune the caching ratio against tape for different data types (a sizing sketch follows below). T1s might like to comment on how they actually plan to achieve such control over caching.
- (Chart: required by CMS vs. pledged for CMS.)
- The main shortfall will be in disk storage (even in 2008); we have to search for flexibility in the model here.
- A substantial increase is required for 2010/2011 (with high-luminosity LHC running).
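The slide does not spell out the caching ratio, so the following is only a hypothetical sketch of how a T1 might size its disk cache per data tier against its tape-resident data; the volumes and cache fractions are illustrative, not CMS numbers.

```python
# Hypothetical sketch: size a T1 disk cache as a tunable fraction of the data
# custodially stored on tape, per data tier.  The volumes and cache fractions
# below are illustrative, not CMS numbers.
tape_resident_tb = {"RAW": 500, "RECO": 300, "AOD": 100}   # assumed volumes on tape
cache_fraction   = {"RAW": 0.1, "RECO": 0.5, "AOD": 1.0}   # tunable knobs per tier

disk_needed_tb = sum(tape_resident_tb[t] * cache_fraction[t] for t in tape_resident_tb)
print(f"Disk cache required: {disk_needed_tb:.0f} TB")
# Lowering the RECO fraction from 0.5 to 0.3 frees 60 TB of disk at the cost of
# more tape recalls when RECO is reprocessed or skimmed.
```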

Tier-2 Resources Outlook
- We have far more T2 resource than originally foreseen, and it is increasing all the time; we do not foresee a ‘resource problem’ at the T2 level. This of course puts large demands on T1 storage.
- The ‘requirement’ here is artificial, in that the T2 replica factor is tuned to take account of what we actually have. If we chose to keep fewer replicas, the ‘requirement’ would go down, and we could use T2 CPU for other purposes (reco?). However, four copies of the dataset (on average) does not seem an unreasonable assumption (see the scaling sketch below).
- (Chart: required by CMS vs. pledged for CMS.)
- A big increase is required for high-luminosity analysis starting in 2010; the 2010 numbers are not final yet.
- Some T2 pledges are known to be missing or changing.
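To make the replica-factor dependence concrete, a small illustrative sketch (my numbers, not from the slide; only the "four copies on average" figure is quoted above):

```python
# Illustrative sketch (not from the talk): the aggregate T2 disk "requirement"
# scales directly with the average number of dataset replicas kept at T2s.
analysis_dataset_tb = 1000   # assumed total size of the analysis samples (AOD + skims)
for replicas in (3, 4, 5):
    print(f"{replicas} replicas on average -> ~{analysis_dataset_tb * replicas} TB of T2 disk")
# With the quoted average of four copies, reducing the replica count for rarely
# used samples is the main lever for lowering the T2 disk requirement.
```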

MoAs for Computing and Offline - status
- Detailed project plan for MoAs completed in July 2007.
- Breakdown of all tasks to Level 4.
- Resource-loaded with 165 named people and their FTE fractions.
(Lucas Taylor, CMS-FB, 19 Sep 07)

User Support (2)
Kati has developed a short questionnaire and asks each T1 centre to fill it in, see: http://kati.web.cern.ch/kati/t1_quest.html
All T1 centres are asked to fill this out, to get an overview of the user-support situation at the remote centres.

CSA07 Goals
- Test and validate the components of the CMS Computing Model in a simultaneous exercise of the Tier-0, Tier-1 and Tier-2 workflows.
- Test the CMS software, particularly the reconstruction and HLT packages.
- Test the CMS production systems at 50% of the scale expected for 2008 operation: workflow management, data management, facilities, transfers.
- Test the computing facilities and mass storage systems.
- Demonstrate that data will transfer between production and analysis sites in a timely way.
- Test the alignment and calibration stream (AlCaReco).
- Produce, deliver and store AODs + skims for analysis by the physics groups.

CSA07 Workflows (diagram)
- Tier-0: HLT output goes into prompt reconstruction, with CASTOR mass storage; the CAF handles calibration and express-stream analysis.
- Tier-0 -> Tier-1s at 300 MB/s; the Tier-1s run re-reco and skims.
- Tier-1s -> Tier-2s at 20-200 MB/s; the Tier-2s run simulation and analysis, feeding data back to the Tier-1s at ~10 MB/s.
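For a rough feel of the quoted rates, converting them into daily volumes (my arithmetic, not in the slide):

```python
# Rough arithmetic on the rates quoted in the workflow diagram (not in the slide).
seconds_per_day = 86_400

def tb_per_day(rate_mb_s: float) -> float:
    """Convert a sustained rate in MB/s to TB/day."""
    return rate_mb_s * seconds_per_day / 1e6

print(f"T0 -> T1 at 300 MB/s   ~ {tb_per_day(300):.0f} TB/day")
print(f"T1 -> T2 at  20 MB/s   ~ {tb_per_day(20):.1f} TB/day per link")
print(f"T1 -> T2 at 200 MB/s   ~ {tb_per_day(200):.0f} TB/day per link")
print(f"T2 -> T1 at  10 MB/s   ~ {tb_per_day(10):.1f} TB/day per T2")
```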

Preparing for the CSA07 (Jul-Sep)
- CMSSW software releases are organized by the offline team, tested by the data operations teams, and distributed and installed at the sites (this is not an easy process).
- Steps for preparing data for physics (pre-challenge workflows):
  - Generation and simulation with Geant4 (at the Tier-2 centres)
  - Digitization
  - Digi2RAW: format change to look like data, input to HLT
  - HLT processing
  - Data are split into 7 Primary Datasets (PDS) based on the HLT information
- The PDS splitting was a big addition in CSA07: the data samples more accurately reflect what will come from the detector, but are harder to produce (a toy illustration of trigger-based splitting follows below).
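As a toy illustration of splitting events into primary datasets by HLT information (not CMS code; the path and dataset names below are invented):

```python
# Toy illustration (not CMS code): routing events into primary datasets based
# on which HLT paths fired.  Path and dataset names are invented.
from collections import defaultdict

# Hypothetical mapping of HLT paths to primary datasets.
PDS_MAP = {
    "HLT_Mu":      "SingleMuon",
    "HLT_Ele":     "SingleElectron",
    "HLT_Jet":     "JetMET",
    "HLT_MinBias": "MinimumBias",
}

def split_into_pds(events):
    """Assign each event to every primary dataset whose trigger fired."""
    datasets = defaultdict(list)
    for event in events:
        for path in event["hlt_fired"]:
            pds = PDS_MAP.get(path)
            if pds is not None:
                datasets[pds].append(event["id"])
    return datasets

# Example: an event firing both a muon and a jet path lands in two datasets,
# which is one reason the split samples are harder to produce and bookkeep.
events = [
    {"id": 1, "hlt_fired": ["HLT_Mu"]},
    {"id": 2, "hlt_fired": ["HLT_Jet", "HLT_Mu"]},
    {"id": 3, "hlt_fired": ["HLT_MinBias"]},
]
print(dict(split_into_pds(events)))
```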

Preparing for the CSA07 (Jul-Sep)
Planned workflows for the challenge:
- Reconstruction: HLT + RECO output (~1 MB)
- AOD production (~200 kB)
- Skims for physics analysis at the Tier-1 centres
- Re-reco (and redoing AOD production/skims) at the Tier-1 centres
- Analysis at the Tier-2 centres
Lesson from the CSA07 preparations: there was insufficient time for testing the components, since some of them arrived at the last moment. For CSA08 we have to devote more time to testing the components.
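Taking the quoted sizes as per-event figures and the ~200M events mentioned in the summary, a back-of-the-envelope storage estimate (my arithmetic, not in the slides):

```python
# Back-of-the-envelope storage estimate from the quoted figures (not in the slides):
# ~200M events, ~1 MB per RECO event, ~200 kB per AOD event.
n_events = 200e6
reco_mb  = 1.0
aod_mb   = 0.2

reco_tb = n_events * reco_mb / 1e6
aod_tb  = n_events * aod_mb / 1e6
print(f"RECO: ~{reco_tb:.0f} TB, AOD: ~{aod_tb:.0f} TB")   # ~200 TB and ~40 TB
```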

MC Production summary
Substantial (… more resources used).

CSA07 Issues and Lessons
There are clearly areas that are going to need development. We need to work on the CMSSW application:
- Reduce the number of workflows (in releases 1_7_0, 1_8_0 and 2_0_0).
- Reduce the memory footprint, to increase the number of events we can run and thereby the available resources. Goal: CMSSW applications should stay within 1 GB of memory (a wrapper sketch follows below).
- Several areas should be improved:
  - Access and manipulation of IOV constants (over Christmas)
  - HLT data model (ongoing)
  - The large increase in memory seen in 1_7_0, to be addressed immediately (mainly in DPG code)
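To show what a 1 GB ceiling could look like in practice, a hypothetical Linux job-wrapper sketch; this is not the CMS or CMSSW mechanism, just one way a batch wrapper might enforce such a limit.

```python
# Hypothetical job-wrapper sketch (not the CMS mechanism): enforce the 1 GB
# memory goal by capping the address space of the wrapped application.
# Once the limit is exceeded, allocations fail and the application aborts.
import resource
import subprocess

LIMIT_BYTES = 1 * 1024**3   # 1 GB target quoted on the slide

def run_capped(cmd):
    """Run cmd with a hard address-space limit set in the child process."""
    def set_limit():
        resource.setrlimit(resource.RLIMIT_AS, (LIMIT_BYTES, LIMIT_BYTES))
    return subprocess.run(cmd, preexec_fn=set_limit).returncode

# Illustrative usage:
# run_capped(["cmsRun", "config.py"])
```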

CSA07 Issues and Lessons
- Increase the speed of I/O on mass storage: tests using a new ROOT version.
- Improve our testing and validation procedures for the applications and workflows.
- Reduce the event size: RAW/DIGI and RECO size, AOD size.
- A mini-workshop with the physics and DPG groups will be held on February 5-7 at CERN. Two task forces have been created to prepare this workshop:
  - RECO Task Force (chair: Shahram Rahatlou)
  - Analysis Task Force (chair: Roberto Tenchini)
- Framework support for handling of RAW and RECO versus FEVT (foreseen for version 2_0_0).

CSA07 Issues and Lessons
Need to work on the CMS tools:
- Augment the production tools to better handle continuous operations: roll back to known good points (a toy checkpoint sketch follows below), and modify workflows more simply.
- Increase the speed of the bookkeeping system under specific load conditions.
- Optimize the data transfers in PhEDEx for data availability.
- Improve the analysis tool (CRAB).
A workshop is being planned for January 21-25, 2008 (Lyon?); it will be announced soon. Goals:
- Review the data and workload management components.
- Improve integration (communication) between the operations and development teams.
- Also include the Tier-0 components.
- Define the work plan for 2008.
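As a toy sketch of the "roll back to known good points" idea (not a CMS production tool; the state file and helper names are invented):

```python
# Hypothetical sketch of "roll back to known good points" for a multi-step
# production workflow; this is not a CMS production tool, just the idea.
import json
from pathlib import Path

STATE_FILE = Path("workflow_state.json")   # assumed checkpoint location

def checkpoint(step, info):
    """Record that `step` completed successfully, with bookkeeping info."""
    state = json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else []
    state.append({"step": step, "info": info})
    STATE_FILE.write_text(json.dumps(state, indent=2))

def last_good_step():
    """Return the most recent completed step, i.e. the point to roll back to."""
    if not STATE_FILE.exists():
        return None
    state = json.loads(STATE_FILE.read_text())
    return state[-1]["step"] if state else None

# Usage idea: after a failed re-reco pass, restart from last_good_step() instead
# of re-running the whole chain (generation, digitization, HLT, ...).
```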

CSA07 Issues and Lessons
Facility lessons: we learned a lot about operating CASTOR and dCache under load.
- Need to improve the rate of file opens.
- Need to decrease the rate of errors.
- Need to improve the scalability of some components.
- Need to work on the stability of services at CERN and the Tier-1 centres.
- Need to work on transfer quality when the farms are under heavy processing load.
General lesson: much work is needed to achieve simultaneous, sustainable and stable operations.

PADA: Processing And Data Access task force
Draft mandate: integrate developments and services to bring our centres and services to production quality for processing and analysis.
- The Processing And Data Access Task Force is an initiative in the Integration Program:
  - Designed to transition services developed in Offline to Operations: elements of integration and testing for the production, analysis, and data management tools.
  - Designed to ensure that services and sites used in operations are production quality: elements of the commissioning program for links and sites.
  - Verify that the items identified in CSA07 are solved: development work is primarily in Offline, but verification is in Integration.
- The plan is to build on the expertise of the distributed MC production teams and extend their scope; we need expertise close to the centres to help us here.
- For 2008 we want to make this a recognized service contribution in the MoA scheme.
- Initial time frame: 1 year, until we have seen the first data.
- We need to define steps and milestones and recruit people; we hope for MC-OPS, DDT, ....

Final check before data taking starts: CCRC’08 = CSA08 (CMS)
- A combined challenge by all experiments must be used to demonstrate the readiness of the WLCG computing infrastructure before the start of data taking, at a scale comparable to the data taking in 2008.
- CMS fully supports the plan to execute this CCRC in two phases:
  - a set of functional tests in February 2008
  - the final challenge in May 2008 at 100% scale, starting with the readout of the experiment
- We must do this challenge as a WLCG collaboration: centres and experiments together.
- Combined planning has started:
  - Mailing list created: wlcg-ccrc08@cern.ch
  - Agenda pages:
  - Phone conference every Monday afternoon (a difficult time for APR…)
  - Monthly session in the pre-GDB meeting

CCRC’08 Schedule
- Phase 1 - February 2008: possible scenario of blocks of functional tests, trying to reach the 2008 scale for tests at…
- Phase 2 - May 2008: full workflows at all centres, executed simultaneously by all 4 LHC experiments. Use data from the cosmics data run, adding artificial load to reach 100%. Duration of the challenge: 1 week setup, 4 weeks challenge.

V36 Schedule (Nov’07): timeline from Aug 2007 to May 2008, covering 1) detector installation, commissioning & operation and 2) preparation of software, computing & physics analysis. Milestones shown:
- S/w Release 1_6 (CSA07); CSA07
- Cooldown of magnet: test
- S/w Release 1_7 (CCR_0T, HLT validation)
- Tracker insertion
- 2007 physics analyses: first results out
- CMS Cosmic Run CCR_0T (several short periods, Dec-Mar)
- Last heavy element lowered
- S/w Release 1_8 (lessons of ’07)
- Test magnet at low current
- Functional tests CSA08 (CCRC)
- Beam-pipe closed and baked out
- S/w Release 2_0 (CCR_4T, production startup MC samples)
- 1 EE endcap installed, pixels installed
- MC production for startup
- Cosmic Run CCR_4T
- CSA08 (CCRC): Combined Computing Readiness Challenge
- Master contingency
- 2nd ECAL endcap ready for installation end Jun’08

CCRC’08 Phase 1: February 2008
Possible scenario: blocks of functional tests, trying to reach the 2008 scale for tests at:
- CERN: data recording, processing, CAF, data export
- Tier-1s: data handling (import, mass storage, export), processing, analysis
- Tier-2s: data analysis, Monte Carlo, data import and export
Proposed goals for CMS:
- Verify (not simultaneously) the solutions to the CSA07 issues and lessons, and attempt to reach the ’08 scale in individual tests.
- This is a computing & software challenge; no physics delivery is attached to the CCRC’08/1 tests. The cosmics run and MC production have priority if possible.
- Tests should be as independent from each other as possible, so that they can be done in parallel.
- An individual test is successful if sustained for n days (a toy check is sketched below).
- If the full ’08 scale is not possible (hardware), scale down to the hardware limit.
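The "sustained for n days" criterion could be checked mechanically; a toy illustration (not a CMS tool, and the daily rates below are made up):

```python
# Illustrative check (not a CMS tool) of the "successful if sustained for n days"
# criterion: did a measured daily rate stay at or above the target for n
# consecutive days?  The numbers below are made up.
def sustained(daily_rates, target, n_days):
    """True if `target` was met on at least n_days consecutive days."""
    run = 0
    for rate in daily_rates:
        run = run + 1 if rate >= target else 0
        if run >= n_days:
            return True
    return False

measured_mb_s = [580, 610, 605, 590, 615, 600, 598]    # hypothetical daily averages
print(sustained(measured_mb_s, target=600, n_days=3))  # False: no 3-day run at >= 600 MB/s
```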

CCRC’08/1: proposed scope
CERN: data recording, processing, CAF, data export
- Data recording at 250 Hz: from P5, HLT, streams, SM to T0, repacking, CASTOR
- Processing at 250 Hz at T0: CASTOR, CMSSW.x.x, 20 output streams, CASTOR
- CAF: to be defined
- CERN data export: 600 MB/s aggregate to all T1 MSS
Tier-1s: data handling (import, mass storage, export), processing, analysis
- Data import: T0 -> T1 to MSS at full ’08 scale, to tape
- Data handling: I/O for processing and skimming at full ’08 scale, from tape
- Processing: re-reconstruction (incl. output streams) at full ’08 scale, from tape
- Skimming: develop an executable able to run with >20 skims, and run it at the T1s
- Data export: T1 to all T1s at full ’08 scale, from tape/disk
- Data export: T1 to >5 T2s at full ’08 scale, from tape/disk
- Jobs: handle 50k jobs/day
- Data import: >5 T2s to T1 at 20 MB/s, to tape
Tier-2s: data analysis, Monte Carlo, data import and export
- Links commissioned: have 40 T2s with at least 1 commissioned up- and downlink, and 30 T2s with at least 3 (or 5) commissioned up- and downlinks
- Data transfer: import data from 3 T1s at 20 MB/s
- Data transfer: export data to 2 T1s at 10 MB/s
- Data analysis: handle 150k jobs/day (… hard to reach)
Reminder: the IN2P3 T1 is ~15% of the CMS T1s.
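Using the ~15% share quoted above, a rough scaling of two of the global targets down to the IN2P3 T1 (my arithmetic, not from the slide):

```python
# Rough scaling of global CCRC'08 targets to the IN2P3 T1, using the ~15%
# share quoted on the slide (my arithmetic, not from the slide).
in2p3_share = 0.15

global_targets = {
    "T0 export to all T1 MSS (MB/s)": 600,
    "T1 processing jobs per day":     50_000,
}

for name, value in global_targets.items():
    print(f"{name:32s} global {value:>7} -> IN2P3 ~{value * in2p3_share:.0f}")
# i.e. ~90 MB/s sustained import from CERN and ~7500 processing jobs/day.
```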

Summary (1/2)
- In CSA07 a lot was learned and a lot was achieved. We hit most of the metrics, but separately and intermittently:
  - Several steps were accomplished simultaneously.
  - Many workflow steps hit their metric routinely.
  - We now work on accomplishing all steps simultaneously, and on providing stability in a sustainable way.
- Global connectivity between T1 and T2 sites is still an important issue. The DDT task force has been successful in increasing the number of working links. This effort must continue, and work must be done to automate the process of testing/commissioning the links.
- We still have to increase the number of people involved in facilities, commissioning and operations. Some recent actions:
  - A new (2nd) L2 was appointed to lead facility operations (based at CERN).
  - The new Processing And Data Access (PADA) Task Force is starting; it will include some of the people from the DDT task force and the MC production teams.

Summary (2/2)
- ~200M events processed and re-processed. Calibration, MC production, reconstruction, skimming and merging were all tested successfully. We still need time to test the analysis model.
- The CSA07 goals for providing data for physics will be accomplished, albeit delayed due to schedule slips. Processing continues to complete the data samples for physics and detector studies.
- We are keeping the challenge infrastructure alive and trying to keep it stable, going forward. We continue to support global detector commissioning and physics studies.
- We have to prepare for the ‘Combined Computing Readiness Challenge’, CCRC = CSA08. Without testing the software and infrastructure we are not prepared.
- We depend on the support of France, and on the success of IN2P3 and of the French + Belgian + Chinese T2s, for the success of CMS computing!