CMS Computing Model
Presentation at the International Symposium on Grid Computing, Taipei, 27-29 April 2005
David Stickland, Princeton University
Not Presented Today…
- Data Challenge DC04 led to substantial changes to the CMS Event Data Model and to the Computing Model; those results, the reasons, the redesign etc. will not be presented today.
- CMS produced about 90M events in the last year or so using LCG2, GRID3 and local computing resources.
  - Even quite complex production steps (such as digitization with pile-up) are now being run on LCG.
- These events are now being served to CMS physicists for analysis; the data is analyzed where it is located.
- We have a prototype system, CRAB, that (many) non-gridified physicists are using to run their analysis jobs on LCG at CERN, CNAF, FNAL, Lyon, PIC, RAL, Legnaro, …
GRIDS
- The CMS Computing Model relies on GRIDS. CMS will work in at least three GRID environments:
  - CMS/LCG-EGEE
  - CMS/OSG
  - CMS/NorduGrid
  - (Why "CMS/"? Because in each case there is CMS-specific work to be done on top of, or around, the offered GRID environment; we expect/need the CMS communities associated with these GRIDS to do this work.)
- The GRID environments do not offer a consistent set of services at the same layer of the middleware stack.
- Since CMS needs to offer a uniform environment, we must be able to work with both upper- and lower-level GRID middleware components, matching our applications and interfaces to the specific GRID implementations (a minimal sketch of such an adapter layer follows).
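To make the "CMS-specific work on top of each grid" point concrete, here is a minimal sketch, in Python, of the kind of adapter layer implied: one CMS-facing submission interface with a backend per grid environment. All class and method names here are invented for illustration; they are not actual CMS or middleware APIs.

```python
# Hypothetical sketch of a thin CMS-side adapter layer: one submission
# interface, one backend per grid environment. Names are invented.
from abc import ABC, abstractmethod


class GridBackend(ABC):
    """Common interface that CMS tools program against."""

    @abstractmethod
    def submit(self, job: dict) -> str:
        """Submit a job, return a backend-specific job identifier."""


class LCGBackend(GridBackend):
    def submit(self, job: dict) -> str:
        # Would translate the CMS job description into LCG/EGEE middleware calls.
        return f"lcg-{job['name']}"


class OSGBackend(GridBackend):
    def submit(self, job: dict) -> str:
        # Would translate the CMS job description into OSG middleware calls.
        return f"osg-{job['name']}"


class NorduGridBackend(GridBackend):
    def submit(self, job: dict) -> str:
        # Would translate the CMS job description into NorduGrid calls.
        return f"ng-{job['name']}"


def submit_to_all(job: dict, backends: list[GridBackend]) -> list[str]:
    """CMS-level code sees one interface, whatever the middleware stratum."""
    return [backend.submit(job) for backend in backends]


print(submit_to_all({"name": "reco-001"},
                    [LCGBackend(), OSGBackend(), NorduGridBackend()]))
```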
Architectural Elements
- Data granularity
  - LHC triggers cut deeply into the physics; data always needs to be considered in its trigger context.
  - Split the annual O(2 PB) of raw data into O(50) trigger-determined datasets of ~40 TB each (see the arithmetic sketch below).
- Data tiers: RAW, Reconstructed, Analysis, Tag
  - Keep RAW and Reconstructed close together (initially at least).
  - Custodial RAW+Reco distributed over the Tier-1s (one copy somewhere).
  - Analysis data: full copy at each Tier-1, partial copies at many Tier-2s.
- Computing tiers
  - CMS/Tier-0: close connection to Online, highly organized.
  - Tier-1: data custody, selection, data distribution, (analysis), re-reconstruction.
  - Tier-2: analysis data under physicist "control", MC production.
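The granularity arithmetic can be spelled out explicitly; the numbers below are the order-of-magnitude figures from the slide, not precise values.

```python
# Illustrative arithmetic only, using the order-of-magnitude figures above.
ANNUAL_RAW_TB = 2_000        # O(2 PB) of raw data per year
N_TRIGGER_DATASETS = 50      # O(50) trigger-determined datasets

per_dataset_tb = ANNUAL_RAW_TB / N_TRIGGER_DATASETS
print(f"Average trigger dataset size: {per_dataset_tb:.0f} TB")   # ~40 TB
```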
Data Flow
[figure: data flow diagram, CMS/CAF]
Data Management
- CMS has chosen a baseline in which (initially) the bulk experiment-wide data is pre-located at sites, by policy and by explicit decisions taken by CMS to manage the data.
- The DM system will focus on bringing up this basic functionality; however, hooks will be provided for more dynamic movement of data in the future.
- The DM architecture is based on a set of loosely coupled components which, taken together, provide the necessary core functionality.
- The Workload Management (WM) system then needs only to steer jobs to the correct location (a sketch follows).
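As a rough illustration of "pre-located data plus a workload manager that only steers jobs", consider the sketch below; the placement map, dataset names and function are invented for illustration and do not reflect real CMS placements.

```python
# Hypothetical sketch: datasets are pre-placed at sites by CMS policy, so the
# workload manager only has to send each job where its data already sits.
# Placements and names are invented for illustration.
DATASET_PLACEMENT = {
    "jets-trigger-stream": ["FNAL", "CNAF"],
    "muon-trigger-stream": ["RAL", "PIC"],
}


def steer_job(dataset: str, candidate_sites: list[str]) -> str:
    """Pick a candidate site that already hosts the dataset; no dynamic data movement."""
    hosts = DATASET_PLACEMENT.get(dataset, [])
    for site in candidate_sites:
        if site in hosts:
            return site
    raise RuntimeError(f"No candidate site hosts {dataset}; placement is set by policy")


print(steer_job("jets-trigger-stream", ["CERN", "CNAF", "RAL"]))  # -> CNAF
```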
Components of the Workload and Data Management
- Dataset Bookkeeping System: what data exists
- Data Location Service: where is it
- File placement and transfer service: PhEDEx, on top of reliable transfer components/services
- Local file catalogs: site specific
- Data access and storage systems
  - SRM on the SE; POSIX-like access
  - CMS tools to allow CMS policy and space management
- Monitoring and job tracking: MonALISA, GridICE and BOSS
  - Information for priority changing
- Workflow support
  - Configuration control
  - Job preparation: CRAB (or son of CRAB) …
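One way to picture how these components chain together for a single analysis job: the bookkeeping system says what files a dataset contains, the location service says which sites host it, and the site-local catalog maps logical to physical file names. The sketch below is hypothetical; the real DBS/DLS/catalog interfaces differ.

```python
# Hypothetical sketch of the lookup chain a job-preparation tool (e.g. CRAB)
# would follow.  All interfaces and return values are invented for illustration.

def dataset_bookkeeping(dataset: str) -> list[str]:
    """'What data exists': logical file names belonging to the dataset."""
    return [f"/store/{dataset}/file{i:04d}.root" for i in range(3)]


def data_location_service(dataset: str) -> list[str]:
    """'Where is it': sites currently hosting the dataset."""
    return ["CNAF", "FNAL"]


def local_file_catalog(site: str, lfn: str) -> str:
    """Site-specific catalog: logical file name -> physical file name."""
    return f"srm://{site.lower()}.example.org{lfn}"


dataset = "muon-trigger-stream"
site = data_location_service(dataset)[0]
pfns = [local_file_catalog(site, lfn) for lfn in dataset_bookkeeping(dataset)]
print(site, pfns[0])
```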
Overview of CMS Planning
- Computing and software planning overview
  - The baseline system assumes thin grid middleware; the experiments are significant stakeholders with major input to operation and to the choices made.
  - The full computing system is essentially being put into operation now.
  - A new event data model / framework is being deployed this year.
- Schedule
  - July - December 2005: LCG Service Challenge 3
  - October 2005 - February 2006: magnet test, cosmic challenge
  - Summer 2006 - : LCG Service Challenge 4
  - Summer/Autumn 2006: DC06 (= SC4? or the production phase of SC4?)
  - February 2007: deliver LHC-ready computing and software
  - July 2007: LHC start-up
Outstanding GRID Issues
Top three issues and/or missing functionality:
- Priorities and share allocation in both workload and data management (data management aims for a solution in summer 2005)
- Comprehensive monitoring to understand resource usage
  - To guide re-prioritisation and policy changes
  - Monitoring tools != understanding (and we need to get them deployed!)
- Operational robustness and stability
(The fact that the basic SE and CE functionalities are not in this list is evidence that these core components of Grids are (getting) under control and are no longer the critical-path issues.)
Priorities and Share Allocation in WM and DM
- This is the least mature part of GRIDs to date; sensible system management requires many more control possibilities than are yet available.
  - The granularity of a VO is too crude.
- Tools like PhEDEx can apply policy, e.g.:
  - Data transfer from the Tier-0 runs at highest priority.
  - Physics group coordinators can preempt other inter-site transfers.
  - MC from a Tier-2 may run as "when nothing else to transfer", unless T2 buffer overflow dictates "as fast as you can". Etc.
- Fair-share between experiments: presumably a site responsibility.
- Intra-experiment fair-share: a site responsibility (separate VOs), or the experiment's?
  - How to stop one group/user exhausting the local allocation during the first week of the month when it is known that another group will need it in the last two weeks?
- More granular priorities and ACLs are mandatory (a sketch of such a policy follows).
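The kind of transfer policy listed above could be captured in a simple priority function; the sketch below is hypothetical (it is not how PhEDEx actually encodes policy) and the thresholds and priority values are invented.

```python
# Hypothetical transfer-priority policy along the lines described above.
# Lower number = higher priority; values and thresholds are invented.
def transfer_priority(source_tier: int, requested_by_group_coordinator: bool,
                      is_mc_upload: bool, t2_buffer_fill: float) -> int:
    if source_tier == 0:                   # Tier-0 export runs at highest priority
        return 0
    if requested_by_group_coordinator:     # physics group coordinators can preempt
        return 1
    if is_mc_upload and source_tier == 2:
        if t2_buffer_fill > 0.9:           # buffer nearly full: "as fast as you can"
            return 1
        return 9                           # otherwise: "when nothing else to transfer"
    return 5                               # default inter-site traffic


print(transfer_priority(0, False, False, 0.2))   # Tier-0 -> Tier-1: 0
print(transfer_priority(2, False, True, 0.95))   # T2 MC, buffer overflowing: 1
print(transfer_priority(2, False, True, 0.3))    # T2 MC, quiet buffer: 9
```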
Networks
- We are pushing the available networks to their limits on the Tier-1/Tier-2 connections.
- The Tier-0 needs ~2x10 Gb/s links for CMS.
- Each Tier-1 needs ~10 Gb/s links.
- Each Tier-2 needs 1 Gb/s for its incoming traffic.
- There will be extreme upward pressure on these numbers as the distributed computing becomes more and more usable and effective.
- Service Challenges with LCG, the CMS Tier-1 centers and the CMS Data Management team/components are planned for this year, to ensure we are on the path to achieving these performances.
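To make these figures concrete, a back-of-the-envelope transfer-time calculation: at a theoretical 10 Gb/s, one ~40 TB trigger dataset takes roughly nine hours to move; into a 1 Gb/s Tier-2 it takes nearly four days. This ignores protocol overhead, contention and storage-system limits.

```python
# Back-of-the-envelope transfer-time arithmetic using the figures above.
# Ignores protocol overhead, contention and storage-system limits.
def transfer_hours(size_tb: float, link_gbps: float) -> float:
    size_bits = size_tb * 1e12 * 8               # TB -> bits (decimal units)
    return size_bits / (link_gbps * 1e9) / 3600  # seconds -> hours


print(f"{transfer_hours(40, 10):.1f} h")   # 40 TB over 10 Gb/s: ~8.9 h
print(f"{transfer_hours(40, 1):.0f} h")    # 40 TB into a 1 Gb/s Tier-2: ~89 h
```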
(Draft) Integration Schedule 2005
- June 2005 (integration testing: June - August)
  - Data management: new dataset bookkeeping system, data location index; able to represent the information in the current production database (RefDB) in a way that workload management can use
  - Data transfers: priority-based + dynamic routing, easy deployment
  - Production: ability to push pile-up to worker nodes, file merging option
  - Workload management, conditions: (to be defined)
- September 2005 (integration testing: September - November)
  - Data management + workload management: functionally complete for bulk (collaboration-wide) data processing; output file harvesting; support for the new EDM
  - Production: revised RefDB, task-queue based job pull system
  - Conditions: delivery of conditions data to sites and use there
  - New EDM + framework complete, ORCA migration begins
- December 2005 (integration testing: December - February)
  - Data management + workload management: user / analysis output data, n-tuple handling; ability to serve production needs
  - Production: can execute bulk production using DM/WM tools
  - Conditions: delivery of new conditions back to the detector facility
  - ORCA based on the new framework
Summary
- All Baseline Services need to be deployed this year, even if they are still evolving.
- Components of the Service Challenges need to continue as production services.
- Experiments must work with at least three grid implementations; we had better get used to it…
- All Tier-1 centers are getting underway now.
- All first-round Tier-2 centers need to be operational by, about, SC4.
- We live in interesting times… (which, apparently contrary to Robert F. Kennedy's assertion, is not a Chinese curse, but it seems appropriate anyway).