CMS Software and Computing
FNAL Internal Review of USCMS Software and Computing
David Stickland, Princeton University
CMS Software and Computing Deputy Project Manager
CMS Software and Computing - Overview
Outline:
- Status of Software / Computing Milestones
- CMS Software Architecture development plans
- CMS Computing plans
- CMS Reorganization of Software and Computing Project
- Key issues in 2001
Strategic Software Choices
Modular Architecture (flexible and safe)
- Object-Oriented Framework
- Strongly-typed interface
Uniform and coherent software solutions
- One main programming language
- One main persistent object manager
- One main operating system
Adopt standards
- Unix, C++, ODMG, OpenGL...
Use widely used, well-supported products (with a healthy future)
- Linux, C++, Objectivity, Qt...
Mitigate risks
- Proper planning with milestones
- Track technology evolution; investigate and prototype alternatives
- Verify and validate migration paths; have a fall-back solution ready
CARF: CMS Analysis & Reconstruction Framework
[Architecture diagram. Labels:
Physics modules: Reconstruction Algorithms, Data Monitoring, Event Filter, Physics Analysis
Objects: Calibration Objects, Event Objects, Configuration Objects
Frameworks: Generic Application Framework, Specific Framework
Utility Toolkit: ODBMS, Geant3/4, CLHEP, Paw Replacement, C++ standard library, Extension toolkit
CMS adapters and extensions]
Milestones: Software
CMS software development strategy: first the transition to C++, then functionality, then performance
[Milestone chart; date marker: Oct 2000]
Software Life Cycle
[Timeline diagram. Labels:
Detector: Prototypes, Mass-production, Installation & Commissioning, Maintenance & Operation
Computing Hardware, Software & Integration: Functional Prototype, Fully Functional, Production System
Software Life Cycle: Design, Prototype, Test & Integrate, Deploy (Cyclic Releases)]
Software Development Phases
1: Proof of Concept: End of 1998
- Basic functionality
- Very loosely integrated
2: Functional Prototype
- More complex functionality
- Integrated into projects
- Preparation for Trigger and DAQ TDRs
- Reality Check: ~1% Data Challenge
3: Fully Functional System
- Complete Functionality
- Integration across projects
- Reality Check: ~5% Data Challenge
- SW/computing TDR
- Preparation for Physics TDR
4: Pre-Production System
- Reality Check: ~20% Data Challenge
5: Production System
- Online / Trigger Systems: 75 100Hz
- Offline Systems: few PBytes / year
- 10^9 events / year to look for a handful of (correct!) Higgs
- Highly distributed collaboration and resources
- Long lifetime
[Diagram axis label: Logarithmic]
Significant Requirements from CMS TDRs
or: “it’s not just an evolution to 2005 software”
Major Core Software Milestones = TDR
- Trigger: Dec 2000
- DAQ: Dec 2001
- Software & Computing: Dec 2002
- Physics: Dec
Milestones: Data
Raw data: 1 MB / event, 100 Hz => 1 PB / year
+ reconstructed data, physics objects, calibration data, simulated data, ...
Now moving towards distributed production and analysis
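The 1 PB / year figure follows from simple arithmetic. A minimal sketch, assuming about 10^7 seconds of data taking per year (a common rule of thumb, not stated on the slide) and decimal units (1 PB = 10^9 MB); the function names are illustrative only:

```python
# Back-of-envelope check of the raw data volume quoted on the slide.
EVENT_SIZE_MB = 1.0             # from the slide: 1 MB per event
EVENT_RATE_HZ = 100.0           # from the slide: 100 Hz
LIVE_SECONDS_PER_YEAR = 1.0e7   # ASSUMPTION: ~10^7 s of data taking per year
MB_PER_PB = 1.0e9               # decimal units: 1 PB = 10^9 MB


def events_per_year(rate_hz=EVENT_RATE_HZ, live_seconds=LIVE_SECONDS_PER_YEAR):
    """Number of events recorded in one year of data taking."""
    return rate_hz * live_seconds


def raw_volume_pb_per_year(event_size_mb=EVENT_SIZE_MB,
                           rate_hz=EVENT_RATE_HZ,
                           live_seconds=LIVE_SECONDS_PER_YEAR):
    """Raw data volume per year in PB (raw data only, no derived data)."""
    return event_size_mb * events_per_year(rate_hz, live_seconds) / MB_PER_PB


if __name__ == "__main__":
    print(f"events / year: {events_per_year():.1e}")
    print(f"raw data / year: {raw_volume_pb_per_year():.1f} PB")
```

Under these assumptions the event count is 10^9 per year and the raw volume is 1 PB per year; reconstructed, calibration and simulated data come on top of that.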
Milestones: Hardware TDR and MoU
Why Worldwide Computing?
Regional Center Concept: Advantages
Common access for Physicists everywhere
Maximize total funding resources while meeting the total computing need
Proximity of datasets to appropriate resource
- Tier-n Model
Efficient use of network bandwidth
- Local > regional > national > international
Utilizing all intellectual resources
- CERN, national labs, universities, remote sites
- Scientists, students
Greater flexibility to pursue different physics interests, priorities, and resource allocation strategies by region
Systems’ complexity
Partitioning of facility tasks, to manage and focus resources
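The preference "Local > regional > national > international" can be read as a lookup order: a request for a dataset is served from the nearest site that holds a copy. A minimal sketch of that idea; the `Site` class, the site and dataset names, and `find_nearest_replica` are hypothetical illustrations, not part of any CMS or MONARC software:

```python
from dataclasses import dataclass, field

# Preference order taken from the slide.
SCOPE_ORDER = ["local", "regional", "national", "international"]


@dataclass
class Site:
    name: str
    scope: str  # one of SCOPE_ORDER, relative to the requesting user
    datasets: set = field(default_factory=set)


def find_nearest_replica(dataset, sites):
    """Return the site with the closest scope that holds `dataset`, or None."""
    holders = [s for s in sites if dataset in s.datasets]
    if not holders:
        return None
    return min(holders, key=lambda s: SCOPE_ORDER.index(s.scope))


# Hypothetical catalogue, for illustration only.
sites = [
    Site("university-cluster", "local", {"calib-v1"}),
    Site("tier2-centre", "regional", {"calib-v1", "reco-sample"}),
    Site("tier1-centre", "national", {"reco-sample", "raw-sample"}),
    Site("cern", "international", {"raw-sample", "reco-sample", "calib-v1"}),
]

if __name__ == "__main__":
    for name in ("calib-v1", "reco-sample", "raw-sample"):
        print(name, "->", find_nearest_replica(name, sites).name)
```

Wide-area transfers are only needed when no closer site holds the dataset, which is the bandwidth argument made on the slide.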
Milestone Review
Regional Centres
- Identify initial candidates: 06/2000
- Turn on functional centres: 12/2002
- Fully operational centres: 06/2005
Central (CERN) systems
- Functional prototype: 12/2002
- Turn on initial systems: 12/2003
- Fully operational systems: 06/2005
Need to define intermediate working milestones
CMS Regional Centre Prototypes 2003
Candidates   Tier1          Tier2
Finland      -              Helsinki
France       CCIN2P3/Lyon   ?
India        -              Mumbai
Italy        INFN           INFN, at least 3 sites
Pakistan     -              Islamabad
Russia       Moscow         Dubna
UK           RAL            ?
US           FNAL           Caltech, U.C.-San Diego, Florida, Iowa, Maryland; Minnesota, Wisconsin and others
The Grid Services Concept
Standard services that
- Provide uniform, high-level access to a wide range of resources (including networks)
- Address interdomain issues: security, policy
- Permit application-level management and monitoring of end-to-end performance
- Perform resource discovery
- Manage authorization and prioritization
Broadly deployed (like Internet Protocols)
Why CMS (HEP) in the GRID?
GRID Middleware provides a route towards effective use of distributed resources and complexity management
GRID design matches the MONARC hierarchical Model
We (HEP) have some hand-made, Grid-like tools. At the scale of CMS Computing, a more professional approach is needed if the system is to last for decades.
CMS already participates in relevant GRID initiatives, e.g.
- The Particle Physics Data Grid (PPDG) [US]: Distributed Data Services and Data Grid System Prototypes
- Grid Physics Network (GriPhyN) [US]: Production-Scale Data Grids
- DATAGRID [EU]: Middleware development and Real Applications Test
Grid Data Management Prototype (GDMP)
Distributed Job Execution and Data Handling: Goals
- Transparency
- Performance
- Security
- Fault Tolerance
- Automation
[Diagram: Submit job; Site A, Site B, Site C; Job writes data locally; Replicate data]
- Jobs are executed locally or remotely
- Data is always written locally
- Data is replicated to remote sites
GDMP V1.0: Caltech, EU DataGrid, PPDG, GriPhyN; tests by Caltech, CERN, FNAL and INFN, for CMS “HLT” Production, fall 2000
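The data flow on this slide (a job writes its output locally, and the data is then replicated to remote sites) can be sketched as a toy model. This is not the GDMP API; the `Site` class and its `subscribe`, `run_job` and `replicate` methods are hypothetical names used only for illustration:

```python
class Site:
    """Toy model of a site with local storage and a list of subscriber sites."""

    def __init__(self, name):
        self.name = name
        self.files = {}        # file name -> content held in local storage
        self.subscribers = []  # sites that receive replicas of local files

    def subscribe(self, other):
        """Register another site to receive replicas of files written here."""
        self.subscribers.append(other)

    def run_job(self, job_name, produce):
        """Run a job at this site; its output is always written locally."""
        filename = f"{job_name}.out"
        self.files[filename] = produce()
        return filename

    def replicate(self, filename):
        """Copy a local file to subscribers that lack it; return their names."""
        updated = []
        for site in self.subscribers:
            if filename not in site.files:
                site.files[filename] = self.files[filename]
                updated.append(site.name)
        return updated


# Site A produces data; sites B and C receive replicas.
site_a, site_b, site_c = Site("A"), Site("B"), Site("C")
site_a.subscribe(site_b)
site_a.subscribe(site_c)

out = site_a.run_job("hlt_sim_001", lambda: "simulated events")
copied_to = site_a.replicate(out)

if __name__ == "__main__":
    print(out, "replicated to", copied_to)
```

Replication here skips sites that already hold the file, so repeating it after a partial failure is harmless. That is one simple way to approach the fault-tolerance and automation goals; a real system would also need transfer verification and security, which this sketch leaves out.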
Computing Progress
CMS is progressing towards a coherent distributed system to support production and analysis
We need to study the problems of, and prototype solutions for, distributed analysis by hundreds of users in many countries
Production, via prototypes, will lead to decisions about the architecture on the basis of measured performance and possibilities
CMS Software and Computing
Evolve the organization to build a complete and consistent Physics Software
Recognize cross-project nature of key deliverables
- Core Software and Computing (CSW&C)
  - More or less what US calls SW&C “Project”
- Physics Reconstruction & Selection (PRS)
  - Consolidate Physics Software work between the detector groups targeted at CMS deliverables (HLT design, test-beams, calibrations, Physics TDR, ...)
- Trigger and Data Acquisition (TRIDAS)
  - Online Event Filter Farm
Cross-Project Working Groups
[Diagram. Labels:
Projects: Core Software and Computing, Physics Reconstruction & Selection, TRIDAS
Working groups: Reconstruction Project, Simulation Project, Calibration, etc.
Joint Technical Board]
Key issues for 2001
Choice of baseline for database
Ramp-up of Grid R&D and use in production activities
Perform HLT data challenge (~500 CPUs at CERN, ~500 offsite, 20 TB)
Continue work on test beams, and validate simulation
Validate OSCAR/Geant4 for detector performance studies
Overall assessment, formalization and consolidation of SW systems and processes
Work towards Computing MoU in the collaboration