CMS on the Grid
Vincenzo Innocente, CERN/EP
Toward a Fully Distributed Physics Analysis
Beauty 2002
Computing Architecture: Challenges at LHC
- Bigger experiment, higher rate, more data
- Larger and dispersed user community performing non-trivial queries against a large event store
- Make best use of new IT technologies
- Increased demand for both flexibility and coherence:
  - ability to plug in new algorithms
  - ability to run the same algorithms in multiple environments
  - guarantees of quality and reproducibility
  - high performance
  - user-friendliness
Challenges: Complexity
- Detector: ~2 orders of magnitude more channels than today
- Triggers must choose correctly only 1 event in every 400,000
- Level 2 & 3 triggers are software-based (must be of the highest quality)
- Computing resources will not be available in a single location
Challenges: Geographical Spread
- 1700 physicists, 150 institutes, 32 countries
- CERN Member States: 55%; Non-Member States: 45%
Major challenges associated with:
- Communication and collaboration at a distance
- Distributed computing resources
- Remote software development and physics analysis
b Physics: a Challenge for CMS Computing
- A large distributed effort already today:
  - ~150 physicists in the CMS Heavy-Flavour group
  - > 40 institutions involved
- Requires precise and specialized algorithms for vertex reconstruction and particle identification
- Most of the events CMS triggers include B particles
- High-level software triggers select exclusive channels among events triggered in hardware using inclusive conditions
Challenges:
- Allow remote physicists to access detailed event information
- Effectively migrate reconstruction and selection algorithms to the High Level Trigger
CMS Experiment-Data Analysis
[Data-flow diagram: the Event Filter Object Formatter, Quasi-online Reconstruction, Simulation, Calibrations, Data Quality monitoring, Group Analysis and on-demand User Analysis all store and request (parts of) events, reconstructed objects and calibrations through a Persistent Object Store Manager backed by a Database Management System; environmental data from Detector Control and Online Monitoring feed the same store; the chain ends in the physics paper.]
Analysis Environments
- Real-time event filtering and monitoring:
  - data-driven pipeline
  - high reliability
  - pre-emptive
- Simulation, reconstruction and event classification:
  - massive parallel batch-sequential processing
  - excellent error recovery and rollback mechanisms
  - excellent scheduling and bookkeeping systems
- Interactive statistical analysis:
  - rapid application development environment
  - excellent visualization and browsing tools
  - human-"readable" navigation
Analysis Model
Hierarchy of processes (experiment, analysis groups, individuals):
- Reconstruction (experiment-wide activity, 10^9 events): driven by new detector calibrations or improved understanding; 3000 SI95 s/event, 1 job per year, plus re-processing ~3 times per year (3000 SI95 s/event, 3 jobs per year)
- Selection (~20 groups' activity, 10^9 -> 10^7 events): trigger-based and physics-based refinements; iterative selection about once per month; 25 SI95 s/event, ~20 jobs per month
- Analysis (~25 individuals per group, 10^6-10^8 events): different physics cuts and MC comparison, ~1 pass per day; algorithms applied to data to get results; 10 SI95 s/event, ~500 jobs per day
- Monte Carlo production: 5000 SI95 s/event
(1 GHz CPU ~ 50 SI95)
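To give a feel for the scale, a minimal back-of-the-envelope sketch using only the figures above plus the 10,000-CPU baseline quoted on the next slide:

```python
# Back-of-the-envelope scale estimate from the analysis-model numbers above.
# Assumptions: 1 GHz CPU ~ 50 SI95, 10^9 events per reconstruction pass,
# and a farm of 10,000 CPUs (the baseline quoted on the next slide).

SI95_PER_CPU = 50            # 1 GHz CPU ~ 50 SI95
N_EVENTS     = 1e9           # events in one experiment-wide pass
COST_RECO    = 3000          # SI95*s per event for reconstruction
N_CPUS       = 10_000

cpu_seconds = N_EVENTS * COST_RECO / SI95_PER_CPU   # seconds of 1 GHz CPU time
wall_days   = cpu_seconds / N_CPUS / 86400

print(f"One reconstruction pass: {cpu_seconds/86400/365:.0f} CPU-years "
      f"on 1 GHz CPUs, ~{wall_days:.0f} days of wall time on {N_CPUS} CPUs")
```

The result, roughly 70 days of wall time per pass on the full farm, is consistent with the quoted ~3 re-processings per year.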
Data Handling Baseline
CMS computing in 2007; data model with typical objects of 1 kB-1 MB:
- 3 PB of storage space
- 10,000 CPUs
- 31 sites: 1 Tier-0 + 5 Tier-1 + 25 Tier-2, spread all over the world
- I/O rates disk -> CPU: 10,000 MB/s aggregate, on average 1 MB/s per CPU
  - RAW -> ESD generation: ~0.2 MB/s I/O per CPU
  - ESD -> AOD generation: ~5 MB/s I/O per CPU
  - AOD analysis into histograms: ~0.2 MB/s I/O per CPU
  - DPD generation from AOD and ESD: ~10 MB/s I/O per CPU
- Wide-area I/O capacity: of the order of 700 MB/s aggregate over all payload intercontinental TCP/IP streams
This implies a system with heavy reliance on access to site-local (cached) data: a Data Grid.
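A small illustrative check of why site-local caching is unavoidable, using only the numbers above (pure arithmetic, no CMS software assumed):

```python
# Why site-local caching: compare aggregate disk->CPU demand with wide-area capacity.
# All numbers are taken from the baseline figures above.

n_cpus       = 10_000
io_per_cpu   = 1.0      # MB/s average disk->CPU per CPU
wan_capacity = 700.0    # MB/s aggregate intercontinental capacity

local_demand = n_cpus * io_per_cpu          # 10,000 MB/s
fraction_wan = wan_capacity / local_demand  # share servable from remote sites

print(f"Aggregate disk->CPU demand: {local_demand:,.0f} MB/s")
print(f"WAN could feed only {fraction_wan:.0%} of it -> data must be cached locally")
```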
Prototype Computing Installation (T0/T1)
[Figure only: the prototype Tier-0/Tier-1 computing installation; no further text on the slide.]
Three Computing Environments, Different Challenges
- Centralized quasi-online processing:
  - keep up with the rate
  - validate and distribute data efficiently
- Distributed organized processing:
  - automation
- Interactive chaotic analysis:
  - efficient access to data and "metadata"
  - management of "private" data
  - rapid application development
Migration
Today's Nobel prize becomes tomorrow's trigger (and the background of the day after). Boundaries between running environments are fuzzy:
- "Physics analysis" algorithms should migrate up to the online system to make the trigger more selective
- Robust batch systems should be made available for physics analysis of large data samples
- The results of offline calibrations should be fed back to the online system to make the trigger more efficient
The Final Challenge: a Coherent Analysis Environment
Beyond the interactive analysis tool (user point of view), data analysis and presentation (n-tuples, histograms, fitting, plotting, ...) is only one of a great range of activities with fuzzy boundaries (developer point of view):
- Batch and interactive work, from "pointy-clicky" GUIs to Emacs-like power tools to scripting
- Setting up configuration-management tools, application frameworks and reconstruction packages
- Data-store operations: replicating entire data stores; copying runs, events and event parts between stores; not just copying but also doing something more complicated (filtering, reconstruction, analysis, ...)
- Browsing data stores down to object detail level
- 2D and 3D visualisation
- Moving code across final analysis, reconstruction and triggers
Today this involves (too) many tools.
Requirements on Data Processing
- High efficiency:
  - processing-site hardware optimization
  - processing-site software optimization (the job structure depends very much on the hardware setup)
- Data quality assurance:
  - data validation
  - data history (job bookkeeping)
- Automation (a minimal sketch follows this list):
  - input data discovery
  - crash recovery
  - resource monitoring: identify bottlenecks and fragile components
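A minimal sketch of the kind of automation meant here: discover inputs not yet processed, run the job, and resubmit on failure. Everything in it (file layout, the `process` command, retry count) is hypothetical; the real CMS tools for this, IMPALA and BOSS, are described later.

```python
import glob, subprocess, time

MAX_RETRIES = 3

def pending_inputs():
    """Hypothetical input-data discovery: input files without a matching '.done' marker."""
    return [f for f in glob.glob("input/*.ntpl") if not glob.glob(f + ".done")]

def run_job(input_file):
    """Run one processing step with simple crash recovery (resubmit up to MAX_RETRIES)."""
    for attempt in range(1, MAX_RETRIES + 1):
        # 'process' is a placeholder for the real executable (e.g. a CMSIM or ORCA wrapper)
        rc = subprocess.call(["process", input_file])
        if rc == 0:
            open(input_file + ".done", "w").close()   # bookkeeping marker
            return True
        print(f"{input_file}: attempt {attempt} failed (rc={rc}), resubmitting")
        time.sleep(60)
    return False

for f in pending_inputs():
    run_job(f)
```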
Analysis Part
- Physics data analysis will be done by hundreds of users
- The analysis part is connected to the same catalogs, maintaining a global view of all data
- Big analysis jobs can use the production job-handling mechanisms
- Analysis services based on tags
[Screenshots: the Lizard Qt plotter showing an ANAPHE histogram extended with pointers to CMS events; Emacs used to edit a CMS C++ plugin that creates and fills the histograms; an OpenInventor-based display of the selected event; a Python shell with the Lizard and CMS modules loaded.]
Varied Components and Data Flows, One Portal
[Architecture diagram: the production system and data repositories (Tier 0/1/2), ORCA analysis farms (or a distributed "farm" using Grid queues), RDBMS-based data warehouses and PIAF/Proof-style analysis farms (Tier 1/2) are connected by TAG and AOD extraction/conversion/transport services; data-extraction and query Web services expose them to the user's local analysis tool (Lizard/ROOT/...), Web browser and local disk (Tier 3/4/5). Arrows distinguish the production data flow, the user TAGs/AODs data flow and the physics query flow; a tool plugin module acts as the single portal.]
CMS TODAY
- Home-made tools
- Data production and analysis exercises; granularity (data product): the dataset (a simulated physics channel)
- Development and deployment of a distributed data-processing system (hardware and software)
- Test and integration of Grid middleware prototypes
- R&D on distributed interactive analysis
Current CMS Production
[Production chain: Pythia generation (HEPEVT n-tuples) -> CMSIM (GEANT3) producing Zebra files with hits -> ORCA/COBRA ooHit formatter writing to an Objectivity database -> ORCA/COBRA digitization (merging signal and pile-up) into Objectivity -> ORCA user analysis producing n-tuples or ROOT files, with IGUANA interactive analysis; OSCAR/COBRA (GEANT4) is the alternative simulation path feeding the same Objectivity store.]
CMS Production Stream
Task              Application   Input     Output           Non-standard resource requirements
1 Generation      Pythia        none      Ntuple           static link, geometry files, storage
2 Simulation      CMSIM         Ntuple    FZ file
3 Hit formatting  ORCA H.F.     FZ file   DB               shared libs, full CMS env., storage
4 Digitization    ORCA Digi.    DB        DB
5 User analysis   ORCA User     DB        Ntuple or ROOT   shared libs, full CMS env., distributed input
CMS Distributed Production Tools
- RefDB: production flow manager; Web portal with a MySQL backend
- IMPALA (Intelligent Monte Carlo Production Local Actuator): job scheduler; "to-do" discovery, job decomposition, script assembly from templates, error recovery and resubmission (see the sketch below)
- BOSS (Batch Object Submission System): job control, monitoring and tracking; envelope script that filters the job's output stream and logs it to a MySQL backend
- DAR: distribution of software in binary form (shared libraries and binaries)
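A minimal sketch of the decomposition/template idea (not the real IMPALA code; the template text, job sizes and file names are hypothetical, and `generate` stands in for the real CMKIN/CMSIM/ORCA wrappers):

```python
from string import Template

# Hypothetical job template in the spirit of IMPALA's script assembly;
# the real templates wrap the production executables and their environment.
JOB_TEMPLATE = Template("""#!/bin/sh
# dataset=$dataset run=$run
generate --dataset $dataset --first-event $first --n-events $nevents
""")

def decompose(dataset, total_events, events_per_job=500):
    """Split a dataset request into fixed-size jobs and assemble one script per job."""
    scripts = []
    for run, first in enumerate(range(0, total_events, events_per_job), start=1):
        scripts.append(JOB_TEMPLATE.substitute(
            dataset=dataset, run=run, first=first,
            nevents=min(events_per_job, total_events - first)))
    return scripts

# Example: a 2000-event request becomes four 500-event job scripts.
for i, s in enumerate(decompose("mu_MB2mu_pt4", 2000), start=1):
    open(f"job_{i:04d}.sh", "w").write(s)
```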
Current Data Processing
[Workflow diagram: the production manager records a request ("produce events for dataset mu_MB2mu_pt4") in the production RefDB and distributes tasks to the Regional Centers through the production interface; at each RC farm, IMPALA decomposes the request into job scripts and monitors them, BOSS tracks the jobs in its database, output goes to farm storage, and a request summary file is returned; data location is kept through the production DB.]
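The BOSS idea of wrapping a job, filtering its output stream and recording status in a database can be sketched as below. This is a toy version: SQLite stands in for the real MySQL backend, and the "processed event" log pattern is invented for illustration.

```python
import re, sqlite3, subprocess, sys

# Toy stand-in for the BOSS tracking database (the real backend is MySQL).
db = sqlite3.connect("boss_toy.db")
db.execute("CREATE TABLE IF NOT EXISTS jobs (jobid TEXT, events INT, status TEXT)")

def run_and_track(jobid, cmd):
    """Run a job, filter its output stream, and log progress and final status."""
    events = 0
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    for line in proc.stdout:
        sys.stdout.write(line)                         # pass the stream through
        m = re.match(r"processed event (\d+)", line)   # hypothetical log format
        if m:
            events = int(m.group(1))
    status = "done" if proc.wait() == 0 else "failed"
    db.execute("INSERT INTO jobs VALUES (?, ?, ?)", (jobid, events, status))
    db.commit()

run_and_track("job_0001", ["sh", "job_0001.sh"])
```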
Production 2002: Complexity
- Number of Regional Centers: 11
- Number of computing centers: 21
- Number of CPUs: ~1000
- Largest local center: 176 CPUs
- Production passes per dataset (including analysis-group processing done by production): 6-8
- Number of files: ~11,000
- Data size (not including FZ files from simulation): 17 TB
- File transfer by GDMP and by Perl scripts over scp/bbcp: 7 TB toward Tier-1s, 4 TB toward Tier-2s
Spring02: CPU Resources
[Pie chart of the ~700 active CPUs (a further ~400 CPUs to come): Wisconsin 18%, INFN 18%, CERN 15%, IN2P3 10%, Moscow 10%, FNAL 8%, RAL 6%, IC 6%, UFL 5%, Caltech 4%, UCSD 3%, Bristol 3%, HIP 1%.]
INFN-Legnaro Tier-2 Prototype
[Layout diagram: computational nodes and disk servers on Fast Ethernet switches, uplinked over Gigabit Ethernet (1000BaseT), with a 34 Mbps WAN connection.]
- Nx, computational node: dual PIII 1 GHz, 512 MB, 3x75 GB EIDE disks + 1x20 GB for the OS
- Sx, disk server node: dual PIII 1 GHz, dual PCI (33/32 and 66/64), 512 MB, 3x75 GB EIDE RAID 0-5 disks (expandable up to 10), 1x20 GB OS disk
- 2002 capacity: 70 CPUs (3500 SI95) and 8 TB, expandable up to 190 nodes
ORCA Database Structure
[Diagram: one CMSIM job produces an FZ file (~2 GB/file, ~300 kB/event) that is oo-formatted into multiple databases: an MC-info container (concatenated MC info from N runs), calorimeter/muon hits and tracker hits, with per-event sizes ranging from a few kB to ~200 kB; multiple sets of ooHits are then concatenated into a single DB file. The physical and logical database structures diverge.]
Production Center Setup
The most critical task is digitization:
- 300 kB per pile-up event, 200 pile-up events per signal event -> 60 MB to read per signal event
- 10 s to digitize one full event on a 1 GHz CPU -> 6 MB/s per CPU (12 MB/s per dual-processor client)
- Up to ~5 clients per pile-up server (~60 MB/s on its Gigabit network card), which therefore needs fast disk access
[Diagram: pile-up DB on a pile-up server with fast disks, feeding ~5 dual-CPU clients at ~12 MB/s each, ~60 MB/s total.]
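The bandwidth budget above follows directly from the event sizes and timing; a minimal sketch of the arithmetic:

```python
# Digitization bandwidth budget, using only the numbers quoted above.
pileup_event_kb    = 300     # kB per pile-up event
pileup_per_signal  = 200     # pile-up events merged into each signal event
digi_time_s        = 10      # seconds per full event on a 1 GHz CPU
cpus_per_client    = 2       # dual-processor client
clients_per_server = 5

mb_per_event    = pileup_event_kb * pileup_per_signal / 1000     # ~60 MB
mb_s_per_cpu    = mb_per_event / digi_time_s                     # ~6 MB/s
mb_s_per_client = mb_s_per_cpu * cpus_per_client                 # ~12 MB/s
mb_s_per_server = mb_s_per_client * clients_per_server           # ~60 MB/s

print(f"{mb_per_event:.0f} MB/event -> {mb_s_per_cpu:.0f} MB/s per CPU, "
      f"{mb_s_per_client:.0f} MB/s per client, {mb_s_per_server:.0f} MB/s per pile-up server")
```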
CMS TOMORROW
Transition to Grid middleware:
- Use Virtual Data tools for workflow management at the dataset level
- Use the Grid security infrastructure and workload manager
- Deploy a Grid-enabled portal for interactive analysis
- Global monitoring of Grid performance and quality of service
CMS Grid workshop at CERN, 11-14/6/2002
Toward ONE Grid
Build a unique CMS-Grid framework (EU+US):
- EU and US grids are not interoperable today; wait for help from DataTAG, iVDGL and GLUE
- Work in parallel in the EU and the US
- Main US activities: MOP, the Virtual Data System, interactive analysis
- Main EU activities: integration of IMPALA with EDG WP1+WP2 software; batch analysis (user job submission and analysis farm)
PPDG MOP System
- The PPDG-developed MOP system allows submission of CMS production jobs from a central location, running them at remote locations and returning the results
- It relies on GDMP for replication, Globus GRAM, Condor-G and local queuing systems for job scheduling, and IMPALA for job specification
- Being deployed on the USCMS testbed
- Proposed as the basis for the next CMS-wide production infrastructure
Prototype VDG System (production)
[Architecture diagram, shown twice: once plain and once annotated with a legend distinguishing existing components, components implemented using MOP, and components with no code yet. The user's request flows from the RefDB into an abstract planner fed by the virtual data catalog, then into a concrete planner (MOP/EDG-WP1); the executor (MOP/EDG-WP1) drives compute resources wrapped by BOSS (CMKIN, CMSIM, ORCA/COBRA wrapper scripts, with a local tracking DB) and storage resources (local Grid storage, Objectivity); replica management (GDMP, replica catalog) and catalog services (materialized data catalog, metadata catalog) tie the two together.]
IMPALA/BOSS Integration with EDG
[Diagram: on the user interface (UI), IMPALA and DOLLY prepare jobs from the RefDB at CERN; BOSS records them in its MySQL DB and the job executer submits them through the EDG resource broker (EDG-RB) to a Computing Element, whose batch manager runs CMKIN and the other production steps on worker nodes (WN1...WNn) sharing data over NFS.]
Globally Scalable Monitoring Service
[Architecture diagram: farm monitors collect data from the Regional Center farms by push and pull (rsh and ssh, existing scripts, SNMP) and register with lookup services; clients and other services discover them through the lookup services and access the data via a proxy, component factory, GUI marshaling, code transport and RMI data access.]
Optimisation of "Tag" Databases
- Tags (n-tuples) are small (~1 kB) summary objects for each event, crucial for fast selection of interesting event subsets; this will be an intensive activity
- Past work concentrated on three main areas:
  - development of Objectivity-based tags integrated with the CMS "COBRA" framework and Lizard
  - investigation of tag bitmap indexing to speed up queries
  - comparison of OO and traditional databases (SQL Server, Oracle 9i, PostgreSQL) as efficient stores for tags
- New work concentrates on tag-based analysis services (a small illustration follows)
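To illustrate what tag-based selection buys, a toy relational version (SQLite is used here purely for illustration; the databases actually evaluated were Objectivity, SQL Server, Oracle 9i and PostgreSQL, and the tag variables below are invented):

```python
import sqlite3

# Toy tag store: one small summary row per event (the real tags are ~1 kB objects).
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE tag (
    event_id   INTEGER PRIMARY KEY,
    n_muons    INTEGER,
    pt_max     REAL,     -- hypothetical summary variables
    mass_bbbar REAL)""")
db.executemany("INSERT INTO tag VALUES (?,?,?,?)",
               [(1, 2, 5.1, 5.30), (2, 0, 1.2, 0.0), (3, 2, 7.4, 5.28)])

# Selecting an event subset touches only the tiny tag table, not the full event store;
# an index (or a bitmap index in the databases studied) keeps this scan fast.
db.execute("CREATE INDEX idx_mu ON tag(n_muons, pt_max)")
selected = db.execute(
    "SELECT event_id FROM tag WHERE n_muons >= 2 AND pt_max > 4").fetchall()
print("events to fetch from the full store:", [e for (e,) in selected])
```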
CLARENS: a Portal to the Grid
Grid-enabling the working environment for physicists' data analysis:
- Clarens consists of a server communicating with various clients via the commodity XML-RPC protocol (over http/https); this ensures implementation independence
- The server will provide a remote API to Grid tools:
  - the Virtual Data Toolkit: object collection access
  - data movement between Tier centres using GSI-FTP
  - CMS analysis software (ORCA/COBRA)
  - security services provided by the Grid (GSI)
- No Globus is needed on the client side, only a certificate
- The current prototype is running on the Caltech proto-Tier2
Clarens Architecture
- A common protocol is spoken by all types of clients to all types of services: implement each service once for all clients, and implement client access once per client type, using a common protocol already implemented for "all" languages (C++, Java, Fortran, etc. :-)
- The common protocol is XML-RPC, with SOAP close to working; CORBA is doable but would require a different server above Clarens (it uses IIOP, not HTTP)
- Clarens handles authentication using Grid certificates, connection management, data serialization and, optionally, encryption
- The implementation uses a stable, well-known server infrastructure (Apache) that has been debugged and audited over a long period by many people
- The Clarens layer itself is implemented in Python, but can be reimplemented in C++ should performance be inadequate
- More information, along with a web-based demo, is available (see the client sketch below)
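Since Clarens speaks plain XML-RPC over HTTP(S), any standard XML-RPC client can talk to it. A minimal Python sketch; the server URL and the service-specific method name are hypothetical placeholders, not the actual Clarens API:

```python
import xmlrpc.client  # standard-library XML-RPC client (the 2002-era module was xmlrpclib)

# Hypothetical Clarens endpoint; a real deployment sits behind Apache over https,
# with the client authenticating via its Grid certificate rather than a password.
server = xmlrpc.client.ServerProxy("http://clarens.example.org:8080/clarens")

# system.listMethods is a widely supported XML-RPC introspection extension;
# the commented call below it is an invented method name, purely for illustration.
print(server.system.listMethods())
# events = server.catalog.list("jpsi_to_mumu")   # hypothetical object-collection query
```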
Grid-Enabled Analysis
- Sub-event components map to Grid data products
- Balance the load between network and CPU
- Complete data and software base "virtually" available at the physicist's desktop
Evolution of Computing in CMS
- Ramp up the production systems (30%, +30%, +40% of the cost in successive years)
- Match the computing power available to the LHC luminosity: from ...M Reco ev/month, 100M Re-Reco ev/month and 30k ev/s analysis, to ...M Reco ev/month, 200M Re-Reco ev/month and 50k ev/s analysis
- This is the old schedule; the new one is stretched by 15 more months
Grid-Enabled Analysis: Consistent User Interface
[Diagram: a coherent set of basic tools and mechanisms (federation wizards, detector/event display, data browser, analysis-job wizards, generic analysis tools) sitting on top of the CMS software (ORCA, FAMOS, OSCAR, COBRA, CMStools, POM tools) and the Grid, i.e. the distributed data store and computing infrastructure, together with software development and installation support.]
Simulation, Reconstruction & Analysis Software System
[Layer diagram: specific frameworks (reconstruction algorithms, data monitoring, event filter, physics analysis) and their objects (calibration, event, configuration) plug as physics modules, adapters and extensions into a generic application framework; this sits on basic services and an extension toolkit built on an ODBMS, Geant3/4, CLHEP, a PAW replacement and the C++ standard library. Grid-aware data products and a Grid-enabled application framework make the whole system uploadable onto the Grid.]
Reconstruction on Demand
[Diagram: an analysis object asks the event for reconstructed calorimeter clusters and tracks; each reconstructed object (Rec T1, Rec T2, RecCaloCl, Rec Hits) is produced on demand from the hits and detector elements only when first requested, so the results of two different track reconstruction algorithms (T1 and T2) can be compared within the same job. A sketch of the pattern follows.]
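A minimal sketch of the on-demand idea in Python (the real COBRA implementation is in C++; the class, cut values and algorithm names here are invented for illustration):

```python
class Event:
    """Toy event: reconstructed products are computed lazily, only when first requested."""
    def __init__(self, hits):
        self.hits = hits
        self._cache = {}

    def get(self, name, algorithm):
        # Reconstruction on demand: run the algorithm the first time a product is asked for,
        # then hand back the cached result on every subsequent request.
        if name not in self._cache:
            self._cache[name] = algorithm(self.hits)
        return self._cache[name]

# Two hypothetical track-reconstruction algorithms to be compared in the same job.
def reco_tracks_t1(hits):
    return [h for h in hits if h > 0.5]          # stand-in "algorithm T1"

def reco_tracks_t2(hits):
    return [h for h in hits if h > 0.7]          # stand-in "algorithm T2"

event = Event(hits=[0.2, 0.6, 0.8, 0.9])
tracks1 = event.get("tracks_T1", reco_tracks_t1)  # reconstructed on first request
tracks2 = event.get("tracks_T2", reco_tracks_t2)  # reconstructed independently
print(len(tracks1), "vs", len(tracks2), "tracks from the two algorithms")
```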
Conclusions
- CMS considers the Grid the enabling technology for the effective deployment of a coherent and consistent data-processing environment; this is the only basis for an efficient physics analysis program at LHC
- The "Spring 2002" production has just finished successfully; distributed analysis has started, and making use of Grid middleware is the next milestone
- CMS is engaged in an active development, test and deployment program for all the software and hardware components that will constitute the future LHC Grid