1
CMS Data Analysis: Current Status and Future Strategy
On behalf of the CMS Collaboration
Lassi A. Tuura, Northeastern University, Boston
ACAT 2002, June 2002
http://iguana.cern.ch
2
Overview
- The Context: CMS Analysis Today
- Data Analysis Environment Architecture
  - Overview
  - COBRA
  - IGUANA
  - GRID/Production
- Tomorrow and Beyond
  - Leveraging current frameworks in the Grid-enriched analysis environment
  - Clarens client-server prototype
  - Other prototype activities
3
Context
Challenges:
- Complexity
- Geographic Dispersion
- Direct Access To Data
- Migration from Reconstruction to Trigger
Environments:
- Real-Time Event Filter, Online Monitoring
- Pre-emptive Simulation, Reconstruction, Analysis
- Interactive Statistical Analysis
4
Current CMS Production
[Figure: production data-flow diagram. Boxes shown:]
- Pythia
- HEPEVT Ntuples
- CMSIM (GEANT3)
- Zebra files with HITS
- OSCAR/COBRA (GEANT4)
- ORCA/COBRA ooHit Formatter
- ORCA/COBRA Digitization (merge signal and pile-up)
- Objectivity Database (three instances)
- ORCA User Analysis
- Ntuples or Root files
- IGUANA Interactive Analysis
5
Complexity of Production 2002
- Number of Regional Centers: 11
- Number of Computing Centers: 21
- Number of CPUs: ~1000
- Largest Local Center: 176 CPUs
- Number of Production Passes for each Dataset (including analysis group processing done by production): 6-8
- Number of Files: ~11,000
- Data Size (not including fz files from Simulation): 17 TB
- File Transfer by GDMP and by perl scripts over scp/bbcp: 7 TB toward T1, 4 TB toward T2
6
Interactive Analysis
[Figure: interactive analysis session. Annotations:]
- Lizard Qt plotter
- ANAPHE histogram extended with pointers to CMS events
- Emacs used to edit a CMS C++ plugin to create and fill histograms
- OpenInventor-based display of selected event
- Python shell with Lizard & CMS modules
Most of the analysis is done using ntuples in PAW, some in ROOT.
7
Behind the Scenes: Frameworks
[Figure: framework diagram. Labels:]
- Consistent User Interface
- Federation wizards, Detector/Event Display, Data Browser, Analysis job wizards, Generic analysis Tools
- Coherent basic tools and mechanisms
- ORCA, FAMOS, OSCAR, COBRA
- Objy tools, GRID, CMS tools
- Distributed Data Store & Computing Infrastructure
8
Frameworks Dissected
[Figure: layered architecture diagram. Labels:]
- Specific Frameworks
- Event Filter, Reconstruction Algorithms, Physics Analysis, Data Monitoring
- Physics modules
- Grid-Uploadable
- Generic Application Framework
- Calibration Objects, Configuration Objects, Event Objects
- (Grid-aware) Data-Products
- Adapters and Extensions
- Basic Services
- ODBMS, GEANT 3 / 4, CLHEP, PAW Replacement
- C++ Standard Library + Extension Toolkits
9
Framework Design Basis
- Several frameworks provide the environment together
  - Open: No central framework with all functionality
    - Frameworks are designed to be extensible
    - … and to collaborate with other software
  - Coherent: User sees “final” smooth interface
    - Achieved by integrating the frameworks together
    - … but the user does not do this work him/herself!
  - Design applied at both framework and object design level
- Successfully applied in many parts of CMS software
  - Applications, persistency; sub-frameworks; visualisation; …
  - No loss of usability, functionality or performance
  - Has made it easy to integrate directly with many existing tools
- This is nothing novel: it is part of the standard risk-mitigation strategy of any modern industrial solution
10
Frameworks: COBRA
[Figure: framework diagram repeated from "Behind the Scenes: Frameworks"]
11
COBRA: Main Components
- Push- and pull-mode execution, and any mixture
  - Reconstruction-on-demand is a key concept in COBRA
  - Detector-centric reconstruction: push data from event
  - Reconstruction-unit-centric reconstruction: pull/create data as needed
- Event data and related structures
  - Basic support for commonly needed objects (hits, digis, containers, …)
- Application environments
  - Basic application frameworks, various semi-specialised applications
  - Lots of error-handling and recovery code (automatic recovery after crash, …)
- Meta data: a key component
  - Data chunking, system and user collections, data streams, file management, job concepts, configuration and setup records, redirected navigation after reprocessing, …
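
The push and pull modes listed above can be illustrated with a small sketch. This is not COBRA code: the `Event` class, its `register`/`put`/`get` methods and the toy producers are hypothetical names chosen for illustration. It only shows the general idea of reconstruction-on-demand, where a data product is created the first time something asks for it and is then reused.

```python
from typing import Any, Callable, Dict


class Event:
    """Hypothetical event record (not the COBRA API) with on-demand products."""

    def __init__(self, raw: Dict[str, Any]) -> None:
        self.raw = raw
        self._producers: Dict[str, Callable[["Event"], Any]] = {}
        self._cache: Dict[str, Any] = {}
        self.calls: Dict[str, int] = {}  # how often each producer actually ran

    def register(self, name: str, producer: Callable[["Event"], Any]) -> None:
        """Declare how a product can be made, without making it yet."""
        self._producers[name] = producer

    def put(self, name: str, product: Any) -> None:
        """Push mode: store a product that was computed up front."""
        self._cache[name] = product

    def get(self, name: str) -> Any:
        """Pull mode: run the producer only if the product is not cached."""
        if name not in self._cache:
            self.calls[name] = self.calls.get(name, 0) + 1
            self._cache[name] = self._producers[name](self)
        return self._cache[name]


def make_clusters(event: Event) -> list:
    # Toy "reconstruction": keep hits above an arbitrary threshold.
    return [h for h in event.raw["hits"] if h > 1.0]


def make_tracks(event: Event) -> float:
    # Pulls the clusters on demand; the toy "track" is just their sum.
    return sum(event.get("clusters"))


event = Event({"hits": [0.5, 1.5, 2.5]})
event.register("clusters", make_clusters)
event.register("tracks", make_tracks)

tracks = event.get("tracks")        # triggers clusters, then tracks
tracks_again = event.get("tracks")  # served from the cache, nothing reruns
```

Asking for `tracks` is enough to trigger the cluster producer; nothing has to schedule it explicitly, which is the property the slide calls "pull/create data as needed".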
12
COBRA: Main Strengths
- Algorithms in plug-ins
  - “Publish-yourself-plug-ins”: self-describing data producers
- Strong meta-data facilities
  - Reconstruction-on-demand matches data product concept very well
    - Grid virtual data products concept really just an extension
  - Convenient mapping of data products to chunks: files, containers, …
  - Scatter / gather: decompose jobs, gather data
    - One logical job can be chopped into many physical processes; we still know it is logically the same job no matter which process it is running in
- Adapts automatically to many environments without special configuration: interactive, batch, farm, stand-alone, trigger, …
  - Through appropriate use of enabling techniques (transactions, locking, refs)
  - No data post-processing required
  - Well-matched to production tools (IMPALA)
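
The scatter/gather point can be sketched as follows. The names (`SubJob`, `scatter`, `run`, `gather`) and the toy per-event computation are assumptions made for this example and are not taken from COBRA or IMPALA. The sketch shows one logical job split into several physical pieces that each carry the logical job identifier, so the outputs can be recognised as belonging together and merged.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SubJob:
    """One physical piece of a logical job (hypothetical structure)."""
    logical_job: str
    index: int
    events: range


def scatter(logical_job: str, n_events: int, n_processes: int) -> List[SubJob]:
    """Split the event range into near-equal contiguous chunks."""
    size, extra = divmod(n_events, n_processes)
    jobs, start = [], 0
    for i in range(n_processes):
        stop = start + size + (1 if i < extra else 0)
        jobs.append(SubJob(logical_job, i, range(start, stop)))
        start = stop
    return jobs


def run(sub: SubJob) -> Dict[str, object]:
    """Toy processing step: square each event number."""
    return {
        "logical_job": sub.logical_job,
        "index": sub.index,
        "results": [e * e for e in sub.events],
    }


def gather(outputs: List[Dict[str, object]]) -> List[int]:
    """Merge outputs, refusing to mix pieces of different logical jobs."""
    if len({o["logical_job"] for o in outputs}) != 1:
        raise ValueError("outputs belong to more than one logical job")
    merged: List[int] = []
    for o in sorted(outputs, key=lambda o: o["index"]):
        merged.extend(o["results"])
    return merged


jobs = scatter("job-A", n_events=10, n_processes=3)
merged = gather([run(j) for j in reversed(jobs)])  # arrival order does not matter
```

Sorting by `index` in `gather` means the pieces can finish in any order, which is the usual situation when the physical processes run on different machines.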
13
[Figure: Objectivity component diagram. Labels:]
- Objectivity
- Storage Manager, Schema Manager, Transaction Manager, Catalog Manager
- Lock Server, Page Server
- C++ Binding, File I/O, DDL Source Processing
- Meta Data, Object Access
- MSS, Grid & Farm Interface
14
[Figure: the Objectivity component diagram from slide 13, with added labels:]
- Refs & Navigation
- Queries
- Cache Management
15
[Figure: the Objectivity component diagram from slide 13, with added labels:]
- Object Naming
- Configurations (Data Sets)
- Collections
- Run Resume & Crash Recovery
16
[Figure: the Objectivity component diagram from slide 13, with added labels:]
- File Size Control
- Farm Management
- System Management
17
Frameworks: IGUANA
[Figure: framework diagram repeated from "Behind the Scenes: Frameworks"]
18
User Interface and Visualisation
- IGUANA: a generic toolkit for user interfaces and visualisation
  - Builds on existing high-quality libraries (Qt, OpenInventor, Anaphe, …)
  - Used to implement specific visualisation applications in other projects
- Main technical focus: provide a platform that makes it easy to integrate GUIs as a coherent whole, to provide application services and to visualise any application object
  - Many categories / layers: GUI gadgets & support, application environment, data visualisers, data representation methods, control panels, …
  - Designed to integrate with and into other applications
  - Virtually everything is in plug-ins (can still be statically linked)
[Figure: plug-in architecture. Labels: Component Database, Plug-In Cache, Plug-In, Object Factory, Attached, Unattached]
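
As a rough illustration of the plug-in idea, the sketch below keeps a catalogue of known components and loads the code for one only when an object is first requested. It is a generic pattern written for this example, not IGUANA's implementation: `PluginFactory`, `declare`, `is_attached` and `create` are hypothetical names, the terms "component database" and "attached" are borrowed loosely from the figure labels, and standard-library classes stand in for real plug-ins.

```python
import importlib
from typing import Any, Dict, Tuple


class PluginFactory:
    """Hypothetical object factory that loads plug-ins lazily by name."""

    def __init__(self) -> None:
        # name -> (module, attribute): a stand-in for a component database
        self._catalog: Dict[str, Tuple[str, str]] = {}
        # name -> loaded class: plug-ins that are currently "attached"
        self._loaded: Dict[str, Any] = {}

    def declare(self, name: str, module: str, attribute: str) -> None:
        """Record where a component lives without importing it."""
        self._catalog[name] = (module, attribute)

    def is_attached(self, name: str) -> bool:
        return name in self._loaded

    def create(self, name: str, *args: Any, **kwargs: Any) -> Any:
        """Import the providing module on first use, then build an object."""
        if name not in self._loaded:
            module_name, attribute = self._catalog[name]
            module = importlib.import_module(module_name)
            self._loaded[name] = getattr(module, attribute)
        return self._loaded[name](*args, **kwargs)


factory = PluginFactory()
# Standard-library classes play the role of plug-in components here.
factory.declare("fraction", "fractions", "Fraction")
factory.declare("counter", "collections", "Counter")

half = factory.create("fraction", 1, 2)  # "fractions" is imported only now
```

The application code only ever names a component; which module provides it, and when that module gets loaded, stays inside the factory.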
19
Illustration: 3D Visualisation
[Figure: window structure of the 3D visualisation application. Labels: QMainWindow, QMDIShell, Browser Site, 3D Browser, Twig Browser]
20
IGUANA GUI Integration
[Figure: GUI integration example. Labels: Integration; Action; Visualise Results, Modify Objects, Further Interaction]
21
Tomorrow and Beyond
- Leverage the current frameworks on the grid
  - Many native COBRA concepts match well with grid
    - (Virtual) data products ~ reconstruction-on-demand
    - Recording and matching configuration and setup information
    - Production interfaces: catalogs, redirection, MSS hooks
    - Scatter/gather job decomposition, production environment
  - COBRA-based applications can be encapsulated for distributed analysis
  - IGUANA already separates application objects, model and viewer
    - Many possibilities for introducing distributed links
  - IGUANA+COBRA provides a platform for a coherent, well-integrated interface no matter where the code runs and data comes and goes
    - Both have loads of knobs and hooks for integration
- Aiming at adapting the existing software where possible
  - Adapt and work within CMS software (COBRA, ORCA, …) and existing analysis tools (ROOT, Lizard, …); don’t replace them
22
Prototypes: Clarens Web Portals
[Figure: client-server diagram. Labels: Client, RPC, Web Server, Clarens Service, http/https]
- Grid-enabling the working environment for physicists' data analysis
- Communication with clients via the commodity XML-RPC protocol
  - Implementation independence
- Server implemented in C++: access to the CMS OO analysis toolkit
- Server provides a remote API to Grid tools
  - The Virtual Data Toolkit: Object collection access
  - Data movement between tier centres using GSI-FTP
  - CMS analysis software (ORCA/COBRA)
  - Security services provided by the Grid (GSI)
  - No Globus needed on client side, only certificate
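
To show what communication over XML-RPC looks like from the client side, here is a minimal sketch using Python's standard `xmlrpc` modules. It is not the Clarens API: the method names `catalog.list` and `catalog.size` and the in-memory `COLLECTIONS` data are invented for illustration, the server here is a local Python stand-in rather than the C++ service described above, and the GSI/https security layer is left out entirely.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

# Hypothetical catalogue standing in for real object collections.
COLLECTIONS = {"jets": [101, 102, 103], "muons": [201, 202]}


def list_collections() -> list:
    return sorted(COLLECTIONS)


def collection_size(name: str) -> int:
    return len(COLLECTIONS[name])


def start_server() -> SimpleXMLRPCServer:
    # Port 0 lets the operating system pick any free local port.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    # The dotted method names are assumptions, not Clarens method names.
    server.register_function(list_collections, "catalog.list")
    server.register_function(collection_size, "catalog.size")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


server = start_server()
host, port = server.server_address

# The client needs only an XML-RPC library and the service URL.
client = ServerProxy(f"http://{host}:{port}/")
names = client.catalog.list()
sizes = {name: client.catalog.size(name) for name in names}

server.shutdown()
server.server_close()
```

Because the wire format is plain XML over HTTP, the client and server can be written in different languages, which is the "implementation independence" the slide refers to.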
23
Prototypes: Clarens Web Portals…
[Figure: analysis data-flow diagram. Labels:]
- Tier 0/1/2: Production system and data repositories; TAG and AOD extraction/conversion/transport services
- Tier 1/2: ORCA analysis farm(s) (or distributed "farm" using grid queues); RDBMS based data warehouse(s); PIAF/Proof/.. type analysis farm(s); Data extraction Web service(s); Query Web service(s)
- Tier 3/4/5: User; Local analysis tool: Lizard/ROOT/…; Tool plugin module; Web browser; Local disk
- Flows: Production data flow; TAGs/AODs data flow; Physics Query flow
24
Other Prototypes
- Tag database optimisation
  - Fast sample selection is crucial
  - Various models already tried
  - Experimenting with RDBMS
- MOP: distributed job submission system
  - Allows CMS production jobs to be submitted from a central location, run at remote locations, and have their results returned
    - Job Specification: IMPALA
    - Replication: GDMP
    - Globus GRAM
    - Job Scheduling: Condor-G and local systems
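
The tag-database idea, selecting a sample from a few summary quantities per event before touching the full event data, can be sketched with an in-memory SQLite table. The schema, column names and values below are made up for illustration and do not reflect the actual CMS tag content or the RDBMS used in the prototype.

```python
import sqlite3
from typing import List

conn = sqlite3.connect(":memory:")

# Hypothetical tag schema: one small row of summary quantities per event.
conn.execute(
    "CREATE TABLE tag ("
    " event_id INTEGER PRIMARY KEY,"
    " n_muons INTEGER,"
    " missing_et REAL)"
)
conn.executemany(
    "INSERT INTO tag VALUES (?, ?, ?)",
    [(1, 0, 12.5), (2, 2, 48.0), (3, 1, 75.2), (4, 2, 9.1), (5, 3, 60.4)],
)
# An index on the selection columns lets the database avoid a full scan.
conn.execute("CREATE INDEX tag_selection ON tag (n_muons, missing_et)")


def select_events(db: sqlite3.Connection, min_muons: int,
                  min_missing_et: float) -> List[int]:
    """Return the ids of events passing the cuts, using only the tags."""
    rows = db.execute(
        "SELECT event_id FROM tag"
        " WHERE n_muons >= ? AND missing_et >= ?"
        " ORDER BY event_id",
        (min_muons, min_missing_et),
    )
    return [row[0] for row in rows]


selected = select_events(conn, min_muons=2, min_missing_et=40.0)
```

Only the events in `selected` would then need to be read from the full event store, which is why the speed of this first selection step matters.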