CHEP '03, March 24, 2003
CMS Data Analysis: Present Status, Future Strategies
Vincenzo Innocente, CERN/EP

Slide 2: Abstract
CMS Data Analysis: Current Status and Future Strategy
We present the current status of the CMS data analysis architecture and describe work on future Grid-based distributed analysis prototypes. CMS has two main software frameworks related to data analysis: COBRA, the main framework, and IGUANA, the interactive visualisation framework. Software using these frameworks is used today in the world-wide production and analysis of CMS data. We describe their overall design and present examples of their current use, with emphasis on interactive analysis. CMS is currently developing remote analysis prototypes, including one based on Clarens, a Grid-enabled client-server tool. Use of the prototypes by CMS physicists will guide us in forming a Grid-enriched analysis strategy. The status of this work is presented, as is an outline of how we plan to leverage the power of our existing frameworks in the migration of CMS software to the Grid.

Slide 3: Analysis
- Analysis is not just using a tool to plot a histogram; it is the full chain from accessing event data to producing the final plot for publication.
- Analysis is an iterative process (see the sketch after this slide):
  - Reduce data samples to more interesting subsets (selection)
  - Compute higher-level information
  - Calculate statistical entities
- Several steps:
  - Run the analysis job on the full dataset (a few times)
  - Use an interactive analysis tool to run many times on the reduced dataset and make plots
- We are still in the early stage of defining an Analysis Model:
  - Today we work with raw data
  - Reconstruction and analysis are mixed up (as are analysis and debugging!)
  - Software development, production and analysis proceed in parallel
  - There is no clear concept of high-level persistent objects (DST)
  - Each physics group has its own analysis package and "standard ntuple"
- CMS is a laboratory for experimenting with analysis solutions.
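To make the iterative chain above concrete, here is a minimal, self-contained Python sketch of the three steps: select a subset of events, compute a higher-level quantity, and accumulate a statistical entity (a histogram). The Event class, the cut value and the invariant-mass helper are illustrative placeholders, not CMS framework APIs.

```python
"""Sketch of the analysis chain: selection -> derived quantity -> histogram.
All names here are illustrative, not part of COBRA/ORCA."""

import math
from dataclasses import dataclass


@dataclass
class Event:
    # Two reconstructed objects described by (pt, eta, phi); massless approximation.
    pt1: float
    eta1: float
    phi1: float
    pt2: float
    eta2: float
    phi2: float


def invariant_mass(ev: Event) -> float:
    # m^2 = 2 pt1 pt2 (cosh(d_eta) - cos(d_phi)) for massless objects.
    return math.sqrt(2.0 * ev.pt1 * ev.pt2 *
                     (math.cosh(ev.eta1 - ev.eta2) - math.cos(ev.phi1 - ev.phi2)))


def analyse(events, pt_cut=20.0, nbins=50, lo=0.0, hi=200.0):
    hist = [0] * nbins
    for ev in events:                                   # 1) selection
        if ev.pt1 < pt_cut or ev.pt2 < pt_cut:
            continue
        m = invariant_mass(ev)                          # 2) higher-level information
        if lo <= m < hi:                                # 3) statistical entity
            hist[int((m - lo) / (hi - lo) * nbins)] += 1
    return hist


if __name__ == "__main__":
    demo = [Event(35, 0.1, 0.2, 40, -0.3, 2.9), Event(5, 1.0, 1.0, 50, 0.0, 0.0)]
    print(analyse(demo))
```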

Slide 4: Getting ready for April '07
CMS is engaged in an aggressive program of "data challenges" of increasing complexity. Each focuses on a given aspect; all encompass the whole data analysis process:
- Simulation, reconstruction, statistical analysis
- Organized production, end-user batch jobs, interactive work
Past: Data Challenge '02
- Focus on High Level Trigger studies
Present: Data Challenge '04
- Focus on "real-time" mission-critical tasks
Future: Data Challenge '06
- Focus on distributed physics analysis

Slide 5: HLT Production 2002
- Focused on High Level Trigger studies:
  - 6 M events, 150 physics channels
  - files = 500 event collections = 20 TB (NoPU: 2.5M; 2×10^33 PU: 4.4M; PU: 3.8M; filter: 2.9M)
  - jobs, 45 years of CPU (wall-clock)
  - 11 Regional Centers: more than 20 sites in USA, Europe and Russia, ~1000 CPUs
  - More than 10 TB traveled over the WAN
  - More than 100 physicists involved in the final analysis
- GEANT3, Objectivity, PAW, ROOT
- CMS Object Reconstruction & Analysis Framework (COBRA) and applications (ORCA)
Successful validation of the CMS High Level Trigger algorithms: rejection factors, computing performance, reconstruction framework.

Slide 6: Data Challenge 2004
DC04, to be completed in April 2004:
- Reconstruct 50 million events
- Cope with 25 Hz at 2×10^33 cm^-2 s^-1 for 1 month
- These are supposed to be events at the Tier-0 center, i.e. events passing the HLT
  - From the computing point of view the test is the same if these events are simple minimum bias
  - This is a great opportunity to reconstruct events which can be used for full analysis (Physics TDR)
- Define and validate datasets for analysis
  - Identify the reconstruction and analysis objects each group would like to have for the full analysis
  - Develop the selection algorithms necessary to obtain the required samples
- Prepare for "mission critical" analysis and test the event model
  - Look at calibration and alignment
- Physics and computing validation of the Geant4 detector simulation

Slide 7: How data analysis begins
The result of the reconstruction will be saved along with the raw data in an object database.
[Diagram: the online chain of Filter Units (FU), Server Units (SU) and Processing Units (PU) runs the HLT, monitoring and calibration; express lines connect the online domain to the offline domain, where reconstruction, reprocessing and analysis run with a latency of minutes to hours.]

Slide 8: Data Challenge 2004
[Diagram of the DC04 processing chain: event generation (PYTHIA) produces MC ntuples; detector simulation (OSCAR) produces detector hits; digitization (ORCA) produces digis (raw data); reconstruction with L1 and HLT (ORCA) produces DSTs; DST stripping (ORCA) and calibration feed the analysis streams, performed with IGUANA/ROOT/PAW on ntuples containing MC information, tracks, etc.]

Slide 9: High-granularity reconstruction "DAG"
[Diagram: a DAG of reconstruction products, from DAQ or simulation inputs (CaloHits, TkHits, random numbers) through CaloDataFrames/TkDigis, CaloRecHits/TkRecHits and CaloClusters/TkTracks, up to Jets produced by a JetReconstructor with configurable cuts; calibration (Calib-A) and alignment (Align-C) objects enter at the appropriate nodes.]
Calibration tasks and detailed detector and physics studies require access to only a few objects per event. These studies will also need access to the "conditions" data associated with those objects. The access pattern to the very same object may be very different for different use cases, so a flexible definition of "datasets" (associated with use cases) is required. A sketch of the on-demand idea behind this DAG follows.
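Below is a minimal sketch of the on-demand idea behind such a high-granularity DAG: each product is built from its parents only when first requested, then cached for the event, so a study that touches few objects pays only for what it reads. The LazyProduct class and the toy builders are illustrative assumptions, not COBRA code.

```python
"""Sketch of on-demand reconstruction over a product DAG.
Names and builders are toy placeholders, not CMS framework APIs."""


class LazyProduct:
    def __init__(self, builder, *parents):
        self.builder = builder      # function that computes this product
        self.parents = parents      # upstream LazyProduct nodes
        self._value = None          # cached result for the current event

    def get(self):
        if self._value is None:
            # Pull parent products on demand, recursively.
            inputs = [p.get() for p in self.parents]
            self._value = self.builder(*inputs)
        return self._value


# Wiring a tiny piece of the DAG shown on the slide:
calo_hits = LazyProduct(lambda: ["hit1", "hit2"])
clusters = LazyProduct(lambda hits: [f"cluster({h})" for h in hits], calo_hits)
jets = LazyProduct(lambda cls: [f"jet({c})" for c in cls], clusters)

# Only this request triggers the chain calo_hits -> clusters -> jets.
print(jets.get())
```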

Slide 10: Analysis Environments
Real-time event filtering and monitoring:
- Data-driven pipeline
- High reliability
Pre-emptive simulation, reconstruction and event classification:
- Massively parallel batch-sequential processing
- Excellent error recovery and rollback mechanisms
- Excellent scheduling and bookkeeping systems
Interactive statistical analysis:
- Rapid Application Development environment
- Excellent visualization and browsing tools
- Human-"readable" navigation

Slide 11: Three Computing Environments, Different Challenges
Centralized quasi-online processing:
- Keep up with the rate
- Validate and distribute data efficiently
Distributed organized processing:
- Automation
Interactive chaotic analysis:
- Efficient access to data and "metadata"
- Management of "private" data
- Rapid Application Development

Slide 12: The Ultimate Challenge: A Coherent Analysis Environment
Beyond the interactive analysis tool (user point of view):
- Data analysis & presentation: n-tuples, histograms, fitting, plotting, ...
A great range of other activities with fuzzy boundaries (developer point of view):
- Batch
- Interactive work, from "pointy-clicky" to Emacs-like power tools to scripting
- Setting up configuration management tools, application frameworks and reconstruction packages
- Data store operations: replicating entire data stores; copying runs, events and event parts between stores; not just copying but also doing something more complicated (filtering, reconstruction, analysis, ...)
- Browsing data stores down to the object detail level
- 2D and 3D visualisation
- Moving code between final analysis, reconstruction and triggers
Today this involves (too) many tools.

Slide 13: Architecture Overview
[Diagram of the layered architecture: a consistent user interface (federation wizards, detector/event display, data browser, analysis job wizards, generic analysis tools) sits on top of the CMS applications (ORCA, FAMOS, OSCAR) built on COBRA, alongside CMS tools, LCG tools and Grid services, all resting on the distributed data store and computing infrastructure; cross-cutting layers provide a coherent set of basic tools and mechanisms and support software development and installation.]

Slide 14: Simulation, Reconstruction & Analysis Software System
[Diagram of the software system layering: physics modules (reconstruction algorithms, data monitoring, event filter, physics analysis; calibration, event and configuration objects) plug into a generic application framework through adapters and extensions; the specific framework builds on basic services (object persistency, Geant3/4, CLHEP, analysis tools, the C++ standard library, an extension toolkit), increasingly provided by LCG; data products become Grid-aware and the application framework becomes Grid-enabled, uploadable on the Grid.]

Slide 15
[Screenshot of an interactive analysis session: a Qt plotter showing a histogram extended with pointers back to CMS events; Emacs used to edit a CMS C++ plugin that creates and fills histograms; an OpenInventor-based display of the selected event; a Python shell with external and CMS modules.]

Slide 16: Varied components and data flows, one portal
[Diagram: the production system and data repositories (Tier 0/1/2) feed ORCA analysis farms (or a distributed "farm" using Grid queues), RDBMS-based data warehouses and PIAF/PROOF-type analysis farms at Tier 1/2 via the production data flow; TAG and AOD extraction/conversion/transport services and data-extraction and query web services deliver user TAGs/AODs to Tier 3/4/5, where a local analysis tool (IGUANA/ROOT/...) or a web browser accesses them through a tool plugin module and local disk; a separate flow carries physics queries back to the services.]

Slide 17: CLARENS: a Portal to the Grid
- Grid-enabling the working environment for physicists' data analysis.
- Clarens consists of a server communicating with various clients via the commodity XML-RPC protocol; this ensures implementation independence.
  [Diagram: client connected by RPC over http/https to the web server hosting the Clarens service.]
- The server will provide a remote API to Grid tools:
  - The Virtual Data Toolkit: object collection access
  - Data movement between Tier centres using GSI-FTP
  - CMS analysis software (ORCA/COBRA)
  - Security services provided by the Grid (GSI)
- No Globus is needed on the client side, only a certificate.
The current prototype is running on the Caltech proto-Tier2.

Slide 18: Clarens Architecture
- A common protocol spoken by all types of clients to all types of services:
  - Implement each service once for all clients
  - Implement client access to services once per client type, using a common protocol already implemented for "all" languages (C++, Java, Fortran, etc.)
- The common protocol is XML-RPC, with SOAP close to working; CORBA is doable but would require a different server above Clarens (it uses IIOP, not HTTP).
- Handles authentication using Grid certificates, connection management, data serialization and, optionally, encryption.
- The implementation uses stable, well-known server infrastructure (Apache) that has been debugged and audited over a long period by many people.
- The Clarens layer itself is implemented in Python, but can be reimplemented in C++ should performance prove inadequate.
More information is available online, along with a web-based demo. A client-side sketch follows.
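Because the protocol is plain XML-RPC, a thin client can be written with a standard library alone. The sketch below uses Python's xmlrpc.client; the endpoint URL and the commented-out file.ls call are hypothetical placeholders (not the actual Clarens API), and Grid-certificate handling is omitted.

```python
"""Minimal sketch of a Clarens-style XML-RPC client.
Endpoint and non-introspection method names are hypothetical."""

import xmlrpc.client

# Placeholder endpoint; a real deployment would use HTTPS carrying the
# user's Grid certificate, handled on the server side as described above.
CLARENS_URL = "https://clarens.example.org/clarens/"

proxy = xmlrpc.client.ServerProxy(CLARENS_URL)

try:
    # Standard XML-RPC introspection, if the server exposes it.
    print(proxy.system.listMethods())
    # A hypothetical object-collection query, illustrating the remote-API idea:
    # collections = proxy.file.ls("/store/collections")
except Exception as exc:
    print("Clarens server not reachable from this sketch:", exc)
```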

Slide 19: Example of analysis on the Grid
[Diagram: a local analysis environment (data cache, browser, presenter) talks to a remote web service (web server plus Clarens service) that acts as a gateway between users and the remote facility; behind it sit a remote batch service handling resource allocation, control and monitoring, and possibly a resource broker.]
A sketch of this interaction pattern follows.
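The sketch below illustrates the flow in the diagram: the local analysis environment submits work to the remote batch service through the web-service gateway and polls for completion. The job.submit, job.status and job.output method names, the gateway URL and the job parameters are hypothetical, chosen only to make the interaction concrete.

```python
"""Sketch of remote analysis through a Clarens-style gateway.
All gateway method names and parameters are hypothetical."""

import time
import xmlrpc.client


def run_remote_analysis(gateway_url: str):
    """Submit a hypothetical analysis job, poll its status, fetch the output."""
    gateway = xmlrpc.client.ServerProxy(gateway_url)

    # Resource allocation: hand the job description to the remote batch service.
    job_id = gateway.job.submit({"executable": "orca_analysis",   # hypothetical
                                 "dataset": "jets_sample",
                                 "output": "histos.root"})

    # Control and monitoring: poll until the job reaches a final state.
    while gateway.job.status(job_id) not in ("Done", "Failed"):   # hypothetical
        time.sleep(30)

    # Pull the reduced output (e.g. histograms) back into the local data cache.
    return gateway.job.output(job_id)                             # hypothetical


# Example use (would only work against a real gateway):
# histograms = run_remote_analysis("https://tier2.example.org/clarens/")
```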

Slide 20: Summary
The success of the analysis software will be measured by its ability to provide physicists with a simple, coherent and stable view while retaining the flexibility required to achieve maximal computing efficiency.
CMS is responding to this challenge by developing an analysis software architecture based on a layered structure:
- A consistent interface for the physicist
  - Customizable
  - Implemented in many flavours (Qt, Python, ROOT, web browser)
- A flexible application framework
  - Mainly responsible for managing event data with high granularity
- A set of back-end services
  - Specialized for different use cases and computing environments

Slide 21: Summary
The "Spring 2002" production was completed successfully:
- Distributed organized production
- Distributed "traditional" analysis
- Validation of the High Level Trigger strategy
Next target (DC04):
- One month of mission-critical analysis
- Test of the analysis and computing model