Vincenzo Innocente, CERN/EP Persistency: 27-28 October 1999, CERN 1st Internal Review of CMS Software and Computing Why a Commercial ODBMS can suit CMS.

Presentation transcript:

Why a Commercial ODBMS can suit CMS
Vincenzo Innocente, CERN, EP/CMC
1st Internal Review of CMS Software and Computing, CERN, 27-28 October 1999

HEP Data
[Diagram: Event Collection, Collection Meta-Data, Event, Electrons, Tracks, Tracker Alignment, Ecal calibration, User Tag (N-tuple)]
- Environmental data
  - Detector and accelerator status
  - Calibrations, alignments
- Event-Collection Meta-Data (luminosity, selection criteria, …)
- …
- Event Data, User Data

Do I need a DBMS? (a self-assessment)
- Do I encode meta-data (run number, version id) in file names?
- How many files and logbooks should I consult to determine the luminosity corresponding to a histogram?
- How easily can I determine whether two events have been reconstructed with the same version of a program and using the same calibrations?
- How many lines of code should I write, and what fraction of the data should I read, to select all events with two μ's with p_T > 11.5 GeV and |η| < 2.7? And the same at generator level?
If the answers scare you, you need a DBMS!
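
The last question can be made concrete. The sketch below reads the cut as "at least two muons with transverse momentum above 11.5 GeV and |pseudorapidity| below 2.7"; that reading, and the `Muon` and `Event` types, are assumptions made for illustration and are not CMS classes. The selection logic itself is short; the cost the slide points at lies in how much data has to be located and read before it can run.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical, simplified types for illustration only (not CMS classes).
struct Muon {
    double pt;   // transverse momentum in GeV
    double eta;  // pseudorapidity
};

struct Event {
    int run;
    int number;
    std::vector<Muon> muons;
};

// True if the muon passes the kinematic cuts quoted on the slide.
inline bool passesCuts(const Muon& m) {
    return m.pt > 11.5 && std::fabs(m.eta) < 2.7;
}

// True if the event contains at least two muons passing the cuts.
inline bool hasTwoGoodMuons(const Event& e) {
    std::size_t n = 0;
    for (const Muon& m : e.muons) {
        if (passesCuts(m) && ++n == 2) return true;
    }
    return false;
}

// Copy out all events with at least two good muons.
inline std::vector<Event> selectDimuon(const std::vector<Event>& events) {
    std::vector<Event> out;
    for (const Event& e : events) {
        if (hasTwoGoodMuons(e)) out.push_back(e);
    }
    return out;
}
```

With flat files, `selectDimuon` can only be called after every event has been read in full. With a database that clusters the muon candidates separately, only those objects would need to be touched.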

A major challenge for LHC: The scale
- Event output rate: 100 events/sec (10^9 events/year)
- Data written to tape: 100 MBytes/sec (1 PB/yr)
- Processing capacity: > 10 TIPS (= 10^13 instr./s)
- Typical networks: hundreds of Mbits/second
- Lifetime of experiment: 2-3 decades
- Users: ~1700 physicists
- Software developers: ~100
=> ~100 Petabytes total for the LHC
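
The yearly figures follow from the rates if one assumes about 10^7 seconds of data taking per year. That number is a common rule of thumb and is not stated on the slide. The average event size of 1 MB is likewise only implied, by 100 MB/s at 100 events/s. A compile-time check of the arithmetic under those assumptions:

```cpp
#include <cstdint>

// Assumption (not on the slide): ~1e7 seconds of data taking per year.
constexpr std::uint64_t kSecondsPerYear = 10000000ULL;

constexpr std::uint64_t kEventsPerSecond = 100;          // from the slide
constexpr std::uint64_t kBytesPerSecond = 100000000ULL;  // 100 MB/s, from the slide

constexpr std::uint64_t eventsPerYear() { return kEventsPerSecond * kSecondsPerYear; }
constexpr std::uint64_t bytesPerYear() { return kBytesPerSecond * kSecondsPerYear; }
constexpr std::uint64_t bytesPerEvent() { return kBytesPerSecond / kEventsPerSecond; }

static_assert(eventsPerYear() == 1000000000ULL, "1e9 events per year");
static_assert(bytesPerYear() == 1000000000000000ULL, "1e15 bytes = 1 PB per year");
static_assert(bytesPerEvent() == 1000000ULL, "1 MB per event");
```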

Can CMS do without a DBMS?
An experiment lasting 20 years cannot rely just on ASCII files and file systems for its production bookkeeping, "condition" database, etc.
Even today at LEP, the management of all real and simulated data sets (from raw data to n-tuples) is a major enterprise.
A DBMS is the modern answer to such a problem and, given the choice of OO technology for the CMS software, an ODBMS (or a DBMS with an OO interface) is the natural solution.

A "BLOB" Model
[Diagram: Event, RecEvent, RawEvent, Blob; DataBase Objects]
Blob: a sequence of bytes. Decoding it is a "user" responsibility.
Why should Blobs not be stored in the DBMS?
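
What "decoding is a user responsibility" means in practice can be sketched as follows. The byte layout here (16-bit ADC counts, little-endian) and the function names are hypothetical, chosen only to show that the storage layer sees bytes while the meaning lives entirely in user code.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// A blob as the storage layer sees it: an opaque sequence of bytes.
using Blob = std::vector<std::uint8_t>;

// Hypothetical user-defined layout: 16-bit ADC counts, little-endian.
inline Blob encodeCounts(const std::vector<std::uint16_t>& counts) {
    Blob blob;
    blob.reserve(counts.size() * 2);
    for (std::uint16_t c : counts) {
        blob.push_back(static_cast<std::uint8_t>(c & 0xFF));  // low byte
        blob.push_back(static_cast<std::uint8_t>(c >> 8));    // high byte
    }
    return blob;
}

// Decoding is the user's job: the storage layer treats the bytes as opaque.
inline std::vector<std::uint16_t> decodeCounts(const Blob& blob) {
    if (blob.size() % 2 != 0) throw std::runtime_error("truncated blob");
    std::vector<std::uint16_t> counts;
    counts.reserve(blob.size() / 2);
    for (std::size_t i = 0; i < blob.size(); i += 2) {
        counts.push_back(static_cast<std::uint16_t>(blob[i] | (blob[i + 1] << 8)));
    }
    return counts;
}
```

Nothing in this scheme prevents the `Blob` itself from being an attribute of a database object, which is the point of the slide's closing question.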

Raw Event
[Diagram: RawEvent, RawData..., Vector of Digi, ReadOut, Index]
- RawData are identified by the corresponding ReadOut.
- RawData belonging to different "detectors" are clustered into different containers. The granularity will be adjusted to optimize I/O performance.
- An index at RawEvent level is used to avoid accessing all containers in search of a given RawData.
- A range index at RawData level could be used for fast random access in complex detectors.
- The index is implemented as an ordered vector of pairs.
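
An index held as an ordered vector of pairs can be searched with a binary search. The sketch below is a minimal illustration under stated assumptions: `ReadOutId`, `RawDataRef` and the function names are hypothetical, and plain integers stand in for what would be database references.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical identifiers; in a real system the second element would be a
// database reference to the RawData object rather than a plain integer.
using ReadOutId = int;
using RawDataRef = int;
using IndexEntry = std::pair<ReadOutId, RawDataRef>;
using Index = std::vector<IndexEntry>;

// Build the index once: sort the entries by ReadOut id.
inline Index makeIndex(Index entries) {
    std::sort(entries.begin(), entries.end());
    return entries;
}

// Binary search for a ReadOut id. Returns a pointer to the entry, or nullptr
// if the id is not present.
inline const IndexEntry* lookup(const Index& index, ReadOutId id) {
    assert(std::is_sorted(index.begin(), index.end()));
    auto it = std::lower_bound(
        index.begin(), index.end(), id,
        [](const IndexEntry& e, ReadOutId key) { return e.first < key; });
    return (it != index.end() && it->first == id) ? &*it : nullptr;
}
```

A sorted vector keeps the index compact and contiguous, and a lookup costs O(log n) comparisons without opening any RawData container.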

Can every object have its own persistency?
- Data size
- Data complexity
- Self-description: which granularity?
- Meta-data vs data
- Logical vs physical organization
- Flexibility vs efficiency
- Interface with "standard" tools (like GUIs)
- Fast prototyping vs formal/controlled design
- User knowledge and training

Is an ODBMS overkill for histograms?
Maybe, if histograms are your sole I/O. (I use my Sun Ultra-5 to read mail through pine even if a line-mode terminal would be more than adequate.)
N-tuples are "user" event data and, for any serious use, require a level of management and book-keeping similar to the "experiment-wide" event data.
What counts is the efficiency and reliability of the analysis: the most sophisticated histogramming package is useless if you are unable to determine the luminosity corresponding to a given histogram!

Objectivity Features CMS (really) uses
- Persistent objects are real C++ (and Java) objects
- I/O cache (memory) management
  - no explicit read and write
  - no need to delete previous event
- Smart pointers (automatic id to pointer conversion)
- Bi-directional associations
- Efficient containers by value (VArray)
- Flexible physical clustering of objects
- Object naming
  - as top level entry point (at "collection" level)
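
The idea behind "automatic id to pointer conversion" with "no explicit read" can be sketched in plain C++. This is not the Objectivity API: `Store` and `Ref` are hypothetical names, and an in-memory map stands in for the database so that the example is self-contained.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <stdexcept>
#include <utility>

// Stand-in for the database: maps object ids to stored objects and counts
// how many times an object had to be "read".
template <typename T>
class Store {
public:
    using Id = std::size_t;

    Id put(T value) {
        Id id = next_++;
        objects_.emplace(id, std::move(value));
        return id;
    }

    // Simulated read from disk into memory.
    std::shared_ptr<T> load(Id id) {
        auto it = objects_.find(id);
        if (it == objects_.end()) throw std::out_of_range("unknown object id");
        ++reads_;
        return std::make_shared<T>(it->second);
    }

    std::size_t reads() const { return reads_; }

private:
    std::map<Id, T> objects_;
    Id next_ = 0;
    std::size_t reads_ = 0;
};

// Hypothetical smart pointer: holds an id and converts it to a real pointer
// on first dereference, so user code never issues an explicit read.
template <typename T>
class Ref {
public:
    Ref(Store<T>& store, typename Store<T>::Id id) : store_(&store), id_(id) {}

    T& operator*() { return *resolve(); }
    T* operator->() { return resolve().get(); }

private:
    std::shared_ptr<T>& resolve() {
        if (!cached_) cached_ = store_->load(id_);  // read only once
        return cached_;
    }

    Store<T>* store_;
    typename Store<T>::Id id_;
    std::shared_ptr<T> cached_;
};
```

Creating a `Ref` costs nothing; the first `*ref` or `ref->` triggers the single read, and later dereferences reuse the cached object.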

Additional ODBMS (Objy) Advantages
- Novel access methods:
  - a collection of "electrons" with no reference to events
  - direct reference from event objects to the "condition database"
  - direct reference to event data from user data
- Flexible run-time clustering of heterogeneous-type objects
  - cluster together all tracks, or all objects belonging to the same event
- Real DB management of reconstructed objects
  - add or modify parts of an event in place and on demand

CMS Experience (Pro)
- Designing and implementing persistent classes is not harder than doing it for native C++ classes.
- Easy and transparent distinction between logical associations and physical clustering.
- Fully transparent I/O, with performance essentially limited by the disk speed (random access).
- File size overhead (3% for realistic CMS object sizes) is not larger than for other "products" such as ZEBRA or BOS.
- Objectivity/DB (compared to other products we are used to) is robust, well documented and provides many additional useful features.

CMS Experience (Cons)
- Objectivity (and the compilers it supports) does not implement the "latest" C++ features (this is changing: fast convergence toward the ANSI standard).
- There are additional "configuration elements" to care about: ddl files, schema-definition databases, database catalogs.
  - Organized software development is needed: rapid prototyping is not impossible, but its integration into a product should be done with care.
- Performance degradation often waits for you around the corner.
  - Monitoring of running applications is essential; off-the-shelf solutions often exist.
- Objectivity is a "bare" product.
  - Integration into a framework is our responsibility.

CMS Experience (missing features)
- Scalability: 64K files are not enough (Objy is working on it).
- Containers are the natural Objectivity units; still, there are things for which the OS (and files) is preferred:
  - "bulk" data transfer (to mass storage, among sites)
  - access control, space allocation to users, etc.
- Efficient and secure AMS (ok in 5.2?)
  - with MSS and WAN support
- Adequate database administration tools
- Support for "private" user classes and user data (w.r.t. experiment-wide ones)

ODBMS: part of a strategy
The ODBMS is one component of a strategy for developing a reliable and efficient software system.
An ODBMS, like any other technology, is not a silver bullet.
Any single technical issue can be solved with a few thousand lines of code by any of us. This is not the point: what we need is a coherent solution to the problem of data management and object persistency for an experiment which will last longer than a decade.

Summary
- A DBMS is required to manage the large data set of CMS (including user data).
- An ODBMS is the natural choice if OO is used in all software.
- There is no reason NOT to store event data in the DB, as a "Blob" or as a real object system.
- Once an ODBMS is deployed to manage the experiment data, it will be very natural to use it to manage any kind of data related to detector studies and physics analysis.
- Objectivity/DB is proving to be a reliable product, and the company is responding to our peculiar requirements.

Object Model

Reconstructed Objects
[Diagram: RecEvent, S-Track Reconstructor, S Track..., Track SecInfo, Track Constituents, Vector of Hits]
- Reconstructed Objects produced by a given "algorithm" are managed by a Reconstructor.
- A Reconstructed Object (Track) is split into several independent persistent objects to allow their clustering according to their access requirements (physics analysis, reconstruction, detailed detector studies, etc.).
- The top level object acts as a proxy.
- Intermediate reconstructed objects (Hits) are transient and are cached by value into the final objects.
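
The split-object design can be sketched as follows. All type and member names are hypothetical, and in-memory vectors stand in for separately clustered database containers. The read counters exist only to make visible which part of a track each use case touches.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical parts of a reconstructed track, each kept in its own
// container so that it is only read when a use case needs it.
struct Hit { double x, y, z; };

struct TrackSecInfo {        // e.g. needed by reconstruction: fit details
    double chi2;
    std::vector<double> covariance;
};

struct TrackConstituents {   // e.g. needed by detailed detector studies
    std::vector<Hit> hits;   // hits are stored by value, not by reference
};

// Separate containers, standing in for separately clustered storage.
struct TrackStore {
    std::vector<TrackSecInfo> secInfo;
    std::vector<TrackConstituents> constituents;
    std::size_t secInfoReads = 0;
    std::size_t constituentReads = 0;
};

// Top-level object: small, holds what physics analysis needs, and acts as a
// proxy that forwards to the other parts on demand.
class Track {
public:
    Track(double pt, std::size_t index, TrackStore& store)
        : pt_(pt), index_(index), store_(&store) {}

    double pt() const { return pt_; }  // answered by the proxy itself

    double chi2() const {              // touches the secondary info only
        ++store_->secInfoReads;
        return store_->secInfo.at(index_).chi2;
    }

    std::size_t nHits() const {        // touches the constituents only
        ++store_->constituentReads;
        return store_->constituents.at(index_).hits.size();
    }

private:
    double pt_;
    std::size_t index_;
    TrackStore* store_;
};
```

An analysis that only calls `pt()` never touches the secondary information or the hits, which is the access pattern the clustering is meant to serve.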