Vincenzo Innocente, CERN/EP. Persistency: Why a Commercial ODBMS can suit CMS. 1st Internal Review of CMS Software and Computing, 27-28 October 1999, CERN.


1 Why a Commercial ODBMS can suit CMS
Vincenzo Innocente, CERN, EP/CMC

2 HEP Data
[Diagram labels: Event Collection, Collection Meta-Data, Event, Electrons, Tracks, Tracker Alignment, Ecal calibration, User Tag (N-tuple)]
Environmental data
- Detector and Accelerator status
- Calibrations, Alignments
Event-Collection Meta-Data (luminosity, selection criteria, …)
…
Event Data, User Data

3 Do I need a DBMS? (a self-assessment)
Do I encode meta-data (run number, version id) in file names?
How many files and logbooks should I consult to determine the luminosity corresponding to a histogram?
How easily can I determine whether two events have been reconstructed with the same version of a program and using the same calibrations?
How many lines of code should I write, and which fraction of the data should I read, to select all events with two μ's with p_T > 11.5 GeV and |η| < 2.7? The same at generator level?
If the answers scare you, you need a DBMS!
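The selection in the fourth question can be sketched in a few lines once events are available as objects. This is only an illustration: it assumes the particles are muons, that the cuts apply to transverse momentum and pseudorapidity, and that "two" means "at least two". The `Muon` and `Event` types and both function names are hypothetical, not CMS classes.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical, simplified types for illustration only; these are not CMS classes.
struct Muon {
    double pt;   // transverse momentum in GeV
    double eta;  // pseudorapidity
};

struct Event {
    int run;
    int id;
    std::vector<Muon> muons;
};

// True if the event holds at least `minCount` muons passing both cuts.
bool passesDiMuonCut(const Event& ev, double minPt = 11.5,
                     double maxAbsEta = 2.7, std::size_t minCount = 2) {
    assert(minPt >= 0.0 && maxAbsEta >= 0.0);  // cuts are magnitudes
    std::size_t n = 0;
    for (const Muon& m : ev.muons) {
        if (m.pt > minPt && std::fabs(m.eta) < maxAbsEta) ++n;
    }
    return n >= minCount;
}

// Scans every event in the sample and returns pointers to the selected ones.
// Without an index or a tag collection, the whole sample has to be read.
std::vector<const Event*> selectDiMuonEvents(const std::vector<Event>& sample) {
    std::vector<const Event*> selected;
    for (const Event& ev : sample) {
        if (passesDiMuonCut(ev)) selected.push_back(&ev);
    }
    return selected;
}
```

The code itself is short; the cost the slide points at is the full scan in `selectDiMuonEvents`, which is what a database with suitable tags or indices is meant to avoid.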

4 A major challenge for LHC: The scale
Event output rate: 100 events/sec (10^9 events/year)
Data written to tape: 100 MBytes/sec (1 PB/yr)
Processing capacity: > 10 TIPS (= 10^13 instr./s)
Typical networks: hundreds of Mbits/second
Lifetime of experiment: 2-3 decades
Users: ~1700 physicists
Software developers: ~100
Total for the LHC: ~100 Petabytes
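The yearly figures follow from the per-second rates if one assumes roughly 10^7 seconds of data taking per year. That running time is an assumption introduced here (it is not stated on the slide), as are all the names in this small consistency check.

```cpp
#include <cassert>
#include <cstdint>

// Assumption, not from the slide: about 1e7 seconds of data taking per year.
constexpr std::uint64_t kSecondsPerYear = 10000000ULL;

// Rates quoted on the slide (1 MByte taken as 1e6 bytes).
constexpr std::uint64_t kEventsPerSecond = 100ULL;
constexpr std::uint64_t kBytesPerSecond = 100000000ULL;

constexpr std::uint64_t eventsPerYear() { return kEventsPerSecond * kSecondsPerYear; }
constexpr std::uint64_t bytesPerYear() { return kBytesPerSecond * kSecondsPerYear; }
constexpr std::uint64_t bytesPerEvent() { return kBytesPerSecond / kEventsPerSecond; }

// Checked at compile time against the numbers on the slide.
static_assert(eventsPerYear() == 1000000000ULL, "10^9 events per year");
static_assert(bytesPerYear() == 1000000000000000ULL, "10^15 bytes = 1 PB per year");
static_assert(bytesPerEvent() == 1000000ULL, "implied event size of about 1 MByte");
```

Under the same assumption the two rates imply an average event size of about 1 MByte.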

5 Can CMS do without a DBMS?
An experiment lasting 20 years cannot rely just on ASCII files and file systems for its production bookkeeping, “condition” database, etc.
Even today at LEP, the management of all real and simulated data-sets (from raw-data to n-tuples) is a major enterprise.
A DBMS is the modern answer to such a problem and, given the choice of OO technology for the CMS software, an ODBMS (or a DBMS with an OO interface) is the natural solution.

6 A “BLOB” Model
[Diagram labels: Event, RecEvent, RawEvent, Blob, DataBase Objects]
Blob: a sequence of bytes. Decoding it is a “user” responsibility.
Why should Blobs not be stored in the DBMS?
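To make "decoding is a user responsibility" concrete, here is a minimal sketch in which the store would see only a byte vector, while the layout lives entirely in user code. The little-endian 32-bit word layout and the names `Blob`, `encodeWords`, and `decodeWords` are assumptions chosen for this example, not a CMS or Objectivity format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A blob is just bytes; the database does not know what they mean.
using Blob = std::vector<std::uint8_t>;

// Encodes 32-bit words as little-endian bytes (layout chosen for this sketch).
Blob encodeWords(const std::vector<std::uint32_t>& words) {
    Blob blob;
    blob.reserve(words.size() * 4);
    for (std::uint32_t w : words) {
        for (int shift = 0; shift < 32; shift += 8) {
            blob.push_back(static_cast<std::uint8_t>((w >> shift) & 0xFFu));
        }
    }
    return blob;
}

// Decodes the same layout. Only code that knows the layout can do this.
std::vector<std::uint32_t> decodeWords(const Blob& blob) {
    assert(blob.size() % 4 == 0);  // caller must pass a blob written by encodeWords
    std::vector<std::uint32_t> words;
    words.reserve(blob.size() / 4);
    for (std::size_t i = 0; i < blob.size(); i += 4) {
        std::uint32_t w = 0;
        for (int b = 0; b < 4; ++b) {
            w |= static_cast<std::uint32_t>(blob[i + b]) << (8 * b);
        }
        words.push_back(w);
    }
    return words;
}
```

Any change to the layout has to be tracked by the user code on both sides, which is the bookkeeping burden a real object model moves into the database schema.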

7 Raw Event
[Diagram labels: RawEvent, RawData, Vector of Digi, ReadOut, Index]
RawData are identified by the corresponding ReadOut.
RawData belonging to different “detectors” are clustered into different containers. The granularity will be adjusted to optimize I/O performance.
An index at RawEvent level is used to avoid accessing all containers in search of a given RawData.
A range index at RawData level could be used for fast random access in complex detectors.
Index implemented as an ordered vector of pairs.
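An index built as an ordered vector of pairs can be sketched with the standard library: keep the pairs sorted by key and look them up by binary search. The key and value types below (`ReadOutId`, `RawDataSlot`) and the class name `RawEventIndex` are hypothetical stand-ins; the slide does not say what the pairs actually contain.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-ins: a ReadOut identifier and the position of its RawData.
using ReadOutId = unsigned int;
using RawDataSlot = std::size_t;
using IndexEntry = std::pair<ReadOutId, RawDataSlot>;

class RawEventIndex {
public:
    // Accepts entries in any order and sorts them once by ReadOut id.
    explicit RawEventIndex(std::vector<IndexEntry> entries)
        : entries_(std::move(entries)) {
        std::sort(entries_.begin(), entries_.end());
    }

    // Binary search: returns true and fills `slot` if the ReadOut is present.
    bool find(ReadOutId id, RawDataSlot& slot) const {
        auto it = std::lower_bound(
            entries_.begin(), entries_.end(), id,
            [](const IndexEntry& e, ReadOutId key) { return e.first < key; });
        if (it == entries_.end() || it->first != id) return false;
        slot = it->second;
        return true;
    }

private:
    std::vector<IndexEntry> entries_;  // kept sorted by ReadOut id
};
```

Compared with a tree-based map, a sorted vector is contiguous and compact, which suits an index that is written once per event and then only read.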

8 Can every object have its own persistency?
Data size
Data complexity
Self-description: which granularity?
Meta-data vs data
Logical vs physical organization
Flexibility vs efficiency
Interface with “standard” tools (like GUIs)
Fast prototyping vs formal/controlled design
User knowledge and training

9 Is an ODBMS an overkill for Histograms?
Maybe, if histograms are your sole I/O. (I use my Sun Ultra 5 to read mail through Pine even if a line-mode terminal would be more than adequate.)
N-tuples are “user” event-data and, for any serious use, require a level of management and book-keeping similar to the “experiment-wide” event data.
What counts is the efficiency and reliability of the analysis: the most sophisticated histogramming package is useless if you are unable to determine the luminosity corresponding to a given histogram!

10 Objectivity Features CMS (really) uses
Persistent objects are real C++ (and Java) objects
I/O cache (memory) management
- no explicit read and write
- no need to delete previous event
Smart-pointers (automatic id to pointer conversion)
Bi-directional associations
Efficient containers by value (VArray)
Flexible object physical-clustering
Object Naming
- as top level entry point (at “collection” level)

11 Additional ODBMS (Objy) Advantages
Novel access methods:
- A collection of “electrons” with no reference to events
- Direct reference from event-objects to “condition database”
- Direct reference to event-data from user-data
Flexible run-time clustering of heterogeneous-type objects
- cluster together all tracks or all objects belonging to the same event
Real DB management of reconstructed objects
- add or modify in place and on demand parts of an event

12 CMS Experience (Pro)
Designing and implementing persistent classes is not harder than doing it for native C++ classes.
Easy and transparent distinction between logical associations and physical clustering.
Fully transparent I/O with performance essentially limited by the disk speed (random access).
File size overhead (3% for realistic CMS object sizes) is not larger than for other “products” such as ZEBRA or BOS.
Objectivity/DB (compared to other products we are used to) is robust, well documented and provides many additional useful features.

13 CMS Experience (Cons)
Objectivity (and the compilers it supports) does not implement the “latest” C++ features (changing: fast convergence toward the ANSI standard).
There are additional “configuration elements” to care about: ddl files, schema-definition databases, database catalogs
- organized software development: rapid prototyping is not impossible, but its integration into a product should be done with care
Performance degradation often waits for you around the corner
- monitoring of running applications is essential; off-the-shelf solutions often exist
Objectivity is a “bare” product:
- integration into a framework is our responsibility

14 CMS Experience (missing features)
Scalability: 64K files are not enough (Objy is working on it).
Containers are the natural Objectivity units, but there are still things for which the OS (and files) is preferred:
- “bulk” data transfer (to mass-storage, among sites)
- access control, space allocation to users, etc.
Efficient and secure AMS (ok in 5.2?)
- with MSS and WAN support
Adequate database administration tools
Support for “private” user classes and user data (w.r.t. experiment-wide ones)

15 ODBMS: part of a strategy
The ODBMS is one component of a strategy for developing a reliable and efficient software system.
An ODBMS, like any other technology, is not a silver bullet.
Any single technical issue can be solved with a few thousand lines of code by any of us. This is not the point:
What we need is a coherent solution to the problem of data management and object persistency for an experiment which will last longer than a decade.

16 Summary
A DBMS is required to manage the large data set of CMS (including user data).
An ODBMS is the natural choice if OO is used in all software.
There is no reason NOT to store event-data in the DB, as a “Blob” or as a real object system.
Once an ODBMS is deployed to manage the experiment data, it will be very natural to use it to manage any kind of data related to detector studies and physics analysis.
Objectivity/DB is proving to be a reliable product and the company is responding to our peculiar requirements.

17 Object Model

18 Object Model

19 Reconstructed Objects
[Diagram labels: RecEvent, S-Track Reconstructor, S-Track, Track SecInfo, Track Constituents, Vector of Hits]
Reconstructed Objects produced by a given “algorithm” are managed by a Reconstructor.
A Reconstructed Object (Track) is split into several independent persistent objects to allow their clustering according to their access requirements (physics analysis, reconstruction, detailed detector studies, etc.).
The top level object acts as a proxy.
Intermediate reconstructed objects (Hits) are transient and are cached by value into the final objects.
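The proxy arrangement can be sketched as a small top-level object that holds the summary data itself and fetches the detailed part only when asked, with the hits stored by value inside that detailed part. Everything here is illustrative: the `Track`, `TrackConstituents`, and `Hit` types are simplified guesses, and the `ConstituentsLoader` callback merely stands in for whatever the ODBMS does to resolve a reference.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

struct Hit { double x, y, z; };

// Detailed part, stored separately from the top-level track; hits held by value.
struct TrackConstituents {
    std::vector<Hit> hits;
};

// Stand-in for a database fetch; a real ODBMS would resolve an object id instead.
using ConstituentsLoader =
    std::function<std::shared_ptr<const TrackConstituents>()>;

class Track {
public:
    Track(double pt, ConstituentsLoader loader)
        : pt_(pt), loader_(std::move(loader)) {}

    // Summary data lives in the proxy and needs no further I/O.
    double pt() const { return pt_; }

    // Loads the constituents on first use and caches them afterwards.
    const TrackConstituents& constituents() const {
        if (!constituents_) {
            constituents_ = loader_();
            assert(constituents_ && "loader must return a valid object");
        }
        return *constituents_;
    }

    bool constituentsLoaded() const { return constituents_ != nullptr; }

private:
    double pt_;
    ConstituentsLoader loader_;
    mutable std::shared_ptr<const TrackConstituents> constituents_;
};
```

A physics analysis that reads only `pt()` never triggers the loader, while a detector study that calls `constituents()` pays for the extra read once per track.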

