STAR Schema Evolution Implementation in ROOT I/O. V. Perevoztchikov, Brookhaven National Laboratory, USA.

STAR Victor Perevoztchikov, BNL ALICE/STAR

ROOT I/O in STAR

ROOT I/O was chosen as the main persistence mechanism in STAR. Its main strengths are:
- No artificial separation between the transient and persistent data models.
- The user is free to develop complex data objects without concern for the I/O implementation and, importantly, without building a dependence on the I/O scheme in use.
- A streamer method is created automatically for user-defined classes, which provides persistence of the object.
- For special, more complicated objects, the user can still write the streamer method by hand.

STAR I/O Classes

The component organization of STAR I/O is supported by the STAR I/O classes StTree, StBranch, StIOEvent and StFile (no relation to the ROOT TTree and TBranch classes):
- StTree: container of components;
- StBranch: representation of a STAR I/O component;
- StIOEvent: ROOT I/O connection;
- StFile: container of files.

These classes perform I/O and add, fill and update files and components. They are heavily based on the ROOT environment and work well. However, when a user modifies the definition of a class and ROOT regenerates the corresponding streamer method, previously written data becomes inaccessible. ROOT does not yet support automatic schema evolution. Schema evolution aside, ROOT I/O is completely sufficient for us.

Automatic Schema Evolution

Complete schema evolution is an unachievable goal, but schema evolution with some limitations is possible. The limitations must be reasonable. There are two possible approaches:
- Read the old-format data into memory as is; the new application then deals with the old data.
- Read the old format and convert it into the new one; the new application then deals with the new format.

The first approach was used in ZEBRA. ZEBRA can read any ZEBRA file, and it is the application's problem to work with the old format. This approach is impossible in C++: there is no way to create an object of the old class once the new one is declared. So we must somehow convert the old data into the new format.

Automatic Schema Evolution (continued)

To achieve this, we have modified the ROOT disk format by splitting the whole task of writing into numerous but simple "atomic" subtasks.
- Each object is written separately. All its members are written close to each other.
- Pointers to objects are not followed immediately. Writing of the pointed-to objects is delayed. This makes it possible to skip unknown or unneeded objects.
- A member that is a C++ class is written as a separate object.
- The streamer of an object is split into "atomic" actions. Each action is applied to one member and is described by:
  - A numeric code for the kind of action. For example:
    - member of fundamental type;
    - pointer to fundamental type;
    - C++ object;
    - pointer to C++ object;
    - etc.

STAR Victor Perevoztchikov, BNL ALICE/STAR Automatic Schema Evolution(continued)  The description of these ''atomic'' actions is stored into the file together with data. It is not the description of written classes; it is the description of streamers, the description of how the objects were written. When the output format is formalized in such a way, we can compare the streamer descriptions of old and new data. Reading:  Read the streamer descriptions of old classes;  Got an old object. If class is known, create it. If not, skip object;  Got an old ''atom''. If we have the new ''atom'' of the same kind, type and name, fill it. If not, skip it. Some members of the new object could not be filled. It is the responsibility of the class designer to provide default filling of them. After conversion, an application should deal, with not filled members. But this is a problem of application schema evolution. I/O schema evolution is solved. %

Modified ROOT I/O Format

The modified ROOT I/O format is based on the latest version of standard ROOT I/O. There is no conceptual difference; it is a slightly different implementation. The main feature is the ability to skip not only an object but any member of an object, which is essential for schema evolution.
- Each object has a header containing a flag, the class ID and the object size:
  - short header: one 32-bit word (ClassId < 1K && ObjectSize < 1M);
  - long header: two 32-bit words (ClassId > 1K || ObjectSize > 1M).
- An object is written contiguously; pointers are not followed immediately:
  - simple members are written immediately;
  - TObject* and TObject: the buffer offset of the object is written. The object itself is written separately, and schema evolution is applied to it recursively;
  - general C++ class: written immediately, preceded by its size.
- Reference pointers are either zero or the offset of the object in the buffer. This is a new feature.
- The list of used classes is written at the end of the record.
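
One way to picture the short and long headers is the sketch below. The slide gives only the limits (1K for the class ID, 1M for the object size), so the bit positions are an assumption made for illustration: 10 bits of class ID and 20 bits of size fit in one word with two bits left over, and the top bit is used here as a hypothetical "long header" flag. The names `encodeHeader`, `decodeHeader` and `Header` are likewise invented for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kShortClassIdLimit = 1u << 10;  // 1K
constexpr std::uint32_t kShortSizeLimit    = 1u << 20;  // 1M
constexpr std::uint32_t kLongFlag          = 1u << 31;  // assumed flag bit

// Encode a header as one word when both values are small, else as two.
// Assumes classId itself never uses the top bit.
std::vector<std::uint32_t> encodeHeader(std::uint32_t classId,
                                        std::uint32_t objectSize) {
    if (classId < kShortClassIdLimit && objectSize < kShortSizeLimit) {
        // Short header: class ID in bits 20..29, size in bits 0..19.
        return {(classId << 20) | objectSize};
    }
    // Long header: flag and class ID in the first word, size in the second.
    return {kLongFlag | classId, objectSize};
}

struct Header {
    std::uint32_t classId;
    std::uint32_t objectSize;
    std::size_t words;  // number of 32-bit words the header occupied
};

// Decode a header starting at `w`; the flag bit tells which form was used.
Header decodeHeader(const std::uint32_t* w) {
    if (w[0] & kLongFlag) {
        return {w[0] & ~kLongFlag, w[1], 2};
    }
    return {(w[0] >> 20) & 0x3FFu, w[0] & 0xFFFFFu, 1};
}
```

Because the decoded header carries the object size, a reader can skip an object of an unknown class by advancing the buffer position by that size without interpreting its contents.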

Automatically Generated Streamer

The new automatically generated streamer method is more complicated than the standard one. It uses additional communication with the TBuffer class.
- At the beginning, the Streamer asks TBuffer whether the class has been modified. If not, it works as usual.
- Before reading a member, the Streamer requests permission from TBuffer to read it. If permission is granted, the member is read; if not, the Streamer moves on to the next member.
- When the Streamer returns, it may be called again to read members that were skipped. This can happen if the order of the members has changed.

When does it work?
- A new member is added.
- An old member is removed.
- The type of a member is changed, e.g. int to float, int to short, etc.
- An array size is changed.
- The definition of a class member is changed.
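
The permission protocol and the repeated Streamer calls can be mimicked with a toy model. Everything in it is hypothetical: `ToyBuffer`, `permit`, `NewClass` and `readObject` are invented for illustration, values are plain doubles, and the rule "skip the current old atom when a whole pass reads nothing" is only one plausible way to handle removed members, not a description of what the modified TBuffer actually does.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for the buffer: named values in the order the OLD class
// wrote them.
struct ToyBuffer {
    std::vector<std::pair<std::string, double>> fields;
    std::size_t pos = 0;

    bool done() const { return pos >= fields.size(); }
    // Permission is granted only if the next old atom has this name.
    bool permit(const std::string& name) const {
        return !done() && fields[pos].first == name;
    }
    double read() { return fields[pos++].second; }
    void skip() { ++pos; }
};

// New version of a class: members reordered, one removed, one added.
struct NewClass {
    double fY = 0.0;
    double fX = 0.0;
    double fW = -1.0;  // new member, keeps its default

    // One streamer pass; returns how many members were read.
    int stream(ToyBuffer& b) {
        int nRead = 0;
        if (b.permit("fY")) { fY = b.read(); ++nRead; }
        if (b.permit("fX")) { fX = b.read(); ++nRead; }
        if (b.permit("fW")) { fW = b.read(); ++nRead; }
        return nRead;
    }
};

// Call the streamer repeatedly until the old record is exhausted.
void readObject(NewClass& obj, ToyBuffer& b) {
    while (!b.done()) {
        if (obj.stream(b) == 0) {
            b.skip();  // no member of the new class wants this old atom
        }
    }
}
```

For an old record written in the order fX, fOld, fY, the first pass reads fX, the second pass reads nothing and so fOld is skipped, and the third pass reads fY; fW is never granted and keeps its default.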

Streamer example

    void TLorentzVector::Streamer(TBuffer &R__b)
    {
       // Stream an object of class TLorentzVector.

       void (*R__bs)(TObject *, TBuffer *);
       Version_t R__v = 0;
       if (R__b.IsReading()) {
          int R__Comp = R__b.DoIt();
          if (R__Comp || R__b.DoIt(10)) R__v = R__b.ReadVersion();
          TObject::Streamer(R__b);
          if (R__Comp || R__b.DoIt(40, "fX", "double", 8)) R__b >> fX;
          if (R__Comp || R__b.DoIt(40, "fY", "double", 8)) R__b >> fY;
          if (R__Comp || R__b.DoIt(40, "fZ", "double", 8)) R__b >> fZ;
          if (R__Comp || R__b.DoIt(40, "fE", "double", 8)) R__b >> fE;
       } else {
          // Writing part is skipped
       }
    }

    //______________________________________________________________________________
    void TLorentzVector::ShowMembers(TMemberInspector &R__insp, char *R__parent)
    {
       // Inspect the data members of an object of class TLorentzVector.

       TClass *R__cl = TLorentzVector::IsA();
       Int_t R__ncp = strlen(R__parent);
       if (R__ncp || R__cl || R__insp.IsA()) { }
       R__insp.Inspect(R__cl, R__parent, "fX", &fX);
       R__insp.Inspect(R__cl, R__parent, "fY", &fY);
       R__insp.Inspect(R__cl, R__parent, "fZ", &fZ);
       R__insp.Inspect(R__cl, R__parent, "fE", &fE);
       TObject::ShowMembers(R__insp, R__parent);
    }

TStreamer class

To perform schema evolution, the old and new classes must be compared. The TStreamer class keeps the information on how a class was written. Based on this, the TBuffer::DoIt method allows or forbids the Streamer to read the current part of TBuffer.
- One instance of TStreamer is related to one instance of TClass.
- It keeps the checksum of the old class. By comparing this with the current class, TStreamer decides whether the class was modified.
- It keeps information about all "atomic" operations of the old class.
- This information is produced by exactly the same code that generates the automatic Streamer.
- The hash list of TStreamers belongs to TFile and is saved during Close().
- When a TFile is opened, this information is restored and made accessible to TBuffer.
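
The checksum comparison can be sketched as follows. The talk does not say how the class checksum is computed, so this example assumes a 32-bit FNV-1a hash over the class name and the type and name of each member; `MemberDesc`, `classChecksum` and `classModified` are hypothetical names, not part of ROOT.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical description of one data member.
struct MemberDesc {
    std::string type;
    std::string name;
};

// Assumed checksum: 32-bit FNV-1a over class name, member types and names.
std::uint32_t classChecksum(const std::string& className,
                            const std::vector<MemberDesc>& members) {
    std::uint32_t h = 2166136261u;  // FNV offset basis
    auto mix = [&h](const std::string& s) {
        for (unsigned char c : s) {
            h ^= c;
            h *= 16777619u;  // FNV prime
        }
        // Separator byte so that adjacent strings cannot run together.
        h ^= 0xFFu;
        h *= 16777619u;
    };
    mix(className);
    for (const MemberDesc& m : members) {
        mix(m.type);
        mix(m.name);
    }
    return h;
}

// The class counts as modified when the checksum stored with the old data
// differs from the checksum of the class as currently declared.
bool classModified(std::uint32_t storedChecksum,
                   const std::string& className,
                   const std::vector<MemberDesc>& currentMembers) {
    return storedChecksum != classChecksum(className, currentMembers);
}
```

When the checksums agree, the Streamer can take the fast path and read all members unconditionally; only when they differ is the per-member comparison of "atomic" operations needed.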

Modified ROOT classes

- TStreamer: new class introduced;
- TClass: GetClassID method added, returns the class checksum;
- TFile: TStreamer hash list added;
- rootcint: generates the new automatic Streamer;
- TBuffer: major modifications;
- TKey: minor modifications.

Conclusions

- ROOT I/O was modified and automatic schema evolution was implemented. It is now in the testing stage.

Performance:
- File size is the same as in standard ROOT.
- Speed is the same as in standard ROOT.

Current status:
- The code for the modified ROOT I/O and automatic schema evolution is ready and should be tested in real production.

Future

The future of STAR-like ROOT schema evolution, as usual for any future, is unclear. There are three possible scenarios:
- The best one: our automatic schema evolution is accepted by the ROOT framework.
- It is not accepted. Then we introduce StTBuffer, StTFile, etc., inherited from the standard ROOT classes, and use them in STAR. This is not convenient, but it is a possible solution (as Rene put it, a schism).
- Somebody implements a better schema evolution and we adopt it.