STAR Event data storage and management in STAR V. Perevoztchikov Brookhaven National Laboratory,USA.

Slides:



Advertisements
Similar presentations
Object Persistency & Data Handling Session C - Summary Object Persistency & Data Handling Session C - Summary Dirk Duellmann.
Advertisements

Etter/Ingber Engineering Problem Solving with C Fundamental Concepts Chapter 4 Modular Programming with Functions.
Provenance-Aware Storage Systems Margo Seltzer April 29, 2005.
Software Development Languages and Environments. Programming languages High level languages are problem orientated contain many English words are easier.
The Assembly Language Level
1 Coven a Framework for High Performance Problem Solving Environments Nathan A. DeBardeleben Walter B. Ligon III Sourabh Pandit Dan C. Stanzione Jr. Parallel.
O. Stézowski IPN Lyon AGATA Week September 2003 Legnaro Data Analysis – Team #3 ROOT as a framework for AGATA.
CSC1016 Coursework Clarification Derek Mortimer March 2010.
File Management Systems
Reconstruction and Analysis on Demand: A Success Story Christopher D. Jones Cornell University, USA.
The Event as an Object-Relational Database: Avoiding the Dependency Nightmare Christopher D. Jones Cornell University, USA.
CS 255: Database System Principles slides: Variable length data and record By:- Arunesh Joshi( 107) Id: Cs257_107_ch13_13.7.
Bertrand Bellenot root.cern.ch ROOT I/O in JavaScript Reading ROOT files from any web browser ROOT Users Workshop
STAR C OMPUTING Maker and I/O Model in STAR Victor Perevoztchikov.
M.Frank LHCb/CERN - In behalf of the LHCb GAUDI team Data Persistency Solution for LHCb ã Motivation ã Data access ã Generic model ã Experience & Conclusions.
Test Of Distributed Data Quality Monitoring Of CMS Tracker Dataset H->ZZ->2e2mu with PileUp - 10,000 events ( ~ 50,000 hits for events) The monitoring.
ALICE Offline week, CERN 21 February 2005 I. Hrivnacova 1 New geometry framework in MUON I.Hrivnacova IPN, Orsay ALICE Offline week, CERN 21 February 2005.
File Processing - Database Overview MVNC1 DATABASE SYSTEMS Overview.
Central Reconstruction System on the RHIC Linux Farm in Brookhaven Laboratory HEPIX - BNL October 19, 2004 Tomasz Wlodek - BNL.
N ATIONAL E NERGY R ESEARCH S CIENTIFIC C OMPUTING C ENTER Charles Leggett A Lightweight Histogram Interface Layer CHEP 2000 Session F (F320) Thursday.
CY2003 Computer Systems Lecture 09 Memory Management.
Developer workshop on I/O and persistence evolution LAL,Orsay, Feb 2012 Marcin Nowak (PAS BNL) Extended T/P Converters.
Advanced Computer Architecture 0 Lecture # 1 Introduction by Husnain Sherazi.
David N. Brown Lawrence Berkeley National Lab Representing the BaBar Collaboration The BaBar Mini  BaBar  BaBar’s Data Formats  Design of the Mini 
 Three-Schema Architecture Three-Schema Architecture  Internal Level Internal Level  Conceptual Level Conceptual Level  External Level External Level.
The european ITM Task Force data structure F. Imbeaux.
CHEP 2003 March 22-28, 2003 POOL Data Storage, Cache and Conversion Mechanism Motivation Data access Generic model Experience & Conclusions D.Düllmann,
STAR Sti, main features V. Perevoztchikov Brookhaven National Laboratory,USA.
File Organization Lecture 1
A Technical Validation Module for the offline Auger-Lecce, 17 September 2009  Design  The SValidStore Module  Example  Scripting  Status.
STAR STAR VMC tracker V. Perevoztchikov Brookhaven National Laboratory,USA.
STAR Kalman Track Fit V. Perevoztchikov Brookhaven National Laboratory,USA.
STAR Collaboration, July 2004 Grid Collector Wei-Ming Zhang Kent State University John Wu, Alex Sim, Junmin Gu and Arie Shoshani Lawrence Berkeley National.
By Jeff Dean & Sanjay Ghemawat Google Inc. OSDI 2004 Presented by : Mohit Deopujari.
David Adams ATLAS DIAL: Distributed Interactive Analysis of Large datasets David Adams BNL August 5, 2002 BNL OMEGA talk.
STAR Schema Evolution Implementation in ROOT I/O V. Perevoztchikov Brookhaven National Laboratory,USA.
1 Multi-Level Indexing and B-Trees. 2 Statement of the Problem When indexes grow too large they have to be stored on secondary storage. However, there.
Lecture 10 Page 1 CS 111 Summer 2013 File Systems Control Structures A file is a named collection of information Primary roles of file system: – To store.
Integration of the ATLAS Tag Database with Data Management and Analysis Components Caitriana Nicholson University of Glasgow 3 rd September 2007 CHEP,
00/XXXX 1 Data Processing in PRISM Introduction. COCO (CDMS Overloaded for CF Objects) What is it. Why is COCO written in Python. Implementation Data Operations.
General Purpose ROOT Utilities Victor Perevoztchikov, BNL.
V.Fine ALICE-STAR Joint meeting April, 9, 2000 The STAR offline framework* V. Fine *) See also:
Session 1 Module 1: Introduction to Data Integrity
Jean-Roch Vlimant, CERN Physics Performance and Dataset Project Physics Data & MC Validation Group McM : The Evolution of PREP. The CMS tool for Monte-Carlo.
Introduction to OOP CPS235: Introduction.
Online Monitoring System at KLOE Alessandra Doria INFN - Napoli for the KLOE collaboration CHEP 2000 Padova, 7-11 February 2000 NAPOLI.
AliRoot survey: Analysis P.Hristov 11/06/2013. Are you involved in analysis activities?(85.1% Yes, 14.9% No) 2 Involved since 4.5±2.4 years Dedicated.
CINT & Reflex – The Future CINT’s Future Layout Reflex API Work In Progress: Use Reflex to store dictionary data Smaller memory footprint First step to.
GDML “Geometry Description Markup Language” by Daniele Francesco Kruse University of Rome “Tor Vergata” European Organization for Nuclear Research.
Reading ROOT files in (almost) any browser.  Use XMLHttpRequest JavaScript class to perform the HTTP HEAD and GET requests  This class is highly browser.
STAR Persistent Pointers in the STAR Micro-DST V. Perevoztchikov Brookhaven National Laboratory,USA.
ECE 456 Computer Architecture Lecture #9 – Input/Output Instructor: Dr. Honggang Wang Fall 2013.
STAR SVT Self Alignment V. Perevoztchikov Brookhaven National Laboratory,USA.
STAR Simulation. Status and plans V. Perevoztchikov Brookhaven National Laboratory,USA.
The Database Project a starting work by Arnauld Albert, Cristiano Bozza.
Introduction to Computer Programming Concepts M. Uyguroğlu R. Uyguroğlu.
29/04/2008ALICE-FAIR Computing Meeting1 Resulting Figures of Performance Tests on I/O Intensive ALICE Analysis Jobs.
File System Structure How do I organize a disk into a file system?
Database Management System
Methodology – Monitoring and Tuning the Operational System
Chapter 9 Structuring System Requirements: Logic Modeling
Indexing and Hashing Basic Concepts Ordered Indices
Automated support of STL containers
Methodology – Monitoring and Tuning the Operational System
Prepared by Jaroslav makovski
UNIT-I Introduction to Database Management Systems
Chapter 9 Structuring System Requirements: Logic Modeling
Event Storage GAUDI - Data access/storage Framework related issues
Chapter 4: Writing and Designing a Complete Program
Chapter 5 File Systems -Compiled for MCA, PU
Presentation transcript:

STAR Event data storage and management in STAR V. Perevoztchikov Brookhaven National Laboratory,USA

STAR Victor Perevoztchikov, BNL ALICE/STAR Introduction The Solenoidal Tracker At RHIC (STAR) is a large acceptance collider detector. STAR is designed to measure the momentum and identify several thousands of particles per event. About 300 Terabytes of data will be generated each year. To handle it, sophisticated data structure was developed. The main features are:  All the persistent data is organized as a set of named components;  All data objects are persistent. No separation between transient and persistent data structures;  Persistence is based on ROOT I/O;  Automatic schema evolution is implemented. It was developed on the base of non automatic ROOT schema evolution ;  The system is working.

STAR Victor Perevoztchikov, BNL ALICE/STAR STAR I/O components The STAR event is large and complex. Different parts of it are created in different processing stages. To ease management and increase storage flexibility we decided to split the event into more simple parts - named components  Each component lives in a separate file;  Each offline stage reads existing components and creates new ones, without modifying or extending old files;  The size of a full event in STAR is very big (~15-20MB). Separation of components allows to keep at least 50 events in one file, with the 1GB limit per a file;  It is easy to add a new offline stage, without reorganization of existing data;  It is easy to reprocess events from any stage;Any application can read only needed files;  The most frequently used files/components can reside on disks, the others on tape.

STAR Victor Perevoztchikov, BNL ALICE/STAR STAR I/O components (continued)  One component consists of a set of keyed records. These keys are based on Run/Event numbers. This allows to construct one full event, reading several files in parallel.  Records, in turn, contain a tree of named datasets;  The tree structure is not predefined. Addition or removal the tree does not affect the behaviour of modules which do not use them. All components are born equal. But some of them are more equal than others. They contain only one record per file.  hist component - contains named tree of histograms, filled by different modules during processing;  runco component - contains named tree of Run/Control parameters used in reconstruction;  tagdb component - contains named tree of physical tags, defined in different modules. This component is used to fill the STAR TagDB

STAR Victor Perevoztchikov, BNL ALICE/STAR STAR I/O components (continued)  A group of files/components with the same set of events organizes a “family” of files.  Each file, in addition to its component, keeps information about the all other components existing at that time. Thus, the last file in production chain keeps information about the all produced components and files.  It is enough to open any file and select needed component names. All the needed files from the ''family'' will be opened automatically. Such component organization looks rather complicated, but for a huge event size and a complex processing chain, it allows to split complex event into relatively simple parts to simplify management. In an environment of larger numbers of smaller events a simpler approach could be appropriate.

STAR Victor Perevoztchikov, BNL ALICE/STAR ROOT I/O in STAR ROOT I/O was chosen as the main mechanism of persistence in STAR. The main power of ROOT I/O is :  No artificial separation between transient and persistent data model.  User is free to develop complex data objects without concern for the I/O implementation, and -- importantly -- without building dependence on the used I/O scheme;  Automatic creation of a streamer method for user defined classes, which provides persistence of the object;  For special, more complicated, objects, user still can write this streamer method himself.

STAR Victor Perevoztchikov, BNL ALICE/STAR STAR I/O classes The component organization of STAR I/O is supported by STAR I/O classes: StTree,StBranch, StIOEvent and StFile ( no relation to ROOT TTree and Tbranch classes). StTree - container of components; StBranch - representation of STAR I/O component; StIOEvent - ROOT I/O connection; StFile - container of files. These classes perform I/O, add, fill, update of files/components They are heavily based on ROOT environment and work well. However when user modifies the definition of his class and ROOT rewrites the corresponding streamer method, then previously written data becomes inaccessible. ROOT does not yet support automatic schema evolution. Schema evolution aside, ROOT I/O is completely sufficient for us.

STAR Victor Perevoztchikov, BNL ALICE/STAR Automatic Schema evolution Complete schema evolution is an unachievable goal, but schema evolution with some limitations is possible. The limitations must be reasonable. There are two solutions:  Reading the old formatted data into memory and then the new application deals with the old data;  Reading and converting the old format into the new one and then the new application deals with the new format. The first approach was used in ZEBRA. ZEBRA can read any ZEBRA file and it is the problem of the application to work with the old format. This approach is completely impossible in C++. There is no way to create an old C++ object when the new one is declared. So, we must somehow convert the old data into the new format.

STAR Victor Perevoztchikov, BNL ALICE/STAR Automatic Schema evolution(continued) To achieve this, we have modified the ROOT disk format by splitting the whole task of writing into numerous, but simple ''atomic'' subtasks.  Each object is written separately. All its members are written close to each other;  Pointers to object are not followed immediately. Writing of these objects is delayed. This allows to skip unknown or unneeded object;  Member which is a C++ class is written as a separate object;  Streamer of an object is splited by "atomic" actions. An action is applied to one member. Each action described by: l Numeric code related to the kind of action. For example: § member of fundamental type; § pointer to fundamental type; §C++ object; §pointer to C++ object. §Etc...

STAR Victor Perevoztchikov, BNL ALICE/STAR Automatic Schema evolution(continued)  The description of these ''atomic'' actions is stored into the file together with data. It is not the description of written classes; it is the description of streamers, the description of how the objects were written. When the output format is formalized in such a way, we can compare the streamer descriptions of old and new data. Reading:  Read the streamer descriptions of old classes;  Got an old object. If class is known, create it. If not, skip object;  Got an old ''atom''. If we have the new ''atom'' of the same kind, type and name, fill it. If not, skip it. Some members of the new object could not be filled. It is the responsibility of the class designer to provide default filling of them. After conversion, an application should deal, with not filled members. But this is a problem of application schema evolution. I/O schema evolution is solved. %

STAR Victor Perevoztchikov, BNL ALICE/STAR Conclusions  STAR I/O based on component approach and ROOT I/O was implemented. It is has been working for one year;  ROOT I/O was modified and automatic schema evolution implemented. It is in testing stage now. Performance: l The same file size as for standard ROOT; l The same speed as standard ROOT I/O. Current status:  Components and ROOT I/O has been working for one year;  Codes of modified ROOT I/O and automatic schema evolution are ready and will be tested in real production.