1
CLEO III Datastorage
Martin Lohner, Cornell University
CHEP 2000
2
2/7/00 Martin Lohner, Cornell U A254 CHEP 2000
Overview
CLEO III Experiment Trivia
Use of Commercial Software in CLEO
Datastorage as part of the CLEO III Data Access System
Datastorage Design Decisions to limit Complexity
Summary
3
CLEO III
The CLEO experiment is located on the Cornell campus in Ithaca, NY, USA. It is fed by the e+e- accelerator CESR and takes data at or around the Upsilon(4S) resonance (ca. 10.6 GeV).
The CLEO III experiment, scheduled to begin physics data taking around March 2000, will collect on the order of 200 TB of data over its lifetime.
Challenge for CLEO III: how to store such a large dataset and allow efficient access to it.
4
CLEO III Trivia
On the Cornell campus, Ithaca, NY, USA; fed by an e+e- accelerator, CESR, taking data at ~Upsilon(4S) (10.6 GeV)
Lean, mean collaboration with 150 physicists from 25 institutions
Engineering data taking since Dec ’99
Physics data taking scheduled for mid-April ’00
20 TB in the first year, 200 TB of data over 5 years
Event size 40 kB at 100 Hz = 4 MB/s
How to store such a large dataset with efficient access?
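The rate figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below assumes decimal units (1 kB = 1000 bytes, 1 TB = 10^12 bytes); the implied live time is a derived quantity for illustration, not a number quoted in the talk.

```python
# Cross-check of the quoted event size, rate and yearly volume.
EVENT_SIZE_BYTES = 40_000        # 40 kB per event (decimal units assumed)
EVENT_RATE_HZ = 100              # events per second
FIRST_YEAR_BYTES = 20 * 10**12   # 20 TB in the first year

rate_bytes_per_s = EVENT_SIZE_BYTES * EVENT_RATE_HZ
rate_mb_per_s = rate_bytes_per_s / 10**6

# Seconds of data taking at the full rate needed to accumulate 20 TB.
implied_live_seconds = FIRST_YEAR_BYTES / rate_bytes_per_s
implied_live_days = implied_live_seconds / 86_400

print(f"data rate: {rate_mb_per_s:.1f} MB/s")
print(f"implied live time for 20 TB: {implied_live_days:.0f} days")
```

Under these assumptions, 20 TB corresponds to roughly two months of continuous running at the full 4 MB/s.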
5
Setting the Stage
Datastorage is mission-critical for many years
  – Probably longer than most database companies will last
Resources limited at a (relatively) small experiment
  – Shortage of code development personnel
Uncertainty in future of commercial databases
6
Why a Database? Why Objectivity?
Ease of Management. Scalability.
  – Who wants to know where those files are?
  – And which file contains what run?
Efficient access to sub-components
  – e.g. only access tracks rather than entire event
Does an OODBMS fit the bill? Why not an RDBMS?
  – Performance?
A number of ongoing and proposed HEP experiments (most notably BaBar) have adopted Objectivity to store Terabytes and Petabytes of data.
7
Use of Commercial Software in CLEO
Before (CLEO II): never relied on commercial software
Now:
  – Objectivity/DB for Datastorage
  – Visigenics (Corba) for middleware
Dangers:
  – binaries instead of source code
  – tightly coupled to OS versions and compilers
      would like Objectivity for Alpha/Linux
  – lifetime of company vs. lifetime of experiment
  – rely on manuals and customer support
  – find a bug: trial & error; report it, can’t fix it yourself
CLEO III online stores data in our own binary format instead of writing directly to the Objectivity database.
8
CLEO III Data Access System
Datastorage is part of the CLEO III Data Access System, which:
  – is described further in A216 (poster)
  – is designed to be independent of the input/output data format
Data-bus consisting of Records (e.g. Event Record)
  – synchronized with respect to each other to provide a consistent view of the CLEO detector at one instant in time
Records can be served by Sources, written to Sinks
Any storage format plugs in via a concrete Source and/or Sink, à la device driver
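The Source/Sink idea can be illustrated with a minimal sketch. All class and function names here (`Source`, `Sink`, `ListSource`, `ListSink`, `run_job`) are hypothetical and chosen for illustration only; they are not the actual CLEO III interfaces, and a plain dictionary stands in for a Record.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List

Record = Dict[str, object]  # hypothetical stand-in for a CLEO III Record


class Source(ABC):
    """Anything that can serve Records, regardless of storage format."""

    @abstractmethod
    def records(self) -> Iterator[Record]:
        ...


class Sink(ABC):
    """Anything that can accept Records, regardless of storage format."""

    @abstractmethod
    def write(self, record: Record) -> None:
        ...


class ListSource(Source):
    """Toy 'storage format': Records held in an in-memory list."""

    def __init__(self, items: List[Record]) -> None:
        self._items = items

    def records(self) -> Iterator[Record]:
        return iter(self._items)


class ListSink(Sink):
    """Toy 'storage format': Records appended to an in-memory list."""

    def __init__(self) -> None:
        self.written: List[Record] = []

    def write(self, record: Record) -> None:
        self.written.append(record)


def run_job(source: Source, sink: Sink) -> int:
    """Copy every Record from source to sink; knows nothing about formats."""
    count = 0
    for record in source.records():
        sink.write(record)
        count += 1
    return count


source = ListSource([{"run": 1, "event": 1}, {"run": 1, "event": 2}])
sink = ListSink()
copied = run_job(source, sink)
```

Because `run_job` depends only on the abstract interfaces, a database-backed or file-backed implementation could be swapped in without touching it, which is the device-driver analogy the slide draws.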
9
CLEO III Data Access System (cont.)
Separation between transient and persistent objects:
  – user analysis written in terms of transient objects, independent of storage formats!
  – No drawback except for potential performance penalty -- NOT
      we disallow links between objects (except via index-list objects)
      data is served on demand (via proxies)
Main data access application “Suez”:
  – skeleton program, runs job setup and control
  – dynamic loading and/or static linking of modules
  – Database code loaded as “Objectivity Source/Sink” module
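Serving data on demand via proxies can be sketched as a small lazy-loading wrapper. The `Proxy` class and the loader functions below are hypothetical illustrations of the general pattern, not the CLEO III implementation.

```python
from typing import Callable, Generic, List, Optional, TypeVar

T = TypeVar("T")


class Proxy(Generic[T]):
    """Holds a loader and runs it only when the data is first requested."""

    def __init__(self, loader: Callable[[], T]) -> None:
        self._loader = loader
        self._value: Optional[T] = None
        self._loaded = False
        self.load_count = 0  # instrumentation for this example only

    def get(self) -> T:
        if not self._loaded:
            self._value = self._loader()  # e.g. read from storage and convert
            self._loaded = True
            self.load_count += 1
        return self._value  # type: ignore[return-value]


def load_tracks() -> List[float]:
    # Hypothetical: pretend this reads track momenta from some storage format.
    return [0.5, 1.25, 3.0]


def load_showers() -> List[float]:
    # Hypothetical: pretend this reads shower energies from some storage format.
    return [0.1, 2.2]


tracks = Proxy(load_tracks)
showers = Proxy(load_showers)

first = tracks.get()   # triggers the load
second = tracks.get()  # served from the cached transient object
```

An analysis that touches only `tracks` never pays for loading `showers`, which matches the earlier point about accessing tracks rather than the entire event.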
10
Database Layout: CLEO concepts
Natural unit of CLEO III data: the Record
  – Record contains different types of data, e.g. Event Record contains Tracks and Showers
Sets of Records make up “Streams”
  – e.g. Event Stream, Geometry Alignment Stream
Sets of Records are grouped in data-taking “Runs”
  – accelerator fill, same run conditions, same run number
11
Database Layout (cont.)
Translate to Database:
  – Records become Record-“Headers” with links to different data types
  – Different Streams of Records saved independently
  – Everything grouped by Run
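The Run / Stream / Record-Header grouping can be pictured with plain data classes. This is a sketch only: the class names, field names and the string keys used as "links" are assumptions made for illustration and do not reflect the actual database schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RecordHeader:
    """One Record: a small header whose links point at the stored data items."""
    record_number: int
    # data type name -> opaque key of the stored item (hypothetical key format)
    links: Dict[str, str] = field(default_factory=dict)


@dataclass
class Run:
    """Everything is grouped by Run; each Stream is kept independently."""
    run_number: int
    streams: Dict[str, List[RecordHeader]] = field(default_factory=dict)

    def add(self, stream: str, header: RecordHeader) -> None:
        self.streams.setdefault(stream, []).append(header)


run = Run(run_number=1106)
run.add("event", RecordHeader(1, {"Tracks": "tracks/1", "Showers": "showers/1"}))
run.add("event", RecordHeader(2, {"Tracks": "tracks/2"}))
run.add("geometryalignment", RecordHeader(1, {"Alignment": "align/1"}))

event_count = len(run.streams["event"])
stream_names = sorted(run.streams)
```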
12
Database Layout (cont.)
Clustering by Event classification:
  – hadronic, Bhabha, tau etc.
Tags with fast-selection criteria
  – e.g. number of tracks
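The purpose of tags is that a job can decide which events it wants before reading any bulk data. A minimal sketch, assuming a hypothetical `Tag` with a classification string and a track count (the real tag contents are not specified in the talk):

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Tag:
    """Small per-event summary that is cheap to scan (hypothetical fields)."""
    record_number: int
    classification: str  # e.g. "hadronic", "bhabha", "tau"
    n_tracks: int


def select(tags: Iterable[Tag], classification: str, min_tracks: int) -> List[int]:
    """Return record numbers passing the cuts, using tag information only."""
    return [
        tag.record_number
        for tag in tags
        if tag.classification == classification and tag.n_tracks >= min_tracks
    ]


tags = [
    Tag(1, "hadronic", 9),
    Tag(2, "bhabha", 2),
    Tag(3, "hadronic", 4),
    Tag(4, "tau", 4),
]
wanted = select(tags, classification="hadronic", min_tracks=5)
```

Only the records listed in `wanted` would then need their full event data fetched.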
13
Schema Management, StorageHelpers, Compression
Schema is type information of stored data in database
Schema changes are non-trivial
  – one Schema for the entire federation of databases
  – changing (= evolving) types requires updating the stored objects
  – avoid corrupting the Schema at all costs
User data types in official Schema?
Storing data as real types prevents compression at object level
  – then can only do compression at database (= file) level.
14
Schema Management, StorageHelpers, Compression (cont.)
Different Approach: All data types stored as Binary Blobs
Only data access layer knows how to interpret blobs
  – we do store storage information (compression info, etc.)
No direct links between objects
  – want to support other storage formats (e.g. sequential access files)
  – instead use index-list objects (“Lattice”)
Allows compression at object level
Conversion from blob to transient object via StorageHelpers
  – see C215 in poster session
  – basic serialization approach
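The blob approach can be sketched with Python's standard `struct` and `zlib` modules. Everything named here (`Blob`, `TrackMomentaHelper`, the byte layout, the use of zlib) is an assumption for illustration; the talk does not describe the actual StorageHelper interface or compression scheme.

```python
import struct
import zlib
from dataclasses import dataclass
from typing import List


@dataclass
class Blob:
    """What the database would see: opaque bytes plus storage information."""
    type_name: str     # lets the data access layer pick the right helper
    compressed: bool   # compression info stored alongside the payload
    payload: bytes


class TrackMomentaHelper:
    """Hypothetical helper converting a list of floats to and from a Blob."""

    type_name = "TrackMomenta"

    def store(self, momenta: List[float], compress: bool = True) -> Blob:
        # Layout (assumed): little-endian uint32 count, then that many doubles.
        raw = struct.pack(f"<I{len(momenta)}d", len(momenta), *momenta)
        payload = zlib.compress(raw) if compress else raw
        return Blob(self.type_name, compress, payload)

    def load(self, blob: Blob) -> List[float]:
        raw = zlib.decompress(blob.payload) if blob.compressed else blob.payload
        (count,) = struct.unpack_from("<I", raw, 0)
        return list(struct.unpack_from(f"<{count}d", raw, 4))


helper = TrackMomentaHelper()
blob = helper.store([0.5, 1.25, 3.0])
restored = helper.load(blob)
```

Since the database only ever stores `Blob`, adding or changing a data type means writing or updating a helper rather than evolving the database schema, and each object can be compressed individually.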
15
Data Organization
Objy has fixed limits on the number of objects, containers, databases in a federation:
  – No intention to store all data from day one in one federation
  – Divide data into “data sets” (run ranges) in separate federations
      natural divisions are data-taking periods between shutdowns (run 1-1105 = fdb1, run 1106-2452 = fdb2, etc.)
  – Have to require the schema to be the same for all federations
      Necessary to allow access to several federations in one job
      Our schema is simple (data are binary blobs)
Storage of “Constants” in separate federation
  – different uses, different sizes
  – access to second federation via Corba
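Dividing the data by run range implies a lookup from run number to federation. A minimal sketch using the two example ranges from the slide; the table name `RUN_RANGES` and the function `federation_for_run` are hypothetical, and further ranges (the slide's "etc.") are left out.

```python
from bisect import bisect_right
from typing import List, Tuple

# (first run, last run, federation name); only the ranges quoted on the slide.
RUN_RANGES: List[Tuple[int, int, str]] = [
    (1, 1105, "fdb1"),
    (1106, 2452, "fdb2"),
]


def federation_for_run(run: int) -> str:
    """Return the federation holding the given run, or raise KeyError."""
    starts = [first for first, _, _ in RUN_RANGES]
    index = bisect_right(starts, run) - 1  # last range starting at or before run
    if index >= 0:
        _, last, name = RUN_RANGES[index]
        if run <= last:
            return name
    raise KeyError(f"no federation holds run {run}")


print(federation_for_run(1105))
print(federation_for_run(1106))
```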
16
User Data
We have not fully addressed how to handle User Data
  – have ideas, but no definite plan yet
Objectivity allows access to only one federation per process
  – We don’t want to store User Data in the official database
  – Forced to use Corba to access Constants in another federation
Why are we not worried?
  – Binary Blobs: User Data don’t impact Schema
  – Our ultra-flexible and storage-independent Data Access System allows handling of multiple sources and sinks (different storage formats) in the same job!
We will most likely need another format
  – based on historical CLEO formats
  – will CLEO collaborators install Objy at their home institutions?
17
Concurrency Issues
Objectivity locking is done at container level
Objy standard mode allows many readers XOR one writer
Objy MROW mode allows many readers AND ONE writer
  – can lead to logical data corruption if used improperly
In Reconstruction want to parallelize task:
  – have many processes update the database
      update separate containers -- no problem
      update central objects/containers -- problem (e.g. compression information stored centrally)
  – potential lock collisions
  – could preallocate in standalone job -- maintenance issue!
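The two locking modes can be compared with a toy model of a single container lock. This models only the rules stated on the slide (many readers XOR one writer, versus many readers AND one writer); `ContainerLock` is a hypothetical class and has nothing to do with the Objectivity API.

```python
class ContainerLock:
    """Toy model of one container's lock state (not thread-safe, not Objectivity)."""

    def __init__(self, mrow: bool = False) -> None:
        self.mrow = mrow      # True: many readers AND one writer
        self.readers = 0
        self.writer = False

    def try_read(self) -> bool:
        # Standard mode: a writer blocks all readers.
        if self.writer and not self.mrow:
            return False
        self.readers += 1
        return True

    def try_write(self) -> bool:
        # Never more than one writer, in either mode.
        if self.writer:
            return False
        # Standard mode: existing readers block the writer.
        if self.readers > 0 and not self.mrow:
            return False
        self.writer = True
        return True


standard = ContainerLock(mrow=False)
standard_results = [standard.try_read(), standard.try_read(), standard.try_write()]

mrow = ContainerLock(mrow=True)
mrow_results = [mrow.try_read(), mrow.try_write(), mrow.try_read(), mrow.try_write()]
```

In both modes a second writer on the same container is refused, which is why parallel reconstruction processes collide when they all need to update one central container.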
18
Objectivity and Mass Storage
Objy AMS server with Veritas Storage Migrator (= HSM) on top of tape robot holding AIT (AIT-II) tapes
Objy 5.2:
  – OOFS layer allows hooks into underlying file system
  – prior to 5.2 had to deal with timeouts (>25 s) due to HSM latency
  – plan to use Defer-Request Protocol to deal with timeouts
  – plan to use Redirect-Request Protocol for load-balancing
19
Current Status
Support Solaris 2.x and OSF1 4.x; Linux/Intel soon
  – found name clashes with persistent Objy STL vs normal STL (abandoned persistent STL for our own implementation in terms of “ooVArray”)
  – new Objectivity 5.2 Java-style collection classes look promising
Tested in full-blown Mock-data reconstruction challenge in summer 1999 and various other tests.
Deployed database with our engineering runs
  – ~200 GB worth of data
Another Mock-data challenge planned shortly after CHEP
20
Summary
Described various design decisions to limit complexity of our data storage system
CLEO III Data Access System is format-independent!
  – User code independent of storage formats (no recompiling/relinking)
  – Any number of storage formats can be used in the same job
  – Objectivity is “just another” storage format
      Major advantage of our system!!
Data storage in Objy as Binary Blobs
  – more like a data location manager than true object store
  – no schema evolution problems
  – storage of user data?
Stress-tested and now deployed for data
  – Found good performance with Objy 5.2!