1
The Sequential Access Model for Run II Data Management and Delivery
Lee Lueking, Frank Nagy, Heidi Schellman, Igor Terekhov, Julie Trumbo, Matt Vranicar, Rich Wellner, Vicky White
URL: www-d0.fnal.gov/~lueking/sam/sequential.html
CHEP98, Sept. 3, 1998
2
What Is the Sequential Access Model (SAM)?
- Sequential events: data is stored in files as sequential events.
- Data tiers: each event is stored in each of several data tiers.
  - The Event Data Unit (EDU) is the unit of data stored in each tier.
  - Physical event size: EDU5 = 5 kB/event, EDU50 = 50 kB/event, et cetera.
- Physical streaming (clustering): data categories based on trigger or reconstruction information.
- Database catalog: the File, Event and Processing Database holds information about the data at the event, file and run level, plus static and dynamic processing information.
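To make the tier idea concrete, here is a minimal sketch in which the same sequence of events is written once into every tier, and each tier's file size follows from its nominal per-event EDU size. The names `TIER_EVENT_KB`, `TierFile` and `write_to_all_tiers` are hypothetical, and treating every event in a tier as exactly its nominal size is a simplifying assumption.

```python
from dataclasses import dataclass

# Hypothetical table: tier name -> nominal physical event size in kB
# (EDU5 and EDU50 from the slide). Fixed per-event size is an assumption.
TIER_EVENT_KB = {"EDU5": 5, "EDU50": 50}

@dataclass
class TierFile:
    """One file in one data tier, holding events in sequential order."""
    tier: str
    event_numbers: list[int]

    def size_kb(self) -> int:
        # Every event in the tier occupies that tier's nominal EDU size.
        return TIER_EVENT_KB[self.tier] * len(self.event_numbers)

def write_to_all_tiers(event_numbers: list[int]) -> list[TierFile]:
    """Store the same sequence of events once in every tier."""
    return [TierFile(tier, list(event_numbers)) for tier in TIER_EVENT_KB]

files = write_to_all_tiers([101, 102, 103])
print([(f.tier, f.size_kb()) for f in files])  # [('EDU5', 15), ('EDU50', 150)]
```

Because each tier is a separate sequence of files, a job that only needs the small tier reads 15 kB here instead of 150 kB.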
3
Data Organization
[Figure: physical clustering; File & Event Database; event information tiers; warm cache; user and physics group (derived) data]
4
How Do I Access Data?
- Pipelines: data access channels tailored to particular processing and analysis patterns.
- Pipeline segments: tapes and drives, the Automated Tape Library and Storage Management System, the network, and group-shared and/or user-private analysis disk.
- Example access modes:
  - Database: access to event, trigger and other File and Event Database (FEDB) information.
  - Thumbnail: a disk-resident sketch of each event.
  - Freight Train: a large data-stream file server.
  - Event Picking: random event selection from any data tier.
  - Small Data-set: one or a few files from any data tier.
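One way to picture a pipeline built from segments is as a chain of stages, each consuming the file stream produced by the previous one. This is only an illustrative sketch: the stage names `only_tier` and `stage_from_tape` and the file-naming convention are hypothetical, and "staging" here is simulated by relabelling the file name.

```python
from typing import Callable, Iterable, Iterator

# A pipeline segment consumes a stream of file names and yields file names onward.
Segment = Callable[[Iterable[str]], Iterator[str]]

def only_tier(tier: str) -> Segment:
    # Hypothetical selection stage: keep files whose name ends in ".<tier>".
    def stage(files: Iterable[str]) -> Iterator[str]:
        return (name for name in files if name.endswith("." + tier))
    return stage

def stage_from_tape(files: Iterable[str]) -> Iterator[str]:
    # Hypothetical stand-in for copying each file from tape to analysis disk.
    for name in files:
        yield "disk:" + name

def build_pipeline(*segments: Segment) -> Segment:
    """Chain segments so each one consumes the previous segment's output."""
    def run(files: Iterable[str]) -> Iterator[str]:
        stream: Iterable[str] = files
        for segment in segments:
            stream = segment(stream)
        return iter(stream)
    return run

pipeline = build_pipeline(only_tier("EDU5"), stage_from_tape)
print(list(pipeline(["run1.EDU5", "run1.EDU50", "run2.EDU5"])))
# ['disk:run1.EDU5', 'disk:run2.EDU5']
```

Placing the selection stage before the staging stage means files that are not needed are never staged, which is the kind of tailoring the different access modes call for.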
5
Data Access
[Figure: mass storage, pipelines and consumers. Legend: disk storage, tape storage, file, event, data flow, group of users, single user, pipeline name]
6
D0 Specifications
- Data sizes
- Further details
  - 10-15 exclusive streams preferred, based on L3 and/or reconstruction information.
  - 10% warm (tape or disk) caches of the Raw and Medium EDU data.
  - Possible on-demand reconstruction.
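The warm-cache requirement is simple arithmetic: the cache must hold 10% of the combined Raw and Medium EDU volumes. The sketch below assumes sizes in TB, and the input volumes are placeholders, not the D0 figures from the data-size table.

```python
# Fraction from the slide: 10% warm (tape or disk) caches of Raw and Medium EDU data.
WARM_FRACTION = 0.10

def warm_cache_tb(raw_tb: float, medium_tb: float) -> float:
    """Warm-cache size needed to hold 10% of the Raw and Medium EDU tiers."""
    return WARM_FRACTION * (raw_tb + medium_tb)

# Hypothetical tier volumes, NOT the D0 numbers.
print(warm_cache_tb(raw_tb=200.0, medium_tb=50.0))
```

With these placeholder volumes the warm cache comes to about 25 TB.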
7
Will SAM Scale to Run II?
8
Exclusive Streaming
See Talk #182: Heidi Schellman, “Assurance of Data Integrity in a Petabyte Data Sample”
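"Exclusive" streaming means every event lands in exactly one stream, so no event is stored twice. A minimal way to guarantee that is a first-match rule list checked in priority order, as sketched below. The rule names, trigger labels (`L3_MUON`, `L3_ELECTRON`) and the jet-count criterion are hypothetical; the slides only say that streams are based on L3 and/or reconstruction information.

```python
# Hypothetical stream rules, checked in priority order.
STREAM_RULES = [
    ("muon", lambda ev: "L3_MUON" in ev["triggers"]),
    ("electron", lambda ev: "L3_ELECTRON" in ev["triggers"]),
    ("jet", lambda ev: ev["n_jets"] >= 2),
]
DEFAULT_STREAM = "misc"  # catch-all so every event has exactly one stream

def assign_stream(event: dict) -> str:
    """Return exactly one stream name: the first matching rule wins."""
    for name, matches in STREAM_RULES:
        if matches(event):
            return name
    return DEFAULT_STREAM

print(assign_stream({"triggers": {"L3_MUON", "L3_ELECTRON"}, "n_jets": 3}))  # muon
```

An event that fires both the muon and electron triggers still goes only to "muon", because rule order, not overlap, decides the stream.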
9
Data Handling System: Buffer and Cache
10
SAM Design Details
- Network distributed.
- Easily scalable.
- Works for all access modes.
- Uses CORBA interfaces between modules.
- Modules are being written in Java, Python and C++.
- The File, Event and Processing Database uses ORACLE 8.
- Not tightly coupled to:
  - the tape Mass Storage System;
  - CPU availability or batch processing facilities on farm or analysis machines;
  - the D0 event data model.
11
Main Components
- File and Event Database: information about data location and processing details (see poster session #127: Vicky White, “Use of ORACLE in Run II for D0”).
- Global Optimizer: optimizes tape access and regulates bandwidth to the various stations and activities.
- Station: manages a set of processing resources, including buffer space and data I/O.
- Project Master: manages projects, which are lists of files to process.
- Consumer/Producer: performs the actual data processing.
- GUI and API user interfaces: allow users to access data and administrators to control the system.
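A station's buffer can be pictured as a fixed-size disk cache of files. The slides do not say which eviction policy a station uses; this sketch assumes least-recently-used (LRU) eviction, and `StationBuffer` and its methods are hypothetical names.

```python
from collections import OrderedDict

class StationBuffer:
    """Hypothetical station disk buffer with assumed LRU eviction."""

    def __init__(self, capacity_mb: int) -> None:
        self.capacity_mb = capacity_mb
        # File name -> size in MB, ordered from least to most recently used.
        self.files: OrderedDict[str, int] = OrderedDict()

    def used_mb(self) -> int:
        return sum(self.files.values())

    def access(self, name: str, size_mb: int) -> list[str]:
        """Touch or add a file; return the names evicted to make room for it."""
        if name in self.files:
            self.files.move_to_end(name)  # mark as most recently used
            return []
        evicted = []
        while self.files and self.used_mb() + size_mb > self.capacity_mb:
            old, _ = self.files.popitem(last=False)  # drop least recently used
            evicted.append(old)
        self.files[name] = size_mb
        return evicted
```

For example, with a 100 MB buffer, accessing 40 MB files "a", "b", then "a" again, and finally "c" evicts "b", since "a" was used more recently.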
12
Components of SAM
[Figure: Stations A-F; Mass Storage System; Global Optimizer; Project Master; projects with consumer/producer processes; user and admin interface (API and GUI); DB and information servers]
13
File and Event Database
[Figure: database schema]
- Files: ID, name, format, size, number of events
- Events: ID, event number, L1 trigger, L2 trigger, L3 trigger
- Other entities shown: off-line filter, thumbnail, volume, project, data tier, physical data stream, trigger configuration, processing info, run, event-file catalog
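The core of the schema is a many-to-many event-file catalog linking events to the files that contain them, which is what event picking needs. The sketch below is a heavily simplified, hypothetical version of three of the entities; SAM itself used ORACLE 8, and Python's built-in `sqlite3` stands in here only so the example runs anywhere. Column names and the `files_for_event` helper are assumptions.

```python
import sqlite3

# Hypothetical, simplified schema (SAM used ORACLE 8; sqlite3 is a stand-in).
SCHEMA = """
CREATE TABLE files (
    id INTEGER PRIMARY KEY, name TEXT UNIQUE, format TEXT,
    size_bytes INTEGER, n_events INTEGER, data_tier TEXT
);
CREATE TABLE events (
    id INTEGER PRIMARY KEY, run INTEGER, event_number INTEGER,
    trigger_l1 TEXT, trigger_l2 TEXT, trigger_l3 TEXT
);
-- Event-file catalog: which files hold which events.
CREATE TABLE event_file (
    event_id INTEGER REFERENCES events(id),
    file_id INTEGER REFERENCES files(id),
    PRIMARY KEY (event_id, file_id)
);
"""

def files_for_event(db: sqlite3.Connection, run: int, event_number: int, tier: str) -> list[str]:
    """Event picking: names of the files in one data tier that contain an event."""
    rows = db.execute(
        """SELECT f.name FROM files f
           JOIN event_file ef ON ef.file_id = f.id
           JOIN events e ON e.id = ef.event_id
           WHERE e.run = ? AND e.event_number = ? AND f.data_tier = ?""",
        (run, event_number, tier),
    )
    return [name for (name,) in rows]

# Hypothetical sample data: one event stored in two tiers.
db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.execute("INSERT INTO files VALUES (1, 'r42.EDU5', 'd0', 5000, 1000, 'EDU5')")
db.execute("INSERT INTO files VALUES (2, 'r42.EDU50', 'd0', 50000, 1000, 'EDU50')")
db.execute("INSERT INTO events VALUES (1, 42, 7, 'l1', 'l2', 'l3')")
db.executemany("INSERT INTO event_file VALUES (?, ?)", [(1, 1), (1, 2)])
print(files_for_event(db, 42, 7, "EDU5"))  # ['r42.EDU5']
```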
14
Mass Storage System Needs
- Provide access to data through file-level semantics.
- Manage all tape activity within the automated tape library (or libraries) and to and from the shelf.
- Allow data to be physically clustered in tape groupings, or “file families”.
- Provide a mechanism for sending priorities with file requests, to control how resources are allocated among activities.
- Optimize the use of resources such as robot arm time and tape mounts.
- Provide retry and fail-over for failed tape reads and writes.
- Use an open tape format so that tapes can be removed and data exchanged with other sites.
- Operate reliably and unattended.
See ENSTORE presentation #126: Don Petravick, “ENSTORE - An Alternative Data Storage System”
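Two of these needs, priorities on file requests and minimizing tape mounts, can be combined in a simple scheduler: pick the highest-priority pending request, mount its tape, and serve every other queued request on that same tape before dismounting. This is an illustrative sketch, not the ENSTORE or Global Optimizer algorithm; `TapeRequestQueue` and its methods are hypothetical.

```python
import heapq
from itertools import count

class TapeRequestQueue:
    """Hypothetical scheduler: highest priority first, one mount per tape batch."""

    def __init__(self) -> None:
        # Heap entries: (-priority, arrival, tape, file); negated so max priority pops first.
        self._heap: list[tuple[int, int, str, str]] = []
        self._arrival = count()  # tie-breaker: equal priorities keep arrival order

    def submit(self, file_name: str, tape: str, priority: int) -> None:
        heapq.heappush(self._heap, (-priority, next(self._arrival), tape, file_name))

    def next_mount(self) -> tuple[str, list[str]]:
        """Mount the tape of the top request and serve every queued request on it."""
        _, _, tape, first = heapq.heappop(self._heap)
        batch, remaining = [first], []
        for item in sorted(self._heap):  # visit in priority, then arrival, order
            if item[2] == tape:
                batch.append(item[3])
            else:
                remaining.append(item)
        heapq.heapify(remaining)
        self._heap = remaining
        return tape, batch
```

For example, with requests a (tape T1, priority 1), b (T2, 5), c (T2, 1) and d (T1, 3), the first mount is T2 serving b and c, and the second is T1 serving d and a: two mounts instead of four. The trade-off is that a low-priority file can jump ahead of higher-priority requests on other tapes.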
15
Access to Data through SAM
- A user or group defines a “project” by sending a list of constraints, or a file list, to the Database Server.
- The DB Server returns a summary of the project (number of files, size and availability).
- The user is given a list of possible “stations” where the project might run, and chooses one.
- The user registers with that station for a given (new or existing) project and receives a unique “key”.
- The user's client “consumer/producer” sends the key to the “project master” on the chosen station and is given the next available file in the project.
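The last two steps (register, receive a key, then repeatedly ask the project master for the next file) can be sketched as follows. This is a hypothetical in-process stand-in for what SAM does over CORBA; `ProjectMaster`, `register` and `next_file` are assumed names, and project definition and station choice are left out.

```python
import uuid
from collections import deque
from typing import Optional

class ProjectMaster:
    """Hypothetical project master for one project on one station."""

    def __init__(self, project_files: list[str]) -> None:
        self._pending = deque(project_files)  # files not yet handed out
        self._keys: set[str] = set()          # keys issued to registered consumers

    def register(self) -> str:
        """Register a consumer/producer with the project; return its unique key."""
        key = uuid.uuid4().hex
        self._keys.add(key)
        return key

    def next_file(self, key: str) -> Optional[str]:
        """Give a registered consumer the next available file, or None when done."""
        if key not in self._keys:
            raise PermissionError("unknown project key")
        return self._pending.popleft() if self._pending else None


# Consumer loop: keep asking for files until the project is exhausted.
master = ProjectMaster(["run1.EDU5", "run2.EDU5"])
key = master.register()
while (name := master.next_file(key)) is not None:
    print("processing", name)
```

Because the project master hands out each file once, several consumers registered on the same project share the work without processing any file twice.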
16
Consumer - Read from Storage
17
Producer - Write to Storage
18
SAM Prototype
- Status: being built; ready in early October.
- Goals:
  - Populate and exercise the SAM database.
  - Specify projects, i.e. the data to be accessed for processing or analysis.
  - Attach to a ‘Station’ that makes the files for that project accessible.
  - Interface to ENSTORE (get/put files) using the SAM “Global Optimizer”.
  - Build analysis programs using the D0 framework.
  - Demonstrate multiple stations, projects and analysis consumers.
- Testing: further testing this fall with the SAM PC test-bed.
- Beta version: plan to make MC data available through SAM in late ‘98.
19
SAM Prototype PC Test-bed
[Figure: example configuration with an Enstore warehouse, consumers/producers, SAM station servers, a network hub, the main backbone, and a link to the database server]
20
Summary
- D0 plans to use a file-based Sequential Access Model for Run II data access.
- The design is network distributed, with CORBA communication between modules written in Java, Python and C++. ORACLE 8 is used for the database.
- A SAM prototype is being built now and will be ready in early October.
- Hardware for a SAM test-bed will be assembled this fall to test and understand the system more fully.
- We plan to use the system for MC data by the end of ‘98 and to perform large-scale testing with Run II hardware in the first part of next year.