Lee Lueking 1 The Sequential Access Model for Run II Data Management and Delivery
Lee Lueking, Frank Nagy, Heidi Schellman, Igor Terekhov, Julie Trumbo, Matt Vranicar, Rich Wellner, Vicky White
URL: www-d0.fnal.gov/~lueking/sam/sequential.html
CHEP98, Sept. 3, 1998

Lee Lueking 2 What is The Sequential Access Model: SAM?
• Sequential events: Data is stored in files as sequential events.
• Data tiers: Each event is stored in each of several data tiers.
  » The Event Data Unit (EDU) is the unit of data stored in each tier.
  » Physical event size: EDU5 = 5 kB/event, EDU50 = 50 kB/event, et cetera (a toy sizing sketch follows this list).
• Physical streaming (clustering): Data categories based on trigger or reconstruction information.
• Database catalog: The File, Event and Processing Database holds information about the data at the event, file, and run level, plus processing information, both static and dynamic.
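The fixed EDU sizes make per-tier volumes easy to estimate. The Python sketch below is illustrative only, not part of SAM; the tier names and the helper function are assumptions, with just the 5 kB and 50 kB figures taken from the slide.

EDU_SIZES_KB = {
    "thumbnail": 5,    # EDU5  = 5 kB/event (from the slide)
    "medium": 50,      # EDU50 = 50 kB/event (from the slide)
}

def tier_volume_tb(tier, n_events):
    """Total volume of one data tier for n_events events, in TB."""
    return EDU_SIZES_KB[tier] * n_events / 1e9   # kB -> TB

# Example: one billion events stored in the 50 kB/event tier -> 50 TB.
print(tier_volume_tb("medium", 1_000_000_000))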

Lee Lueking 3 Data Organization
[Diagram: physical clustering of data, the File & Event Database, event-information tiers, a warm cache, and user and physics-group (derived) data.]

Lee Lueking 4 How Do I Access Data?
• Pipelines: Data access channels tailored for particular processing and analysis patterns.
• Pipeline segments: Tapes and drives + Automated Tape Library + Storage Management System, network, group-shared and/or user-private analysis disk.
• Example access modes (summarized in the sketch after this list):
  » Database: Access to event, trigger and other FEDB info.
  » Thumbnail: Disk-resident sketch of each event.
  » Freight Train: Large data stream file server.
  » Event Picking: Random event selection from any data tier.
  » Small Data-set: One or a few files from any data tier.
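One way to picture the access modes is as named pipeline configurations. The Python sketch below is purely illustrative; the Pipeline fields and the source/granularity assignments are assumptions, not the SAM API.

from dataclasses import dataclass

@dataclass
class Pipeline:
    name: str
    source: str        # e.g. "database", "disk-cache", "tape"  (assumed)
    granularity: str   # "event" or "file"                      (assumed)
    use: str

ACCESS_MODES = [
    Pipeline("Database",      "database",   "event", "event, trigger & other FEDB info"),
    Pipeline("Thumbnail",     "disk-cache", "event", "disk-resident sketch of each event"),
    Pipeline("Freight Train", "tape",       "file",  "large data stream file server"),
    Pipeline("Event Picking", "tape",       "event", "random event selection from any tier"),
    Pipeline("Small Data-set", "tape",      "file",  "one or a few files from any tier"),
]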

Lee Lueking 5 Data Access
[Diagram: data flows from mass storage through pipelines to consumers. Legend: disk storage, tape storage, file, event, data flow, group of users, single user, pipeline name.]

Lee Lueking 6 D0 Specifications
• Data sizes: [table shown on the slide]
• Further details:
  » 10-15 exclusive streams preferred, based on L3 and/or reconstruction information.
  » 10% warm (tape or disk) caches of Raw and Medium EDU data.
  » Possible on-demand reconstruction.

Lee Lueking 7 Will SAM Scale to Run II?

Lee Lueking 8 Exclusive Streaming
See Talk #182: Heidi Schellman, “Assurance of Data Integrity in a Petabyte Data Sample”

Lee Lueking 9 Data Handling System Buffer and Cache

Lee Lueking 10 SAM Design Details
• Network distributed.
• Easily scalable.
• Works for all access modes.
• Uses CORBA interfaces between modules.
• Modules are being written in Java, Python and C++.
• The File, Event and Processing Database uses Oracle 8.
• Not tightly coupled to:
  » the tape mass storage system;
  » CPU availability or batch processing facilities on farm or analysis machines;
  » the D0 event data model.

Lee Lueking 11 Main Components
• File and Event Database: Information about data location and processing details. (See poster session #127: Vicky White, “Use of ORACLE in Run II for D0”.)
• Global Optimizer: Optimizes tape access and regulates bandwidth to the various stations and activities.
• Station: Management for a set of processing resources, including buffer and data I/O.
• Project Master: Responsible for managing projects, which are lists of files to process.
• Consumer/Producer: The actual data processing.
• GUI and API user interfaces: Allow users to access data and administrators to control the system. (Illustrative stubs for the server-side components follow this list.)
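In the real system these components talk to each other over CORBA interfaces (slide 10). The Python stubs below only sketch how the responsibilities divide; every class and method name here is a hypothetical stand-in, not SAM code.

class GlobalOptimizer:
    """Orders tape accesses and regulates bandwidth per station and activity."""
    def schedule(self, file_requests):
        ...

class Station:
    """Manages a set of processing resources, including buffer disk and data I/O."""
    def stage_in(self, file_name):
        ...

class ProjectMaster:
    """Manages projects (lists of files) and hands files out to consumers."""
    def next_file(self, consumer_key):
        ...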

Lee Lueking 12 Components of SAM
[Diagram: Stations A-F, each with a Project Master and consumer/producer processes, connected to the Mass Storage System, the Global Optimizer, the DB and Information Servers, and the user & administrator interface (API and GUI).]

Lee Lueking 13 File and Event Database
[Schema diagram. Files: ID, Name, Format, Size, # Events. Events: ID, Event Number, Trigger L1, Trigger L2, Trigger L3, Off-line Filter, Thumbnail. Related entities: Volume, Project, Data Tier, Physical Data Stream, Trigger Configuration, Processing Info, Run, Event-File Catalog. A rough sketch of the two main record types follows.]
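A rough rendering of the two central record types from the diagram, as Python dataclasses. The field types are guesses, and the association entities (Volume, Run, Event-File Catalog, ...) are omitted.

from dataclasses import dataclass

@dataclass
class FileRecord:
    file_id: int
    name: str
    format: str
    size: int        # bytes (assumed unit)
    n_events: int

@dataclass
class EventRecord:
    event_id: int
    event_number: int
    trigger_l1: str
    trigger_l2: str
    trigger_l3: str
    offline_filter: str
    thumbnail: bytes   # the disk-resident event sketch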

Lee Lueking 14 Mass Storage System Needs
• Provide access to data through file-level semantics.
• Manage all tape activity within the ATL(s) and to/from shelf.
• Allow data to be physically clustered in tape groupings or “file families”.
• A mechanism for sending priorities with file requests, to allow control over the allocation of resources among activities (see the sketch after this list).
• The system must optimize the use of resources such as arm time and tape mounts.
• Retry and fail-over features for failed tape read/write activities.
• An open tape format, to allow removal of tapes and exchange of data with other sites.
• Reliable and unattended operation.
See ENSTORE presentation #126: Don Petravick, “ENSTORE - An Alternative Data Storage System”
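The priority mechanism could be as simple as a priority-ordered request queue. A toy sketch, assuming lower numbers mean more urgent; the names are illustrative and not the Enstore API.

import heapq

class FileRequestQueue:
    """Toy priority queue for file requests; ties are served FIFO."""
    def __init__(self):
        self._heap = []
        self._seq = 0
    def submit(self, file_name, priority):
        # The sequence counter breaks ties so equal-priority requests stay ordered.
        heapq.heappush(self._heap, (priority, self._seq, file_name))
        self._seq += 1
    def next_request(self):
        return heapq.heappop(self._heap)[2] if self._heap else None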

Lee Lueking 15 Access to Data through SAM
• A user or group defines a “project” by sending a list of constraints or a file list to the Database Server.
• The DB Server returns a summary of the project (number of files, size, and availability).
• The user is given a list of possible “stations” where the project might run, and chooses one.
• The user registers with the station for a given (new or existing) project and receives a unique “key”.
• The user’s client (“consumer/producer”) presents the “key” to the “project master” on the chosen station and is given the next available file in the “project”. (A pseudo-client sketch of this sequence follows.)
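Put together, the sequence looks like the pseudo-client below. All object and method names are assumptions for illustration; SAM’s actual CORBA interfaces differ.

def run_project(db_server, constraints, process):
    project = db_server.define_project(constraints)   # 1. send constraints or a file list
    print(project.summary())                          # 2. number of files, size, availability
    station = project.candidate_stations()[0]         # 3. the user would choose a station here
    key = station.register(project)                   # 4. unique key for this project
    master = station.project_master()
    while True:                                       # 5. consume files one at a time
        f = master.next_file(key)
        if f is None:
            break
        process(f)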

Lee Lueking 16 Consumer - Read from Storage

Lee Lueking 17 Producer - Write to Storage

Lee Lueking 18 SAM Prototype
• Status: Being built; ready early October.
• Goals:
  » Populate and exercise the SAM database.
  » Specify projects - the data to be accessed for processing or analysis.
  » Attach to a “Station” which makes the files for that project accessible.
  » Interface to ENSTORE - get/put files - using the SAM “Global Optimizer”.
  » Build analysis programs using the D0 framework.
  » Demonstrate multiple Stations, Projects, and Analysis consumers.
• Testing: Further testing in the fall with the SAM PC test-bed.
• Beta version: Plan to make MC data available through SAM in late ’98.

Lee Lueking 19 SAM Prototype PC test-bed
[Example configuration diagram: Enstore and the Warehouse connected via a network hub to SAM station servers and consumers/producers, with links to the main backbone and to the database server.]

Lee Lueking 20 Summary
• Dzero plans to use a file-based Sequential Access Model for Run II data access.
• The design is network distributed, with CORBA communication between modules written in Java, Python and C++. Oracle 8 is used for the database.
• A SAM prototype is being built now and will be ready in early October.
• Hardware to construct a SAM test-bed will be assembled this fall to more fully test and understand the system.
• We plan to employ the system for MC data by the end of ’98, and to perform large-scale testing with Run II hardware in the first part of next year.