David Adams ATLAS Datasets for the Grid and for ATLAS David Adams BNL September 24, 2003 ATLAS Software Workshop Database Session CERN.

Presentation transcript:

David Adams ATLAS Datasets for the Grid and for ATLAS David Adams BNL September 24, 2003 ATLAS Software Workshop Database Session CERN

David Adams ATLAS September 24, 2003 Datasets… ATLAS SW – DB session

Contents
  Functional definition of dataset
  Dataset properties
  Dataset categories
  Dataset category associations
  Properties and categories
  Implementation
  Development plan

Functional definition of Dataset
Define dataset by the way it is used:
  Dataset is the unit of data with which users normally interact
There are two use cases
  User selects a dataset for processing
  User hands dataset to a “system” that distributes processing, gathers and merges results, and returns the result to the user
    – Result can be another dataset or
    – Summary analysis data such as histograms
What properties are datasets required to have to satisfy these use cases?

Dataset properties
0. Identity
  Dataset must have a unique index and/or name
1. Content
  Description of the type of data in the dataset
    – Event or non-event data
    – Simulation, reconstruction, …
    – ESD, AOD, …
    – Jets, tracks, electrons, …
2. Location
  Where to find the data
    – Logical files, physical files, site, …
3. Mapping
  Which content is at which location?

Dataset properties (cont)
4. Provenance
  Prescription for creating the data
  E.g. input dataset and transformation
5. History
  Details of production beyond provenance
    – How task was split into jobs
    – Processing node and time for each job, …
6. Labels
  Assigned metadata outside other categories, e.g.
    – Integrated luminosity
    – Result of quality checks
    – Flag indicating ok for use in published analyses

Dataset properties (cont)
7. Mutability
  May dataset be modified?
  Possible states: locked, unlocked, extensible, …
8. Compositeness
  Dataset made up of other datasets. Two cases:
    – Construction: provenance is the list of sub-datasets
      > E.g. the summer dataset is defined to be the union of the June, July and August datasets.
    – Assignment: factorization into sub-datasets
      > Typically to reflect data placement
      > E.g. a representation of a global dataset might include sub-datasets in New York, Paris and Moscow.
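The nine properties listed on these slides can be pictured as fields of a single record. The following is only a minimal sketch, not the ATLAS or DIAL implementation: the class name `Dataset`, the field names and the `union` helper are all assumptions made for illustration. It also shows the "construction" case of compositeness, where the provenance of the summer dataset is the list of its sub-datasets.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Mutability(Enum):
    LOCKED = "locked"
    UNLOCKED = "unlocked"
    EXTENSIBLE = "extensible"


@dataclass
class Dataset:
    """Hypothetical record with one field per property from the slides."""
    name: str                                           # 0. Identity
    content: set = field(default_factory=set)           # 1. Content
    logical_files: list = field(default_factory=list)   # 2. Location
    mapping: dict = field(default_factory=dict)         # 3. Mapping (content -> files)
    provenance: Optional[str] = None                    # 4. Provenance
    history: list = field(default_factory=list)         # 5. History
    labels: dict = field(default_factory=dict)          # 6. Labels
    mutability: Mutability = Mutability.UNLOCKED        # 7. Mutability
    sub_datasets: list = field(default_factory=list)    # 8. Compositeness


def union(name, parts):
    """Construction case: the provenance is the list of sub-datasets."""
    content = set()
    for part in parts:
        content |= part.content
    return Dataset(
        name=name,
        content=content,
        provenance="union of " + ", ".join(p.name for p in parts),
        mutability=Mutability.LOCKED,
        sub_datasets=list(parts),
    )


# Content labels here are made up for the example.
june = Dataset("june", content={"AOD"})
july = Dataset("july", content={"AOD"})
august = Dataset("august", content={"AOD", "ESD"})
summer = union("summer", [june, july, august])
```

Locking the composite is a design choice of this sketch only; the slides do not say which mutability state a constructed dataset should have.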

Dataset categories
Categorize datasets according to the extent of their location information:
Virtual
  – No location
Logical
  – Collection of logical files
Physical
  – Collection of physical files
  – Inferred from logical DS and file catalog (Magda, RLS, …)
Staged
  – Collection of “jobs”
    > Each sub-dataset matched to CPU/process
  – Not important for discussion here

Dataset category associations
One-to-many association as we move down these categories
Virtual dataset may map to multiple logical datasets
  – Optimize file size for local mass store
  – Copy out only selected events (vs. all plus event list)
  – Move data into a DB at one site
  – Composite representation along placement boundaries
Logical dataset maps to many physical datasets
  – Many combinations inferred from file catalog
  – No need to record all these datasets
  – But system (or user) might record LDS used to process one task and reuse it for the next request

Dataset category associations (example)
Virtual: VDS 1
  Logical: LDS 1-1 {LF1 LF2}
    Physical: PDS {PF1A PF2A}
    Physical: PDS {PF1B PF2B}
    Physical: PDS {PF1A PF2B}
  Logical: LDS 1-2 {LF3}
    Physical: PDS {PF3A}
    Physical: PDS {PF3B}
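The one-to-many step from a logical dataset to its physical datasets can be sketched as choosing one replica per logical file. This is an illustration only: the dictionary `file_catalog` stands in for a real file catalog (Magda, RLS, …), and the function name `physical_datasets` is an assumption. The file names are those of the example figure.

```python
from itertools import product

# Hypothetical stand-in for a file catalog: logical file -> physical replicas.
file_catalog = {
    "LF1": ["PF1A", "PF1B"],
    "LF2": ["PF2A", "PF2B"],
    "LF3": ["PF3A", "PF3B"],
}


def physical_datasets(logical_files, catalog):
    """Return every physical dataset obtained by picking one replica per logical file."""
    replica_lists = [catalog[lf] for lf in logical_files]
    return [list(choice) for choice in product(*replica_lists)]


lds_1_1 = ["LF1", "LF2"]
lds_1_2 = ["LF3"]

pds_for_lds_1_1 = physical_datasets(lds_1_1, file_catalog)
pds_for_lds_1_2 = physical_datasets(lds_1_2, file_catalog)
```

With two replicas of each file, LDS 1-1 has four possible physical datasets (the figure draws three of them) and LDS 1-2 has two. The count grows multiplicatively with the number of files, which is why there is no need to record all of these datasets.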

Dataset category associations (cont)
OO view
  LDS (logical dataset) “is a” VDS (virtual dataset)
  PDS (physical dataset) “is a” VDS
    – And “is a” LDS if files are in file catalog
Representation
  Use the word representation for this relationship
    – In the figure, LDS 1-1 and LDS 1-2 are two different representations of VDS 1
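The "is a" relationships map naturally onto inheritance. The sketch below is an assumption about how that could look, not the actual DIAL classes: the class names, constructor arguments and the `represents` attribute are hypothetical, and the content label "AOD" is invented for the example. The case where a PDS is also an LDS (files present in the file catalog) is left out to keep the sketch short.

```python
class VirtualDataset:
    def __init__(self, name, content):
        self.name = name
        self.content = frozenset(content)


class LogicalDataset(VirtualDataset):
    """An LDS "is a" VDS that adds a collection of logical files."""

    def __init__(self, name, represents, logical_files):
        # A representation carries the same content as the dataset it represents.
        super().__init__(name, represents.content)
        self.represents = represents
        self.logical_files = tuple(logical_files)


class PhysicalDataset(VirtualDataset):
    """A PDS "is a" VDS that adds a collection of physical files."""

    def __init__(self, name, represents, physical_files):
        super().__init__(name, represents.content)
        self.represents = represents
        self.physical_files = tuple(physical_files)


vds1 = VirtualDataset("VDS 1", {"AOD"})
lds11 = LogicalDataset("LDS 1-1", vds1, ["LF1", "LF2"])
lds12 = LogicalDataset("LDS 1-2", vds1, ["LF3"])
pds = PhysicalDataset("PDS", lds11, ["PF1A", "PF2A"])
```

Here LDS 1-1 and LDS 1-2 are two representations of VDS 1: they differ in their files but share its content.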

Properties and categories
Virtual datasets have
  Content
    – Might have to have a representation to know all content
    – All representations have the same content
  Provenance
  Labels
    – Most are associated with the virtual view
    – Again may need representation to evaluate some labels
  Mutability
  Compositeness
    – Sub-datasets are also virtual
    – Composite because it was constructed by merging datasets

Properties and categories (cont)
Logical datasets add
  Location
    – Which logical files
    – Extend to add sites where LDS can be found?
  Mapping
  History
    – Perhaps stored in job tracking system
  Compositeness
    – Constituents are logical datasets
    – Typically they reflect data placement

Implementation
Dataset metadata catalog (DMC)
  Holds VDS properties
  User selects dataset based on these properties
  Receives a VDS name or ID
  DIAL wants programmatic interface for DMC
VDS class
  Programmatic interface to access VDS properties
    – E.g. what content is in a given VDS?
  Primary user is workload management system (WMS) rather than analysis users
    – E.g. to verify that a VDS has the content required for a given analysis task
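A programmatic DMC interface of the kind described could look roughly like the following. This is a sketch under stated assumptions: the class `DatasetMetadataCatalog`, its methods `register`, `select` and `has_content`, and the dataset names are all hypothetical and do not describe an existing ATLAS or DIAL API.

```python
class DatasetMetadataCatalog:
    """Hypothetical in-memory DMC: maps a VDS name to its properties."""

    def __init__(self):
        self._entries = {}

    def register(self, name, **properties):
        self._entries[name] = dict(properties)

    def select(self, **required):
        """Return the names of all VDSs whose properties match every requirement."""
        return sorted(
            name
            for name, props in self._entries.items()
            if all(props.get(key) == value for key, value in required.items())
        )

    def has_content(self, name, needed):
        """Check, as a WMS might, that a VDS holds the content a task requires."""
        return set(needed) <= set(self._entries[name].get("content", ()))


dmc = DatasetMetadataCatalog()
dmc.register("dc1.sample_a", kind="CBNT", content={"jets", "tracks"})
dmc.register("dc1.sample_b", kind="CBNT", content={"electrons"})
dmc.register("dc1.sample_c", kind="AOD", content={"jets"})

cbnt_names = dmc.select(kind="CBNT")           # user selects by property, receives VDS names
ok = dmc.has_content("dc1.sample_a", {"jets"})  # WMS verifies required content
```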

Implementation (cont)
Dataset replica catalog (DRC)
  Enables WMS to select a LDS representing a VDS
  Also get info about the sites where LDS can be found
LDS class
  Provide access to VDS, logical files (location) and sub-datasets (compositeness), if present
    – WMS uses this for splitting, matchmaking and staging (files)
  Means to select content or events to define an LDS with a subset of the data in the original LDS
    – New LDS has a set of files that is subset of original set
    – Used by WMS for splitting DS’s and staging files
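Splitting an LDS so that each new LDS holds a subset of the original files can be sketched as below. The function name `split_lds` and the round-robin assignment are assumptions chosen for illustration; the slides do not specify how a WMS would partition the files.

```python
def split_lds(logical_files, n_parts):
    """Split a list of logical files into at most n_parts disjoint, non-empty subsets."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    # Round-robin assignment: file i goes to subset i % n_parts.
    parts = [logical_files[i::n_parts] for i in range(n_parts)]
    return [part for part in parts if part]


files = ["LF1", "LF2", "LF3", "LF4", "LF5"]
sub_datasets = split_lds(files, 2)
```

Each resulting list is a subset of the original file set, and together the subsets cover it exactly once, so the sub-datasets could be processed independently and the results merged.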

Implementation (cont)
PDS/SDS catalogs and classes
  At a site, locate a PDS or SDS representing an LDS
  Not needed: WMS can use file replica catalog
  But PDS/SDS provide means to record the choices made for one task so they can be used for the next task using the same LDS
  However these pieces should wait to see what (if any) requirements come from WMS’s
Following table shows where properties are recorded

Implementation (cont)
[Table showing where properties are recorded; not preserved in this transcript]

Development plan
Define generic VDS and LDS interfaces
Implement VDS and LDS for CBNT files
  Existing CbntDataset is starting point
Define interface for DMC
Fill DMC for DC1 CBNT datasets
  Connect to AMI
Create trivial DRC for these datasets
  One LDS for each VDS
DIAL users can then analyze all DC1 data
  Data must be staged at BNL and/or CERN
  Or file catalog does staging
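A "trivial" DRC with exactly one LDS per VDS could be as small as a dictionary. The sketch below is purely illustrative: the function names, the `.lds` naming scheme, and the dataset and file names are invented and are not part of the plan on the slide.

```python
def build_trivial_drc(vds_to_files):
    """Create a DRC holding a single LDS representation for each VDS."""
    drc = {}
    for vds_name, logical_files in vds_to_files.items():
        drc[vds_name] = [
            {"lds": vds_name + ".lds", "logical_files": list(logical_files)}
        ]
    return drc


def select_lds(drc, vds_name):
    """With one LDS per VDS, selection reduces to returning the only entry."""
    return drc[vds_name][0]


# Hypothetical VDS names and logical file names.
vds_to_files = {
    "dc1.sample_a": ["a.1.root", "a.2.root"],
    "dc1.sample_b": ["b.1.root"],
}
drc = build_trivial_drc(vds_to_files)
chosen = select_lds(drc, "dc1.sample_a")
```

A later, non-trivial DRC would return several representations per VDS, and `select_lds` would then need a real selection policy such as site or file-size preference.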