Data Management Overview
David M. Malon
U.S. ATLAS Computing Meeting
Brookhaven, New York
28 August 2003

Outline
- Technology and technology transition
- POOL and POOL integration
- Detector description and primary numbers
- Interval-of-validity databases and conditions
- Data Challenge databases: Magda, AMI, metadata, and virtual data
- Near-term plans
- Interactions with online, and with Technical Coordination
- Staffing
- Conclusions

Technology transition
- ATLAS database strategy has consistently been to employ "LHC common solutions wherever possible"
  - Until last year this meant Objectivity/DB as the baseline technology
  - Objectivity/DB conversion services were retained as a reference implementation until the developer releases leading to 6.0.0; retired at the end of 2002
- Today's baseline is LCG POOL (hybrid relational and ROOT-based streaming layer)
  - ATLAS is contributing staff to the POOL development teams
  - All ATLAS event store development is POOL-based
- Transition-period technology: AthenaROOT conversion service
  - Deployed pre-POOL; provided input to the persistence RTAG that led to POOL
  - The AthenaROOT service will, like Objectivity, be retired once the POOL infrastructure is sufficiently mature

What is POOL?
- POOL is the LCG Persistence Framework
  - "Pool of persistent objects for LHC"
  - Started by the LCG-SC2 in April '02
- Common effort in which the experiments take over a major share of the responsibility
  - for defining the system architecture
  - for development of POOL components
  - ramping up over the last year from 1.5 to ~10 FTE
- Dirk Duellmann is project leader
  - Information on this and the next several slides is borrowed from him
  - See the POOL project pages for details

POOL and the LCG Architecture Blueprint
- POOL is a component-based system
  - A technology-neutral API
  - Abstract C++ interfaces
  - Implemented by reusing existing technology
    - ROOT I/O for object streaming: complex data, simple consistency model
    - RDBMS for consistent metadata handling: simple data, transactional consistency
- POOL does not replace any of its component technologies
  - It integrates them to provide higher-level services
  - Insulates physics applications from implementation details of the components and technologies used today
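
To make the blueprint concrete, here is a minimal C++ sketch of what a technology-neutral persistence interface along these lines might look like. It is an illustration only, not the actual POOL API; Token, IStorageSvc, and the method names are invented for this example.

```cpp
// Hypothetical sketch of a technology-neutral persistence interface in the
// spirit of the blueprint: abstract C++ interfaces in front, with concrete
// back ends (ROOT I/O for object streaming, an RDBMS for catalog/metadata)
// hidden behind them. None of these names are the real POOL classes.
#include <string>

// Opaque handle to a stored object.
struct Token {
  std::string fileId;          // logical file identifier, resolved via a catalog
  std::string containerName;   // container within the file
  unsigned long entry = 0;     // position of the object in the container
};

// Abstract storage interface; a ROOT-based implementation would stream complex
// event objects, a relational one would handle simple, transactional metadata.
class IStorageSvc {
public:
  virtual ~IStorageSvc() = default;
  // Stream an object of a named type into a container and return its token.
  virtual Token write(const void* object, const std::string& typeName,
                      const std::string& container) = 0;
  // Resolve a token back into a transient object (caller takes ownership).
  virtual void* read(const Token& token, const std::string& typeName) = 0;
};
```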

POOL Work Package breakdown
- Storage Service
  - Stream transient C++ objects into/from storage
  - Resolve a logical object reference into a physical object
- File Catalog
  - Track files (and physical/logical names) and their descriptions
  - Resolve a logical file reference (FileID) into a physical file
- Collections and Metadata
  - Track (large, possibly distributed) collections of objects and their descriptions ("tag" data); support object-level selection
- Object Cache (DataService)
  - Track and manage objects already in transient memory to speed access
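
As a rough illustration of how these work packages relate from a client's point of view, the sketch below wires together invented stand-ins for a file catalog, an event collection with tag attributes, and an object cache. These are assumptions for illustration only, not the real POOL components.

```cpp
// Invented stand-ins for the work packages above: the file catalog maps
// FileIDs to physical file names, a collection records object references plus
// queryable "tag" attributes, and an object cache keeps transient objects
// already read. The real POOL classes differ in names and detail.
#include <map>
#include <string>
#include <utility>
#include <vector>

struct ObjectRef {                    // logical reference handed out by the storage service
  std::string fileId;
  std::string container;
  unsigned long entry = 0;
};

struct FileCatalog {                  // FileID -> physical file name
  std::map<std::string, std::string> pfnByFileId;
  std::string resolve(const std::string& fileId) const {
    auto it = pfnByFileId.find(fileId);
    return it == pfnByFileId.end() ? std::string() : it->second;
  }
};

struct EventCollection {              // object refs plus per-event tag attributes
  struct Entry {
    ObjectRef ref;
    std::map<std::string, double> tags;   // e.g. {"nJets", 4}, {"missingEt", 32.5}
  };
  std::vector<Entry> entries;
  void add(const ObjectRef& ref, std::map<std::string, double> tags) {
    entries.push_back({ref, std::move(tags)});
  }
};

struct ObjectCache {                  // transient objects already read, keyed by reference
  std::map<std::string, const void*> byRefString;
};
```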

POOL Internal Organization
(Component diagram, not reproduced in this transcript; the only surviving labels are "RDBMS" and "Storage Svc".)

POOL integration into ATLAS
- AthenaPOOL conversion service prototype is available in current releases
  - Scheduled to be ready for early adopters in Release
  - Based upon this month's "user release" of POOL
- POOL releases have been pretty much on schedule
  - Current release, 1.2.1, incorporates the most recent LCG SEAL release
- Several integration issues are resolved in a stopgap fashion; much work remains
- Further LCG dictionary work (SEAL) will be required to represent the ATLAS event model

ATLAS POOL/SEAL integration
- Many nuisance technical obstacles to POOL integration into ATLAS
  - Not long-term concerns, but in the first year they consume a great deal of time
- Integration of POOL into ATLAS/Athena has not been trivial
- Examples:
  - Conflicts in how cmt/scram/ATLAS/SPI handle build environments, compiler/linker settings, external packages and versions, …
  - Conflicts between the Gaudi/Athena dynamic loading infrastructure and SEAL plug-in management
  - Conflicts in lifetime management with multiple transient caches (Athena StoreGate and POOL DataSvc)
  - Issues in type identification handling between Gaudi/Athena and the emerging SEAL dictionary
  - Keeping up with moving targets (but rapid LCG development is good!)

ATLAS contributions to POOL
- ATLAS has principal responsibility for the POOL collections and metadata work package
- Principal responsibility for the POOL MySQL and related (e.g., MySQL++) package and server configuration
- Also contributing to foreign object persistence for ROOT
- Contributions to overall architecture, dataset ideas, requirements, …
- Related: participation in the SEAL project's dictionary requirements/design effort
- Expect to contribute strongly to the newly endorsed common project on conditions data management when it is launched

Notes on relational technology
- POOL relational-layer work is intended to be readily portable, making no deep assumptions about the choice of relational technology
- Collections work is currently implemented in MySQL; the file catalog has MySQL and Oracle9i implementations
- Heterogeneous implementations are possible, e.g., with Oracle9i at CERN and MySQL at small sites
- CERN IT is an Oracle shop
  - Some planning is afoot to put Oracle at Tier 1s, and possibly beyond
- Note that non-POOL ATLAS database work has tended to be implemented in MySQL; like POOL, it avoids technology-specific design decisions
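
To illustrate the portability point, here is a hedged sketch of selecting a catalog back end from the technology prefix of a contact string. The class names, the string formats, and the factory are assumptions for this example, not the real POOL interfaces.

```cpp
// Sketch of back-end neutrality: the client names a catalog by a contact
// string and a small factory picks the matching implementation (MySQL at a
// small site, Oracle at CERN). Names and string formats are illustrative only.
#include <memory>
#include <stdexcept>
#include <string>

class ICatalog {
public:
  virtual ~ICatalog() = default;
  virtual void connect(const std::string& contact) = 0;
};

class MySQLCatalog : public ICatalog {
public:
  void connect(const std::string& /*contact*/) override { /* open a MySQL connection */ }
};

class OracleCatalog : public ICatalog {
public:
  void connect(const std::string& /*contact*/) override { /* open an Oracle connection */ }
};

// Pick an implementation from the technology prefix of the contact string.
std::unique_ptr<ICatalog> makeCatalog(const std::string& contact) {
  if (contact.rfind("mysql://", 0) == 0)  return std::make_unique<MySQLCatalog>();
  if (contact.rfind("oracle://", 0) == 0) return std::make_unique<OracleCatalog>();
  throw std::runtime_error("unsupported catalog technology: " + contact);
}
```

Application code written against ICatalog is unchanged whether a site deploys the MySQL or the Oracle implementation, which is the sense in which heterogeneous deployments stay transparent to physics code.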

Detector description databases
- "Primary numbers" (numbers that parameterize the detector description) database deployed, based upon NOVA
  - Leverages externally developed software
  - Begun as a BNL LDRD project (Vaniachine, Nevski, Wenaus)
  - Current implementation based upon MySQL
- NOVA also used for other ATLAS purposes
- Used increasingly in Athena directly (NovaConversionSvc), via GeoModel, and by standalone Geant3 and Geant4 applications
- New data continually being added
  - Most recently toroids/feet/rails, tiletest data, materials
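
For readers unfamiliar with the "primary numbers" idea, the sketch below shows the logical shape of such a parameter store: values keyed by detector node, parameter name, and version tag. It is a hypothetical stand-in; the actual NOVA schema and access classes are not reproduced here, and the example values are made up.

```cpp
// Hypothetical stand-in for a "primary numbers" store: detector-description
// parameters keyed by (node, parameter name, version tag). The real NOVA
// implementation is MySQL-backed with generated converters; this shows only
// the logical shape of the lookup.
#include <map>
#include <string>
#include <tuple>

class PrimaryNumbers {
public:
  using Key = std::tuple<std::string, std::string, std::string>;  // node, name, version

  void set(const std::string& node, const std::string& name,
           const std::string& version, double value) {
    values_[Key{node, name, version}] = value;
  }
  double get(const std::string& node, const std::string& name,
             const std::string& version) const {
    return values_.at(Key{node, name, version});  // throws if the parameter is absent
  }

private:
  std::map<Key, double> values_;
};

// Example (illustrative values only): a GeoModel or Geant4 client asking for a
// parameter under a given version tag.
// PrimaryNumbers pn;
// pn.set("InnerDetector", "BarrelLength_mm", "DC2-v1", 6800.0);
// double len = pn.get("InnerDetector", "BarrelLength_mm", "DC2-v1");
```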

NOVA work
- Automatic generation of converters and object headers from database content is integrated into the nightly build infrastructure
- Work needed on input interfaces to NOVA, and on consistent approaches to event/non-event data object definition
- Sasha Vaniachine is the principal contact person

NOVA browser screenshots
(Screenshots not reproduced in this transcript.)

Interval-of-validity (IOV) databases
- ATLAS was a principal contributor of requirements to an LHC conditions database project organized under the aegis of RD45
- LCG SC2 in June endorsed a common project on conditions data, to begin soon, with this work as its starting point
- Lisbon TDAQ group has provided a MySQL implementation of the common project interface
- ATLAS online/offline collaboration
  - Offline uses the Lisbon-developed implementation in its releases as an interval-of-validity database
  - Offline provides an Athena service based upon this implementation
    - Prototype (with interval-of-validity access to data in NOVA) in releases now
    - Due for early adopter use in Release 7.0.0

IOV databases
- ATLAS database architecture extends the "usual" thinking about a conditions service
  - The interval-of-validity database acts as a registration service and mediator for many kinds of data
- Example:
  - Geometry data is written in an appropriate technology (e.g., POOL), and later "registered" in the IOV database with a range of runs as its interval of validity
  - Similarly for calibrations produced offline
- No need to figure out how to represent complex objects in another storage technology (a conditions database) when this is a problem already solved for the event store technology (POOL)

IOV Database: Conditions Data Writer
(Diagram: a Conditions Data Writer stores conditions or other time-varying data and registers it in the IOV database.)
1. Store an instance of data that may vary with time or run or …
2. Return a reference to the data
3. Register the reference, assigning an interval of validity, tag, …
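
A minimal sketch of that three-step write path, assuming invented interfaces: IPayloadStore stands in for the event-store technology (e.g., POOL) and IIovDb for the IOV registration service. The folder, type, and tag strings are placeholders.

```cpp
// Sketch of the write path in the diagram above: store the payload in the
// event-store technology, get back a reference, then register that reference
// in the IOV database with a validity interval and a tag. IPayloadStore and
// IIovDb are invented stand-ins, not the real ATLAS/Lisbon or POOL APIs.
#include <string>

struct IovRange {
  unsigned int firstRun = 0;
  unsigned int lastRun = 0;
};

class IPayloadStore {                 // e.g., POOL-backed storage of the object
public:
  virtual ~IPayloadStore() = default;
  virtual std::string store(const void* object, const std::string& typeName) = 0;
};

class IIovDb {                        // IOV database as registration service/mediator
public:
  virtual ~IIovDb() = default;
  virtual void registerRef(const std::string& folder, const std::string& ref,
                           const IovRange& validity, const std::string& tag) = 0;
};

void writeConditions(IPayloadStore& store, IIovDb& iovDb,
                     const void* payload, const IovRange& runs) {
  // Steps 1-2: store the time-varying object and obtain a reference (string).
  std::string ref = store.store(payload, "ExampleCalibration");   // placeholder type name
  // Step 3: register the reference under a folder, validity interval, and tag.
  iovDb.registerRef("/Example/Calib/Folder", ref, runs, "example-tag");
}
```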

IOV Database: Athena conditions data client
(Diagram: a conditions data client obtains conditions data through the Athena transient conditions store and the IOV database.)
1. Folder (data type), timestamp, tag
2. Reference to data (string)
3. Dereference via standard conversion services
4. Build transient conditions object
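
And the corresponding read path, again with invented interfaces: IIovLookup stands in for the IOV database query and IRefResolver for the conversion services that rebuild the transient object. Names and the type string are placeholders, not the Athena or Gaudi APIs.

```cpp
// Sketch of the read path in the diagram above: look up the reference valid
// for (folder, timestamp, tag), then dereference it through a converter to
// build the transient conditions object. All interfaces are invented here.
#include <string>

class IIovLookup {
public:
  virtual ~IIovLookup() = default;
  // Return the stored-object reference valid for (folder, timestamp, tag).
  virtual std::string findRef(const std::string& folder,
                              unsigned long long timestamp,
                              const std::string& tag) = 0;
};

class IRefResolver {
public:
  virtual ~IRefResolver() = default;
  // Dereference a stored-object reference and build the transient object.
  virtual void* buildTransient(const std::string& ref,
                               const std::string& typeName) = 0;
};

void* readConditions(IIovLookup& iov, IRefResolver& resolver,
                     const std::string& folder, unsigned long long timestamp,
                     const std::string& tag) {
  std::string ref = iov.findRef(folder, timestamp, tag);          // steps 1-2
  return resolver.buildTransient(ref, "ExampleConditionsType");   // steps 3-4
}
```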

Conditions and IOV people
- LAL Orsay (Schaffer, Perus) is leading the IOV database integration effort
- LBNL (Leggett) provides the transient infrastructure to handle time validity of objects in the transient store (StoreGate) with respect to the current event timestamp
- Hong Ma (liquid argon) is an early adopter of the Athena-integrated IOV database service
- Joe Rothberg (muons) is writing test beam conditions data to the database outside of Athena, for later Athena-based processing

Conditions Data Working Group
- A Conditions Data Working Group has been newly commissioned, headed by Richard Hawkings (calibration and alignment coordinator)
- Charged with articulating a model for conditions/calibration data flow between online/TDAQ and offline, including DCS, with understanding and recording rates and requirements, and more (not just conditions data persistence)
- Contact Richard (or me, I guess) if you'd like to contribute

Production, bookkeeping, and metadata databases
- Data Challenges have provided much of the impetus for development of production, bookkeeping, and metadata databases
- Strong leveraging of work done under external auspices
  - MAGDA (BNL) used for file/replica cataloging and transfer
    - Developed as an ATLAS activity funded by PPDG
    - Magda/RLS integration/transition planned prior to DC2
  - AMI database (Grenoble) used for production metadata
    - Some grid integration of AMI (e.g., with EDG Spitfire)
- Small focused production workshop held at CERN earlier this month to plan production infrastructure for Data Challenge 2
  - Rich Baker, Kaushik De, and Rob Gardner involved on the U.S. side
  - Report due soon

Metadata handling
- ATLAS Metadata workshop held in July in Oxford
- Issues:
  - Metadata infrastructure to be deployed for Data Challenge 2
  - Integration of metadata at several levels from several sources
    - Collection-level and event-level physics metadata
    - Collection-level and event-level physical location metadata
    - Provenance metadata
    - …
  - Common recipe repository/transformation catalog to be shared among components
- Workshop report due soon (overdue, really)

Virtual data catalogs
- Data Challenges have been a testbed for virtual data catalog prototyping: in AtCom, with VDC, and using the Chimera software from the GriPhyN (Grid Physics Network) project
- Shared "recipe repository" (transformation catalog) discussions are underway
- Recent successes with Chimera-based ATLAS job execution on "CMS" nodes on shared CMS/ATLAS grid testbeds
  - More from the grid folks (Rob Gardner?) on this
- Work is needed on integration with the ATLAS database infrastructure

Near-term plans
- Focus in the coming months: deploy and test a reasonable prototype of the ATLAS Computing Model in the time frame of Data Challenge 2
  - The model is still being defined; the Computing Model working group's preliminary report is due in October(?)
  - Note that DC2 is intended to provide an exercise of the Computing Model sufficient to inform the writing of the Computing TDR
- Ambitious development agenda required
  - See the database slides from the July ATLAS Data Challenge workshop
- Tier 0 reconstruction prototype is a principal focus, as is some infrastructure to support analysis

Event persistence for Tier 0 reconstruction in DC2
- Persistence for ESD, AOD, and Tag data in POOL (8.0.0)
  - ESD, AOD, and Tag not yet defined by the reconstruction group
- Athena interfaces to POOL collection building and filtering infrastructure (7.5.0)
- Physical placement control, and placement metadata to support selective retrieval (7.5.0)
- Support for writing to multiple streams, e.g., by physics channel (7.5.0); a sketch of the idea follows below
- Support for concurrent processors contributing to common streams (8.0.0)
- Cataloging of database-resident event collections in the son-of-AMI database, and other AMI integration with POOL (7.5.0)
- Magda → RLS → POOL integration (?)
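
A hedged sketch of the stream-assignment idea from the list above: events are routed to output streams chosen by a selector, such as a physics-channel label. Event, StreamWriter, StreamDispatcher, the channel field, and the file-name pattern are all invented for illustration; this is not the Athena/POOL stream machinery.

```cpp
// Sketch of writing events to multiple output streams keyed by physics
// channel. The real Athena/POOL infrastructure is not shown; every name here
// is an illustrative assumption.
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct Event {
  int runNumber = 0;
  int eventNumber = 0;
  std::string channel;   // e.g. a physics-channel label assigned upstream
};

class StreamWriter {
public:
  explicit StreamWriter(std::string fileName) : fileName_(std::move(fileName)) {}
  const std::string& fileName() const { return fileName_; }
  void append(const Event& e) { buffer_.push_back(e); }  // a real writer would stream to POOL
private:
  std::string fileName_;
  std::vector<Event> buffer_;
};

class StreamDispatcher {
public:
  // The selector maps an event to a stream name (here, its physics channel).
  explicit StreamDispatcher(std::function<std::string(const Event&)> selector)
    : selector_(std::move(selector)) {}

  void write(const Event& e) {
    const std::string stream = selector_(e);
    auto it = writers_.find(stream);
    if (it == writers_.end())
      it = writers_.emplace(stream, StreamWriter(stream + ".pool.root")).first;
    it->second.append(e);
  }
private:
  std::function<std::string(const Event&)> selector_;
  std::map<std::string, StreamWriter> writers_;
};

// Usage (illustrative): route each event by its channel label.
// StreamDispatcher dispatcher([](const Event& e) { return e.channel; });
// dispatcher.write(Event{1234, 1, "ExampleChannel"});
```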

Conditions database development
- Extensions are needed to Athena/StoreGate to support writing calibrations/conditions from Athena
  - The AthenaPOOL service should be capable of handling the persistence aspects by Release 7.0.0/7.1.0
- Work is underway in the database group on organizing persistent calibration/conditions data, on infrastructure for version tagging, and on specifying in job options which data are needed
  - Limited prototype capabilities in Release
- Responsibility for the model is moving to the new calibration/alignment coordinator
- Some exercise of access to conditions data varying at the sub-run level by Tier 0 reconstruction is planned for inclusion in DC2

Conditions database futures
- LCG conditions database common project will start soon
  - ATLAS development agenda will be tied to this
  - Expect to contribute strongly to common project requirements and development
- Some DCS and muon test beam data are already going into the ATLAS/Lisbon implementation of the common project interface that will be the LCG conditions project's starting point; liquid argon testing of this infrastructure also underway

Coordination with online and TC
- Historically, demonstrably good at the component level (cf. the joint conditions database work with Lisbon), but largely ad hoc
- Now formalized with the new ATLAS Database Coordination Group commissioned by Dario Barberis, with representation from online/TDAQ, offline, and Technical Coordination
- Conditions Data Working Group also launched, with substantial involvement from both offline and online
  - Successful joint conditions data workshop organized in February in advance of this

Staffing
- Current census: small groups (2-3 FTEs) at Argonne, Brookhaven, and LAL Orsay; about 1 FTE at Grenoble working on the tag collector and AMI databases for production
- Enlisting involvement from the British GANGA team; cf. the metadata workshop at Oxford last month
- U.S. ATLAS computing management has worked hard to try to increase support for database development, but we all know how tough the funding situation is
  - Trying to leverage grid projects wherever possible
- Lack of database effort at CERN is conspicuous, and hurts us in several ways
- Data Challenge production and support is a valuable source of requirements and experience, but it reduces development effort
- Not clear that expected staffing levels will allow us to meet DC2 milestones
  - More on this at the LHC manpower [sic] review next week

Conclusions
- Work is underway on many fronts: common infrastructure, event store, conditions and IOV databases, primary numbers and geometry, metadata, production databases, …
- Many things will be ready for early adopters soon (7.0.0/7.1.0)
- Development agenda for Data Challenge 2 is daunting
  - We need help, but we cannot pay you
- If we survive DC2, look for persistence tutorials at the next U.S. ATLAS computing workshop