The ATLAS Database Project
Richard Hawkings, CERN
Torre Wenaus, BNL/CERN
ATLAS plenary meeting, June 24, 2004

Outline
- Mandate and scope
- Project definition process
- Organization and communication
- Subproject survey
- Concluding remarks
The current draft plan can be found at: (temporary location until new database web is up and running)

Project Mandate and Scope
- Lead and coordinate all ATLAS database activities
  - Including those so far under Software, TC, TDAQ, Detector Projects
  - Software, servers, distributed data management infrastructure
- Specifically, databases and data management for:
  - Detector production, detector installation, survey data
  - Detector geometry
  - Online configuration, run bookkeeping, run conditions
  - Event data and metadata
  - Calibration and alignment (online and offline)
  - Offline processing configuration and bookkeeping
  - Grid based access to event and non-event data

Project Definition Process
- RH, TW appointments (each currently at 50%) effective May 1st 2004
- Serious project definition work began late April
  - Based on the project outline presented at January 2004 software week
  - Project should strengthen, not weaken or delay, DB activities across ATLAS
  - Many individual discussions, feeding into plan iterations
- Public draft circulated to ATLAS software community in advance of BNL software week, and discussed during the week
  - No major new input - generally favourable impression
- Draft plan was approved by the May 28th CMB/SPMB
- Continuing to gather input (TDAQ community, subdetectors, EB, …)
- The plan will evolve, but the current version will guide project launch

Organization
- Part of Software and Computing Project (Dario Barberis)
- Richard Hawkings, Torre Wenaus Co-Leaders 5/04-5/06
  - Computing Management Board (CMB) members
  - Ex-officio Trigger/DAQ steering group (TDAQ-SG) members
- Database Steering Group, chaired by the Project Leaders, is the planning and decision making body
- Twelve subprojects cover the mandated scope
- Some subprojects embedded in other parts of ATLAS
  - Where tight integration of DB activities must be preserved
  - Subprojects being organized in close consultation with the projects concerned, to ensure this happens

Work Breakdown
1) Project management - steering, planning, coordination, strategy
2) Detector production - long term storage of subdetector production data
3) Detector installation - manufacturing and test (MTF), racks, cabling, survey
4) Detector geometry - primary numbers for detector description
5) Online databases - configuration, conditions, bookkeeping, offline transfer
6) Calibration and alignment - central tools, not subdetector algorithm work
7) Conditions database infrastructure - core sw and tools; framework integration
8) Event data - events and metadata from raw to analysis. Common core sw
9) Distributed data management - event and conditions data. Grid integration
10) Offline processing configuration and bookkeeping - production metadata
11) Distributed database services - physical databases, distributed infrastructure
12) Software support services - supporting users and deployers of DB software

Steering Group
- Planning and decision making body
- Integration mechanism to ensure synergy and coherence
  - Across subdetectors, across database areas
- Representation from subprojects, associated projects
- Decisions taken following consensus in the steering group
  - Project Leaders have full authority for planning and execution, including cases lacking full consensus
  - In cases of serious dissent, CMB and TDAQ-SG (where appropriate) take final decision
- Strategic decisions go to the CMB (and TDAQ-SG) for endorsement
- Steering Group is a large body, as the broad scope requires…

Steering Group Composition
Still have to fill some appointments in consultation with appropriate communities
- Technical coordination - Kathy Pommes, Luc Poggioli
- Online - Antonio Amorim, Mihai Caprini, Igor Soloviev
- High level trigger - TBD
- Calibration and Alignment - Richard Hawkings
- Detector geometry - Joe Boudreau
- Inner Detector - TBD
- LAr calorimeter - Hong Ma
- Tile calorimeter - Karl Gellerstedt
- Muon spectrometer - Joe Rothberg
- Conditions database infrastructure - RD Schaffer
- Event data - David Malon
- Distributed data management - TBD
- Offline processing - TBD
- Distributed database services - Alexandre Vaniachine
- User feedback - TBD, an informed+noisy+constructive user voice
- Persistency Framework Project (LCG Apps Area) - Dirk Duellmann
- Computing Coordinator (ex officio) - Dario Barberis
- Software Project Leader (ex officio) - David Quarrie
- Liaison from DB Project to Software Project Management Board - David Malon

Communication
Meetings
- All with agendas in advance, and minutes documenting technical progress, planning and decisions. Phone connections to allow wide participation
- Steering Group meeting bi-weekly (Friday 15:30, starting June 25th)
- Weekly meeting covering primarily offline - continuation of existing meeting
  - Technical planning and execution, within overall guidelines and plan of the SG
- A second weekly slot (to be defined) for Online database meeting, roughly bi-weekly
- TC database meeting (production, installation) every 2-4 weeks
- Conditions data working group meeting periodically
- Associated mailing lists for all these communities (online/TC to be set up)
Web
- A project web site that is a comprehensive and current source of technical and planning information and documentation is a very high priority
- We take this as a project management responsibility
- We will both write web content and nag others to do the same!

Subproject Survey
A compressed survey of the subprojects…

1) Project Management
- Most of this area already addressed…
  - Planning and steering
  - Project meetings
  - Project web
- …but also includes…
  - Monitoring of QA, testing and validation
  - Strategy and technology evolution

2) Detector Production
- The many production/construction DBs used worldwide in the subdetectors are not the responsibility of this project
- Ensuring all data of long-term interest to ATLAS is gathered into central (CERN IT Oracle) databases is in the mandate
  - Central system for uniform access and long-term maintainability
  - Provision of tools, standards, guidelines to subdetectors
  - Data definition and entry is subdetector responsibility
- Central DB exists and some subsystems are entering data, but there is a great deal of central/common work to be done
- Personnel and oversight for central/common work is mostly absent
  - Some ideas and possibilities - benefit from subdet production finishing?

3) Detector Installation
- MTF (manufacturing and test) installation database
  - Installed parts with links to production database
- Rack database (exists; being populated)
- Cabling database (partially exists)
- Survey database (does not exist)
- Extraction tools to e.g. use cabling data for online config (do not exist, and needed soon, e.g. for commissioning)
- Here again, personnel and oversight for central/common work (this project’s mandate) is severely lacking

4) Detector Geometry
- Primary numbers used by detector description software
- NOVA-based system deployed and operating for some time
- Work is underway to move to a successor with versioning support
- Approach is consistent with EB-mandated push to implement final ‘as-built’ detector geometry before subdetector engineers leave
  - Involves a ‘fast track’ implementation using standard relational DB tools to quickly support gathering and loading data
  - Offline access (via LCG ‘relational POOL’, conditions DB) on longer timescale when that software is ready

5) Online Databases
- Configuration database - 30% of the online system; must remain integral to online
- Online run bookkeeping - expect to employ standard offline/online tools
- Conditions database interfaces - ditto
  - Online, through the Lisbon group, has provided the standard tool used also offline, now also contributing to LCG CondDB project
- External interfaces and data flow - Information Server (IS), slow controls (DCS), offline (to AMI, production mgmt)

6) Calibration and Alignment
- Activities organized via conditions data working group
  - Communication forum for developing strategies, preparing online and offline algorithms
  - Conditions database loading and access
  - Contribute to computing model
- Little manpower for central/common tasks
- Both subdetector and coordination effort currently focused on CTB

7) Conditions Database Infrastructure
- Conditions database core software development
- Supporting tools (browsers, data distribution and synchronization, subsetting, etc.)
- Athena services for conditions data
- ATLAS participation in LCG CondDB common project
  - Activity is increasing, with ATLAS the largest experiment participant
  - New: Relational DB support for POOL, versioning system component
- Short-term focus is on CTB support and stability
- Planning (after CTB) to converge from many tools … Lisbon CondDB, NOVA, POOL, geometry DB … to essentially one, incorporating all experience gained
  - Common project CondDB with POOL support

8) Event Data
- Core software support for event data, from raw data to analysis
  - Including event collections and physics datasets
- Athena integration - both event data specific and common persistency services
  - This activity moved from Software Project to Database Project
- Event data access outside Athena, e.g. in ROOT analysis environment
- ATLAS participation in POOL common project
- Event data storage for CTB and DC2 is generally OK
- File-level data management is handled by the next subproject

9) Distributed Data Management
- Management of ATLAS data around the world
  - Cataloging, replication, synchronization, access control, …
  - Event, conditions and other data; files and relational DBs
- Integration/interfacing with grid tools for data management
  - And working around grid software deficiencies
- Present focus on DC2 production needs
  - Key tool: Don Quixote - interface to heterogeneous grids
- As yet no overall strategy for DDM, either for today or for the future
- Need urgently to address user-level data management tools

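The cataloging and replication tasks listed above can be illustrated with a minimal sketch of a replica catalog. This is not Don Quixote or any other ATLAS tool: the class name `ReplicaCatalog`, its methods, and the file and site names are all hypothetical, chosen only to show the idea of mapping a logical file name to the sites that hold a copy.

```python
from collections import defaultdict


class ReplicaCatalog:
    """Toy catalog (hypothetical): logical file name -> sites holding a copy."""

    def __init__(self):
        self._replicas = defaultdict(set)

    def register(self, lfn, site):
        """Record that a replica of the logical file exists at the site."""
        self._replicas[lfn].add(site)

    def sites(self, lfn):
        """Return the sites holding the file, sorted for stable output."""
        return sorted(self._replicas.get(lfn, set()))

    def missing_at(self, site):
        """Return logical files with no replica at the site (replication candidates)."""
        return sorted(lfn for lfn, held in self._replicas.items() if site not in held)


# Illustrative file and site names only.
catalog = ReplicaCatalog()
catalog.register("dc2.evgen.0001.root", "CERN")
catalog.register("dc2.evgen.0001.root", "BNL")
catalog.register("dc2.evgen.0002.root", "CERN")
```

A real system would also need synchronization and access control, which this sketch leaves out.
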
10) Offline Processing Configuration and Bookkeeping
- Databases cataloging metadata that is input to and output from offline processing jobs
  - Both managed production and (in the future) group and individual level jobs
- Cataloging of provenance information to unambiguously define job/software configuration
- Key tools at present are AMI and the production DB
- Again needs plan & strategy, including technology choices
- Present focus on DC2 and CTB support

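One way to picture provenance information that unambiguously defines a job configuration is a key derived from a canonical form of that configuration. The sketch below is an assumption for illustration, not how AMI or the production DB work; the function `provenance_key` and all field names and values are hypothetical.

```python
import hashlib
import json


def provenance_key(config):
    """Return a stable identifier for a job configuration (hypothetical scheme).

    The configuration is serialized with sorted keys, so two dictionaries
    with the same content always produce the same key.
    """
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Illustrative field names and values only.
job_a = {"release": "release-A", "transformation": "reco", "inputs": ["file1", "file2"]}
job_b = {"inputs": ["file1", "file2"], "transformation": "reco", "release": "release-A"}
job_c = {"release": "release-B", "transformation": "reco", "inputs": ["file1", "file2"]}
```

Here `job_a` and `job_b` get the same key because they differ only in key order, while `job_c` gets a different key because its release differs.
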
11) Distributed Database Services
- Support for deployed database and data management services at CERN and throughout ATLAS
  - Physical servers, distributed (heterogeneous!) database infrastructure
- Support and/or liaison for admin and operations of databases away from CERN
- Liaison to CERN IT/DB for CERN-based services
- Possible common project in distributed database infrastructure under discussion, initiated by ATLAS (David Malon)
- Present focus is again on CTB and DC2 support

12) Database Software Support Services
- Support for software, distinct from support for physical services (preceding subproject)
- Documentation
  - Not authoring (developers are responsible), but organization, usability, monitoring and review, ‘encouragement’ to authors
- Tutorials and training
- User support services
  - E.g. Savannah problem reporting, feature requests

The challenges ahead
- ATLAS database project is a big project
  - Covers many different areas, diverse communities
  - Key objectives: improving communication, facilitating data transfers
- Short and medium-term concerns
  - Manpower for TC-related areas (detector production / installation)
    - Missing both sub-project leadership effort and workers
    - Becoming increasingly important as we approach commissioning
    - Can we exploit effort freed up from sub-detectors?
  - Data management strategies and needs - DC2 and vision beyond
  - Large scale distributed infrastructure - LCG common project initiative
  - Individual doing analysis/development - end user tools
- New contributions / efforts are needed and welcome!