LHC: ATLAS Experiment meeting the “Conditions” data challenge
Elizabeth Gallas - Oxford - August 29, 2009
XLDB3
29-Aug-2009 Elizabeth Gallas 2
Overview – Oracle usage in ATLAS
Oracle is used extensively at every stage of data taking and analysis:
- Configuration
  - PVSS – Detector Control System (DCS) configuration & monitoring
  - Trigger – configure the 3-level online event selection
  - OKS – configuration databases for the TDAQ
  - Detector Description – geometry
- File and job management
  - Tier-0 – initial event data processing at CERN
  - DQ2/DDM – distributed file and dataset management
  - Dashboard – monitor jobs and data movement on the ATLAS grid
  - PanDA – workload management: production & distributed analysis
- Dataset selection catalogue
  - AMI – dataset selection catalogue with links to other ATLAS metadata
- Event summary – event-level metadata
  - TAGs – ease selection of and navigation to events of interest
- Conditions data (non-event data for offline analysis)
  - Conditions Database in Oracle
  - POOL files (referenced from the Conditions DB, stored in DDM)
29-Aug-2009 Elizabeth Gallas 3
29-Aug-2009 Elizabeth Gallas 4
Conditions Database Overview
- Subsystems need to store information that is needed in offline analysis but is not “event-wise”: the information represents conditions of the system over an interval of validity ranging from very short to infinite.
  - Volume ~ 1 GB / day (online → offline)
- We store Conditions data and references to POOL files in Oracle, in a generic schema design which can store, accommodate and deliver a large amount of data for a diverse set of subsystems (a minimal sketch of the idea follows below).
  - Relies on considerable infrastructure
  - Based on CORAL (developed by CERN IT, ATLAS)
    - Restricts data types
    - Allows extraction of ‘slices’ of Conditions into alternative DBMS
- Used at every stage of data taking and analysis
  - From online calibrations and alignment … processing … more calibrations … further alignment … reprocessing … analysis … to relating recorded events with the beam conditions
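To make the interval-of-validity idea concrete, here is a minimal sketch in Python with sqlite3. This is not the real COOL/Oracle schema; the table layout, folder name, channel number and payload values are all invented for illustration. It shows a conditions table indexed by folder, channel and IOV, and a point-in-time lookup.

```python
# Minimal sketch (NOT the actual ATLAS/COOL schema): each payload row is
# valid for an interval [iov_since, iov_until) within a folder and channel.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE conditions (
        folder    TEXT,     -- logical grouping, e.g. '/DEMO/Temp' (illustrative)
        channel   INTEGER,  -- many objects of identical structure per folder
        iov_since INTEGER,  -- start of validity (run-LB or timestamp)
        iov_until INTEGER,  -- end of validity (exclusive)
        payload   TEXT      -- the data: inline value or a reference to a POOL file
    )
""")
conn.execute("INSERT INTO conditions VALUES ('/DEMO/Temp', 3, 100, 200, '291.4 K')")
conn.execute("INSERT INTO conditions VALUES ('/DEMO/Temp', 3, 200, 999, '292.1 K')")

# Point lookup: which payload was valid for channel 3 at 'time' 150?
row = conn.execute(
    "SELECT payload FROM conditions "
    "WHERE folder = ? AND channel = ? AND iov_since <= ? AND ? < iov_until",
    ("/DEMO/Temp", 3, 150, 150),
).fetchone()
print(row[0])  # -> '291.4 K'
```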
29-Aug-2009 Elizabeth Gallas 5
Stages: ATLAS Reconstruction
- RAW data file
- ESD (Event Summary Data) ~500 kB/event
- AOD (Analysis Object Data) ~100 kB/event
- TAG (not an acronym) ~1 kB/event (stable)
- Athena (framework for ATLAS reco/analysis)
  - Input: file-based events + Conditions DB via COOL
  - COOL = the Conditions DB API, allows uniform access to ATLAS Conditions data from Athena (a hedged read example follows below)
- Each stage of ATLAS processing/analysis requires Conditions DB data
[Diagram: RAW → ESD → AOD → TAG]
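As a hedged sketch of what access via COOL looks like outside Athena, the snippet below uses the COOL Python bindings (PyCool). The connection string, folder path and payload field name are placeholders, not real ATLAS values; the calls themselves (databaseService, openDatabase, getFolder, browseObjects) are the standard COOL ones.

```python
# Read sketch with PyCool; all names are illustrative, not real ATLAS conditions.
from PyCool import cool

dbSvc = cool.DatabaseSvcFactory.databaseService()
# Open read-only; an Oracle connection string would be used the same way.
db = dbSvc.openDatabase("sqlite://;schema=mycond.db;dbname=DEMO", True)

folder = db.getFolder("/DEMO/Temp")
# Browse every stored object over the full validity range, all channels
objiter = folder.browseObjects(cool.ValidityKeyMin, cool.ValidityKeyMax,
                               cool.ChannelSelection.all())
while objiter.goToNext():
    obj = objiter.currentRef()
    print(obj.channelId(), obj.since(), obj.until(), obj.payload()["temperature"])
db.closeDatabase()
```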
29-Aug-2009 Elizabeth Gallas 6
Conditions DB terminology
- The ATLAS Conditions Database is an Interval of Validity database: all tables are indexed using an interval in time.
- FOLDER: ‘think’ table in the Conditions DB
  - Indexed by IOV (Interval of Validity) in time range or run-LB range
  - (Optionally) CHANNEL number (or name) – useful to store many objects of identical structure
  - (Optionally) Version (called a COOL TAG)
  - Contains its ‘PAYLOAD’: the data (one or more columns)
    - ‘inline’ values and/or ‘reference’ values (pointer to external data)
    - Many payload data types available – restricted by CORAL
- FOLDERSET: set of folders and/or foldersets arranged in a hierarchical structure, names ~UNIX pathnames (a hedged PyCool sketch of these concepts follows below)
[Diagram: folder hierarchy, e.g. /Subdetector/Folderset_1/Folder_1A, /Subdetector/Folderset_1/Folder_1B, /Subdetector/Folder_A]
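The terminology above maps directly onto the COOL API. The following hedged PyCool sketch defines a one-column payload, creates a folder inside a folderset, stores one object for a given channel and IOV, and labels the folder contents with a COOL tag. The file name, folder path, field name, channel and tag name are all invented for illustration.

```python
# Write sketch with PyCool; names and values are illustrative only.
from PyCool import cool

dbSvc = cool.DatabaseSvcFactory.databaseService()
db = dbSvc.createDatabase("sqlite://;schema=demo.db;dbname=DEMO")

# PAYLOAD structure: one float column (storage types are restricted by CORAL)
spec = cool.RecordSpecification()
spec.extend("temperature", cool.StorageType.Float)

# FOLDER (multi-version, so it can carry COOL tags), inside a FOLDERSET path
folderSpec = cool.FolderSpecification(cool.FolderVersioning.MULTI_VERSION, spec)
folder = db.createFolder("/Subdetector/Folderset_1/Folder_1A", folderSpec,
                         "demo folder", True)   # True = create parent foldersets

# Store one object: payload valid for IOV [0, 100), in CHANNEL 3
data = cool.Record(spec)
data["temperature"] = 291.4
folder.storeObject(0, 100, data, 3)

# COOL TAG: label the current folder contents as a named version
folder.tagCurrentHead("DEMO-TAG-01", "first demo version")
db.closeDatabase()
```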
29-Aug-2009 Elizabeth Gallas 7
Some numbers / recent reprocessing
- Conditions DB data are organized in 16 database schemas:
  - Total of 747 tables organized in 122 folders, plus system tables
- Current volume (simulation, cosmic and commissioning data):
  - CERN Master Online: 41 GB
  - CERN Master Offline: 400 GB
  - 10 Tier-1 Oracle RACs: 168 GB
  - Volume will increase considerably with real collision data (mid-November 2009)
- From recent reprocessing of cosmic ray data:
  - 35 distinct database-resident payloads, from 32 bits to 16 MB
  - Referencing 64 external POOL files in total
  - To process a 2 GB file with 1K raw events, a typical reconstruction job makes ~2K queries to read ~40 MB of database-resident data
    - Some jobs read tens of MB extra
    - Plus about the same volume of data is read from external POOL files
29-Aug-2009 Elizabeth Gallas 8
Challenge: Distribution of Conditions data
- Use cases continue to grow for distributed processing … calibration … alignment … analysis …
  - Can happen anywhere on the ATLAS grid (worldwide)
- Oracle stores a huge amount of essential data
  - Keeps all this data ‘at our fingertips’
  - But ATLAS has many… many… many… fingers looking for both the oldest and the newest data
  - Adding network latency bottlenecks
- Solutions for distributed databases: make a ‘copy’ (see the sketch after this slide)
  - SQLite file: a mini-Conditions DB containing only the specific folders, IOV range (and COOL tag) required
    - Considerable file management required
    - Thought by many not to be scalable with real data
  - Frontier: store query results in a web cache (from the CMS model)
    - Located at/near the Oracle RAC
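As a hedged illustration of the ‘copy’ approach: the client-side COOL calls are unchanged whether a job reads the Oracle master, an SQLite slice, or a web cache; only the CORAL connection string differs. The server, schema and file names below are placeholders, not real ATLAS endpoints.

```python
# Same API, different connection strings; all names are placeholders.
from PyCool import cool

dbSvc = cool.DatabaseSvcFactory.databaseService()

oracle_master = "oracle://SOME_SERVER;schema=SOME_OWNER;dbname=DEMO"   # master / Tier-1 RAC
sqlite_slice  = "sqlite://;schema=conditions_slice.db;dbname=DEMO"     # shipped mini-DB
# Frontier access is configured with a frontier:// connection string pointing at
# the web cache in front of the Oracle RAC (exact form is site-specific, omitted).

db = dbSvc.openDatabase(sqlite_slice, True)   # read-only; identical calls from here on
folder = db.getFolder("/DEMO/Temp")
```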