ATLAS Database Operations
Invited talk at the XXI International Symposium on Nuclear Electronics & Computing (NEC'2007), Varna, Bulgaria, 10-17 September 2007
Alexandre Vaniachine

2 ATLAS Database Applications Overview

Centralized databases – accessed interactively (by people) or by a limited number of computers:
– Technical Coordination DB: detector construction information; Engineering DB
– Online databases: slow control data (PVSS), Trigger DB, DBs for subsystems
– Computing Operations databases:
  – AKTR DB (ATLAS Knowledgement of Tasks Requests): physics task definitions for job submission
  – prodDB (production system DB): job configuration and job completion
  – Distributed Data Management (DDM) databases: Dashboard DB (dataset and file transfers/monitoring), etc.
  – AMI (ATLAS Metadata Information DB – "Where is my dataset?")

Databases that are distributed world-wide (for scalability) – accessed by an "unlimited" number of computers on the Grid (simulation jobs, reconstruction jobs, analysis jobs, …):
– Conditions DB – COOL (developed by LCG-AA with ATLAS contributions), distributed world-wide via Oracle Streams; now operated in production with real data (from commissioning) and simulation data
– Geometry DB – ATLASDD (developed by ATLAS), distributed world-wide in files (Database Release)
– TAG DB – event-level metadata for physics (LCG-AA with ATLAS contributions)

3 ATLAS Computing: Transition from Development to Service

In preparation for ATLAS operations, the "Database Deployment & Operations" activity is now defined.

The activity consists of the development and deployment (in collaboration with the WLCG 3D project) of the tools that allow the worldwide distribution and installation of databases and related datasets, as well as the actual operation of this system on the ATLAS multi-grid infrastructure.

4 WLCG Distributed Deployment of Databases (3D) Project

To set up database services and facilities for relational data transfers as part of the WLCG infrastructure, the WLCG Distributed Deployment of Databases (3D) project coordinates the database deployment and operations activity between the LHC experiments and the WLCG tier sites.

The 3D project is deploying distributed database services and developing advanced technologies allowing LCG applications to access the required data.

5 ATLAS Conditions DB Challenges

Unprecedented complexity of the LHC detectors:
– orders of magnitude more channels than in previous experiments

To address these challenges ATLAS adopted the common LHC solution for the Conditions DB – COOL:
– COOL enables separation of the Interval-of-Validity metadata from the actual conditions data ("payload")
– selected "payload" data are stored in files outside of the database server, e.g. calorimeter calibration constants that are not really suited to relational databases because of their size and access patterns (see the sketch below)

COOL technology is successfully deployed in production for ATLAS detector commissioning operations:
– major detector commissioning exercise – the M4 Cosmics Run in August

Current ATLAS Conditions DB operations snapshot:
– more than 70 GB of data
– up to 1 GB of COOL data added and replicated each day
– data growth rates are increasing as more subdetector elements are instrumented during ATLAS detector commissioning
– access varies between 8K and 34K sessions per day
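
A minimal Python sketch of this idea, assuming a toy data model (this is not the COOL API; the class, folder name and reference format below are made up purely for illustration): the database holds Interval-of-Validity-keyed rows, and for large payloads a row stores only a reference to an external file resolved outside the database server.

```python
# Illustrative sketch only (not the COOL API): a conditions "folder" keyed by
# Interval-of-Validity (IOV), where small payloads live inline and large
# payloads are referenced as external files (e.g. a POOL file + object key).
class ConditionsFolder:
    def __init__(self, name):
        self.name = name
        self._iovs = []   # list of (since, until, channel, payload_or_reference)

    def store(self, since, until, channel, payload):
        """Record a payload: an inline dict or an external-file reference string."""
        self._iovs.append((since, until, channel, payload))
        self._iovs.sort(key=lambda iov: iov[0])

    def find(self, validity_key, channel):
        """Return the payload whose IOV covers the given validity key."""
        for since, until, chan, payload in self._iovs:
            if chan == channel and since <= validity_key < until:
                return payload
        raise KeyError("no valid conditions object")

# Small payload stored inline; large calorimeter constants referenced as a file.
folder = ConditionsFolder("/CALO/Calibration")
folder.store(since=1000, until=2000, channel=0,
             payload={"pedestal": 3.2, "gain": 1.01})
folder.store(since=1000, until=2000, channel=1,
             payload="pool://calo_calib_run1000.root#CaloCalibBlob_42")  # external reference

print(folder.find(1500, 1))   # -> the file reference, resolved outside the DB server
```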

6 Distributed Database Operations

Access to Conditions DB data is critical for event data reconstruction.

To achieve scalability in Tier-0 operations, slices of the corresponding conditions/calibration data will be delivered to the Tier-0 farm via files on AFS.

Beyond Tier-0 we need different technologies for data distribution (see the sketch below):
– 3D Oracle Streams to Tier-1 sites, for replication of fast-varying data such as Conditions DB data
– ATLAS DDM DQ2 for file-based database data replication:
  – for replication of slow-varying data, such as 'static' Database Releases (Geometry DB, etc.)
  – for replication of large files with Conditions DB payload

Critical operational infrastructure for these technologies is delivered by:
– ATLAS Distributed Data Management (DDM)
– the WLCG Project on Distributed Deployment of Databases (3D)
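
A hypothetical sketch of the distribution-path choice described above (not ATLAS software; the dataclass fields and function are invented for illustration): fast-varying relational data goes via Oracle Streams to the Tier-1 databases, while slow-varying or bulky data is shipped as files via DDM/DQ2.

```python
# Hypothetical dispatcher illustrating the two distribution paths above.
from dataclasses import dataclass

@dataclass
class DbDataset:
    name: str
    relational: bool      # lives in Oracle tables (vs. plain files)
    fast_varying: bool    # updated continuously during data taking
    payload_files: bool   # large payload stored outside the DB server

def distribution_path(d: DbDataset) -> str:
    if d.relational and d.fast_varying:
        return "Oracle Streams replication to Tier-1 databases"
    return "DDM/DQ2 file-based replication (Database Release or payload files)"

for ds in (DbDataset("Conditions DB (COOL)", True, True, False),
           DbDataset("Geometry DB release", True, False, False),
           DbDataset("Conditions payload files", False, False, True)):
    print(f"{ds.name:28s} -> {distribution_path(ds)}")
```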

7 All Ten ATLAS Tier-1 Sites in Production Operation

Using the 3D infrastructure, ATLAS is running a 'mini Calibration Data Challenge':
– regular conditions data updates on the Online RAC, testing propagation to the Offline RAC and further to the ten Tier-1s
– since April: ~2500 runs, 110 GB of COOL data replicated to Tier-1s at rates of 1.9 GB/day

Leveraging the 3D Project infrastructure, ATLAS Conditions DB worldwide replication is now in production with real data (from detector commissioning) and data from MC simulations.
[Snapshot of real-time monitoring of 3D operations on the EGEE Dashboard]

These successful ATLAS operations contributed to a major WLCG milestone.

8 Major WLCG Milestones Accomplished

Together we have accomplished a major achievement.
[Slide: Dirk Duellmann]

9 DB Replicas at Tier-1s: Critical for Reprocessing

Why are we building such a large distributed database system? The ATLAS Computing Model sets the following requirements at Tier-1:
– running reconstruction re-processing: O(100) jobs in parallel
– catering for other 'live' Conditions DB usage at the Tier-1 (calibration and analysis), and perhaps for the associated Tier-2/3s

To provide input to future hardware purchases for the Tier-1s (how many servers are required? what balance between CPU, memory and disk?) we have to do Oracle scalability tests with the following considerations (see the sketch after this list):
– We should test a realistic Conditions DB workload with mixed amounts and types of data, dominated by the "slow control" (DCS) data, in three combinations: "no DCS", "with DCS" and "10xDCS" (see backup slides).
– Tier-0 uses a file-based Conditions DB slice; at Tier-1 the DB access differs.
– Because of rate considerations we may have to stage and process all files grouped by physical tapes, rather than by datasets, so we should test a random data access pattern.
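
A minimal sketch of how such a test campaign could be parameterised (illustrative only: the workload names come from the backup slides, the ~2500 runs from slide 7; the concurrency levels and all identifiers are assumptions, not ATLAS tooling).

```python
# Illustrative test-matrix sketch: each test point pairs a Conditions DB
# workload combination with a target number of concurrent jobs, and jobs pick
# their runs at random to mimic tape-ordered (non-dataset) access.
import itertools
import random

WORKLOADS = ("no DCS", "with DCS", "10xDCS")      # from the backup slides
CONCURRENCY_LEVELS = (10, 20, 50, 100, 200)       # jobs per Oracle RAC node (assumed scan)
AVAILABLE_RUNS = list(range(100000, 102500))      # ~2500 replicated runs (slide 7)

def build_test_matrix(seed=42):
    rng = random.Random(seed)
    matrix = []
    for workload, n_jobs in itertools.product(WORKLOADS, CONCURRENCY_LEVELS):
        runs = [rng.choice(AVAILABLE_RUNS) for _ in range(n_jobs)]  # random access pattern
        matrix.append({"workload": workload, "n_jobs": n_jobs, "runs": runs})
    return matrix

for point in build_test_matrix()[:3]:
    print(point["workload"], point["n_jobs"], point["runs"][:5])
```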

10 How a Scalability Test Works

The first ATLAS scalability tests started at the French Tier-1 site in Lyon, which has a 3-node 64-bit Solaris RAC cluster shared with another LHC experiment, LHCb.

In the scalability tests our goal is to overload the database cluster by launching many jobs in parallel. Initially, the more concurrent jobs are running (horizontal axis), the more processing throughput we get (vertical axis), until the server becomes overloaded and it takes more time to retrieve the data, which limits the throughput. The sketch below mimics this procedure.

In the particular plot shown, the overload was caused by a lack of optimization in the COOL 2.1 version used in this very first test. But it was nice to see that our approach worked.
[Plot: processing throughput vs. concurrent jobs per Oracle RAC node, "no DCS" workload]
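
A minimal sketch of such a test driver, under the assumption that the real Conditions DB reads (done via COOL in the actual jobs) are replaced by a stand-in function with a toy contention model, so that the throughput curve flattens once the "server" saturates:

```python
# Illustrative scalability-driver sketch (not the ATLAS test harness): launch N
# concurrent workers that each "read conditions for a run", time the batch, and
# report throughput (jobs/minute) as a function of concurrency.
import time
import threading
from concurrent.futures import ThreadPoolExecutor

SERVER_SLOTS = threading.Semaphore(50)    # toy model: the DB server handles ~50 sessions at once

def read_conditions_for_run(run_number):
    """Stand-in for a reconstruction job's Conditions DB reads."""
    with SERVER_SLOTS:                    # jobs queue up once the server is overloaded
        time.sleep(0.05)                  # pretend time spent in DB queries
    return run_number

def measure_throughput(n_jobs, runs):
    start = time.time()
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        list(pool.map(read_conditions_for_run, runs[:n_jobs]))
    elapsed = time.time() - start
    return n_jobs / elapsed * 60.0        # completed jobs per minute

if __name__ == "__main__":
    runs = list(range(100000, 100400))
    for n in (10, 20, 50, 100, 200):
        print(f"{n:4d} concurrent jobs -> {measure_throughput(n, runs):8.1f} jobs/minute")
```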

11 Scalability Tests at Tier-1s: CNAF, Bologna and CC-IN2P3, Lyon

CNAF has a 2-node dual-CPU Linux Oracle RAC (dedicated to ATLAS).
– The comparison shows the importance of doing tests at several sites.
– Further COOL 2.2 optimization is expected to increase the performance.
[Plot: measured throughput at CNAF and Lyon compared with the ATLAS targets]

12 Conclusions and Credits

In preparation for data taking, a coordinated shift from development towards services/operations has occurred in ATLAS database activities.

ATLAS is active in the development and deployment of the tools that allow the worldwide distribution and installation of databases and related datasets, as well as the actual operation of this system on the WLCG multi-grid infrastructure.

In collaboration with the WLCG 3D Project, a major milestone has been accomplished in database deployment:
– the ATLAS Conditions DB is deployed in production operations with real data on one of the largest distributed database systems world-wide

First Oracle scalability tests indicate that the WLCG 3D capacities being deployed for ATLAS are in the ballpark of what ATLAS requires:
– further scalability tests will allow a more precise determination of the actual ATLAS requirements for distributed database capacities

Acknowledgements: I wish to thank all my ATLAS collaborators who contributed to and facilitated ATLAS database deployment and operations.

Backup Slides

14 Realistic Workload for Conditions DB Tests

The ATLAS replication workload uses multiple COOL schemas with mixed amounts and types of data:

Schema     Folders  Channels  Channel payload    N/Run  Total GB
INDET      …        …         char               …      …
CALO       …        …         char               1      1.8
MDT        …        …         CLOB: 3kB+4.5kB    …      …
GLOBAL     1        50        3 x float          …      …
TDAQ/DCS   …        …         … x float          …      …
TRIGGER    …        …         … x float          …      …

The replicated data provided read-back data for the scalability tests:
– The realistic conditions data workload is a 'best guess' for the ATLAS Conditions DB load in reconstruction, dominated by the DCS data.

Three workload combinations were used in the tests:
– "no DCS", "with DCS" and "10xDCS"

15 Three Combinations Used for the Test Workloads

The latest COOL 2.2 version enabled more realistic workload testing. Three different workloads have been used in these tests (encoded as a sketch below):

– "no DCS": data from 19 folders of 32 channels each with POOL reference (string) payload, plus 2 large folders each with 1174 channels, one with a 3k string per channel and one with a 4.5k string per channel (which gets treated by Oracle as a CLOB), plus 1 folder with 50 channels simulating detector status information. This is meant to represent a reconstruction job reading calibration but no DCS data. The data is read once per run.

– "with DCS": as above, but with an additional 10 folders with 200 channels each containing 25 floats, and 5 folders with 1000 channels of 25 floats, representing some DCS data, again read once per run.

– "10xDCS": as above, but processing 10 events spaced in time so that all the DCS data is read again for each event. This represents a situation where the DCS data varies over the course of a run, so each job has to read in 10 separate sets of DCS data.
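
An illustrative Python encoding of these three workloads (not ATLAS code; the dictionary field names are made up, while the folder, channel and payload counts are those given on this slide):

```python
# Illustrative encoding of the three test workloads described above.
CALIB_FOLDERS = (
    [{"channels": 32, "payload": "POOL ref (string)"}] * 19
    + [{"channels": 1174, "payload": "3k string"},
       {"channels": 1174, "payload": "4.5k string (CLOB)"},
       {"channels": 50, "payload": "detector status"}]
)
DCS_FOLDERS = (
    [{"channels": 200, "payload": "25 floats"}] * 10
    + [{"channels": 1000, "payload": "25 floats"}] * 5
)

WORKLOADS = {
    "no DCS":   {"folders": CALIB_FOLDERS,               "dcs_reads_per_job": 0},
    "with DCS": {"folders": CALIB_FOLDERS + DCS_FOLDERS, "dcs_reads_per_job": 1},
    "10xDCS":   {"folders": CALIB_FOLDERS + DCS_FOLDERS, "dcs_reads_per_job": 10},
}

for name, spec in WORKLOADS.items():
    n_channels = sum(f["channels"] for f in spec["folders"])
    print(f"{name:8s}: {len(spec['folders'])} folders, "
          f"{n_channels} channels, DCS read {spec['dcs_reads_per_job']}x per job")
```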