Database operations in CMS
Distributed Database Operations Workshop 2010, 19 November 2010
Giacomo Govi (CERN), on behalf of the CMS experiment

Outline
- CMS data in RDBMS
- Database infrastructure
- Condition Database
- Luminosity data management
- Operations
- Planning
- Summary

CMS Data stored in RDBMS
- CMS operations involve a significant number of services and activities requiring data management of various kinds, with different volumes, performance and scaling requirements.
- RDBMS is a key component for several applications: online systems, offline reconstruction and analysis, workflow management, configuration management, authentication, monitoring, logging.
- Strategic data: accessible by many applications.
- Private data: used internally by a single application.

Online data
- Critical for detector operation and data taking:
  - Detector configuration: XDAQ, TStore
  - Trigger configuration: TStore
  - DAQ
  - Slow control: PVSS + RDB
  - Monitoring
  - Storage Manager
- Most clients of these applications/services are devices (or humans) located at P5.
- Data should be accessible only to a restricted community.
- Special requirements:
  - The system must be able to operate fully with no external network connection.
  - Private DB storage (cannot be shared with other systems).

Offline data
- Critical for the physics data analysis chain.
- A wide category that includes the data involved in offline production activities:
  - Calibration, detector conditions: varying with time and frequently updated
  - Detector description: static (or quasi-static)
  - Beam and luminosity information
  - Run information
- Data are exposed to a large community:
  - Many institutions of the collaboration are involved.
  - Potentially little control over expected volumes, technologies, standards, practices and access patterns.

Other data
- Related to general services, critical for many activities of the experiment:
  - File transfer management (PhEDEx): basically 'tactical' data, but highly transactional
  - File bookkeeping (DBS): large volumes, write once…
  - Jobs and file processing (T0AST): transactional
  - Authentication/authorization (SiteDB): quasi-static, read-only data

Infrastructure
Two production clusters:
1) CMSONR: a 6-node Oracle RAC located at P5
   - Only 'visible' within the P5 network.
   - Two logical databases:
     - OMDS stores data for sub-detectors, trigger, slow control, luminosity, configuration, monitoring.
     - ORCON stores calibrations and other condition data.
   - Hardware from CMS, software and maintenance from CERN/IT.
2) CMSR: a 6-node Oracle RAC located at the IT centre
   - Visible within the CERN network.
   - Storage for condition, run and luminosity data (ORCOFF).
   - Storage for workflow management data.
   - Fully maintained by IT.
In addition, an integration RAC: INT2R, visible from P5.

Data flow
- A subset (summary) of the condition data is read from OMDS, reformatted and stored in ORCOFF:
  - Processes run on dedicated nodes at P5.
  - Performed by a single application (PopCon).
- Two Oracle streams populate ORCOFF with data from OMDS and ORCON:
  - ORCON + luminosity + Storage Manager data.
  - PVSS accounts and monitoring data from OMDS.
[Diagram: detector and trigger data flow into OMDS and, via PopCon and the offline dropbox, into ORCON on CMSONR; Oracle streams copy the data to ORCOFF on CMSR for offline reprocessing.]
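A minimal sketch, in Python, of the kind of summarization step described above: read raw slow-control values from an OMDS-style table, condense them into one number per interval of validity, and hand the result to the condition writer. The table, column and account names are invented for illustration; in production this step is performed by the C++ PopCon application inside CMSSW, not by a script like this.

    import datetime
    import cx_Oracle

    def summarize_channel(conn, channel_id, since, until):
        """Average the raw DCS readings of one channel over an interval of validity."""
        cur = conn.cursor()
        cur.execute(
            "SELECT AVG(value) FROM dcs_readings "              # hypothetical OMDS table
            "WHERE channel_id = :ch AND change_date BETWEEN :t0 AND :t1",
            ch=channel_id, t0=since, t1=until)
        (avg_value,) = cur.fetchone()
        return avg_value

    omds = cx_Oracle.connect("dcs_reader", "secret", "cmsonr")   # placeholder credentials
    summary = summarize_channel(omds, channel_id=42,
                                since=datetime.datetime(2010, 11, 1),
                                until=datetime.datetime(2010, 11, 8))
    # A real O2O job would write this summary through PopCon into ORCON, from where
    # the Oracle stream propagates it to ORCOFF.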

Condition Database
- Manages a large set of the categories listed before, required by the HLT, DQM and offline reconstruction:
  - ~200 data sources feeding their tables, managed by several institutions/groups within CMS, with a wide range of update frequencies and data volumes.
  - Accessed for reading by more than 15,000 jobs/day (from Tier-0/1 only!).
- Stability and performance require limiting the access patterns:
  - By policy: no delete, no update.
  - By the architecture: write with a single application into ORCON (PopCon), stream ORCON into ORCOFF, read from ORCOFF.
- Server overload from reading is kept low with FronTier caches (see Dave's talk in the next session).
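As a small illustration of how the "no delete, no update" policy could be enforced at the database level, the sketch below grants only SELECT and INSERT on a hypothetical payload table; the account and table names are invented, and the slide does not state that the policy is actually implemented through grants.

    import cx_Oracle

    admin = cx_Oracle.connect("conditions_owner", "secret", "cmsonr")  # placeholder credentials
    cur = admin.cursor()

    # The single writer application (PopCon) may insert and read, but never
    # modify or remove existing condition payloads.
    cur.execute("GRANT SELECT, INSERT ON iov_payload TO cms_conditions_writer")
    # Generic readers may only select.
    cur.execute("GRANT SELECT ON iov_payload TO cms_conditions_reader")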

Population
- PopCon (Populator of Condition objects) application:
  - Package integrated in the CMSSW framework.
  - Stores/retrieves data as C++ objects, supporting several RDBMS technologies (Oracle, SQLite, FronTier).
  - Depends on some externals from LCG/AA: ROOT, CORAL; the dependency on POOL was recently dropped.
  - Writes into ORCON (O2O jobs and the offline dropbox).
- Offline dropbox:
  - Automatically exports condition data processed offline into the ORCON database, accessing the Oracle DB within the P5 private network.
  - Files are pulled into the P5 network every 10 minutes.
  - Python application based on an HTTP proxy.
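A minimal sketch of a dropbox-style polling loop, assuming files are exposed through an HTTP proxy endpoint; the URL, file-list format and local paths below are invented for illustration and do not describe the actual CMS dropbox protocol.

    import time
    import urllib.request

    PROXY_URL = "http://dropbox-proxy.example.cern.ch/pending"  # hypothetical endpoint
    POLL_INTERVAL = 600  # pull every 10 minutes, as in the production setup

    def fetch_pending_files():
        """Ask the proxy for the list of pending condition files and download them."""
        with urllib.request.urlopen(PROXY_URL) as resp:
            names = resp.read().decode().split()                # hypothetical: one name per line
        for name in names:
            with urllib.request.urlopen(PROXY_URL + "/" + name) as remote:
                payload = remote.read()
            with open("/data/dropbox/" + name, "wb") as out:    # hypothetical local area
                out.write(payload)
            # A real implementation would now hand the file to PopCon for upload to ORCON.

    while True:
        fetch_pending_files()
        time.sleep(POLL_INTERVAL)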

Condition data reading/distribution
- The offline reconstruction jobs running at Tier-0/1 potentially create a massive load on ORCOFF:
  - Jobs from Tier-0 and Tier-1s (~15,000) plus a variable subset of jobs from Tier-2s (~30,000?).
  - ~200 condition objects to read, each with 3-5 tables => ~800 queries.
  - The volumes involved are modest: ~40-50 GB for the entire DB; the data is to a large extent read-only.
- FronTier caches minimize direct access to the Oracle servers, at the price of a possible latency implied by the refresh policy:
  - Two FronTier services are deployed (ORCON to P5 and ORCOFF to Tier-0/1).
- Snapshots of the Oracle DB are heavily used for reprocessing; data sets are exported to a dedicated server.
- SQLite files provide an additional, simple way to ship data over the network:
  - Used by the offline dropbox to export calibration data into ORCON.
  - Can also be used to ship MC data to Tier-1s.
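A minimal sketch, in Python, of exporting a slice of condition data into an SQLite file for shipping over the network; the IOV_PAYLOAD table, the tag name and the credentials are invented for illustration, and in practice the export is handled by the CMS condition tools rather than an ad-hoc script.

    import sqlite3
    import cx_Oracle

    ora = cx_Oracle.connect("conditions_reader", "secret", "cmsr")   # placeholder credentials
    lite = sqlite3.connect("conditions_snapshot.db")
    lite.execute("CREATE TABLE iov_payload (since INTEGER, tag TEXT, payload BLOB)")

    cur = ora.cursor()
    cur.execute("SELECT since, tag, payload FROM iov_payload WHERE tag = :t",
                t="EcalPedestals_v1")                                # hypothetical tag name
    # Oracle BLOB columns come back as LOB handles, so read them before inserting.
    rows = [(since, tag, payload.read()) for since, tag, payload in cur]
    lite.executemany("INSERT INTO iov_payload VALUES (?, ?, ?)", rows)
    lite.commit()
    lite.close()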

Luminosity data management
- Luminosity data have recently been added to the CMS DBs.
- Production flow:
  - Computed by processes running at P5.
  - Written into a dedicated account in OMDS.
  - Streamed to ORCOFF (condition stream).
- Two main reading use cases:
  - Selected via FronTier and attached to reconstructed data in the jobs running at Tier-0.
  - Selected via FronTier for user queries of specific runs (Python script).
[Diagram: P5 calculation -> OMDS -> streaming to ORCOFF -> FronTier -> reconstruction jobs / Python script.]
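A minimal sketch of a per-run luminosity lookup in the spirit of the user-query Python script mentioned above; the LUMI_SUMMARY table and its columns are invented for illustration, and the real script reads through FronTier rather than connecting to Oracle directly.

    import cx_Oracle

    def integrated_lumi(run_number, dsn="cmsr"):
        """Sum the delivered luminosity recorded for one run (hypothetical schema)."""
        conn = cx_Oracle.connect("lumi_reader", "secret", dsn)   # placeholder credentials
        cur = conn.cursor()
        cur.execute("SELECT SUM(delivered_lumi) FROM lumi_summary WHERE run_number = :run",
                    run=run_number)
        (total,) = cur.fetchone()
        conn.close()
        return total

    print(integrated_lumi(146511))   # run number chosen purely for illustration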

Space usage
CMSONR:
- OMDS: ~300 accounts R/W, ~200 read-only; 2.5 TB + 30 GB/week (monitoring + PVSS are the largest contribution).
- ORCON: ~200 accounts R/W; 50 GB, growing by 1 GB/week.
CMSR:
- 2.4 TB + 30 GB/week.
- ORCOFF: copy of ORCON.
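A back-of-the-envelope projection from the figures above (the current size and growth rate are taken from the slide; the one-year horizon and constant-rate assumption are illustrative only):

    # OMDS: 2.5 TB today, growing by about 30 GB per week
    omds_now_tb = 2.5
    weekly_growth_tb = 30 / 1024          # 30 GB/week expressed in TB
    after_one_year = omds_now_tb + 52 * weekly_growth_tb
    print("OMDS after one more year: ~%.1f TB" % after_one_year)   # ~4.0 TB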

Server load
- Several applications running, with a variety of access patterns and data volumes.
- Prior development/deployment process in the test and integration clusters (INT2R, INT9R):
  - Mandatory tuning before production: schema layout, queries, workflow.
  - Complex systems require an iterative process with the DBAs; detailed feedback is required to obtain a good application.
- The result is a load that is rather stable in terms of sessions and queries; CPU and operating-system load shows some spikes from time to time…

Streams
- Both streams are performing well.
- The condition stream is highly stable:
  - Little or no direct DB access is left to the users.
  - Only a narrow access pattern is allowed by the common software.
- The PVSS stream had a few short breaks in the past:
  - The cause was particular schema changes or delete operations.
  - Experts from the concerned applications have wide access to the DB.
- In general, DB operations which might affect the streaming are discussed and planned with the DBAs.

Planning I
- Move to Oracle 10.2.0.5 after the Christmas break:
  - Need to test all applications beforehand; the test DBs (cmsdevr, int2r) have already been upgraded.
  - Online: validated: TStore, Storage Manager; to be done: PVSS (6th Dec).
  - Offline: migrate ARCHDB and stress-test the offline software (December).
- Improve alarming and monitoring for the online DB:
  - Reviewed the list of contacts for online accounts (at least one email address).
  - IT will provide automatic tools for alarms and warnings (password expiring, login failures, account locked, anomalous account growth, …).
  - The same will be done for the offline DB (after the applications review).
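An illustrative sketch of the kind of "anomalous account growth" check such automatic tools could perform; DBA_SEGMENTS is a standard Oracle data-dictionary view, but the threshold, snapshot handling and credentials are assumptions made for this example.

    import cx_Oracle

    THRESHOLD_GB = 5.0   # hypothetical weekly growth limit per account

    def account_sizes(conn):
        """Return {schema owner: occupied space in GB} from the data dictionary."""
        cur = conn.cursor()
        cur.execute("SELECT owner, SUM(bytes)/1024/1024/1024 "
                    "FROM dba_segments GROUP BY owner")
        return dict(cur.fetchall())

    def check_growth(previous, current):
        """Compare two snapshots taken a week apart and flag suspicious accounts."""
        for owner, size in current.items():
            growth = size - previous.get(owner, 0.0)
            if growth > THRESHOLD_GB:
                print("WARNING: account %s grew by %.1f GB this week" % (owner, growth))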

Planning II
- A couple of episodes of offline DB instability due to high load in the last two months => offline/computing applications review.
- Aim:
  - Analyze the current use of the DB.
  - Estimate the load increase for the next years.
  - Assess how safe each application is for the DB, and optimize.
- PhEDEx presentation done on 9th November; next: DBS, T0AST, SiteDB.
- Other plans:
  - Condition schema improvements (will require migrating/copying a subset of the data into the DB).
  - Various improvements in the condition software.

Summary
- CMS computing relies on RDBMS for many critical production applications.
- The variety of requirements in terms of access patterns, data volumes and workflows represents significant complexity.
- The overall DB architecture is focused on a few principles:
  - Use RDBMS for quasi-static and transactional data.
  - Keep the online systems safe.
  - Distribute read-only data with web caches and files.
- Operation during data taking brought more experience:
  - Ensure stability with a well-defined software process for DB applications.
  - Monitoring systems are essential for preventing performance degradation.