Database Replication and Monitoring in ATLAS Computing Operations
Suijian Zhou
LCG Database Readiness Workshop, Rutherford, UK, March 23, 2006
The ATLAS Tiers and roles
  Tier0:
    1). Calibration and alignment
    2). First-pass ESD, AOD production and TAG production
    3). Archiving and distribution of RAW, ESD, AOD and TAG data
  Tier1:
    1). Storage of RAW, ESD, calibration data, meta-data, analysis data, simulation data and databases
    2). Perform reprocessing of RAW → ESD
  Tier2:
    1). Data processing for calibration and alignment tasks
    2). Perform Monte Carlo simulation and end-user analysis (batch and interactive)
The ATLAS Databases
  Detector production, detector installation
  Survey data
  Detector geometry
  Online configuration, online run book-keeping, run conditions (DCS and others)
  Online and offline calibrations and alignments
  Offline processing configuration and book-keeping
  Event tag data
Conditions Database of ATLAS
  It refers to nearly all the non-event data produced during the operation of the ATLAS detector, and also the data required to perform reconstruction and analysis
  It varies with time and is characterized by an “Interval of validity” (IOV)
  It includes:
    1). Data archived from the ATLAS detector control system (DCS)
    2). Online book-keeping data, online and offline calibration and alignment data
    3). Monitoring data characterizing the performance of the detector
ATLAS DB Replication Task
  The Conditions DB should be distributed worldwide to support the data processing tasks at Tier-1s and Tier-2s
  Conditions DB updates (e.g. improved calibration constants) generated worldwide should be brought back to the central CERN-based DB servers, for subsequent distribution to all sites that require them
  To avoid overloading the central Tier0 server at CERN (thousands of jobs requiring the database at the same time may exhaust the resources of a single DB server or even crash it), slave DB servers need to be deployed on at least 10 Tier-1s
The Conditions DB: COOL
  Interval-of-Validity (IOV) based storage and retrieval, with the IOV expressed as a range of absolute times or of run and event numbers
  Data is stored in folders, which are arranged in a hierarchical structure of foldersets
  Implemented using the Relational Access Layer (RAL), which makes it possible for a COOL database to be stored in Oracle, MySQL or SQLite technology
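The IOV-based lookup described on this slide can be illustrated with a small sketch. This is not the COOL API: the `Folder` class, its methods, the folder path "/Calo/Calibration" and the payload values are all hypothetical, and intervals are assumed to be non-overlapping with an exclusive upper bound.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class Folder:
    """Hypothetical stand-in for a conditions folder (not the COOL API)."""
    since: list = field(default_factory=list)    # sorted interval start points
    until: list = field(default_factory=list)    # matching exclusive end points
    payload: list = field(default_factory=list)  # conditions objects

    def store(self, since, until, obj):
        # Keep the three lists ordered by interval start
        # (intervals are assumed not to overlap).
        i = bisect.bisect_left(self.since, since)
        self.since.insert(i, since)
        self.until.insert(i, until)
        self.payload.insert(i, obj)

    def find(self, point):
        # Find the last interval that starts at or before `point`,
        # then check that `point` lies before its end.
        i = bisect.bisect_right(self.since, point) - 1
        if i >= 0 and point < self.until[i]:
            return self.payload[i]
        return None  # nothing is valid at this point


# Folders addressed by a hierarchical path, as in a folderset tree.
# The path and the gain values are made up for illustration.
folders = {"/Calo/Calibration": Folder()}
calib = folders["/Calo/Calibration"]
calib.store(0, 100, {"gain": 1.00})
calib.store(100, 250, {"gain": 1.02})
```

Here `point` can stand for either a timestamp or a combined run and event number; `calib.find(42)` returns the object whose interval contains 42.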
ATLAS DB Replication Strategies (1)
  Conditions data in POOL ROOT format can be replicated using the standard tools of the ATLAS Distributed Data Management (DDM) system, DQ2
  Small databases, such as the Geometry DB, using MySQL and SQLite technologies
  Native Oracle Streams replication from Tier-0 → Tier-1s, where data are replicated in ‘real time’ from master to slave databases (any Oracle data, also event TAG data etc.)
ATLAS DB Replication Strategies (2)
  COOL API-level replication from Oracle → SQLite. The PyCoolCopy tool in PyCoolUtilities (Python-based COOL utilities) enables subsets of COOL folder trees to be copied from one database to another. Currently ‘static’, will be ‘dynamic’ in the future.
  CORAL Frontier-based replication. It translates SQL database requests into HTTP protocol requests at the client. A Tomcat web server interacting with an Oracle database backend will return the query results to the client as html pages. Set up squid web-proxy cache servers at Tier-0 and Tier-1s.
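The idea behind the Frontier approach, turning a SQL request into a cacheable HTTP request, can be sketched as follows. This is an illustration only and not the actual Frontier wire format: the URL, the `q` parameter, the encoding scheme and the `CachingProxy` class are assumptions, and the proxy is a toy in-memory stand-in for a squid cache.

```python
import base64
import zlib
from urllib.parse import quote, unquote


def encode_query(base_url, sql):
    # Compress and base64-encode the SQL text so it is safe inside a URL.
    # (Assumed encoding for illustration, not the Frontier protocol.)
    packed = base64.urlsafe_b64encode(zlib.compress(sql.encode("utf-8")))
    return base_url + "?q=" + quote(packed.decode("ascii"))


def decode_query(url):
    # Server side: recover the SQL text from the request URL.
    packed = unquote(url.split("?q=", 1)[1])
    return zlib.decompress(base64.urlsafe_b64decode(packed)).decode("utf-8")


class CachingProxy:
    """Toy stand-in for a web-proxy cache: identical URLs are served from memory."""

    def __init__(self, backend):
        self.backend = backend   # callable taking SQL text, returning a result
        self.cache = {}
        self.backend_hits = 0

    def get(self, url):
        if url not in self.cache:
            # Cache miss: forward the decoded query to the database backend.
            self.backend_hits += 1
            self.cache[url] = self.backend(decode_query(url))
        return self.cache[url]


# Hypothetical server URL and query.
sql = "SELECT * FROM conditions WHERE run = 1234"
url = encode_query("http://frontier.example/query", sql)
proxy = CachingProxy(lambda text: "rows for " + text)
first = proxy.get(url)
second = proxy.get(url)   # served from the cache, backend not contacted again
```

Because many jobs issue the same query, the same URL is requested repeatedly, which is what lets a proxy cache at each site absorb most of the load.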
The Octopus Replicator for the Database Replication (1)
  It can work between different database backends as long as they contain equivalent schemas (e.g. ATLAS Geometry Database, Tag database, etc.)
  It is configured to replicate between Oracle, MySQL and SQLite
  It also works on other databases and files: Access, MSQL, CJDBC, EXCEL, Informix, PostgreSQL, XML etc.
  Other functions: database backup/restore and database synchronization
The Octopus Replicator for the Database Replication (2)
  The Octopus Replicator works in two steps:
    1). Generation of database schema description and conversion scripts (generate)
    2). The actual database replication itself (load)
  Typical configurations for ATLAS tasks are considered:
    Geometry Database: Oracle → MySQL, Oracle → SQLite
    Tag Database: MySQL → MySQL, MySQL → Oracle, Oracle → MySQL
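The two-step generate/load pattern can be sketched in a few lines. This does not use Octopus itself: the `generate` and `load` functions, the `volumes` table and its rows are hypothetical, and both ends are in-memory SQLite databases so that the example is self-contained, whereas a real replication would cross backends and need type conversions.

```python
import sqlite3


def generate(source, table):
    # Step 1: describe the source schema as (column name, declared type) pairs.
    cols = source.execute(f"PRAGMA table_info({table})").fetchall()
    return [(c[1], c[2]) for c in cols]


def load(source, target, table, schema):
    # Step 2: create the table on the target and copy every row across.
    col_defs = ", ".join(f"{name} {ctype}" for name, ctype in schema)
    target.execute(f"CREATE TABLE {table} ({col_defs})")
    names = ", ".join(name for name, _ in schema)
    marks = ", ".join("?" for _ in schema)
    rows = source.execute(f"SELECT {names} FROM {table}").fetchall()
    target.executemany(f"INSERT INTO {table} ({names}) VALUES ({marks})", rows)
    target.commit()
    return len(rows)


# Made-up source table standing in for a small geometry database.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE volumes (id INTEGER, name TEXT)")
source.executemany("INSERT INTO volumes VALUES (?, ?)", [(1, "pixel"), (2, "sct")])

target = sqlite3.connect(":memory:")
schema = generate(source, "volumes")
copied = load(source, target, "volumes", schema)
```

Separating the schema description from the row copy means the description can be inspected or adjusted before any data is moved.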
Database replication monitoring (1)
  A dedicated machine, “atlmysql04”, is being set up for database replication monitoring and testing
  Currently installed on this server: mysql-standard-4.0.26, MonALISA v1.4.14, MonAMI v0.4
  A “farm_name” of “atlasdbs” is given to this server on MonALISA
Database replication monitoring (2)
  Using MonALISA and MonAMI to monitor the DB replication activities (e.g. from Tier0 → Tier1s DB servers):
    System information of DB servers (load, free memory etc.)
    Network information (traffic, flows, connectivity, topology etc.)
  The MonAMI monitoring daemon (by Paul Millar et al.) uses a plugin architecture to connect the “monitoring targets” (a MySQL database, an Apache webserver etc.) to the “reporting targets” (MonALISA, Ganglia etc.)
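The plugin architecture described for the monitoring daemon can be pictured with the following sketch. It is not MonAMI code or configuration: the `Daemon` class, its method names, the "mysql" plugin name and the metric value are hypothetical, chosen only to show how monitoring plugins and reporting plugins stay independent of each other.

```python
class Daemon:
    """Toy dispatcher: polls monitoring plugins and forwards to reporting plugins."""

    def __init__(self):
        self.monitors = {}    # name -> callable returning a dict of metrics
        self.reporters = []   # callables accepting (source name, metrics)

    def add_monitor(self, name, read):
        self.monitors[name] = read

    def add_reporter(self, send):
        self.reporters.append(send)

    def poll_once(self):
        # Read every monitoring target once and hand the result
        # to every reporting target.
        for name, read in self.monitors.items():
            metrics = read()
            for send in self.reporters:
                send(name, metrics)


received = []
daemon = Daemon()
# Hypothetical monitoring plugin returning a made-up metric.
daemon.add_monitor("mysql", lambda: {"threads_connected": 3})
# Hypothetical reporting plugin that just records what it is given.
daemon.add_reporter(lambda name, metrics: received.append((name, metrics)))
daemon.poll_once()
```

With this separation, supporting a new database type means adding one monitoring plugin, with no change to the reporting side.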
The MonALISA monitoring system
Next tasks
  Support from MonAMI for plugins to monitor Oracle databases
  Deploy and test the monitoring as soon as possible