Database Monitoring Requirements Salvatore Di Guida (CERN) On behalf of the CMS DB group.


1 Database Monitoring Requirements Salvatore Di Guida (CERN) On behalf of the CMS DB group

2 Outline
CMS Database infrastructure and data flow.
Data access patterns.
Requirements coming from the hardware and software infrastructure:
– DB safety and security;
– DB monitoring for Conditions.
Requirements to be fulfilled by front-end applications (web):
– 3-tier architecture;
– Authorization.
Monitoring Workshop, Salvatore Di Guida

3 CMS Database Infrastructure
CMS has two production Oracle Real Application Clusters (RAC):
– CMSONR, a 6-node Oracle RAC located in the CMS experimental area:
  – Only visible from the CMS online network,
  – Hosting two databases:
    – OMDS stores data for sub-detectors, trigger, conditions (slow control, configuration, detector status), luminosity, monitoring,
    – ORCON stores conditions (detector status data and calibration data);
– CMSR, a 4-node Oracle RAC located at CERN IT:
  – Only visible within the GPN,
  – Hosting one database (ORCOFF), storing conditions, luminosity, workflow management data (file transfer, data bookkeeping, job processing, authentication and authorization).

4 CMS Database Data Flow
[Diagram labels: CMS (Compact Muon Solenoid); online network at IP5 with CMSONR, hosting OMDS (Online Master Database System) and ORCON (Offline Reconstruction Condition Database Online System); GPN with CMSR, hosting ORCOFF (Offline Reconstruction Condition Database Offline System); PopCon; Streaming; Condition Drop-box.]
OMDS stores all online conditions coming from the different sub-detectors.
A subset (summary) of condition data is read from OMDS, reformatted so that it can be retrieved as a C++ object (payload), and stored in ORCON:
– Using the Object Relational Access (ORA) design pattern;
– Performed by applications based on a common API integrated in CMSSW (PopCon).
Oracle streams populate ORCOFF with data from OMDS and ORCON.
The Condition Drop-box automatically exports data processed offline into ORCON.

5 Database monitoring tools
Many tools are already available thanks to the IT DB services:
– For developers, they make it possible to check the status of all services and the usage of DB resources:
  – Main page: https://phydb.web.cern.ch/phydb/cms
    – SLS monitoring for all deployed services,
    – Lemon for all hardware involved,
    – Session monitoring for each DB service and for each schema;
– For experts, they allow in-depth monitoring of each component of the system:
  – Streams availability,
  – DB resource usage (plenty of history plots);
– Automatic alarm notifications:
  – service failures (invalid objects, streams failure),
  – high loads on nodes (high CPU load, high network traffic…).

6 Monitoring requirements for hardware and software
Database safety and security.
Hardware and service monitoring across different networks:
– Complying with the security policy of the different clusters;
– With different levels of monitoring and a corresponding alarm system.

7 Data access patterns
In general, access patterns depend on how an application uses the data stored in its backend:
– Transactional data should be accessed in update mode by the application itself, and should not be visible to other users;
– Bookkeeping and authentication information is static and read-only, but can be huge;
– Conditions are of two kinds:
  – static and read-only (construction, equipment),
  – varying with time and requiring frequent lookups (conditions, calibrations).

8 Access patterns for conditions
Condition data produced by the CMS detector are essential for running HLT, DQM and the offline reconstruction chain:
– Managed by several groups within the collaboration;
– Wide range of update frequencies and data volumes.
The stability and availability of the infrastructure must be ensured, and its performance must not be degraded in either write or read access. This requires limiting the access patterns for these data:
– establishing a strict policy: NO DELETE, NO UPDATE, INSERT ONLY (data are appended to time-based sequences of validity ranges, the intervals of validity or IOVs),
– promoting the use of a reduced number of applications for both data insertion and data retrieval:
  – PopCon and Condition Drop-box,
  – Framework modules reading conditions (grouped consistently via Global Tag);
– reducing the read load on the servers with a caching mechanism (FroNTier).
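The INSERT ONLY policy on IOV sequences can be illustrated with a minimal Python sketch. The class and method names below are hypothetical and are not taken from CMSSW or PopCon; the sketch only assumes that each entry opens a validity range at a `since` time and stays valid until the next entry begins.

```python
import bisect


class IOVSequence:
    """Hypothetical append-only sequence of (since, payload_id) pairs."""

    def __init__(self):
        self._since = []     # start of each validity range, strictly increasing
        self._payloads = []  # payload identifier valid from the matching 'since'

    def append(self, since, payload_id):
        # INSERT ONLY: a new range may only open after the last one,
        # so existing entries are never updated or deleted.
        if self._since and since <= self._since[-1]:
            raise ValueError("IOVs can only be appended after the last 'since'")
        self._since.append(since)
        self._payloads.append(payload_id)

    def lookup(self, time):
        # Find the last range whose 'since' is <= time.
        idx = bisect.bisect_right(self._since, time) - 1
        if idx < 0:
            raise KeyError(f"no IOV covers time {time}")
        return self._payloads[idx]
```

A lookup for any time between two `since` values returns the payload of the earlier entry, which is why appending alone is enough to extend the sequence without touching stored data.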

9 Retrieving conditions
The HLT (running at P5), DQM (running at P5 and Tier0/CAF), offline reconstruction jobs (running at Tier0/Tier1s, ~20000 per day) and a subset of analysis jobs (running at Tier2s, ~50000 per day) can create a massive load when retrieving data (conditions, luminosity) from ORCON/ORCOFF.
FroNTier caches minimize direct read-only access to Oracle:
– 2 services implemented: at P5 on ORCON for HLT and DQM, and at CERN on ORCOFF for Tier0/1/2:
  – Dedicated instances for Tier0 express/prompt reconstruction, luminosity workflows, Monte Carlo simulation,
– The cache refreshing policy can imply some latency in retrieving data.
The system is reliable w.r.t. the current workflows, but a change to the current infrastructure may lead to severe loads on one or more nodes/services. Scalability is an issue.
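The trade-off between reduced database load and refresh latency can be sketched with a simple time-to-live cache in Python. This is not how FroNTier or its squids are implemented; `TTLCache`, `fetch` and `ttl_seconds` are illustrative names, and the expiry rule is an assumption chosen to make the latency visible.

```python
import time


class TTLCache:
    """Illustrative read-through cache with a fixed time-to-live."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch        # callable that queries the backend (e.g. the database)
        self._ttl = ttl_seconds
        self._clock = clock        # injectable clock, useful for testing
        self._entries = {}         # key -> (expiry_time, value)
        self.backend_hits = 0      # number of requests that reached the backend

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            # Served from cache: no backend load, but the value may be
            # up to ttl_seconds old (the refresh latency).
            return entry[1]
        value = self._fetch(key)
        self.backend_hits += 1
        self._entries[key] = (now + self._ttl, value)
        return value
```

With many jobs requesting the same payload, only the first request in each time-to-live window reaches the backend; the price is that newly inserted data become visible only after the cached entry expires.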

10 FroNTier Architecture

11 FroNTier monitoring
Each of the FroNTier services is monitored:
– Availability of CERN launchpads and all squids;
– HTTP requests for CERN launchpads and all squids;
– Network traffic of CERN launchpads and all squids;
– Objects stored in cache for CERN launchpads and all squids (object = payload of FroNTier request).

12 Database safety and security
Definition of a clearer account policy, and improved granting of user privileges:
– Based on application and user roles,
– See the Oracle® Database Security Guide.
This policy is beneficial for monitoring access to all DB schemas:
– Each account can easily be associated with an application or a group of developers:
  – Transactions can be easily tracked;
– Reduced access with schema owner privileges:
  – Accesses trying to perform unauthorized actions (e.g. creating or inserting values in a read-only table) are identified quickly.
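A role-based account policy of this kind can be sketched in Python as a lookup from account to role to allowed actions. All role names, account names and privilege sets below are hypothetical; in practice the policy would be enforced by Oracle roles and grants rather than by application code.

```python
# Hypothetical roles and the statement types each one may issue.
ROLE_PRIVILEGES = {
    "reader": frozenset({"SELECT"}),
    "writer": frozenset({"SELECT", "INSERT"}),
    "owner": frozenset({"SELECT", "INSERT", "CREATE", "ALTER", "DROP"}),
}

# Hypothetical accounts, each tied to exactly one application role.
ACCOUNT_ROLES = {
    "cond_reader_app": "reader",
    "cond_writer_app": "writer",
}


def is_allowed(account, action):
    """Return True if the account's role permits the given action."""
    role = ACCOUNT_ROLES.get(account)
    return role is not None and action.upper() in ROLE_PRIVILEGES[role]
```

Because every account maps to one role, a rejected action such as `is_allowed("cond_reader_app", "INSERT")` points directly at the application that attempted it.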

13 Monitoring access to Conditions
PopCon monitors all payload transfers to the production database:
– Using a DB account where the status of all transfers is logged in relational tables;
– Exposing the logs to developers, managers and users via a web-based application;
– See Antonio’s presentation this afternoon.
From the DB point of view, all transactions against production schemas performing DML statements are logged.

14 Monitoring access to Conditions
The creation/modification of ORA schemas (i.e. schemas where a mapping between tables and C++ data members is defined) is not yet monitored:
– From the DB point of view, this means also logging DDL statements, together with the DML statements storing the mapping in the dedicated tables.
This new monitoring instance will help to quickly identify:
– Access to production schemas with wrong privileges;
– Users/applications trying to perform illegal actions;
– Corrupted schemas, providing help to experts for troubleshooting.
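The slides do not say how the DDL and DML logging is implemented. As a rough illustration only, the distinction can be sketched in Python by classifying a statement from its leading keyword and recording who issued it; the function names and the record layout are assumptions, and a real deployment would more likely rely on the database's own auditing facilities.

```python
DDL_KEYWORDS = {"CREATE", "ALTER", "DROP", "TRUNCATE", "RENAME"}
DML_KEYWORDS = {"INSERT", "UPDATE", "DELETE", "MERGE"}


def classify(statement):
    """Classify a SQL statement as 'DDL', 'DML' or 'OTHER' by its first keyword."""
    words = statement.strip().split(None, 1)
    first_word = words[0].upper() if words else ""
    if first_word in DDL_KEYWORDS:
        return "DDL"
    if first_word in DML_KEYWORDS:
        return "DML"
    return "OTHER"


def audit(account, statement, log):
    """Append a record to 'log' for every DDL or DML statement; return the record."""
    record = {
        "account": account,
        "kind": classify(statement),
        "statement": statement.strip(),
    }
    if record["kind"] != "OTHER":
        log.append(record)
    return record
```

Keeping the account name in each record is what allows a schema change to be traced back to the user or application that issued it.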

15 Plans for conditions in CMSSW
The new account policy and the schema modification monitoring will be included in the Condition Core software package.
All actions will be performed with the help of IT DBAs:
– Validation of code and procedures;
– Testing.

16 Hardware & service configuration
The hardware involved in DB operations is split across two networks:
– CERN GPN:
  – Only applications approved by the CERN Security Team can be visible from the outside network!
– The CMS online network at IP5 has a very strict security policy and a very constrained data transfer design:
  – Files cannot be copied from the GPN to the CMS network; they must be pulled into the online cluster from the offline network,
  – Files must be pushed by the online network to the offline network,
  – Transferring data from the GPN to the CMS network is not envisaged in the online network design.
Some services are deployed in one network, but others (e.g. the condition drop-box) use resources in both networks:
– The communication between networks must be monitored too!

17 Front-end applications
The different monitoring instances for Database tools have a front-end application:
– Retrieving monitoring data;
– Aggregating them according to metrics based on different use-case models;
– Publishing them.
The DB group focuses on web-based front-end applications:
– The monitoring data are read directly from Oracle:
  – Small data volume,
  – Reduce latency as much as possible,
  – Checking Oracle availability (if Oracle fails, the application fails and an alarm is raised).
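The retrieve, aggregate, publish cycle, including the alarm raised when Oracle is unavailable, can be sketched in Python as follows. The callables `retrieve`, `publish` and `raise_alarm`, and the row fields `account` and `status`, are placeholders chosen for illustration; the actual metrics are not described in the slides.

```python
from collections import defaultdict


def aggregate(rows):
    """Count monitoring rows per (account, status) pair."""
    counts = defaultdict(int)
    for row in rows:
        counts[(row["account"], row["status"])] += 1
    return dict(counts)


def run_once(retrieve, publish, raise_alarm):
    """Run one monitoring cycle; return True if data were published."""
    try:
        rows = retrieve()  # reads monitoring data directly from the database
    except Exception as exc:
        # If the database cannot be reached, the read itself fails,
        # which doubles as an availability check.
        raise_alarm(f"monitoring data unavailable: {exc}")
        return False
    publish(aggregate(rows))
    return True
```

Passing the three steps in as callables keeps the cycle independent of the database driver and of the publishing mechanism, so either can be replaced without changing the aggregation logic.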

18 Multi-tier architecture
The monitoring system is based on a three-tier architecture:
– The presentation tier (frontend server) supports user interaction and data presentation;
– The logic tier (backend server) handles information exchange between the database and the user interface;
– The data tier supports access to the data stored in the database.
This architecture has many advantages from the CMS DB monitoring point of view:
– it encapsulates the processing related to user interaction in an application separated from the client application;
– it makes it possible to program connection strategies efficiently (such as client request queuing, database access control);
– all the code related to database connection can be totally separated from the client application: no queries are issued by the client (users!);
– it enforces the security of the backend and the DB using the firewall protection of the GPN.

19 [Diagram: three-tier architecture. Presentation tier: Web Interface, Frontend server. Logic tier: Backend server. Data tier: ORCOFF Database. Labels on the connections: XMLHTTP, POST, WSGI, cx_Oracle.]
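The separation between logic tier and data tier can be sketched as a small WSGI application in Python. This is an assumption-laden illustration, not the CMS implementation: `make_app` and `fetch_rows` are hypothetical names, the JSON response format is a choice made here, and the data-tier callable stands in for a query that would be issued through a driver such as cx_Oracle.

```python
import json


def make_app(fetch_rows):
    """Build a WSGI logic-tier application around a data-tier callable.

    fetch_rows is a placeholder for the only code that talks to the
    database; the client never sends SQL, it only posts a request.
    """
    def app(environ, start_response):
        if environ.get("REQUEST_METHOD") != "POST":
            start_response("405 Method Not Allowed",
                           [("Content-Type", "text/plain")])
            return [b"POST only"]
        try:
            rows = fetch_rows()
        except Exception:
            start_response("503 Service Unavailable",
                           [("Content-Type", "text/plain")])
            return [b"database unavailable"]
        body = json.dumps(rows).encode("utf-8")
        start_response("200 OK", [
            ("Content-Type", "application/json"),
            ("Content-Length", str(len(body))),
        ])
        return [body]
    return app
```

For a local trial, the application returned by `make_app` can be served with `wsgiref.simple_server.make_server` from the standard library, with the presentation tier posting requests to it from the browser.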

20 Authorization
Database activity must be controlled.
Not all monitoring data should be visible worldwide:
– DB service names;
– Account names.
Authentication mechanism for all web-based applications, deployed in the frontend servers:
– For Drop-box: access to the machine where the service is deployed;
– For Web browsing: SSO, e-groups.

21 Technology
There are many technologies available on the market and within the Open Source community.
DB web-based monitoring must be visible not only on desktops and laptops, but also on modern mobile devices:
– See Antonio’s slides, where this item is discussed in detail.

