Common Analysis Framework, June 2013

Presentation transcript:

Common Analysis Framework, June 2013 Database solutions and data management for PanDA system Gancho Dimitrov (CERN) 27-June-2013 G. Dimitrov

Outline
- ATLAS databases: main roles and facts
- DB solutions and techniques used:
  - Partitioning types: range, automatic interval
  - Data sliding windows
  - Data aggregation
  - Scheduled index maintenance
  - Result set caching
  - Stats gathering settings
- Potential use of Active Data Guard (ADG)
- JEDI component and relevant DB objects
- Conclusions

ATLAS databases topology
(Figure: snapshot taken from the PhyDB Oracle Streams replication monitoring)
- The ADC applications PanDA, DQ2, LFC, PS, AKTR and AGIS are hosted on the ADCR database.
- ADCR database cluster hardware specifications:
  - 4 machines
  - 2 quad-core Intel Xeon CPUs @ 2.53 GHz
  - 48 GB RAM
  - 10 GigE for storage and cluster access
  - NetApp NAS storage with 512 GB SSD cache

PanDA system
- The PanDA system is the ATLAS workload management system for production and user analysis jobs.
- Originally based on a MySQL database. Migrated in 2008 to Oracle at CERN.
- Challenges:
  - The PanDA system has to manage millions of grid jobs daily.
  - Changes in job statuses and site loads have to be reflected in the database fast enough. Fast data retrieval for the PanDA server and monitor is a key requirement.
  - Cope with spikes in the users' workload.
  - The DB system has to deal efficiently with two different workloads: transactional from the PanDA server and (to some extent) data warehouse load from the PanDA monitor.

Trend in number of daily PanDA jobs
(Figure: three plots covering Jan-Dec 2011, Jan-Dec 2012 and Jan-June 2013)

The PanDA 'operational' and 'archive' data
- All information relevant to a single job is stored in 4 major tables. The most important stats are kept separately from the other space-consuming attributes such as job parameters, input and output files.
- The 'operational' data is kept in a separate schema which hosts the active jobs plus the finished ones of the most recent 3 days.
- Jobs that get status 'finished', 'failed' or 'cancelled' are moved to an archive PanDA schema (ATLAS_PANDAARCH).
ATLAS_PANDA => ATLAS_PANDAARCH

PanDA data segments organization
- ATLAS_PANDA JOB, PARAMS, META and FILES tables: partitioned on a 'modificationtime' column. Each partition covers a time range of one day.
- ATLAS_PANDAARCH archive tables: partitioned on the 'modificationtime' column. Some tables have partitions each covering a three-day window, others a time range of a month.
(Figure: time line of the partitions, with the following labels)
- Partitions that can be dropped: an Oracle scheduler job takes care of that daily.
- Filled partitions: written by the PanDA server DB sessions. An Oracle scheduler job inserts the data of the last complete day.
- Empty partitions relevant to the future: a job is scheduled to run every Monday to create seven new partitions.
Certain PanDA tables have a defined data sliding window of 3 days, others of 30 days. This natural approach proved to be adequate and not resource demanding.
Important: to be sure that job information is not deleted without having been copied to the PANDAARCH schema, a special verification PL/SQL procedure runs before the partition drop event.
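
The layout and the verify-before-drop rule described above can be sketched as follows. This is only an illustration under assumptions: the table names (demo_jobs, demo_jobs_arch), the partition names and the three columns are hypothetical, and the check shown (no operational row without an archived copy) is one possible verification, not necessarily the one the PanDA procedure implements.

```sql
-- Hypothetical operational table with one partition per day.
CREATE TABLE demo_jobs (
  pandaid          NUMBER(11)   NOT NULL,
  modificationtime DATE         NOT NULL,
  jobstatus        VARCHAR2(15)
)
PARTITION BY RANGE (modificationtime) (
  PARTITION jobs_20130627 VALUES LESS THAN (TO_DATE('28-06-2013', 'DD-MM-YYYY')),
  PARTITION jobs_20130628 VALUES LESS THAN (TO_DATE('29-06-2013', 'DD-MM-YYYY'))
);

-- Hypothetical archive table with the same columns (empty copy).
CREATE TABLE demo_jobs_arch AS SELECT * FROM demo_jobs WHERE 1 = 0;

-- Pre-create an empty partition for a future day.
ALTER TABLE demo_jobs
  ADD PARTITION jobs_20130629 VALUES LESS THAN (TO_DATE('30-06-2013', 'DD-MM-YYYY'));

-- Drop the oldest partition only if every row has a copy in the archive.
DECLARE
  n_missing NUMBER;
BEGIN
  SELECT COUNT(*)
    INTO n_missing
    FROM demo_jobs PARTITION (jobs_20130627) j
   WHERE NOT EXISTS (SELECT 1
                       FROM demo_jobs_arch a
                      WHERE a.pandaid = j.pandaid);

  IF n_missing = 0 THEN
    EXECUTE IMMEDIATE 'ALTER TABLE demo_jobs DROP PARTITION jobs_20130627';
  ELSE
    RAISE_APPLICATION_ERROR(-20001,
      n_missing || ' rows not yet archived; partition kept');
  END IF;
END;
/
```

Raising an error instead of silently skipping the drop makes the failure visible in the scheduler job log.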

PanDA data segmentation benefits
- High scalability: the PanDA jobs data copy and deletion is done at table partition level instead of at row level.
- Removing the already copied data is not I/O demanding (very little redo and no undo), as this is a simple Oracle operation over a table segment and its relevant index segments (alter table … drop partition).
- Fragmentation in the table segments is avoided. Much better space utilization and caching in the buffer pool.
- No need for index rebuild or coalesce operations for these partitioned tables.
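
To make the contrast concrete, the two removal styles might look like this. It is a sketch on a hypothetical day-partitioned table demo_jobs_p; the names are assumptions, not the PanDA objects.

```sql
-- Hypothetical table partitioned by day.
CREATE TABLE demo_jobs_p (
  pandaid          NUMBER(11) NOT NULL,
  modificationtime DATE       NOT NULL
)
PARTITION BY RANGE (modificationtime) (
  PARTITION jobs_20130627 VALUES LESS THAN (TO_DATE('28-06-2013', 'DD-MM-YYYY')),
  PARTITION jobs_20130628 VALUES LESS THAN (TO_DATE('29-06-2013', 'DD-MM-YYYY'))
);

-- Row-level removal: each deleted row generates undo and redo,
-- and leaves free space scattered inside the segment.
DELETE FROM demo_jobs_p
 WHERE modificationtime < TO_DATE('28-06-2013', 'DD-MM-YYYY');
COMMIT;

-- Partition-level removal: one DDL statement that releases the whole
-- table segment together with its local index segments.
ALTER TABLE demo_jobs_p DROP PARTITION jobs_20130627;
```

If the table had global (non-partitioned) indexes, the drop would leave them unusable unless the statement also carried an UPDATE GLOBAL INDEXES clause.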

PANDA => PANDAARCH data flow machinery
The PANDA => PANDAARCH data flow is sustained by a set of scheduler jobs on the Oracle server which execute logic encoded in PL/SQL procedures:
- A weekly job which creates daily partitions for the coming days.
- A daily job which copies the data of a set of partitions from PANDA to PANDAARCH.
- A daily job which verifies that all rows of a certain partition have been copied to PANDAARCH and drops the PANDA partition if that is the case.
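
A set of jobs like the three listed above could be registered with DBMS_SCHEDULER roughly as below. The job names, the package DEMO_PKG with its procedures, and the chosen hours are all hypothetical; only the weekly (Monday) and daily frequencies come from the slides.

```sql
BEGIN
  -- Weekly, on Mondays: pre-create the daily partitions.
  DBMS_SCHEDULER.CREATE_JOB(
    job_name        => 'DEMO_CREATE_PARTITIONS',
    job_type        => 'STORED_PROCEDURE',
    job_action      => 'DEMO_PKG.CREATE_DAILY_PARTITIONS',
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=WEEKLY;BYDAY=MON;BYHOUR=6;BYMINUTE=0;BYSECOND=0',
    enabled         => FALSE);

  -- Daily: copy the data of complete days to the archive schema.
  DBMS_SCHEDULER.CREATE_JOB(
    job_name        => 'DEMO_COPY_TO_ARCHIVE',
    job_type        => 'STORED_PROCEDURE',
    job_action      => 'DEMO_PKG.COPY_TO_ARCHIVE',
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=DAILY;BYHOUR=1;BYMINUTE=0;BYSECOND=0',
    enabled         => FALSE);

  -- Daily: verify the copy and drop the operational partition.
  DBMS_SCHEDULER.CREATE_JOB(
    job_name        => 'DEMO_VERIFY_AND_DROP',
    job_type        => 'STORED_PROCEDURE',
    job_action      => 'DEMO_PKG.VERIFY_AND_DROP',
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=DAILY;BYHOUR=4;BYMINUTE=0;BYSECOND=0',
    enabled         => FALSE);
END;
/

-- Enable a job once the procedure it calls exists.
EXEC DBMS_SCHEDULER.ENABLE('DEMO_CREATE_PARTITIONS');
```

Creating the jobs disabled and enabling them afterwards avoids a first run against procedures that are not yet deployed.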

Monitoring of the PanDA => PANDAARCH data flow
- The PanDA scheduler jobs have to complete successfully. The most important ones are those that pre-create partitions, copy data to the archive schema and remove partitions whose data has already been copied.
- Note: whenever new columns are added to the PanDA JOBS_xyz tables, the relevant PL/SQL procedure for the data copying has to be updated to take them into account.
- In case of errors, a DBA or another person knowledgeable about the database has to investigate and solve the issue(s).
- Oracle keeps a record of each scheduler job execution (for 60 days).
- If the 3-day data sliding window in the PanDA 'operational' tables is temporarily not sustained for whatever reason, the PanDA server is not affected. The effect on the PanDA monitor is not yet clear. However, there were no complaints about the two such occurrences in the last 30 months.
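
The per-execution records that Oracle keeps can be queried from the scheduler dictionary views. A possible check for recent unsuccessful runs, assuming it is executed as the schema that owns the jobs, is sketched here; the 7-day window is an arbitrary choice.

```sql
-- Runs of the current schema's scheduler jobs that did not succeed
-- during the last 7 days, most recent first.
SELECT job_name,
       status,
       error#,
       actual_start_date,
       run_duration
  FROM user_scheduler_job_run_details
 WHERE status <> 'SUCCEEDED'
   AND log_date > SYSTIMESTAMP - INTERVAL '7' DAY
 ORDER BY log_date DESC;
```

An empty result means every logged run in the window succeeded. It does not show jobs that never started, so a complementary check on the last run date of each job is still useful.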

The PanDA PL/SQL code
- Currently certain pieces of the PanDA PL/SQL code are bound to the ATLAS PanDA account names (e.g. DB object names are fully qualified with a hardcoded object owner).
- Efforts have been made to parameterize the PL/SQL procedures so that they are flexible about the schemas they act on.
- The full set of modified PL/SQL procedures is under validation on the INTR testbed.
- After a successful validation of the modified PL/SQL procedures and Oracle scheduler jobs, these pieces of the PanDA database layer will be ready for the CAF (Common Analysis Framework).
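
One common way to remove a hardcoded object owner is to pass the schema names as parameters and build the statement dynamically. The sketch below is an assumption about how this could be done, not the actual PanDA code; the procedure, table and column names are hypothetical.

```sql
CREATE OR REPLACE PROCEDURE demo_copy_jobs (
  p_src_schema IN VARCHAR2,   -- e.g. 'ATLAS_PANDA'
  p_dst_schema IN VARCHAR2,   -- e.g. 'ATLAS_PANDAARCH'
  p_day        IN DATE        -- day to copy (time part assumed 00:00:00)
) AS
  l_sql VARCHAR2(4000);
BEGIN
  -- DBMS_ASSERT.SIMPLE_SQL_NAME raises an error if the value is not a
  -- plain identifier, which guards the concatenation against injection.
  l_sql := 'INSERT INTO ' || DBMS_ASSERT.SIMPLE_SQL_NAME(p_dst_schema)
        || '.demo_jobs_arch '
        || 'SELECT * FROM ' || DBMS_ASSERT.SIMPLE_SQL_NAME(p_src_schema)
        || '.demo_jobs '
        || 'WHERE modificationtime >= :d1 AND modificationtime < :d2';

  -- Binds are positional in dynamic SQL statements.
  EXECUTE IMMEDIATE l_sql USING p_day, p_day + 1;
END;
/
```

The same procedure can then serve the ATLAS accounts and a CAF installation that uses different schema names.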

Improvements on the current PanDA system
- Despite the increased activity on the grid, several tuning techniques kept the server resource usage in the low range, e.g. CPU usage in the range 20-30%.
- Thorough studies of the WHERE clauses of the PanDA server queries resulted in:
  - revision of the indexes: removal or replacement with more appropriate multi-column ones
  - weekly index maintenance (rebuilds triggered by a scheduler job) on the tables with high transaction activity
  - tuned queries
  - an auxiliary table for the mapping PanDA ID <=> modification time
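
The three index-related measures can be illustrated on hypothetical objects. All names below are assumptions, and the stated purpose of the mapping table (supplying the partitioning key for a lookup by PanDA ID) is a guess at its use rather than something the slides spell out.

```sql
-- Hypothetical tables.
CREATE TABLE demo_active_jobs (
  pandaid          NUMBER(11)    NOT NULL,
  modificationtime DATE          NOT NULL,
  computingsite    VARCHAR2(128),
  jobstatus        VARCHAR2(15)
);

CREATE TABLE demo_pandaid_modiftime (
  pandaid          NUMBER(11) PRIMARY KEY,
  modificationtime DATE       NOT NULL
);

-- One multi-column index matching a WHERE clause that filters on both
-- columns, in place of two single-column indexes.
CREATE INDEX demo_site_status_idx
  ON demo_active_jobs (computingsite, jobstatus);

-- Periodic maintenance on an index of a table with heavy DML.
ALTER INDEX demo_site_status_idx REBUILD ONLINE;

-- Lookup by PanDA ID that also supplies the modification time, so a
-- table partitioned on modificationtime could prune to one partition.
SELECT j.*
  FROM demo_active_jobs j
 WHERE j.pandaid = :id
   AND j.modificationtime = (SELECT m.modificationtime
                               FROM demo_pandaid_modiftime m
                              WHERE m.pandaid = :id);
```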

DB techniques used in PanDA (1)
Automatic interval partitioning
A partition is created automatically when a user transaction imposes a need for it (e.g. a user inserts a row with a timestamp for which a partition does not yet exist).
    CREATE TABLE table_name ( … list of columns … )
    PARTITION BY RANGE (my_tstamp)
    INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))
    ( PARTITION data_before_01122011
      VALUES LESS THAN (TO_DATE('01-12-2011', 'DD-MM-YYYY')) )
In PanDA and other ATLAS applications, interval partitioning is very handy for transient data for which an agreed DATA SLIDING WINDOW policy is imposed (however, partition removal is done via home-made PL/SQL code).
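
The automatic creation can be observed on a small test table. The sketch assumes a hypothetical table demo_events with a monthly interval; the final statement shows one way a sliding window job could address a partition by a value it contains instead of by its system-generated name.

```sql
CREATE TABLE demo_events (
  id        NUMBER,
  my_tstamp DATE
)
PARTITION BY RANGE (my_tstamp)
INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))
( PARTITION data_before_01122011
  VALUES LESS THAN (TO_DATE('01-12-2011', 'DD-MM-YYYY')) );

-- No partition covers February 2012 yet: the insert creates it.
INSERT INTO demo_events VALUES (1, TO_DATE('15-02-2012', 'DD-MM-YYYY'));
COMMIT;

-- The new partition has a system-generated name (SYS_P...) and is
-- flagged as an interval partition.
SELECT partition_name, high_value, interval
  FROM user_tab_partitions
 WHERE table_name = 'DEMO_EVENTS'
 ORDER BY partition_position;

-- Remove the partition that holds a given date.
ALTER TABLE demo_events
  DROP PARTITION FOR (TO_DATE('15-02-2012', 'DD-MM-YYYY'));
```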

DB techniques used in PanDA (2)
Result set caching
- This technique is used on a well-selected set of PanDA server queries. It is useful in cases where the data does not change often but is queried frequently.
- The best metric to consider when tuning queries is the number of Oracle block reads per execution. Queries whose result was served from the result cache show 'buffer block reads = 0'.
- Oracle sends a cached result back to the client if the result has not been changed in the meantime by any transaction, thus improving performance and scalability.
- The statistics show that 95% of the executions of the PanDA server queries with this cache option on (17 distinct queries) were resolved from the result set cache.
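
In Oracle 11g the per-query way to request this behaviour is the RESULT_CACHE hint. Whether PanDA uses the hint or another setting is not stated on the slide, so the following is a sketch on a hypothetical, rarely changing table.

```sql
-- Hypothetical table whose content changes rarely.
CREATE TABLE demo_site_config (
  computingsite VARCHAR2(128) PRIMARY KEY,
  cloud         VARCHAR2(16),
  status        VARCHAR2(16)
);

-- The first execution computes and caches the result. Later executions
-- are served from the cache until a transaction changes the table.
SELECT /*+ RESULT_CACHE */ cloud, COUNT(*) AS n_sites
  FROM demo_site_config
 WHERE status = 'online'
 GROUP BY cloud;

-- Instance-wide cache counters (needs access to the V$ views).
SELECT name, value
  FROM v$result_cache_statistics;
```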

DB techniques used in PanDA (3)
Data aggregation for fast result delivery
- The PanDA server has to be able to get instantaneously details on the current activity at any ATLAS computing site (about 300 sites), e.g. the number of jobs at any site with a particular priority, processing type, status, working group, etc. This is vital for deciding to which site newly arriving jobs are best routed.
- The query addressing this requirement proved to be CPU expensive because of its high execution frequency.
- Solution: a table with aggregated stats is re-populated by a PL/SQL procedure every 2 min via an Oracle scheduler job (usual elapsed time 1-2 sec).
- The initial approach relied on a materialized view (MV), but it proved NOT to be reliable because the MV refresh interval relies on the old DBMS_JOB package.
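
The solution can be sketched as a summary table, a refresh procedure and a scheduler job with a 2-minute interval. Table, column, procedure and job names are hypothetical, and the real aggregation covers more attributes than the three used here.

```sql
-- Hypothetical source and summary tables.
CREATE TABLE demo_jobs_active (
  pandaid         NUMBER(11) NOT NULL,
  computingsite   VARCHAR2(128),
  jobstatus       VARCHAR2(15),
  currentpriority NUMBER(9)
);

CREATE TABLE demo_jobs_stats (
  computingsite   VARCHAR2(128),
  jobstatus       VARCHAR2(15),
  currentpriority NUMBER(9),
  n_jobs          NUMBER
);

CREATE OR REPLACE PROCEDURE demo_refresh_jobs_stats AS
BEGIN
  -- Delete and insert in one transaction: readers keep seeing the
  -- previous snapshot until the COMMIT.
  DELETE FROM demo_jobs_stats;

  INSERT INTO demo_jobs_stats (computingsite, jobstatus, currentpriority, n_jobs)
  SELECT computingsite, jobstatus, currentpriority, COUNT(*)
    FROM demo_jobs_active
   GROUP BY computingsite, jobstatus, currentpriority;

  COMMIT;
END;
/

BEGIN
  DBMS_SCHEDULER.CREATE_JOB(
    job_name        => 'DEMO_STATS_REFRESH',
    job_type        => 'STORED_PROCEDURE',
    job_action      => 'DEMO_REFRESH_JOBS_STATS',
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=MINUTELY;INTERVAL=2',
    enabled         => TRUE);
END;
/
```

The frequent server queries then read the small summary table instead of aggregating the job tables on every call.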

DB techniques used in PanDA (4)
Customized table settings for the Oracle stats gathering
- Having up-to-date statistics on the table data is essential for optimal data access paths of the queries.
- We take advantage of a statistics collection mode for partitioned tables called incremental statistics gathering: Oracle spends time and resources on collecting statistics only on partitions which are transactionally active, and computes the global table statistics incrementally from the previously gathered ones.
    exec DBMS_STATS.SET_TABLE_PREFS('ATLAS_PANDA', 'JOBSARCHIVED4', 'INCREMENTAL', 'TRUE');
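
After setting the preference, it can be read back and a gathering run started as sketched below. The owner and table names are the ones from the slide; running the statements needs the corresponding privileges on that table.

```sql
-- Read back the table preference set above.
SELECT DBMS_STATS.GET_PREFS('INCREMENTAL', 'ATLAS_PANDA', 'JOBSARCHIVED4')
         AS incremental
  FROM dual;

-- Incremental gathering applies when the sample size is left at its
-- automatic default and the granularity is AUTO.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname     => 'ATLAS_PANDA',
    tabname     => 'JOBSARCHIVED4',
    granularity => 'AUTO');
END;
/
```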

Potential use of ADG from PanDA monitor
- The complete PanDA archive now hosts information on 900 million jobs: all jobs since the job system started in 2006.
- The ADCR database has two standby databases:
  - Data Guard for disaster recovery and backup offloading
  - Active Data Guard (ADCR_ADG) as a read-only replica
- The PanDA monitor can benefit from the Active Data Guard (ADG) resources:
  => One option is to sustain two connection pools: one to the primary database ADCR and one to ADCR's ADG. The idea is that queries spanning time ranges larger than a certain threshold are resolved from the ADG, where we can afford several parallel slave processes per user query.
  => A second option is to connect to the ADG only and fully rely on it.
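
A long-range monitor query of the kind that would be routed to the ADG might request parallel execution with a hint. The table, the time range and the degree of 4 are illustrative assumptions; the routing itself happens in the application through its choice of connection pool and is not shown.

```sql
-- Hypothetical archive table.
CREATE TABLE demo_jobs_archive (
  pandaid          NUMBER(11) NOT NULL,
  modificationtime DATE       NOT NULL,
  computingsite    VARCHAR2(128),
  jobstatus        VARCHAR2(15)
);

-- Read-only query over a one-year range, asking for 4 parallel slaves.
SELECT /*+ PARALLEL(a, 4) */
       a.computingsite, a.jobstatus, COUNT(*) AS n_jobs
  FROM demo_jobs_archive a
 WHERE a.modificationtime >= TO_DATE('01-01-2012', 'DD-MM-YYYY')
   AND a.modificationtime <  TO_DATE('01-01-2013', 'DD-MM-YYYY')
 GROUP BY a.computingsite, a.jobstatus;
```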

New development: JEDI (a component of PanDA)
- JEDI is a new component of the PanDA server which dynamically defines jobs from a task definition. The main goal is to make PanDA task-oriented.
- The tables of the initial relational model of the new JEDI schema (documented by T. Maeno) complement the existing PanDA tables on the INTR database.
- Activities in the last months:
  - understanding the new data flow, requirements and access patterns
  - addressing the requirement of storing information at event level to keep track of the active jobs' progress
  - studies of the best possible data organization (partitioning) from a manageability and performance point of view
  - getting to the most appropriate physical implementation of the agreed relational model
  - tests with representative data volumes

JEDI database relational schema
(Figure: schema diagram. Yellow: JEDI tables. Orange: PanDA tables. Green: auxiliary tables for speeding up queries.)

Transition from the current PanDA schema to a new one
- The idea is that the transition from the current PanDA server to the new one, with its DB backend objects, is transparent to the users.
- JEDI tables are complementary to the existing PanDA tables. The current schema and the PANDA => PANDAARCH data copying will not be changed.
- However, relations between the existing and the new set of tables have to exist. In particular:
  - Relation between JEDI's Task and PanDA's Job, by having a foreign key in all JOBS* tables to the JEDI_TASKS table
  - Relation between JEDI's Work queue (for different shares of workload) and PanDA's Job, by having a foreign key in all JOBS* tables to the JEDI_WORK_QUEUE table
  - Relation between JEDI's Task, Dataset and Contents (new seq. ID) and the PanDA Job processing the file (or a fraction of it), by having a foreign key in the FILESTABLE4 table to the JEDI_DATASET_CONTENTS table (when a task tries a file multiple times, there are multiple rows in PanDA's FILESTABLE4 while there is only one parent row in JEDI's DATASET_CONTENTS table)
- Note: "NOT NULL" constraints will not be added to the new columns on the existing PanDA tables, so that these tables can be used standalone, without the JEDI tables.
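
The combination of a foreign key with a nullable column can be sketched on two hypothetical tables. The table, column and constraint names are assumptions and do not reproduce the JEDI schema.

```sql
-- Hypothetical parent (task) and child (job) tables.
CREATE TABLE demo_jedi_tasks (
  taskid   NUMBER(11) PRIMARY KEY,
  taskname VARCHAR2(128)
);

CREATE TABLE demo_jobs_defined (
  pandaid   NUMBER(11) PRIMARY KEY,
  jobstatus VARCHAR2(15)
);

-- New column without NOT NULL: existing rows and standalone use of
-- the job table remain valid.
ALTER TABLE demo_jobs_defined ADD (jeditaskid NUMBER(11));

ALTER TABLE demo_jobs_defined
  ADD CONSTRAINT demo_jobs_task_fk
  FOREIGN KEY (jeditaskid) REFERENCES demo_jedi_tasks (taskid);

-- A job without a task is accepted, since NULL is not checked
-- against the parent table.
INSERT INTO demo_jobs_defined (pandaid, jobstatus) VALUES (1, 'defined');

-- A job that references a task needs the parent row to exist first.
INSERT INTO demo_jedi_tasks (taskid, taskname) VALUES (100, 'demo task');
INSERT INTO demo_jobs_defined (pandaid, jobstatus, jeditaskid)
VALUES (2, 'defined', 100);
COMMIT;
```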

JEDI DB objects: physical implementation
- Data segmenting is based on RANGE partitioning on JEDI's TASKID column, with an interval of 100000 IDs (tasks), on six of the JEDI tables (uniform data partitioning).
- The JEDI data segments are placed on a dedicated Oracle tablespace (data file), separate from the existing PanDA tables.
- Thanks to the new CERN license agreement with Oracle, we now take advantage of the Oracle advanced compression features: compression of data within a data block while the application does row inserts or updates.
- PanDA tests on tables with and without OLTP compression showed that Oracle was right in its predictions of the compression ratio.
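
A table combining the three physical choices above could be declared as follows. The sketch assumes interval partitioning is used to obtain the 100000-ID ranges (the slide does not say whether the partitions are automatic or pre-defined), and the table, column and tablespace names are hypothetical. The tablespace must already exist.

```sql
CREATE TABLE demo_jedi_tasks_part (
  taskid   NUMBER(11) NOT NULL,
  taskname VARCHAR2(128),
  status   VARCHAR2(64)
)
TABLESPACE demo_jedi_data      -- dedicated tablespace, assumed to exist
COMPRESS FOR OLTP              -- Advanced Compression option (11g syntax)
PARTITION BY RANGE (taskid)
INTERVAL (100000)
( PARTITION tasks_initial VALUES LESS THAN (100000) );

-- A task ID beyond the first range creates the next partition
-- automatically, covering IDs 200000 to 299999.
INSERT INTO demo_jedi_tasks_part (taskid, taskname, status)
VALUES (250000, 'demo task', 'registered');
COMMIT;
```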

PanDA database volumes
- Disk space used by PanDA: 1.3 TB in 2011, 1.7 TB in 2012, 1 TB for the first half of 2013.
- According to the current analysis and production task submission rates of 10K to 15K per day, the estimate for the disk space needed by JEDI is in the range of 2 to 3 TB per year.
- However, with OLTP compression in place, the disk space usage will be reduced.
- Activating the same type of compression on the existing PanDA tables would be beneficial as well.
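
Volumes like these can be read from the segment dictionary view, and an existing partition can be compressed by moving it. The query needs access to DBA_SEGMENTS; the table and partition in the second statement are hypothetical.

```sql
-- Space currently allocated to the two PanDA schemas, in TB.
SELECT owner,
       ROUND(SUM(bytes) / POWER(1024, 4), 2) AS used_tb
  FROM dba_segments
 WHERE owner IN ('ATLAS_PANDA', 'ATLAS_PANDAARCH')
 GROUP BY owner;

-- Rewrite one partition of a hypothetical archive table with OLTP
-- compression and keep its index partitions usable.
ALTER TABLE demo_jobs_archive_part
  MOVE PARTITION jobs_201306 COMPRESS FOR OLTP
  UPDATE INDEXES;
```

Moving partition by partition keeps each operation short, which suits archive tables that are already segmented by time.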

Conclusions
- The slides presented the current PanDA data organization and the planned new one with respect to the JEDI component.
- Deployment of the new JEDI database objects on the ATLAS production database server is planned for 2nd July 2013.
Thank you!