Oracle Streams Performance

Presentation transcript:

Oracle Streams Performance Eva Dafonte Pérez

Outline
- Oracle Streams Replication
- Replication Performance
- Streams Optimizations: Downstream Capture, Split & Merge Procedure, TCP and Network, Rules, Flow Control
- Lessons Learned
- Transportable Tablespaces for Scalable Resynchronization
- Summary

Oracle Streams Replication
- Technology for sharing information between databases
- Database changes are captured from the redo log and propagated asynchronously as Logical Change Records (LCRs)
[Diagram: a capture process reads the redo logs at the source database, a propagation ships the LCRs, and an apply process executes them at the target database]

LCG Distributed Architecture The mission of the Worldwide LHC Computing Grid Project (WLCG) is to develop, build and maintain a computing infrastructure for storage and analysis of the LHC data. Data will be distributed around the globe according to a four-tiered model. A primary backup will be recorded at CERN, the "Tier-0" centre of WLCG. After initial processing, these data will be distributed to a series of Tier-1 centres, large computer centres with sufficient compute and storage capacities as well as round-the-clock support for Grid operations. The Tier-1 centres will perform recurrent reconstruction passes and make data available to Tier-2 centres, each consisting of one or several collaborating computing facilities, which can store sufficient data and provide adequate computing power for specific analysis tasks. Individual scientists will access these facilities through Tier-3 computing resources, which can consist of local clusters in a university department or even individual PCs.

Streams Setup for ATLAS
[Diagram: replication topology with RACs of 6 nodes dual CPUs, 4 nodes quad-core CPUs, 3 nodes quad-core CPUs, and 2 nodes dual CPUs]
- 20 RAC databases (up to 6 nodes each)
- 124 servers, 150 disk arrays (more than 1700 disks)
- Or: 450 CPU cores, 900 GB of RAM, 550 TB of raw disk space(!)

Streams Setup for LHCb (Conditions and LFC)
[Diagram: replication topology with RACs of 4 nodes quad-core CPUs, 3 nodes quad-core CPUs, and 2 nodes dual CPUs]

Replication Performance
- The atomic unit is the change record, the LCR; LCRs can vary widely in size, so throughput is not a fixed measure
- Capture performance:
  - Read LCR information from the redo: from the redo log buffer (memory, much faster) or from archive log files (disk)
  - Convert changes into LCRs: depends on the LCR size and the number of columns
  - Enqueue the LCRs: independent of the LCR size
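Whether capture is keeping up with redo generation can be watched from the capture view; a minimal monitoring sketch, assuming a hypothetical capture process named STRM01_CAPTURE:

    -- Approximate capture latency: the gap between the newest redo
    -- available to the capture process and the redo it is working on.
    SELECT capture_name,
           state,
           total_messages_captured,
           (available_message_create_time - capture_message_create_time)
             * 86400 AS latency_seconds
    FROM   v$streams_capture
    WHERE  capture_name = 'STRM01_CAPTURE';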

Replication Performance
- Propagation performance:
  - Browse LCRs
  - Transmit LCRs over the network
  - Remove LCRs from the queue: done in a separate process to avoid any impact
- Apply performance:
  - Execute LCRs
  - The main problem when applying an LCR is that manipulating the database is slower than generating the redo that represents the change; if apply were to execute LCRs serially, it could not keep up with the redo generation rate
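Because serial apply cannot keep up, the standard mitigation is to run several apply servers in parallel. A minimal sketch using the documented apply parameter (the apply process name STRM01_APPLY is a hypothetical placeholder):

    -- Raise the number of parallel apply servers for one apply process.
    BEGIN
      DBMS_APPLY_ADM.SET_PARAMETER(
        apply_name => 'STRM01_APPLY',
        parameter  => 'parallelism',
        value      => '4');
    END;
    /

Note that parallelism only helps across transactions: as the Lessons Learned below point out, all LCRs of a single transaction are still executed by one apply server.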

Downstream Capture
- Downstream capture de-couples the Tier-0 production databases from destination or network problems:
  - source database availability is the highest priority
  - more resources can be allocated to the Streams processes
- Optimize redo log retention on the downstream database to allow for a sufficient re-synchronisation window: we use 5 days of retention to avoid tape access
- Optimize capture parameters: retention time and checkpoint frequency (see the sketch below)
[Diagram: redo is shipped from the source database to the downstream database via a redo transport method; capture and propagation run on the downstream database and the LCRs are applied at the target database]
By default, the capture process writes checkpoints very frequently and stores their data for a long time. On a very write-intensive system this can cause the LogMiner tables to become extremely large, with two consequences: the tables can eat up all the space in their tablespace, and, because the capture process reads these tables very frequently, its performance can become severely degraded as they grow.
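Both knobs can be set through DBMS_CAPTURE_ADM; a minimal sketch with illustrative values (the capture name is a hypothetical placeholder, and _checkpoint_frequency is an undocumented capture parameter, so treat the exact setting as an assumption to be validated with Oracle Support):

    BEGIN
      -- Keep checkpoint metadata for 7 days only, so the LogMiner
      -- tables stay small.
      DBMS_CAPTURE_ADM.ALTER_CAPTURE(
        capture_name              => 'STRM01_CAPTURE',
        checkpoint_retention_time => 7);

      -- Checkpoint every 1000 MB of mined redo instead of the much
      -- more frequent default.
      DBMS_CAPTURE_ADM.SET_PARAMETER(
        capture_name => 'STRM01_CAPTURE',
        parameter    => '_checkpoint_frequency',
        value        => '1000');
    END;
    /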

Split & Merge Procedure
- If one site is down:
  - LCRs are not removed from the queue
  - the capture process might be paused by flow control => impact on replication performance
- Objective: isolate the replicas from each other
  - split the capture process: the original capture keeps running as Real-Time capture for the sites "in good shape", while a new, normal capture is created for the unstable site(s), with a new capture queue and propagation job; the original propagation job is dropped and spilled LCRs are dropped from the original capture queue
  - merge the capture processes back at resynchronization
- The separate capture can even run on a different node => more resources allocated!!!
- Suggested by Patricia McElroy (Principal Product Manager, Distributed Systems/Replication); a sketch of the split step follows
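A minimal sketch of the split step, with hypothetical queue, capture, and propagation names (the real procedure also has to pick the SCN at which the unstable site stopped applying; Oracle 11g later packaged this logic as DBMS_STREAMS_ADM.SPLIT_STREAMS / MERGE_STREAMS):

    BEGIN
      -- A dedicated queue for the unstable site.
      DBMS_STREAMS_ADM.SET_UP_QUEUE(
        queue_table => 'STRMADMIN.SPLIT_QT',
        queue_name  => 'STRMADMIN.SPLIT_Q');

      -- A new, normal capture feeding that queue, positioned at the
      -- SCN where the unstable site fell behind (value illustrative).
      DBMS_CAPTURE_ADM.CREATE_CAPTURE(
        queue_name   => 'STRMADMIN.SPLIT_Q',
        capture_name => 'SPLIT_CAPTURE',
        start_scn    => 123456789);

      -- Propagation from the new queue to the unstable site.
      DBMS_PROPAGATION_ADM.CREATE_PROPAGATION(
        propagation_name   => 'SPLIT_PROP',
        source_queue       => 'STRMADMIN.SPLIT_Q',
        destination_queue  => 'STRMADMIN.STREAMS_Q',
        destination_dblink => 'UNSTABLE_SITE_DB');

      -- Drop the original propagation so spilled LCRs can be
      -- discarded from the original capture queue.
      DBMS_PROPAGATION_ADM.DROP_PROPAGATION(
        propagation_name => 'ORIG_PROP_UNSTABLE');
    END;
    /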

TCP and Network Optimizations
- TCP and network tuning (see the excerpt below):
  - adjust the system max TCP buffer parameters (/etc/sysctl.conf)
  - Oracle Net parameters to reinforce the TCP tuning: DEFAULT_SDU_SIZE=32767, RECV_BUF_SIZE and SEND_BUF_SIZE (optimal: 3 * bandwidth-delay product)
- Reduce the Oracle Streams acknowledgements: alter system set events '26749 trace name context forever, level 2';
- Getting good TCP performance over high-latency, high-bandwidth networks basically means keeping the pipe full in order to use the available bandwidth and reduce the latency impact
- It is critical to use the optimal TCP send and receive socket buffer sizes for the link you are using; the default TCP buffer sizes are far too small (you only get a small percentage of the available bandwidth)
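An illustrative configuration excerpt covering both sides of the tuning; the buffer values below are placeholders and must be derived from the actual bandwidth-delay product of the link:

    # /etc/sysctl.conf: raise the system-wide TCP buffer limits
    net.core.rmem_max = 4194304
    net.core.wmem_max = 4194304

    # sqlnet.ora (or per-alias in tnsnames.ora/listener.ora)
    DEFAULT_SDU_SIZE = 32767
    RECV_BUF_SIZE = 4194304
    SEND_BUF_SIZE = 4194304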

Streams Rules
- Used to control which information to share
- Rules on the capture side cause more overhead than rules on the propagation side
- Avoid complex Oracle Streams rules: LIKE, functions, NOT

Complex rule condition:
  '( SUBSTR(:ddl.get_object_name(),1,7) IN (''COMP200'', ''OFLP200'', ''CMCP200'', ''TMCP200'', ''TBDP200'', ''STRM200'')
     OR SUBSTR(:ddl.get_base_table_name(),1,7) IN (''COMP200'', ''OFLP200'', ''CMCP200'', ''TMCP200'', ''TBDP200'', ''STRM200'') )'

Simple rule condition:
  '(((:ddl.get_object_name() >= ''STRM200_A'' and :ddl.get_object_name() <= ''STRM200_Z'')
     OR (:ddl.get_base_table_name() >= ''STRM200_A'' and :ddl.get_base_table_name() <= ''STRM200_Z''))
    OR ((:ddl.get_object_name() >= ''OFLP200_A'' and :ddl.get_object_name() <= ''OFLP200_Z'')
     OR (:ddl.get_base_table_name() >= ''OFLP200_A'' and :ddl.get_base_table_name() <= ''OFLP200_Z'')))'
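An existing rule condition can be replaced in place through the rules engine API; a minimal sketch, assuming a hypothetical rule name:

    BEGIN
      -- Swap a complex (function-based) condition for a simple
      -- range-based one that the rules engine can evaluate cheaply.
      DBMS_RULE_ADM.ALTER_RULE(
        rule_name => 'STRMADMIN.DDL_RULE_1',
        condition => '(:ddl.get_object_name() >= ''STRM200_A'' and ' ||
                     ':ddl.get_object_name() <= ''STRM200_Z'')');
    END;
    /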

Streams Rules Example: ATLAS Streams Replication
[Chart: throughput when filtering tables by prefix; complex rules: 80, simple rules: 600]

Streams Rules Example: ATLAS Conditions
[Chart: replication throughput, Offline (CERN) to Tier-1 (GridKa), about 2 GB per day]

Flow Control
- By default, flow control kicks in when the number of messages is larger than a threshold:
  - buffered publisher: 5000
  - capture publisher: 15000
- Manipulating the default behaviour:
  - 10.2.0.3 + Patch 5093060 = 2 new events: 10867 controls the threshold for any buffered message publisher, 10868 controls the threshold for the capture publisher
  - 10.2.0.4 = 2 new hidden parameters: "_capture_publisher_flow_control_threshold" and "_buffered_publisher_flow_control_threshold" (see the sketch below)
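On 10.2.0.4 the thresholds can be raised through the hidden parameters; a minimal sketch with illustrative values (hidden parameters are unsupported without Oracle's guidance, and the event-based route on 10.2.0.3 should follow the patch notes instead):

    -- Raise the flow-control thresholds (values are illustrative).
    ALTER SYSTEM SET "_capture_publisher_flow_control_threshold" = 30000
      SCOPE = SPFILE;
    ALTER SYSTEM SET "_buffered_publisher_flow_control_threshold" = 10000
      SCOPE = SPFILE;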

Flow Control Example: ATLAS PVSS Tests
- Online to Offline (CERN), 6 GB per day
- default thresholds: 2500 - 3000 LCRs per second
- new thresholds: 3500 - 4000 LCRs per second

Lessons Learned
- SQL bulk operations (at the source database):
  - may map to many elementary operations at the destination side
  - the source rates need to be controlled to avoid overloading
- Long transactions (infrequent commits):
  - the total number of outstanding LCRs grows too large; LCRs stay in memory too long and are spilled over to disk, and apply performance is impacted (see the monitoring sketch below)
  - all LCRs in a single transaction must be applied by one apply server, so parallel apply servers cannot be used efficiently
  - too many unbrowsed messages enable flow control, and the Streams processes are paused
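Spilling can be watched from the buffered-queue view; a minimal monitoring sketch:

    -- Messages currently buffered in memory vs. spilled to disk,
    -- per buffered queue.
    SELECT queue_schema,
           queue_name,
           num_msgs,
           spill_msgs
    FROM   v$buffered_queues;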

Lessons Learned Example: CMS replication, Online to Offline (CERN)
- A single transaction mapped 428400 LCRs: 7 objects had to be inserted in a single transaction (an object is the equivalent of a table)
- Each object = 1.63 MB, mapping 61200 rows, so a single transaction = 7 * 61200 = 428400 row inserts (more or less the same number of LCRs)
- Time to replicate one transaction: 300 sec

Lessons Learned Example: CMS replication
- Using BLOB objects instead: 1 object = 1 BLOB (1.63 MB); inserting the 7 BLOB objects makes a single transaction map to only 3600 LCRs
- Time to replicate one transaction: 30 sec, a factor 10 difference!!!!
- Performance depends on the number of LCRs per transaction rather than on the LCR size

Scalable Resynchronization
- When a target site falls outside the Streams recovery window, a complete transfer of the data (schemas and tables) to the destination database using the Oracle Data Pump utility may take days (example: ATLAS Conditions data)
- Transportable Tablespaces: move a set of tablespaces from one Oracle database to another by exporting the metadata of the tablespace instead of the data in the tablespace
- Moving data using transportable tablespaces is much faster than a Data Pump export/import

Scalable Resynchronization
- Restrictions:
  - the database block size and character set must be the same at source and target
  - the Oracle release and the compatibility setting must be the same or higher at the target
  - tablespaces must be self-contained
  - tablespaces need to be set read-only while the files are copied
- Cross-platform since Oracle 10g: Oracle provides a byte-order conversion solution that uses Oracle Recovery Manager (RMAN)
- It is NOT possible to use tablespaces from Tier-0, BUT we can use tablespaces from a Tier-1

Scalable Resynchronization
[Diagram: source database and target databases, with the steps below numbered on the copy paths]
1. Create database links between the databases
2. Create directories pointing to the datafiles
3. Stop replication to the site
4. Ensure the tablespaces are read-only
5. Transfer the data files of each tablespace to the remote system
6. Import the tablespace metadata in the target
7. Make the tablespaces read-write
8. Reconfigure Streams
A sketch of the core commands follows.
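A minimal sketch of steps 4-7; all names, database links, and paths are hypothetical placeholders, and the metadata import uses Data Pump's network mode over the database link from step 1:

    -- At the source (a Tier-1 already in sync): freeze the tablespace.
    ALTER TABLESPACE cond_data READ ONLY;

    -- At the target: pull each datafile over the database link.
    BEGIN
      DBMS_FILE_TRANSFER.GET_FILE(
        source_directory_object      => 'SRC_DIR',
        source_file_name             => 'cond_data01.dbf',
        source_database              => 'TIER1_SRC',  -- database link
        destination_directory_object => 'DST_DIR',
        destination_file_name        => 'cond_data01.dbf');
    END;
    /

    -- At the target (OS prompt): import only the tablespace metadata,
    -- then open the tablespace for writing.
    --   impdp system NETWORK_LINK=TIER1_SRC \
    --         TRANSPORT_TABLESPACES=COND_DATA \
    --         TRANSPORT_DATAFILES='/oradata/dst/cond_data01.dbf'
    ALTER TABLESPACE cond_data READ WRITE;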

Summary
- The LCG Database Deployment Project (LCG 3D) has set up a world-wide distributed database infrastructure for the LHC: some 124 RAC nodes (450 CPU cores) at CERN plus several tens of nodes at 10 partner sites are in production now
- Monitoring of database and Streams performance has been implemented, building on the Grid Control and strmmon tools; it is key to maintaining and optimising any larger system
- Important optimizations have been implemented in order to increase the Streams performance; the collaboration with Oracle as part of the CERN openlab has been essential and beneficial for both partners

Questions?