GoldenGate Performance Tuning


GoldenGate Performance Tuning Tips & Techniques Gavin Soorma

Agenda
• What is lag, and what can contribute to lag in a GoldenGate replication environment?
• Compare Classic Extracts and Replicats with Integrated Extracts and Replicats
• New performance tuning challenges introduced by the Log Mining Server component
• What tools do we have available in OGG 12.2 to monitor performance?
• Using those tools to investigate a real-life performance problem, and how the problem was resolved

Oracle GoldenGate Architecture

Where is the problem?

Is the problem because of a GoldenGate component?
• Extract reading the archive log and writing the data to a trail (or remote host)
• Data Pump reading the Extract trail and writing to a remote host
• Network
• Collector (server.exe) on the target receiving network data and writing it to a local trail
• Replicat reading the local trail and writing to the database
• Logmining Server issues, on the source as well as the target

Measuring OGG Performance
• Typically a GoldenGate performance problem is centered around lag
• Lag is the elapsed time between when a transaction is committed and written to a storage medium (such as an archive log or redo log) on the source and the time when Replicat writes the same transaction to the target database
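
The lag definition above is just a difference of two timestamps. A minimal sketch, assuming we already have the source commit time and the target apply time for one transaction; the timestamps and the helper name `replication_lag` are made up for illustration and do not come from any GoldenGate API:

```python
from datetime import datetime, timedelta


def replication_lag(source_commit: datetime, target_apply: datetime) -> timedelta:
    """Elapsed time from the commit on the source to the apply on the target."""
    return target_apply - source_commit


# Hypothetical timestamps for a single transaction (not taken from the slides).
source_commit = datetime(2015, 9, 25, 0, 13, 14)
target_apply = datetime(2015, 9, 25, 0, 19, 44)

lag = replication_lag(source_commit, target_apply)
sla = timedelta(minutes=5)  # the "SLA < 5 minutes lag" figure from the real-life example

print(f"lag: {lag.total_seconds():.0f} s, within SLA: {lag <= sla}")
```

In practice both timestamps would come from a heartbeat mechanism or from the GGSCI LAG command instead of being hard-coded.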

Classic Extract

Integrated Extract
Extract Logmining Server
• Reader: Reads logfile and splits into regions
• Preparer: Scans regions of logfiles and prefilters based on extract parameters
• Builder: Merges prepared records in SCN order
• Capture: Formats Logical Change Records (LCRs) and passes to GoldenGate Extract
Extract
• Requests LCRs from logmining server
• Performs Mapping and Transformations
• Writes Trail File

Classic Replicat

Integrated Replicat
Replicat
• Reads the trail file
• Constructs logical change records (LCRs)
• Transmits LCRs to Oracle Database via the Lightweight Streaming API
Inbound Server (Database Apply Process)
• Receiver: Reads LCRs
• Preparer: Computes the dependencies between the transactions (primary key, unique indexes, foreign key), grouping transactions and sorting in dependency order
• Coordinator: Coordinates transactions, maintains the order between applier processes
• Applier: Performs changes for assigned transactions, including conflict detection and error handling

Do we still use Classic Extracts and Replicats?
• Any reason why we are not using BOTH Integrated Extracts and Integrated Replicats?
• Do we have source/target Oracle databases on versions less than 11.2.0.3 or 11.2.0.4?
• Consider Downstream Capture if Integrated Extracts are not allowed on the source because they are ‘invasive’
• Do we use RAC, ASM, TDE?
• Do we want RMAN integration with Oracle GoldenGate?

A case for Integrated Replicat
• Integrated Replicat offers automatic parallelism, which increases or decreases the number of apply processes based on the current workload and database performance
• Co-ordinated Replicat provides multiple threads, but dependent objects have to be handled by the same Replicat thread, otherwise Replicat will abend
• Integrated Replicat ensures referential integrity, and DDL/DML operations are automatically applied in the correct order
• Management and tuning of Replicat performance is simplified, since you do not have to manually configure multiple Replicat processes and distribute the tables between them
• Tests have shown that a single Integrated Replicat can out-perform multiple Classic Replicats as well as a multi-threaded Co-ordinated Replicat

Tune the database before tuning GoldenGate!
• Is the target database already having I/O issues?
• Are the redo logs properly configured – size and location?
• Data replication is I/O intensive, so fast disks are important, particularly for the online redo logs
• Redo logs are constantly being written to by the database as well as being read by GoldenGate Extract processes
• Do we have any significant ‘Log File Sync’ wait events?
• Also consider the effect of adding supplemental logging, which will increase the redo logging

Key Points
• Identify and isolate tables with significantly high DML activity
• Separate Extract and Replicat process groups for such tables
• Dedicated Extract and Replicat process groups for tables with LOB columns
• Possibly dedicated process groups for tables with long running transactions
• Run the Oracle GoldenGate database Schema Profile check script to identify tables with missing PKs/UKs, deferred constraints, NOLOGGING or compression
• Start with a single Replicat process (as well as Extract process)
• Add Replicat processes until latency is acceptable (Classic)

Key Points
• In its classic mode, the Replicat process can be a source of performance bottlenecks because it is a single-threaded process that applies operations one at a time by using regular SQL
• Consider BATCHSQL to increase the performance of Replicat, particularly in OLTP type environments characterized by smaller row changes in terms of data
• BATCHSQL causes Replicat to organize similar SQL statements into arrays, which leads to faster processing as opposed to serial apply of SQL statements
• If tables can be separated based on PK/FK relationships, consider Co-ordinated Replicats with multiple threads
• For Integrated Replicats, check the parameters PARALLELISM, MAX_PARALLELISM, COMMIT_SERIALIZATION, EAGER_SIZE

Tune the Network for OGG
• The network is an important component in GoldenGate replication
• The two RMTHOST parameters, TCPBUFSIZE and TCPFLUSHBYTES, are very useful for increasing the buffer sizes and network packets sent by Data Pump over the network from the source to the target system
• This is especially beneficial for high latency networks
• Use Data Pump compression if network bandwidth is constrained and CPU headroom is available

Tuning the Network - Before
GGSCI (ti-p1-bscs-db-01) 1> send pbsprd2 gettcpstats
Sending GETTCPSTATS request to EXTRACT PBSPRD2 ...
RMTTRAIL ./dirdat/rt000113, RBA 38351713
Buffer Size 2266875
Flush Size 2266875
SND Size 2097152
Streaming Yes
Inbound Msgs 2710 Bytes 54259, 3 bytes/second
Outbound Msgs 20541 Bytes 13539482811, 795925 bytes/second
Recvs 5420
Sends 20541
Avg bytes per recv 10, per msg 20
Avg bytes per send 659144, per msg 659144
Recv Wait Time 1558113382, per msg 574949, per recv 287474
Send Wait Time 7514461569, per msg 365827, per send 365827

Tuning the Network - After
GGSCI (pl-p1-bscs-db-01) 12> send pbsprd1 gettcpstats
Sending GETTCPSTATS request to EXTRACT PBSPRD1 ...
RMTTRAIL ./dirdat/rt000000, RBA 98558417
Buffer Size 200000000
Flush Size 200000000
SND Size 134217728
Streaming Yes
Inbound Msgs 258 Bytes 4746, 1 bytes/second
Outbound Msgs 2402 Bytes 98675058, 37893 bytes/second
Recvs 516
Sends 2402
Avg bytes per recv 9, per msg 18
Avg bytes per send 41080, per msg 41080
Recv Wait Time 63143512, per msg 244742, per recv 122371
Send Wait Time 486941, per msg 202, per send 202
Compare it with the earlier figures:
Recv Wait Time 1558113382, per msg 574949, per recv 287474
Send Wait Time 7514461569, per msg 365827, per send 365827
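
To make the before/after comparison concrete, here is a small sketch that divides the per-operation wait figures from the two GETTCPSTATS outputs. The two samples cover different Extracts and different volumes, so only the per-recv and per-send averages are compared, not the totals. The dictionary keys are names chosen for this example, and the unit of the wait times is not stated in the output, so the result is a ratio only:

```python
# Per-operation wait figures copied from the two GETTCPSTATS outputs above.
before = {"recv_wait_per_recv": 287474, "send_wait_per_send": 365827}
after = {"recv_wait_per_recv": 122371, "send_wait_per_send": 202}


def reduction_factors(before: dict, after: dict) -> dict:
    """How many times smaller each per-operation wait is after tuning."""
    return {name: before[name] / after[name] for name in before}


factors = reduction_factors(before, after)
for name, factor in factors.items():
    print(f"{name}: {factor:.1f}x lower")
```

The send-side wait per send drops by roughly three orders of magnitude, which is the effect the larger buffer and flush sizes are aiming for.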

Allocate memory for the Log Mining Server
• Set the STREAMS_POOL_SIZE initialization parameter for the database
• Set the MAX_SGA_SIZE parameter for both Integrated Extracts and Integrated Replicats
• MAX_SGA_SIZE controls the amount of memory used by the logmining server; the default is 1 GB
• STREAMS_POOL_SIZE = (MAX_SGA_SIZE * PARALLELISM) + 25% head room
• For example, using the default values for the MAX_SGA_SIZE and PARALLELISM parameters:
(1 GB * 2) * 1.25 = 2.50 GB
STREAMS_POOL_SIZE = 2560M
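
The sizing rule on this slide is easy to turn into a helper. A minimal sketch, assuming the formula exactly as stated above (MAX_SGA_SIZE times PARALLELISM, plus 25% head room); the function name and the second, larger configuration are hypothetical:

```python
import math


def streams_pool_size_mb(max_sga_size_mb: int = 1024,
                         parallelism: int = 2,
                         headroom: float = 0.25) -> int:
    """STREAMS_POOL_SIZE in MB: (MAX_SGA_SIZE * PARALLELISM) plus head room."""
    return math.ceil(max_sga_size_mb * parallelism * (1 + headroom))


# Defaults from the slide: MAX_SGA_SIZE = 1 GB, PARALLELISM = 2.
default_mb = streams_pool_size_mb()
print(f"STREAMS_POOL_SIZE = {default_mb}M")

# Hypothetical larger configuration: MAX_SGA_SIZE = 2 GB, PARALLELISM = 4.
larger_mb = streams_pool_size_mb(max_sga_size_mb=2048, parallelism=4)
print(f"STREAMS_POOL_SIZE = {larger_mb}M")
```

If several integrated processes share one database, their requirements presumably add up; that summation is an assumption and is not shown on the slide.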

Allocate memory for the Log Mining Server
• The Log Mining Server runs on both the source and the target
• STREAMS_POOL_SIZE needs to be properly sized on the Integrated Extract (IE) as well as the Integrated Replicat (IR) end
SQL> SELECT state FROM GV$GG_APPLY_RECEIVER;
STATE
----------------------------------------------
Waiting for memory
SQL> show parameter streams
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
streams_pool_size                    big integer 2G
SQL> alter system set streams_pool_size=24G sid='bsprd1' scope=both;
System altered.
SQL> SELECT state FROM GV$GG_APPLY_RECEIVER;
Enqueueing LCRS

• Typically a GoldenGate performance problem is centered around lag
• Lag is the elapsed time between when a transaction is committed and written to a storage medium (such as an archive log or redo log) on the source and the time when Replicat writes the same transaction to the target database
• Automatic Heartbeat Tables
• GGSCI LAG, REPORT RATE

AWR reports now have a section for GoldenGate

Use ASH and ASH Analytics to diagnose an OGG performance problem

Automatic Heartbeat Table
• NEW in OGG 12.2
• Heartbeat tables were recommended before, but involved a fair bit of work to set up and configure
• Single 12.2 command: ADD HEARTBEATTABLE
• Records end-to-end replication lag in tables
• Creates database-level tables, views and jobs
• GG_LAG view: INCOMING_LAG, OUTGOING_LAG for bi-directional replication
• GG_LAG_HISTORY: retains historical lag information until purged

Automatic Heartbeat Table
• GG_LAG, GG_LAG_HISTORY: How much is the lag?
• GG_HEARTBEAT, GG_HEARTBEAT_HISTORY: Which process is responsible for the lag?

OGG 12.2 https://java.net/projects/oracledi/downloads/download/GoldenGate/OGGPTK.jar

Fine grained performance monitoring window which can be accessed through the RESTful Web Services

Integrated Extract/Replicat Health Check
• GoldenGate Integrated Capture and Integrated Replicat Healthcheck Script (Doc ID 1448324.1)
• Available for both Oracle 12c and 11g (> 11.2.0.3)
• Script output is generated in HTML format
• Unlike an AWR report, the report does not cover a period of time; it is an as-is snapshot, so run it when performance is worst!
SQL> spool /tmp/ogg_perf.html
SQL> @icrhc_11204.sql
-- Output will appear
SQL> exit

Integrated Extract/Replicat Health Check
• Comprehensive point-in-time snapshot of the database as well as the individual components of Integrated Extract and Integrated Replicat
• Database Configuration – key init.ora parameters like STREAMS_POOL_SIZE
• Wait Event Analysis – identify the root cause of slow Extracts/Replicats
• Extract and Replicat Configuration – parameters used
• Extract and Replicat Statistics – identify tables with the most DML activity

Streams Performance Advisor Package
• Has been around since Oracle Streams days
• Also known as SPADV
• Install the UTL_SPADV package
• The UTL_SPADV PL/SQL package provides subprograms to collect and analyze statistics for the LogMiner server processes
• The statistics help identify any current areas of contention such as CPU or I/O
@$ORACLE_HOME/rdbms/admin/utlspadv.sql

SPADV
• Gather statistics for a 30-60 minute period during which you are troubleshooting performance
• Also gather statistics during a 30-60 minute period where performance is good, to serve as a baseline for comparison
• To gather statistics every 60 seconds, run the following SQL*Plus command as the Oracle GoldenGate administrator:
SQL> exec UTL_SPADV.START_MONITORING(interval=>60);
• To stop statistics gathering, run the following command:
SQL> exec UTL_SPADV.STOP_MONITORING;
• To view SPADV statistics:
SQL> set serveroutput on size 50000
SQL> exec utl_spadv.show_stats;

Interpreting SPADV Output
• PARALLELISM changed from the EE default value of 2 to 1
• LMP is the Log Miner Preparer process
• CPU utilization has gone down from 100% to 70% (140%/2)
• Extract throughput has gone up from 129851 messages processed to 169361
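
The before/after numbers on this slide are easier to compare as percentage changes. A minimal sketch using only the figures quoted above; `percent_change` is a helper name chosen for this example, and the 140%/2 step simply repeats the slide's own arithmetic:

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent."""
    return (after - before) * 100 / before


# Figures quoted on the slide.
throughput_gain = percent_change(129851, 169361)  # messages processed
cpu_after = 140 / 2                               # the slide's 140%/2
cpu_change = percent_change(100, cpu_after)       # CPU utilization, in percent

print(f"throughput: {throughput_gain:+.1f}%")
print(f"CPU utilization: {cpu_change:+.1f}%")
```

So lowering PARALLELISM in this case gave roughly 30% more throughput for about 30% less CPU.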

Performance Tuning Real-life Example
• Batch job on source loading 100000 customer records took ~10 minutes
• Replication on the target took over 30 minutes
• SLA < 5 minutes lag
• Active-Active Bi-Directional Replication
• 20 GB redo generation per hour
• 18 million Logical Change Records per hour

Initial Investigation Conclusions
• Integrated Replicat issues
• Not constrained by CPU
• Not constrained by Trail File I/O
• Disabled FKs and tested with Co-ordinated Replicat
• Performance was good, so that ruled out the network or the Extract side of things
• Possibly due to Integrated Apply processes: Apply Reader, Apply Co-ordinator, Apply Server/Servers

ASH Analytics

Let's look at some SPADV output
PATH 4 RUN_ID 78 RUN_TIME 2015-SEP-25 00:13:14 CCA Y
|<R> RBSPRD2 3737 1371119 0 1.7% 93.3% 3.3% ""
|<Q> "OGGSUSER"."OGGQ$RBSPRD2" 3737 0.01 4494
|<A> OGG$RBSPRD2 3734 484 -1 APR 1.7% 95% 3.3% "" APC 98.3% 0% 1.7% "" APS (6) 198.3% 0% 191.7% "REPL Apply: dependency"
|<B> OGG$RBSPRD2 APS 6209 7869 53.3% "REPL Apply: dependency"
PATH 4 RUN_ID 79 RUN_TIME 2015-SEP-25 00:14:14 CCA Y
|<R> RBSPRD2 4141 1517685 0 1.7% 90% 6.7% ""
|<Q> "OGGSUSER"."OGGQ$RBSPRD2" 4141 0.01 5001
|<A> OGG$RBSPRD2 4161 570 -1 APR 1.7% 93.3% 5% "" APC 96.7% 0% 3.3% "" APS (6) 190% 0% 195% "REPL Apply: dependency"
|<B> OGG$RBSPRD2 APS 22142 10596 38.3% "REPL Apply: dependency"
PATH 4 RUN_ID 80 RUN_TIME 2015-SEP-25 00:15:14 CCA Y
|<R> RBSPRD2 4234 1569723 0 3.3% 88.3% 8.3% ""
|<Q> "OGGSUSER"."OGGQ$RBSPRD2" 4244 0.01 5001
|<A> OGG$RBSPRD2 4233 549 -1 APR 3.3% 90% 6.7% "" APC 95% 0% 5% "" APS (6) 198.3% 0% 210% "REPL Apply: dependency"
|<B> OGG$RBSPRD2 APS 19183 24681 55% "REPL Apply: dependency"

View the Integrated Health Check Report

We have a problem …
    APPLY#  SERVER_ID STATE                TOTAL_MESSAGES_APPLIED
---------- ---------- -------------------- ----------------------
         5          9 WAIT DEPENDENCY                      261519
         5         10 WAIT DEPENDENCY                      139849
         5          1 WAIT DEPENDENCY                      281381
         5          2 WAIT DEPENDENCY                      203907
         5          3 WAIT DEPENDENCY                      278303
         5          4 WAIT DEPENDENCY                      296481
         5          5 EXECUTE TRANSACTION                  222312
         5          6 WAIT DEPENDENCY                      292009
         5          7 INACTIVE                             202222
         5          8 INACTIVE                             111042
• At any given time we see only one Apply Server executing transactions
• The rest are all in WAIT DEPENDENCY state
• When the Apply Server currently executing a transaction completes, one of the waiting servers starts executing transactions
• This relates to the ASH Analytics investigation, which showed the main wait event as REPL Apply: Dependency
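
The observation above (one server executing, the rest waiting) can be checked mechanically by tallying the STATE column. A minimal sketch, assuming the listing has been pasted in as whitespace-separated rows; the parsing helper is written for this example and is not part of any Oracle tool:

```python
from collections import Counter

# Rows copied from the apply server listing above:
# APPLY#, SERVER_ID, STATE, TOTAL_MESSAGES_APPLIED
LISTING = """
5 9 WAIT DEPENDENCY 261519
5 10 WAIT DEPENDENCY 139849
5 1 WAIT DEPENDENCY 281381
5 2 WAIT DEPENDENCY 203907
5 3 WAIT DEPENDENCY 278303
5 4 WAIT DEPENDENCY 296481
5 5 EXECUTE TRANSACTION 222312
5 6 WAIT DEPENDENCY 292009
5 7 INACTIVE 202222
5 8 INACTIVE 111042
"""


def parse_row(line: str) -> dict:
    """Split one row; the state can contain spaces, so it is the middle part."""
    fields = line.split()
    return {
        "apply": int(fields[0]),
        "server_id": int(fields[1]),
        "state": " ".join(fields[2:-1]),
        "applied": int(fields[-1]),
    }


rows = [parse_row(line) for line in LISTING.strip().splitlines()]
state_counts = Counter(row["state"] for row in rows)
executing = [row["server_id"] for row in rows
             if row["state"] == "EXECUTE TRANSACTION"]

print(dict(state_counts))
print("servers executing:", executing)
```

Running the same tally on the listing taken after the fix gives a quick before/after comparison of how many apply servers are actually doing work.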

Get additional information from AWR Report

Do we have a ‘big’ transaction?

Large transactions and EAGER_SIZE
• GoldenGate considers a transaction to be large if it changes more than 15100 rows in a table (changed in version 12.2; the value used to be 9500 in earlier versions)
• An important parameter, EAGER_SIZE, governs how GoldenGate applies these “large” transactions
• It sets a threshold for the size of a transaction (in number of LCRs) after which Oracle GoldenGate starts applying data before the commit record is received
• In essence, for Oracle GoldenGate it means: when I see a large number of LCRs in a transaction, do I start applying them straight away (that, I guess, is where the “eager” part of the parameter name comes from), or do I wait for the entire transaction to be committed and only then start applying changes?
• This “waiting” seems to serialize the apply process and adds to the apply lag on the target in a big way

View the Integrated Health Check Report
• Note the transaction ID of the transaction being executed by the only apply server in state EXECUTE TRANSACTION
• AS05: 83.19.44854

• Transaction 8.17.18382 is waiting on 95.3.40904 to complete
• Transactions 29.25.246732, 89.2.45500 and 95.3.40904 are waiting on 109.24.24253
• Transaction 109.24.24253 is waiting on 46.13.28116
• Transaction 46.13.28116 is waiting on 105.27.24651
• Transaction 105.27.24651 is waiting on 83.19.44854, which is the only transaction currently executing
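
The waits listed above form a chain that ends at a single executing transaction. A minimal sketch that walks the chain from any waiting transaction to its root blocker; the `waits_on` mapping is transcribed by hand from the slide, and `blocking_chain` is a helper name invented for this example:

```python
# waiter -> transaction it is waiting on, transcribed from the slide.
waits_on = {
    "8.17.18382": "95.3.40904",
    "29.25.246732": "109.24.24253",
    "89.2.45500": "109.24.24253",
    "95.3.40904": "109.24.24253",
    "109.24.24253": "46.13.28116",
    "46.13.28116": "105.27.24651",
    "105.27.24651": "83.19.44854",
}


def blocking_chain(txn: str, waits_on: dict) -> list:
    """Follow the waits from txn until a transaction that waits on nothing."""
    chain = [txn]
    seen = {txn}
    while chain[-1] in waits_on:
        blocker = waits_on[chain[-1]]
        if blocker in seen:  # guard against a malformed, cyclic listing
            raise ValueError(f"cycle detected at {blocker}")
        chain.append(blocker)
        seen.add(blocker)
    return chain


chain = blocking_chain("8.17.18382", waits_on)
print(" -> ".join(chain))

# Every waiting transaction should lead back to the same root blocker.
roots = {blocking_chain(txn, waits_on)[-1] for txn in waits_on}
print("root blockers:", roots)
```

A single root blocker for every waiter is what serialized apply looks like: all the other apply servers are queued behind one transaction.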

Now that’s better!
DBOPTIONS INTEGRATEDPARAMS (eager_size 25000)
    APPLY#  SERVER_ID STATE                TOTAL_MESSAGES_APPLIED
---------- ---------- -------------------- ----------------------
         5          9 EXECUTE TRANSACTION                  272374
         5         10 EXECUTE TRANSACTION                  150630
         5          1 EXECUTE TRANSACTION                  292175
         5          2 EXECUTE TRANSACTION                  225412
         5          3 EXECUTE TRANSACTION                  289161
         5          4 EXECUTE TRANSACTION                  317736
         5          5 EXECUTE TRANSACTION                  240507
         5          6 EXECUTE TRANSACTION                  302893
         5          7 INACTIVE                             202222
         5          8 INACTIVE                             111042

To Wrap Up …
• Replication of ‘batch’ type transactions needs special consideration as opposed to replication of ‘OLTP’ type transactions
• A GoldenGate performance problem is not always related to GoldenGate
• Tune the database, operating system and network first
• Using Integrated Extracts and Replicats adds a log mining server component, which presents its own separate tuning challenges
• Consider all the performance tuning tools and options available

Thanks for attending! http://gavinsoorma.com prosolutions@gavinsoorma.com