1
GoldenGate Performance Tuning
Tips & Techniques
Gavin Soorma
2
Agenda
• What is lag, and what can contribute to lag in a GoldenGate replication environment?
• Compare Classic Extracts and Replicats with Integrated Extracts and Replicats
• New performance tuning challenges introduced by the Log Mining Server component
• What tools do we have available in OGG 12.2 to monitor performance?
• Using those tools to examine and investigate a real-life performance problem, and how the problem was resolved
3
Oracle GoldenGate Architecture
4
Where is the problem?
5
Is the problem because of a GoldenGate component?
• Extract reading the archive log and writing the data to a trail (or remote host)
• Data Pump reading the Extract trail and writing to a remote host
• Network Collector (server.exe) on the target receiving network data and writing it to a local trail
• Replicat reading the local trail and writing to the database
• Logmining Server issues, on the source as well as the target
6
Measuring OGG Performance
Typically, a GoldenGate performance problem is centered around lag.
Lag is the elapsed time between when a transaction is committed and written to a storage medium (such as an archive log or redo log) on the source and when Replicat writes the same transaction to the target database.
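The definition above can be sketched as a simple timestamp difference. This is only an illustration: the function name and the sample timestamps are hypothetical, it does not call any GoldenGate API, and it assumes the source and target clocks are synchronized.

```python
from datetime import datetime, timedelta


def replication_lag(source_commit_time: datetime,
                    target_apply_time: datetime) -> timedelta:
    """Lag = time Replicat wrote the transaction to the target
    minus the time it was committed and written to redo on the source."""
    return target_apply_time - source_commit_time


# Hypothetical timestamps, for illustration only.
committed = datetime(2015, 9, 25, 0, 13, 14)
applied = datetime(2015, 9, 25, 0, 13, 44)

lag = replication_lag(committed, applied)
print(lag.total_seconds())  # 30.0
```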
7
Classic Extract
8
Integrated Extract
Logmining Server
• Reader: Reads the log file and splits it into regions
• Preparer: Scans regions of log files and prefilters based on Extract parameters
• Builder: Merges prepared records in SCN order
• Capture: Formats Logical Change Records (LCRs) and passes them to GoldenGate Extract
Extract
• Requests LCRs from the logmining server
• Performs mapping and transformations
• Writes the trail file
9
Classic Replicat
10
Integrated Replicat
Replicat
• Reads the trail file
• Constructs logical change records (LCRs)
• Transmits LCRs to Oracle Database via the Lightweight Streaming API
Inbound Server (Database Apply Process)
• Receiver: Reads LCRs
• Preparer: Computes the dependencies between transactions (primary key, unique indexes, foreign key), grouping transactions and sorting them in dependency order
• Coordinator: Coordinates transactions and maintains the order between applier processes
• Applier: Performs changes for assigned transactions, including conflict detection and error handling
11
Do we still use Classic Extracts and Replicats?
• Is there any reason why we are not using both Integrated Extracts and Integrated Replicats?
• Do we have source/target Oracle databases on versions less than or ?
• Consider Downstream Capture if Integrated Extracts are not allowed on the source because they are ‘invasive’
• Do we use RAC, ASM, TDE?
• Do we want RMAN integration with Oracle GoldenGate?
12
A case for Integrated Replicat
• Integrated Replicat offers automatic parallelism, which increases or decreases the number of apply processes based on the current workload and database performance
• Coordinated Replicat provides multiple threads, but dependent objects have to be handled by the same Replicat thread, otherwise Replicat will abend
• Integrated Replicat ensures referential integrity, and DDL/DML operations are automatically applied in the correct order
• Management and tuning of Replicat performance is simplified, since you do not have to manually configure multiple Replicat processes and distribute the tables between them
• Tests have shown that a single Integrated Replicat can outperform multiple Classic Replicats as well as a multi-threaded Coordinated Replicat
13
Tune the database before tuning GoldenGate!
• Is the target database already having I/O issues?
• Are the redo logs properly configured (size and location)?
• Data replication is I/O intensive, so fast disks are important, particularly for the online redo logs
• Redo logs are constantly being written to by the database as well as being read by GoldenGate Extract processes
• Do we have any significant ‘Log File Sync’ wait events?
• Also consider the effect of adding supplemental logging, which will increase redo generation
14
Key Points
• Identify and isolate tables with significantly high DML activity
• Separate Extract and Replicat process groups for such tables
• Dedicated Extract and Replicat process groups for tables with LOB columns
• Possibly dedicated process groups for tables with long-running transactions
• Run the Oracle GoldenGate database Schema Profile check script to identify tables with missing PKs/UKs, deferred constraints, NOLOGGING, or compression
• Start with a single Replicat process (as well as a single Extract process)
• Add Replicat processes until latency is acceptable (Classic)
15
Key Points
• In classic mode, the Replicat process can be a performance bottleneck because it is single-threaded and applies operations one at a time using regular SQL
• Consider BATCHSQL to increase Replicat performance, particularly in OLTP-type environments characterized by small row changes
• BATCHSQL causes Replicat to organize similar SQL statements into arrays, which is faster than applying the statements serially
• If tables can be separated based on PK/FK relationships, consider Coordinated Replicats with multiple threads
• For Integrated Replicats, check the parameters PARALLELISM, MAX_PARALLELISM, COMMIT_SERIALIZATION, EAGER_SIZE
16
Tune the Network for OGG
• The network is an important component in GoldenGate replication
• The two RMTHOST parameters, TCPBUFSIZE and TCPFLUSHBYTES, are very useful for increasing the buffer sizes and the size of the network packets sent by Data Pump from the source to the target system. This is especially beneficial for high-latency networks
• Use Data Pump compression if network bandwidth is constrained and CPU headroom is available
17
Tuning the Network - Before
GGSCI (ti-p1-bscs-db-01) 1> send pbsprd2 gettcpstats
Sending GETTCPSTATS request to EXTRACT PBSPRD2 ...
RMTTRAIL ./dirdat/rt000113, RBA
Buffer Size Flush Size SND Size
Streaming Yes
Inbound Msgs 2710 Bytes 54259, 3 bytes/second
Outbound Msgs Bytes , bytes/second
Recvs 5420 Sends
Avg bytes per recv 10, per msg 20
Avg bytes per send , per msg
Recv Wait Time , per msg , per recv
Send Wait Time , per msg , per send
18
Tuning the Network - After
GGSCI (pl-p1-bscs-db-01) 12> send pbsprd1 gettcpstats
Sending GETTCPSTATS request to EXTRACT PBSPRD1 ...
RMTTRAIL ./dirdat/rt000000, RBA
Buffer Size Flush Size SND Size
Streaming Yes
Inbound Msgs Bytes , bytes/second
Outbound Msgs Bytes , bytes/second
Recvs Sends
Avg bytes per recv , per msg
Avg bytes per send , per msg
Recv Wait Time , per msg , per recv
Send Wait Time , per msg , per send
Compare it with the earlier figures:
Recv Wait Time , per msg , per recv
Send Wait Time , per msg , per send
19
Allocate memory for the Log Mining Server
• Set the STREAMS_POOL_SIZE initialization parameter for the database
• Set the MAX_SGA_SIZE parameter for both Integrated Extracts and Integrated Replicats
• MAX_SGA_SIZE controls the amount of memory used by the logmining server; the default is 1 GB
• STREAMS_POOL_SIZE = (MAX_SGA_SIZE * PARALLELISM) + 25% headroom
• For example, using the default values for the MAX_SGA_SIZE and PARALLELISM parameters:
( 1GB * 2 ) * 1.25 = 2.50GB
STREAMS_POOL_SIZE = 2560M
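The sizing rule above can be worked through in a few lines. This is a minimal sketch of the arithmetic only; the function name and its parameter names are hypothetical, and the defaults are the values quoted on the slide (MAX_SGA_SIZE of 1 GB, PARALLELISM of 2, 25% headroom).

```python
import math


def streams_pool_size_mb(max_sga_size_mb: int = 1024,
                         parallelism: int = 2,
                         headroom: float = 0.25) -> int:
    """STREAMS_POOL_SIZE = (MAX_SGA_SIZE * PARALLELISM) + headroom, in MB."""
    return math.ceil(max_sga_size_mb * parallelism * (1 + headroom))


# Defaults from the slide: (1 GB * 2) * 1.25 = 2.5 GB
print(streams_pool_size_mb())  # 2560
```

The same function can be called with other values (for example a higher PARALLELISM) to see how quickly the required pool grows.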
20
Allocate memory for the Log Mining Server
• The Log Mining Server runs on both the source and the target
• STREAMS_POOL_SIZE needs to be properly sized on both the Integrated Extract (IE) and the Integrated Replicat (IR) end
SQL> SELECT state FROM GV$GG_APPLY_RECEIVER;
STATE
Waiting for memory
SQL> show parameter streams
NAME TYPE VALUE
streams_pool_size big integer 2G
SQL> alter system set streams_pool_size=24G sid='bsprd1' scope=both;
System altered.
SQL> SELECT state FROM GV$GG_APPLY_RECEIVER;
Enqueueing LCRS
21
Typically a GoldenGate performance problem is centered around Lag
• Lag is the elapsed time between when a transaction is committed and written to a storage medium (such as an archive log or redo log) on the source and when Replicat writes the same transaction to the target database
• Automatic Heartbeat Tables
• GGSCI LAG, REPORT RATE
22
AWR reports now have a section for GoldenGate
23
Use ASH and ASH Analytics to diagnose an OGG performance problem
25
Automatic Heartbeat Table
• New in OGG 12.2
• Heartbeat tables were previously recommended but involved a fair bit of work to set up and configure
• Now a single command: ADD HEARTBEATTABLE
• Records end-to-end replication lag in tables
• Creates database-level tables, views and jobs
• GG_LAG view: INCOMING_LAG and OUTGOING_LAG for bi-directional replication
• GG_LAG_HISTORY: retains historical lag information until purged
26
Automatic Heartbeat Table
• GG_LAG, GG_LAG_HISTORY: How much is the lag?
• GG_HEARTBEAT, GG_HEARTBEAT_HISTORY: Which process is responsible for the lag?
27
OGG 12.2
29
Fine grained performance monitoring window which can be accessed through the RESTful Web Services
30
Integrated Extract/Replicat Health Check
• GoldenGate Integrated Capture and Integrated Replicat Healthcheck Script (Doc ID )
• Available for both Oracle 12c and 11g (> )
• Report is generated in HTML format
• Unlike an AWR report, the report does not cover a period of time but is an as-is snapshot, so run it when performance is worst!
SQL> spool /tmp/ogg_perf.html
-- Output will appear
SQL> exit
31
Integrated Extract/Replicat Health Check
Comprehensive point-in-time snapshot of the database as well as the individual components of Integrated Extract and Integrated Replicat.
• Database Configuration: key init.ora parameters like STREAMS_POOL_SIZE
• Wait Event Analysis: identify the root cause of slow Extracts/Replicats
• Extract and Replicat Configuration: parameters used
• Extract and Replicat Statistics: identify tables with the most DML activity
32
Streams Performance Advisor Package
• Has been around since the Oracle Streams days
• Also known as SPADV
• Install the UTL_SPADV package:
@$ORACLE_HOME/rdbms/admin/utlspadv.sql
• The UTL_SPADV PL/SQL package provides subprograms to collect and analyze statistics for the LogMiner server processes
• The statistics help identify any current areas of contention, such as CPU or I/O
33
SPADV
• Gather statistics for a minute time period during which you are troubleshooting performance.
• Also gather statistics during a minute time period where performance is good, to serve as a baseline for comparison.
• To gather statistics every 60 seconds, run the following SQL*Plus command as the Oracle GoldenGate administrator:
SQL> exec UTL_SPADV.START_MONITORING(interval=>60);
• To stop statistics gathering, run the following command:
SQL> exec UTL_SPADV.STOP_MONITORING;
• To view SPADV statistics:
SQL> set serveroutput size
SQL> exec utl_spadv.show_stats;
34
Interpreting SPADV Output
• PARALLELISM changed from the EE default value of 2 to 1
• LMP is the Log Miner Preparer process
• CPU utilization has gone down from 100% to 70% (140%/2)
• Extract throughput has gone up from messages processed to
35
Performance Tuning Real-life Example
• Batch job on the source loading customer records took ~10 minutes
• Replication on the target took over 30 minutes
• SLA: < 5 minutes lag
• Active-Active Bi-Directional Replication
• 20 GB redo generation per hour
• 18 million Logical Change Records per hour
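To put the hourly figures above into per-second terms, the following sketch simply divides them out. It assumes 1 GB = 1024 MB and a steady rate across the hour, which a batch workload will not actually have, so treat the results as averages only.

```python
# Figures quoted on the slide.
LCRS_PER_HOUR = 18_000_000
REDO_GB_PER_HOUR = 20
SECONDS_PER_HOUR = 3600

# Average rates, assuming an even spread over the hour (an assumption).
lcrs_per_second = LCRS_PER_HOUR / SECONDS_PER_HOUR
redo_mb_per_second = REDO_GB_PER_HOUR * 1024 / SECONDS_PER_HOUR

print(lcrs_per_second)               # 5000.0
print(round(redo_mb_per_second, 2))  # 5.69
```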
36
Initial Investigation Conclusions
• Integrated Replicat issues
• Not constrained by CPU
• Not constrained by trail file I/O
• Disabled FKs and tested with Coordinated Replicat: performance was good, so that ruled out the network or the Extract side of things
• Possibly due to the Integrated Apply processes: Apply Reader, Apply Coordinator, Apply Server(s)
37
ASH Analytics
41
Let's look at some SPADV output
PATH 4 RUN_ID 78 RUN_TIME 2015-SEP-25 00:13:14 CCA Y
|<R> RBSPRD % 93.3% 3.3% ""
|<Q> "OGGSUSER"."OGGQ$RBSPRD2"
|<A> OGG$RBSPRD APR 1.7% 95% 3.3% "" APC 98.3% 0% 1.7% "" APS (6) 198.3% 0% 191.7% "REPL Apply: dependency"
|<B> OGG$RBSPRD2 APS % "REPL Apply: dependency"
PATH 4 RUN_ID 79 RUN_TIME 2015-SEP-25 00:14:14 CCA Y
|<R> RBSPRD % 90% 6.7% ""
|<Q> "OGGSUSER"."OGGQ$RBSPRD2"
|<A> OGG$RBSPRD APR 1.7% 93.3% 5% "" APC 96.7% 0% 3.3% "" APS (6) 190% 0% 195% "REPL Apply: dependency"
|<B> OGG$RBSPRD2 APS % "REPL Apply: dependency"
PATH 4 RUN_ID 80 RUN_TIME 2015-SEP-25 00:15:14 CCA Y
|<R> RBSPRD % 88.3% 8.3% ""
|<Q> "OGGSUSER"."OGGQ$RBSPRD2"
|<A> OGG$RBSPRD APR 3.3% 90% 6.7% "" APC 95% 0% 5% "" APS (6) 198.3% 0% 210% "REPL Apply: dependency"
|<B> OGG$RBSPRD2 APS % "REPL Apply: dependency"
42
View the Integrated Health Check Report
43
We have a problem …
APPLY#  SERVER_ID  STATE                TOTAL_MESSAGES_APPLIED
5       9          WAIT DEPENDENCY
5       10         WAIT DEPENDENCY
5       1          WAIT DEPENDENCY
5       2          WAIT DEPENDENCY
5       3          WAIT DEPENDENCY
5       4          WAIT DEPENDENCY
5       5          EXECUTE TRANSACTION
5       6          WAIT DEPENDENCY
5       7          INACTIVE
5       8          INACTIVE
• At any given time we see only one Apply Server executing transactions
• The rest are all in the WAIT DEPENDENCY state
• When the Apply Server currently executing a transaction completes, one of the waiting servers starts executing transactions
• This ties in with the ASH Analytics investigation, which showed the main wait event as REPL Apply: Dependency
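The pattern described above (one server executing, the rest waiting) can be spotted by tallying the STATE column. The sketch below uses the rows from the slide; the `looks_serialized` heuristic and its name are assumptions made for illustration, not an Oracle-defined rule.

```python
from collections import Counter

# (APPLY#, SERVER_ID, STATE) rows copied from the slide above.
apply_servers = [
    (5, 9, "WAIT DEPENDENCY"),
    (5, 10, "WAIT DEPENDENCY"),
    (5, 1, "WAIT DEPENDENCY"),
    (5, 2, "WAIT DEPENDENCY"),
    (5, 3, "WAIT DEPENDENCY"),
    (5, 4, "WAIT DEPENDENCY"),
    (5, 5, "EXECUTE TRANSACTION"),
    (5, 6, "WAIT DEPENDENCY"),
    (5, 7, "INACTIVE"),
    (5, 8, "INACTIVE"),
]

# Tally how many apply servers are in each state.
state_counts = Counter(state for _, _, state in apply_servers)


def looks_serialized(counts: Counter) -> bool:
    """Hypothetical heuristic: at most one server is executing
    while at least one other is waiting on a dependency."""
    return counts["EXECUTE TRANSACTION"] <= 1 and counts["WAIT DEPENDENCY"] > 0


print(state_counts["WAIT DEPENDENCY"])  # 7
print(looks_serialized(state_counts))   # True
```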
44
Get additional information from AWR Report
45
Do we have a ‘big’ transaction?
46
Large transactions and EAGER_SIZE
• GoldenGate considers a transaction to be large if it changes more than rows in a table (changed in version ; it used to be 9500 in earlier versions)
• An important parameter, EAGER_SIZE, controls how GoldenGate applies these “large” transactions
• It sets a threshold for the size of a transaction (in number of LCRs) after which Oracle GoldenGate starts applying data before the commit record is received
• In essence, when Oracle GoldenGate sees a large number of LCRs in a transaction, does it start applying them straight away (presumably where the “eager” in the parameter name comes from), or does it wait for the entire transaction to be committed and only then start applying changes?
• This “waiting” seems to serialize the apply process and adds significantly to the apply lag on the target
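The threshold behaviour described above can be expressed as a single comparison. This is only a sketch of the decision as the slide describes it: the function name and the sample transaction size are hypothetical, and whether the real comparison is strict or inclusive is an assumption. The two thresholds are the values mentioned in this deck (9500 for earlier versions, and 25000 as set later via INTEGRATEDPARAMS).

```python
def starts_applying_before_commit(lcr_count: int, eager_size: int) -> bool:
    """True when the transaction exceeds the EAGER_SIZE threshold (in LCRs),
    i.e. apply would begin before the commit record is received.
    The strict '>' comparison is an assumption."""
    return lcr_count > eager_size


# Hypothetical batch transaction size, for illustration only.
batch_txn_lcrs = 20_000

print(starts_applying_before_commit(batch_txn_lcrs, 9500))   # True
print(starts_applying_before_commit(batch_txn_lcrs, 25000))  # False
```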
47
View the Integrated Health Check Report
Note the transaction ID of the transaction being executed by the only apply server in the EXECUTE TRANSACTION state, AS05:
48
Transaction 8.17.18382 is waiting on 95.3.40904 to complete
• Transactions , and are waiting on
• Transaction is waiting on
• Transaction is waiting on
• Transaction is waiting on which is the only transaction currently executing
49
Now that’s better!
DBOPTIONS INTEGRATEDPARAMS (eager_size 25000)
APPLY#  SERVER_ID  STATE                TOTAL_MESSAGES_APPLIED
                   EXECUTE TRANSACTION
                   EXECUTE TRANSACTION
                   EXECUTE TRANSACTION
                   EXECUTE TRANSACTION
                   EXECUTE TRANSACTION
                   EXECUTE TRANSACTION
                   EXECUTE TRANSACTION
                   EXECUTE TRANSACTION
                   INACTIVE
                   INACTIVE
50
To Wrap Up …
• Replication of ‘batch’ type transactions needs special consideration compared with replication of ‘OLTP’ type transactions
• A GoldenGate performance problem is not always related to GoldenGate
• Tune the database, operating system and network first
• Using Integrated Extracts and Replicats adds a log mining server component, which presents its own separate tuning challenges
• Consider all the performance tuning tools and options available
51
Thanks for attending! http://gavinsoorma.com