An Overview of GoldenGate Replication Brian Keating December 31, 2009
Introduction The GoldenGate replication utility provides a flexible and powerful framework, for automatic propagation of data changes. This presentation will provide an overview of GoldenGate replication; and it will display some examples of actually using GoldenGate. In order to provide examples of using GoldenGate, a proof of concept (POC) environment, which uses GoldenGate, has been set up.
Overview of GoldenGate Replication GoldenGate replication consists of three basic utilities: extract, replicat, and data pump. In addition, all three of those utilities access trail files, which are flat files that contain formatted GoldenGate data. An extract group mines a databases redo logs; and writes information about the changes it finds into trail files. Extract is functionally similar to the Streams capture process. A replicat group reads information from trail files; and then writes those changes into database tables. Replicat is functionally similar to the Streams apply process. A data pump group copies information between trail files. Data pump is functionally similar to the Streams propagate process.
Overview of the POC Environment In the GoldenGate proof of concept environment, two separate types of replication have been implemented – audit-style replication, and archive-style replication. Audit-style replication maintains an audit trail of DML operations that have been executed on source tables. This is similar to the replication provided by Oracle Change Data Capture (CDC). Archive-style replication causes inserts and updates, and some DDL statements, to be replicated from source tables into target tables. Deletes, drops, and truncates on source tables are not replicated to target tables, however. As the name implies, this type of replication is appropriate for maintaining archive databases.
Diagram of Audit-Style Replication Here is a basic diagram of the various steps involved in archive-style replication: Source Extract Group Source Replicat Group 1 Source Audit Table Source Redo Log Source Trail File 243
Description of Audit-Style Replication The following is a text description of the diagram on the last slide. With audit- style replication, whenever any DML operation is committed on a source table, the following items will occur: 1.The source extract group notices the DML operation, in the source databases redo log. 2.The source extract group writes information about that DML operation, into the source trail file. 3.The source replicat group reads that information from the source trail file. 4.The source replicat group inserts that information into the source audit table. Note: with audit-style replication, DDL operations do not get replicated at all. Also note: in this particular POC environment, audit-style replication is implemented with all objects on the source system – i.e., everything is self- contained, on the same source host. It is also possible to implement audit- style replication with objects spread out among multiple systems.
Diagram of Archive-Style Replication Here is a basic diagram of the various steps involved in archive-style replication: Source Redo Log 1 Source Extract Group Source Trail File Target Replicat Group Target Archive Table 6 Target Trail File 2 Source Data Pump 345
Description of Archive-Style Replication The following is a text description of the diagram on the last slide. With archive- style replication, whenever any DML or DDL operation is committed on a source table, the following items will occur: 1.The source extract group notices the change (the DML or DDL operation) in the source databases redo log. 2.If the DML operation is NOT a delete, or if the DDL operation is NOT a drop or truncate, then the source extract group writes that change into the source trail file. 3.The source data pump group reads that change from the source trail file. 4.The source data pump group writes that change into the target trail file. 5.The target replicat group reads that change from the target trail file. 6.The target replicat group executes that DML or DDL operation on the target archive table. Note: as mentioned above, delete DML statements are not replicated. In addition, any DDL statements that contain the strings drop or truncate are not replicated.
Examples of Audit-Style Replication The next four slides contain some examples of audit-style replication. That is, those slides will provide some examples of DML and DDL operations on a test source table – and then they will show you what data would end up getting replicated into the corresponding audit table, through audit-style replication. In these examples, the name of the source table is ggs.ts_test, and the name of the audit table is ggs.ts_test_ct. Here are the descriptions of those tables: SQL> desc ggs.ts_test Name Null? Type ID_VALUE NOT NULL NUMBER STRING_VALUE VARCHAR2(20) DATE_VALUE DATE SQL> desc ggs.ts_test_ct Name Null? Type OPERATION$ VARCHAR2(20) COMMIT_TIMESTAMP$ TIMESTAMP(6) ID_VALUE NUMBER STRING_VALUE VARCHAR2(20) DATE_VALUE DATE
Audit-Style Example 1: Insert Insert statements: insert into ggs.ts_test values (1, test 1, sysdate); insert into ggs.ts_test values (2, test 2, sysdate); commit; Results: SQL> select * from ggs.ts_test; ID_VALUE STRING_VALUE DATE_VALUE test 1 12/29/ :25:49 2 test 2 12/29/ :25:57 SQL> select * from ggs.ts_test_ct; OPERATION$ COMMIT_TIMESTAMP$ ID_VALUE STRING_VALUE DATE_VALUE INSERT 12/29/ :26:02 1 test 1 12/29/ :25:49 INSERT 12/29/ :26:02 2 test 2 12/29/ :25:57
Audit-Style Example 2: Update Update statement: update ggs.ts_test set string_value = update where id_value = 1; commit; Results: SQL> select * from ggs.ts_test; ID_VALUE STRING_VALUE DATE_VALUE update 12/29/ :25:49 2 test 2 12/29/ :25:57 SQL> select * from ggs.ts_test_ct; OPERATION$ COMMIT_TIMESTAMP$ ID_VALUE STRING_VALUE DATE_VALUE INSERT 12/29/ :26:02 1 test 1 12/29/ :25:49 INSERT 12/29/ :26:02 2 test 2 12/29/ :25:57 UPDATE OLD 12/29/ :29:07 1 test 1 12/29/ :25:49 UPDATE NEW 12/29/ :29:07 1 update 12/29/ :25:49
Audit-Style Example 3: Delete Delete statement: delete from ggs.ts_test where id_value = 1; commit; Results: SQL> select * from ggs.ts_test; ID_VALUE STRING_VALUE DATE_VALUE test 2 12/29/ :25:57 SQL> select * from ggs.ts_test_ct; OPERATION$ COMMIT_TIMESTAMP$ ID_VALUE STRING_VALUE DATE_VALUE INSERT 12/29/ :26:02 1 test 1 12/29/ :25:49 INSERT 12/29/ :26:02 2 test 2 12/29/ :25:57 UPDATE OLD 12/29/ :29:07 1 test 1 12/29/ :25:49 UPDATE NEW 12/29/ :29:07 1 update 12/29/ :25:49 DELETE 12/29/ :31:42 1 update 12/29/ :25:49
Audit-Style Example 4: Truncate Truncate statement: truncate table ggs.ts_test; Results: SQL> select * from ggs.ts_test; no rows selected SQL> select * from ggs.ts_test_ct; OPERATION$ COMMIT_TIMESTAMP$ ID_VALUE STRING_VALUE DATE_VALUE INSERT 12/29/ :26:02 1 test 1 12/29/ :25:49 INSERT 12/29/ :26:02 2 test 2 12/29/ :25:57 UPDATE OLD 12/29/ :29:07 1 test 1 12/29/ :25:49 UPDATE NEW 12/29/ :29:07 1 update 12/29/ :25:49 DELETE 12/29/ :31:42 1 update 12/29/ :25:49
Examples of Archive-Style Replication The next four slides contain some examples of archive-style replication. That is, those slides will provide some examples of DML and DDL operations on a test source table – and then they will show you what data would end up getting replicated into the corresponding archive table, through archive-style replication. In these examples, the name of the source table, and the name of the archive table, is ggs.bp_test. (Those two tables reside in separate databases.) So, in my queries I will refer to those tables by the aliases source and archive. Here are the descriptions of those tables: SQL> desc ggs.bp_test Name Null? Type ID_VALUE NOT NULL NUMBER STRING_VALUE VARCHAR2(20) DATE_VALUE DATE SQL> desc ggs.bp_test Name Null? Type ID_VALUE NOT NULL NUMBER STRING_VALUE VARCHAR2(20) DATE_VALUE DATE
Archive-Style Example 1: Insert Insert statements: insert into ggs.bp_test values (1, test 1, sysdate); insert into ggs.bp_test values (2, test 2, sysdate); commit; Results: SQL> select * from ggs.bp_test source; ID_VALUE STRING_VALUE DATE_VALUE test 1 12/29/ :20:27 2 test 2 12/29/ :20:31 SQL> select * from ggs.bp_test archive; ID_VALUE STRING_VALUE DATE_VALUE test 1 12/29/ :20:27 2 test 2 12/29/ :20:31
Archive-Style Example 2: Update Update statement: update ggs.bp_test set string_value = update where id_value = 1; commit; Results: SQL> select * from ggs.bp_test source; ID_VALUE STRING_VALUE DATE_VALUE update 12/29/ :20:27 2 test 2 12/29/ :20:31 SQL> select * from ggs.bp_test archive; ID_VALUE STRING_VALUE DATE_VALUE update 12/29/ :20:27 2 test 2 12/29/ :20:31
Archive-Style Example 3: Delete Delete statement: delete from ggs.bp_test where id_value = 1; commit; Results: SQL> select * from ggs.bp_test source; ID_VALUE STRING_VALUE DATE_VALUE test 2 12/29/ :20:31 SQL> select * from ggs.bp_test archive; ID_VALUE STRING_VALUE DATE_VALUE update 12/29/ :20:27 2 test 2 12/29/ :20:31
Archive-Style Example 4: Truncate Truncate statement: truncate table ggs.bp_test; Results: SQL> select * from ggs.bp_test source; no rows selected SQL> select * from ggs.bp_test archive; ID_VALUE STRING_VALUE DATE_VALUE update 12/29/ :20:27 2 test 2 12/29/ :20:31
Rowid / Unused Column Information GoldenGate is able to replicate the rowid datatype without any problems. In other words, tables that have columns with the rowid datatype can be replicated with GoldenGate. (Note that Oracle Streams is not able to replicate the rowid datatype.) GoldenGate is able to replicate tables that have unused columns, as long as certain conditions are met. First, the DBOPTIONS ALLOWUNUSEDCOLUMN parameter must be set in the GoldenGate extract group. Also, if any columns get set to unused in a source table, then those columns must also manually get set to unused in the corresponding target table. (In other words, GoldenGate cannot replicate the alter table set unused command.)
Performance / Scalability Information GoldenGate has a relatively limited ability to do parallel processing; i.e. to use multiple CPUs as part of its replication. Basically, using parallel processing in GoldenGate involves creating multiple replicat groups – and then manually splitting up the overall processing between those groups. For example, you could create two replicat groups – and then have one group process table a while the other group processes table b. Of course, this will only work if there are no referential integrity constraints (or even logical constraints) between tables a and b. GoldenGate also has a very useful ability to do batch-style processing. Basically, the replicat utility has the ability to apply changes with array processing – rather than using row-at-a-time processing. This can dramatically improve the performance of replicat operations. From my preliminary tests, batch-style processing can reduce the overall replication latency by up to 67% - i.e., the latency can be reduced down to one third of its original, non-batch latency.