Oracle Corporation
Oracle Change Data Capture
Jack Raitto, Development Manager, Oracle NEDC
NYOUG Long Island SIG, October 7, 2004

Capture your change data for FREE!*
* Zero additional license cost over Oracle10g EE; virtually zero source system processing cost

What is Oracle CDC?
- Captures change data from operational system(s) as it occurs
- Part of the Extract / Transform / Load (ETL) process for decision support (DSS) and data warehouse systems, and potentially other applications
- Optimizes the extract phase
- Unleashes the power of SQL for transformations
- Provides a management framework for change data

How was it done before (the old way)?
Method | Major issues
Application logging / triggers | Maintenance, transaction impacts
Timestamp / change key column | Application design and performance impact, no before image
Table differencing | Impractical for large tables, high transport costs, not timely
Log sniffing | Not supported, does not track DB releases, security issues, rocket science

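Table differencing is costly, and it also loses information. The Python sketch below shows both problems. It is plain Python, not Oracle code, and `diff_snapshots` and the snapshot layout are hypothetical. It derives inserts, updates and deletes by comparing two full snapshots keyed by primary key.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, ...]
Change = Tuple[str, int, Row]  # (operation, primary key, row image)


def diff_snapshots(old: Dict[int, Row], new: Dict[int, Row]) -> List[Change]:
    """Compare two full table snapshots keyed by primary key (hypothetical helper).

    Returns 'I' for inserts (new row), 'U' for updates (after image only),
    and 'D' for deletes (last known image), sorted by key. Every row of both
    snapshots must be read, which is why this scales poorly.
    """
    changes: List[Change] = []
    for key, row in new.items():
        if key not in old:
            changes.append(("I", key, row))
        elif old[key] != row:
            changes.append(("U", key, row))
    for key, row in old.items():
        if key not in new:
            changes.append(("D", key, row))
    return sorted(changes, key=lambda c: c[1])


if __name__ == "__main__":
    yesterday = {123: ("Smith", "Frank"), 124: ("Jones", "Mary"), 125: ("Stein", "Linda")}
    today = {123: ("Smith", "Frank"), 124: ("Jones", "Maria"), 126: ("Vine", "Abe")}
    for change in diff_snapshots(yesterday, today):
        print(change)
```

The diff cannot report several things:
- who made each change;
- which changes belong to the same transaction;
- any change that was made and then reverted between the two snapshots.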
CDC Advantages
- Built in, custom fit, evolves with the database
- Delivers change data when you need it, where you need it
- Offers several tradeoffs between timely change delivery and source system overhead (sync, async HotLog, async AutoLog, etc.)
- Assumes complete responsibility for change management

CDC Advantages (concl.)
- Captures all change data along with transaction information: see all changes a given transaction made and who made them
- Guarantees transactional consistency for changes across multiple source tables
- Transparently coordinates sharing of change data across users and applications
- You don't need rocket scientists on your staff!

CDC Configurations
 | Sync CDC | Async CDC HotLog | Async CDC AutoLog
Available | Oracle9i EE, Oracle10g SE | Oracle10g EE | Oracle10g EE
Source system cost | Transaction delay, system resources | System resources | Minimal (~2%)
Part of transaction | Yes | No | No
Latency | Real time | Near real time | Varies with topology, checkpoint and log switch interval
Systems | 1 | 1 | 2

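The trade-offs in this configuration comparison can be read as a filter over the three modes. The Python sketch below encodes the comparison as data. `CdcMode`, `candidate_modes` and the three constraints are hypothetical illustrations, not an Oracle tool. A real choice also depends on edition, topology and workload.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CdcMode:
    name: str
    part_of_transaction: bool  # capture work runs inside the source transaction
    real_time: bool            # "Real time" latency in the comparison
    systems: int               # number of systems the configuration uses


# Values transcribed from the comparison; the selector below is hypothetical.
MODES = [
    CdcMode("Sync", part_of_transaction=True, real_time=True, systems=1),
    CdcMode("Async HotLog", part_of_transaction=False, real_time=False, systems=1),
    CdcMode("Async AutoLog", part_of_transaction=False, real_time=False, systems=2),
]


def candidate_modes(need_real_time: bool, allow_capture_in_txn: bool, max_systems: int) -> List[str]:
    """Return the names of the modes that satisfy all three constraints."""
    return [
        m.name
        for m in MODES
        if (m.real_time or not need_real_time)
        and (allow_capture_in_txn or not m.part_of_transaction)
        and m.systems <= max_systems
    ]


if __name__ == "__main__":
    print(candidate_modes(need_real_time=False, allow_capture_in_txn=False, max_systems=2))
```

Under these assumptions, the only way to get real-time change data is to accept capture inside the source transaction.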
How CDC Works: Sync CDC
- Uses internal triggers to capture before and/or after images of new and updated rows
- Has the same performance implications as capture via user triggers
- Delivers change data in real time
- Uses the same interface as async CDC

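Here is a toy Python model of trigger-based capture. The class is hypothetical and does not reflect Oracle internals. Each DML call writes the source row and its change rows in the same unit of work. It uses the I / UO / UN / D operation codes that appear in the subscriber view. Because capture happens inside the transaction, a rollback removes both the source change and the captured rows, and every transaction pays for the extra writes.

```python
from contextlib import contextmanager
from typing import Dict, Iterator, List, Tuple

Row = Tuple[str, ...]
Change = Tuple[str, int, Row]  # (operation code, key, row image)


class SyncCapturedTable:
    """Toy model of synchronous (trigger-style) capture; hypothetical, not Oracle code."""

    def __init__(self) -> None:
        self.rows: Dict[int, Row] = {}
        self.change_table: List[Change] = []

    @contextmanager
    def transaction(self) -> Iterator["SyncCapturedTable"]:
        saved_rows = dict(self.rows)
        saved_len = len(self.change_table)
        try:
            yield self
        except Exception:
            # Source rows and captured rows were written in the same
            # transaction, so a rollback undoes both together.
            self.rows = saved_rows
            del self.change_table[saved_len:]
            raise

    def insert(self, key: int, row: Row) -> None:
        self.rows[key] = row
        self.change_table.append(("I", key, row))

    def update(self, key: int, row: Row) -> None:
        before = self.rows[key]
        self.rows[key] = row
        self.change_table.append(("UO", key, before))  # update before image
        self.change_table.append(("UN", key, row))     # update after image

    def delete(self, key: int) -> None:
        before = self.rows.pop(key)
        self.change_table.append(("D", key, before))


if __name__ == "__main__":
    table = SyncCapturedTable()
    with table.transaction():
        table.insert(123, ("Smith", "Frank"))
        table.update(123, ("Smith", "Frankie"))
    print(table.change_table)
```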
Synchronous CDC (diagram)
Order and Customer tables reside on a combined source / operational BI system. Triggers write their changes to CDC change tables. A CDC ETL process then upserts to load dimension tables and uses direct path inserts to load fact tables.

How CDC Works: Async CDC
- Relational interface to Streams: a prepackaged Streams application
- Asynchronously captures change data from redo/archive logs
- Presents a relational interface to the change data stream
- Can operate on the source system (HotLog) or on a staging system (AutoLog)

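Asynchronous capture, by contrast, reads the log after the fact. The sketch below is a simplified Python model. It assumes a made-up redo record layout; real redo is read through LogMiner and Streams, not like this. Changes are buffered per transaction and published only at commit, each stamped with the commit SCN. As a result, rolled-back work never appears and the source transaction never waits on capture.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical, simplified redo record: (scn, txn_id, kind, payload).
# kind is "DML" with payload (operation code, key), or "COMMIT" / "ROLLBACK" with payload None.
RedoRecord = Tuple[int, str, str, Optional[Tuple[str, int]]]
ChangeRow = Tuple[int, str, str, int]  # (commit SCN, txn_id, operation code, key)


def mine_redo(records: List[RedoRecord]) -> List[ChangeRow]:
    """Toy asynchronous capture: scan redo in SCN order, publish committed work only."""
    pending: Dict[str, List[Tuple[str, int]]] = {}
    published: List[ChangeRow] = []
    for scn, txn, kind, payload in sorted(records, key=lambda r: r[0]):
        if kind == "DML" and payload is not None:
            pending.setdefault(txn, []).append(payload)
        elif kind == "COMMIT":
            # Release the whole transaction at once, stamped with its commit SCN.
            for op, key in pending.pop(txn, []):
                published.append((scn, txn, op, key))
        elif kind == "ROLLBACK":
            pending.pop(txn, None)  # rolled-back changes never reach the change table
    return published
```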
Foundations of Async CDC (diagram: Async CDC is built on Streams, which is built on LogMiner)
- Async CDC: change capture, change management, warehouse loading
- Streams: replication, message queuing, warehouse loading, event notification, data protection
- LogMiner: redo log inspection, debugging, auditing, reversing transactions

Asynchronous CDC HotLog (diagram)
Order and Customer tables reside on a combined source / operational BI system. LogMiner reads the active redo log, and Streams delivers the changes to CDC change tables. A CDC ETL process then upserts to load dimension tables and uses direct path inserts to load fact tables.

Asynchronous CDC AutoLog (diagram)
Order and Customer tables reside on the source database, which writes redo logs. The Arch process delivers archived redo logs to a separate data warehouse / staging system. There, LogMiner and Streams populate CDC change tables. A CDC ETL process then upserts to load dimension tables and uses direct path inserts to load fact tables.

Using CDC: Publish/Subscribe
- The publisher supplies change data; subscribers consume it
- The model allows sharing of change data across users and applications
- Coordinates retention / purge of change data
- Prevents an application from accidentally processing change data more than once
- Guarantees transactional consistency of change data across source tables via change sets

Using CDC: Publish/Subscribe (diagram)
The publisher's publication describes table Cust:
Table | Column | Type
Cust | CustNo | number
Cust | Last | varchar
Cust | First | varchar
Change data:
CustNo | Last | First
123 | Smith | Frank
124 | Jones | Mary
125 | Stein | Linda
126 | Vine | Abe
127 | Block | Greg
Subscriber 1 subscription: 123 Smith Frank; 124 Jones Mary; 125 Stein Linda
Subscriber 2 subscription: 125 Stein Linda; 126 Vine Abe; 127 Block Greg

Publisher Concepts
- Change source: defines the source system to CDC
- Change set: collection of source tables for which transactionally consistent change data is needed
- Change table: container that receives change data; it is published to subscribers

Publisher Concepts (example)
Source database: HQ
  Source table sh.sales: PROD_ID, CUST_ID, PROMO_ID, AMOUNT_SOLD, QUANTITY_SOLD
  Source table sh.promotions: PROMO_ID, PROMO_SUBCAT, PROMO_CAT, PROMO_COST
Staging database: DW
  Change source: HQ_SRC
    Change set: SH_SET
      Change table sales_ct: PROD_ID, CUST_ID, PROMO_ID, AMOUNT_SOLD
      Change table promo_ct: PROMO_ID, PROMO_SUBCAT, PROMO_CAT

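The publisher objects in this example can be modeled as a small hierarchy. This Python sketch is illustrative only. The classes and the `add_table` check are hypothetical and are not `DBMS_CDC_PUBLISH`. The sketch builds SH_SET on HQ_SRC and checks that each change table publishes a subset of its source table's columns. In the example, sales_ct omits QUANTITY_SOLD and promo_ct omits PROMO_COST.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ChangeTable:
    name: str
    source_table: str
    columns: List[str]  # published columns: a subset of the source table's columns


@dataclass
class ChangeSet:
    name: str
    change_source: str
    tables: Dict[str, ChangeTable] = field(default_factory=dict)

    def add_table(self, table: ChangeTable, source_columns: List[str]) -> None:
        """Hypothetical check: reject columns the source table does not have."""
        unknown = [c for c in table.columns if c not in source_columns]
        if unknown:
            raise ValueError(f"{table.name}: columns not in {table.source_table}: {unknown}")
        self.tables[table.name] = table


# Source table definitions from the HQ example.
SOURCE_COLUMNS = {
    "sh.sales": ["PROD_ID", "CUST_ID", "PROMO_ID", "AMOUNT_SOLD", "QUANTITY_SOLD"],
    "sh.promotions": ["PROMO_ID", "PROMO_SUBCAT", "PROMO_CAT", "PROMO_COST"],
}


def build_sh_set() -> ChangeSet:
    sh_set = ChangeSet(name="SH_SET", change_source="HQ_SRC")
    sh_set.add_table(
        ChangeTable("sales_ct", "sh.sales", ["PROD_ID", "CUST_ID", "PROMO_ID", "AMOUNT_SOLD"]),
        SOURCE_COLUMNS["sh.sales"],
    )
    sh_set.add_table(
        ChangeTable("promo_ct", "sh.promotions", ["PROMO_ID", "PROMO_SUBCAT", "PROMO_CAT"]),
        SOURCE_COLUMNS["sh.promotions"],
    )
    return sh_set
```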
Publish Package: DBMS_CDC_PUBLISH
- CREATE / ALTER / DROP_AUTOLOG_CHANGE_SOURCE
- CREATE / ALTER / DROP_CHANGE_SET
- CREATE / ALTER / DROP_CHANGE_TABLE
- PURGE
- PURGE_CHANGE_SET
- PURGE_CHANGE_TABLE
- DROP_SUBSCRIPTION

Using Change Data: Subscribers
- The subscriber creates a subscription from an available publication
- The subscription provides a moving window (view) onto the change data
- Each subscription covers a single change set and is therefore transactionally consistent
- When all subscribers have advanced past old change data, CDC purges it automatically and efficiently

Subscriber Concepts (example)
Staging database: DW
  Change set: SH_SET
    Publication on sh.sales: PROD_ID, CUST_ID, PROMO_ID, AMOUNT_SOLD
    Publication on sh.promotions: PROMO_ID, PROMO_SUBCAT, PROMO_CAT
  Subscription: sales_promo_list
    Subscriber views: spl_sales, spl_promos

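Subscriber View (example: spl_sales)
Columns: OPERATION$, CSCN$, USERNAME$, PROD_ID, CUST_ID, PROMO_ID
OPERATION$ codes: I = insert, UO = update before image, UN = update after image, D = delete
Sample rows (OPERATION$ / numeric column values / USERNAME$):
I / 587322 / GRIFFIN
UO / 587482 / SLOAN
UN / 587482 / SLOAN
I / 594312 / BRIGGS
I / 602311 / GRIFFIN
D / 711413 / SLOAN
I / 796122 / BRIGGS
I / 796122 / BRIGGS
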
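A typical consumer of a subscriber view is the upsert step that loads a dimension table. The Python sketch below rests on two hypothetical simplifications: a change row of (OPERATION$, CSCN$, key, value), and a dict standing in for the target table. Inserts and update after images are upserted, and deletes remove the key. Update before images are skipped because they describe the old state.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical simplified change row: (OPERATION$, CSCN$, key, value).
ChangeRow = Tuple[str, int, int, str]


def apply_changes(target: Dict[int, str], changes: Iterable[ChangeRow]) -> Dict[int, str]:
    """Apply change rows in commit-SCN order to a copy of the target table."""
    result = dict(target)
    # sorted() is stable, so rows sharing a CSCN$ keep their input order.
    for op, _cscn, key, value in sorted(changes, key=lambda c: c[1]):
        if op in ("I", "UN"):
            result[key] = value      # insert or update after image: upsert
        elif op == "D":
            result.pop(key, None)    # delete
        elif op == "UO":
            continue                 # before image: useful for auditing, not for loading
        else:
            raise ValueError(f"unknown operation code {op!r}")
    return result
```

Within one commit SCN the sketch relies on input order. A real loader needs a reliable ordering of changes within each transaction.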
Subscriber Package: DBMS_CDC_SUBSCRIBE
- CREATE_SUBSCRIPTION
- SUBSCRIBE
- ACTIVATE_SUBSCRIPTION
- EXTEND_WINDOW
- PURGE_WINDOW
- DROP_SUBSCRIPTION

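The moving window and purge behavior can be pictured with a toy model. The Python class below is hypothetical. Its method names only echo EXTEND_WINDOW and PURGE_WINDOW, and it simplifies Oracle's actual bookkeeping.

Each subscriber has an SCN window (low, high]:
- extending moves high to the newest change;
- purging the window moves low up to high;
- a row is removed once every subscriber's low bound has passed it.

Because one window bounds every table in the change set, each read returns a transactionally consistent cut.

```python
from typing import Dict, List, Tuple


class ChangeSetModel:
    """Toy model of subscription windows over one change set (not the Oracle API)."""

    def __init__(self) -> None:
        self.rows: List[Tuple[int, str, str]] = []    # (commit SCN, change table, change)
        self.windows: Dict[str, Tuple[int, int]] = {}  # subscriber -> (low, high], low exclusive

    def publish(self, cscn: int, table: str, change: str) -> None:
        self.rows.append((cscn, table, change))

    def subscribe(self, subscriber: str) -> None:
        self.windows[subscriber] = (0, 0)  # window starts empty

    def extend_window(self, subscriber: str) -> None:
        # Echoes EXTEND_WINDOW: move the upper bound to the newest change.
        low, high = self.windows[subscriber]
        newest = max((c for c, _, _ in self.rows), default=high)
        self.windows[subscriber] = (low, max(high, newest))

    def view(self, subscriber: str, table: str) -> List[str]:
        # The same SCN bounds apply to every table in the set.
        low, high = self.windows[subscriber]
        return [chg for c, t, chg in self.rows if t == table and low < c <= high]

    def purge_window(self, subscriber: str) -> None:
        # Echoes PURGE_WINDOW: the subscriber is done with everything up to high.
        _, high = self.windows[subscriber]
        self.windows[subscriber] = (high, high)

    def purge(self) -> None:
        # Drop rows that every subscriber has already moved past.
        if self.windows:
            floor = min(low for low, _ in self.windows.values())
            self.rows = [r for r in self.rows if r[0] > floor]
```

In this model a subscriber's cycle is: extend the window, read the views, load the data, purge the window. Repeating that cycle never returns the same row twice.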
Security
- A sync publisher must have SELECT access to the source table
- An async publisher must have the EXECUTE_CATALOG_ROLE role
- The publisher uses GRANT and REVOKE on change tables to control subscriber access

Performance Benchmark*
- Objectives: determine the impact on transaction time; determine latency
- Source system: Oracle10g R1 Beta on a Sun Fire 4800 SMP (8 x 900 MHz / 16 GB) with 8 striped Sun StorEdge T3 arrays (9 x 36.4 GB each)
- Customer insurance quote OLTP application run at Oracle: 250 concurrent users / 175 TPS, system "warmed up" (steady state)
- Mixture of inserts, updates, deletes, singleton selects, cursor fetches, rollbacks / commits, and savepoints
- Changes captured on all tables
* Your mileage will vary!

Transaction Performance
- Transaction elongated by 10%
- Relative impact varies depending on other overhead

Transaction Performance
- Transaction elongated by 8%
- Elongation can be reduced by adding RAC nodes / CPUs

Transaction Performance
- Transaction elongation virtually eliminated
- Change capture processing moved off the source system

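The slides do not define "elongation". A natural reading, assumed here, is the relative increase in transaction response time once capture is enabled:

```latex
E = \frac{T_{\text{capture}} - T_{\text{baseline}}}{T_{\text{baseline}}},
\qquad
T_{\text{capture}} = (1 + E)\,T_{\text{baseline}}
```

Under this reading, a hypothetical 200 ms transaction would take 1.10 x 200 = 220 ms when elongated by 10%, and 1.08 x 200 = 216 ms when elongated by 8%.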
HotLog Latency Performance
- About half of the change data arrived within 1 second
- Virtually all of the change data arrived within 2 seconds

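Statements such as "about half arrived within 1 second" are points on a cumulative latency distribution. The slides do not say how latency was measured. This sketch assumes one latency sample per change: the time the change became visible in the change table minus its source commit time. Under that assumption, a Python version of the calculation is below; `fraction_within` is a hypothetical helper.

```python
from bisect import bisect_right
from typing import Sequence


def fraction_within(latencies_s: Sequence[float], threshold_s: float) -> float:
    """Fraction of change rows whose capture latency is <= threshold_s seconds."""
    if not latencies_s:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_s)
    # bisect_right counts the samples less than or equal to the threshold.
    return bisect_right(ordered, threshold_s) / len(ordered)
```

Evaluating it at 1 and 2 seconds over the benchmark's samples would give the two figures quoted on the slide.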
Summary
- CDC assumes the burden of change capture for you
- Change data is guaranteed consistent and complete
- Change data can be shared across users and applications effortlessly
- CDC delivers change data where you need it, when you need it, and with minimal overhead

For More Information
- Oracle Data Warehousing Guide, 10g R1, Chapter 16
- Oracle PL/SQL Packages and Types Reference, 10g R1, packages DBMS_CDC_*
- cle/03-nov/o63tech_bi.html
- db/10g/pdf/twp_dss_ontime_etl_10gr1_0304.pdf (Oracle9i)

Questions?