Published by Kristin Simpson. Modified over 9 years ago.
1
10-1 9 Oracle Data Integrator Changed Data Capture
2
10-2 Objectives
After completing this lesson, you will:
- Understand why CDC can be needed
- Understand the CDC infrastructure in ODI
- Know what types of CDC implementations are possible with ODI
- Know how to set up CDC
3
10-3 Introduction
- The purpose of Changed Data Capture (CDC) is to allow applications to process changed data only
- Loads will only process changes since the last load
- The volume of data to be processed is dramatically reduced
- CDC is extremely useful for near-real-time implementations, synchronization, and Master Data Management
4
10-4 CDC Techniques in General
Multiple techniques are available for CDC:
- Trigger based: ODI will create and maintain triggers to keep track of the changes.
- Log based: for some technologies (Oracle, AS/400), ODI can retrieve changes from the database logs.
- Timestamp based: if the data is timestamped, processes written with ODI can filter the data by comparing the timestamp value with the last load time. This approach is limited as it cannot process deletes. The data model must have been designed properly.
- Sequence number: if the records are numbered in sequence, ODI can filter the data based on the last value loaded. This approach is limited as it cannot process updates and deletes. The data model must have been designed properly.
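The limits of the timestamp and sequence-number techniques can be illustrated with a minimal sketch. This is plain Python rather than ODI code, and the row layout, function names, and sample values are all assumptions made for the illustration.

```python
from datetime import datetime


def changed_since(rows, last_load_time):
    """Timestamp-based filter: keep rows modified after the previous load."""
    return [r for r in rows if r["last_modified"] > last_load_time]


def new_since(rows, last_id_loaded):
    """Sequence-based filter: keep rows numbered after the last value loaded."""
    return [r for r in rows if r["id"] > last_id_loaded]


# Hypothetical source table as it looks now. Row 3 was deleted after the
# last load, so it is simply absent and neither filter can report it.
rows = [
    {"id": 1, "value": "a", "last_modified": datetime(2024, 1, 1)},
    {"id": 2, "value": "b2", "last_modified": datetime(2024, 1, 5)},  # updated
    {"id": 4, "value": "d", "last_modified": datetime(2024, 1, 6)},   # inserted
]

last_load_time = datetime(2024, 1, 3)
last_id_loaded = 3

by_timestamp = changed_since(rows, last_load_time)  # sees the update and the insert
by_sequence = new_since(rows, last_id_loaded)       # sees only the insert
```

In this sketch the timestamp filter returns rows 2 and 4 while the sequence filter returns only row 4, and the deletion of row 3 is invisible to both.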
5
10-5 CDC in ODI
- CDC in ODI is implemented through a family of KMs: the Journalization KMs (JKMs)
- These KMs are chosen and set in the model
- Once the journals are in place, the developer can choose from the interface whether to use the full data set or only the changed data
6
10-6 CDC Infrastructure in ODI
- CDC in ODI relies on a journal table
- This table is created by the KM and loaded by specific steps implemented by the KM
- This table has a very simple structure:
  - Primary key of the table being checked for changes
  - Timestamp to keep the change date
  - A flag to allow for a logical “lock” of the records
- A series of views is created to join this table with the actual data
- When other KMs need to select data, they will know to use the views instead of the tables
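To make the journal structure concrete, here is a minimal sketch using SQLite from Python as a stand-in database. The table, column, and view names (`j_customer`, `jrn_date`, `jrn_consumed`, `jv_customer`) are hypothetical and are not claimed to match the objects a JKM actually generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT);

-- Journal: primary key of the tracked table, change date, logical lock flag.
CREATE TABLE j_customer (
    cust_id      INTEGER NOT NULL,
    jrn_date     TEXT    NOT NULL,
    jrn_consumed INTEGER NOT NULL DEFAULT 0
);

-- View joining the journal with the actual data.
CREATE VIEW jv_customer AS
SELECT j.jrn_date, j.jrn_consumed, c.cust_id, c.name
FROM j_customer j
JOIN customer c ON c.cust_id = j.cust_id;
""")

conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [(1, "Ann"), (2, "Bob"), (3, "Cy")],
)
# Assume only customer 2 changed since the last load.
conn.execute(
    "INSERT INTO j_customer (cust_id, jrn_date) VALUES (2, '2024-01-05 10:00:00')"
)

# A consumer reads the view instead of the table and gets changed rows only.
changed = conn.execute("SELECT cust_id, name FROM jv_customer").fetchall()
```

Reading from the view rather than from `customer` is what restricts the extract to the journalized rows.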
7
10-7 CDC Strategies and Infrastructure
- Triggers will directly update the journal table with the changes.
- Log-based CDC will load the journal table when the changed data are loaded to the target system:
  - Update the journal table
  - Use the views to extract from the data tables
  - Proceed as usual
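The trigger-based strategy can be sketched as follows, again with SQLite from Python standing in for the source database. The trigger names, the journal layout, and the `I`/`U`/`D` flag values are assumptions for this illustration, not the code a JKM generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, status TEXT);
CREATE TABLE j_orders (order_id INTEGER, jrn_flag TEXT, jrn_date TEXT);

-- Each trigger writes the primary key of the changed row to the journal.
CREATE TRIGGER t_orders_ins AFTER INSERT ON orders BEGIN
    INSERT INTO j_orders VALUES (NEW.order_id, 'I', datetime('now'));
END;
CREATE TRIGGER t_orders_upd AFTER UPDATE ON orders BEGIN
    INSERT INTO j_orders VALUES (NEW.order_id, 'U', datetime('now'));
END;
CREATE TRIGGER t_orders_del AFTER DELETE ON orders BEGIN
    INSERT INTO j_orders VALUES (OLD.order_id, 'D', datetime('now'));
END;
""")

conn.execute("INSERT INTO orders VALUES (1, 'new')")
conn.execute("UPDATE orders SET status = 'shipped' WHERE order_id = 1")
conn.execute("DELETE FROM orders WHERE order_id = 1")

# The journal now holds one entry per change, in the order they happened.
journal = conn.execute(
    "SELECT order_id, jrn_flag FROM j_orders ORDER BY rowid"
).fetchall()
```

Because the delete trigger reads the `OLD` row, the journal records the deletion even though the row no longer exists in `orders`.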
8
10-8 Simple CDC Limitations
- One issue with CDC is that as changed data gets processed, more changes occur in the source environment
- As such, data transferred to the target environment may be missing references
- Example: process changes for orders and order lines
  - Load all the new orders in the target (11,000 to 25,000)
  - While we load these, 2 new orders come in: 25,001 and 25,002. These last two orders are not processed as part of this load; they will be processed with the next load.
  - Then load the order lines: by default, all order lines are loaded, including order lines for orders 25,001 and 25,002
  - The order lines for 25,001 and 25,002 are rejected by the target database (invalid foreign keys)
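The orders and order lines example can be replayed as a small simulation. This is a plain Python sketch in which in-memory collections stand in for the journals and the target database; the order line identifiers are made up for the illustration.

```python
# Orders 11,000 to 25,000 are waiting in the order journal.
order_journal = list(range(11000, 25001))

# Step 1: the order load takes its snapshot of the journal.
orders_to_load = list(order_journal)

# While the load runs, two new orders and their lines arrive in the source.
order_journal += [25001, 25002]
line_journal = [
    (1, 24999),  # (hypothetical line_id, order_id)
    (2, 25000),
    (3, 25001),
    (4, 25002),
]

# The order load finishes: only the snapshot reaches the target.
target_orders = set(orders_to_load)

# Step 2: all journalized order lines are loaded. The target enforces the
# foreign key order_line.order_id -> orders.order_id.
loaded, rejected = [], []
for line_id, order_id in line_journal:
    if order_id in target_orders:
        loaded.append((line_id, order_id))
    else:
        rejected.append((line_id, order_id))
```

The lines for orders 25,001 and 25,002 end up in `rejected` because their parent orders were not part of the snapshot.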
9
10-9 Consistent CDC
- The mechanisms put in place by Consistent CDC solve the issues faced with simple CDC
- The difference is that children records are locked before the parent records are processed
- As new parent and children records come in, both are ignored for the current load
10
10-10 Consistent CDC: Infrastructure Processing
Consistent Set CDC consists of the following 4 phases:
- Extend Window: compute the consistent parent/child sets and assign a sequence number to these sets.
- Lock Subscriber: for the application processing the changes, record the boundaries of the records to be processed (between sequence number xxx and sequence number yyy). Note that changes keep happening in the source environment, and other subscribers can be extending the window while we are processing the data.
- Unlock Subscriber: after processing the changes, unlock the subscriber (i.e. record the value of the last sequence number processed).
- Purge the journal: remove from the journal all the records that have been processed by all subscribers.
Note: all these steps can either be implemented in the Knowledge Modules or done separately, as part of the workflow management.
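The four phases can be modeled in a few lines of Python. This is a simplified sketch of the idea, not ODI's implementation: the `ConsistentJournal` class, its method names, and its in-memory data layout are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ConsistentJournal:
    entries: list = field(default_factory=list)         # journalized changes
    current_seq: int = 0                                 # last window number assigned
    last_processed: dict = field(default_factory=dict)   # subscriber -> last seq done
    locked_upto: dict = field(default_factory=dict)      # subscriber -> upper bound

    def record(self, table, pk):
        """A change arrives; it belongs to no window yet."""
        self.entries.append({"table": table, "pk": pk, "seq": None})

    def extend_window(self):
        """Group all pending parent/child changes under one new sequence number."""
        self.current_seq += 1
        for e in self.entries:
            if e["seq"] is None:
                e["seq"] = self.current_seq

    def lock_subscriber(self, sub):
        """Record the boundaries of the records this subscriber will process."""
        self.last_processed.setdefault(sub, 0)
        self.locked_upto[sub] = self.current_seq

    def changes_for(self, sub, table):
        lo, hi = self.last_processed[sub], self.locked_upto[sub]
        return [e["pk"] for e in self.entries
                if e["table"] == table
                and e["seq"] is not None and lo < e["seq"] <= hi]

    def unlock_subscriber(self, sub):
        """Record the last sequence number processed."""
        self.last_processed[sub] = self.locked_upto.pop(sub)

    def purge(self):
        """Drop entries that every known subscriber has processed."""
        if not self.last_processed:
            return
        done = min(self.last_processed.values())
        self.entries = [e for e in self.entries
                        if e["seq"] is None or e["seq"] > done]


j = ConsistentJournal()
j.record("orders", 25000)
j.record("order_lines", 1)
j.extend_window()
j.lock_subscriber("SUNOPSIS")

# These changes arrive during processing and fall outside the locked window.
j.record("orders", 25001)
j.record("order_lines", 2)

orders = j.changes_for("SUNOPSIS", "orders")
lines = j.changes_for("SUNOPSIS", "order_lines")
j.unlock_subscriber("SUNOPSIS")
j.purge()
remaining = [(e["table"], e["pk"]) for e in j.entries]
```

In this run the subscriber sees order 25000 and line 1 together, while order 25001 and line 2 stay in the journal for the next window.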
11
10-11 Using CDC
- Set a JKM in your model
- For all the following steps, right-click on a table to process just that table, or right-click on the model to process all tables of the model:
  - Add the table to the CDC infrastructure: right-click on a table and select Changed Data Capture / Add to CDC
  - For consistent CDC, arrange the datastores in the appropriate order (parent/child relationship): in the model definition, select the Journalized tables tab and click the Reorganize button
  - Add the subscriber (the default subscriber is SUNOPSIS): right-click on a table and select Changed Data Capture / Add subscribers
  - Start the journals: right-click on a table and select Changed Data Capture / Start Journal
12
10-12 View Data / Changed Data
- Data and changed data can be viewed from the model and from the interfaces
- In the model, right-click on the table name and select Data to view the data, or Changed Data Capture / Journal Data to view the changes
- From the interface, click on the caption of the journalized source table and select or deselect Journalized data only to view only the changes or all the data
13
10-13 Using Journalized Tables
- Keep in mind that only one journalized table can be used per interface
- If you were to use two journalized tables, there is a very high likelihood that the data sets would be disjoint. No data would be loaded as a result.
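A small Python sketch shows why joining two journalized sources tends to return nothing. The customer and order identifiers are made up for the illustration.

```python
# Hypothetical full source data.
all_customers = {101: "Ann", 202: "Bob"}
all_orders = [(5001, 202), (5002, 101)]  # (order_id, cust_id)

# Changes captured independently for each table since the last load.
changed_customers = {101}        # customer 101 was updated
changed_orders = [(5001, 202)]   # order 5001 (for customer 202) was inserted

# Join using journalized data on BOTH sides: the changed order's customer
# did not change, so the two change sets do not overlap.
both_journalized = [
    (order_id, cust_id)
    for order_id, cust_id in changed_orders
    if cust_id in changed_customers
]

# Join using journalized orders against the FULL customer table.
one_journalized = [
    (order_id, all_customers[cust_id])
    for order_id, cust_id in changed_orders
    if cust_id in all_customers
]
```

Here `both_journalized` is empty, whereas journalizing only the orders side still yields the changed order with its customer.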