Introduction to OpenEdge Change Data Capture

Introduction to OpenEdge Change Data Capture June 5, 2017 Rakhi Grover, Rama Murthy, Garry Hall Progress

Agenda
- Introduction - Change Data Capture
- Overview - OpenEdge Change Data Capture
- OpenEdge Change Data Capture policies
- Tracking changes and capturing change data
- Configuring capture detail
- Configuring CDC using OpenEdge Management
- ABL APIs
- Processing captured changes (ETL)
- Q&A

Introduction - Change Data Capture
Why do we need to capture changed data?
- Data warehousing: a consolidated repository of data from various sources, used for making strategic business decisions.
- Bulk-loading all data into the data warehouse is time-consuming and brings in data that is irrelevant to business needs.
- A tailored approach is needed that lets businesses capture only the data that has changed and is of interest.
[Diagram: data sources (data files, OLTP source tables) are loaded into the data warehouse, which feeds OLAP analysis, reporting, and data mining.]

Introduction - Change Data Capture
- Acquisition of modified data from OLTP sources, limited to tables of interest.
- Data changes may be stored in logs or relational tables.
- An optional ETL (Extract, Transform, Load) tool can transform the captured data further.
- Capture process options: trigger-based, transaction log, delta file.
[Diagram: a capture process reads changed data from the OLTP source tables; an ETL process loads it into the data warehouse, which feeds OLAP analysis, reporting, and data mining.]
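To make the trigger-based option concrete, here is a minimal sketch that uses SQLite (through Python's standard sqlite3 module) rather than OpenEdge. The table and column names (customer, customer_changes, change_seq) are hypothetical and chosen only for illustration; OpenEdge CDC uses its own internal triggers and schema.

```python
import sqlite3

# In-memory database standing in for an OLTP source (not OpenEdge).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (cust_num INTEGER PRIMARY KEY, name TEXT, city TEXT);

-- Hypothetical change table: one row per captured record operation.
CREATE TABLE customer_changes (
    change_seq INTEGER PRIMARY KEY AUTOINCREMENT,
    operation  TEXT,
    cust_num   INTEGER,
    city       TEXT
);

-- One trigger per record operation writes a row to the change table.
CREATE TRIGGER customer_ins AFTER INSERT ON customer
BEGIN
    INSERT INTO customer_changes (operation, cust_num, city)
    VALUES ('create', NEW.cust_num, NEW.city);
END;

CREATE TRIGGER customer_upd AFTER UPDATE ON customer
BEGIN
    INSERT INTO customer_changes (operation, cust_num, city)
    VALUES ('update', NEW.cust_num, NEW.city);
END;

CREATE TRIGGER customer_del AFTER DELETE ON customer
BEGIN
    INSERT INTO customer_changes (operation, cust_num, city)
    VALUES ('delete', OLD.cust_num, OLD.city);
END;
""")

# Ordinary OLTP activity; the triggers capture each change as a side effect.
conn.execute("INSERT INTO customer VALUES (2, 'Brooks', 'Hollis')")
conn.execute("UPDATE customer SET city = 'Acton' WHERE cust_num = 2")
conn.execute("DELETE FROM customer WHERE cust_num = 2")

changes = conn.execute(
    "SELECT operation, cust_num, city FROM customer_changes ORDER BY change_seq"
).fetchall()
print(changes)
```

An extraction job would then read only customer_changes instead of rescanning the whole source table.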

OpenEdge Change Data Capture
- Trigger-based capture
- Capture policies
- Change Tracking and Change Data Capture
- ETL languages: SQL and ABL
- Requires feature enablement
[Diagram: record operations on OLTP source tables fire CDC database triggers, which write to change tables.]

OpenEdge Change Data Capture - Overview
[Diagram: CDC is enabled on a source table (Customer, user data). Record operations (create, update, delete) pass through the CDC internal database triggers, which use the CDC policies cache (_Cdc-Table-Policy, _Cdc-Field-Policy). Changes are written to the change tracking table (_Cdc-Change-Tracking) and to the change table for Customer, CDC_Customer (captured data). An ETL process moves the data through a staging area into the data warehouse.]

OpenEdge Change Data Capture – Schema
- Source tables: user tables that need to be enabled for change data capture.
- Change Data Capture policy tables: store CDC policies.
  - Table policy table: _Cdc-Table-Policy
  - Field policy table: _Cdc-Field-Policy
- Change tracking table: _Cdc-Change-Tracking
- Change tables: one change table for each source table.
- Together, the change tracking table and the change tables store change tracking information and captured data.

Change Data Capture – Policies
- Policies define what information will be tracked and captured against a source table.
- CDC policies come in two kinds: CDC table policies and CDC field policies.
- Policies are created and modified through OpenEdge Management and the ABL API.
- Policies are stored in policy tables:
  - CDC table policy table: _Cdc-Table-Policy
  - CDC field policy table: _Cdc-Field-Policy

Change Data Capture Policy table (_Cdc-Table-Policy) and its indexes
Fields: _Policy-Id, _Policy-Name, _Policy-Desc, _Policy-State, _Policy-Instance, _Source-File-Recid, _Area-Ianum, _Area_Index-Ianum, _Change-Tablename, _ObjectId, _Identifying-Fields, _Level, _Change-Table-Owner, _Encrypt-Policy, _Last-Modified, First-User-Field, _Misc
Indexes:
- _Policy-Id (unique, primary): _Policy-Id
- _Policy-Name (unique): _Policy-Name
- _Policy-Source-recid: _Source-File-Recid, _Policy-Instance

Change Data Capture – Field Policies
Change Data Capture Field Policy table (_cdc-Field-Policy) and its indexes
Fields: _Policy-Id, _Field-Position, _Identifying-Fields, _Field-Recid, _Misc
Indexes:
- _Policy-Id (unique, primary): _Policy-Id, _Field-Position
- _Identifying-Field (unique): _Policy-Id, _Identifying-Field
- _Field-Recid (unique): _Policy-Id

Table Policy – Field policy relationship
Customer (source table), by field position #:
  2 CustNum, 3 Name, 4 Address, 5 Address2, 6 City, 7 State, 8 Country, 9 Phone
Table policies (_cdc-Table-Policy):
  Columns: _Policy-Id, _Policy-State, _Level, _Change-Tablename, _Identifying-Fields
  Values shown on the slide: uMzutjNmIrUlFObr2B, 3, CDC_customer, 1
Field policies (_cdc-Field-Policy):
  Columns: _Policy-Id, _Field-Position
  _Policy-Id uMzutjNmIrUlFObr2B with _Field-Position values 2, 4, 6, 8

Change Data Capture Tracking Table - _cdc-Change-Tracking
Change tracking table and its indexes
Fields: _Policy-Id, _Tran-Id, _Time-Stamp, _Change-Sequence, _Operation, _Change-Field-Map, _Recid, _Source-Table-Number, _Partition-Id, _Tenant-Id, _Version, _Misc
Indexes:
- _Sequence-Id (unique, primary): _Source-Table-Number, _Change-Sequence
- _Time-Stamp-Sequence (non-unique): _Source-Table-Number, _Time-Stamp, _Change-Sequence
- _Part-Rec-Id (unique): _Source-Table-Number, _Partition-Id, _Recid, _Change-Sequence

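Change Data Capture Tracking Table
[Sample data for _Cdc-Change-Tracking. Columns: _Policy-Id, _Tran-Id, _Time-Stamp, _Operation, _Change-Sequence, _Change-Field-Map, _Source-Table-Number. The slide shows rows for two policies, uMzutjNmIrUlFObr2B and jhgNUluMzTYbrmIr6U, with timestamps from 05/20/2017 to 08/26/2017, annotated with the operations Create, Update, and Delete and the source tables Customer and Order. The alignment of individual cell values is not recoverable from the transcript.]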

Change Data Capture – Change Tables
- A change table is created when a source table is enabled for CDC (that is, when a policy is defined).
- There is one change table for each source table; it stores a subset of the source table's data.
- No change table is created for CDC policy level 0.
- One record is inserted for each create or delete operation. Two records are inserted for an update operation if the policy level is the maximum (3).
- Change tables contain metadata columns and user columns.
Change table CDC_customer:
- Metadata columns: _Tran_id, _Time-Stamp, _Change-Sequence, _Continuation-Position, _ArrayIndex, _Fragment
- Captured user-defined columns: Capture Col1, Capture Col2, Col3

Change Data Capture – Change Table
Customer (source table), by field position #:
  2 CustNum, 3 Name, 4 Address, 5 Address2, 6 City, 7 State, 8 Country, 9 Phone
Table _cdc-Field-Policy:
  Columns: _Policy-Id, _Field-Position, _Identifying-Field
  _Policy-Id uMzutjNmIr with _Field-Position values 2, 4, 6, 8; the slide shows _Identifying-Field 1 for position 2
Change table CDC_customer:
  Columns: _Tran_id, _Time-Stamp, _Change-Sequence, _Continuation-Position, _ArrayIndex, CustNum, Country, Address, City
  Indexes:
  - _Identifying-Index: CustNum, Country, _Change-Sequence
  - _Change-Sequence-Id (unique, primary): _Change-Sequence, _Operation, _Continuation-Position
  - _Time-Sequence: _Time-Stamp, _Change-Sequence

How do I configure "Capture detail"?
Source row (Customer): CustNum 2, Name Brooks, Address 1 Oak dr, City Hollis, State NH, Country USA
Statement: Update Customer Set Address="2 Hickory", City="Acton" where CustNum=2;
Capture detail is set by _Cdc-Table-Policy._Level: 1, 2, or 3
Change tracking table (_Cdc-Change-Tracking): _Policy-Id uMzutjNmIr, _Operation 4, _Change-Sequence 1, _Change-Field-Map 0011
Change table (CDC_Customer), columns _Tran-Id, _Operation, _Change-Sequence, CustNum, Country, Address, City:
- _Tran-Id 345, _Operation 4, _Change-Sequence 1: Address "2 Hickory", City "Acton"
- _Operation 3: Address "1 Oak dr.", City "Hollis"
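The slide's field-map value 0011 can be read as one flag per captured column. The sketch below shows that reading only. The slides do not describe the real storage format of _Change-Field-Map, so the string encoding, the left-to-right order, and the function name changed_columns are all assumptions made for illustration.

```python
def changed_columns(field_map: str, columns: list[str]) -> list[str]:
    """Return the columns whose flag is '1'.

    Assumption (not confirmed by the slides): the map is a string of
    '0'/'1' flags, read left to right in the change table's column order.
    """
    if len(field_map) != len(columns):
        raise ValueError("field map and column list must be the same length")
    return [name for flag, name in zip(field_map, columns) if flag == "1"]


# Captured columns of CDC_Customer in the order shown on the slide.
captured = ["CustNum", "Country", "Address", "City"]

# The UPDATE on the slide changed Address and City.
changed = changed_columns("0011", captured)
print(changed)
```

In ABL and SQL this decoding is already provided by the CDCTrackingHelper class and the CDC_get_changed_columns function mentioned later in the deck, so hand-written decoding like this should not normally be needed.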

Configuring Policies

Configuring CDC Policies in OpenEdge
- OpenEdge Management/OpenEdge Explorer
- Change Data Capture ABL API

Enabling CDC

Enabling CDC for a Database
- The CDC feature can be enabled using the Data Administration console in OEM/OEE.
- The CDC feature can be enabled while the database is online or offline.
[Screenshots: before CDC is enabled; after CDC is enabled]

Policy Configuration and Governance

Configuring Level 0 Policy

Configuring Level >0 Policies Change Table Properties

CDC Field Policies
- At least one field policy is required.
- Unlimited field policies are allowed.
- Change data will be captured only for selected fields.

Setting Identifying Fields
- Up to 15 identifying fields are allowed.
- Provide the field order.
- Select YES to enable the identifying field on the field policy.

Viewing List of CDC Policies

Activate/Deactivate CDC Policies
Policies can be activated or deactivated:
- Individually
- In bulk

Generate CDC Policy Program

Generating Policy Program
- Generates a .p file with the supplied CDC details.
- The program can be generated before or after the policy is submitted.

Dump and Load CDC Policies

Dumping CDC Policies
- The list of existing policies can be dumped to a .cd file.
- Dump status can be monitored.

Loading CDC Policies
- CDC policies can be loaded from a .cd file.
- Acceptable Error Percentage
[Screenshot callouts: indicates an error while loading; indicates success]

ABL API for CDC

ABL API for CDC
- CDC table policies: create, list, edit, delete
- CDC field policies
- Dump and load CDC policies: dump, load
- CDC ABL API Reference Guide: https://documentation.progress.com/output/ua/OpenEdge_latest/index.html#page/dvpin/change-data-capture-abl-references.html

Processing Captured Changes

Processing Captured Changes
- ETL (Extract, Transform, Load): CDC facilitates the extraction part of ETL.
- Many tools and frameworks exist for doing ETL/BI.
- OpenEdge Analytics 360
- An alternative to trigger-based replication: Pro2 can use CDC.
- For more information on Analytics 360 or Pro2:
  - OpenEdge Analytics 360 Integration: Monday 9:45 am, Curriers
  - A Holistic View of OpenEdge Pro2: Tuesday 8:30 am, Curriers
  - Or contact Mike Marriage (mmarriag@progress.com) or Brian Bowman (bowman@progress.com)

When To Process
- How often to extract data and when to purge it are determined by business need.
- Busy tables will generate a lot of CDC data, and therefore big change tables.
- Data can be purged during extraction, or extracted data can be marked for later purging (_User-Misc).
- Monitor your database growth.

How To Process
- Extraction can be done by ABL or SQL.
  - Only SQL clients can access SQL change tables.
- Extraction is driven by _Cdc-Change-Tracking.
- Extraction should access only committed data.
  - Provide a range in the search criteria, e.g. WHERE _Time-Stamp < LastMidnight.
  - Prevent dirty reads: use a SHARE lock from ABL, or a transaction isolation level stronger than READ UNCOMMITTED from SQL.
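The extract-then-purge cycle described on the last two slides can be pictured with a small in-memory simulation. This is a sketch under stated assumptions, not OpenEdge code: the dictionaries stand in for _Cdc-Change-Tracking records, and the keys seq, ts, and user_misc and the functions extract and purge are hypothetical names.

```python
from datetime import datetime

# Stand-ins for change tracking records (hypothetical structure).
tracking = [
    {"seq": 1, "ts": datetime(2017, 5, 20, 9, 0), "user_misc": ""},
    {"seq": 2, "ts": datetime(2017, 5, 25, 14, 30), "user_misc": ""},
    {"seq": 3, "ts": datetime(2017, 6, 6, 8, 15), "user_misc": ""},
]


def extract(rows, cutoff):
    """Return unprocessed rows older than the cutoff, in sequence order,
    and mark them so a later purge pass can remove them."""
    batch = [
        row for row in sorted(rows, key=lambda r: r["seq"])
        if row["ts"] < cutoff and row["user_misc"] != "PROCESSED"
    ]
    for row in batch:
        row["user_misc"] = "PROCESSED"
    return [row["seq"] for row in batch]


def purge(rows):
    """Keep only the rows that have not been marked as processed."""
    return [row for row in rows if row["user_misc"] != "PROCESSED"]


cutoff = datetime(2017, 6, 1)              # plays the role of LastMidnight
first_batch = extract(tracking, cutoff)    # rows 1 and 2 fall before the cutoff
second_batch = extract(tracking, cutoff)   # nothing new to extract
remaining = purge(tracking)                # only row 3 survives the purge
print(first_batch, second_batch, [row["seq"] for row in remaining])
```

Filtering on an upper time bound keeps the job away from rows that in-flight transactions may still be writing, and separating marking from purging lets the purge run on its own schedule.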

Extracting Change Data through ABL

Extracting Change Data Through ABL
- Write a query against the _Cdc-Change-Tracking table and the change table for your given source table.
- OpenEdge.DataAdmin.Util.CDCTrackingHelper is an ABL helper class that provides convenience functionality:
  - Reduces the need for some boilerplate code.
  - Converts _Change-FieldMap to an extent of changed field names.
  - Maps the _Operation to a CDCOperation enum.
  - Purges all change records associated with a _Cdc-Change-Tracking record.
  - Uses the current record in a _Cdc-Change-Tracking buffer.
- There is no requirement to use this class.

ABL Extraction Sample

    USING OpenEdge.DataAdmin.Util.CDCTrackingHelper.

    DEF VAR ohelper AS CDCTrackingHelper NO-UNDO.

    /* Get a CDCTrackingHelper for the Order table, using the default buffer
     * of the _Cdc-Change-Tracking table */
    ohelper = NEW CDCTrackingHelper("Order", BUFFER _Cdc-Change-Tracking:Handle).

    /* Iterate through the change tracking records */
    FOR EACH _Cdc-Change-Tracking
        WHERE _Cdc-Change-Tracking._Source-Table-Number = ohelper:SourceTableNumber:

        /* handle updates only */
        IF (ohelper:IsUpdate()) THEN
        DO:
            IF (ohelper:FieldChanged("OrderTotal")) THEN
                /* Get the change table record. */
                FIND FIRST CDC_Order
                    WHERE CDC_Order._Change-Sequence = _Cdc-Change-Tracking._Change-Sequence
                      AND CDC_Order._Operation = _Cdc-Change-Tracking._Operation
                    NO-ERROR.

            /* perform whatever logic needs to occur for the ETL */
        END.

        /* mark as processed */
        _Cdc-Change-Tracking._User-Misc = "PROCESSED".
    END.

ABL Purge Sample

    USING OpenEdge.DataAdmin.Util.CDCTrackingHelper.

    DEF VAR ohelper AS CDCTrackingHelper NO-UNDO.

    /* Get a CDCTrackingHelper for the Order table, using the default buffer
     * of the _Cdc-Change-Tracking table */
    ohelper = NEW CDCTrackingHelper("Order", BUFFER _Cdc-Change-Tracking:Handle).

    /* Iterate through the processed records */
    FOR EACH _Cdc-Change-Tracking
        WHERE _Cdc-Change-Tracking._Source-Table-Number = ohelper:SourceTableNumber:

        IF _Cdc-Change-Tracking._User-Misc = "PROCESSED" THEN
        DO:
            /* purge records in the _Cdc-Change-Tracking and change table */
            ohelper:DeleteChangeTrackingRecord().

            /* Alternatively:
               FOR EACH CDC_Order
                   WHERE CDC_Order._Change-Sequence = _Cdc-Change-Tracking._Change-Sequence:
                   DELETE CDC_Order.
               END.
               DELETE _Cdc-Change-Tracking.
            */
        END.
    END.

Extracting Change Data through SQL

Extracting Change Data Through SQL
New scalar functions:
- CDC_get_changed_columns: returns the list of changed columns from _Change-FieldMap
- CDC_is_column_changed: reports whether a column changed

SQL Extraction Sample Query

    select ct."_Change-Sequence", c.*,
           CDC_is_column_changed(pub.CDC_Order, OrderTotal, _Change-FieldMap)
    from pub."_Cdc-Change-Tracking" ct
    inner join pub.CDC_Order c
        on ct."_Change-Sequence" = c."_Change-Sequence"
    where ct."_Source-Table-Number" = <Order table number>
    order by ct."_Change-Sequence";