02 | Data Flow – Extract Data
Richard Currey | Senior Technical Trainer – New Horizons United
George Squillace | Senior Technical Trainer – New Horizons Great Lakes
Module 2 Overview
Connection Managers
Data Sources and Data Destinations
Using Change Data Capture to Extract Information
Using Change Tracking to Extract Information
Topic: Connection Managers
What Is a Connection Manager?
Package vs. Project Connection Managers
Parameterization of Connection Strings
What Is a Connection Manager?
Defines the data access technology to use and the location that contains the data
Defines the authentication to be used
Specifies the connection string
Package vs. Project Connection Managers
Package connection managers are defined within the package
  – Used when the data source is unique to that package
Project connection managers are defined at the project level
  – Can be used by any package in the project
  – Used when multiple packages need to access the same data source
Parameterization of Connection Strings
What is a property expression?
  – An expression that sets a property, such as a connection manager's ConnectionString, when the package runs
Properties can be modified through configurations
Properties can be modified at execution
DEMO Creating a Connection Manager
Topic: Data Sources and Data Destinations
What Are Data Sources and Data Destinations?
Available Technologies
Creating Data Sources and Destinations
What Are Data Sources and Data Destinations?
Data sources are the starting points for all data flows
Data destinations are the ending points for all data flows
Data flows can have multiple data sources and multiple data destinations
Available Technologies
Data sources and destinations
  – Database: ADO.NET, OLE DB, CDC source (source only)
  – File: Excel, flat file, XML (source only), raw file
Destinations only
  – SSAS: data mining model training, dimension processing, partition processing
  – Rowset: data reader, recordset
Creating Data Sources and Data Destinations
Data sources and data destinations are based on a connection manager
  – Exceptions include the data reader and recordset destinations
A source table or query must be defined for each data source
A destination table must be defined for each data destination
DEMO Creating Data Sources and Destinations
Topic: Using Change Data Capture to Extract Information
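What Is Change Data Capture?
Enabling Change Data Capture
How to Extract Data
What Is Change Data Capture?
Introduced in SQL Server 2008
  – Available in the Enterprise Edition (Standard Edition also supports it from SQL Server 2016 SP1)
Uses transaction log sequence numbers (LSNs) to track changes to data
  – sys.fn_cdc_map_time_to_lsn maps a point in time to an LSN
Tracks a full history of changes to data over time
  – cdc.fn_cdc_get_all_changes_<capture_instance> returns every change in an LSN range
  – cdc.fn_cdc_get_net_changes_<capture_instance> returns only the net change for each row
Enabling Change Data Capture
Identify the tables that will use change data capture
Enable the source database for change data capture
  – sys.sp_cdc_enable_db
Enable specific tables for change data capture
  – sys.sp_cdc_enable_table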
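To make the two function families concrete, here is a minimal T-SQL sketch that reads the last 24 hours of changes. It assumes a hypothetical dbo.Customers table that is already enabled for CDC with the default capture instance name dbo_Customers and with net-change support; the cdc.fn_cdc_get_*_changes_dbo_Customers functions exist only under those assumptions.

```sql
-- Hypothetical capture instance: dbo_Customers (the default name for dbo.Customers)
DECLARE @begin_time datetime = DATEADD(HOUR, -24, GETDATE());
DECLARE @end_time   datetime = GETDATE();

-- Map the time window to an LSN range
DECLARE @from_lsn binary(10) =
    sys.fn_cdc_map_time_to_lsn('smallest greater than or equal', @begin_time);
DECLARE @to_lsn binary(10) =
    sys.fn_cdc_map_time_to_lsn('largest less than or equal', @end_time);

-- The range must not start before the oldest LSN still retained for the instance
DECLARE @min_lsn binary(10) = sys.fn_cdc_get_min_lsn(N'dbo_Customers');
IF @from_lsn < @min_lsn
    SET @from_lsn = @min_lsn;

-- A NULL LSN (no activity in the window) makes this test false, so nothing runs
IF @from_lsn <= @to_lsn
BEGIN
    -- Full history: one row per change
    -- __$operation: 1 = delete, 2 = insert, 4 = update (after image)
    SELECT *
    FROM cdc.fn_cdc_get_all_changes_dbo_Customers(@from_lsn, @to_lsn, N'all');

    -- Net changes: at most one row per source row for the whole range
    SELECT *
    FROM cdc.fn_cdc_get_net_changes_dbo_Customers(@from_lsn, @to_lsn, N'all');
END
```

The all-changes query shows every intermediate state of a row; the net-changes query collapses them, which is usually what a warehouse load needs.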
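A minimal enabling script might look like the following. The database SalesDB and table dbo.Customers are hypothetical; the system procedures and catalog columns are the standard SQL Server ones.

```sql
-- Hypothetical database and table names
USE SalesDB;
GO

-- Enable the database (requires sysadmin); creates the cdc schema and metadata tables
EXEC sys.sp_cdc_enable_db;
GO

-- Enable one table (requires db_owner); the capture job needs SQL Server Agent running
EXEC sys.sp_cdc_enable_table
    @source_schema        = N'dbo',
    @source_name          = N'Customers',
    @role_name            = NULL,  -- no gating role for reading change data
    @supports_net_changes = 1;     -- requires a primary key or unique index
GO

-- Verify the database and table settings
SELECT name, is_cdc_enabled FROM sys.databases WHERE name = DB_NAME();
SELECT name, is_tracked_by_cdc FROM sys.tables WHERE name = N'Customers';
```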
How to Extract Data
Initial Extraction
1. A CDC Control task records the starting LSN (Mark initial load start)
2. A data flow extracts all records into staging (staged inserts)
3. A CDC Control task records the ending LSN (Mark initial load end)
Incremental Extraction
1. A CDC Control task establishes the range of LSNs to be extracted (Get processing range)
2. A CDC source extracts the changed records and CDC metadata
3. Optionally, a CDC splitter splits the data flow into inserts, updates, and deletes
4. A CDC Control task records the ending LSN (Mark processed range)
[Diagram: in both flows the CDC Control tasks keep their state in a CDC state variable, persisted to a CDC state table; the incremental data flow ends in staged inserts, staged updates, and staged deletes.]
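The incremental steps can be approximated in plain T-SQL to show the moving parts. This is a hand-rolled sketch, not how the CDC Control task or CDC splitter are implemented internally: dbo.CdcWatermark is a hypothetical stand-in for the CDC state table, and the dbo_Customers capture instance is assumed to exist with net-change support. In production, the range read and the watermark update would normally run in one transaction.

```sql
-- Hypothetical state table: last LSN processed per capture instance
IF OBJECT_ID(N'dbo.CdcWatermark') IS NULL
    CREATE TABLE dbo.CdcWatermark (
        CaptureInstance sysname    NOT NULL PRIMARY KEY,
        LastLsn         binary(10) NOT NULL
    );
GO

DECLARE @last_lsn binary(10), @from_lsn binary(10), @to_lsn binary(10);
DECLARE @min_lsn binary(10) = sys.fn_cdc_get_min_lsn(N'dbo_Customers');

SELECT @last_lsn = LastLsn
FROM dbo.CdcWatermark
WHERE CaptureInstance = N'dbo_Customers';

-- Step 1: processing range = next LSN after the watermark up to the current maximum
SET @to_lsn = sys.fn_cdc_get_max_lsn();
SET @from_lsn = CASE WHEN @last_lsn IS NULL THEN @min_lsn
                     ELSE sys.fn_cdc_increment_lsn(@last_lsn) END;
IF @from_lsn < @min_lsn
    SET @from_lsn = @min_lsn;

IF @from_lsn <= @to_lsn
BEGIN
    -- Step 2: extract the changed rows together with their CDC metadata
    IF OBJECT_ID(N'tempdb..#changes') IS NOT NULL DROP TABLE #changes;
    SELECT *
    INTO #changes
    FROM cdc.fn_cdc_get_net_changes_dbo_Customers(@from_lsn, @to_lsn, N'all');

    -- Step 3: split by operation code, as a CDC splitter would
    SELECT * FROM #changes WHERE __$operation = 2;  -- staged inserts
    SELECT * FROM #changes WHERE __$operation = 4;  -- staged updates
    SELECT * FROM #changes WHERE __$operation = 1;  -- staged deletes

    -- Step 4: record the end of the processed range
    UPDATE dbo.CdcWatermark SET LastLsn = @to_lsn
    WHERE CaptureInstance = N'dbo_Customers';
    IF @@ROWCOUNT = 0
        INSERT INTO dbo.CdcWatermark (CaptureInstance, LastLsn)
        VALUES (N'dbo_Customers', @to_lsn);
END
```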
DEMO Implementing Change Data Capture
Topic: Using Change Tracking to Extract Information
What Is Change Tracking?
Enabling Change Tracking
Using Change Tracking to Extract Data
What Is Change Tracking?
A lightweight technology first introduced in SQL Server 2008
Tracks the fact that a row has changed, not the data that changed
  – The date of the change and the previous values are not available; only the type of operation (insert, update, or delete) and, optionally, the changed columns are recorded
An intermediate solution between change data capture and self-developed change tracking
Enabling Change Tracking
Identify the tables to target for change tracking
Enable the database for change tracking
  – ALTER DATABASE <database> SET CHANGE_TRACKING = ON
Enable individual tables for change tracking
  – ALTER TABLE <table> ENABLE CHANGE_TRACKING
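A minimal enabling script, assuming a hypothetical SalesDB database and a dbo.Customers table that already has a primary key, could look like this. The retention values are illustrative, not recommendations from the course.

```sql
-- Hypothetical database and table names
-- Keep change information for two days and purge older entries automatically
ALTER DATABASE SalesDB
SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);
GO

-- Optional but recommended: lets extraction queries run under snapshot isolation
ALTER DATABASE SalesDB SET ALLOW_SNAPSHOT_ISOLATION ON;
GO

USE SalesDB;
GO

-- The table must have a primary key; TRACK_COLUMNS_UPDATED records which columns changed
ALTER TABLE dbo.Customers
ENABLE CHANGE_TRACKING WITH (TRACK_COLUMNS_UPDATED = ON);
GO

-- Verify the database and table settings
SELECT DB_NAME(database_id) AS database_name, retention_period, retention_period_units_desc
FROM sys.change_tracking_databases;
SELECT OBJECT_NAME(object_id) AS table_name, is_track_columns_updated_on
FROM sys.change_tracking_tables;
```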
Using Change Tracking to Extract Information
1. Retrieve the last version number that was extracted from an extraction log
2. Extract and transfer the records modified since the last version, retrieving the current version number
3. Replace the logged version number with the current version number
[Diagram: data moves from the Data Source to the Staging Database; the Extraction Log stores the version number.]
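The three steps can be sketched in T-SQL as follows. This assumes a hypothetical dbo.ExtractionLog table with one row per source table, seeded with CHANGE_TRACKING_CURRENT_VERSION() when the initial full load ran, plus the hypothetical dbo.Customers table with primary key CustomerID and snapshot isolation allowed on the database. Running the steps in one snapshot transaction keeps the version number and the extracted rows consistent with each other.

```sql
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;

DECLARE @last_version bigint, @current_version bigint;

-- Step 1: retrieve the last extracted version from the (hypothetical) extraction log
SELECT @last_version = LastVersion
FROM dbo.ExtractionLog
WHERE TableName = N'dbo.Customers';

-- If cleanup has removed the needed history, an incremental extract is no longer possible
IF @last_version < CHANGE_TRACKING_MIN_VALID_VERSION(OBJECT_ID(N'dbo.Customers'))
BEGIN
    ROLLBACK TRANSACTION;
    THROW 50001, N'Change tracking history is no longer available; run a full reload.', 1;
END;

-- Step 2: capture the current version, then extract rows changed since the last version
SET @current_version = CHANGE_TRACKING_CURRENT_VERSION();

SELECT ct.SYS_CHANGE_OPERATION,   -- I, U, or D
       ct.CustomerID,
       c.*                        -- NULL for deleted rows
FROM CHANGETABLE(CHANGES dbo.Customers, @last_version) AS ct
LEFT JOIN dbo.Customers AS c
       ON c.CustomerID = ct.CustomerID;

-- Step 3: replace the logged version with the current version
UPDATE dbo.ExtractionLog
SET LastVersion = @current_version
WHERE TableName = N'dbo.Customers';

COMMIT TRANSACTION;
```

The LEFT JOIN back to the source table is what supplies the current column values, because change tracking itself stores only the key and the operation type.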
DEMO Implementing Change Tracking