Implementing ETL solution for Incremental Data Load in Microsoft SQL Server Ganesh Lohani SR. Data Analyst Lockheed Martin ganeshlohani@Hotmail.com.


1 Implementing ETL solution for Incremental Data Load in Microsoft SQL Server
Ganesh Lohani SR. Data Analyst Lockheed Martin

2 Business Case/Requirement
Northwind operates 10 call centers in the United States and runs its business 24 hours a day, 7 days a week.
Management wants to build a Data Warehouse as a single source of reporting data for all call center data.
The report/dashboard should be refreshed every 15 minutes, which means new data must be loaded into the Data Warehouse every 15 minutes.
Suppose you work for the company as a SQL/BI/ETL Developer.
Your manager asks you: What are the different ways to do an incremental data load using SQL and SSIS in a SQL Server environment?
How do you respond to your manager?

3 Agenda/Learning Outcome
After attending this session, you will be able to answer the following question:
What are the methods to do an incremental data load in an on-premises SQL Server environment?
The following methods will be discussed in this session:
  Left Join
  Merge Statement
  Lookup Transformation
  Merge Join Transformation
  Slowly Changing Dimension (SCD)
  Change Data Capture (CDC)
Demo on some of these methods

4 Let's Get Started
ETL (Extract, Transform, and Load) is an essential task of a Business Intelligence Developer, especially when working in a Data Warehouse environment.
A simple diagram of a Business Intelligence environment:
[Diagram: Source1, Source2, Source3 -> Operational Data Store (ODS) -> Data Warehouse/Data Mart -> Report/Dashboard]

5 What is ETL
ETL is the process of moving data from point A to point B, with some kind of transformation between the two points.
The terms Source, Transformation, and Destination are used in ETL language.
SSIS is a tool used for the ETL process.
Source: SQL table, Excel file, flat file
Transformation: Lookup, Derived Column, Conditional Split
Destination: SQL table, flat file (text and CSV), Excel
[Diagram: Source1 (SQL table), Source2 (Excel file), Source3 (text file) -> Operational Data Store (ODS)]

6 Two Common Types of Data Load Pattern
Full Load
  A simple ETL process.
  Deletes the destination data and loads the source data into the destination.
  Typically used for the initial data load in a DW environment and for small data sets.
  Takes more time if the source data set is large.
  History is lost.
Incremental Load
  A relatively complex ETL process, but it requires less time to process data.
  Processes only new or updated records.
  Typically used in a DW environment for larger data sets.
  History is maintained.
  Most of the time, a date/time stamp is used to load the incremental data.
Point A (Source): Call Center Data
Point B (Destination): Data Warehouse
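The two load patterns can be contrasted with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for SQL Server so that it runs anywhere; the tables src_calls and dw_calls, their columns, and the choice of modified_at as the date/time stamp are all assumptions made for illustration, not part of the original deck.

```python
import sqlite3

def create_tables(conn):
    # Hypothetical source (call center) and destination (data warehouse) tables.
    conn.executescript("""
        CREATE TABLE src_calls (call_id INTEGER PRIMARY KEY, agent TEXT, modified_at TEXT);
        CREATE TABLE dw_calls  (call_id INTEGER PRIMARY KEY, agent TEXT, modified_at TEXT);
    """)

def full_load(conn):
    # Full load: delete the destination data, then copy every source row.
    with conn:
        conn.execute("DELETE FROM dw_calls")
        conn.execute(
            "INSERT INTO dw_calls SELECT call_id, agent, modified_at FROM src_calls")

def incremental_load(conn):
    # Incremental load: copy only rows stamped after the newest row already loaded.
    # ISO-8601 text timestamps compare correctly as plain strings.
    (watermark,) = conn.execute(
        "SELECT COALESCE(MAX(modified_at), '') FROM dw_calls").fetchone()
    changed = conn.execute(
        "SELECT call_id, agent, modified_at FROM src_calls WHERE modified_at > ?",
        (watermark,)).fetchall()
    with conn:
        # New keys are inserted; existing keys are replaced by the newer version.
        conn.executemany("INSERT OR REPLACE INTO dw_calls VALUES (?, ?, ?)", changed)
    return len(changed)
```

In this sketch the watermark is read from the destination itself. A real package might instead keep the last load time in a control table, which is a design choice rather than something the slides prescribe.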

7 Incremental Data Load: Method 1
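SQL Left Join:
A left join returns all rows from the left table and the matching rows from the right table. With the source on the left and the destination on the right, source rows that have no match are the new rows to load.
Use an Execute SQL Task in SSIS to implement this method.
The source and destination tables must be on the same server.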
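A minimal sketch of the left join pattern follows. The query itself is ordinary SQL of the kind an Execute SQL Task could run; it is executed here through Python's sqlite3 module only so the example is runnable. The table names src_customer and dw_customer and their columns are hypothetical.

```python
import sqlite3

# Insert only the source rows that have no match in the destination.
NEW_ROWS_SQL = """
INSERT INTO dw_customer (customer_id, name)
SELECT s.customer_id, s.name
FROM src_customer AS s
LEFT JOIN dw_customer AS d ON d.customer_id = s.customer_id
WHERE d.customer_id IS NULL      -- no match in the destination: a new row
"""

def create_tables(conn):
    # Hypothetical source and destination tables in the same database.
    conn.executescript("""
        CREATE TABLE src_customer (customer_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE dw_customer  (customer_id INTEGER PRIMARY KEY, name TEXT);
    """)

def load_new_rows(conn):
    # Returns how many rows the load added to the destination.
    count_sql = "SELECT COUNT(*) FROM dw_customer"
    before = conn.execute(count_sql).fetchone()[0]
    with conn:
        conn.execute(NEW_ROWS_SQL)
    return conn.execute(count_sql).fetchone()[0] - before
```

Running the load twice in a row adds nothing the second time, because every source key then has a match in the destination.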

8 Incremental Data Load: Method 2
Merge Join Transformation:
The Merge Join transformation lets us join data from more than one data source.
It is similar to performing a join in T-SQL.
Use this method when joining data from different data sources in the SSIS pipeline.
The input data must be sorted in order to use the Merge Join transformation.
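The reason the inputs must be sorted is that a merge join walks both inputs once, in key order. The plain-Python sketch below illustrates that idea; it is not SSIS code, and the function names, the dictionary row format, and the assumption that destination keys are unique are all choices made for this example.

```python
def merge_join_left(left, right, key):
    """Left outer merge join of two lists of dicts, both sorted by `key`.

    Assumes keys in `right` are unique. Yields (left_row, right_row_or_None).
    """
    joined = []
    j = 0
    for left_row in left:
        # Advance the right cursor until it reaches or passes the left key.
        while j < len(right) and right[j][key] < left_row[key]:
            j += 1
        if j < len(right) and right[j][key] == left_row[key]:
            joined.append((left_row, right[j]))
        else:
            joined.append((left_row, None))
    return joined

def split_new_and_changed(source_rows, destination_rows, key):
    # Unmatched source rows are new; matched rows that differ are updates.
    new_rows, changed_rows = [], []
    for src, dst in merge_join_left(source_rows, destination_rows, key):
        if dst is None:
            new_rows.append(src)
        elif src != dst:
            changed_rows.append(src)
    return new_rows, changed_rows
```

In an SSIS package the split step would typically be a Conditional Split placed after the Merge Join, which is the role split_new_and_changed plays here.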

9 Incremental Data Load: Method 3
SQL Merge Statement:
A combination of three SQL statements: INSERT, UPDATE, and DELETE.
INSERT: when a row exists in the source table but not in the target table.
UPDATE: when a source row matches a target row on the primary key but at least one other column differs.
DELETE: when a row exists in the target table but not in the source table.
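The three branches can be sketched as follows. The T-SQL MERGE text is included as a string for comparison only and is not executed; the runnable part emulates the same three branches with separate statements through Python's sqlite3 module, because SQLite has no MERGE statement. The tables src and tgt and their columns are hypothetical.

```python
import sqlite3

# For comparison only (not executed here): the single-statement T-SQL form.
T_SQL_MERGE = """
MERGE tgt AS t
USING src AS s ON t.id = s.id
WHEN MATCHED AND t.name <> s.name THEN UPDATE SET name = s.name
WHEN NOT MATCHED BY TARGET THEN INSERT (id, name) VALUES (s.id, s.name)
WHEN NOT MATCHED BY SOURCE THEN DELETE;
"""

def create_tables(conn):
    # Hypothetical source and target tables with a primary key and one attribute.
    conn.executescript("""
        CREATE TABLE src (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE tgt (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    """)

def merge(conn):
    with conn:  # run the three branches in one transaction
        # UPDATE: key matches but a non-key column differs.
        conn.execute("""
            UPDATE tgt
            SET name = (SELECT s.name FROM src AS s WHERE s.id = tgt.id)
            WHERE EXISTS (SELECT 1 FROM src AS s
                          WHERE s.id = tgt.id AND s.name <> tgt.name)""")
        # INSERT: row is in the source but not in the target.
        conn.execute("""
            INSERT INTO tgt (id, name)
            SELECT s.id, s.name FROM src AS s
            WHERE NOT EXISTS (SELECT 1 FROM tgt AS t WHERE t.id = s.id)""")
        # DELETE: row is in the target but not in the source.
        conn.execute("""
            DELETE FROM tgt
            WHERE NOT EXISTS (SELECT 1 FROM src AS s WHERE s.id = tgt.id)""")
```

The DELETE branch only makes sense when the source holds the complete data set; when the source contains just the latest increment, that branch would normally be left out.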

10 Incremental Data Load: Method 4
Lookup Transformation:
The Lookup transformation performs lookups by joining data in input columns with columns in a reference dataset.
The reference dataset can be a cache file, an existing table or view, a new table, or the result of an SQL query.
If there is no matching entry in the reference dataset, no join occurs.
If there are multiple matches in the reference table, the Lookup transformation returns only the first match returned by the lookup query.
Useful for dimension data loads and small datasets.
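The lookup behavior described above can be sketched in plain Python. This is an illustration of the idea, not SSIS code: the reference dataset is cached in a dictionary, only the first reference row per key is kept, and input rows are routed to a match or a no-match list. The function name and the dictionary row format are assumptions.

```python
def lookup(input_rows, reference_rows, key):
    """Join each input row to the first reference row with the same key.

    Returns (matched, unmatched): matched rows carry the reference columns,
    unmatched rows are passed through unchanged.
    """
    cache = {}
    for ref in reference_rows:
        # setdefault keeps the first reference row seen for each key.
        cache.setdefault(ref[key], ref)

    matched, unmatched = [], []
    for row in input_rows:
        ref = cache.get(row[key])
        if ref is None:
            unmatched.append(row)            # no join occurs
        else:
            matched.append({**row, **ref})   # add the reference columns
    return matched, unmatched
```

For an incremental load, the destination acts as the reference dataset: unmatched rows are candidates for insert and matched rows are candidates for update.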

11 Incremental Data Load: Method 5
Slowly Changing Dimension (SCD) Transformation:
Some data attributes change over time.
Slowly Changing Dimensions (SCD) are dimensions that have data that slowly changes. For example, you may have a dimension in your database that tracks the sales records of your company's salespeople.
The Type 1 method overwrites old data with new data, and therefore does not track historical data at all. This is most appropriate when correcting certain types of data errors, such as the spelling of a name.
The Type 2 method tracks historical data by creating multiple records in the dimension table with separate keys.
The Type 3 method tracks changes using separate columns.
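The three SCD types can be contrasted with a small plain-Python sketch that treats the dimension as a list of dictionaries. The column names (surrogate_key, business_key, is_current, valid_from, valid_to) and the previous_ prefix are assumptions chosen for this example, not names used by the SSIS Slowly Changing Dimension transformation.

```python
def apply_type1(dim, business_key, **changes):
    """Type 1: overwrite in place, so no history is kept."""
    for row in dim:
        if row["business_key"] == business_key:
            row.update(changes)

def apply_type2(dim, business_key, load_date, **changes):
    """Type 2: expire the current row and add a new row with a new key."""
    current = next(r for r in dim
                   if r["business_key"] == business_key and r["is_current"])
    current["is_current"] = False
    current["valid_to"] = load_date
    new_row = {
        **current,
        **changes,
        "surrogate_key": max(r["surrogate_key"] for r in dim) + 1,
        "is_current": True,
        "valid_from": load_date,
        "valid_to": None,
    }
    dim.append(new_row)
    return new_row

def apply_type3(dim, business_key, column, new_value):
    """Type 3: keep the prior value in a separate 'previous_<column>' column."""
    for row in dim:
        if row["business_key"] == business_key:
            row["previous_" + column] = row[column]
            row[column] = new_value
```

With Type 2, fact rows loaded before the change keep pointing at the old surrogate key, which is how the history stays queryable.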

12 Incremental Data Load: Method 6
Change Data Capture:
Another way to do an incremental data load, if the system supports Change Data Capture technology.
Available in SQL Server 2008 and newer versions.
Change Data Capture must be enabled on the database and on each table to be tracked.
SQL Server Agent must be started and running for CDC to work correctly.
Change Data Capture records insert, update, and delete activity in separate change tables.
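Once change rows are available, the load only has to replay them against the destination. The plain-Python sketch below shows that consumer side under stated assumptions: the change rows are hand-written sample data shaped like a SQL Server CDC change table, whose __$operation column uses 1 for delete, 2 for insert, 3 for the before-image of an update, and 4 for the after-image. The call_id key, the dictionary target, and the function name are hypothetical, and the capture itself (enabled in SQL Server with sys.sp_cdc_enable_db and sys.sp_cdc_enable_table) is not emulated.

```python
# __$operation codes used by SQL Server CDC change tables.
DELETE, INSERT, UPDATE_BEFORE, UPDATE_AFTER = 1, 2, 3, 4

def apply_changes(target, changes, key="call_id"):
    """Replay change rows, assumed to be in commit order, onto `target`.

    `target` is a dict keyed by the business key; it is updated in place.
    """
    for change in changes:
        operation = change["__$operation"]
        # Drop CDC metadata columns; keep only the captured data columns.
        row = {k: v for k, v in change.items() if not k.startswith("__$")}
        if operation == DELETE:
            target.pop(row[key], None)
        elif operation in (INSERT, UPDATE_AFTER):
            target[row[key]] = row
        # UPDATE_BEFORE rows hold the old image and need no action here.
    return target
```

Because only the captured changes are read, the source tables do not need a date/time stamp column for this method.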

13 Demo
1. Full Load
2. Incremental Load
  SQL Left Outer Join
  SSIS Merge Join Transformation
  SQL Merge Statement
  SSIS Lookup Transformation
  SSIS Slowly Changing Dimension
  Change Data Capture

14 Conclusion
What incremental data load methods are available using SQL and SSIS?
  Left Join
  Merge Statement
  Lookup Transformation
  Merge Join Transformation
  Slowly Changing Dimension (SCD)
  Change Data Capture
Question: What is the best way to do an incremental data load in SQL Server?
The answer is: it depends on the following.
  Data source
  Complexity of business rules and data transformations
  Project timeline
  Hardware/software environment (SQL Server, SSIS Server)
  Company policy and procedure
  Developer skill sets

15 Q & A
Thank you for attending the session!
Questions:
What feedback do you have for me?
Questions: (Ganesh Lohani)

