Deep Dive into ETL Implementation with SQL Server Integration Services

Presentation transcript:

Deep Dive into ETL Implementation with SQL Server Integration Services
Anton Rozenson
Anton.Rozenson@GnetGroup.com

About this training event
This is the first event with a core focus on Microsoft BI technology.
The goal is to help the community by sharing our learning and experience based on real-world scenarios.
Network with peers and learn from you!
These training events will be held every 2 months.

Agenda
Importance and complexity of the ETL process
ETL architecture
Changed data capture: challenge and options
Data Flow design and performance considerations
SSIS project deployment
Package execution options
Performance monitoring in the SSIS catalog

GNet Group Offerings
Business Intelligence
SharePoint
Business Intelligence Frameworks
Cloud

Moving data
“Data Warehouse is a system that extracts, cleans, conforms, and delivers source data into a dimensional data store and then supports and implements querying and analysis for the purpose of decision making.” Ralph Kimball and Joe Caserta (2004), “The Data Warehouse ETL Toolkit”
An estimated 80% of the work in building a data warehouse solution is related to ETL design and implementation.
A data warehouse is only as good as the data that it contains.

Common ETL Architecture (diagram)
Sources: Flat File, RDBMS, Cloud
Staging: Changed data, Reference data, Artifacts and Error tables
EDW (Data warehouse): Consumption ready, De-normalized, Clean data

ELT Architecture (diagram)
Sources: Flat File, RDBMS, Cloud
EDW (Data warehouse): Changed data, Reference data, Artifacts and Error tables, De-normalized, Retains traceable business key

CDC Challenge
What changed? When did it change?
Are we unable to modify the source systems to include CDC attribution? Are the timestamps reliable?
We need to reliably determine modified data for incremental loading:
New data
Updated data
Deleted data

CDC options
The source system can provide timestamps.
How reliable is the process, and how complete is the information? Were all of yesterday's transactions committed? Does the source system include workflows that can cause late arrival of data records?
How can deleted records be determined? Comparing EDW data to the source system to determine differences is a very expensive query that affects the source system. Does the source system incorporate an archival process?
An Operational Data Store (ODS) can help: it keeps change history, clearly defines the state of data records, and stores metadata.
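The comparison approach mentioned on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each snapshot is a hypothetical in-memory dictionary keyed by business key, and `diff_snapshots` is a made-up helper name, not part of SSIS or SQL Server. It shows why the approach is expensive: every row on both sides has to be read and compared.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, ...]  # hypothetical: a row is a tuple of attribute values


def diff_snapshots(previous: Dict[str, Row], current: Dict[str, Row]) -> Dict[str, List[str]]:
    """Classify business keys as new, updated, or deleted between two snapshots."""
    # Keys present only in the current snapshot are new rows.
    new = sorted(k for k in current if k not in previous)
    # Keys present only in the previous snapshot were deleted at the source.
    deleted = sorted(k for k in previous if k not in current)
    # Keys present in both snapshots with different attribute values were updated.
    updated = sorted(k for k in current if k in previous and current[k] != previous[k])
    return {"new": new, "updated": updated, "deleted": deleted}


# Hypothetical sample data: what the EDW (or ODS) holds vs. what the source holds now.
previous = {
    "C001": ("Alice", "Chicago"),
    "C002": ("Bob", "Denver"),
    "C003": ("Cara", "Austin"),
}
current = {
    "C001": ("Alice", "Chicago"),
    "C002": ("Bob", "Boston"),
    "C004": ("Dan", "Miami"),
}

changes = diff_snapshots(previous, current)
print(changes)
```

Keeping the previous snapshot in an ODS, rather than re-reading the source system, is one way to move the cost of this comparison away from the source.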

CDC in SQL 2014
In SQL Server, change data capture offers an effective solution to the challenge of efficiently performing incremental loads from source tables to data marts and data warehouses.
The change data capture process reads transaction information from the SQL Server transaction log and stores it in change tables in the cdc schema.
Data from the CDC tables can be extracted by using the table-valued functions that are generated when CDC is enabled on a table.
Key concepts:
LSN (log sequence number): a binary value, used much like a timestamp, that restricts the changed set to a range.
All Changes: the changed set includes every DML change in the range.
Net Changes: the changed set includes only the last DML change for each row, with rows identified by a unique index.
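The difference between All Changes and Net Changes can be illustrated with a small Python model. This is a simplified sketch, not the SQL Server implementation: the `Change` record, the integer `lsn`, and both function names are assumptions made for illustration, and the real CDC functions may report the operation for a collapsed row differently.

```python
from typing import Dict, List, NamedTuple


class Change(NamedTuple):
    lsn: int        # stand-in for the binary LSN; only the ordering matters here
    operation: str  # "insert", "update", or "delete"
    key: str        # value of the unique index column
    value: str      # row content after the change


def all_changes(log: List[Change], from_lsn: int, to_lsn: int) -> List[Change]:
    """Return every change whose LSN falls inside the requested range, in LSN order."""
    return [c for c in sorted(log, key=lambda c: c.lsn) if from_lsn <= c.lsn <= to_lsn]


def net_changes(log: List[Change], from_lsn: int, to_lsn: int) -> List[Change]:
    """Return only the last change per key inside the requested range."""
    last_per_key: Dict[str, Change] = {}
    for change in all_changes(log, from_lsn, to_lsn):
        last_per_key[change.key] = change  # later changes overwrite earlier ones
    return sorted(last_per_key.values(), key=lambda c: c.lsn)


# Hypothetical change log for three rows.
log = [
    Change(10, "insert", "A", "v1"),
    Change(11, "update", "A", "v2"),
    Change(12, "insert", "B", "v1"),
    Change(13, "delete", "B", "v1"),
    Change(14, "update", "C", "v5"),
]

print(len(all_changes(log, 10, 13)))
print([(c.key, c.operation) for c in net_changes(log, 10, 13)])
```

The LSN range acts as the window for an incremental load: the next load would start just after the highest LSN processed by this one.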

Demo

Data Flow design challenges
How should the Data Flow perform? Consider the following factors when designing an ETL solution:
Source system structure and its ability to execute logic such as sorting and filtering.
Parallel processing should be used with caution. The fastest way to load a table is Fast Load with a table lock, but this prevents loading data in parallel. Partition switching can be an answer.
Requirements for data availability in the EDW.
Service Level Agreement (SLA).

Data Flow blocking tasks
Data flow transformations in SSIS use memory/buffers in different ways. The way a transformation uses memory can dramatically impact the performance of your package. Transformation buffer usage can be classified into 3 categories: Non-Blocking, Partially Blocking, and (Full) Blocking.
Non-Blocking transformations: Audit, Character Map, Conditional Split, Copy Column, Data Conversion, Derived Column, Import Column, Lookup, Multicast, Percentage Sampling, Row Count, Row Sampling, Script Component
Partially Blocking transformations: Data Mining, Merge, Merge Join, Pivot/Unpivot, Term Extraction, Term Lookup, Union All
Blocking transformations: Aggregate, Fuzzy Grouping, Fuzzy Lookup, Sort
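The behavioral difference between a non-blocking and a fully blocking transformation can be sketched with Python generators. This is a loose analogy under stated assumptions: SSIS moves buffers of rows rather than single rows, and the function names here (`source`, `derived_column`, `sort_rows`, `run`) are hypothetical stand-ins for data flow components.

```python
from typing import Iterable, Iterator, List


def source(rows: Iterable[int], log: List[str]) -> Iterator[int]:
    """Emit rows one at a time and record when each row is read."""
    for row in rows:
        log.append(f"read {row}")
        yield row


def derived_column(rows: Iterable[int]) -> Iterator[int]:
    """Non-blocking: each row is transformed and passed on as soon as it arrives."""
    for row in rows:
        yield row * 10


def sort_rows(rows: Iterable[int]) -> Iterator[int]:
    """Blocking: nothing can be emitted until every input row has been received."""
    buffered = list(rows)  # the whole input must be held in memory
    buffered.sort()
    yield from buffered


def run(pipeline: Iterable[int], log: List[str]) -> List[str]:
    """Act as the destination and record when each row is written."""
    for row in pipeline:
        log.append(f"write {row}")
    return log


non_blocking_log: List[str] = []
run(derived_column(source([3, 1, 2], non_blocking_log)), non_blocking_log)

blocking_log: List[str] = []
run(sort_rows(source([3, 1, 2], blocking_log)), blocking_log)

print(non_blocking_log)
print(blocking_log)
```

In the non-blocking run, reads and writes interleave, so the destination starts working immediately. In the blocking run, all reads complete before the first write, which is where the memory pressure and latency come from.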

Work around blocking transformations in Data Flow
It is not always possible to avoid blocking or partially blocking transformations, but in some cases it is.
For example, the Merge transformation requires its input data sets to be sorted. The Sort transformation is expensive, but in some cases sorting can be handled in the source query instead. Make sure to set the IsSorted property of the data source output to true and to assign the proper SortKeyPosition to the output columns.
Another example is the Aggregate transformation. Use a Script transformation to aggregate the data and return the result to a variable.
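The idea behind replacing an Aggregate transformation with a script can be sketched as a running aggregate that updates its totals row by row and lets each row pass through unchanged. This is a Python illustration only: an actual SSIS Script component is written in C# or Visual Basic, and the class, method, and column names below are hypothetical.

```python
from typing import Dict, List


class RunningAggregate:
    """Accumulate totals one row at a time without buffering the rows."""

    def __init__(self) -> None:
        self.row_count = 0
        self.total = 0.0

    def process_row(self, row: Dict[str, float]) -> Dict[str, float]:
        # Update the running totals, then pass the row downstream unchanged.
        self.row_count += 1
        self.total += row["amount"]
        return row

    def result(self) -> Dict[str, float]:
        # Called once after the last row; this is the value that would be
        # written back to a package variable.
        average = self.total / self.row_count if self.row_count else 0.0
        return {"row_count": self.row_count, "total": self.total, "average": average}


# Hypothetical input rows.
rows: List[Dict[str, float]] = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": 2, "amount": 25.5},
    {"order_id": 3, "amount": 4.5},
]

aggregate = RunningAggregate()
passed_through = [aggregate.process_row(row) for row in rows]
summary = aggregate.result()
print(summary)
```

Only two numbers are kept in memory regardless of the row count, which is why this pattern avoids the blocking behavior of a general-purpose aggregate. It fits simple totals and counts; grouped aggregates need more state.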

Demo

Project Deployment model
The project deployment model provides the following features:
Parameters can be used in expressions or tasks.
Parameters can reference an environment variable. Environment variable values are resolved at the time of package execution.
An environment is a container of variables that can be referenced by Integration Services projects. Environments allow you to organize the values that you assign to a package. For example, you might have environments named "Dev", "Test", and "Production".
The SSISDB catalog allows you to use folders to organize your projects and environments.
Catalog stored procedures and views can be used to manage Integration Services objects in the catalog.
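The relationship between parameters, environment variables, and environments can be modeled in a few lines of Python. This is a conceptual sketch only: the dictionaries, the `resolve_parameters` helper, and all server names and values are hypothetical and do not correspond to any SSIS API.

```python
from typing import Any, Dict, Tuple

# Hypothetical environments: each one is a named container of variables.
environments: Dict[str, Dict[str, Any]] = {
    "Dev": {"ServerName": "dev-sql01", "BatchSize": 1000},
    "Production": {"ServerName": "prod-sql01", "BatchSize": 50000},
}

# Hypothetical project parameters: each is either a literal value or a
# reference to an environment variable by name.
parameters: Dict[str, Tuple[str, Any]] = {
    "SourceServer": ("reference", "ServerName"),
    "RowsPerBatch": ("reference", "BatchSize"),
    "RetryCount": ("literal", 3),
}


def resolve_parameters(params: Dict[str, Tuple[str, Any]],
                       environment: Dict[str, Any]) -> Dict[str, Any]:
    """Resolve parameter values against one environment at execution time."""
    resolved: Dict[str, Any] = {}
    for name, (kind, value) in params.items():
        if kind == "reference":
            if value not in environment:
                raise KeyError(f"environment variable {value!r} is not defined")
            resolved[name] = environment[value]
        else:
            resolved[name] = value
    return resolved


dev_values = resolve_parameters(parameters, environments["Dev"])
prod_values = resolve_parameters(parameters, environments["Production"])
print(dev_values)
print(prod_values)
```

The same parameter definitions produce different values depending on which environment is selected when the package runs, which is what lets one deployed project serve several environments.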

Package Execution
An execution is an instance of a package execution.
Package execution can be scheduled via a SQL Agent job. SQL Agent provides an easy-to-use interface for mapping project parameters to environment variables.
Packages can also be executed via Execute Package tasks from another SSIS package. This allows a robust workflow to be built into a master package.

Package Execution
The SSIS catalog allows package execution to be controlled programmatically from within T-SQL. A number of stored procedures are provided to manage package execution.
catalog.create_execution creates an instance of package execution and assigns it an execution_id.
catalog.set_execution_parameter_value assigns parameter values to the instance of package execution. Execution parameters control the logging level, dump settings, and the synchronized execution option, and can also assign values to project-scoped or package-scoped parameters.
catalog.start_execution starts an instance of execution.
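The order of the three calls can be sketched with a small Python helper that only assembles the T-SQL text; it does not connect to a server or run anything. The helper names (`sql_literal`, `build_execution_batch`) and the folder, project, package, and parameter names are hypothetical. The sketch assumes the batch would run in the SSISDB database, and that `@object_type` 50 addresses execution-level settings such as LOGGING_LEVEL and SYNCHRONIZED while 30 addresses a package parameter.

```python
from typing import List, Tuple, Union

Value = Union[bool, int, str]


def sql_literal(value: Value) -> str:
    """Render a Python value as a T-SQL literal."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "1" if value else "0"
    if isinstance(value, int):
        return str(value)
    # Unicode string literal with embedded single quotes doubled.
    return "N'" + str(value).replace("'", "''") + "'"


def build_execution_batch(folder: str, project: str, package: str,
                          parameters: List[Tuple[int, str, Value]]) -> str:
    """Assemble a batch: create the execution, set parameters, then start it."""
    lines = [
        "DECLARE @execution_id BIGINT;",
        "EXEC catalog.create_execution",
        f"    @folder_name = {sql_literal(folder)},",
        f"    @project_name = {sql_literal(project)},",
        f"    @package_name = {sql_literal(package)},",
        "    @execution_id = @execution_id OUTPUT;",
    ]
    for object_type, name, value in parameters:
        lines += [
            "EXEC catalog.set_execution_parameter_value",
            "    @execution_id = @execution_id,",
            f"    @object_type = {object_type},",
            f"    @parameter_name = {sql_literal(name)},",
            f"    @parameter_value = {sql_literal(value)};",
        ]
    lines.append("EXEC catalog.start_execution @execution_id = @execution_id;")
    return "\n".join(lines)


batch = build_execution_batch(
    "Finance", "NightlyLoad", "LoadOrders.dtsx",  # hypothetical names
    [
        (50, "LOGGING_LEVEL", 1),        # execution-level setting
        (50, "SYNCHRONIZED", 1),         # wait for the package to finish
        (30, "BatchDate", "2014-06-01"), # hypothetical package parameter
    ],
)
print(batch)
```

In real client code, passing values through a database driver's parameter binding is generally preferable to building SQL text; the string form is used here only to make the call sequence visible.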

Execution Monitoring
The catalog provides a set of standard reports that give administrators easy access to execution performance and statistics.
For details about executions, validations, messages that are logged during operations, and contextual information related to errors, query these views:
catalog.executions: list of executions, including environment data
catalog.execution_data_statistics: data flow performance information
catalog.execution_parameter_values: list of run-time parameters
catalog.event_messages: messages that were logged during executions

Demo

Questions

Additional Resources
What's New in SQL Server 2014: http://msdn.microsoft.com/en-us/library/bb500435.aspx
SSIS Catalog: http://msdn.microsoft.com/en-us/library/hh479588.aspx
Deployment of Projects and Packages: http://msdn.microsoft.com/en-us/library/hh213290(v=sql.120).aspx
Change Data Capture: http://technet.microsoft.com/en-us/library/bb522489(v=sql.105).aspx
Change Data Capture (SSIS): http://msdn.microsoft.com/en-us/library/bb895315.aspx
CDC Flow Components: http://msdn.microsoft.com/en-us/library/hh231087(v=sql.120).aspx
Enable and Disable Change Data Capture (SQL Server): http://msdn.microsoft.com/en-us/library/cc627369.aspx
SQL Server OLE DB Deprecation and Integration Services: http://blogs.msdn.com/b/mattm/archive/2012/01/09/sql-server-ole-db-deprecation-and-integration-services.aspx
oData Data source setup: http://www.microsoft.com/en-us/download/details.aspx?id=42280
oData samples: http://services.odata.org/

Contact Us
www.gnetgroup.com
Neelesh Raheja, VP, Consulting Services, Neelesh.Raheja@gnetgroup.com, @PracticalBI
Anton Rozenson, BI Solution Architect, Anton.Rozenson@gnetgroup.com
blog.gnetgroup.com
facebook.com/gnetgroup
youtube.com/user/GNetGroup
linkedin.com/company/143712
twitter.com/GnetGroup