Helping Your Data Warehouse Succeed: 10 Mistakes to Avoid in Data Integration
Rafael Salas
www.Rafael-Salas.com

About Rafael
- DW/BI: 12 years
- SQL Server MVP
- Solution Architect, Quaero, a CSG Systems solution (Charlotte, NC)
- Lots of mistakes along the way!

"Mistakes are the portals of discovery." (James Joyce)

Today’s Plan
- 10 mistakes to avoid
- What they are, why they happen, how to prevent them
- Real-life examples
- No magic formulas

Mistake 1: Ignoring Data Realities
…or finding them too late.

The Problem
- Relying on common knowledge:
  - The data is ‘good’
  - I know this data well
- We don’t have time
- Cycle: Code → Load → Explode!
- Research, recode, retest = rework

The Fix
- Gather requirements
- Profile the data
- Compare the profiling results against the requirements
- Clue: the business wants ‘good’ quality data, meaning data that is:
  - Accurate
  - Timely
  - Relevant
  - Complete
  - Understood
  - Trusted

Benefits
- Early awareness of data quality issues
- Better ETL development estimates
- Uncover new business rules
- Better understanding of business requirements

How?
- 3rd-party tools
- Hand-crafted SQL queries
- SSIS: Data Profiling Task
  - Decent profiling
  - Gets you up and running quickly
  - SQL Server data sources only
  - Output is XML
  - Results can be loaded into a table (XSLT required)
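
A minimal sketch of the "hand-crafted SQL queries" option, using Python's built-in sqlite3 as a stand-in for the real source database. The `customer` table, its rows, and the `profile_column` helper are hypothetical; the point is that a handful of aggregate queries (row count, NULL count, distinct count, maximum length) already surface the kind of surprises that break a load later.

```python
import sqlite3


def profile_column(conn: sqlite3.Connection, table: str, column: str) -> dict:
    """Return a basic profile of one column: rows, NULLs, distinct values, max length."""
    # Table/column names cannot be bound as SQL parameters, so they are quoted
    # into the statement; only pass trusted identifiers here.
    sql = (
        "SELECT COUNT(*), "
        f'SUM(CASE WHEN "{column}" IS NULL THEN 1 ELSE 0 END), '
        f'COUNT(DISTINCT "{column}"), '
        f'MAX(LENGTH("{column}")) '
        f'FROM "{table}"'
    )
    rows, nulls, distinct, max_len = conn.execute(sql).fetchone()
    return {
        "row_count": rows,
        "null_count": nulls or 0,  # SUM over an empty table is NULL
        "distinct_count": distinct,  # COUNT(DISTINCT ...) ignores NULLs
        "max_length": max_len,
    }


# Hypothetical source table with a few deliberately imperfect rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [(1, "a@x.com"), (2, None), (3, "a@x.com"), (4, "bob@example.org")],
)
profile = profile_column(conn, "customer", "email")
print(profile)
```

Comparing such a profile against the requirements (for example, "email is mandatory and unique") shows the gap before any ETL code is written.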

Mistake 2: Exception Handling
…actually, the lack thereof.

The Problem
- The data’s ‘ifs and buts’ that nobody mentioned
- Unreliable data sources
- Missed homework: data profiling
  - Data type mismatches and overflow
  - Referential integrity
- Cycle: Run → Fail → Patch

The Fix
- Consider exceptions at different levels:
  - Data/database
  - Network
  - Operating system
- Design a system-wide strategy:
  - Design patterns → templates
- Log and notify!

How?
- Data/database:
  - In SSIS: use data flow error outputs to redirect offending rows
- Network:
  - Pre-process: test connectivity
  - In SSIS: event handlers, precedence constraints with conditional logic
- Operating system:
  - Pre-process: validate available disk space, file availability, etc.
  - In SSIS: event handlers, precedence constraints with conditional logic
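
SSIS error outputs are configured in the package designer rather than written as code, so the following is only a language-neutral sketch of the same idea in Python: each row either continues downstream or is redirected, together with a reason, to an error output that can be logged and reported. The column names, the `MAX_AMOUNT` limit (assumed to match a DECIMAL(10, 2) target column), and `split_rows` are all hypothetical. The checks mirror the profiling gaps listed under Mistake 2: type mismatches, overflow, and referential integrity.

```python
from decimal import Decimal, InvalidOperation

# Hypothetical limit matching a DECIMAL(10, 2) target column.
MAX_AMOUNT = Decimal("99999999.99")


def split_rows(rows, known_customer_ids):
    """Send each row to the 'good' output or to the 'error' output with a reason."""
    good, errors = [], []
    for row in rows:
        try:
            amount = Decimal(row["amount"])  # data type check
        except (InvalidOperation, TypeError):
            errors.append({**row, "error": "amount is not numeric"})
            continue
        if not amount.is_finite() or abs(amount) > MAX_AMOUNT:  # overflow check
            errors.append({**row, "error": "amount does not fit target column"})
        elif row["customer_id"] not in known_customer_ids:  # referential integrity
            errors.append({**row, "error": "unknown customer_id"})
        else:
            good.append({**row, "amount": amount})
    return good, errors


rows = [
    {"order_id": 1, "customer_id": 10, "amount": "25.50"},
    {"order_id": 2, "customer_id": 99, "amount": "10.00"},
    {"order_id": 3, "customer_id": 10, "amount": "N/A"},
]
good, errors = split_rows(rows, known_customer_ids={10, 11})
for bad in errors:
    print(bad["order_id"], bad["error"])
```

The design choice is that one bad row never fails the whole batch; it lands somewhere visible instead.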

Mistake 3: Inadequate Logging
…what, when, how?

The Problem
- No or little logging
- Too much logging
- Meaningless logging

Logging is needed for:
- Error troubleshooting
- Execution monitoring
- Performance tracking
- Auditing

The Fix
- Add logging capabilities: start with key events, add more as needed
  - Start and end dates and times
  - Row counts
  - OnError
  - OnWarning
- Create reports on top of the logging tables
- Don’t forget to clean up/prune the logs
- Logging I/O is expensive
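
A minimal sketch of a custom log table that records the key events above (start/end times, row counts, errors, warnings) plus a retention-based prune step, using sqlite3 for illustration. The `etl_log` table, its columns, the package name, and the 30-day retention are assumptions; this is not the schema used by SSIS's built-in log providers.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Hypothetical custom log table (not the schema of SSIS's own log providers).
SCHEMA = """
CREATE TABLE IF NOT EXISTS etl_log (
    log_id    INTEGER PRIMARY KEY,
    package   TEXT NOT NULL,
    event     TEXT NOT NULL,  -- e.g. 'Start', 'End', 'OnError', 'OnWarning'
    logged_at TEXT NOT NULL,  -- UTC, fixed ISO-8601 format so text order = time order
    row_count INTEGER,
    message   TEXT
)"""


def _stamp(at: datetime) -> str:
    return at.astimezone(timezone.utc).isoformat(timespec="seconds")


def log_event(conn, package, event, row_count=None, message=None, at=None):
    """Append one log row; `at` defaults to the current UTC time."""
    at = at or datetime.now(timezone.utc)
    conn.execute(
        "INSERT INTO etl_log (package, event, logged_at, row_count, message) "
        "VALUES (?, ?, ?, ?, ?)",
        (package, event, _stamp(at), row_count, message),
    )


def prune_log(conn, keep_days, now=None):
    """Delete log rows older than the retention window; return how many were removed."""
    now = now or datetime.now(timezone.utc)
    cutoff = _stamp(now - timedelta(days=keep_days))
    cur = conn.execute("DELETE FROM etl_log WHERE logged_at < ?", (cutoff,))
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
now = datetime(2024, 6, 30, tzinfo=timezone.utc)
log_event(conn, "LoadSales", "Start", at=now - timedelta(days=40))
log_event(conn, "LoadSales", "End", row_count=1200, at=now - timedelta(days=40))
log_event(conn, "LoadSales", "Start", at=now - timedelta(days=1))
log_event(conn, "LoadSales", "OnWarning", message="Possible truncation", at=now - timedelta(days=1))
deleted = prune_log(conn, keep_days=30, now=now)
remaining = conn.execute("SELECT COUNT(*) FROM etl_log").fetchone()[0]
print(deleted, remaining)
```

Because every row carries a package name, an event, and a timestamp, duration and row-count reports can be simple queries over this one table.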

The Fix (continued)
- SSIS logging
- SSIS event handlers
  - Be aware of the concept of containers in SSIS: events ‘bubble up’
- Logging has to be included in each package
  - Use package templates
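
To make the "bubble up" warning concrete, here is a simplified Python model of the behavior (it is not the SSIS object model, and `Executable`, `raise_error`, and the names in the usage are hypothetical): an error raised by a task fires the handlers on that task and then on every parent container, up to the package.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, List, Optional

# handler(source_name, message)
Handler = Callable[[str, str], None]


@dataclass
class Executable:
    """Simplified model of an SSIS package, container, or task (not the SSIS object model)."""

    name: str
    parent: Optional[Executable] = None
    on_error: List[Handler] = field(default_factory=list)

    def raise_error(self, message: str) -> None:
        # Fire the handlers on the source first, then on every ancestor up to the package.
        node = self
        while node is not None:
            for handler in node.on_error:
                handler(self.name, message)
            node = node.parent


fired = []
package = Executable("LoadSales")
loop = Executable("ForEachFile", parent=package)
data_flow = Executable("DataFlow", parent=loop)
package.on_error.append(lambda src, msg: fired.append(("package handler", src)))
loop.on_error.append(lambda src, msg: fired.append(("loop handler", src)))

data_flow.raise_error("Conversion failed")
print(fired)
```

One failure in the data flow triggers both handlers, so logging from handlers at several levels writes the same error more than once. In SSIS itself, an event handler can stop this propagation through its `Propagate` system variable.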

Mistake 4: No Recovery & Restart
…Game over!

The Problem
- Restarting after a failure is not automated
- It requires manual clean-up of partial results
- Prone to human error
- May require restarting the process from the beginning
- Risk of ‘skipping’ data
- Risk of duplicating data

The Fix
- Create restartability points
- Consider piggybacking on logging
- Use three-way (ternary) logic at each recovery point:
  - Skip
  - Run
  - Clean up and re-run
- Staging source data is handy
- Custom
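
A minimal sketch of the three-way decision at each recovery point, assuming the status of each step is read from the log left by the previous run (a plain dict stands in for that log here). The step names, status values, and the `decide`/`run_pipeline` helpers are hypothetical; this is not SSIS's checkpoint feature.

```python
from __future__ import annotations

from enum import Enum
from typing import Callable, Dict, List, Tuple


class Action(Enum):
    SKIP = "skip"  # step succeeded in an earlier run
    RUN = "run"  # step has never started
    CLEAN_UP_AND_RERUN = "clean-up and re-run"  # step started but did not finish


def decide(step: str, last_status: Dict[str, str]) -> Action:
    status = last_status.get(step)
    if status == "succeeded":
        return Action.SKIP
    if status is None:
        return Action.RUN
    return Action.CLEAN_UP_AND_RERUN  # 'started' or 'failed': partial results may exist


def run_pipeline(
    steps: List[Tuple[str, Callable[[], None]]],
    last_status: Dict[str, str],
    clean_up: Callable[[str], None],
) -> List[Tuple[str, Action]]:
    """Run steps in order, consulting (and updating) the status recorded in the log."""
    taken = []
    for name, step in steps:
        action = decide(name, last_status)
        if action is Action.CLEAN_UP_AND_RERUN:
            clean_up(name)  # remove this step's partial results before re-running
        if action is not Action.SKIP:
            last_status[name] = "started"  # stays 'started' if step() raises
            step()
            last_status[name] = "succeeded"
        taken.append((name, action))
    return taken


# Status left behind by a previous, failed run (stand-in for the log table).
status = {"extract": "succeeded", "transform": "failed"}
cleaned = []
steps = [("extract", lambda: None), ("transform", lambda: None), ("load", lambda: None)]
actions = run_pipeline(steps, status, clean_up=cleaned.append)
for name, action in actions:
    print(name, action.value)
```

Cleaning up before re-running is what prevents duplicated data; skipping completed steps prevents reprocessing. Staged source data makes the re-run possible without going back to the source system.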

Mistake 5: Staging Area Unauthorized Use
…could cause injuries.

The Problem
- Failing to understand that the staging area is a ‘construction zone’
- Reports and applications accessing staging data
- Using staging tables as an online data archive

The Fix
- Easy: keep the staging area off-limits
- Make all required data available in the data presentation layer
- Keep staging data only for as long as it is required
- Use appropriate data aging and archiving policies and processes
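
A minimal sketch of a data aging step for the staging area, using sqlite3 for illustration: rows older than a retention window are copied to an archive table and removed from staging in one transaction. The `stg_sales`/`arc_sales` tables, the `load_date` column (ISO date text), the seven-day retention, and `archive_and_purge` are all hypothetical.

```python
import sqlite3
from datetime import date, timedelta


def archive_and_purge(conn, staging, archive, keep_days, today):
    """Move staging rows older than `keep_days` into an archive table; return rows moved."""
    cutoff = (today - timedelta(days=keep_days)).isoformat()  # 'YYYY-MM-DD'
    with conn:  # one transaction: the copy and the delete commit or roll back together
        conn.execute(
            f'INSERT INTO "{archive}" SELECT * FROM "{staging}" WHERE load_date < ?',
            (cutoff,),
        )
        cur = conn.execute(f'DELETE FROM "{staging}" WHERE load_date < ?', (cutoff,))
    return cur.rowcount


conn = sqlite3.connect(":memory:")
for table in ("stg_sales", "arc_sales"):
    conn.execute(f"CREATE TABLE {table} (sale_id INTEGER, load_date TEXT)")
conn.executemany(
    "INSERT INTO stg_sales VALUES (?, ?)",
    [(1, "2024-01-05"), (2, "2024-06-20"), (3, "2024-06-29")],
)
moved = archive_and_purge(conn, "stg_sales", "arc_sales", keep_days=7, today=date(2024, 6, 30))
print(moved)
```

Running this as a scheduled step keeps the staging area small and makes it clear that staging is not a place to look for history.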

How not to write a report: a classic example

Mistake 6: Performance, Losing the Focus
…

Very fast, but…

Mistake 7: Vanity Testing
…good for feeling awesome.

Mistake 8: No Portability
…deployment in progress!

Mistake 9: Forgetting the Owner’s Manual
…aka the beloved documentation.

Mistake 10: Missing the Bigger Picture
…the architecture.

The Problem
- Jumping to coding without a blueprint
- Break it down into groups of tasks
- List all tasks and functionality you can’t live without
- Place the tasks in the appropriate group

The Fix
- Create an attack plan
- Embrace an architecture
- Divide and conquer!
- List all tasks and functionality you require
- Place the tasks in the appropriate group

An example
- Extract: change data capture, data staging
- Transform: data cleansing, other data transformations, deduplication, exception handling
- Load: data load, load aggregates, OLAP cube processing
- ETL management: job scheduler, recovery/restart, activity monitor, ABC support, backup data, error tracking, other post-load actions, alerting, security, compliance
