INTRODUCTION TO DATA QUALITY SERVICES
Presentation by Tim Mitchell (Artis Consulting)
Today’s Agenda
- Overview of DQS
- Structure
  - Knowledge Base
  - DQS Project
- Operations
  - Matching
  - Cleansing
- Administration
- SSIS Component
- Shortcomings
About the Presenter
- Tim Mitchell
- BI Consultant, Artis Consulting
- North Texas SQL Server User Group
- SQL Server MVP
- Contributing author, MVP Deep Dives Vol 2
- Coauthor, SSIS Design Patterns
- TimMitchell.net | twitter.com/Tim_Mitchell
Housekeeping
- Questions
- Surveys
Overview of Data Quality Services
What is DQS?
- DQS is a knowledge-driven data cleansing and matching service
- Built on top of SQL Server 2012
- Simple yet powerful interface
- Replaces manual data quality work you’re already doing
  - Stored procedures
  - Triggers
  - Custom applications
DQS Structure
DQS Structure and Flow
[Diagram: a Knowledge Base (Domains, Composite Domains, Matching Policies) used by Matching Projects and Cleansing Projects]
Knowledge Base
- Starting point for data quality provisioning
- Uses locally customized data stores or marketplace data sources
- Highly reusable and evolutionary
- Key elements:
  - Domains
  - Matching policies
Knowledge Base
- Created by:
  - Knowledge discovery
  - Domain management
  - Matching rules
Domains
- Domain = data field
- Domain rules
- Composite domains
  - Allow greater flexibility in domain rules
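To make the domain idea concrete, here is a minimal Python sketch. It is not DQS code. It assumes a domain is a named field with validation rules, and a composite domain groups several fields so that a rule can span them. The class names (`Domain`, `CompositeDomain`) and the sample rules are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical illustration only: these classes are not part of DQS.


@dataclass
class Domain:
    """A single data field plus the rules a value must satisfy."""
    name: str
    rules: List[Callable[[str], bool]] = field(default_factory=list)

    def is_valid(self, value: str) -> bool:
        return all(rule(value) for rule in self.rules)


@dataclass
class CompositeDomain:
    """Several domains treated as a unit, with rules that span fields."""
    name: str
    domains: List[Domain]
    cross_rules: List[Callable[[Dict[str, str]], bool]] = field(default_factory=list)

    def is_valid(self, record: Dict[str, str]) -> bool:
        # Each field must pass its own domain rules first.
        fields_ok = all(d.is_valid(record[d.name]) for d in self.domains)
        # Then the cross-field rules see the whole record.
        return fields_ok and all(rule(record) for rule in self.cross_rules)


country = Domain("Country", [lambda v: v in {"US", "CA"}])
postal = Domain("PostalCode", [lambda v: len(v) > 0])

# Sample cross-field rule (made up for this example):
# when Country is US, PostalCode must be exactly five digits.
address = CompositeDomain(
    "Address",
    [country, postal],
    [lambda r: r["Country"] != "US"
     or (len(r["PostalCode"]) == 5 and r["PostalCode"].isdigit())],
)
```

The cross-field rule is the point of the composite: neither `Country` nor `PostalCode` alone can express "five digits only when the country is US".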
Data Quality Project
- Create interactive projects for data matching and cleansing
- Leverage one or more domains in an existing knowledge base
- Somewhat reusable
Data Quality Project
- Nondestructive: no changes to the source of the data to be cleansed
- No changes to the KB either
- Separately, DQS project data can be used to improve the knowledge base
DQS Operations
- Cleansing
  - Process data against known entities and domain rules
  - Similar to the Fuzzy Lookup transform in SSIS
- Matching
  - Group data together
  - Similar to the Fuzzy Grouping transform in SSIS
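The difference between the two operations can be sketched in a few lines of Python. This is only an illustration of the concepts: DQS uses its own algorithms, and the standard-library `difflib` is a stand-in here. The function names (`similarity`, `cleanse`, `match`) and the 0.8 threshold are assumptions made for the example.

```python
from difflib import SequenceMatcher
from typing import List

# Illustration only: DQS uses its own algorithms; difflib is a stand-in.


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cleanse(value: str, known_values: List[str], threshold: float = 0.8) -> str:
    """Cleansing: map a value to its closest known entity, if close enough."""
    best = max(known_values, key=lambda k: similarity(value, k))
    return best if similarity(value, best) >= threshold else value


def match(records: List[str], threshold: float = 0.8) -> List[List[str]]:
    """Matching: group each record with the first group whose lead record is similar."""
    groups: List[List[str]] = []
    for rec in records:
        for group in groups:
            if similarity(rec, group[0]) >= threshold:
                group.append(rec)
                break
        else:
            # No existing group was close enough, so start a new one.
            groups.append([rec])
    return groups


cities = ["Dallas", "Houston", "Austin"]
print(cleanse("Dalas", cities))    # close to a known value, so it is corrected
print(cleanse("Chicago", cities))  # nothing close, so it is left as-is
print(match(["Tim Mitchell", "Tim Mitchel", "Jane Doe"]))
```

Cleansing compares each value against a reference list, while matching compares the records against each other, which is why the two operations map to different SSIS transforms.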
DQS Administration
- Monitor past activity
- Set logging options
- Set confidence thresholds
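As a rough picture of what a confidence threshold does, the Python sketch below sorts a proposed correction into one of three outcomes based on its score. The constant names, the values 0.8 and 0.7, and the outcome labels are assumptions for illustration, not settings taken from the slides.

```python
# Illustration only: threshold names and values are assumptions for this sketch.
AUTO_CORRECT_MIN = 0.8  # at or above: apply the correction automatically
SUGGEST_MIN = 0.7       # at or above (but below auto-correct): offer a suggestion


def classify(score: float) -> str:
    """Decide how to treat a proposed correction given its confidence score."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    if score >= AUTO_CORRECT_MIN:
        return "auto-correct"
    if score >= SUGGEST_MIN:
        return "suggest"
    return "leave unchanged"
```

Raising the thresholds means fewer automatic changes and more manual review; lowering them does the opposite.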
DQS and SSIS
- SQL Server Integration Services has an integrated hook into DQS
- DQS Cleansing Component
  - Provides automated, noninteractive data cleansing operations
Demos
Shortcomings
- V1 product
- No API: must use DQS client interactively
- SSIS component only does cleansing
Final Thoughts
- CU1 performance improvements
- DQS videos / blogs
- My blog
- DQS/MDS virtual chapter: masterdata.sqlpass.org
Questions?