DQS: Business Logic Meets Enterprise Integration September 14th, 2013
About Me Senior Consultant at Pragmatic Works Present at SQL Saturday’s, code camps, SQL chapters Blog at intelligentsql.wordpress.com Twitter : @sqlbischmidt
DQS DQS was introduced in SQL Server 2012 Allows us to bring data cleansing and business logic into our data warehouse/data mart and apply rules and standardization to it to create a cleaner reporting environment NOT a replacement for Master Data Management
Why use it? Fixes “incorrect” data Clean up bad data So our inserted row into our final table is clean
Knowledge Base The database of knowledge! About data! Understands the data, and helps maintain integrity over itself i.e. Florid is the same as Florida Consists of domains Domain Management creates and manages domains within the knowledge base (KB) Knowledge discovery learns patterns in your data and adds that machine knowledge into your knowledge base Matching policy teaches DQS where one records equals another. John Smith is the same as John B Smith
Domains Single domains are individual representations of data in a data field Manage key attributes about that field Distinct list of values that should be allowed Composite domains exist from one of more single domains and can use cross-domain rules or reference data sets to further clean the data Collections of single domains
Data Quality Project Uses a knowledge base as the source Improve the data by Cleaning & Matching Run against already existing data. Warehouse, anyone? Exports data to SQL Server or Excel Clean it, then match it
Administration Activity Monitoring Configuration Cleansing and creating that has occurred in the environment What consumed it and when? Configuration Add Azure Data Market account Set min score for cleansing (70%) and matching (80%)
SSIS Integration In 2012, there is a DQS component that can consume a knowledge base Clean the data as it’s coming in!
The DQS Ecosystem Discover Match Cleanse Identify your data Create the standards Cleanse Standardize your company data Run the rules
The End Comments are welcome! Please feel free to contact me via twitter (@sqlbischmidt) or email at sqlbischmidt@gmail.com or cschmidt@pragmaticworks.com