November 10th, 2011 DQS BOOTCAMP David Faibish, Senior Program Manager, SQL Server Data Quality Services Microsoft SQL Server 2012
Our Day Together …
DATA QUALITY 101
Top 3 impediments. Source: Information Week Reports, 2011
Top Barrier for BI. Source: Information Week Reports, 2011
DQ is MDM top driver. Source: Information Week Reports, 2011
Demand is on the rise.
Overall market size for DQ software in 2010 was $800M. 12.6% increase over
Forecasted 16% yearly growth in the next five years. - Gartner, 2011
It's not only the breadth of functional capabilities. Focus on the business user. Leverage your business resources. - Gartner, 2011
Business process: for data quality (and MDM) initiatives to succeed, they need to support integration with the existing business processes.
Data Integration market ($2.6B in 2009). Source: Gartner
Data Quality Issue | Sample Data Problem
Standard: Are data elements consistently defined and understood? | Gender code = M, F, U in one system and Gender code = 0, 1, 2 in another system
Complete: Is all necessary data present? | 20% of customers' last name is blank, 50% of zip-codes are
Accurate: Does the data accurately represent reality or a verifiable source? | A Supplier is listed as 'Active' but went out of business six years ago
Valid: Do data values fall within acceptable ranges? | Salary values should be between 60, ,000
Unique: Data appears several times | Both John Ryan and Jack Ryan appear in the system. Are they the same person?
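The "Complete" and "Standard" checks above can be illustrated with a few lines of plain Python. This is only a sketch under stated assumptions: the helper names `completeness` and `invalid_values` and the sample rows are hypothetical, and nothing here is part of DQS.

```python
def completeness(records, field):
    """Return the fraction of records whose `field` is present and non-blank."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if str(r.get(field) or "").strip())
    return filled / len(records)


def invalid_values(records, field, allowed):
    """Return the values of `field` that fall outside the allowed set."""
    return [r.get(field) for r in records if r.get(field) not in allowed]


# Hypothetical sample rows, not taken from the slides.
customers = [
    {"last_name": "Ryan", "gender": "M"},
    {"last_name": "", "gender": "F"},
    {"last_name": "Doe", "gender": "0"},   # code from a system using 0, 1, 2
    {"last_name": None, "gender": "U"},
]

# Complete: how many last names are actually filled in?
last_name_completeness = completeness(customers, "last_name")

# Standard: which gender codes do not follow the agreed M/F/U coding?
nonstandard_genders = invalid_values(customers, "gender", {"M", "F", "U"})
```

The same pattern extends to the "Valid" dimension by replacing the allowed set with a range test.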
DQ Issues and DQ Dimensions

Before:
Name | Gender | Street | House # | Zip code | City | State | D.O.B
John Doe | Male | 60th street | 45 | | New York | | 08/12/64
Jane Doe | Male | Jonathan ln | | | Poughkeepsy | NY | 21-dec-1954

After:
Name | Gender | Street | House # | Zip code | City | State | D.O.B
John Doe | Male | E 60th St | 45W | 10022 | New York | NY | 08/12/64
Jane Doe | Female | Jonathan Lane | | | Poughkeepsie | NY | 12/21/54

Before:
Name | Address | Postal Code | City | State
John Smith | 545 S Valley View Drive # | | Anytown | New York
Margaret & John smith | 545 Valley View ave unit | | Anytown | New York
Maggie Smith | 545 S Valley View Dr | | Anytown | New York
John Smith | 545 Valley Drive St. | 34253 | | NY

After:
Name | Address | Zip Code | City | State | Cluster
John Smith | 545 S Valley View Drive # | | Anytown | New York | 1
Margaret & John smith | 545 Valley View ave unit | | Anytown | New York | 1
Maggie Smith | 545 S Valley View Dr | | Anytown | New York | 1
John Smith | 545 Valley Drive St. | 34253 | | NY | 2

DQ dimensions: Completeness, Accuracy, Conformity, Consistency, Uniqueness
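To make the cluster column concrete, here is a minimal Python sketch that groups the four addresses from the slide by a normalized key. It swaps in a deliberately simple technique (exact match after dropping noise words); the slides do not describe how DQS itself computes matches, and the `NOISE` word list and function names are assumptions made for illustration.

```python
import re

# Assumed noise words: directionals, street types, and unit markers.
NOISE = {"s", "n", "e", "w", "drive", "dr", "ave", "avenue", "st", "street", "unit"}


def address_key(address):
    """Lower-case the address, keep alphanumeric tokens, drop noise words."""
    tokens = re.findall(r"[a-z0-9]+", address.lower())
    return " ".join(t for t in tokens if t not in NOISE)


def cluster(addresses):
    """Assign the same cluster id to addresses that share a normalized key."""
    ids = {}
    return [ids.setdefault(address_key(a), len(ids) + 1) for a in addresses]


addresses = [
    "545 S Valley View Drive #",
    "545 Valley View ave unit",
    "545 S Valley View Dr",
    "545 Valley Drive St.",
]
clusters = cluster(addresses)  # [1, 1, 1, 2]
```

An exact-key approach like this misses typos such as "Poughkeepsy"; similarity scoring would be needed for those cases.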
Cleansing: Amend, remove or enrich data that is incorrect or incomplete. This includes correction, enrichment and standardization.
Matching: Identifying, linking or merging related entries within or across sets of data.
Profiling: Analysis of the data source to provide insight into the quality of the data and help to identify data quality issues.
Monitoring: Tracking and monitoring the state of Quality activities and Quality of Data.
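Profiling, as defined above, amounts to summarizing each column of a data source. The following Python sketch shows one possible shape for such a summary; the `profile` function, its report fields, and the sample rows are hypothetical and do not reflect the DQS profiling output.

```python
from collections import Counter


def profile(rows):
    """Summarize each column: non-blank count, distinct values, most common value."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)

    report = {}
    for name, values in columns.items():
        present = [v for v in values if v not in (None, "")]
        counts = Counter(present)
        report[name] = {
            "filled": len(present),
            "distinct": len(counts),
            "top": counts.most_common(1)[0][0] if counts else None,
        }
    return report


# Hypothetical sample rows for illustration.
rows = [
    {"city": "New York", "state": "NY"},
    {"city": "New York", "state": ""},
    {"city": "Poughkeepsie", "state": "NY"},
]
report = profile(rows)
```

Running the same summary on a schedule and comparing reports over time is one simple way to approximate the monitoring activity.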
INTRODUCE DQS
AlwaysOn; ColumnStore Index; Power View; Data Quality Services; Distributed Replay; Reporting Alerts; Multiple Secondaries; Availability Groups; Unstructured Data Performance; Flexible Failover Policy; Contained Database Authentication; SharePoint Active Directory Support
High quality data is critical to effective business intelligence and to business activities.
DQS is an on-premise Data Quality product in SQL Server 2012, extendible with knowledge from multiple parties through Azure DataMarket.
Richer DQ knowledge and capabilities in the cloud will make it even easier to provide high quality data.
Data Quality Services (DQS) is a Knowledge-Driven data quality solution enabling data stewards to easily improve the quality of their data.
Knowledge-Driven: Based on a Data Quality Knowledge Base (DQKB)
Semantics: Data Domains capture the semantics of your data
Knowledge Discovery: Acquires additional knowledge the more you use it
Open and Extendible: Add user-generated knowledge & 3rd party reference data providers
Easy to use: User experience designed for increased productivity
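The idea of a knowledge base that "acquires additional knowledge the more you use it" can be sketched in Python as a domain that remembers corrections. The `Domain` class, its methods, and the status strings below are hypothetical stand-ins chosen for illustration; they are not the DQS API or its data model.

```python
class Domain:
    """Hypothetical stand-in for a knowledge-base domain (not the DQS API)."""

    def __init__(self, name, valid_values=()):
        self.name = name
        self.valid = set(valid_values)
        self.corrections = {}  # known bad value -> corrected value

    def learn(self, bad_value, correct_value):
        """Record a correction so later runs can fix the same error."""
        self.valid.add(correct_value)
        self.corrections[bad_value] = correct_value

    def cleanse(self, value):
        """Return (output_value, status) for one input value."""
        if value in self.valid:
            return value, "correct"
        if value in self.corrections:
            return self.corrections[value], "corrected"
        return value, "new"


city = Domain("City", ["New York"])
before = city.cleanse("Poughkeepsy")       # unknown value on first use
city.learn("Poughkeepsy", "Poughkeepsie")  # a data steward supplies the fix
after = city.cleanse("Poughkeepsy")        # the stored knowledge is reused
```

Keeping the corrections inside the domain object is what allows the same knowledge to be built once and reused across several cleansing runs.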
[Diagram: Build / Use. Labels: DQ Projects; Knowledge Management; Match & De-dupe; Correct & standardize; Manage Knowledge; Connect; Enterprise Data; Reference Data; Cloud Services; Integrated Profiling; Notifications; Progress; Status; Knowledge Base; Discover / Explore Data]
[Architecture diagram. Labels: DQ Clients; DQS UI; SSIS DQ Component; MDS Excel Add in; Future Clients (Excel, Dynamics); Interactive DQ Projects; Data Exploration; Knowledge Discovery and Management; DQ Server; DQ Engine; Knowledge Discovery; Data Profiling & Exploration; Cleansing; Matching; Reference Data; DQ Projects Store; Common Knowledge Store; Knowledge Base Store; DQ Active Projects; MS Data Domains; Local Data Domains; Published KBs; 3rd Party / Internal; MS DQ Domains Store; Reference Data Services; Reference Data Sets; Azure Market Place; Categorized Reference Data; Categorized Reference Data Services; Reference Data API (Browse, Get, Update…); RD Services API (Browse, Set, Validate…)]
With DQS the IW / Data Expert can get actively involved in Data Quality initiatives
Knowledge-Driven: Rich semantic Knowledge Base; Continuous improvement as knowledge is discovered; Build once, reuse for multiple DQ improvements
Open and Extendible: Focus on cloud-based Reference Data; User-generated knowledge; Integration with SSIS and MDS
Easy to use: Focus on productivity and user experience; Designed for business users; Out-of-the-box knowledge (DQ content)
Resources: Sessions On-Demand & Community; Microsoft Certification & Training Resources; Resources for IT Professionals; Resources for Developers. Connect. Share. Discuss.
DQS Blog: Tips, tricks and guidance on best practices for using DQS, courtesy of the DQS team.
DQS Movies: A set of getting-started movies for an easy introduction to DQS.
DQS Forum: Come participate in DQS-related discussions in our DQS forum on MSDN.
Available here: blogs.msdn.com/b/dqs