Download presentation
Presentation is loading. Please wait.
Published byMeagan Elliott Modified over 9 years ago
2
SQL S ERVER D ATA Q UALITY S ERVICES Marc Jellinek Principal Consultant – Neudesic marc.jellinek@neudesic.com
3
A BOUT M E Experience Principal Consultant - Neudesic Assistant Director (SQL Team) – Application Engineering at Ernst & Young IT Manager at MLB Network Sr. Technology Specialist at Microsoft Technologies Microsoft SQL Server 6.0, 6.5, 7.0, 2000, 2005, 2008, 2008 R2 and 2012 Relational Engine, Analysis Services, Integration Services and Reporting Services Marc Jellinek – marc.jellinek@neudesic.com
4
S ESSION O BJECTIVES Introduction to SQL Server Data Quality Services (DQS) Understanding the problem Demo Where do we go from here?
5
S ETTING THE STAGE Building from the SQL Server Series: Master Data Services in SQL Server 2012, presented by Patrick Gallucci http://www.neudesic.com/media/webcasts/20120501/20120501.wmv (start at 6:26) PASS MDM/DQS Virtual Chapter http://masterdata.sqlpass.orghttp://masterdata.sqlpass.org Based on demos from “SQL Server 2012 Developers Update”
6
Data QualityIssueSample Data Problem Standard Are data elements consistently defined and understood? Gender code = M, F, U in one system and Gender code = 0, 1, 2 in another system Complete Is all necessary data present?20% of customers’ last name is blank, 50% of zip-codes are 99999 Accurate Does the data accurately represent reality or a verifiable source? A Supplier is listed as ‘Active’ but went out of business six years ago Valid Do data values fall within acceptable ranges? Salary values should be between 60,000-120,000 Unique Data appears several timesBoth John Ryan and Jack Ryan appear in the system – are they the same person? T HE D ATA Q UALITY P ROBLEM S PACE
7
W HAT ’ S T HE P ROBLEM My name is Marc Jellinek Marc <> “Mark”, “Marck” or “March” Jellinek <> “Jelinek”, “Jellineck”, “Jelineck”, “Jelliner”, “Jeliner” or “Jellyneck” RrRr
8
W HAT ’ S THE P ROBLEM JellinekJelinekJellineckJelineckJellinerJelinerJellyneck Marc Mark Marck March
9
T HE NIGHTMARE S CENARIO The Customer Dimension – Jellinek, Marc – Jellinek, Mark – Jellinek, Marck – Jellinek, March – Jelinek, Marc – Jelinek, Mark – Jelinek, Marck – Jelinek, March – Jellineck, Marc – Jellineck, Mark – Jellineck, Marck – Jellineck, March – Jelineck, Marc – Jelineck, Mark – Jelineck, Marck – Jelineck, March – Jelliner, Marc – Jelliner, Mark – Jelliner, Marck – Jelliner, March – Jelliner, Marc – Jelliner, Mark – Jelliner, Marck – Jelliner, March – Jellyneck, Marc – Jellyneck, Mark – Jellyneck, Marck – Jellyneck, March
10
A NALYTIC I MPACT Average Revenue per customer Average Profit per customer Number of customers Customers per Geography Customers by Income Customers by Gender Customers by Educational Level Customers by Product Bought
11
O BLIGATORY T RUISMS The accuracy of your reporting is determined by the accuracy of your data (Garbage In, Garbage Out) Decisions made based on data will only be as good as the data on which you are basing your decisions. You can’t manage what you can’t measure. Inaccurate measurements lead to interesting management challenges.
12
T HE D ATA Q UALITY S OLUTION S PACE Amend, remove or enrich data that is incorrect or incomplete. This includes correction, enrichment and standardization. Identifying, linking or merging related entries within or across sets of data. CleansingMatching ProfilingMonitoring Analysis of the data source to provide insight into the quality of the data and help to identify data quality issues. Tracking and monitoring the state of Quality activities and Quality of Data.
13
SQL S ERVER 2012 D ATA Q UALITY S ERVICES High quality data is critical to effective business intelligence and to business activities DQS is an on-premise Data Quality product in SQL Server 2012, extendible with knowledge from multiple parties thru Azure DataMarket Richer DQ knowledge and capabilities in the cloud will make it even easier to provide high quality data Data Quality Services (DQS) is a Knowledge-Driven data quality solution enabling IT Pros and data stewards to easily improve the quality of their data Included with SQL Server 2012 Enterprise and BI Editions
14
K EY D ATA Q UALITY S ERVICES C ONCEPTS Knowledge-Driven Semantics Knowledge Discovery Based on a Data Quality Knowledge Base (DQKB) Data Domains capture the semantics of your data Acquires additional knowledge the more you use it Open and Extendible Easy to use Add user-generated knowledge & 3 rd party reference data providers User experience designed for increased productivity
15
DQS A RCHITECTURE Matching Reference Data DQ Clients DQS UI DQ Server DQ Projects StoreCommon Knowledge Store Knowledge Base Store DQ Engine 3 rd Party / Internal MS DQ Domains Store MS DQ Domains Store Reference Data Services Reference Data Sets SSIS DQ Component DQ Active Projects MS Data Domains Local Data Domains Published KBs Knowledge Discovery Data Profiling & Exploration Cleansing Knowledge Discovery and Management Interactive DQ Projects Data Exploration Azure Market Place Categorized Reference Data Categorized Reference Data Services Reference Data API (Browse, Get, Update…) Reference Data API (Browse, Get, Update…) RD Services API (Browse, Set, Validate…) RD Services API (Browse, Set, Validate…) MDS Excel Add in
16
D ATA Q UALITY S ERVICES P ROCESSES Build Use DQ Projects Knowledge Management Match & De-dupe Correct & standardize Manage Knowledge Enterprise Data Reference Data Reference Data Cloud Services Integrated Profiling Notifications Progress Status Knowledge Base Discover / Explore Data
17
B ASIC D EFINITIONS Knowledge Base –Stores all the knowledge related to a specific type of data source –Container for domains Domain –Semantic representation of a type of data in a data field or column –Trusted values, invalid values and erroneous data –Synonym associations, term relationships, validation and business rules, matching policies Matching Rule –Set of rules and conditions that determine a match or duplicate
18
D ATA Q UALITY S ERVICES C OMPONENTS Data Quality Server Data Quality Client DQS Cleansing Component for SQL Server Integration Data Quality Processes in Master Data Management
19
D ATA Q UALITY S ERVICES C OMPONENTS Data Quality Server –SQL Server Databases DQS_MAIN –DQS Stored Procedures, the DQS Engine and published Knowledge Bases DQS_PROJECTS –Data required for knowledge base management and DQS project activities DQS_STAGING_AREA –Intermediate staging area where source data is copied and processed
20
D ATA Q UALITY S ERVICES C OMPONENTS Data Quality Client –Standalone application Designed for both data stewards and DQS Administrators Perform knowledge management, data quality projects and administration in one user interface Allows for domain management, matching policy creation, data cleansing, matching, profiling, monitoring and server administration. Can be installed on a remote computer
21
D ATA Q UALITY S ERVICES C OMPONENTS DQS Cleansing Component in SQL Server Integration Services –Performs data cleansing as a part of an SSIS package –Alternative to running a cleansing project within the Data Quality Services Client application Data Quality Processes within Master Data Services –Perform de-duplication on source data and master data within the Microsoft SQL Server Data Services Add-in for Microsoft Excel.
22
DEMO
23
R ESOURCES DBI207: Using Knowledge to Cleanse Data with Data Quality Services http://channel9.msdn.com/Events/TechEd/NorthAmerica/2011/DBI207 Data Quality Services Blog http://blogs.msdn.com/b/dqs Books Online for SQL Server - Data Quality Services http://technet.microsoft.com/en-us/library/ff877925(SQL.110).aspx Install Data Quality Services http://technet.microsoft.com/en-us/library/gg492277.aspx Used Master Data Services Configuration Manager, set up IIS http://msdn.microsoft.com/library/ee633744%28SQL.110%29.aspx Troubleshoot Installation and Configuration Issues (Master Data Services in SQL Server 2012 http://go.microsoft.com/fwlink/?LinkId=226284 SQL Server 2012 Developer Training Kit Web Installer http://www.microsoft.com/en-us/download/details.aspx?id=27721 SQL Server 2012 Update for Developers Training Workshop http://social.technet.microsoft.com/wiki/contents/articles/6981.sql-server-2012-update-for-developers-training-workshop.aspx SQL Server 2012 Update for Developers Training Kit http://social.technet.microsoft.com/wiki/contents/articles/6982.sql-server-2012-developer-training-kit-bom-en-us.aspx SQL Server 2012 Update for Developers Training Kit Content http://social.technet.microsoft.com/wiki/contents/articles/6976.sql-server-2012-developer-training-kit-content-en-us.aspx MSDN – Data Quality Services http://msdn.microsoft.com/en-us/library/ff877925(v=sql.110).aspx MSDN Discussions – Data Quality Services http://social.msdn.microsoft.com/Forums/en-US/sqldataqualityservices/threads Technet – Data Quality Services http://technet.microsoft.com/en-us/library/ff877925.aspx PASS MDM/DQS Virtual Chapter http://masterdata.sqlpass.org
24
Driving Innovation with SAP On-Premise and Beyond Monish Nagisetty Daniel Sepp 7 June 2012 10:00-11:00 PDT http://www.clicktoattend.co m/?id=159718 Delivering a Semantic Model for Ad-Hoc Reporting Steve Muise21 June 2012 10:00-11:00 PDT http://www.clicktoattend.co m/?id=159917 Exploring the New Hadoop Implementation in Azure Jacob Saunders10 July 2012 10:00-11:00 PDT http://www.clicktoattend.co m/?id=160452 Amerishore – an Innovative, Socially Conscious Approach to Offshoring Tracy Derr17 July 2012 10:00-11:00 PDT http://www.clicktoattend.co m/?id=160650 Data Integration Improvements within SQL Server 2012 Kola Bolarin24 July 2012 10:00-11:00 PDT http://www.clicktoattend.co m/?id=160322 EDI: The Reports of My Death Have Been Greatly Exaggerated Abhilash Shanmug9 August 2012 10:00-11:00 PDT http://www.clicktoattend.co m/?id=159720 http://www.neudesic.com/About/Events/Pages/RecentWebcasts.aspx
25
THANK YOU
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.