November 10th, 2011. DQS Matching. Gadi Peleg, Senior Program Manager, SQL Server Data Quality Services. Microsoft SQL Server 2012.

Presentation transcript:


Agenda: What is record matching? Data Issues. DQS Matching Process. DQS Data Matching Principles. Matching Policy. Matching Project.

Record matching is the task of identifying records that match the same real world entity.

The Cost of Duplicate Data: a few examples. Direct marketing communications are doubled up unnecessarily. Product shipments and customer-site services could be sent to the wrong address because an incorrect duplicate record was used. Sales reporting may be inaccurate due to an over-inflated number of customers. Sales analysis may be inaccurate because sales are split between multiple records that represent the same customer, which undervalues some key customers.

Where do duplicate records come from?
Poorly designed software: no verification of existing records upon entry.
Formatting and abbreviations: "Doctor Robert Smith" vs. "Dr. Bob Smith".
Data validation: human errors can creep into the system when field input is not validated.
Company mergers and acquisitions: merging systems may result in duplicates in the merged data.
Change of attributes: the same person may appear not to exist in the database if some attributes have changed (e.g., address or name).

Data Issues: There are different ways to represent the same person or address in a database. Data is 'fuzzy' in nature (spelling mistakes, abbreviations, etc.).

How Do Data Issues Affect Matching? (Diagram labels: The Data, Reasoning, Matching Results.)

(Diagram labels: Knowledge Management, Knowledge Base, DQ Projects, Build, Use, Connect, Sample Data, Integrated Profiling, Progress, Notifications, Status.)

DQS Matching Experience
1. Prepare Matching Policy: leverage a KB with existing knowledge; design matching policy rules (each rule weighs a single domain or multiple domains); tune your policy with your source data.
2. Matching Project: map your source to the relevant KB; run matching; review results and reject mismatches; export survivors and matching results.

Identifies exact and approximate matches, enabling removal of duplicate data. Enables creating a matching policy interactively using a computer-assisted process. Ensures that values that are equivalent, but were entered in a different format or style, are in fact rendered uniform.

A matching policy is prepared in the knowledge base. A matching policy consists of matching rules that assess how well one record matches another. In each rule, specify whether the records' values must be an exact match, may be similar, or are a prerequisite. Train your policy by running and tuning each rule separately.

Identify the attributes in your data that are most significant for matching. Create domains/composite domains based on your data structure. Define matching rules.

Similarity: select Similar if field values can be similar, or Exact if field values must be identical. Weight: determines the contribution of each domain in the rule to the overall matching score for two records. Prerequisite: requires the field values to be a 100% match; otherwise the records are not considered a match. Minimum matching score: the threshold at or above which two records are considered a match.
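The rule parameters above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the DQS algorithm: `DomainRule`, `field_similarity`, `matching_score` and `is_match` are hypothetical names, `difflib.SequenceMatcher` stands in for whatever similarity measure DQS uses internally, prerequisite domains are assumed to carry no weight, and the 80% threshold is only a sample value.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Dict, List


@dataclass
class DomainRule:
    """One domain inside a matching rule (hypothetical structure)."""
    domain: str                  # name of the field/domain in the record
    weight: float = 0.0          # share of the overall score; weights should sum to 1.0
    similarity: str = "similar"  # "similar" or "exact"
    prerequisite: bool = False   # if True, the values must match 100%


def field_similarity(a: str, b: str, mode: str) -> float:
    """Return a similarity in [0, 1]. difflib is a stand-in, not the DQS measure."""
    a, b = a.strip().lower(), b.strip().lower()
    if mode == "exact":
        return 1.0 if a == b else 0.0
    return SequenceMatcher(None, a, b).ratio()


def matching_score(rec1: Dict[str, str], rec2: Dict[str, str],
                   rules: List[DomainRule]) -> float:
    """Weighted score in percent; 0 as soon as a prerequisite domain differs."""
    total = 0.0
    for rule in rules:
        v1, v2 = rec1[rule.domain], rec2[rule.domain]
        if rule.prerequisite:
            if field_similarity(v1, v2, "exact") < 1.0:
                return 0.0
            continue  # assumption: prerequisites gate the match but add no weight
        total += rule.weight * field_similarity(v1, v2, rule.similarity)
    return 100.0 * total


def is_match(rec1: Dict[str, str], rec2: Dict[str, str],
             rules: List[DomainRule], min_score: float = 80.0) -> bool:
    """Two records match when the score is at or above the minimum matching score."""
    return matching_score(rec1, rec2, rules) >= min_score
```

With a rule such as `[DomainRule("country", prerequisite=True), DomainRule("name", weight=0.6), DomainRule("city", weight=0.4, similarity="exact")]`, two records from different countries score 0 regardless of how similar the names are, which is the gating behavior the Prerequisite setting describes.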

Domains of type 'Date', 'Integer' or 'Decimal' can be matched using the 'Similar' property by assigning a tolerance, expressed either as a percentage or as an integer. Field values that fall within the defined tolerance are considered a match.
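A tolerance check of this kind might look like the following sketch. The function names are hypothetical, and two details are assumptions rather than documented DQS behavior: the percentage is taken relative to the larger of the two values, and the date tolerance is counted in days.

```python
from datetime import date


def within_tolerance(a: float, b: float, tolerance: float,
                     percent: bool = False) -> bool:
    """True if a and b differ by no more than the tolerance."""
    diff = abs(a - b)
    if percent:
        # Assumption: the percentage is relative to the larger magnitude.
        return diff <= max(abs(a), abs(b)) * tolerance / 100.0
    return diff <= tolerance


def dates_within(d1: date, d2: date, tolerance_days: int) -> bool:
    """True if the two dates are at most tolerance_days apart (assumed unit: days)."""
    return abs((d1 - d2).days) <= tolerance_days
```

For example, `within_tolerance(100, 103, 5, percent=True)` treats the two values as a match, while `within_tolerance(100, 110, 5, percent=True)` does not.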

The Matching Results tab displays statistics for the current and previous runs of a matching rule. You can also restore the previous rule.

Home Team Song Artist

The DQS matching system uses the knowledge accumulated in the knowledge base to propose matching candidates. This knowledge includes:
Synonyms, syntax errors and their leading value (by domain): domain values and their synonyms and syntax errors are used by the matching system to find identical or similar records.
Term-Based Relations (TBR): TBR improves the consistency of data attribute values by transforming data values to a single form using user-defined term relations. In matching, TBRs are applied only in memory, to boost matching accuracy.
Nulls and equivalents ("Unknown", "99999", ...): manage values that represent missing data by linking them to the 'DQS_Null' value to ensure that they are considered a match.
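The three kinds of knowledge listed above can be pictured as lookup tables applied before records are compared. The sketch below is illustrative only: the dictionaries, their contents, and the function names are hypothetical, Python's `None` stands in for 'DQS_Null', and whole-token replacement is an assumed simplification of how term-based relations work.

```python
from typing import Dict, Optional

# Hypothetical knowledge for a "first name" domain: synonym or syntax error -> leading value.
LEADING_VALUES: Dict[str, str] = {"bob": "Robert", "rob": "Robert", "robrt": "Robert"}

# Hypothetical term-based relations: term -> single target form.
TERM_RELATIONS: Dict[str, str] = {"st": "Street", "st.": "Street",
                                  "ave": "Avenue", "ave.": "Avenue"}

# Hypothetical values that stand for missing data.
NULL_EQUIVALENTS = {"unknown", "99999", "n/a", ""}


def normalize_value(value: str, leading_values: Dict[str, str]) -> Optional[str]:
    """Map null equivalents to None and synonyms/syntax errors to the leading value."""
    text = value.strip()
    if text.lower() in NULL_EQUIVALENTS:
        return None  # stand-in for 'DQS_Null'
    return leading_values.get(text.lower(), text)


def apply_term_relations(value: str, relations: Dict[str, str]) -> str:
    """Replace whole tokens using the term relations; the source data is not modified."""
    return " ".join(relations.get(token.lower(), token) for token in value.split())
```

After this step, "Bob" and "Robert" compare as identical, "1834 E. 42nd St." becomes "1834 E. 42nd Street" for comparison purposes, and "Unknown" and "99999" both resolve to the same null marker.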

Example:
Columns: String 1, String 2, Similarity Score, Character, Before, After.
175 CLEARBROOK ROAD P.O. BOX
E. 42ND STREET / 1834 E. 42ND. ST
DE KALB AVE, NE / 1721 DE KALB AVE NE / ,
S. GARFIELD AVE., BLDG. 1-B / 14538 S GARFIELD AVE BLDG 1B / ,. -
#704, SJ Technoville BD, SJ Technoville BD #, -

A matching project is performed in three steps:
Mapping: map source columns to domains.
Matching: run matching and view the results. This step includes additional functionality such as rejecting records, filtering results by 'Matched' and 'Unmatched' and by matching score, and displaying clusters in two different ways (overlapping and non-overlapping).
Export: export both matching results (clusters) and survivors (unique records).

In overlapping clusters, a record may appear more than once across the clustered results. This structure may be harder to read because the same record exists in multiple clusters. In non-overlapping clusters, the system unifies clusters that contain the same record. This structure is easier to read because no record is repeated. Overlapping clusters: (A~B), (B~C). Non-overlapping cluster: (A~B~C).
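Unifying overlapping clusters into non-overlapping ones amounts to finding connected groups of matched records. The sketch below uses a union-find structure to do this; it is one common way to implement the idea, not necessarily how DQS does it, and `merge_clusters` is a hypothetical name.

```python
from typing import Dict, Iterable, List, Tuple


def merge_clusters(pairs: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Merge overlapping matched pairs into non-overlapping clusters (union-find)."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        # Follow parent links to the cluster representative, compressing the path.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # unify the two clusters

    clusters: Dict[str, set] = {}
    for record in list(parent):
        clusters.setdefault(find(record), set()).add(record)
    # Sort for a stable, readable result.
    return sorted((sorted(members) for members in clusters.values()),
                  key=lambda members: members[0])


print(merge_clusters([("A", "B"), ("B", "C"), ("D", "E")]))
# prints [['A', 'B', 'C'], ['D', 'E']]
```

The overlapping pairs (A~B) and (B~C) share record B, so they collapse into the single cluster (A~B~C), while (D~E) stays separate.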

Overlapping Clusters Non-Overlapping Clusters

Check the Rejected box to move records out of the proposed cluster; they are moved when you advance to the next page in the activity. Unlike in a cleansing data project, where records move between tabs instantly, rejected records are not removed from the clusters in the user interface. (Screenshots: DQS Client user interface; exported matching results.)

Matching and Survivorship results can be exported to a SQL table, Excel or CSV file for further analysis or consumption.

The Story: Contoso airport receives passenger details from different airlines; the data contains duplicate passenger information that needs to be identified and removed. Exercise Description: In this exercise you will prepare a matching policy and tune the matching rules, create a matching project and run a matching process to identify duplicate passengers, and export the matching and survivor results.

Resources: Sessions On-Demand & Community; Microsoft Certification & Training Resources; Resources for IT Professionals; Resources for Developers. Connect. Share. Discuss.

DQS Blog: tips, tricks and guidance on best practices for using DQS, courtesy of the DQS team. DQS Movies: a set of getting-started movies for an easy introduction to DQS. DQS Forum: come participate in DQS-related discussions in our DQS forum on MSDN. Available here: blogs.msdn.com/b/dqs