Inaport Training Fuzzy Matching

© Copyright 2010 InaPlex Inc

Matching
- The process of deciding which record or set of records in the target table(s) should be updated
- Alternatively, deciding whether a record already exists and taking appropriate action

Matching Techniques
Inaport supports different ways to match:
- Standard: build expressions on source and target
- Fuzzy: refine Standard to allow for poor data
- SQL: use SQL SELECT instead of expressions

Fuzzy Matching
Standard matching:
- Can use any combination of fields
- Can use expressions
- BUT is ultimately restricted to exact match: “InaPlex” <> “Innerplex Ltd”
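The exact-match restriction can be illustrated with a minimal Python sketch. This is not Inaport's expression language: `match_key`, `target_index` and `standard_match` are hypothetical names, and the normalisation (lower-casing and dropping whitespace) is only an assumed example of a match expression.

```python
def match_key(name: str) -> str:
    """Hypothetical match expression: lower-case and remove all whitespace."""
    return "".join(name.lower().split())

# Hypothetical target table, indexed by the value of the match expression.
target_index = {match_key(n): n for n in ["InaPlex", "Alpha Corp"]}

def standard_match(source_name: str):
    """Return the target record whose key equals the source key, else None."""
    return target_index.get(match_key(source_name))

print(standard_match(" InaPlex "))      # found: the keys are identical
print(standard_match("Innerplex Ltd"))  # None: close, but not an exact match
```

However the expression is built, the final comparison is equality, so a near miss is treated the same as a completely different value.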

Fuzzy Matching
- Fuzzy matching compares source and target, and gives a similarity score
- The score measures how “close” two strings are
  - Score = 1: perfect match
  - Score = 0: no match
- Examples:
  - “InaPlex” and “inaplx”: 98%
  - “InaPlex” and “innerplex”: 87%
  - “InaPlex” and “ibm”: 49%
- See Tools > Fuzzy Match Demo
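A similarity score between 0 and 1 can be sketched with Python's standard `difflib` module. The slides do not say which algorithm Inaport uses, so this is a stand-in and will not reproduce the percentages quoted above; the case-insensitive comparison is also an assumption.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Score in [0, 1]: 1.0 for identical strings, lower as they diverge.

    Uses difflib's ratio as a stand-in for the real scoring algorithm.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

for candidate in ["inaplx", "innerplex", "ibm"]:
    print(candidate, round(similarity("InaPlex", candidate), 2))
```

The ordering is what matters for matching: a near miss such as “inaplx” should score well above an unrelated value such as “ibm”.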

How it Works
As with Standard matching, Fuzzy match:
- Can use any field or combination of fields
- Reads the match fields
- Builds an in-memory index for each table
- Applies the target match expression to the field data read from the table

How it Works
Set scoring levels:
- Score > Upper: good match, accept immediately
- Lower < Score < Upper: possible match, user review
- Score < Lower: not a match, reject

No match < Lower < Possible < Upper < Good
No match < 85% < Possible < 95% < Good
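The scoring levels above amount to a three-way classification. A minimal sketch, with `classify` as a hypothetical helper; the slide does not say how a score exactly equal to a boundary is handled, so treating it as “possible” is an assumption.

```python
def classify(score: float, lower: float = 0.85, upper: float = 0.95) -> str:
    """Map a similarity score to 'good', 'possible' or 'no match'."""
    if score > upper:
        return "good"        # accept immediately
    if score < lower:
        return "no match"    # reject
    return "possible"        # send to user review (boundary values land here by assumption)

print(classify(0.98))  # good
print(classify(0.87))  # possible
print(classify(0.49))  # no match
```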

How it Works
When a source record comes in:
- The source expression is applied to build the match value
- The source match value is scored against every value in the target index
- The “best” matches are used; you set the boundaries

No match < 0.85 < Possible match < 0.95 < Good match
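The flow for one incoming source record can be sketched as follows. All names (`build_index`, `score_source`, the field names `name` and `company`) are hypothetical, and `difflib` again stands in for the real scoring algorithm.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Stand-in scorer in [0, 1]; not Inaport's algorithm."""
    return SequenceMatcher(None, a, b).ratio()

def build_index(target_rows, target_expr):
    """Apply the target match expression to every row once, up front."""
    return [(target_expr(row), row) for row in target_rows]

def score_source(source_row, source_expr, index, lower=0.85):
    """Score one source record against every indexed target value.

    Returns (score, target_row) pairs at or above `lower`, best first.
    """
    value = source_expr(source_row)
    scored = [(similarity(value, target_value), row) for target_value, row in index]
    kept = [pair for pair in scored if pair[0] >= lower]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)

# Hypothetical data: the source and target use different field names.
index = build_index([{"name": "InaPlex"}, {"name": "IBM"}],
                    lambda row: row["name"].lower())
matches = score_source({"company": "inaplx"},
                       lambda row: row["company"].lower(), index)
print(matches)
```

Note that every source record is compared with the whole index, which is the cost that clustering later reduces.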

How it Works
User Review:
- Shows the source record and possible matches in the target
- User can select one or more records as the match
Options:
- Review “good” and “possible” matches
  - For testing purposes
- Review just “possible” matches
  - If there are no possibles, good and no match are accepted automatically
- No review
  - Good and no match are accepted automatically
  - Possible is treated as bad
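The three review options can be read as a small decision table. The sketch below is one interpretation only: the mode names (`"all"`, `"possible"`, `"none"`) and the returned action strings are hypothetical, and it assumes that a “no match” outcome is accepted automatically in every mode, which the slide states only for the last two.

```python
def action(outcome: str, mode: str) -> str:
    """Decide what happens to a classified record under a review mode.

    outcome: 'good', 'possible' or 'no match'
    mode:    'all' (review good + possible), 'possible', or 'none'
    """
    if outcome == "possible":
        # Without a reviewer, a possible match cannot be confirmed.
        return "treat as bad" if mode == "none" else "review"
    if outcome == "good" and mode == "all":
        return "review"
    return "accept"  # good (outside 'all' mode) and no match

for mode in ("all", "possible", "none"):
    print(mode, [action(o, mode) for o in ("good", "possible", "no match")])
```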

How it Works
Customise User Review:
- May need to see more than the target table to decide on a match
- Can also display associated tables, e.g. Address, Contact
- Can also select which fields from associated tables to display

Example: Operation Tab
- Select Fuzzy Match from Match Type

Example: Match Tab
Specify the base match criteria:
- Source and target match expressions
- Boundary scores for no, possible and good matches
- Cluster Match is covered later

Example: Match Tab
Set up the User Review. Can choose:
- No review: use in batch mode
- Only possible matches: accept good matches
- Good + possible: review all matches

Example: User Review
Shows possible matches at run time:
- Source record
- Possible matching target records, with score
- If configured, child records of the selected target record
- Allows selection of desired matches

Clustering
- Fuzzy Matching is powerful and flexible
- BUT every source record must be scored against EVERY target match value, and only then are the highest scores selected
- 100,000 records in target => 100,000 scores per source record
- The solution is CLUSTERING

Clustering
- Specify an expression to sort target records into clusters
- Then specify an equivalent expression for the source, to sort each source record into one cluster
- Finally, scoring is only done against members of the selected cluster
- 100,000 target records divided into 20 clusters = 5,000 records per cluster => 5,000 scores per source record
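A minimal sketch of clustering, assuming the cluster expression is the first letter of the company name. The helper names and sample data are hypothetical, and `difflib` stands in for the real scorer.

```python
from collections import defaultdict
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Stand-in scorer in [0, 1]; not Inaport's algorithm."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def cluster_key(name: str) -> str:
    """Hypothetical cluster expression: first letter, lower-cased."""
    return name.strip()[:1].lower()

def build_clusters(target_names):
    """Sort the target values into clusters once, before matching starts."""
    clusters = defaultdict(list)
    for name in target_names:
        clusters[cluster_key(name)].append(name)
    return clusters

def best_in_cluster(source_name, clusters):
    """Score the source only against members of its own cluster."""
    candidates = clusters.get(cluster_key(source_name), [])
    if not candidates:
        return None
    return max(candidates, key=lambda target: similarity(source_name, target))

clusters = build_clusters(["Alpha Corp", "Zulu Corp", "Beta Corp", "Brown Corp"])
print(best_in_cluster("Beta Corporation", clusters))  # only the "b" cluster is scored
```

With roughly equal clusters, the number of comparisons per source record drops from the size of the target to the size of one cluster.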

Clustering
The cluster expression should:
- Sort the target into roughly equal groups
- Guard against allocating the source to the wrong cluster
Examples:
- First letter of company name
- Zip/Post code
- Phone area code

Clustering
[Diagram, no clustering established: source record scored against every record in target. Labels shown: Alpha Corp, Zulu Corp, Beta Corp.]

Clustering
- Set up clustering based on the first letter of the company name

Clustering
[Diagram, cluster on first letter: source record only scored against records in the “b” cluster (Beta Corp, Brown Corp). Labels shown: Alpha Corp, Zulu Corp, Beta Corp.]

Clustering
Important Note:
- Because source records are only scored against one cluster, poorly chosen clustering can lead to missed matches
  - “naplex” would look in the “n” cluster, not “i”
- The cluster expression does NOT have to use the same fields as the match
  - E.g. match on name, cluster on ZIP code
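The missed-match risk can be shown with a short sketch. The records, ZIP codes and helper `cluster_by` are invented for illustration; the point is only that a typo in the first letter sends the source record to the wrong cluster, while a cluster built on a different field is unaffected.

```python
from collections import defaultdict

# Hypothetical target records.
targets = [
    {"name": "InaPlex", "zip": "92614"},
    {"name": "Nadir Ltd", "zip": "10001"},
]

def cluster_by(rows, key_fn):
    """Group rows by the value of a cluster expression."""
    clusters = defaultdict(list)
    for row in rows:
        clusters[key_fn(row)].append(row)
    return clusters

def first_letter(row):
    return row["name"][:1].lower()

by_letter = cluster_by(targets, first_letter)
by_zip = cluster_by(targets, lambda row: row["zip"])

# Source record with the leading "I" missing from the name.
source = {"name": "naplex", "zip": "92614"}

letter_candidates = [r["name"] for r in by_letter.get(first_letter(source), [])]
zip_candidates = [r["name"] for r in by_zip.get(source["zip"], [])]

print(letter_candidates)  # the "n" cluster: InaPlex is never scored
print(zip_candidates)     # the ZIP cluster still contains InaPlex
```

Clustering on ZIP code has its own failure mode (a wrong or missing ZIP), so the choice depends on which field is most reliable in the data.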

Summary
- Fuzzy matching provides a powerful new tool for handling complex, dirty data
- Need to:
  - Use it carefully, especially clustering
  - Allow for the overhead of user review

THANK YOU