Fuzzy Duplicates Analysis with ACL

Fuzzy Duplicates Analysis with ACL
Prepared by: Kevin Legere
Date: April 3rd, 2013

Agenda
- Overview
- Example
- FUZZYDUP command
- OMIT() Function
- Script Editor and RECOFFSET
- Q&A

Overview
What is a "Fuzzy Duplicate"?
A match based on criteria where the values are not exact but very close. EX: "ACL Services" and "ACL Service"
Typically used for:
- Keyword matching
- Invoice Number matching
- Vendor Name matching*
- Employee Name matching
Can be simple or complex; it depends entirely on your approach and desired accuracy.
* Focus of this presentation

Overview
Simple Match Examples:
- Exact or 100% match: "ACL" = "ACL"
- Force upper or lower case: "ACL" = UPPER("acl"); "acl" = LOWER("ACL")
- Remove special characters: "ACL" = EXCLUDE("*ACL.", "!@#$%^&*().")
- Compare only numbers or only letters: "ACL" = INCLUDE(UPPER("ACL123"), "ABCDEFGHIJKLMNOPQRSTUVWXYZ"); "123" = INCLUDE("ACL123", "1234567890")
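To make the simple-match transformations concrete outside ACL, here is a minimal Python sketch. The helpers `acl_include` and `acl_exclude` are hypothetical stand-ins that assume INCLUDE() keeps only the listed characters and EXCLUDE() drops them, as the examples above show; Python's `str.upper()` and `str.lower()` play the role of UPPER() and LOWER().

```python
LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
DIGITS = "1234567890"
SPECIALS = "!@#$%^&*()."


def acl_include(text: str, chars: str) -> str:
    """Keep only the characters of text that appear in chars (INCLUDE-style)."""
    return "".join(ch for ch in text if ch in chars)


def acl_exclude(text: str, chars: str) -> str:
    """Drop every character of text that appears in chars (EXCLUDE-style)."""
    return "".join(ch for ch in text if ch not in chars)


if __name__ == "__main__":
    # Each comparison mirrors one of the slide examples; all print True.
    print("ACL" == "ACL")
    print("ACL" == "acl".upper())
    print("acl" == "ACL".lower())
    print("ACL" == acl_exclude("*ACL.", SPECIALS))
    print("ACL" == acl_include("ACL123".upper(), LETTERS))
    print("123" == acl_include("ACL123", DIGITS))
```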

Overview
Complex Match Examples:
- Removal of company type indicators (LLC, INC, LTD, etc.): "ACL Services Ltd." = "ACL Services"
- Percentage character match (letter by letter): "ACL Services" vs "ACL Service" is an 11/12 character match, or about 91.7%
- Word by word*: "ACL Services" vs "ACL Champions" compares "ACL" with "ACL" (match) and "Services" with "Champions" (no match), a 50% match
- Levenshtein distance
- Sounds like (e.g. NYSIIS)
* Most used by ACL consultants
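The letter-by-letter and word-by-word percentages can be sketched in Python as follows. The scoring rules are assumptions chosen to reproduce the two figures above: `char_match_pct` counts positions where the characters agree and divides by the longer length, and `word_match_pct` divides the number of shared words by the larger word count. Both function names are hypothetical.

```python
def char_match_pct(a: str, b: str) -> float:
    """Letter by letter: matching positions divided by the longer string's length."""
    longer = max(len(a), len(b))
    if longer == 0:
        return 100.0  # two empty strings are treated as identical
    same = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * same / longer


def word_match_pct(a: str, b: str) -> float:
    """Word by word: shared (case-insensitive) words divided by the larger word count."""
    words_a = set(a.upper().split())
    words_b = set(b.upper().split())
    larger = max(len(words_a), len(words_b))
    if larger == 0:
        return 100.0
    return 100.0 * len(words_a & words_b) / larger


if __name__ == "__main__":
    print(round(char_match_pct("ACL Services", "ACL Service"), 1))  # 91.7
    print(word_match_pct("ACL Services", "ACL Champions"))           # 50.0
```

Because the positional comparison stops lining up after a single inserted or deleted character (for example "ACL Services" vs "AACL Services"), edit-distance measures such as the Levenshtein distance are usually more robust.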

Vendor Master Analysis

Vendor Master Analysis
Fuzzy Duplicates on Vendor Name
Possible risk: payments for the same vendor are being sent to more than one vendor record.
It may not involve risk; the goal can simply be to normalize the vendor master list so that duplicates do not exist. Ideally, each vendor should appear once in your vendor master list, with one or more address records in your vendor address table.

Vendor Master Analysis
The sample file contains 75 vendors, with only Vendor Code and Vendor Name.
Where do you start with Vendor Name matching?
- Look for exact duplicates
- Focus on simple matching
- Sort or Summarize!

Vendor Master Analysis
Step 1: Summarize your vendor master file
- Choose Vendor Name as your key field
- Add Vendor Code under Other Fields
- Be sure to check "Presort"
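Outside ACL, the effect of a presorted Summarize on Vendor Name can be approximated with a short Python sketch. `summarize_by_name` and the sample records are hypothetical; the function groups records by exact vendor name, counts them, lists the vendor codes in each group, and returns the groups in ascending name order so that names appearing more than once stand out.

```python
from collections import defaultdict


def summarize_by_name(vendors):
    """Group (vendor_code, vendor_name) pairs by exact name, sorted by name."""
    groups = defaultdict(list)
    for code, name in vendors:
        groups[name].append(code)
    # Sorting by name mimics the "Presort" option: near-identical names land side by side.
    return [(name, len(codes), codes) for name, codes in sorted(groups.items())]


if __name__ == "__main__":
    sample = [  # hypothetical records, not the presentation's sample file
        ("V001", "ACL Services"),
        ("V002", "ACL Service"),
        ("V003", "ACL Services"),
    ]
    for name, count, codes in summarize_by_name(sample):
        print(f"{name!r}: {count} record(s) {codes}")
```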

Vendor Master Analysis
Step 2: Quickly comb over the data to identify a common trend; the sample data contains one such issue, which is the focus here. Create a computed field that corrects the trend (that is, cleans the data).

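Vendor Master Analysis
Expression used in the Default Value text box:
INCLUDE(UPPER(ALLTRIM(Vendor_Name)), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ')
Within ACL, the computed field returns each vendor name in upper case, with every character other than the letters A to Z (including spaces, digits, and punctuation) removed.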

Vendor Master Analysis
Step 3: Run the Duplicates command on the computed field
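A rough Python equivalent of Steps 2 and 3 is sketched below: `clean_name` imitates the computed field INCLUDE(UPPER(ALLTRIM(Vendor_Name)), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'), and `duplicates_on_clean_name` keeps only the cleaned keys shared by more than one record. Both names and the sample data are hypothetical; this illustrates the idea and is not ACL's Duplicates command.

```python
from collections import defaultdict

LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def clean_name(name: str) -> str:
    """Trim, upper-case, then keep only the letters A-Z."""
    return "".join(ch for ch in name.strip().upper() if ch in LETTERS)


def duplicates_on_clean_name(vendors):
    """Return {clean_key: [(code, original_name), ...]} for keys seen more than once."""
    groups = defaultdict(list)
    for code, name in vendors:
        groups[clean_name(name)].append((code, name))
    return {key: recs for key, recs in groups.items() if len(recs) > 1}


if __name__ == "__main__":
    sample = [  # hypothetical records
        ("V001", "ACL Services Ltd."),
        ("V002", "acl services ltd"),
        ("V003", "Champion Supply Inc"),
    ]
    for key, records in duplicates_on_clean_name(sample).items():
        print(key, records)
```

Because only letters are kept, spaces disappear as well, so "ACL Services" and "ACLServices" produce the same key; the ALLTRIM step is harmless but not strictly needed for this particular expression.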

Vendor Master Analysis
Results are as follows:

FUZZYDUP command
ACL 9.3 has new features that make fuzzy duplicate analysis easier:
- FUZZYDUP command
- OMIT() function
- ISFUZZYDUP() function
- LEVDIST() function
Important parameters to understand:
- Levenshtein Distance
- Difference Percentage

FUZZYDUP command
Syntax
FUZZYDUP ON {key_field} <OTHER fields> {LEVDISTANCE value} <DIFFPCT value> <RESULTSIZE value> <EXACT> TO table_name
Example
FUZZYDUP ON Vendor_Name OTHER ALL LEVDISTANCE 2 DIFFPCT 50 TO My_Results
Levenshtein Distance (LEVDISTANCE)
The number of single-character edits (insertions, deletions, or substitutions) required to make the strings equal. EX: "Smith" and "Smythe" have a Levenshtein Distance of 2 (substitute "y" for "i", then insert "e").
Difference Percentage (DIFFPCT)
The threshold for the percentage difference between two strings, calculated as the Levenshtein Distance divided by the length of the shorter string. EX: "Smith" and "Smythe" have a Difference Percentage of 40%: (2 / 5) * 100%.
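The two thresholds can be checked with a minimal Python sketch. `levdist` is the standard dynamic-programming Levenshtein distance (it may differ in details, such as case handling, from ACL's LEVDIST()), `diff_pct` divides that distance by the length of the shorter string, and `is_fuzzy_dup` and `fuzzy_pairs` are hypothetical helpers that flag a pair when it is within LEVDISTANCE and, if given, within DIFFPCT. Unlike FUZZYDUP, which writes its results to a table, this sketch only lists matching pairs.

```python
from itertools import combinations
from typing import List, Optional, Tuple


def levdist(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep)
        prev = curr
    return prev[-1]


def diff_pct(a: str, b: str) -> float:
    """Levenshtein distance as a percentage of the shorter string's length."""
    shorter = min(len(a), len(b))
    if shorter == 0:
        # Assumption: empty vs empty is identical, empty vs non-empty is 100% different.
        return 0.0 if a == b else 100.0
    return 100.0 * levdist(a, b) / shorter


def is_fuzzy_dup(a: str, b: str, levdistance: int, diffpct: Optional[float] = None) -> bool:
    """True when the pair is within LEVDISTANCE and, if given, within DIFFPCT."""
    if levdist(a, b) > levdistance:
        return False
    return diffpct is None or diff_pct(a, b) <= diffpct


def fuzzy_pairs(names: List[str], levdistance: int,
                diffpct: Optional[float] = None) -> List[Tuple[str, str]]:
    """Every pair of names that passes is_fuzzy_dup."""
    return [(a, b) for a, b in combinations(names, 2)
            if is_fuzzy_dup(a, b, levdistance, diffpct)]


if __name__ == "__main__":
    print(levdist("Smith", "Smythe"))    # 2
    print(diff_pct("Smith", "Smythe"))   # 40.0
    print(fuzzy_pairs(["Smith", "Smythe", "Jones"], 2, 50))
```

Comparing every pair is quadratic in the number of names, which is fine for a 75-vendor sample but slow for a large vendor master.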

OMIT() Function
When do I use OMIT()?
- When you want to refine fuzzy duplicate analysis
- Look for repeating strings you want to remove from your Vendor Name field
Syntax
OMIT(string1, string2 <,case_sensitive>)
Specify T to make the substrings specified for removal case-sensitive, or F to ignore case.
Example
OMIT(Vendor_Name, " Ltd, Inc, Corporation, Corp", F)
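The effect of OMIT() on a vendor name can be sketched in Python. `omit` is a hypothetical stand-in, not ACL's implementation, and it rests on two labeled assumptions: the comma-separated substrings are removed one after another in the order listed, and any spaces inside the list are kept, so " Inc" only matches text that follows a space.

```python
import re


def omit(string1: str, string2: str, case_sensitive: bool = True) -> str:
    """Remove each comma-separated substring in string2 from string1, in list order.

    Assumption: pieces are used exactly as written (spaces included), and
    case_sensitive=False corresponds to passing F in ACL.
    """
    flags = 0 if case_sensitive else re.IGNORECASE
    result = string1
    for piece in string2.split(","):
        if piece:
            result = re.sub(re.escape(piece), "", result, flags=flags)
    return result


if __name__ == "__main__":
    print(omit("ACL Services Ltd", " Ltd, Inc, Corporation, Corp", False))
    # Order matters under the sequential-removal assumption:
    print(omit("Acme Corporation", " Corp, Corporation", False))
    print(omit("Acme Corporation", " Corporation, Corp", False))
```

Under this sketch, listing longer substrings before shorter ones they contain (Corporation before Corp) avoids leaving fragments such as "oration" behind.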

Script Editor and RECOFFSET

Implementation Consultant Contact Information
Kevin Legere, Implementation Consultant
ACL Services Ltd.
1550 Alberni Street, Vancouver, BC, Canada V6G 1A5
kevin_legere@acl.com | @aclkevin
www.acl.com/linkedin | www.acl.com/twitter | www.acl.com/facebook