
Jared Kuehn – Skyline Technologies


1 When Low-Quality Data Strikes: Fuzzy Tools Provide Clarity in Matching and Deduplication
Jared Kuehn – Skyline Technologies

2 About me
Likes BLTs
Male pattern baldness for a theater production
Weird Al is my hero
I like hats
My daughter is adorable
My dog is fuzzy

3 This tangent is too divergent
Let’s get to our topic!

4 Today’s agenda
What is fuzzy logic?
What are the typical matching approaches?
Let’s see it in action! Demo, demo, demo!

5 What is fuzzy logic?
Stock photo I found online that clearly displays my point… kind of
Taking two pieces of information and identifying a match based on how similar they are.
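To make "how similar they are" concrete, here is a minimal sketch that scores two strings between 0.0 and 1.0. It uses Python's standard-library `difflib.SequenceMatcher`, which is an assumption chosen for illustration and not the algorithm behind the SQL Server tools in this talk. The sample names are made up.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0.0-1.0 similarity ratio, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


# A one-letter typo still scores high; unrelated names score low.
typo_score = similarity("Pat Kuehn", "Pat Kuhn")
unrelated_score = similarity("Pat Kuehn", "Maria Lopez")
print(f"typo: {typo_score:.2f}, unrelated: {unrelated_score:.2f}")
```

A fuzzy match is then just a score at or above a threshold you pick. Where that threshold sits is a judgment call you would tune against your own data.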

6 Case study!!!
Two datasets of people for your data warehouse
  Both contain names and demographic information
One comes from your company’s main application
  Already in the warehouse
  High-quality, managed well
The other comes from a new application
  Data has been identified as low-quality
  Typos, blank fields, varied formatting
A person can exist in both lists
Goal is to merge the two lists into one master person dataset for your warehouse
  Minimize the number of duplicates without finding bad matches
  Here’s a second bullet point because I couldn’t think of a second point and I learned in high school that having only one sub bullet point is frowned upon

7 Approaches to matching
Exact Match
Fuzzy Match
Manual Match
Match Game

8 Exact Match
Define columns that you want to compare
Data in columns must match exactly to find matching records
Strict rules result in more confidence in matches
Can define multiple rules
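The exact-match idea above can be sketched as a set of rules, where each rule is a group of columns that must all be identical. The column names, the two rules, and the sample records are hypothetical. Skipping blank values is also an assumption, made so that two empty fields never count as a match.

```python
# Hypothetical rules: each tuple lists columns that must ALL match exactly.
RULES = [
    ("first_name", "last_name", "birth_date"),
    ("email",),
]


def rule_key(person, rule):
    """Build the comparison key for one rule, or None if any column is blank."""
    values = tuple(person.get(column) for column in rule)
    return values if all(values) else None


def exact_matches(warehouse, incoming, rules=RULES):
    """Return (warehouse_id, incoming_id, rule) for every exact match."""
    matches = []
    for rule in rules:
        index = {}  # key -> ids of warehouse records with that key
        for person in warehouse:
            key = rule_key(person, rule)
            if key is not None:
                index.setdefault(key, []).append(person["id"])
        for person in incoming:
            key = rule_key(person, rule)
            for warehouse_id in index.get(key, []):
                matches.append((warehouse_id, person["id"], rule))
    return matches


# Made-up sample data: N1 is identical on the name rule, N2 has a typo.
warehouse = [{"id": "W1", "first_name": "Pat", "last_name": "Smith",
              "birth_date": "1980-01-01", "email": "pat@example.com"}]
incoming = [
    {"id": "N1", "first_name": "Pat", "last_name": "Smith",
     "birth_date": "1980-01-01", "email": ""},
    {"id": "N2", "first_name": "Pta", "last_name": "Smith",
     "birth_date": "1980-01-01", "email": ""},
]
result = exact_matches(warehouse, incoming)
print(result)
```

Only N1 matches. The typo in N2 is enough to defeat every rule, which is where the strictness helps confidence but hurts on low-quality data.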

9 Fuzzy Match
Define which columns you want to compare
Find matches based on similarity
Faster to set up for complex, low-quality scenarios
Better at handling low-quality data
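A minimal sketch of the same idea in code: compare the chosen columns by similarity, average the scores, and accept the best candidate only if it clears a threshold. `difflib.SequenceMatcher`, the 0.85 threshold, the column names, and the sample records are all assumptions for illustration. The SSIS Fuzzy Lookup component uses its own algorithm and settings.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0.0-1.0 ratio, or None when either value is blank."""
    a, b = (a or "").strip().lower(), (b or "").strip().lower()
    if not a or not b:
        return None  # a blank field is treated as no evidence either way
    return SequenceMatcher(None, a, b).ratio()


def record_score(left, right, columns):
    """Average the per-column similarity over columns that have data."""
    scores = []
    for column in columns:
        score = similarity(left.get(column), right.get(column))
        if score is not None:
            scores.append(score)
    return sum(scores) / len(scores) if scores else 0.0


def best_fuzzy_match(record, candidates, columns, threshold=0.85):
    """Return (candidate, score) for the best candidate, or (None, score)."""
    best, best_score = None, 0.0
    for candidate in candidates:
        score = record_score(record, candidate, columns)
        if score > best_score:
            best, best_score = candidate, score
    return (best, best_score) if best_score >= threshold else (None, best_score)


# Made-up sample data: the incoming record has a typo and a blank city.
COLUMNS = ("first_name", "last_name", "city")
warehouse = [
    {"id": "W1", "first_name": "Jared", "last_name": "Kuehn", "city": "Appleton"},
    {"id": "W2", "first_name": "Maria", "last_name": "Lopez", "city": "Green Bay"},
]
incoming = {"id": "N1", "first_name": "Jarod", "last_name": "Kuehn", "city": ""}
match, score = best_fuzzy_match(incoming, warehouse, COLUMNS)
print(match["id"] if match else None, round(score, 2))
```

Ignoring blank columns keeps missing data from dragging a score down, but it also means a record with only one populated column can score very high on thin evidence. A real setup would probably require a minimum number of compared columns.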

10 Manual match
Trust the human brain to find accurate matches
Can account for any number of variances in data
Most accurate form of matching
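Since human review is accurate but slow, one possible way to combine it with the automated approaches is to send only the uncertain pairs to a person. The sketch below is an assumption, not something shown in the talk: the two thresholds, the queue names, and the sample scores are hypothetical.

```python
def triage(score, auto_threshold=0.92, review_threshold=0.75):
    """Classify a similarity score into one of three hypothetical queues."""
    if score >= auto_threshold:
        return "auto_match"      # confident enough to merge automatically
    if score >= review_threshold:
        return "manual_review"   # plausible match: let a human decide
    return "no_match"            # treat as a new person


def route(scored_pairs, **thresholds):
    """Group (pair, score) tuples by the queue their score falls into."""
    queues = {"auto_match": [], "manual_review": [], "no_match": []}
    for pair, score in scored_pairs:
        queues[triage(score, **thresholds)].append(pair)
    return queues


# Made-up scores for three candidate pairs.
queues = route([(("W1", "N1"), 0.97), (("W2", "N2"), 0.80), (("W3", "N3"), 0.40)])
print(queues)
```

Raising `review_threshold` shrinks the manual queue at the cost of more missed duplicates, so the setting depends on how much reviewer time is available.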

11 Still there? Good, cause it’s Demo time!!!!

12 Which approach or tool do I pick?
How much time do you want to invest in finding accurate matches?
What resources are available for you to use?
  Business users?
  Yet another second bullet point with no information. I really need to be better about this. Oh no, I did it again…

13 Fuzzy tool options I know of
SQL Server Integration Services (SSIS)
  Versions 2005 and later
  Fuzzy Lookup and Fuzzy Grouping
SQL Server Full Text Search
  Analyzes character patterns and linguistics
  Restricted to only text data
  Allows configuration for specific languages
CLR functions
Data Quality Services (DQS) and Master Data Services (MDS)
  DQS – Versions 2012 and later
  MDS – Versions 2008 R2 and later
  Engaging business users
  Business user friendly?
Fuzzy Lookup for Excel Add-In (

14 Final thoughts
Fuzzy logic is another tool that you can use
  But it's still a tool
  Don't hammer a nail with a screwdriver
  Also, I need to improve my use of sub bullet points
If you want to try it, plan some time to experiment with it
Useful information to follow up on
  My Skyline blogs (
  Fuzzy Lookup Excel Add-In ( us/download/details.aspx?id=15011)
  Check SQL Saturday website for script/SSIS packages

15 When your memory is fuzzy, stay fuzzy!

