1
CS395: Internship in Computing
Department of Natural Resources
2
Project Overview
Analyze and research the Customer database
Write software to find duplicate Customer records
3
The Cube
4
More Cube
5
Customer table
6
Views
7
More Views
8
First Attempt
9
The Problem
Missing Data
Abbreviations
Misspellings
10
The Solution
Wildcards
Standardizing
Fuzzy Matching
11
Wildcards
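
The slide itself carries no text beyond its title, so the following is only a minimal sketch of one way wildcards can address the missing-data problem: a blank field is treated as matching anything. The function names and the `last_name` field are hypothetical; `date_of_birth` and `ssn_last_four` are column names mentioned later in the deck.

```python
def field_matches(a, b):
    """Compare two field values, treating a missing value as a wildcard."""
    # Assumption: None or a blank string means "unknown", so it matches anything.
    if a is None or b is None:
        return True
    a, b = a.strip(), b.strip()
    if not a or not b:
        return True
    return a.upper() == b.upper()


def records_match(rec_a, rec_b, fields):
    """Two records are candidate duplicates if every listed field matches."""
    return all(field_matches(rec_a.get(f), rec_b.get(f)) for f in fields)


# Hypothetical sample records, not data from the presentation.
a = {"last_name": "Smith", "date_of_birth": None, "ssn_last_four": "1234"}
b = {"last_name": "SMITH", "date_of_birth": "1970-01-01", "ssn_last_four": "1234"}
c = {"last_name": "Smith", "date_of_birth": "1970-01-01", "ssn_last_four": "9999"}

print(records_match(a, b, ["last_name", "date_of_birth", "ssn_last_four"]))
print(records_match(a, c, ["last_name", "date_of_birth", "ssn_last_four"]))
```

In SQL the same idea is usually written with `LIKE` patterns or with `IS NULL OR` conditions; the Python form is used here only to keep the sketch self-contained.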
12
Standardizing
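
As an illustration of how standardizing can handle the abbreviation problem, the sketch below upper-cases an address line, strips punctuation, and maps a few long forms to one abbreviation. The abbreviation table and the function name are assumptions made for this example, not the list used in the project.

```python
import re

# Hypothetical abbreviation table; a real one would be much longer.
ABBREVIATIONS = {
    "STREET": "ST",
    "AVENUE": "AVE",
    "ROAD": "RD",
    "NORTH": "N",
    "SOUTH": "S",
    "EAST": "E",
    "WEST": "W",
}


def standardize(text):
    """Return an upper-case, punctuation-free form with common words abbreviated."""
    # Replace anything that is not a letter, digit, or space with a space.
    cleaned = re.sub(r"[^A-Z0-9 ]", " ", text.upper())
    # split() also collapses runs of whitespace.
    tokens = cleaned.split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


print(standardize("123 North Main Street."))
print(standardize("123 N. Main St"))
```

Both sample inputs reduce to the same string, so an exact comparison on the standardized value finds a pair that the raw values would miss.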
13
Fuzzy Matching
Levenshtein (Edit) Distance
Soundex
14
Levenshtein Distance
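
Levenshtein distance counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another, which makes it useful for catching misspellings. The sketch below is a standard dynamic-programming version written for this transcript; it is not the code used in the project, and the sample names are made up.

```python
def levenshtein(a, b):
    """Return the edit distance between strings a and b."""
    # Keep the shorter string in b so each row is as small as possible.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] is the distance between the first i-1 chars of a and the first j of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


print(levenshtein("JOHNSON", "JOHNSTON"))
print(levenshtein("kitten", "sitting"))
```

A small distance relative to the length of the names suggests a likely misspelling; the threshold to use is a judgment call that depends on the data.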
15
Soundex
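
Soundex reduces a name to a letter plus three digits so that names that sound alike receive the same code. Oracle provides a built-in `SOUNDEX` function; the sketch below is an independent Python rendering of the common American Soundex rules, given only to show how the code is built, and its results may differ from a particular database's implementation in edge cases.

```python
# Map each consonant to its Soundex digit; vowels, H, W and Y have no digit.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the four-character American Soundex code for name."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != previous:
            result += code
        previous = code  # a vowel resets this, so a repeated code is kept
    # Pad with zeros and cut to one letter plus three digits.
    return (result + "000")[:4]


print(soundex("Smith"), soundex("Smyth"))
print(soundex("Robert"), soundex("Rupert"))
```

Because the code ignores vowels and collapses similar consonants, it catches spelling variants that an edit-distance check might score as far apart, at the cost of more false matches.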
16
New Results
17
Requests from the PIC
Display the associated numbers that are related to a given customer_number
Mark associated numbers that are owned by a given customer_number
Mark whether a customer_number has been loaded into the Land Administration System (LAS)
18
Sequencing
19
More Sequencing
20
SQL+ Report
21
Company and Government Records
One long name
No date_of_birth, ssn_last_four, etc.
Ordered by street_line_1
22
Final Numbers
2,470 out of 77,610 for individual
566 out of 7,563 for company
164 out of 759 for government
23
Lessons learned
Oracle / SQL
Asking for advice / information
Yuban coffee has low acidity levels