CS395: Internship in Computing

1 CS395: Internship in Computing
Department of Natural Resources

2 Project Overview
Analyze and research the Customer database
Write software to find duplicate Customer records

3 The Cube

4 More Cube

5 Customer table

6 Views

7 More Views

8 First Attempt

9 The Problem
Missing Data
Abbreviations
Misspellings

10 The Solution
Wildcards
Standardizing
Fuzzy Matching

11 Wildcards
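The query on this slide was not captured. Assuming the wildcards are SQL `LIKE` patterns, where `%` matches any run of characters and `_` matches exactly one, a pattern such as `SM_TH%` catches both SMITH and SMYTH. The Python sketch below reproduces that matching rule by translating a pattern into a regular expression; `like_to_regex` and `like_match` are hypothetical helper names, not part of the project.

```python
import re


def like_to_regex(pattern: str) -> str:
    """Translate a SQL LIKE pattern into an equivalent regular expression.

    '%' matches any run of characters (including none) and '_' matches
    exactly one character; every other character is matched literally.
    ESCAPE clauses are not handled in this sketch.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def like_match(value: str, pattern: str, ignore_case: bool = False) -> bool:
    """Return True if the whole of value matches the LIKE pattern."""
    flags = re.DOTALL | (re.IGNORECASE if ignore_case else 0)
    return re.fullmatch(like_to_regex(pattern), value, flags) is not None


if __name__ == "__main__":
    names = ["SMITH", "SMYTH", "SMITHE", "JONES"]
    print([n for n in names if like_match(n, "SM_TH%")])  # ['SMITH', 'SMYTH', 'SMITHE']
```

Wildcards tolerate a single unknown letter or a missing suffix, but they cannot catch transposed letters or abbreviations, which is where standardizing and fuzzy matching come in.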

12 Standardizing
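The slide's examples were not captured. A common way to standardize before comparing is to uppercase the text, strip punctuation, and expand known abbreviations token by token, so that "123 N. Main St." and "123 North Main Street" become identical. The abbreviation table in this sketch is a small hypothetical sample, not the project's actual list, and `standardize` is a hypothetical helper name.

```python
import re
from typing import Optional

# Hypothetical sample of abbreviation expansions; a real table would be
# built from the abbreviations actually found in the Customer data.
ABBREVIATIONS = {
    "ST": "STREET",
    "AVE": "AVENUE",
    "RD": "ROAD",
    "HWY": "HIGHWAY",
    "N": "NORTH",
    "S": "SOUTH",
    "E": "EAST",
    "W": "WEST",
    "APT": "APARTMENT",
    "DEPT": "DEPARTMENT",
    "CORP": "CORPORATION",
    "INC": "INCORPORATED",
}


def standardize(text: Optional[str]) -> str:
    """Return an uppercase, punctuation-free, abbreviation-expanded string.

    Missing values (None or all whitespace) become "" so callers can
    distinguish "no data" from a real value.
    """
    if text is None or not text.strip():
        return ""
    # Uppercase, then turn every character that is not A-Z, 0-9, or a space into a space.
    cleaned = re.sub(r"[^A-Z0-9 ]", " ", text.upper())
    # Expand each whitespace-separated token that appears in the table.
    return " ".join(ABBREVIATIONS.get(token, token) for token in cleaned.split())


if __name__ == "__main__":
    print(standardize("123 N. Main St."))        # 123 NORTH MAIN STREET
    print(standardize("123 North Main Street"))  # 123 NORTH MAIN STREET
```

Expanding to the long form, rather than abbreviating, keeps the output readable. Some abbreviations are ambiguous ("ST" can mean Saint as well as Street), so a real table would need context rules or a manual review step.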

13 Fuzzy Matching
Levenshtein (Edit) Distance
Soundex

14 Levenshtein Distance
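Levenshtein distance counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another, so SMITH and SMYTH are at distance 1. The sketch below is the textbook dynamic-programming computation, keeping only two rows of the table; it illustrates the measure and is not the project's own code.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, and substitutions that turn a into b."""
    # The distance is symmetric, so make b the shorter string to keep rows small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] = distance between the first i-1 characters of a and the first j of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            substitute_cost = previous[j - 1] + (ca != cb)  # free if the letters match
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("SMITH", "SMYTH"))     # 1
    print(levenshtein("kitten", "sitting"))  # 3
```

A pair of names within a small distance (1 or 2 is an assumed threshold, not one given on the slides) would be flagged as a candidate duplicate. Because longer strings tolerate more edits, dividing the distance by the length of the longer string is a common refinement.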

15 Soundex
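Soundex encodes a name as its first letter followed by three digits for the consonant sounds that follow, so names that sound alike, such as Robert and Rupert (both R163), get the same code. Oracle SQL provides a built-in `SOUNDEX` function; the slides do not say whether the project used it, so the Python sketch below implements the standard American Soundex rules for illustration.

```python
# Digit codes for consonants in American Soundex; vowels, Y, H, and W are not coded.
SOUNDEX_CODES = {
    letter: digit
    for letters, digit in (
        ("BFPV", "1"),
        ("CGJKQSXZ", "2"),
        ("DT", "3"),
        ("L", "4"),
        ("MN", "5"),
        ("R", "6"),
    )
    for letter in letters
}


def soundex(name: str) -> str:
    """Return the four-character American Soundex code for name ("" if it has no letters)."""
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]  # ASCII letters only
    if not letters:
        return ""
    result = [letters[0]]  # the first letter is kept as-is
    last_code = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = SOUNDEX_CODES.get(ch, "")
        if code:
            # Adjacent letters with the same code (including the first letter) count once.
            if code != last_code:
                result.append(code)
            last_code = code
        elif ch not in "HW":
            # A vowel (or Y) separates repeated codes, so both are kept.
            last_code = ""
        # H and W are skipped without resetting last_code.
        if len(result) == 4:
            break
    return "".join(result).ljust(4, "0")  # pad short codes with zeros


if __name__ == "__main__":
    print(soundex("Robert"), soundex("Rupert"), soundex("Ashcraft"))  # R163 R163 A261
```

Soundex catches misspellings that sound the same but can differ by several letters, while edit distance catches typos that change the sound; using both widens the net at the cost of more false matches to review.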

16 New Results

17 Requests from the PIC
Display the associated numbers that are related to a given customer_number
Mark associated numbers that are owned by a given customer_number
Mark whether a customer_number has been loaded into the Land Administration System (LAS)

18 Sequencing

19 More Sequencing

20 SQL+ Report

21 Company and Government Records
Name stored as one long field
No date_of_birth, ssn_last_four, etc.
Ordered by street_line_1
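With a single long name and no personal identifiers, the address is one of the few fields left to compare, which is presumably why these records are ordered by street_line_1. The sketch below assumes each record is a dictionary with `customer_number`, `name`, and `street_line_1` keys (the dictionary layout and the `name` key are assumptions; the helper names are hypothetical) and groups records that share a normalized street line so candidate duplicates sit together.

```python
import re
from collections import defaultdict
from typing import Optional


def normalize_street(street: Optional[str]) -> str:
    """Uppercase, replace punctuation with spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^A-Z0-9 ]", " ", (street or "").upper()).split())


def group_by_street(records: list) -> dict:
    """Map each normalized street_line_1 shared by two or more records to
    those records, with streets in sorted order."""
    groups = defaultdict(list)
    for rec in records:
        key = normalize_street(rec.get("street_line_1"))
        if key:  # records with no street line cannot be grouped this way
            groups[key].append(rec)
    return {street: recs for street, recs in sorted(groups.items()) if len(recs) > 1}


if __name__ == "__main__":
    records = [  # made-up sample rows
        {"customer_number": 101, "name": "ACME LUMBER CO", "street_line_1": "12 Pine Rd."},
        {"customer_number": 205, "name": "ACME LUMBER COMPANY", "street_line_1": "12 PINE RD"},
        {"customer_number": 310, "name": "CITY OF SPRINGFIELD", "street_line_1": "1 Main St"},
    ]
    for street, recs in group_by_street(records).items():
        print(street, [r["customer_number"] for r in recs])  # 12 PINE RD [101, 205]
```

In practice the standardizing from slide 12 would also be applied to the street line, and the long names within each group could then be compared with edit distance.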

22 Final Numbers
2,470 out of 77,610 for individual records
566 out of 7,563 for company records
164 out of 759 for government records
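The slide gives raw counts only and does not say what was counted (presumably records flagged as likely duplicates). Reading each line as a count out of the total records of that type, the proportions follow from simple arithmetic on the slide's numbers; the percentages are derived here, not reported in the presentation.

```python
# Counts copied from the "Final Numbers" slide: (count, total) per record type.
final_numbers = {
    "individual": (2_470, 77_610),
    "company": (566, 7_563),
    "government": (164, 759),
}


def share(count: int, total: int) -> float:
    """Return count as a percentage of total."""
    return 100.0 * count / total


if __name__ == "__main__":
    for kind, (count, total) in final_numbers.items():
        print(f"{kind:<11} {count:>6,} / {total:>7,} = {share(count, total):5.2f}%")
    overall_count = sum(c for c, _ in final_numbers.values())
    overall_total = sum(t for _, t in final_numbers.values())
    print(f"{'overall':<11} {overall_count:>6,} / {overall_total:>7,} = "
          f"{share(overall_count, overall_total):5.2f}%")
```

This works out to roughly 3.2% of individual records, 7.5% of company records, 21.6% of government records, and 3.7% (3,200 of 85,932) overall, so the smaller company and government tables had proportionally far more hits.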

23 Lessons Learned
Oracle / SQL
Asking for advice / information
Yuban coffee has low acidity levels