Published by Patience Theodora Daniels. Modified over 9 years ago.
1
Arvind Arasu, Surajit Chaudhuri, and Raghav Kaushik
Presented by Bryan Wilhelm
2
Problem Description
A single entity may be referenced in separate records in textually dissimilar ways, for example "Robert" and "Bob". Traditional text similarity functions such as edit distance and the Jaccard coefficient cannot handle these cases. Current research looks at databases of string transformations (such as Robert → Bob), but these databases can be extremely large.
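To see why traditional similarity functions fail here, the minimal Python sketch below scores the lowercased names under Levenshtein edit distance and under Jaccard similarity of character bigrams. The choice of bigrams as the token unit is an assumption for illustration; the slide does not say how the strings are tokenized.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute, or match for free
        prev = curr
    return prev[-1]


def bigrams(s: str) -> set[str]:
    """Set of overlapping two-character substrings (assumed tokenization)."""
    return {s[k:k + 2] for k in range(len(s) - 1)}


def jaccard(x: set[str], y: set[str]) -> float:
    """|x & y| / |x | y|, defined as 1.0 when both sets are empty."""
    union = x | y
    return len(x & y) / len(union) if union else 1.0


a, b = "robert", "bob"
print(edit_distance(a, b))                        # 4
print(round(jaccard(bigrams(a), bigrams(b)), 3))  # 0.167
```

Both measures rate the two names as far apart (4 edits on a 6-character string; only the bigram "ob" is shared out of 6 distinct bigrams), even though they refer to the same person.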
3
Problem Description
4
Solution: Definitions
Rule application, for example: {Olathe → Olathe, 7, 4}
Alignment: rule applications cannot overlap, and their order does not matter.
Coverage
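The slide gives only the notation, so the following Python sketch rests on stated assumptions. It reads a rule application {lhs → rhs, i, j} as a rule plus the token positions where lhs starts in the first record and rhs starts in the second. An alignment is then an unordered set (a `frozenset`, since order does not matter) of applications that match and do not overlap. The `coverage` function, the fraction of all tokens in both records covered by the alignment, is a hypothetical stand-in and not necessarily the measure used in the paper. All names here are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleApplication:
    """Rule lhs -> rhs applied at token i of the source and token j of the target
    (assumed reading of the {lhs -> rhs, i, j} notation)."""
    lhs: tuple[str, ...]
    rhs: tuple[str, ...]
    i: int
    j: int

    def source_span(self) -> set[int]:
        return set(range(self.i, self.i + len(self.lhs)))

    def target_span(self) -> set[int]:
        return set(range(self.j, self.j + len(self.rhs)))


def matches(app: RuleApplication, src: list[str], tgt: list[str]) -> bool:
    # Both sides of the rule must literally occur at the stated positions.
    return (tuple(src[app.i:app.i + len(app.lhs)]) == app.lhs
            and tuple(tgt[app.j:app.j + len(app.rhs)]) == app.rhs)


def is_alignment(apps: frozenset, src: list[str], tgt: list[str]) -> bool:
    # Unordered set of matching rule applications that never share a token.
    used_src: set[int] = set()
    used_tgt: set[int] = set()
    for app in apps:
        if not matches(app, src, tgt):
            return False
        s, t = app.source_span(), app.target_span()
        if s & used_src or t & used_tgt:
            return False
        used_src |= s
        used_tgt |= t
    return True


def coverage(apps: frozenset, src: list[str], tgt: list[str]) -> float:
    # Hypothetical measure; assumes apps is a valid (non-overlapping) alignment.
    covered = sum(len(a.lhs) + len(a.rhs) for a in apps)
    return covered / (len(src) + len(tgt))


src = ["Robert", "Smith", "Olathe", "KS"]
tgt = ["Bob", "Smith", "Olathe", "Kansas"]
full = frozenset({
    RuleApplication(("Robert",), ("Bob",), 0, 0),
    RuleApplication(("Smith",), ("Smith",), 1, 1),
    RuleApplication(("Olathe",), ("Olathe",), 2, 2),
    RuleApplication(("KS",), ("Kansas",), 3, 3),
})
print(is_alignment(full, src, tgt), coverage(full, src, tgt))  # True 1.0
```

Here identity rules such as Olathe → Olathe cover tokens that already agree, while rules such as Robert → Bob and KS → Kansas cover the textually dissimilar parts.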
5
Solution: Algorithm
10
Record Matching Application
Generating example pairs: traditional text matching methods (such as the Jaccard coefficient) are used. Input from domain experts could also be considered, but this is expensive. A few incorrect pairs will not affect the end result.
Validation of transformations: all approaches involve confirmation by a domain expert.
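Generating example pairs with a traditional matcher can be sketched in Python as below: every pair of records whose token-level Jaccard similarity reaches a threshold becomes an example pair. The whitespace tokenization, the `example_pairs` name, and the threshold values are assumptions for illustration, not settings taken from the presentation.

```python
def tokens(record: str) -> set[str]:
    """Lowercased whitespace tokens of a record (assumed tokenization)."""
    return set(record.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """|a & b| / |a | b|, defined as 1.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def example_pairs(left: list[str], right: list[str],
                  threshold: float = 0.5) -> list[tuple[str, str]]:
    """All (l, r) record pairs whose Jaccard similarity is at least threshold."""
    pairs = []
    for l in left:
        tl = tokens(l)
        for r in right:
            if jaccard(tl, tokens(r)) >= threshold:
                pairs.append((l, r))
    return pairs


left = ["Robert Smith 123 Main St Olathe KS", "Mary Jones 9 Elm Ave Wichita KS"]
right = ["Bob Smith 123 Main St Olathe KS", "M Jones 9 Elm Avenue Wichita Kansas"]
print(len(example_pairs(left, right, threshold=0.5)))  # 1
print(len(example_pairs(left, right, threshold=0.3)))  # 2
```

At a threshold of 0.5 only the first pair (similarity 0.75) is found. The second true match scores 0.4 because of variations such as Mary/M, Ave/Avenue, and KS/Kansas, which are the kind of textual differences the learned transformations are meant to handle. Lowering the threshold recovers more pairs at the risk of admitting a few incorrect ones.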
11
Analysis