Graph Analysis Matching Program. Burdette Pixton.

Presentation transcript:

Graph Analysis Matching Program (GRAMP). Burdette Pixton

Record Linkage: the object identification problem. Identifies possible links in pedigrees. Advantages: compress search results; merge pedigrees; discover missing information.

Record Linkage. Manual: time consuming; error prone; N^2 comparisons over all N records. Blocking: n^2 comparisons within a block of n records.
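The gap between comparing every record with every other record and comparing only within blocks can be sketched as follows. This is a minimal illustration, not the presenter's code: the toy records, the `surname_key` blocking key, and both function names are assumptions made up for this example.

```python
from collections import defaultdict
from itertools import combinations


def all_pairs_count(records):
    """Number of unordered pairs among N records; grows quadratically in N."""
    n = len(records)
    return n * (n - 1) // 2


def blocked_pairs(records, block_key):
    """Group records by a blocking key and pair records only within a block."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[block_key(rec)].append(rec)
    pairs = []
    for members in blocks.values():
        pairs.extend(combinations(members, 2))
    return pairs


def surname_key(rec):
    """Hypothetical blocking key: the lower-cased last word of the name."""
    return rec["name"].split()[-1].lower()


# Invented toy records, for illustration only.
records = [
    {"name": "James Paul"},
    {"name": "J. Paul"},
    {"name": "Mary Jones"},
    {"name": "Anna Berg"},
    {"name": "A. Berg"},
    {"name": "Lars Holm"},
]

print(all_pairs_count(records))                  # pairs without blocking
print(len(blocked_pairs(records, surname_key)))  # pairs with blocking
```

Blocking trades recall for speed: two records that land in different blocks are never compared, so the choice of key matters.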

Record Linkage: Current Approaches. Naïve approach; deterministic algorithms; probabilistic algorithms.

Record Linkage: example record pair.
Record 1: Name: James Paul; DOB: 7/4/1804; POB: England; DOD: 8/15/1845; Parents: Howard Paul, Mary Jones.
Record 2: Name: J. Paul; DOB: 7/11/1804; POB: England; DOD: 8/15/1845; Parents: H. Paul, Mary Jones.
Sex: M. Children: Lucy Paul.

Record Linkage: Problems with Current Standards. Uses the probabilistic record linkage formula; weights and thresholds are 10 years old; depends on the attributes of one record; does not completely solve the missing-fields problem.
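The slides do not say which probabilistic formula is meant. A common one is the Fellegi-Sunter style of scoring, where each field adds a log-ratio weight depending on whether it agrees. The sketch below assumes that style. The m and u probabilities in `FIELD_PROBS` are invented for illustration, and the function names are hypothetical.

```python
import math

# Assumed (m, u) per field: m = P(field agrees | records match),
# u = P(field agrees | records do not match). Values are made up.
FIELD_PROBS = {
    "name": (0.90, 0.05),
    "dob": (0.85, 0.01),
    "pob": (0.95, 0.30),
    "dod": (0.85, 0.01),
}


def field_weight(field, agrees):
    """Agreement weight log2(m/u), or disagreement weight log2((1-m)/(1-u))."""
    m, u = FIELD_PROBS[field]
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def match_score(rec_a, rec_b, fields):
    """Sum the field weights; a field missing on either side adds nothing."""
    total = 0.0
    for field in fields:
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # missing field: no evidence either way
        total += field_weight(field, a == b)
    return total


# The two example records from the slides, compared by exact equality.
rec_a = {"name": "James Paul", "dob": "7/4/1804", "pob": "England", "dod": "8/15/1845"}
rec_b = {"name": "J. Paul", "dob": "7/11/1804", "pob": "England", "dod": "8/15/1845"}

print(match_score(rec_a, rec_b, ["name", "dob", "pob", "dod"]))
```

The score uses only the attributes of the two records being compared, and a missing field is simply skipped. Those are the two limitations the slide points out.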

Record Linkage: example record pair.
Record 1: Name: James Paul; DOB: 7/4/1804; POB: England; DOD: 8/15/1845; Parents: Howard Paul, Mary Jones.
Record 2: Name: J. Paul; DOB: 7/11/1804; POB: England; DOD: 8/15/1845; Parents: H. Paul, Mary Jones.
Sex: M. Children: Lucy Paul.

Thesis Statement: Graph matching can enhance current record linkage techniques, finding a smaller set of possible matches while keeping precision high.

GRAMP - Overview. Uses multiple records; traverses two graphs in parallel; continues until there are no more links; compares related nodes to each other to get a measurement.

Record Linkage: example record pair.
Record 1: Name: James Paul; DOB: 7/4/1804; POB: England; DOD: 8/15/1845; Parents: Howard Paul, Mary Jones.
Record 2: Name: J. Paul; DOB: 7/11/1804; POB: England; DOD: 8/15/1845; Parents: H. Paul, Mary Jones.
Sex: M. Children: Lucy Paul.

Record Linkage: the parents' records.
Record 1: Name: Howard Paul; DOB: 2/4/1789; POB: England; DOD: 1/13/1815; Parents: Louis Paul, ??; Children: James Paul.
Record 2: Name: H. Paul; DOB: 2/4/1789; POB: England; DOD: 8/15/1845; Parents: Louis Paul, Michelle P.; Children: J Paul.
Sex: M.

GRAMP. Determine probable matches: weakness is decreased; keep those with potential. Traverse the graph: determine relationships; compute similarity matches against both sets; make recursive calls. Combine measurements: for each node in the graph.
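The idea of traversing two pedigree graphs in parallel and combining measurements from related nodes can be sketched as below. This is an assumption-heavy illustration, not GRAMP itself. The exact-equality field comparison, the `neighbor_weight` blend, the depth limit, and all function names are hypothetical. The node data comes from the example slides, cut down to each person and his parents.

```python
FIELDS = ("name", "dob", "pob", "dod")


def field_similarity(a, b):
    """Fraction of fields present in both records that are exactly equal."""
    shared = [f for f in FIELDS if a.get(f) and b.get(f)]
    if not shared:
        return 0.0
    return sum(a[f] == b[f] for f in shared) / len(shared)


def graph_similarity(id_a, id_b, graph_a, graph_b, depth=2, neighbor_weight=0.5):
    """Blend a node pair's own similarity with that of its best-matching parents."""
    node_a, node_b = graph_a[id_a], graph_b[id_b]
    own = field_similarity(node_a, node_b)
    if depth == 0:
        return own
    scores = []
    for pa in node_a.get("parents", []):
        # Recursive call: score this parent against every parent on the other side.
        candidates = [
            graph_similarity(pa, pb, graph_a, graph_b, depth - 1, neighbor_weight)
            for pb in node_b.get("parents", [])
        ]
        if candidates:
            scores.append(max(candidates))
    if not scores:
        return own  # no more links to follow
    neighbors = sum(scores) / len(scores)
    return (1 - neighbor_weight) * own + neighbor_weight * neighbors


# Pedigree fragments built from the example slides (identifiers are arbitrary).
graph_a = {
    "james": {"name": "James Paul", "dob": "7/4/1804", "pob": "England",
              "dod": "8/15/1845", "parents": ["howard", "mary"]},
    "howard": {"name": "Howard Paul", "dob": "2/4/1789", "pob": "England",
               "dod": "1/13/1815", "parents": []},
    "mary": {"name": "Mary Jones", "parents": []},
}
graph_b = {
    "j": {"name": "J. Paul", "dob": "7/11/1804", "pob": "England",
          "dod": "8/15/1845", "parents": ["h", "mary"]},
    "h": {"name": "H. Paul", "dob": "2/4/1789", "pob": "England",
          "dod": "8/15/1845", "parents": []},
    "mary": {"name": "Mary Jones", "parents": []},
}

print(field_similarity(graph_a["james"], graph_b["j"]))  # the two records alone
print(graph_similarity("james", "j", graph_a, graph_b))  # with parents included
```

A real system would likely replace exact equality with fuzzy name and date comparisons. The point here is only that evidence from surrounding nodes can change the score of a candidate pair.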

GRAMP. Testing: records with errors; records without errors; a random set of records. Expected results: similar or better performance, with smaller blocks; slow.

Contributions. Provides a useful tool for genealogical, census, and statistical programs; an algorithm that matches objects by utilizing surrounding nodes; offers a different approach to the object identity problem.

Questions/Comments