Probabilistic Record Linkage in Genealogical Research
John Lawson, Dave White, Brenda Price and Ryan Yamagata

Presentation transcript:

Probabilistic Record Linkage in Genealogical Research
John Lawson, Dave White, Brenda Price and Ryan Yamagata
Agenda
Introduction
Description of Probabilistic Record Linkage
Applications to Quaker Records in N.C.
Future Directions

Introduction
Census Records, Birth Records, Marriage Records, Death Records, Church Records, Immigration Records, Wills, Deeds
More Complete Information about an Individual

Introduction
Information Age: Credit Records, Medical Records
Stored Electronically, for Quick Recall and Search

Introduction
Genealogical Records
No Identifier Field such as SSN
Different Spellings or Nicknames
Misreported Dates or Day, Month, Year Interchanges
Missing Information
Other Errors

Probabilistic Record Linkage
We will describe the approach and show its application to genealogical research.
Adapted by the Church of Jesus Christ of Latter-day Saints Family History Department in TempleReady™

Probabilistic Record Linkage History
Dunn Introduces Concept
1959: Newcombe et al. linked vital records
1960s: Development of Theoretical Foundations (Du Bois, Nathan, Tepping, Fellegi and Sunter)
Recently: Computer Software (CAMLINK, CAMLIS, LinkPro)

Probabilistic Record Linkage Methodology
A record consists of fields.
When comparing two records, each compared field receives a weight:
  + if the fields agree
  - if the fields are different
  0 if the field is missing from one or both records
The decision on whether two records should be linked is based on the sum of the weights (the "Score") over all fields compared.
Possible decisions: Link, Do not Link, Undetermined
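
The scoring rule on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names, the weight values, and the two thresholds are hypothetical and are not taken from the presentation.

```python
# Hypothetical (agreement, disagreement) weights per field; illustrative values only.
WEIGHTS = {
    "surname": (4.0, -3.0),
    "given_name": (3.0, -2.0),
    "birth_year": (2.5, -2.5),
}

# Hypothetical decision thresholds on the total score.
LOWER_THRESHOLD = 0.0
UPPER_THRESHOLD = 5.0


def field_weight(field, value_a, value_b):
    """Return + weight on agreement, - weight on disagreement, 0 if a value is missing."""
    if value_a is None or value_b is None:
        return 0.0
    agree, disagree = WEIGHTS[field]
    return agree if value_a == value_b else disagree


def score(record_a, record_b):
    """Sum the field weights over all compared fields."""
    return sum(
        field_weight(field, record_a.get(field), record_b.get(field))
        for field in WEIGHTS
    )


def classify(total):
    """Map a score to one of the three decisions named on the slide."""
    if total >= UPPER_THRESHOLD:
        return "link"
    if total <= LOWER_THRESHOLD:
        return "do not link"
    return "undetermined"


# Made-up example records (values are illustrative, not from the data set).
a = {"surname": "Winslow", "given_name": "Benjamin", "birth_year": 1820}
b = {"surname": "Winslow", "given_name": "Benjamin", "birth_year": None}
c = {"surname": "Winslow", "given_name": "William", "birth_year": None}
d = {"surname": "Durant", "given_name": "George", "birth_year": 1659}

print(classify(score(a, b)), classify(score(a, c)), classify(score(a, d)))
```

Treating a missing value as a zero contribution means an incomplete record neither helps nor hurts a candidate link, which is the behaviour the slide describes.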

Probabilistic Record Linkage Methodology
Calculating the Weights: Using Bayes' Rule

Probabilistic Record Linkage Methodology
P(e_i) can be estimated using sample pairs
P(e_i|M) can be calculated from a known set of matches
P(M) is constant for all comparisons
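
The three quantities listed here are the ingredients of Bayes' rule for the probability of a match given the comparison outcome on one field. The slide's own formula is not preserved in this transcript, so the form below is an assumption: it takes e_i to be the comparison outcome on field i and M to be the event that the pair is a true match.

```latex
P(M \mid e_i) = \frac{P(e_i \mid M)\, P(M)}{P(e_i)}
```

Because P(M) is the same for every comparison, under this reading the ratio P(e_i|M) / P(e_i) is what distinguishes one field outcome from another.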

Probabilistic Record Linkage
The Weights

Probabilistic Record Linkage
The Scores
Blocking

Probabilistic Record Linkage
[Figure: Score, with Lower Threshold and Upper Threshold marked]

Application to Genealogical Research
The Data: Church (Quaker Congregation) and County Records
Perquimans and Pasquotank Counties, NC
1600 to 1900
Births, Deaths, Marriages, and Minutes of Town Meetings
9279 Individual Records

Application to Genealogical Research
Records from Town Meeting Minutes:
Benjamin C. Winslow, s. William & Julian, b , Chowan Co.
Esther P. Winslow. (dt. Silas & Elizabeth Chappell, b , Chowan Co.)
Ch: Harriett Ann b William W. “ James Claudius “ Ora Henry Laden.
1880, 8, 7. Sarah (form Winslow) rpd m. (not m in mtg).
Birth Record:
George Durant son of George & Ann Durant was borne the 24th December 1659

Application to Genealogical Research
Records entered manually into PAF
GEDCOM file created from PAF
Visual Basic Program: GEDCOM to Flat File (RINs, MRINs)
Flat File to SAS (Statistical Analysis System)
Flat File: 9279 records
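
The GEDCOM-to-flat-file step can be illustrated with a short Python sketch. This is not the authors' Visual Basic program; it is a stand-in that assumes standard GEDCOM lines of the form "level tag value" and extracts only a few fields. The output column names are hypothetical, and the `id` column merely stands in for a record identifier such as a RIN.

```python
def gedcom_to_rows(text):
    """Flatten GEDCOM individual (INDI) records into a list of dicts, one per person."""
    rows = []
    current = None   # row for the individual being read, if any
    context = None   # most recent level-1 tag (e.g. BIRT, DEAT)
    for line in text.splitlines():
        parts = line.strip().split(" ", 2)
        if len(parts) < 2:
            continue
        level, tag = parts[0], parts[1]
        value = parts[2] if len(parts) > 2 else ""
        if level == "0":
            # A level-0 line starts a new record; only INDI records are kept.
            if value == "INDI":
                current = {"id": tag.strip("@"), "name": "", "sex": "",
                           "birth_date": "", "death_date": ""}
                rows.append(current)
            else:
                current = None
            context = None
        elif current is None:
            continue
        elif level == "1":
            context = tag
            if tag == "NAME":
                # GEDCOM wraps the surname in slashes: George /Durant/
                current["name"] = value.replace("/", "").strip()
            elif tag == "SEX":
                current["sex"] = value
        elif level == "2" and tag == "DATE":
            if context == "BIRT":
                current["birth_date"] = value
            elif context == "DEAT":
                current["death_date"] = value
    return rows


# Small sample built from the birth record quoted earlier; the sex value is assumed.
SAMPLE = "\n".join([
    "0 HEAD",
    "0 @I1@ INDI",
    "1 NAME George /Durant/",
    "1 SEX M",
    "1 BIRT",
    "2 DATE 24 DEC 1659",
    "0 @F1@ FAM",
    "1 HUSB @I1@",
    "0 TRLR",
])

for row in gedcom_to_rows(SAMPLE):
    print(row)
```

Each resulting dict corresponds to one line of a flat file, which could then be written out with the standard `csv` module for use in a statistics package.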

Application to Genealogical Research
9279 Total Records = 43,045,281 pairwise comparisons
Blocking by Surname and Sex:
  1875 records with no surname
  7404 records remaining = 220,931 pairwise comparisons
  2118 matches
  218,813 non-matches
Blocking by Surname only (records with no surname treated together as one block):
  9279 total records
  1,961,004 pairwise comparisons
  3692 matches
  1,957,312 non-matches
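
The unblocked figure on this slide is n(n-1)/2 for n = 9279, and blocking reduces the count by comparing records only within the same block. The sketch below shows both calculations; the five example records and the helper names are hypothetical and are not the Quaker data.

```python
from collections import Counter


def pair_count(n):
    """Number of unordered pairs among n records: n(n-1)/2."""
    return n * (n - 1) // 2


def blocked_pair_count(records, key):
    """Number of pairs when records are compared only within the same block."""
    block_sizes = Counter(key(record) for record in records)
    return sum(pair_count(size) for size in block_sizes.values())


# All-pairs count for the 9279 records reported on the slide.
print(pair_count(9279))  # 43045281

# Hypothetical toy records as (surname, sex) tuples.
records = [
    ("Winslow", "M"),
    ("Winslow", "M"),
    ("Winslow", "F"),
    ("Durant", "M"),
    ("Durant", "M"),
]

by_surname = blocked_pair_count(records, key=lambda r: r[0])
by_surname_and_sex = blocked_pair_count(records, key=lambda r: (r[0], r[1]))
print(pair_count(len(records)), by_surname, by_surname_and_sex)
```

A finer blocking key gives fewer comparisons, at the cost of never comparing true matches whose key fields disagree or are missing.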

Application to Genealogical Research
Matches: 1.65% misclassified, 17.52% unclassified
Non-Matches: 1.87% misclassified, 7.71% unclassified

Application to Genealogical Research
Matches: 4.96% misclassified
Non-Matches: 2.39% misclassified

The Future for Our Research
Extend Visual Basic Program (RINs, MRINs)
Expand Weighting Possibilities
Obtain More Data
Build Library of Weights