Joseph S. Park and David W. Embley Brigham Young University

Slides:



Advertisements
Similar presentations
Three-Step Database Design
Advertisements

Dr. Leo Obrst MITRE Information Semantics Information Discovery & Understanding Command & Control Center February 6, 2014February 6, 2014February 6, 2014.
A Stepwise Modeling Approach for Individual Media Semantics Annett Mitschick, Klaus Meißner TU Dresden, Department of Computer Science, Multimedia Technology.
Extracting Names Using Layout Clues in Genealogical Books Aaron Stewart David W. Embley March 20, 2010.
Finding Genealogy Facts with Linguistic Analysis Peter Lindes, Deryle W. Lonsdale, David W. Embley Brigham Young University © 2014 Peter Lindes 3/19/2014PL.
Leveraging Data and Structure in Ontology Integration Octavian Udrea 1 Lise Getoor 1 Renée J. Miller 2 1 University of Maryland College Park 2 University.
Automating the Extraction of Genealogical Information from Historical Documents Aaron P. Stewart David W. Embley March 20, 2011.
FOCIH: Form-based Ontology Creation and Information Harvesting Cui Tao, David W. Embley, Stephen W. Liddle Brigham Young University Nov. 11, 2009 Supported.
Enabling Search for Facts and Implied Facts in Historical Documents David W. Embley, Stephen W. Liddle, Deryle W. Lonsdale, Spencer Machado, Thomas Packer,
Named Entity Recognition for Digitised Historical Texts by Claire Grover, Sharon Givon, Richard Tobin and Julian Ball (UK) presented by Thomas Packer 1.
OWL-AA: Enriching OWL with Instance Recognition Semantics for Automated Semantic Annotation 2006 Spring Research Conference Yihong Ding.
By ANDREW ZITZELBERGER A Framework for Extraction Ontology Based Information Management.
BYU A Synergistic Semantic Annotation Model December 2007 Yihong Ding,
LDK R Logics for Data and Knowledge Representation Description Logics as query language.
Deryle W. Lonsdale, David W. Embley, Stephen W. Liddle, and Joseph Park BYU Data Extraction Research Group.
Knowledge based Learning Experience Management on the Semantic Web Feng (Barry) TAO, Hugh Davis Learning Society Lab University of Southampton.
Mining the Semantic Web: Requirements for Machine Learning Fabio Ciravegna, Sam Chapman Presented by Steve Hookway 10/20/05.
Logics for Data and Knowledge Representation
FROntIER: A Framework for Extracting and Organizing Biographical Facts in Historical Documents Joseph Park.
Proposal for Synergistic Name Extraction from Historical Text Documents.
Discussions for oneM2M Semantics Standardization Group Name: WG5 Source: InterDigital Communications Meeting Date: Agenda Item: WI-0005 ASN/MN-CSE.
A Web of Knowledge for Historical Documents David W. Embley.
Joseph Park Brigham Young University.  Motivation.
Scanned Books: Annotator Training. Project Overview Untapped sources – 100,000+ scanned/OCRed books – Problem: how to cost-effectively extract Extraction.
Populating A Knowledge Base From Text Clay Fink, Tim Finin, Christine Piatko and Jim Mayfield.
Soar and Construction Grammar Peter Lindes, Deryle Lonsdale, David Embley Brigham Young University 2014 Soar Workshop © 2014 Peter Lindes 6/19/2014PL 2014.
Ontology-based Information Extraction with a Cognitive Agent Peter Lindes 1, Deryle Lonsdale, David Embley Brigham Young University AAAI Now at.
Bootstrapping Regular-Expression Recognizer to Help Human Annotators Tae Woo Kim.
FROntIER: Fact Recognizer for Ontologies with Inference and Entity Resolution Joseph Park, Computer Science Brigham Young University.
Semanntic Web Exercises. XML-exercises (1) 1.Give an XML-document (by not using attributes), which includes the information that the first name of a person.
Cost-Effective Information Extraction from Lists in OCRed Historical Documents Thomas Packer and David W. Embley Brigham Young University FamilySearch.
Towards the Semantic Web 6 Generating Ontologies for the Semantic Web: OntoBuilder R.H.P. Engles and T.Ch.Lech 이 은 정
“Automating Reasoning on Conceptual Schemas” in FamilySearch — A Large-Scale Reasoning Application David W. Embley Brigham Young University More questions.
Shridhar Bhalerao CMSC 601 Finding Implicit Relations in the Semantic Web.
Strategies for subject navigation of linked Web sites using RDF topic maps Carol Jean Godby Devon Smith OCLC Online Computer Library Center Knowledge Technologies.
OntoSoar: Soar Finds Facts in Text Peter Lindes, Deryle Lonsdale, David Embley Brigham Young University 33 rd Soar Workshop, June 2013 pl 6/6/201333rd.
Personalized Recommendation of Related Content Based on Automatic Metadata Extraction Andreas Nauerz 1, Fedor Bakalov 2, Birgitta.
ECIMF meeting, Paris Interoperability through semantic labeling with context Andrzej Bialecki.
Ontology-Based Free-Form Query Processing for the Semantic Web Mark Vickers Brigham Young University MS Thesis Defense Supported by:
Veteran-Surname: Deane Veteran-Given-name: William J Company: A Application-Number: WC 4 Relationship: Widow Birth-Year-of-Age: Not provided on this document.
GreenFIE-HD: A “Green” Form-based Information Extraction Tool for Historical Documents Tae Woo Kim.
David W. Embley Brigham Young University Provo, Utah, USA.
Semantic Interoperability in GIS N. L. Sarda Suman Somavarapu.
Extracting and Organizing Facts of Interest from OCRed Historical Documents Joseph Park, Computer Science Brigham Young University.
GoRelations: an Intuitive Query System for DBPedia Lushan Han and Tim Finin 15 November 2011
Scanned Books: Annotator Training. Project Overview Untapped sources – 100,000+ scanned/OCRed books – Problem: cost-effective extraction Extraction tools.
Extracting Data Automatically from Scanned Books with OntoSoar
Dependency Management
 Corpus Formation [CFT]  Web Pages Annotation [Web Annotator]  Web sites detection [NEACrawler]  Web pages collection [NEAC]  IE Remote.
Introduction to the Semantic Web (tutorial) 2009 Semantic Technology Conference San Jose, California, USA June 15, 2009 Ivan Herman, W3C
Presented by: Hassan Sayyadi
A Web of Knowledge for Family History (Research Directions)
David W. Embley Brigham Young University Provo, Utah, USA
Stephen W. Liddle, Deryle W. Lonsdale, and Scott N. Woodfield
Vision for an Automatically Constructed FH-WoK
D-Dupe-like Mary Ely Example
GreenFIE-HD: A Form-based Information Extraction Tool for Historical Documents Tae Woo Kim There are thousands of books that contain rich genealogical.
Automating Schema Matching for Data Integration
[jws13] Evaluation of instance matching tools: The experience of OAEI
COMPUTATIONAL PROCESS REPRESENTATION IN A KNOWLEDGE BASE
(Semi)automatic Extraction of Genealogical Information from Scanned & OCRed Historical Documents Elder David W. Embley.
Logics for Data and Knowledge Representation
Temple Ready within an Hour of Collection Capture
Grant Number: IIS Institution of PI: Brigham Young University PI’s: David W. Embley, Stephen W. Liddle, Deryle W. Lonsdale Title:
Data Provenance.
A Green Form-Based Information Extraction System for Historical Documents Tae Woo Kim No HD. I’m glad to present GreenFIE today. A Green Form-…
Sheet 10 RDF (2).
Joseph Park Brigham Young University
Extraction Rule Creation by Text Snippet Examples
Joseph Park Brigham Young University
Presentation transcript:

Joseph S. Park and David W. Embley Brigham Young University Extracting and Organizing Facts of Interest from OCRed Historical Documents Joseph S. Park and David W. Embley Brigham Young University

Motivation Large collection of scanned, OCRed books Stated facts Implied facts Inferred Same-as (entities)

Stated Facts of Interest William Gerard Lathrop married Charlotte Brackett Jennings in 1837 is the son of Mary Ely was born in 1812

Inferred Facts of Interest William Gerard Lathrop has gender Male Maria Jennings has gender Female Maria Jennings has surname Lathrop

Same-as (Entities) William Gerard Lathrop Gerard Lathrop Mary Ely

FROntIER OntoES Jena reasoner Duke RDF output parameters comparators <owl:Thing rdf:ID="Person_1"/> <rdf:type rdf:resource="&family;Spouse"/> <rdf:type rdf:resource="&family;Child"/> <owl:Thing rdf:about="#Person_1"> <rdf:type rdf:resource="&family;Person"/> <rdf:type rdf:resource="&family;Name"/> <owl:Thing rdf:about="#Name_1"> <owl:Thing rdf:ID="Name_1"/> </owl:Thing> <rdf:type rdf:resource="&family;GivenName"/> <owl:Thing rdf:about="#GivenName_1"> <owl:Thing rdf:ID="GivenName_1"/> <family:GivenNameValue rdf:datatype="&xsd;string">Abigail</family:GivenNameValue> <rdf:Description rdf:about="#GivenName_1"> </rdf:Description> <owl:Thing rdf:about="#GivenName_2"> <owl:Thing rdf:ID="GivenName_2"/> <family:GivenNameValue rdf:datatype="&xsd;string">Huntington</family:GivenNameValue> <rdf:Description rdf:about="#GivenName_2"> <owl:Thing rdf:about="#Surname_1"> <owl:Thing rdf:ID="Surname_1"/> <family:SurnameValue rdf:datatype="&xsd;string">Lathrop</family:SurnameValue> <rdf:Description rdf:about="#Surname_1"> <rdf:type rdf:resource="&family;Surname"/> <owl:Thing rdf:about="#BirthDate_1"> <rdf:type rdf:resource="&family;BirthDate"/> <owl:Thing rdf:ID="BirthDate_1"/> <rdf:Description rdf:about="#BirthDate_1"> <owl:Thing rdf:about="#MarriageDate_1"> <owl:Thing rdf:ID="MarriageDate_1"/> <family:BirthDateValue rdf:datatype="&xsd;date">1810-00-00</family:BirthDateValue> <rdf:type rdf:resource="&family;MarriageDate"/> <family:MarriageDateValue rdf:datatype="&xsd;date">1835-00-00</family:MarriageDateValue> <rdf:Description rdf:about="#MarriageDate_1"> <owl:Thing rdf:ID="Gender_1"/> <rdf:type rdf:resource="&family;Gender"/> <owl:Thing rdf:about="#Gender_1"> <family:GenderValue rdf:datatype="&xsd;string">Female</family:GenderValue> <rdf:Description rdf:about="#Gender_1"> <family:Person-BirthDate rdf:resource="#BirthDate_1"/> <rdf:Description rdf:about="#Person_1"> <family:Person-Name rdf:resource="#Name_1"/> <family:Person-Child rdf:resource="#Person_6"/> <family:Person-Child rdf:resource="#Person_5"/> <family:Person-Gender rdf:resource="#Gender_1"/> <rdf:Description rdf:about="#Name_1"> <family:Name-Surname rdf:resource="#Surname_1"/> <family:Name-GivenName rdf:resource="#GivenName_2"/> <family:Name-GivenName rdf:resource="#GivenName_1"/> <family:PersonMarriageDateMarriagePlaceSpouse-Person rdf:resource="#Person_1"/> <rdf:Description rdf:about="#PersonMarriageDateMarriagePlaceSpouse_1"> <owl:Thing rdf:ID="PersonMarriageDateMarriagePlaceSpouse_1"/> <family:PersonMarriageDateMarriagePlaceSpouse-MarriageDate rdf:resource="#MarriageDate_1"/> <family:PersonMarriageDateMarriagePlaceSpouse-Spouse rdf:resource="#Person_4"/> RDF output Jena reasoner OntoES convert to RDF comparators parameters owl:sameAs convert to csv Duke [(?x rdf:type source:Person) -> (?x rdf:type target:Person)] [(?x source:Person-Name ?n),(?n source: NameValue ?nv), isMale(?nv),makeTemp(?gender) -> (?x target:Person-Gender ?gender),(?gender rdf:type target:Gender), (?gender target:GenderValue `Male'^^xsd:string)]

FROntIER OntoES Jena reasoner Duke RDF output parameters comparators <owl:Thing rdf:ID="Person_1"/> <rdf:type rdf:resource="&family;Spouse"/> <rdf:type rdf:resource="&family;Child"/> <owl:Thing rdf:about="#Person_1"> <rdf:type rdf:resource="&family;Person"/> <rdf:type rdf:resource="&family;Name"/> <owl:Thing rdf:about="#Name_1"> <owl:Thing rdf:ID="Name_1"/> </owl:Thing> <rdf:type rdf:resource="&family;GivenName"/> <owl:Thing rdf:about="#GivenName_1"> <owl:Thing rdf:ID="GivenName_1"/> <family:GivenNameValue rdf:datatype="&xsd;string">Abigail</family:GivenNameValue> <rdf:Description rdf:about="#GivenName_1"> </rdf:Description> <owl:Thing rdf:about="#GivenName_2"> <owl:Thing rdf:ID="GivenName_2"/> <family:GivenNameValue rdf:datatype="&xsd;string">Huntington</family:GivenNameValue> <rdf:Description rdf:about="#GivenName_2"> <owl:Thing rdf:about="#Surname_1"> <owl:Thing rdf:ID="Surname_1"/> <family:SurnameValue rdf:datatype="&xsd;string">Lathrop</family:SurnameValue> <rdf:Description rdf:about="#Surname_1"> <rdf:type rdf:resource="&family;Surname"/> <owl:Thing rdf:about="#BirthDate_1"> <rdf:type rdf:resource="&family;BirthDate"/> <owl:Thing rdf:ID="BirthDate_1"/> <rdf:Description rdf:about="#BirthDate_1"> <owl:Thing rdf:about="#MarriageDate_1"> <owl:Thing rdf:ID="MarriageDate_1"/> <family:BirthDateValue rdf:datatype="&xsd;date">1810-00-00</family:BirthDateValue> <rdf:type rdf:resource="&family;MarriageDate"/> <family:MarriageDateValue rdf:datatype="&xsd;date">1835-00-00</family:MarriageDateValue> <rdf:Description rdf:about="#MarriageDate_1"> <owl:Thing rdf:ID="Gender_1"/> <rdf:type rdf:resource="&family;Gender"/> <owl:Thing rdf:about="#Gender_1"> <family:GenderValue rdf:datatype="&xsd;string">Female</family:GenderValue> <rdf:Description rdf:about="#Gender_1"> <family:Person-BirthDate rdf:resource="#BirthDate_1"/> <rdf:Description rdf:about="#Person_1"> <family:Person-Name rdf:resource="#Name_1"/> <family:Person-Child rdf:resource="#Person_6"/> <family:Person-Child rdf:resource="#Person_5"/> <family:Person-Gender rdf:resource="#Gender_1"/> <rdf:Description rdf:about="#Name_1"> <family:Name-Surname rdf:resource="#Surname_1"/> <family:Name-GivenName rdf:resource="#GivenName_2"/> <family:Name-GivenName rdf:resource="#GivenName_1"/> <family:PersonMarriageDateMarriagePlaceSpouse-Person rdf:resource="#Person_1"/> <rdf:Description rdf:about="#PersonMarriageDateMarriagePlaceSpouse_1"> <owl:Thing rdf:ID="PersonMarriageDateMarriagePlaceSpouse_1"/> <family:PersonMarriageDateMarriagePlaceSpouse-MarriageDate rdf:resource="#MarriageDate_1"/> <family:PersonMarriageDateMarriagePlaceSpouse-Spouse rdf:resource="#Person_4"/> RDF output Jena reasoner OntoES convert to RDF comparators parameters owl:sameAs convert to csv Duke [(?x rdf:type source:Person) -> (?x rdf:type target:Person)] [(?x source:Person-Name ?n),(?n source: NameValue ?nv), isMale(?nv),makeTemp(?gender) -> (?x target:Person-Gender ?gender),(?gender rdf:type target:Gender), (?gender target:GenderValue `Male'^^xsd:string)]

Extraction Ontologies Conceptual model Instance recognizers

Lexical Object-Set Recognizers BirthDate external representation: \b[1][6-9]\d\d\b left context: b\.\s right context: [.,] …

Non-lexical Object-Set Recognizers Person object existence rule: {Name} … Name external representation: \b{FirstName}\s{LastName}\b …

Relationship-set Recognizers Person-BirthDate external representation: ^\d{1,3}\.\s{Person},\sb\.\s{BirthDate}[.,] …

Ontology-snippet Recognizers ChildRecord external representation: ^(\d{1,3})\.\s+([A-Z]\w+\s[A-Z]\w+) (,\sb\.\s([1][6-9]\d\d))?(,\sd\.\s([1][6-9]\d\d))?\.

FROntIER OntoES Jena reasoner Duke RDF output parameters comparators <owl:Thing rdf:ID="Person_1"/> <rdf:type rdf:resource="&family;Spouse"/> <rdf:type rdf:resource="&family;Child"/> <owl:Thing rdf:about="#Person_1"> <rdf:type rdf:resource="&family;Person"/> <rdf:type rdf:resource="&family;Name"/> <owl:Thing rdf:about="#Name_1"> <owl:Thing rdf:ID="Name_1"/> </owl:Thing> <rdf:type rdf:resource="&family;GivenName"/> <owl:Thing rdf:about="#GivenName_1"> <owl:Thing rdf:ID="GivenName_1"/> <family:GivenNameValue rdf:datatype="&xsd;string">Abigail</family:GivenNameValue> <rdf:Description rdf:about="#GivenName_1"> </rdf:Description> <owl:Thing rdf:about="#GivenName_2"> <owl:Thing rdf:ID="GivenName_2"/> <family:GivenNameValue rdf:datatype="&xsd;string">Huntington</family:GivenNameValue> <rdf:Description rdf:about="#GivenName_2"> <owl:Thing rdf:about="#Surname_1"> <owl:Thing rdf:ID="Surname_1"/> <family:SurnameValue rdf:datatype="&xsd;string">Lathrop</family:SurnameValue> <rdf:Description rdf:about="#Surname_1"> <rdf:type rdf:resource="&family;Surname"/> <owl:Thing rdf:about="#BirthDate_1"> <rdf:type rdf:resource="&family;BirthDate"/> <owl:Thing rdf:ID="BirthDate_1"/> <rdf:Description rdf:about="#BirthDate_1"> <owl:Thing rdf:about="#MarriageDate_1"> <owl:Thing rdf:ID="MarriageDate_1"/> <family:BirthDateValue rdf:datatype="&xsd;date">1810-00-00</family:BirthDateValue> <rdf:type rdf:resource="&family;MarriageDate"/> <family:MarriageDateValue rdf:datatype="&xsd;date">1835-00-00</family:MarriageDateValue> <rdf:Description rdf:about="#MarriageDate_1"> <owl:Thing rdf:ID="Gender_1"/> <rdf:type rdf:resource="&family;Gender"/> <owl:Thing rdf:about="#Gender_1"> <family:GenderValue rdf:datatype="&xsd;string">Female</family:GenderValue> <rdf:Description rdf:about="#Gender_1"> <family:Person-BirthDate rdf:resource="#BirthDate_1"/> <rdf:Description rdf:about="#Person_1"> <family:Person-Name rdf:resource="#Name_1"/> <family:Person-Child rdf:resource="#Person_6"/> <family:Person-Child rdf:resource="#Person_5"/> <family:Person-Gender rdf:resource="#Gender_1"/> <rdf:Description rdf:about="#Name_1"> <family:Name-Surname rdf:resource="#Surname_1"/> <family:Name-GivenName rdf:resource="#GivenName_2"/> <family:Name-GivenName rdf:resource="#GivenName_1"/> <family:PersonMarriageDateMarriagePlaceSpouse-Person rdf:resource="#Person_1"/> <rdf:Description rdf:about="#PersonMarriageDateMarriagePlaceSpouse_1"> <owl:Thing rdf:ID="PersonMarriageDateMarriagePlaceSpouse_1"/> <family:PersonMarriageDateMarriagePlaceSpouse-MarriageDate rdf:resource="#MarriageDate_1"/> <family:PersonMarriageDateMarriagePlaceSpouse-Spouse rdf:resource="#Person_4"/> RDF output Jena reasoner OntoES convert to RDF comparators parameters owl:sameAs convert to csv Duke [(?x rdf:type source:Person) -> (?x rdf:type target:Person)] [(?x source:Person-Name ?n),(?n source: NameValue ?nv), isMale(?nv),makeTemp(?gender) -> (?x target:Person-Gender ?gender),(?gender rdf:type target:Gender), (?gender target:GenderValue `Male'^^xsd:string)]

Canonicalization “Sam’l” and “Geo.” -> “Samuel” and “George” “New York City” -> “New York, NY” “Boonton, N.J.” -> “Boonton, NJ”

Schema Mapping our view author’s view

Direct Schema Mapping Rule [(?x rdf:type source:Person) -> (?x rdf:type target:Person)] Person7 Person7

Name Decomposition Rule [(?x rdf:type source:Person),(?x source:Person-Name ?y),(?y rdf:type source:Name),… -> (?x rdf:type target:Person),(?x target:Person-Name ?y),(?y rdf:type target:Name),…] William Gerard Lathrop Person7 Person7 Name7 William Lathrop Gerard

Person has gender Male Rule [(?x rdf:type source:Son),makeSkolem(?gender, ?x) -> (?x target:Person-Gender ?gender),(?gender rdf:type target:Gender), (?gender target:GenderValue 'Male'^^xsd:string)] Male Name7 Person7 Person7 Name7 William Gerard Lathrop William Person7 Lathrop Gerard

FROntIER OntoES Jena reasoner Duke RDF output parameters comparators <owl:Thing rdf:ID="Person_1"/> <rdf:type rdf:resource="&family;Spouse"/> <rdf:type rdf:resource="&family;Child"/> <owl:Thing rdf:about="#Person_1"> <rdf:type rdf:resource="&family;Person"/> <rdf:type rdf:resource="&family;Name"/> <owl:Thing rdf:about="#Name_1"> <owl:Thing rdf:ID="Name_1"/> </owl:Thing> <rdf:type rdf:resource="&family;GivenName"/> <owl:Thing rdf:about="#GivenName_1"> <owl:Thing rdf:ID="GivenName_1"/> <family:GivenNameValue rdf:datatype="&xsd;string">Abigail</family:GivenNameValue> <rdf:Description rdf:about="#GivenName_1"> </rdf:Description> <owl:Thing rdf:about="#GivenName_2"> <owl:Thing rdf:ID="GivenName_2"/> <family:GivenNameValue rdf:datatype="&xsd;string">Huntington</family:GivenNameValue> <rdf:Description rdf:about="#GivenName_2"> <owl:Thing rdf:about="#Surname_1"> <owl:Thing rdf:ID="Surname_1"/> <family:SurnameValue rdf:datatype="&xsd;string">Lathrop</family:SurnameValue> <rdf:Description rdf:about="#Surname_1"> <rdf:type rdf:resource="&family;Surname"/> <owl:Thing rdf:about="#BirthDate_1"> <rdf:type rdf:resource="&family;BirthDate"/> <owl:Thing rdf:ID="BirthDate_1"/> <rdf:Description rdf:about="#BirthDate_1"> <owl:Thing rdf:about="#MarriageDate_1"> <owl:Thing rdf:ID="MarriageDate_1"/> <family:BirthDateValue rdf:datatype="&xsd;date">1810-00-00</family:BirthDateValue> <rdf:type rdf:resource="&family;MarriageDate"/> <family:MarriageDateValue rdf:datatype="&xsd;date">1835-00-00</family:MarriageDateValue> <rdf:Description rdf:about="#MarriageDate_1"> <owl:Thing rdf:ID="Gender_1"/> <rdf:type rdf:resource="&family;Gender"/> <owl:Thing rdf:about="#Gender_1"> <family:GenderValue rdf:datatype="&xsd;string">Female</family:GenderValue> <rdf:Description rdf:about="#Gender_1"> <family:Person-BirthDate rdf:resource="#BirthDate_1"/> <rdf:Description rdf:about="#Person_1"> <family:Person-Name rdf:resource="#Name_1"/> <family:Person-Child rdf:resource="#Person_6"/> <family:Person-Child rdf:resource="#Person_5"/> <family:Person-Gender rdf:resource="#Gender_1"/> <rdf:Description rdf:about="#Name_1"> <family:Name-Surname rdf:resource="#Surname_1"/> <family:Name-GivenName rdf:resource="#GivenName_2"/> <family:Name-GivenName rdf:resource="#GivenName_1"/> <family:PersonMarriageDateMarriagePlaceSpouse-Person rdf:resource="#Person_1"/> <rdf:Description rdf:about="#PersonMarriageDateMarriagePlaceSpouse_1"> <owl:Thing rdf:ID="PersonMarriageDateMarriagePlaceSpouse_1"/> <family:PersonMarriageDateMarriagePlaceSpouse-MarriageDate rdf:resource="#MarriageDate_1"/> <family:PersonMarriageDateMarriagePlaceSpouse-Spouse rdf:resource="#Person_4"/> RDF output Jena reasoner OntoES convert to RDF comparators parameters owl:sameAs convert to csv Duke [(?x rdf:type source:Person) -> (?x rdf:type target:Person)] [(?x source:Person-Name ?n),(?n source: NameValue ?nv), isMale(?nv),makeTemp(?gender) -> (?x target:Person-Gender ?gender),(?gender rdf:type target:Gender), (?gender target:GenderValue `Male'^^xsd:string)]

Resolving Mary Elys Example Person2 (1st Mary Ely) owl:sameAs Person8 (3rd Mary Ely) 0.032081 0.995030

Resolving Gerard Lathrop Example Person3 (1st Gerard Lathrop) owl:sameAs Person9 (2nd Gerard Lathrop) 0.028505 0.088043 0.032081 ~0.0 0.000007 0.078216 0.953839

Experiment The Ely Ancestry, page 419 Annotated gold standard files Extracted facts: excerpt Inferred facts: excerpt Same-as facts: excerpt Annotated gold standard files Extracted facts: automated evaluation tool Inferred facts: automated evaluation tool Same-as facts: hand checked

Experimental Results

Concluding Remarks Proof of concept Current and Future work: Corpus of 50,000+ books provided by LDS Church 200 randomly selected pages Estimate the following: Time required Expertise required Accuracy (precision & recall)