“Automating Reasoning on Conceptual Schemas” in FamilySearch — A Large-Scale Reasoning Application David W. Embley Brigham Young University More questions.

Slides:



Advertisements
Similar presentations
Dr. Leo Obrst MITRE Information Semantics Information Discovery & Understanding Command & Control Center February 6, 2014February 6, 2014February 6, 2014.
Advertisements

Big Ideas in Cmput366. Search Blind Search State space representation Iterative deepening Heuristic Search A*, f(n)=g(n)+h(n), admissible heuristics Local.
Finding Genealogy Facts with Linguistic Analysis Peter Lindes, Deryle W. Lonsdale, David W. Embley Brigham Young University © 2014 Peter Lindes 3/19/2014PL.
Automating the Extraction of Genealogical Information from Historical Documents Aaron P. Stewart David W. Embley March 20, 2011.
Computer Science Research for Family History and Genealogy David W. Embley Heath Nielson, Mike Rimer, Luke Hutchison, Ken Tubbs, Doug Kennard, Tom Finnigan.
Scott N. Woodfield David W. Embley Stephen W. Liddle Brigham Young University.
1 Ontology Based Extraction of RDF Data from the World Wide Web Tim Chartrand A Thesis Proposal Research Supported By NSF.
Domain-Independent Data Extraction: Person Names Carl Christensen and Deryle Lonsdale Brigham Young University
Ontologies and the Semantic Web by Ian Horrocks presented by Thomas Packer 1.
Enabling Search for Facts and Implied Facts in Historical Documents David W. Embley, Stephen W. Liddle, Deryle W. Lonsdale, Spencer Machado, Thomas Packer,
PR-OWL: A Framework for Probabilistic Ontologies by Paulo C. G. COSTA, Kathryn B. LASKEY George Mason University presented by Thomas Packer 1PR-OWL.
Principled Pragmatism: A Guide to the Adaptation of Philosophical Disciplines to Conceptual Modeling David W. Embley, Stephen W. Liddle, & Deryle W. Lonsdale.
CS652 Spring 2004 Summary. Course Objectives  Learn how to extract, structure, and integrate Web information  Learn what the Semantic Web is  Learn.
Recognizing Ontology-Applicable Multiple-Record Web Documents David W. Embley Dennis Ng Li Xu Brigham Young University.
Gimme’ The Context: Context- driven Automatic Semantic Annotation with CPANKOW Philipp Cimiano et al.
ER 2002BYU Data Extraction Group Automatically Extracting Ontologically Specified Data from HTML Tables with Unknown Structure David W. Embley, Cui Tao,
CSE 574: Artificial Intelligence II Statistical Relational Learning Instructor: Pedro Domingos.
1 Relational Model. 2 Relational Database: Definitions  Relational database: a set of relations  Relation: made up of 2 parts: – Instance : a table,
1 Ontology Based Extraction of RDF Data from the World Wide Web Tim Chartrand Masters Thesis Research Supported By NSF.
Describing Syntax and Semantics
An Abstract Framework for Extraction Plans and Heuristics in a Data Extraction System Alan Wessman Brigham Young University Based on research supported.
1 Cui Tao PhD Dissertation Defense Ontology Generation, Information Harvesting and Semantic Annotation For Machine-Generated Web Pages.
Enhance legal retrieval applications with an automatically induced knowledge base Ka Kan Lo.
Enabling Efficient Chinese Jiapu Information Extraction
Semantic Web Technologies Lecture # 2 Faculty of Computer Science, IBA.
Lowell 2003 Challenges Alon Y. Halevy University of Washington.
Knowledge Acquisition. Concepts of Knowledge Engineering Knowledge engineering The engineering discipline in which knowledge is integrated into computer.
The Relational Model. Review Why use a DBMS? OS provides RAM and disk.
FROntIER: A Framework for Extracting and Organizing Biographical Facts in Historical Documents Joseph Park.
DBrev: Dreaming of a Database Revolution Gjergji Kasneci, Jurgen Van Gael, Thore Graepel Microsoft Research Cambridge, UK.
A Web of Knowledge for Historical Documents David W. Embley.
Joseph Park Brigham Young University.  Motivation.
Scanned Books: Annotator Training. Project Overview Untapped sources – 100,000+ scanned/OCRed books – Problem: how to cost-effectively extract Extraction.
Populating A Knowledge Base From Text Clay Fink, Tim Finin, Christine Piatko and Jim Mayfield.
Populating Ontologies with Data from Lists in Family History Books Thomas L. Packer David W. Embley RT.FHTW BYU.CS 1.
Soar and Construction Grammar Peter Lindes, Deryle Lonsdale, David Embley Brigham Young University 2014 Soar Workshop © 2014 Peter Lindes 6/19/2014PL 2014.
Ontology-based Information Extraction with a Cognitive Agent Peter Lindes 1, Deryle Lonsdale, David Embley Brigham Young University AAAI Now at.
FALL 2004CENG 351 File Structures and Data Management1 Relational Model Chapter 3.
An Introduction to Description Logics (chapter 2 of DLHB)
FROntIER: Fact Recognizer for Ontologies with Inference and Entity Resolution Joseph Park, Computer Science Brigham Young University.
Cost-Effective Information Extraction from Lists in OCRed Historical Documents Thomas Packer and David W. Embley Brigham Young University FamilySearch.
Ontology-Based Computing Kenneth Baclawski Northeastern University and Jarg.
Majid Sazvar Knowledge Engineering Research Group Ferdowsi University of Mashhad Semantic Web Reasoning.
(Semi)automatic Extraction of Genealogical Information from Scanned & OCRed Historical Documents Elder David W. Embley.
FamilySearch - Basics A Quick Look at the Puzzle Pieces.
Data Profiling 13 th Meeting Course Name: Business Intelligence Year: 2009.
Aidan Hogan, Antoine Zimmermann, Jürgen Umbrich, Axel Polleres, Stefan Decker Presented by Joseph Park SCALABLE AND DISTRIBUTED METHODS FOR ENTITY MATCHING,
Refined Online Citation Matching and Adaptive Canonical Metadata Construction CSE 598B Course Project Report Huajing Li.
GreenFIE-HD: A “Green” Form-based Information Extraction Tool for Historical Documents Tae Woo Kim.
Cost-effective Ontology Population with Data from Lists in OCRed Historical Documents Thomas L. Packer David W. Embley HIP ’13 BYU CS 1.
Lecture 15: Query Optimization. Very Big Picture Usually, there are many possible query execution plans. The optimizer is trying to chose a good one.
Extracting and Organizing Facts of Interest from OCRed Historical Documents Joseph Park, Computer Science Brigham Young University.
1 CS122A: Introduction to Data Management Lecture #4 (E-R  Relational Translation) Instructor: Chen Li.
n-ary relations OWL modeling problem when n≥3
Extracting Data Automatically from Scanned Books with OntoSoar
A Web of Knowledge for Family History (Research Directions)
David W. Embley Brigham Young University Provo, Utah, USA
Stephen W. Liddle, Deryle W. Lonsdale, and Scott N. Woodfield
Vision for an Automatically Constructed FH-WoK
Pragmatic Quality Assessment for Automatically Extracted Data
Joseph S. Park and David W. Embley Brigham Young University
Property consolidation for entity browsing
[jws13] Evaluation of instance matching tools: The experience of OAEI
(Semi)automatic Extraction of Genealogical Information from Scanned & OCRed Historical Documents Elder David W. Embley.
Extracting Full Names from Diverse and Noisy Scanned Document Images
Temple Ready within an Hour of Collection Capture
Query Optimization.
Joseph Park Brigham Young University
Joseph Park Brigham Young University
A framework for ontology Learning FROM Big Data
Presentation transcript:

“Automating Reasoning on Conceptual Schemas” in FamilySearch — A Large-Scale Reasoning Application David W. Embley Brigham Young University More questions than answers … help! But possibly also some interesting directions for future research … TM BYU Data Extraction Research Group

Context: Web of Knowledge Supermposed Family History Books Large collection of scanned, OCRed books: currently 85,000 on line and counting Stated assertions Implied assertions ▫ Inferred ▫ Same-as entities 4.25  assertions Trillions of RDF triples

Stated Assertions William Gerard Lathrop – married Charlotte Brackett Jennings in 1837 – is the son of Mary Ely – was born in 1812

Inferred Assertions William Gerard Lathrop has gender Male Maria Jennings Female (name analysis) Maria Jennings has surname Lathrop

Same-as Entities William Gerard Lathrop Gerard Lathrop Mary Ely

FROntIER OntoES convert to RDF Jena reasoner comparators parameters owl:sameAs Abigail Huntington Lathrop Female RDF output [(?x rdf:type source:Person) -> (?x rdf:type target:Person)] [(?x source:Person-Name ?n),(?n source: NameValue ?nv), isMale(?nv),makeTemp(?gender) -> (?x target:Person-Gender ?gender),(?gender rdf:type target:Gender), (?gender target:GenderValue `Male'^^xsd:string)] Duke convert to csv

WoK (a Web of Knowledge superimposed over historical documents) …… ……

…… grandchildren of Mary Ely ……

WoK (a Web of Knowledge superimposed over historical documents) …… …… grandchildren of Mary Ely

WoK (a Web of Knowledge superimposed over historical documents) …… grandchildren of Mary Ely ……

WoK (a Web of Knowledge superimposed over historical documents) …… ……

Issues Inference with large sets of assertions Satisfiability over more expressive constraints Uncertainty – Constraint violation search – Probabilistic description logics

Q1: Scalability? 4.25  assertions, about half inferred Conceptual model growth (via new inferred assertion types) (Dynamic) semantic index

Q2: Satisfiability? Basic Tableau Algorithm & OSM? n-ary (n > 2)? Marriage (  1 AD) Π (  1 AD) (  1 AD  ) Π (  1 AD  ) (  1 CE  ) ¬B A ¬C A ¬A B C ¬B ¬C Π Π Π Π Π x x y y z w v

Q3: Uncertainty? How do we verify assertions? – Plato: “Knowledge is justified true belief” – WoK perspective: Reasoning chains grounded in source documents – Well-reasoned explanations Where does uncertainty arise? – Assertion extraction – Assertion Inference – Object identity resolution – FamilySearch wiki model What conceptual modeling constructs would be useful? – Probability of assertion correctness – Probability distributions for cardinality constraints – Probabilities and probability distributions over general constraints Processing with uncertainty? – Satisfiability in the context of uncertainty – Discovering discrepancies – Likely views

grandchildren of Mary Ely Uncertainty: Truth Verification …… …… Plus: discussion (hopefully well-reasoned) Plato: “Knowledge is justified true belief”

Source of Uncertainty: Assertion Extraction 1. Andy b Becky Beth h, Charles Conrad 1.Initialize 1. Andy b Becky Beth h, i Charles Conrad C FN BD \n(1)\. (Andy) b\. (1816)\n 3.Alignment-Search 2.Generalize C FN BD \n([\dlio])[.,] (\w{4}) [bh][.,] ([\dlio]{4})\n C FN BD \n([\dlio])\[.,] (\w{4}) [bh][.,] ([\dlio]{4})\n X Deletion C FN Unknown BD \n([\dlio])[.,] (\w{4,5}) (\S{1,10}) [bh][.,] ([\dlio]{4})\n Insertion 1. Andy b Becky Beth h, Charles Conrad Expansion 4.Evaluate (edit sim. * match prob.) One match! No Match 5.Active Learning 1. Andy b Becky Beth h, i Charles Conrad C FN MN BD \n([\dlio])[.,] (\w{4,5}) (\w{4}) [bh][.,] ([\dlio]{4})\n 6.Extract 1. Andy b Becky Beth h, i Charles Conrad Many more … ListReader Wrapper Induction: Precision: 95% Recall: 91%

Source of Uncertainty: Assertion Inference Name isMale? (?x source:Person-Name ?n), (?n source: NameValue ?nv), isMale(?nv), makeTemp(?gender) -> (?x target:Person-Gender ?gender), (?gender rdf:type target:Gender), (?gender target:GenderValue `Male'^^xsd:string)]

Source of Uncertainty: Object Identity Resolution Person 2 (Mary Ely) owl:sameAs Person 8 (Mary Ely)

Source of Uncertainty: Wiki-like Updates + Uncertain Inferences Sources of error: 1.Incorrect person merges 2.Incorrect parent-child relationship assertions Cyclic Pedigree:

Uncertainty: Useful More-Expressive Conceptual Model Specifications 1:* 1:2.1:* x 2 Nov Nov 1845 p = 0.79 p = 0.35

Uncertainty Processing: (1) Satisfiability, (2) Discrepancy Discovery, (3) Likely Views 1:* 1:2.1:* x 2 Nov Nov 1845 p = 0.79 p = 0.35

Summary Web of Knowledge (WoK) Resolution needs: – Scalability – Satisfiability – Uncertainty Conceptual-modeling extensions supporting uncertainty Discrepancy discovery View likelihood