Ontology-based Information Extraction with a Cognitive Agent. Peter Lindes, Deryle Lonsdale, David Embley. Brigham Young University. AAAI 2015.

Presentation transcript:

Ontology-based Information Extraction with a Cognitive Agent
Peter Lindes (now at University of Michigan), Deryle Lonsdale, David Embley
Brigham Young University
AAAI 2015
© 2015 Peter Lindes
1/22/2015

The Problem

Goals and Strategies

OntoSoar project goals
– Extract genealogy facts from family history books
– Project extracted information onto a conceptual model to populate a searchable database

Strategies
– Use ideas from Embodied Construction Grammar
– Use the Soar cognitive architecture
– Integrate several levels of knowledge

Long term goals
– Build computational models of human language processing
– Apply these models to real-world applications

Example 1

A Simple Ontology
[Figure: ontology diagram; the instance Charles Christopher Lathrop with relationships labeled "has", "born on", and "died on".]
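A minimal sketch of how an ontology like this might be held in code, assuming the three relationship labels on the slide correspond to a name, a birth date, and a death date. The class and field names are hypothetical and are not taken from OntoSoar; the years are the ones given for this person in the example text on a later slide ("b. 1817, d. 1865").

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    """Hypothetical Person record mirroring the relationship labels on the slide."""
    name: str                       # "has" (assumed to link a person to a name)
    born_on: Optional[str] = None   # "born on"
    died_on: Optional[str] = None   # "died on"


# Populate the sketch with the person shown on the slide.
lathrop = Person(name="Charles Christopher Lathrop", born_on="1817", died_on="1865")
print(lathrop)
```

Optional fields are used because a source text may mention a person without giving either date.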

Example 2

A More Complex Ontology
[Figure: ontology diagram; instance labels include Myra Harwood, Jonathan Squires, J. Wilbur Squires, and Feb. 13, 1874.]

The Solution

"Thus, intelligence is the ability to bring to bear all the knowledge that one has in service of one's goals." Newell (1990), p. 90

– Page Layout *
– Text Analysis
– Syntax
– Semantics
– Pragmatics
– World Knowledge
– Conceptual Models

OntoSoar Architecture

Construction Grammar

Applying Constructions
Charles Christopher Lathrop, N. Y. City, b. 1817, d. 1865, son of Mary Ely and Gerard Lathrop ;

… More Constructions
Charles Christopher Lathrop, N. Y. City, b. 1817, d. 1865, son of Mary Ely and Gerard Lathrop ;

Building Knowledge
Charles Christopher Lathrop, N. Y. City, b. 1817, d. 1865, son of Mary Ely and Gerard Lathrop ;
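The idea of a construction as a pairing of form and meaning can be illustrated with a rough sketch. This is not how OntoSoar works (the slides describe an adaptation of Embodied Construction Grammar running in Soar); here plain regular expressions stand in for the forms, and the role names (`name`, `born_on`, `died_on`, `parents`) and the function `apply_constructions` are hypothetical.

```python
import re

TEXT = ("Charles Christopher Lathrop, N. Y. City, b. 1817, d. 1865, "
        "son of Mary Ely and Gerard Lathrop ;")

# Each hypothetical "construction" pairs a form (a regex) with a meaning (a role name).
CONSTRUCTIONS = [
    ("name", re.compile(r"^([A-Z][a-z]+(?: [A-Z][a-z]+)+),")),
    ("born_on", re.compile(r"\bb\. (\d{4})")),
    ("died_on", re.compile(r"\bd\. (\d{4})")),
    ("parents", re.compile(r"\bson of ([A-Z][\w. ]+?) and ([A-Z][\w. ]+?)\s*;")),
]


def apply_constructions(text):
    """Try every construction against the text and collect the facts it yields."""
    facts = {}
    for role, pattern in CONSTRUCTIONS:
        match = pattern.search(text)
        if match:
            groups = match.groups()
            # Single-slot forms yield a string; multi-slot forms yield a list.
            facts[role] = groups[0] if len(groups) == 1 else list(groups)
    return facts


facts = apply_constructions(TEXT)
print(facts)
```

A pattern list like this only covers the abbreviations in this one sentence. Handling running prose such as "his widow married JONATHAN SQUIRES, who was born in Ohio" is where a grammar-based approach becomes necessary.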

Knowledge Structures Compared
… his widow married JONATHAN SQUIRES, who was born in Ohio, July 25, 1823, by whom she had one son, J. Wilbur, born June 16, 1865,
Charles Christopher Lathrop, N. Y. City, b. 1817, d. 1865, son of Mary Ely and Gerard Lathrop ;

Results on Examples


Results on The Ely Ancestry
(a book of 830 pages, including our Example 1)

Item Type    Instances Found
Persons      16,848
Births        8,609
Deaths        2,406
Genders       1,674
Couples       3,343
Children      3,049
Total        35,929
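As a quick consistency check, the per-type counts on this slide can be summed and compared with the reported total. The dictionary name `counts` is illustrative only.

```python
# Instance counts as reported on the slide for The Ely Ancestry.
counts = {
    "Persons": 16848,
    "Births": 8609,
    "Deaths": 2406,
    "Genders": 1674,
    "Couples": 3343,
    "Children": 3049,
}

total = sum(counts.values())
print(total)  # matches the reported total of 35,929
```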

Conclusions

Contributions
Produces usable genealogy data from scanned books
Does this using:
– Integration of several levels of knowledge
– An adaptation of Embodied Construction Grammar
– A cognitive architecture (Soar)

Future Work
Integrate parsing with semantics
Develop a means to learn many new constructions
Adapt to varying book styles
Scale up to perform well on hundreds of thousands of books

It works! … and, it could work a lot better.