David W. Embley Brigham Young University Provo, Utah, USA.

Presentation transcript:

David W. Embley Brigham Young University Provo, Utah, USA

Birthdate of my great grandpa Orson
Price and mileage of red Nissans, 1990 or newer
Location and size of chromosome 17
US states with property crime rates above 1%

Fundamental questions
– What is knowledge?
– What are facts?
– How does one know?
Philosophy
– Ontology
– Epistemology
– Logic and reasoning

Existence  asks “What exists?” Concepts, relationships, and constraints

The nature of knowledge  asks: “What is knowledge?” and “How is knowledge acquired?” Populated conceptual model

Principles of valid inference – asks: “What is known?” and “What can be inferred?” For us, it answers: what can be inferred (in a formal sense) from conceptualized data. Find price and mileage of red Nissans, 1990 or newer

Distill knowledge from the wealth of digital web data
Annotate web pages
Need a computational alembic to algorithmically turn raw symbols contained in web pages into knowledge
(Figure labels: Fact, Annotation)

Symbols: $11,500 117K Nissan CD AC
Data: price(11,500) mileage(117K) make(Nissan)
Conceptualized data:
– Car(C123) has Price($11,500)
– Car(C123) has Mileage(117,000)
– Car(C123) has Make(Nissan)
– Car(C123) has Feature(AC)
Knowledge
– “Correct” facts
– Provenance
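The step from data to conceptualized data with provenance can be sketched roughly as follows. This is only an illustration in Python: the `Fact` class, the `conceptualize` function, and the example source URL are hypothetical names, not part of the presented system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One conceptualized fact, e.g. Car(C123) has Price($11,500)."""
    object_id: str     # identifier of the non-lexical object (assumed: "C123")
    relationship: str  # relationship-set name, e.g. "Car has Price"
    value: str         # lexical value as found in the page
    source: str        # provenance: where the symbols were found (assumed URL)

    def __str__(self):
        concept = self.relationship.split()[-1]
        return f"Car({self.object_id}) has {concept}({self.value})"


def conceptualize(object_id, data, source):
    """Attach extracted attribute/value pairs to one object, keeping provenance."""
    return [
        Fact(object_id, f"Car has {name.capitalize()}", value, source)
        for name, value in data.items()
    ]


# Extracted data corresponding to the symbols on the slide.
data = {"price": "$11,500", "mileage": "117,000", "make": "Nissan", "feature": "AC"}
facts = conceptualize("C123", data, "http://example.org/car-ad")  # hypothetical URL

for fact in facts:
    print(fact, "from", fact.source)
```

Keeping the source alongside each fact is what lets a user check whether an extracted fact is "correct".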

Find me the price and mileage of all red Nissans – I want a 1990 or newer.

Extraction Ontologies
Semantic Annotation
Free-Form Query Interpretation

Object sets
Relationship sets
Participation constraints
Lexical
Non-lexical
Primary object set
Aggregation
Generalization/Specialization

Data Frame:
Internal Representation: float
Values
– External Rep.: \s*[$]\s*(\d{1,3})*(\.\d{2})?
– Left Context: $
– Key Word Phrase
– Key Words: ([Pp]rice)|([Cc]ost)| …
Operators
– Operator: >
– Key Words: (more\s*than)|(more\s*costly)|…
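A data frame of this kind can be approximated in a few lines of Python. The sketch below is an assumption-laden illustration, not the extraction-ontology language itself: the class name `DataFrameSketch` and its methods are hypothetical, and the value pattern is a simplified stand-in that allows thousands separators rather than the slide's exact expression. The keyword and operator patterns are taken from the slide.

```python
import re
from dataclasses import dataclass, field


@dataclass
class DataFrameSketch:
    """Hypothetical stand-in for a data frame: value, keyword, and operator recognizers."""
    name: str
    value_pattern: str    # external representation of values
    keyword_pattern: str  # keywords signalling that the concept is mentioned
    operators: dict = field(default_factory=dict)  # operator symbol -> keyword regex

    def recognize(self, text):
        """Return every substring that looks like a value of this concept."""
        return [m.group(0) for m in re.finditer(self.value_pattern, text)]

    def to_internal(self, external):
        """Convert an external representation to the internal one (float)."""
        return float(re.sub(r"[^\d.]", "", external))

    def mentions(self, text):
        return re.search(self.keyword_pattern, text) is not None

    def operator_for(self, text):
        """Return the first operator whose keywords occur in the text, else None."""
        for symbol, keywords in self.operators.items():
            if re.search(keywords, text):
                return symbol
        return None


price = DataFrameSketch(
    name="Price",
    # Assumed pattern: dollar sign, digits with optional commas, optional cents.
    value_pattern=r"[$]\s*\d{1,3}(?:,?\d{3})*(?:\.\d{2})?",
    keyword_pattern=r"([Pp]rice)|([Cc]ost)",
    operators={">": r"(more\s*than)|(more\s*costly)"},
)

found = price.recognize("Red Nissan, asking $11,500 or best offer")
print(found, [price.to_internal(v) for v in found])
print(price.operator_for("anything costing more than $5,000"))
```

Because the recognizers are declared rather than tied to page layout, the same data frame can be applied to pages it has never seen.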

Generality: assumptions about web pages
– Data rich
– Narrow domain
– Document types
  – Simple multiple-record documents (easiest)
  – Single-record documents (harder)
  – Records with scattered components (even harder)
Resiliency: declarative
– Still works when web pages change
– Works for new, unseen pages in the same domain
Scalable, but takes work to declare the extraction ontology

Parse Free-Form Query (with respect to data extraction ontology)
Select Ontology
Formulate Query Expression
Run Query Over Semantically Annotated Data

“Find me the price and mileage of all red Nissans – I want a 1996 or newer”
Recognized terms: price, mileage, red, Nissan, 1996, or newer (>= Operator)

“Find me the price and mileage of all red Nissans – I want a 1996 or newer”

Formulate Query Expression
Conjunctive queries and aggregate queries
Mentioned object sets are all of interest.
Values and operator keywords determine conditions.
– Color = “red”
– Make = “Nissan”
– Year >= 1996 (>= Operator)
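As a rough illustration of how recognized values and operators become a conjunctive query, the Python sketch below applies the three conditions from the slide to a small set of annotated records. The record layout, the sample cars, and the function names `formulate` and `run_query` are assumptions made for the example only.

```python
import operator

# Operator symbols mapped to Python comparison functions.
OPS = {"=": operator.eq, ">=": operator.ge, ">": operator.gt,
       "<=": operator.le, "<": operator.lt}


def formulate(conditions):
    """Build a predicate that is true only when every condition holds (conjunction)."""
    def predicate(record):
        return all(
            attr in record and OPS[op](record[attr], value)
            for attr, op, value in conditions
        )
    return predicate


def run_query(records, requested, conditions):
    """Return the requested object sets for every record satisfying the conditions."""
    matches = formulate(conditions)
    return [{attr: r.get(attr) for attr in requested} for r in records if matches(r)]


# Made-up annotated data standing in for semantically annotated web pages.
cars = [
    {"Make": "Nissan", "Color": "red", "Year": 1998, "Price": 11500, "Mileage": 117000},
    {"Make": "Nissan", "Color": "blue", "Year": 2001, "Price": 9000, "Mileage": 80000},
    {"Make": "Nissan", "Color": "red", "Year": 1993, "Price": 3000, "Mileage": 150000},
]

# Conditions determined by the values and operator keywords in the query.
conditions = [("Color", "=", "red"), ("Make", "=", "Nissan"), ("Year", ">=", 1996)]

# Price and Mileage are the mentioned object sets the user wants returned.
result = run_query(cars, ["Price", "Mileage"], conditions)
print(result)
```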

Formulate Query Expression
For, Let, Where, Return

Several dozen person-hours
Oodles of extraction ontologies needed
How can we resolve this problem?

Forms
– General familiarity
– Reasonable conceptual framework
– Appropriate correspondence
  – Transformable to ontological descriptions
  – Capable of accepting source data
Instance recognizers
– Some pre-existing instance recognizers
– Lexicons
Automated extraction ontology creation?

Basic form-construction facilities:
– single-entry field
– multiple-entry field
– nested form
– …

Need reading path: DOM-tree structure
Need to resolve mapping problems
– Split/Merge
– Union/Selection

Name: Voltage-dependent anion-selective channel protein 3; VDAC-3; hVDAC3; Outer mitochondrial membrane Protein porin 3

Name: T-complex protein 1 subunit theta; TCP-1-theta; CCT-theta; Renal carcinoma antigen NY-REN-15

Name

protein epsilon Mitochondrial import stimulation factor Lsubunit Protein kinase C inhibitor protein-1 KCIP E

Name Voltage-dependent anion-selective channel protein 3 VDAC-3 hVDAC3 Outer mitochondrial membrane Protein porin 3

Name Tryptophanyl-tRNA synthetase, mitochondrial precursor EC Tryptophan—tRNA ligase TrpRS (Mt)TrpRS

Also helps adjust ontology constraints

Name T-complex protein 1 subunit theta TCP-1-theta CCT-theta Renal carcinoma antigen NY-REN-15

Lexicons
Name: protein epsilon Mitochondrial import stimulation factor Lsubunit Protein kinase C inhibitor protein-1 KCIP E
Name: T-complex protein 1 subunit theta TCP-1-theta CCT-theta Renal carcinoma antigen NY-REN-15
Name: Tryptophanyl-tRNA synthetase, mitochondrial precursor EC Tryptophan--tRNA ligase TrpRS (Mt)TrpRS
…
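One simple way harvested names could serve as a lexicon-based instance recognizer is sketched below in Python. The functions `build_lexicon` and `recognize` are hypothetical, the matching is a plain case-insensitive substring search, and overlapping entries are not resolved, so this is a minimal illustration rather than the recognizer used in the presented work.

```python
import re


def build_lexicon(harvested_names):
    """Collect harvested names into a case-insensitive lexicon."""
    return {name.lower() for name in harvested_names}


def recognize(text, lexicon):
    """Return lexicon entries found in the text, in order of appearance."""
    hits = []
    for entry in lexicon:
        for match in re.finditer(re.escape(entry), text, flags=re.IGNORECASE):
            # Keep the position so hits can be reported in reading order.
            hits.append((match.start(), match.group(0)))
    return [found for _, found in sorted(hits)]


# Names harvested through the form in the slides above.
lexicon = build_lexicon([
    "Voltage-dependent anion-selective channel protein 3",
    "VDAC-3",
    "hVDAC3",
    "T-complex protein 1 subunit theta",
    "TCP-1-theta",
    "CCT-theta",
])

page_text = "hVDAC3 (also VDAC-3) differs from CCT-theta."
print(recognize(page_text, lexicon))
```

Each newly harvested page can add entries to the lexicon, so recognition on later pages improves as harvesting proceeds.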

Instance Recognizers Number Patterns Context Keywords and Phrases

Recognize and annotate with respect to an ontology

Automatic (or near automatic) creation of extraction ontologies
Automatic (or near automatic) annotation of web pages
Simple but accurate query specification without specialized training
“Effortlessly” generate WoK content

Extraction-ontology generation
– Auto-enhancement of extraction ontologies
– Form-based specification
– Auto-generation based on table interpretation
– Sophisticated conceptualization with TANGO
Automated annotation
– Extraction ontologies
– Form-based information harvesting
– Generated pattern-based annotation
Simple query specification
– Free-form queries
– Generated form-based queries