Published by Karin Harvey. Modified over 8 years ago.
1
David W. Embley, Brigham Young University, Provo, Utah, USA
2
Birthdate of my great grandpa Orson
Price and mileage of red Nissans, 1990 or newer
Location and size of chromosome 17
US states with property crime rates above 1%
3
Fundamental questions
– What is knowledge?
– What are facts?
– How does one know?
Philosophy
– Ontology
– Epistemology
– Logic and reasoning
4
Existence asks “What exists?” Concepts, relationships, and constraints
5
The nature of knowledge asks: “What is knowledge?” and “How is knowledge acquired?” Populated conceptual model
6
Principles of valid inference – asks: “What is known?” and “What can be inferred?” For us, it answers: what can be inferred (in a formal sense) from conceptualized data. Find price and mileage of red Nissans, 1990 or newer
7
Distill knowledge from the wealth of digital web data
Annotate web pages
Need a computational alembic to algorithmically turn raw symbols contained in web pages into knowledge
Fact Annotation … …
8
Symbols: $ 11,500 117K Nissan CD AC
Data: price(11,500) mileage(117K) make(Nissan)
Conceptualized data:
– Car(C123) has Price($11,500)
– Car(C123) has Mileage(117,000)
– Car(C123) has Make(Nissan)
– Car(C123) has Feature(AC)
Knowledge
– “Correct” facts
– Provenance
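The progression from symbols to conceptualized data can be sketched in a few lines of Python. This is a minimal illustration and not the extraction engine described in the talk: the `Fact` class, the `RECOGNIZERS` table, and the `conceptualize` function are hypothetical names, and the regular expressions are deliberately tiny stand-ins for real instance recognizers. Each fact keeps the raw symbol it came from as a simple form of provenance.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    subject: str   # non-lexical object identifier, e.g. "C123"
    relation: str  # relationship-set name, e.g. "has Price"
    value: str     # normalized lexical value
    source: str    # provenance: the raw symbols the value came from


# Hypothetical, much-simplified recognizers (assumptions for illustration).
RECOGNIZERS = {
    "Price": re.compile(r"\$\s*(\d{1,3}(?:,\d{3})*)"),
    "Mileage": re.compile(r"\b(\d+)K\b"),
    "Make": re.compile(r"\b(Nissan|Toyota|Honda)\b"),
    "Feature": re.compile(r"\b(AC|CD)\b"),
}


def conceptualize(car_id, text):
    """Turn raw symbols into facts attached to one car object."""
    facts = []
    for concept, pattern in RECOGNIZERS.items():
        for match in pattern.finditer(text):
            value = match.group(1)
            if concept == "Mileage":
                value = f"{int(value) * 1000:,}"  # "117" (from 117K) becomes "117,000"
            elif concept == "Price":
                value = "$" + value
            facts.append(Fact(car_id, f"has {concept}", value, match.group(0)))
    return facts


facts = conceptualize("C123", "$ 11,500 117K Nissan CD AC")
for fact in facts:
    print(f"Car({fact.subject}) {fact.relation}({fact.value})  <- {fact.source!r}")
```

Unlike the slide, which lists only Feature(AC), this sketch also reports CD as a feature because both symbols appear in the input.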
9
Find me the price and mileage of all red Nissans – I want a 1990 or newer.
13
Extraction Ontologies; Semantic Annotation; Free-Form Query Interpretation
14
Object sets, Relationship sets, Participation constraints, Lexical, Non-lexical, Primary object set, Aggregation, Generalization/Specialization
15
Data Frame:
Internal Representation: float
Values
– External Rep.: \s*[$]\s*(\d{1,3})*(\.\d{2})?
– Left Context: $
Key Word Phrase
– Key Words: ([Pp]rice)|([Cc]ost)| …
Operators
– Operator: >
– Key Words: (more\s*than)|(more\s*costly)|…
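A data frame of this kind can be approximated in Python as a small record of regular expressions. The sketch below is an assumption-laden illustration: the class name `ConceptDataFrame` and its methods are hypothetical, and the value pattern is a stricter variant of the external representation on the slide (it requires the leading `$` and accepts either comma-grouped or plain digits), not the pattern the author used. The keyword and operator patterns are taken from the slide without their elided alternatives.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ConceptDataFrame:
    name: str
    internal_type: type          # internal representation, e.g. float
    value_pattern: re.Pattern    # external representation, group 1 holds the value
    keywords: re.Pattern         # words that signal the concept is mentioned
    operators: dict = field(default_factory=dict)  # operator symbol -> keyword pattern

    def mentions(self, text):
        """True if a keyword for this concept occurs in the text."""
        return self.keywords.search(text) is not None

    def find_values(self, text):
        """Extract values and convert them to the internal representation."""
        return [
            self.internal_type(m.group(1).replace(",", ""))
            for m in self.value_pattern.finditer(text)
        ]

    def find_operator(self, text):
        """Return the first operator whose keyword phrase occurs, else None."""
        for symbol, pattern in self.operators.items():
            if pattern.search(text):
                return symbol
        return None


price = ConceptDataFrame(
    name="Price",
    internal_type=float,
    # Assumed variant of the slide's pattern: "$" is the left context.
    value_pattern=re.compile(r"[$]\s*((?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d{2})?)"),
    keywords=re.compile(r"([Pp]rice)|([Cc]ost)"),
    operators={">": re.compile(r"(more\s*than)|(more\s*costly)")},
)

query = "Find cars that cost more than $5,000"
print(price.mentions(query), price.find_operator(query), price.find_values(query))
```

Keeping the recognizers declarative, as data rather than page-specific code, is what lets the same data frame apply to new pages in the same domain.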
16
Generality: assumptions about web pages
– Data rich
– Narrow domain
– Document types: simple multiple-record documents (easiest); single-record documents (harder); records with scattered components (even harder)
Resiliency: declarative
– Still works when web pages change
– Works for new, unseen pages in the same domain
Scalable, but takes work to declare the extraction ontology
18
Parse Free-Form Query (with respect to the data extraction ontology); Select Ontology; Formulate Query Expression; Run Query Over Semantically Annotated Data
19
“Find me the [price] and [mileage] of all [red] [Nissan]s – I want a [1996] [or newer]”
Recognized terms: price, mileage, red, Nissan, 1996, or newer (>= Operator)
20
“Find me the price and mileage of all red Nissans – I want a 1996 or newer”
21
Formulate Query Expression
Conjunctive queries and aggregate queries
Mentioned object sets are all of interest.
Values and operator keywords determine conditions.
– Color = “red”
– Make = “Nissan”
– Year >= 1996 (>= Operator)
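The two rules on this slide (mentioned object sets are of interest; values and operator keywords determine conditions) can be sketched as follows. Everything here is a hypothetical simplification: the `KEYWORDS`, `VALUES`, `OPERATORS`, and `CASTS` tables and the `formulate` function are assumptions for illustration, and a real extraction ontology would supply far richer recognizers.

```python
import re

# Hypothetical keyword recognizers: a match means the object set is requested.
KEYWORDS = {
    "Price": re.compile(r"\b(price|cost)\b", re.I),
    "Mileage": re.compile(r"\b(mileage|miles)\b", re.I),
}
# Hypothetical value recognizers: a match yields a condition on that object set.
VALUES = {
    "Color": re.compile(r"\b(red|blue|green|black|white)\b", re.I),
    "Make": re.compile(r"\b(Nissan|Toyota|Honda)s?\b"),
    "Year": re.compile(r"\b(19\d{2}|20\d{2})\b"),
}
# Operator keyword phrases, checked in the text right after a recognized value.
OPERATORS = {
    ">=": re.compile(r"or\s+newer\b", re.I),
    "<=": re.compile(r"or\s+older\b", re.I),
}
CASTS = {"Year": int}  # internal representation per object set (default: str)


def formulate(query):
    """Return (requested object sets, list of (object set, operator, value))."""
    requested = [name for name, pattern in KEYWORDS.items() if pattern.search(query)]
    conditions = []
    for name, pattern in VALUES.items():
        match = pattern.search(query)
        if match is None:
            continue
        operator_symbol = "="  # a bare value means equality
        tail = query[match.end():].lstrip()
        for symbol, op_pattern in OPERATORS.items():
            if op_pattern.match(tail):
                operator_symbol = symbol
        value = CASTS.get(name, str)(match.group(1))
        conditions.append((name, operator_symbol, value))
    return requested, conditions


QUERY = "Find me the price and mileage of all red Nissans – I want a 1996 or newer"
requested, conditions = formulate(QUERY)
print(requested)
print(conditions)
```

All conditions are implicitly joined by AND, which matches the conjunctive-query restriction stated on the slide.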
22
Formulate Query Expression: For, Let, Where, Return
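For, Let, Where, and Return are the clause names of an XQuery FLWOR expression. The slides do not show the generated expression, so the sketch below only mirrors that clause structure in plain Python over a list of dictionaries; the `run_query` function, the `OPS` table, and the sample `cars` records are hypothetical stand-ins for semantically annotated data.

```python
import operator

# Comparison operators that a condition triple may name.
OPS = {
    "=": operator.eq,
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}


def run_query(records, requested, conditions):
    """Evaluate a conjunctive query in For / Where / Return style."""
    results = []
    for record in records:                               # For: each annotated record
        satisfied = all(                                 # Where: every condition holds
            name in record and OPS[op](record[name], value)
            for name, op, value in conditions
        )
        if satisfied:
            results.append({name: record.get(name) for name in requested})  # Return
    return results


# Made-up sample records, for illustration only.
cars = [
    {"Make": "Nissan", "Color": "red", "Year": 1998, "Price": 11500, "Mileage": 117000},
    {"Make": "Nissan", "Color": "red", "Year": 1989, "Price": 2500, "Mileage": 180000},
    {"Make": "Toyota", "Color": "red", "Year": 2001, "Price": 9000, "Mileage": 90000},
]
conditions = [("Color", "=", "red"), ("Make", "=", "Nissan"), ("Year", ">=", 1996)]
answers = run_query(cars, ["Price", "Mileage"], conditions)
print(answers)
```

A record that lacks a conditioned field is treated as not satisfying the query, which is one possible design choice; an annotation-based system might instead flag such records as uncertain matches.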
24
Several dozen person-hours. Oodles of extraction ontologies needed. How can we resolve this problem?
25
Forms
– General familiarity
– Reasonable conceptual framework
– Appropriate correspondence: transformable to ontological descriptions; capable of accepting source data
Instance recognizers
– Some pre-existing instance recognizers
– Lexicons
Automated extraction ontology creation?
26
Basic form-construction facilities: single-entry field, multiple-entry field, nested form, …
33
Need reading path: DOM-tree structure
Need to resolve mapping problems: Split/Merge, Union/Selection
34
Need reading path: DOM-tree structure
Need to resolve mapping problems: Split/Merge, Union/Selection
Name: Voltage-dependent anion-selective channel protein 3; VDAC-3; hVDAC3; Outer mitochondrial membrane protein porin 3
36
Need reading path: DOM-tree structure
Need to resolve mapping problems: Split/Merge, Union/Selection
Name: T-complex protein 1 subunit theta; TCP-1-theta; CCT-theta; Renal carcinoma antigen NY-REN-15
38
Name
39
14-3-3 protein epsilon; Mitochondrial import stimulation factor L subunit; Protein kinase C inhibitor protein-1; KCIP-1; 14-3-3E
40
Name: Voltage-dependent anion-selective channel protein 3; VDAC-3; hVDAC3; Outer mitochondrial membrane protein porin 3
41
Name: Tryptophanyl-tRNA synthetase, mitochondrial precursor; EC 6.1.1.2; Tryptophan--tRNA ligase; TrpRS; (Mt)TrpRS
43
Also helps adjust ontology constraints
44
Name: T-complex protein 1 subunit theta; TCP-1-theta; CCT-theta; Renal carcinoma antigen NY-REN-15
45
Lexicons
Name: 14-3-3 protein epsilon; Mitochondrial import stimulation factor L subunit; Protein kinase C inhibitor protein-1; KCIP-1; 14-3-3E
Name: T-complex protein 1 subunit theta; TCP-1-theta; CCT-theta; Renal carcinoma antigen NY-REN-15
Name: Tryptophanyl-tRNA synthetase, mitochondrial precursor; EC 6.1.1.2; Tryptophan--tRNA ligase; TrpRS; (Mt)TrpRS
…
14-3-3 protein epsilon; Mitochondrial import stimulation factor L subunit; Protein kinase C inhibitor protein-1; KCIP-1; 14-3-3E
…
T-complex protein 1 subunit theta; TCP-1-theta; CCT-theta; Renal carcinoma antigen NY-REN-15
…
Tryptophanyl-tRNA synthetase, mitochondrial precursor; EC 6.1.1.2; Tryptophan--tRNA ligase; TrpRS; (Mt)TrpRS
…
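The idea of turning harvested form values into a lexicon, and the lexicon into an instance recognizer, can be sketched as below. The function names `build_lexicon` and `lexicon_recognizer` are hypothetical, and the way the harvested names are split into individual entries is an assumed reading of the slide, which does not mark the boundaries between synonyms.

```python
import re

# Values harvested for the "Name" field of three filled-in forms
# (entry boundaries are an assumption; the slide runs the names together).
HARVESTED = [
    ["14-3-3 protein epsilon", "Protein kinase C inhibitor protein-1", "KCIP-1", "14-3-3E"],
    ["T-complex protein 1 subunit theta", "TCP-1-theta", "CCT-theta",
     "Renal carcinoma antigen NY-REN-15"],
    ["Tryptophanyl-tRNA synthetase, mitochondrial precursor", "TrpRS", "(Mt)TrpRS"],
]


def build_lexicon(harvested):
    """Collect every harvested value for a field into one set of entries."""
    lexicon = set()
    for values in harvested:
        lexicon.update(values)
    return lexicon


def lexicon_recognizer(lexicon):
    """Compile the lexicon into one alternation pattern.

    Longer entries come first so that an entry is preferred over any
    shorter entry it happens to contain.
    """
    entries = sorted(lexicon, key=len, reverse=True)
    return re.compile("|".join(re.escape(entry) for entry in entries))


lexicon = build_lexicon(HARVESTED)
recognizer = lexicon_recognizer(lexicon)
found = recognizer.findall("Expression of TCP-1-theta and KCIP-1 was measured.")
print(found)
```

Each additional page harvested through the form grows the lexicon, so the recognizer improves without anyone writing new patterns by hand.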
46
Instance Recognizers Number Patterns Context Keywords and Phrases
48
Recognize and annotate with respect to an ontology
49
Automatic (or near automatic) creation of extraction ontologies
Automatic (or near automatic) annotation of web pages
Simple but accurate query specification without specialized training
“Effortlessly” generate WoK content
50
Extraction-ontology generation
– Auto-enhancement of extraction ontologies
– Form-based specification
– Auto-generation based on table interpretation
– Sophisticated conceptualization with TANGO
Automated annotation
– Extraction ontologies
– Form-based information harvesting
– Generated pattern-based annotation
Simple query specification
– Free-form queries
– Generated form-based queries
www.deg.byu.edu