WoK: A Web of Knowledge. David W. Embley, Brigham Young University, Provo, Utah, USA
A Web of Pages; A Web of Facts. Example facts: birthdate of my great-grandpa Orson; price and mileage of red Nissans, 1990 or newer; location and size of chromosome 17; US states with property crime rates above 1%.
Toward a Web of Knowledge. Fundamental questions: What is knowledge? What are facts? How does one know? Philosophy: ontology; epistemology; logic and reasoning.
Ontology. Existence: asks “What exists?” Concepts, relationships, and constraints.
Epistemology. The nature of knowledge: asks “What is knowledge?” and “How is knowledge acquired?” Populated conceptual model.
Logic and reasoning. Principles of valid inference: asks “What is known?” and “What can be inferred?” For us, it answers: what can be inferred (in a formal sense) from conceptualized data. Example: find price and mileage of red Nissans, 1990 or newer.
Making this Work. How? Distill knowledge from the wealth of digital web data; annotate web pages. [Figure: annotations on web pages yielding facts]
Turning Raw Symbols into Knowledge. Symbols: $ 11,500; 117K; Nissan; CD; AC. Data: price(11,500); mileage(117K); make(Nissan). Conceptualized data: Car(C123) has Price($11,500); Car(C123) has Mileage(117,000); Car(C123) has Make(Nissan); Car(C123) has Feature(AC). Knowledge: “correct” facts; provenance.
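The progression from symbols to conceptualized data can be sketched in a few lines of Python. This is only an illustration of the idea on the slide, not the WoK implementation: the `Fact` class, the normalization helpers, and the `source` label are all hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    subject: str   # non-lexical object, e.g. the car "C123"
    relation: str  # relationship, e.g. "has Price"
    value: object  # normalized lexical value
    source: str    # provenance: where the fact was found (hypothetical field)


def normalize_price(symbol: str) -> float:
    """Turn a raw symbol such as '$ 11,500' into a number."""
    return float(symbol.replace("$", "").replace(",", "").strip())


def normalize_mileage(symbol: str) -> int:
    """Turn a raw symbol such as '117K' or '117,000' into whole miles."""
    s = symbol.strip().upper().replace(",", "")
    if s.endswith("K"):
        return int(float(s[:-1]) * 1000)
    return int(s)


def conceptualize(car_id: str, data: dict, source: str) -> list:
    """Attach each data item to the car it describes, keeping provenance."""
    return [
        Fact(car_id, f"has {name.capitalize()}", value, source)
        for name, value in data.items()
    ]


# Symbols -> data
data = {
    "price": normalize_price("$ 11,500"),
    "mileage": normalize_mileage("117K"),
    "make": "Nissan",
    "feature": "AC",
}

# Data -> conceptualized data (the source label is a made-up placeholder)
facts = conceptualize("C123", data, source="example-car-ad")
for fact in facts:
    print(f"Car({fact.subject}) {fact.relation}({fact.value})")
```

Keeping `source` on every fact is one simple way to carry provenance along with the value, so a fact can later be checked against the page it came from.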
Actualization (with Extraction Ontologies). Query: “Find me the price and mileage of all red Nissans – I want a 1990 or newer.”
Data Extraction
Semantic Annotation
Free-Form Query
Explanation: How it Works. Extraction Ontologies; Semantic Annotation; Free-Form Query Interpretation.
Extraction Ontologies. Extraction ontology = conceptual model; object set = collection of instances. Components: object sets (lexical and non-lexical); primary object set; relationship sets; participation constraints; aggregation; generalization/specialization.
Extraction Ontologies: Data Frames. Basic idea: data frames describe the values an object set holds. Data frame: internal representation: float. Values: external rep.: \s*[$]\s*(\d{1,3})*(\.\d{2})?; left context: $. Key word phrase: key words: ([Pp]rice)|([Cc]ost)| … Operators: operator: >; key words: (more\s*than)|(more\s*costly)|…
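A data frame of this kind can be approximated as a small record of regular expressions. The sketch below is an assumption-laden illustration, not the extraction-ontology format itself: the class name `ObjectSetDataFrame` and its methods are hypothetical, and the value pattern is a simplified variant written for this example (it accepts comma-grouped digits) rather than the exact pattern from the slide.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ObjectSetDataFrame:
    """Hypothetical stand-in for a data frame attached to one object set."""
    object_set: str
    value_pattern: str    # external representation of values
    keyword_pattern: str  # words that signal the object set
    operator_keywords: dict = field(default_factory=dict)  # operator -> phrases

    def extract_values(self, text):
        """Return every matching value, converted to the internal float form."""
        return [
            float(m.group(1).replace(",", ""))
            for m in re.finditer(self.value_pattern, text)
        ]

    def is_mentioned(self, text):
        """True if any key word for this object set appears in the text."""
        return re.search(self.keyword_pattern, text) is not None

    def find_operators(self, text):
        """Return the operators whose key word phrases appear in the text."""
        return [
            op for op, pattern in self.operator_keywords.items()
            if re.search(pattern, text)
        ]


# Simplified price frame: a "$" left context followed by digits and commas,
# with optional cents. The pattern is an assumption made for this sketch.
PRICE = ObjectSetDataFrame(
    object_set="Price",
    value_pattern=r"\$\s*(\d[\d,]*(?:\.\d{2})?)",
    keyword_pattern=r"[Pp]rice|[Cc]ost",
    operator_keywords={">": r"more\s*than|more\s*costly"},
)

ad = "Red Nissan, 117K miles, $11,500 obo"
print(PRICE.extract_values(ad))
print(PRICE.find_operators("cars that cost more than $5,000"))
```

The same frame serves two purposes: its value pattern pulls prices out of web pages, and its key words and operator phrases recognize price-related terms in a user's query.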
Semantic Annotation
Free-Form Query Interpretation. Steps: (1) parse the free-form query (with respect to the data extraction ontology); (2) select the ontology; (3) formulate the query expression; (4) run the query over semantically annotated data.
Parse Free-Form Query. “Find me the price and mileage of all red Nissans – I want a 1996 or newer.” Recognized terms: price; mileage; red; Nissan; 1996 or newer (>= operator).
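Parsing the query amounts to running the ontology's recognizers over the query text. A minimal sketch follows, assuming a hand-written recognizer table; the `RECOGNIZERS` and `OPERATORS` tables and the `parse_query` function are hypothetical and far smaller than a real extraction ontology would be.

```python
import re

# Hypothetical recognizers for a tiny car-ad ontology.
# "keywords" signal that an object set is mentioned;
# "values" capture a concrete value for it.
RECOGNIZERS = {
    "Price": {"keywords": r"\bprice\b"},
    "Mileage": {"keywords": r"\bmileage\b"},
    "Color": {"values": r"\b(red|blue|white|black)\b"},
    "Make": {"values": r"\b(Nissan|Toyota|Honda)s?\b"},
    "Year": {"values": r"\b(19\d{2}|20\d{2})\b", "cast": int},
}

# Hypothetical operator phrases.
OPERATORS = {">=": r"\bor\s+newer\b"}


def parse_query(query):
    """Return (mentioned object sets, recognized values, recognized operators)."""
    mentioned, values = [], {}
    for name, rec in RECOGNIZERS.items():
        hit = False
        if "keywords" in rec and re.search(rec["keywords"], query, re.IGNORECASE):
            hit = True
        if "values" in rec:
            match = re.search(rec["values"], query, re.IGNORECASE)
            if match:
                hit = True
                values[name] = rec.get("cast", str)(match.group(1))
        if hit:
            mentioned.append(name)
    operators = [
        op for op, pattern in OPERATORS.items()
        if re.search(pattern, query, re.IGNORECASE)
    ]
    return mentioned, values, operators


query = "Find me the price and mileage of all red Nissans – I want a 1996 or newer"
mentioned, values, operators = parse_query(query)
print(mentioned)
print(values)
print(operators)
```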
Select Ontology “Find me the price and mileage of all red Nissans – I want a 1996 or newer”
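The slide does not say how the ontology is chosen. One plausible reading, shown here purely as an assumption, is to score each available ontology by how many of its recognizers fire on the query and pick the highest score. The ontology names and patterns below are invented for the illustration.

```python
import re

# Hypothetical ontologies, each reduced to a list of recognizer patterns.
ONTOLOGIES = {
    "CarAds": [
        r"\bprice\b", r"\bmileage\b", r"\bnissans?\b",
        r"\bred\b", r"\b(19|20)\d{2}\b",
    ],
    "Apartments": [r"\bprice\b", r"\brent\b", r"\bbedrooms?\b"],
}


def select_ontology(query):
    """Return the best-matching ontology name and the score of every ontology."""
    scores = {
        name: sum(
            1 for pattern in patterns
            if re.search(pattern, query, re.IGNORECASE)
        )
        for name, patterns in ONTOLOGIES.items()
    }
    return max(scores, key=scores.get), scores


query = "Find me the price and mileage of all red Nissans – I want a 1996 or newer"
best, scores = select_ontology(query)
print(best, scores)
```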
Formulate Query Expression. Conjunctive queries and aggregate queries. Mentioned object sets are all of interest. Values and operator keywords determine conditions: Color = “red”; Make = “Nissan”; Year >= 1996 (>= operator).
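The two rules on this slide (mentioned object sets are returned, values and operator keywords become conditions) can be sketched as follows. The function names and the dictionary layout are hypothetical, and binding the >= operator to Year is taken as given here rather than derived.

```python
def formulate_query(mentioned, values, operators):
    """Build a conjunctive query.

    mentioned: object sets named or implied by the query
    values:    object set -> recognized value
    operators: object set -> comparison operator (defaults to "=")
    """
    where = [
        (name, operators.get(name, "="), value)
        for name, value in values.items()
    ]
    return {"where": where, "return": list(mentioned)}


def render(query):
    """Render the query as readable text (an ad hoc notation for this sketch)."""
    conditions = " and ".join(
        f"{name} {op} {value!r}" for name, op, value in query["where"]
    )
    return f"where {conditions} return {', '.join(query['return'])}"


query = formulate_query(
    mentioned=["Price", "Mileage", "Color", "Make", "Year"],
    values={"Color": "red", "Make": "Nissan", "Year": 1996},
    operators={"Year": ">="},
)
print(render(query))
```

All conditions are joined with "and", which is what makes the query conjunctive; aggregate queries are not covered by this sketch.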
Formulate Query Expression. Clauses: Let; Where; Return.
Run Query Over Semantically Annotated Data
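Once pages have been annotated, running the query is a filter over the extracted records. The sketch below assumes the annotations have already been collected into plain dictionaries; the records are invented for illustration and the `run_query` function is hypothetical.

```python
import operator

# Map operator symbols to Python comparison functions.
OPS = {
    "=": operator.eq,
    ">=": operator.ge,
    ">": operator.gt,
    "<=": operator.le,
    "<": operator.lt,
}

# Invented records standing in for semantically annotated car ads.
ANNOTATED = [
    {"Make": "Nissan", "Color": "red", "Year": 1998, "Price": 4500, "Mileage": 117000},
    {"Make": "Nissan", "Color": "red", "Year": 1993, "Price": 2900, "Mileage": 150000},
    {"Make": "Toyota", "Color": "red", "Year": 2001, "Price": 6200, "Mileage": 90000},
    {"Make": "Nissan", "Color": "blue", "Year": 1999, "Price": 5100, "Mileage": 98000},
]


def run_query(records, where, returns):
    """Keep records satisfying every condition; project the requested fields."""
    results = []
    for record in records:
        satisfied = all(
            name in record and OPS[op](record[name], value)
            for name, op, value in where
        )
        if satisfied:
            results.append({key: record.get(key) for key in returns})
    return results


where = [("Color", "=", "red"), ("Make", "=", "Nissan"), ("Year", ">=", 1996)]
results = run_query(ANNOTATED, where, returns=["Price", "Mileage"])
print(results)
```

A record that lacks an annotation for a queried object set is skipped rather than treated as a match, which is one conservative choice among several.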
Conclusion & Current & Future Work. Key challenge: simplicity. A simple way to annotate web pages; simple but accurate query specification; a simple way to create extraction ontologies. www.deg.byu.edu