The Semantic Web in Ten Passages
Harold Boley
Institute for Information Technology e-Business
New Brunswick, Canada
Passage 1: Meaningful Search in the Billion-Fold Planetary Network
Like searching for a specific grain of sand in two tightly packed 1 m x 1 m x 1 m boxes
Current search engines: keyword-based, ranked search (good, but …)
Future search engines: “understand” the semantics (answers/services, not just ranked pages)
Knowledge representation: moving into focus on the Web
Passage 2: The Search Engine and its Crawler
Crawlers: enter central and frequent words into a huge “address book”
You get the “hit list” for word w when you type in w
Example
  Wonder drug for head pain (1,160,000 hits)
  “Wonder drug for head pain” (no hits)
Ambiguities
  drug = medicine or narcotics
  head = body part or front or direction
  pain = ache or hurt or suffering or distress
  wonder = puzzlement or monumental creation
What is missing: the relationships among the words
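The “address book” idea can be sketched as an inverted index. This is only a minimal illustration, not how any particular search engine works; the three sample pages and the function names `build_index`, `keyword_hits`, and `phrase_hits` are assumptions made up for the example.

```python
from collections import defaultdict

def build_index(pages):
    """Map each lowercased word to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index

def keyword_hits(index, query):
    """Pages containing every query word, in any order and any position."""
    words = query.lower().split()
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits

def phrase_hits(pages, phrase):
    """Pages containing the exact phrase (the quoted-query case)."""
    needle = phrase.lower()
    return {pid for pid, text in pages.items() if needle in text.lower()}

# Hypothetical pages, invented for illustration only.
pages = {
    "p1": "a wonder drug for chronic pain in the head",
    "p2": "officials wonder for how long the drug trade will pain the head office",
    "p3": "aspirin relieves headache",
}
index = build_index(pages)

print(keyword_hits(index, "wonder drug for head pain"))  # p1 and p2
print(phrase_hits(pages, "wonder drug for head pain"))   # empty set
```

The unquoted query matches p2 even though it has nothing to do with medicine, because the index records which words occur but not how they relate. The quoted query matches nothing at all.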
Passage 3: Precision and Recall – Conflicting Measures for Search Results
Aspirin (5,860,000 hits): low precision
Aspirin “head pain” (8,040 hits): better, but still low precision
Recall problems: many “headache” pages missed
  Aspirin headache (649,000 hits)
  Aspirin “head pain” OR “head hurt” (583,000 hits)
But now what about “migraine” as well?
The query starts to get hard
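For reference, the two measures can be computed as below under their standard definitions: precision is the fraction of retrieved pages that are relevant, and recall is the fraction of relevant pages that were retrieved. The page ids and the `precision_recall` helper are assumptions for illustration; they are not derived from the hit counts above.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved and relevant page ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_hits = retrieved & relevant
    precision = len(true_hits) / len(retrieved) if retrieved else 0.0
    recall = len(true_hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical ground truth: four pages are really about aspirin and headaches.
relevant = {"d1", "d2", "d3", "d4"}

# A broad query finds all of them, plus four irrelevant pages.
broad = {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"}
# A narrow query returns only relevant pages, but misses half of them.
narrow = {"d1", "d2"}

print(precision_recall(broad, relevant))   # (0.5, 1.0)
print(precision_recall(narrow, relevant))  # (1.0, 0.5)
```

Broadening the query raises recall at the cost of precision, and narrowing it does the opposite, which is why the two measures conflict for keyword search.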
Passage 4: Semantics – From Common Words to Standard Concepts
Semantically, we want the concept that can be named “head pain” OR “headache” OR “migraine”
A semantic search engine would find the pages “meant”
Ideally
  Recall: complete
  Precision: perfect as well
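A rough sketch of searching by concept rather than by word: one standard concept carries all of its common names, and a page matches if it uses any of them. The concept label `Headache`, the sample pages, and the substring matching are simplifying assumptions; a real system would need proper annotation rather than string search.

```python
# One standard concept and the common words that name it (names from the slide).
CONCEPT_NAMES = {
    "Headache": ["head pain", "headache", "migraine"],
}

def concept_hits(pages, concept):
    """Pages that mention the concept under any of its common names."""
    names = CONCEPT_NAMES[concept]
    return {
        page_id
        for page_id, text in pages.items()
        if any(name in text.lower() for name in names)
    }

# Hypothetical pages, invented for illustration only.
pages = {
    "p1": "Aspirin for head pain",
    "p2": "Migraine relief with aspirin",
    "p3": "Headache remedies",
    "p4": "Aspirin maker reports quarterly results",
}

print(sorted(concept_hits(pages, "Headache")))  # ['p1', 'p2', 'p3']
```

The user asks for one concept and no longer has to enumerate the OR-alternatives by hand.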
Passage 5: Semantic Relationships Between Standard Concepts and …
“Aspirin cures head pain” vs. “Aspirin causes head pain”
A semantic search engine should recognize semantic relationships between concepts
The “address book” becomes a “knowledge base”
Facts in the knowledge base
  Aspirin --- CURES     --- Headache
  Subject     PREDICATE     Object
This increases both recall and precision
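A knowledge base of subject-predicate-object facts might be sketched as follows. The `Fact` type, the `match` helper, and the two annotated pages are assumptions for illustration; `None` in a query position acts as a wildcard.

```python
from typing import NamedTuple

class Fact(NamedTuple):
    subject: str
    predicate: str
    obj: str

def match(facts_by_page, subject=None, predicate=None, obj=None):
    """Pages annotated with at least one fact matching the pattern."""
    pattern = (subject, predicate, obj)
    return {
        page
        for page, facts in facts_by_page.items()
        if any(
            all(p is None or p == v for p, v in zip(pattern, fact))
            for fact in facts
        )
    }

# Hypothetical annotations: both pages mention the same two concepts,
# but state opposite relationships between them.
facts_by_page = {
    "p1": [Fact("Aspirin", "CURES", "Headache")],
    "p2": [Fact("Aspirin", "CAUSES", "Headache")],
}

print(match(facts_by_page, predicate="CURES", obj="Headache"))   # {'p1'}
print(sorted(match(facts_by_page, subject="Aspirin")))           # ['p1', 'p2']
```

A keyword index cannot tell p1 from p2; the predicate in the fact can.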
Passage 5: … and Knowledge Derivation
Suppose you want “Aspirin CURES Headache AND Aspirin CAUSES Headache”
You could store a fact: “Aspirin AMB Headache” (AMB = ambivalent)
You could instead write a rule
  IF pharmaceutical CURES sickness AND pharmaceutical CAUSES sickness
  THEN pharmaceutical AMB sickness
A semantic search engine would find pages satisfying the IF part and hence necessarily also the THEN part
How?
  Semantic relationships between standard concepts
  Knowledge representation
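The rule can be applied mechanically to a set of stored facts. The sketch below is one simple way to do it, assuming facts are plain (subject, predicate, object) tuples and skipping any check that the subject really is a pharmaceutical and the object a sickness; the `Fever` fact is invented to show a case where the rule does not fire.

```python
def derive_ambivalent(facts):
    """Add 'x AMB y' for every pair with both 'x CURES y' and 'x CAUSES y'."""
    cures = {(s, o) for s, p, o in facts if p == "CURES"}
    causes = {(s, o) for s, p, o in facts if p == "CAUSES"}
    derived = {(s, "AMB", o) for s, o in cures & causes}
    return set(facts) | derived

facts = {
    ("Aspirin", "CURES", "Headache"),
    ("Aspirin", "CAUSES", "Headache"),
    ("Aspirin", "CURES", "Fever"),  # hypothetical fact, no matching CAUSES
}

all_facts = derive_ambivalent(facts)
print(("Aspirin", "AMB", "Headache") in all_facts)  # True
print(("Aspirin", "AMB", "Fever") in all_facts)     # False
```

The AMB fact is never stored by hand. It follows from the IF part, so it stays correct whenever the underlying CURES and CAUSES facts change.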
Passage 6: Where do the Standard Concepts and Predicates Come from?
Experts of a specialized field agree to share normative definitions of their concepts and predicates
Shared, explicit concept catalogues: ontologies
The hierarchical superconcept-subconcept relationship is considered the most important
  Headache ISA Pain
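A superconcept-subconcept hierarchy can be stored as a child-to-parent map and queried by walking upward. This sketch assumes each concept has at most one direct superconcept and that the hierarchy has no cycles; `Sporadic-Headache` is borrowed from Passage 9 to give the chain a second level.

```python
# Each concept points to its direct superconcept.
ISA = {
    "Headache": "Pain",
    "Sporadic-Headache": "Headache",
}

def superconcepts(concept, isa=ISA):
    """All superconcepts of a concept, nearest first."""
    chain = []
    while concept in isa:
        concept = isa[concept]
        chain.append(concept)
    return chain

def is_a(sub, sup, isa=ISA):
    """True if sub equals sup or sup is reachable by following ISA links."""
    return sub == sup or sup in superconcepts(sub, isa)

print(superconcepts("Sporadic-Headache"))  # ['Headache', 'Pain']
print(is_a("Sporadic-Headache", "Pain"))   # True
print(is_a("Pain", "Headache"))            # False
```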
Passage 7: Assigning Concepts/Predicates to Common Words: How?
Build ontologies: a tough job!
Automating the building of ontologies is very difficult. Why?
  Meaning often depends on context
  Granularity: e.g. general “stomach ache” or specific “appendix attack”
  Sentence analysis: NLP is known to be hard
  Audio and video: textual techniques cannot be applied
  It is sometimes necessary to extend the ontology, which only domain experts should be allowed to do
Semi-automatic construction
  The system proposes concepts; the expert agrees, fixes, or completes them
  TANGO
Passage 8: Where Will the Assignments be Stored as Metadata?
External
  E.g. the “address book”
  Advantages
    Possible to annotate pages not owned
    Better for multiple annotations for different ontologies
    More convenient for queries
Internal
  Annotations in the pages themselves
  Advantages
    Can be updated when the page changes
Compromise: only a URL pointer is placed in the page
Change/maintenance problem for annotations
Passage 9: Refined Standard Concepts Inherit Refined Semantic Relationships
Suppose
  Headache ISA Pain; Sporadic-Headache ISA Headache; Chronic-Headache ISA Headache
  Aspirin --- CURES --- Headache
Now suppose someone decides this should be different
  Aspirin --- CURES --- Sporadic-Headache
Now, what about all the pages annotated before the change? (two possibilities)
  UPDATE all old annotations: but now domain experts have to decide which was meant for each “Headache” occurrence, “Sporadic-Headache” or “Chronic-Headache”
  SWITCH ontologies but access old pages via the old ontology: eventually leads to versions of versions and … problems
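The effect of the refinement can be made concrete with a small sketch. It assumes that a CURES fact stated for a concept is inherited by all of its subconcepts, which is one reading of the passage title; the `is_a` and `cured_by` helpers are invented for illustration.

```python
ISA = {
    "Headache": "Pain",
    "Sporadic-Headache": "Headache",
    "Chronic-Headache": "Headache",
}

def is_a(sub, sup):
    """Follow ISA links upward from sub until sup is reached or the chain ends."""
    while sub != sup and sub in ISA:
        sub = ISA[sub]
    return sub == sup

def cured_by(drug, concept, facts):
    """True if some fact 'drug CURES c' exists where concept ISA c."""
    return any(
        s == drug and p == "CURES" and is_a(concept, o)
        for s, p, o in facts
    )

old_facts = {("Aspirin", "CURES", "Headache")}
new_facts = {("Aspirin", "CURES", "Sporadic-Headache")}

# Before the change, every kind of headache inherits the CURES relationship.
print(cured_by("Aspirin", "Chronic-Headache", old_facts))   # True
# After the change, only the refined concept does.
print(cured_by("Aspirin", "Sporadic-Headache", new_facts))  # True
print(cured_by("Aspirin", "Chronic-Headache", new_facts))   # False
# Pages still annotated with plain "Headache" no longer match either.
print(cured_by("Aspirin", "Headache", new_facts))           # False
```

The last line is the maintenance problem in miniature: old annotations that say only “Headache” are too coarse for the refined fact, so someone has to decide what each one meant.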
Passage 10: Library Catalogues as Metadata Ontologies
“UPDATE” is the “nicer” solution, but many libraries have chosen “SWITCH”: you sometimes have to search in two or more catalogues
This will eventually become a big problem
  Competing ontologies
  Complementary ontologies
We could be overwhelmed by ontologies