Passage 1 1. Meaningful Search

Slides:



Advertisements
Similar presentations
CH-4 Ontologies, Querying and Data Integration. Introduction to RDF(S) RDF stands for Resource Description Framework. RDF is a standard for describing.
Advertisements

IPY and Semantics Siri Jodha S. Khalsa Paul Cooper Peter Pulsifer Paul Overduin Eugeny Vyazilov Heather lane.
Of 27 lecture 7: owl - introduction. of 27 ece 627, winter ‘132 OWL a glimpse OWL – Web Ontology Language describes classes, properties and relations.
©Ian Sommerville 2004Software Engineering, 7th edition. Chapter 8 Slide 1 System modeling 2.
Human Centric Computing (Comp 106)Assignment 2 PROPOSAL 21 Our task is to design a new user interface for the Liverpool University Library catalogue system.
1 Introduction The Database Environment. 2 Web Links Google General Database Search Database News Access Forums Google Database Books O’Reilly Books Oracle.
Modified from Sommerville’s originalsSoftware Engineering, 7th edition. Chapter 8 Slide 1 System models.
Overview of Software Requirements
The RDF meta model: a closer look Basic ideas of the RDF Resource instance descriptions in the RDF format Application-specific RDF schemas Limitations.
The Semantic Web in Ten Passages Harold Boley Institute for Information Technology e- Business New Brunswick, Canada.
Domain Modeling (with Objects). Motivation Programming classes teach – What an object is – How to create objects What is missing – Finding/determining.
Semantic Web Technologies Lecture # 2 Faculty of Computer Science, IBA.
Improving Data Discovery in Metadata Repositories through Semantic Search Chad Berkley 1, Shawn Bowers 2, Matt Jones 1, Mark Schildhauer 1, Josh Madin.
Adding metadata to web pages Please note: this is a temporary test document for use in internal testing only.
Developing facets in UDC for online retrieval Claudio Gnoli (University of Pavia) Aida Slavic (UDC Consortium) 8th NKOS Workshop, Corfu, 1 Oct 2009.
Teaching Metadata and Networked Information Organization & Retrieval The UNT SLIS Experience William E. Moen School of Library and Information Sciences.
Evaluating Centralized, Hierarchical, and Networked Architectures for Rule Systems Benjamin Craig University of New Brunswick Faculty of Computer Science.
ARTIFICIAL INTELLIGENCE [INTELLIGENT AGENTS PARADIGM] Professor Janis Grundspenkis Riga Technical University Faculty of Computer Science and Information.
Avalanche Internet Data Management System. Presentation plan 1. The problem to be solved 2. Description of the software needed 3. The solution 4. Avalanche.
1 The BT Digital Library A case study in intelligent content management Paul Warren
Automatic Lexical Annotation Applied to the SCARLET Ontology Matcher Laura Po and Sonia Bergamaschi DII, University of Modena and Reggio Emilia, Italy.
Introduction Algorithms and Conventions The design and analysis of algorithms is the core subject matter of Computer Science. Given a problem, we want.
Semantic Web - an introduction By Daniel Wu (danielwujr)
Knowledge Representation of Statistic Domain For CBR Application Supervisor : Dr. Aslina Saad Dr. Mashitoh Hashim PM Dr. Nor Hasbiah Ubaidullah.
SKOS. Ontologies Metadata –Resources marked-up with descriptions of their content. No good unless everyone speaks the same language; Terminologies –Provide.
Ontology-Centered Personalized Presentation of Knowledge Extracted from the Web Ralitsa Angelova.
Company LOGO Digital Infrastructure of RPI Personal Library Qi Pan Digital Infrastructure of RPI Personal Library Qi Pan.
Majid Sazvar Knowledge Engineering Research Group Ferdowsi University of Mashhad Semantic Web Reasoning.
Trustworthy Semantic Webs Dr. Bhavani Thuraisingham The University of Texas at Dallas Lecture #4 Vision for Semantic Web.
Of 33 lecture 1: introduction. of 33 the semantic web vision today’s web (1) web content – for human consumption (no structural information) people search.
The RDF meta model Basic ideas of the RDF Resource instance descriptions in the RDF format Application-specific RDF schemas Limitations of XML compared.
JISC/NSF PI Meeting, June Archon - A Digital Library that Federates Physics Collections with Varying Degrees of Metadata Richness Department of Computer.
Knowledge Representation Fall 2013 COMP3710 Artificial Intelligence Computing Science Thompson Rivers University.
Jean-Yves Le Meur - CERN Geneva Switzerland - GL'99 Conference 1.
International Workshop 28 Jan – 2 Feb 2011 Phoenix, AZ, USA Ontology in Model-Based Systems Engineering Henson Graves 29 January 2011.
Ontologies COMP6028 Semantic Web Technologies Dr Nicholas Gibbins
Semantic Web Technologies Readings discussion Research presentations Projects & Papers discussions.
Knowledge Representation Part I Ontology Jan Pettersen Nytun Knowledge Representation Part I, JPN, UiA1.
Knowledge Representation
Knowledge Representation Techniques
Algorithms and Problem Solving
COMP6215 Semantic Web Technologies
Software Engineering Lecture 4 System Modeling The Analysis Stage.
Requirements Engineering Process
Presentation on Software Requirements Submitted by
Chapter 5 – Requirements Engineering
Improving Data Discovery Through Semantic Search
ece 627 intelligent web: ontology and beyond
SOFTWARE DESIGN AND ARCHITECTURE
Graph Coverage for Specifications CS 4501 / 6501 Software Testing
Lecture #11: Ontology Engineering Dr. Bhavani Thuraisingham
Web Ontology Language for Service (OWL-S)
Abstract descriptions of systems whose requirements are being analysed
By Dr. Abdulrahman H. Altalhi
RDF For Semantic Web Dhaval Patel 2nd Year Student School of IT
Searching for and Accessing Information
ece 720 intelligent web: ontology and beyond
CSc4730/6730 Scientific Visualization
Electrical and Computer Engineering Department
IL Step 3: Using Bibliographic Databases
Metadata Framework as the basis for Metadata-driven Architecture
Graph Coverage for Specifications CS 4501 / 6501 Software Testing
Ontology-Based Approaches to Data Integration
Sample elicitation scripts Execution of the elicitation scripts
Introduction to Information Retrieval
The Database Environment
Algorithms and Problem Solving
Knowledge Representation
Subject Name: SOFTWARE ENGINEERING Subject Code:10IS51
The Database Environment
Presentation transcript:

The Semantic Web in Ten Passages Harold Boley NRC-IIT Fredericton University of New Brunswick Short Research Presentations For New Graduate Students 2 October 2002 (Revised: 10 September 2003)

Passage 1 1. Meaningful Search Conventional Web expanded into a Semantic Web:   Search engines in future should ‘understand’ meaning of Web pages far enough to enable ‘sensible’ queries At the moment semantic search engines only exist for specialized areas of knowledge Shall use conceptual representation of Web pages: Help people as direct users Help ‘agent systems’ of AI: based on core technology of Semantic Web, offer higher Web services such as info comparison, integration, abstraction, or trading Passage 1 Ten Passages 2-Oct-02

2. The Search Engine and its Crawler Crawler: A program that periodically navigates across Web pages and for every page analyses text, entering central words into ‘address book’ Every word in ‘address book’ refers to a list of all the pages in which this word was discovered by the crawler Passage 2 You get 'hit list' of pages after you type in that (compound) word   ====== Google-Search: "wonder drug" ====== 24,000 pages, too low in “precision”, e.g. ambiguous drug medicine? narcotics? both? Ten Passages 2-Oct-02

3. Precision and Recall – Conflicting Measures for Search Results (I) Sample goal: Check Aspirin remedy for head pain   ====== Google-Search: Aspirin ====== 625,000 pages, too low in “precision”, although unambiguous Passage 3 Crawler enters all important words of analysed page into ‘address book’, so you can now narrow down search by typing combination of words in the search line. Then receive a page only if crawler has discovered in it all of these search words   ====== Google-Search: Aspirin “head pain” ====== 79,900 pages, “precision” improved Ten Passages 2-Oct-02

3. Precision and Recall – Conflicting Measures for Search Results (II) But wait: Have we perhaps cut out pages because only wrote “head pain” but not “head hurt” which means same? Indeed: In improving the precision measure we have seriously forfeited the recall measure Passage 3   === Google-Search: Aspirin “head hurt” OR “head pain === 84,300 pages, “recall” improved Ten Passages 2-Oct-02

4. Semantics – From Common Words to Standard Concepts “Semantically”, i.e. ‘with respect to meaning’, we look for the concept that can be named in pages by “head pain” OR “head hurt” OR another such word ‘Semantic Search Engine’ could use one semantic standard concept for whole group of such words, named, e.g., by capitalized English term “Headache” Passage 4 ‘Address book’ internally only uses “Headache”. But this standard concept refers to all pages in which crawler found “head pain” OR “head hurt” OR another such common word  ====== Semantic Search: Aspirin Headache ====== “recall” complete! – “precision” perfect? Ten Passages 2-Oct-02

5. Semantic Relationships Between Standard Concepts & … (I) Wanted pages claiming that Aspirin cures head pain – not pages claiming that Aspirin causes head pain Semantic relationships: ‘address book’ “knowledge base” Contains so-called ‘facts’ like “Aspirin CURES Headache” (here triple of the form “Subject PREDICATE Object”) Passage 5 All-capitalized term “CURES” is standard predicate, standing for common relational words in pages such as “remedies”, “heals”, etc.  ===== Semantic Search: Aspirin CURES Headache ===== “precision” perfect! Ten Passages 2-Oct-02

5. Semantic Relationships Between Standard Concepts & … (II) Some pages claim both semantic relationships, the curing and the causing one  === Semantic Search: Aspirin CURES Headache AND Aspirin CAUSES Headache === Passage 5 Describe such pages with a further standard predicate “AMB”, even if they do not contain a corresponding common word such as “ambivalent”, “conflicting” etc.  == Semantic Search: Aspirin AMB Headache === Store “Aspirin AMB Headache” as a fact in ‘address book’? Ten Passages 2-Oct-02

5. … & Knowledge Derivation Better: Logic languages, e.g. RuleML, instead allow this triple to be derived from the two stored facts with a so-called rule A special ‘If-then’ derivation like IF Aspirin CURES Headache AND Aspirin CAUSES Headache THEN Aspirin AMB Headache is performed with the general ‘IF-THEN’ rule (‘?’ for variables) IF ?Pharm CURES ?Sick AND ?Pharm CAUSES ?Sick THEN ?Pharm AMB ?Sick via ‘bindings’ like ‘?Pharm = Aspirin’ and ‘?Sick = Headache’ Passage 5 Rules explicitly deduce knowledge (here on ‘ambivalence’) implicitly hidden in the facts (here in ‘cures’ plus ‘causes’) In parallel, they can find every page that fulfills the ‘IF’ part, hence also the ‘THEN’ part (here each “AMB” page) Ten Passages 2-Oct-02

6. Where Do the Standard Concepts and Standard Predicates Come from? Experts of field, in this case medicine, have to agree on standard definitions of connected concepts and predicates Hierarchical, superconcept-subconcept connection is most important   Example: Pain-Headache connection puts Headache below Pain: Pain | Headache Passage 6 For such shared explicit concept catalogues AI often borrows the expression “ontologies” from philosophy Ten Passages 2-Oct-02

7. How Does One Assign the Standard Concepts/Predicates to Common Words? Ideally, crawler would navigate through pages for important common words and assign right standard concepts and standard predicates to them fully automatically But such full automation is very difficult Interactive classification of pages together with experts: Passage 7 The crawler for a given page proposes standard concepts, some with semantic relationships via standard predicates At least for unclear cases these will then be corrected and if necessary completed by experts Ten Passages 2-Oct-02

8. Where Will the Assignments be Stored as Metadata? Two principal possibilities for storing these metadata: “EXTERNAL”: ‘Address book’ can store standard concept/ relationship together with its assignment to all pages with the corresponding common words “INTERNAL”: Pages themselves can store their own descriptive standard concepts/relationships (“annotations”) Passage 8 Advantage of “EXTERNAL”, disadvantage of “INTERNAL”: Only by separating metadata from pages themselves is it possible to describe pages one does not own Advantage of “INTERNAL”, disadvantage of “EXTERNAL”: For every page change affected annotations can be immediately updated as well without searching for metadata Ten Passages 2-Oct-02

9. Refined Standard Concepts Inherit Refined Semantic Relationships (I) What happens when standard concepts or semantic relationships change, e.g. through concept refinements following new scientific discoveries? E.g., split our sample standard concept Headache into subconcepts such as Sporadic/Chronic-Headache; we agree on the following ontology, including our semantic relationship “Aspirin CURES Headache”: Pain | Aspirin-------CURES------->Headache / \ / \ / \ Sporadic-Headache Chronic-Headache Passage 9 Ten Passages 2-Oct-02

9. Refined Standard Concepts Inherit Refined Semantic Relationships (II) Relationship “Aspirin CURES Headache” may be refined for one or both subconcepts: Pain | Headache / \ / \ / \ Aspirin--CURES-->Sporadic-Headache Chronic-Headache | ^ | | -----CURES--------------------------- Passage 9 If relation holds for all subconcepts (here: Sporadic/Chronic-Headache), it can also be left at the superconcept (Headache), from where it is automatically ‘inherited’ to the subconcepts on demand only (similarly as in OO programming) Ten Passages 2-Oct-02

9. Refined Standard Concepts Inherit Refined Semantic Relationships (III) As a result of such concept refinements two principal possibilities arise for pages classified by them: “UPDATE”: Try corresponding retroactive updates to metadata of all affected ‘old’ pages. Domain experts should decide whether one or more subconcept such as Sporadic-Headache and Chronic-Headache were ‘meant’ or whether their old common superconcept Headache remains correct Passage 9 “SWITCH”: Switch metadata ontology at certain points in time, continue to access ‘old’ pages via ‘old’ metadata, and only for ‘new’ pages use ‘new’ metadata. Headache would stay unrefined as a standard concept for an old page, even if domain experts would immediately notice that it were, e.g., only about Sporadic-Headache Ten Passages 2-Oct-02

10. Library Catalogues as Metadata Ontologies “UPDATE” would be ‘nicer’ solution, but many libraries have chosen solution “SWITCH”, i.e. put up with users having to search in two or more catalogues sometimes The Semantic Web will not solve this problem either, but both possible solutions, “UPDATE” and “SWITCH”, will be supported by software tools of the Semantic Web Passage 10 Conversely, the Semantic Web can learn a lot from Library Sciences. Initiatives – e.g. within Math-Net and CISTI – attempt to bring both together The Semantic Web, on the basis of AI, is a new subfield of computer science with various further interdisciplinary relations, e.g. to logic, linguistics, and cognitive science Ten Passages 2-Oct-02