© Copyright 2008 STI INNSBRUCK Semantic Web Introduction Dieter Fensel Ioan Toma
Where are we? #DateTitle 1Introduction 2Semantic Web architecture 3RDF and RDFs 4Web of hypertext (RDFa, Microformats) and Web of data 5Semantic annotations 6Repositories and SPARQL 7OWL 8RIF 9Web-scale reasoning 10Social Semantic Web 11Ontologies and the Semantic Web 12SWS 13Tools 14Applications 15Exam 2
What is this course about? Semantic Web –Web as it is today, Web as it “should” be tomorrow The evolution of the Web –Web of hypertext, Web of Data, Social Web Semantic Web languages –RDF, RDFs, OWL, RIF Query languages for Semantic Web –SPARQL Ontologies and semantic annotations Semantic Web Services Tools and applications for Semantic Web 3
Course Organization Course is organized into: –X lectures (Thursday..-..) –Y tutorials (Tuesday..-..) The lecturers are: –Dieter Fensel –A B The tutors are: –C D 4
Course material Web site: innsbruck.at/teaching/courses/ws200910/deta ils/?title=semantic-web –Slides available online before each lecture Mailing list: 5
Examination Final grade: –75% Exam –25% Tutorial Exam grade: score grade
Agenda 1.Motivation 2.Technical solutions, illustrations, and extensions 1.Introduction 2.Semantic Web – architecture and languages 3.Semantic Web - data 4.Semantic Web - processes 5.Industries & the Semantic Web 3.Summary 4.References 7
MOTIVATION 8
Today Web The current Web represents information using –natural language (English, German, Italian,…) –graphics, multimedia, page layout Humans can process this easily –can deduce facts from partial information –can create mental associations –are used to various sensory information 9
Today Web Tasks often require to combine data on the Web –hotel and travel information may come from different sites –searches in different digital libraries –etc. Again, humans combine these information easily –even if different terminologies are used! 10
However… Machines are ignorant! –partial information is unusable –difficult to make sense from, e.g., an image –drawing analogies automatically is difficult –difficult to combine information automatically is same as ? how to combine different XML hierarchies? – … 11
How to improve current Web? Increasing automatic linking among data Increasing recall and precision in search Increasing automation in data integration … Adding semantics to data and services is the solution! 12
Five approaches to semantics Tagging Statistics Linguistics Semantic Web Artificial Intelligence Semantic-Web.pdf 13
The tagging approach Pros –easy for users to add and read tags –tags are just strings –no algorithms or ontologies to deal with –no technology to learn Cons –easy for users to add and read tags –tags are just strings –no algorithms or ontologies to deal with –no technology to learn Del.icio.us Flickr Wikipedia 14
The statistical approach Pros: –pure mathematical algorithms –massively scalable –language independent Cons: –no understanding of the content –hard to craft good queries –best for finding really popular things – not good at finding needles in haystacks –not good for structured data Google Lucene 15
The statistical approach Lucene “a high-performance, full-featured text search engine library written entirely in Java” Features –Ranked Searching –Flexible Queries Phrases, Wildcards, etc… –Field-specific Queries eg. title, artist, album –Sorting More details at: 16
The statistical approach Google runs on a distributed network of thousands of low-cost computers and can therefore carry out fast parallel processing. Google has three distinct parts: –Googlebot, a web crawler that finds and fetches web pages. –The indexer that sorts every word on every page and stores the resulting index of words in a huge database. –The query processor, which compares your search query to the index and recommends the documents that it considers most relevant. More details at: 17
The linguistic approach Pros: –true language understanding –extract knowledge from text –best for search for particular facts or relationships –more precise queries Cons: –computationally intensive –difficult to scale –lots of errors –language-dependent Powerset Hakia Inxight Attensity … 18
The statistical approach Hakia attempts to analyze the concept of a search query, in particular by doing sentence analysis. Powerst approached based natural language processing to search queries can be expressed as keywords, phrases or simple questions aggregates information from across multiple articles 19
The statistical approach Inxight extraction and retrieval tools for gathering, analyzing, and sharing information using computational linguistics automatic text extraction of entities, relations, events, etc, to create metadata Attensity allows users to extract and analyze facts like who, what, where, when and why Allows users to drill down to understand people, places and events and how they are related It then creates output in XML and in a structured relational data format that is fused with existing structured data 20
The Semantic Web approach Pros: –more precise queries –smarter apps with less work –not as computationally intensive –share & link data between apps –works for both unstructured and structured data Cons: –lack of tools –difficult to scale –who makes all the metadata? FOAF Project DBpedia Project Metaweb … Cons will be solved, It is just a matter of time! Cons will be solved, It is just a matter of time! 21
The Artificial Intelligence approach Pros: –smart in narrow domains –answer questions intelligently –reasoning and learning Cons: –computationally intensive –difficult to scale –extremely hard to program –does not work well outside of narrow domains –training takes a lot of work Cycorp 22
TECHNICAL SOLUTIONS, ILLUSTRATIONS, AND EXTENSIONS 23
INTRODUCTION TO SEMANTIC WEB 24
Static WWW URI, HTML, HTTP The Vision More than 2 billion users more than 50 billion pages 25
WWW URI, HTML, HTTP Serious problems in information finding, information extracting, information representing, information interpreting and and information maintaining. Semantic Web RDF, RDF(S), OWL Static The Vision (contd.) 26
What is the Semantic Web? “The Semantic Web is an extension of the current web in which information is given well- defined meaning, better enabling computers and people to work in cooperation.” T. Berners-Lee, J. Hendler, O. Lassila, “The Semantic Web”, Scientific American, May 2001 “…allowing the Web to reach its full potential…” with far-reaching consequences “The next generation of the Web” 27
What is the Semantic Web? The next generation of the WWW Information has machine-processable and machine- understandable semantics Not a separate Web but an augmentation of the current one The backbone of Semantic Web are ontologies 28
Ontology definition formal, explicit specification of a shared conceptualization commonly accepted understanding conceptual model of a domain (ontological theory) unambiguous terminology definitions machine-readability with computational semantics Gruber, “Toward principles for the design of ontologies used or knowledge sharing?”, Int. J. Hum.-Comput. Stud., vol. 43, no. 5-6,
… “well-defined meaning” … “An ontology is an explicit specification of a conceptualization” Gruber, “Toward principles for the design of ontologies used for knowledge sharing?”, Int. J. Hum.-Comput. Stud., vol. 43, no. 5-6,1995. Ontologies are the modeling foundations to Semantic Web –They provide the well-defined meaning for information 30
… explicit, … specification, … conceptualization, … An ontology is: A conceptualization –An ontology is a model of the most relevant concepts of a phenomenon from the real world Explicit –The model explicitly states the type of the concepts, the relationships between them and the constraints on their use Formal –The ontology has to be machine readable (the use of the natural language is excluded) Shared –The knowledge contained in the ontology is consensual, i.e. it has been accepted by a group of people. Studer, Benjamins, D. Fensel, “Knowledge engineering: Principles and methods”, Data Knowledge Engineering, vol. 25, no. 1-2,
Ontology example Concept conceptual entity of the domain Property attribute describing a concept Relation relationship between concepts or properties Axiom coherency description between Concepts / Properties / Relations via logical expressions Person Student Professor Lecture isA – hierarchy (taxonomy) name matr.-nr. research field topic lecture nr. attends holds holds(Professor, Lecture) => Lecture.topic = Professor.researchField 32
Top Level O., Generic O. Core O., Foundational O., High-level O, Upper O. Task & Problem- solving Ontology Application Ontology Domain Ontology [Guarino, 98] Formal Ontology in Information Systems describe very general concepts like space, time, event, which are independent of a particular problem or domain describe the vocabulary related to a generic domain by specializing the concepts introduced in the top-level ontology. describe the vocabulary related to a generic task or activity by specializing the top-level ontologies. the most specific ontologies. Concepts in application ontologies often correspond to roles played by domain entities while performing a certain activity. Types of ontologies 33
Types of ontologies - examples Top Level/Upper ontologies: –Cyc, DOLCE, SUMO, DublinCore Domain ontologies: –medicine, telecom ontologies, etc. Task ontologies: –diagnosing, selling, scheduling ontologies Application ontologies: –Cell Cycle Ontology (CCO) 34
The Semantic Web is about… Web Data Annotation –connecting (syntactic) Web objects, like text chunks, images, … to their semantic notion (e.g., this image is about Innsbruck, Dieter Fensel is a professor) Data Linking on the Web (Web of Data) –global networking of knowledge through URI, RDF, and SPARQL (e.g., connecting my calendar with my rss feeds, my pictures,...) Data Integration over the Web –seamless integration of data based on different conceptual models (e.g., integrating data coming from my two favorite book sellers) 35
Web Data Annotating 36
Data Linking on the Web 37
Data Linking on the Web Linked Open Data statistics: –data sets: 108 –total number of triples: –total number of links between data sets:
Data linking on the Web principles Use URIs as names for things –anything, not just documents –you are not your homepage –information resources and non-information resources Use HTTP URIs –globally unique names, distributed ownership –allows people to look up those names Provide useful information in RDF –when someone looks up a URI Include RDF links to other URIs –to enable discovery of related information 39
Data Integration over the Web Same URI = Same resource 40
SEMANTIC WEB – ARCHITECTURE AND LANGUAGES 41
Web Architecture Things are denoted by URIs Use them to denote things Serve useful information at them Dereference them 42
Semantic Web Architecture Give important concepts URIs Each URI identifies one concept Share these symbols between many languages Support URI lookup 43
Semantic Web - Data Topics covered in the course 44
URI and XML Uniform Resource Identifier (URI) is the dual of URL on Semantic Web –it’s purpose is to indentify resources eXtensible Markup Language (XML) is a markup language used to structure information –fundament of data representation on the Semantic Web –tags do not convey semantic information 45
RDF and OWL Resource Description Framework (RDF) is the dual of HTML in the Semantic Web –simple way to describe resources on the Web –sort of simple ontology language (RDF-S) –based on triples (subject; predicate; object) –serialization is XML based Ontology Web Language (OWL) a layered language based on DL –more complex ontology language –overcome some RDF(S) limitations 46
SPARQL and Rule languages SPARQL –Query language for RDF triples –A protocol for querying RDF data over the Web Rule languages (e.g. SWRL) –Extend basic predicates in ontology languages with proprietary predicates –Based on different logics Description Logic Logic Programming 47
Semantic Web - Data 48
Semantic Web - Data URIs are used to identify resources, not just things that exists on the Web, e.g. Sir Tim Berners-Lee RDF is used to make statements about resources in the form of triples With RDFS, resources can belong to classes (my Mercedes belongs to the class of cars) and classes can be subclasses or superclasses of other classes (vehicles are a superclass of cars, cabriolets are a subclass of cars) 49
Annotated Content KIM Browser Plugin Web content is annotated using ontologies Content can be searched and browsed intelligently Semantic Web - Data Select one or more concepts from the ontology… … send the currently loaded web page to the Annotation Server 50
Dereferencable URI Disco Hyperdata Browser navigating the Semantic Web as an unbound set of data sources Semantic Web - Data 51
Faceted DBLP uses the keywords provided in metadata annotations to automatically create light-weight topic categorization Semantic Web - Data 52
Semantic Web - Data 53
Semantic Web Data Billing Sales Order Processing Inventory Marketing CRM 43% of businesses resort to manual processes and/or new software when integrating information for reporting 54
Semantic Web Data BillingSales Order Processing InventoryMarketingCRM Semantic Broker Existing legacy systems “wrapped” in semantic technologies Based on lightweight, open standards from W3C Reasoning enables inference of new facts from existing data sources Declarative definition of Business Rules 55
Semantic Web - Processes 56
Processes The Web is moving from static data to dynamic functionality –Web services: a piece of software available over the Internet, using standardized XML messaging systems over the SOAP protocol –Mashups: The compounding of two or more pieces of web functionality to create powerful web applications 57
Semantic Web - Processes 58
Web services and mashups are limited by their syntactic nature As the amount of services on the Web increases it will be harder to find Web services in order to use them in mashups The current amount of human effort required to build applications is not sustainable at a Web scale Semantic Web - Processes 59
The addition of semantics to form Semantic Web Services and Semantically Enabled Service-oriented Architectures can enable the automation of many of these currently human intensive tasks –Service Discovery, Adaptation, Ranking, Mediation, Invocation Frameworks: –OWL-S: WS Description Ontology (Profile, Service Model, Grounding) –WSMO: Ontologies, Goals, Web Services, Mediators –SWSF: Process-based Description Model & Language for WS –SAWSDL (WSDL-S): Semantic annotation of WSDL descriptions Semantic Web - Processes 60
Conceptual Model for SWS Formal Language for WSMO Execution Environment for SWS Ontology & Rule Language for the Semantic Web Semantic Web - Processes More about in Semantic Web Services lecture 61
Industries & the Semantic Web 62
Semantic Web uptake Major companies offer Semantic Web tools or systems using Semantic Web: Adobe, Oracle, IBM, HP, Software AG, GE, Northrop Gruman, Altova, Microsoft, Dow Jones, … 63
Semantic Web uptake Others are using it (or consider using it) as part of their own operations: Novartis, Boeing, Pfizer, Telefónica, … 64
Semantic Web uptake Some of the names of active participants in W3C SW related groups: ILOG, HP, Agfa, SRI International, Fair Isaac Corp., Oracle, Boeing, IBM, Chevron, Siemens, Nokia, Pfizer, Sun, Eli Lilly, … 65
Example I Find the right experts at NASA Expertise locater for nearly 20,000 NASA civil servants using RDF integration techniques over 6 or 7 geographically distributed databases, data sources, and web services… From Kendall Clark, Clark & Parsia, LLC 66
Example II Vodafone live! Integrate various vendors’ product descriptions via RDF –ring tones, games, wallpapers –manage complexity of handsets, binary formats A portal is created to offer appropriate content Significant increase in content download after the introduction From Kevin Smith, Vodafone Group R&D 67
More Examples Semantic Web Case Studies and Use Cases ( –Cultural Heritage –Health Care –Life Sciences –eCommerce –B2B integration –eTourism –… 68
69 Case study: BT Research and Venturing The complexity of supply chains has increased, they involve many players of differing size and function Support for “Operational Support Systems (OSS)” integration using semantic descriptions of system interfaces and messages Internet Service Providers integrate their OSS-s with those of BT (via a gateway) Integration of heterogeneous OSS systems of partners The approach reduces costs and time-to-market; ontologies allow for a reuse of services Integration with Semantic Mediation Courtesy of Alistair Duke, BT, (SWEO Use Case)
Web Roadmap (from Nova Spivack) Connections between people Connections between Information Social Networking Groupware Javascript Weblogs Databases File Systems HTTP Keyword Search USENET Wikis Websites Directory Portals Web PC Era RSS Widgets PC’s Office 2.0 XML RDF SPARQL AJAX FTP IRC SOAP Mashups File Servers Social Media Sharing Lightweight Collaboration ATOM Web 3.0 Web 4.0 Semantic Search Semantic Databases Distributed Search Intelligent personal agents Java SaaS Web 2.0 Flash OWL HTML SGML SQL Gopher P2P The Web The PC Windows MacOS SWRL OpenID BBS MMO’s VR Semantic Web Intelligent Web The Internet Social Web Web OS 70
SUMMARY 71
Summary Semantic Web is not a replacement of the current Web, it’s an evolution of it It aims at automating tasks currently carried out by humans Semantic Web is not AI 2.0 Semantic Web is becoming real (maybe not as we originally envisioned it, but it is) 72
References RDF Primer: RDF Semantics: Information Sharing on the Semantic Web, Heiner Stuckenschmidt and Frank van Harmelen, Springer (2004) Ontologies: A Silver Bullet for Knowledge Management and Electronic Commerce, 2nd Edition, Dieter Fensel, Springer (2003) A Semantic Web Primer, (2nd edition), Grigoris Antoniou and Frank van Harmelen, The MIT Press (2008) Weaving the Web, Tim Berners-Lee, HarperCollins (2000) 73