
Presentation transcript:

The Semantic Web

Schedule for this evening
- Review of the survey: summary, discussion if wanted
- Some other ways to move content from place to place
  - FTP
  - OAI-PMH
- Then, the Semantic Web: an introduction to things to come

Survey Summary on Word document Responses and any comments

Other ways to move materials on the Internet
FTP (File Transfer Protocol)
- One of the oldest of the Internet protocols
- Originally, a command-line interface
- Now, many GUI versions
The host must run an FTP server that listens on port 21 by default for the control connection (port 20 is the default data port in active mode).
The client requests a session, the user logs in and issues a sequence of commands, including get and put.
Brief demonstration
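The get and put steps of an FTP session can be sketched with Python's standard ftplib module. This is only an illustration and was not part of the slides; the function names are made up here, and any host, login, or file names passed in are the caller's own.

```python
import ftplib

# ftplib.FTP_PORT is the standard FTP control port.
DEFAULT_CONTROL_PORT = ftplib.FTP_PORT


def download_file(host, user, password, remote_name, local_path,
                  port=DEFAULT_CONTROL_PORT):
    """Open a session, log in, and fetch one file (the interactive 'get')."""
    with ftplib.FTP() as ftp:
        ftp.connect(host, port)        # client requests a session
        ftp.login(user, password)      # user logs in
        with open(local_path, "wb") as out:
            # RETR is the protocol command behind 'get'
            ftp.retrbinary(f"RETR {remote_name}", out.write)


def upload_file(host, user, password, local_path, remote_name,
                port=DEFAULT_CONTROL_PORT):
    """Open a session, log in, and send one file (the interactive 'put')."""
    with ftplib.FTP() as ftp:
        ftp.connect(host, port)
        ftp.login(user, password)
        with open(local_path, "rb") as source:
            # STOR is the protocol command behind 'put'
            ftp.storbinary(f"STOR {remote_name}", source)
```

Neither function is called here, so nothing touches the network until a real server and account are supplied.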

Open Archives Initiative Generally oriented toward sharing information about resources in collections accessible on the Internet. There is a protocol for sharing, based on XML, so we will look at XML first.

Semantic Web Semantics refers to meaning. The semantic web aims to have enough information about a resource available that a program can use resources as if the program could understand what the resources are. – Of course, the program does not really “understand” in the human sense. – However, if it has enough information, it can follow rules and behave in ways that are consistent with understanding what it is working with.

Markup HTML is a markup language – not the first, by any means Tags in HTML give clues to the reader (browser or other program) about what to do in displaying or presenting the marked text. – emphasize, make stand out (like a title or section head), break – Some allowance for meta tags HTML has been stretched beyond its original design

XML Simplified version of SGML – Language for defining languages (markup languages) – HTML is now XHTML and is an XML language – XML allows you to make up your own descriptive language

Metadata Critical part of the description of content and resources What does metadata look like? Metadata is data about data – Information about a resource, encoded in the resource or associated with the resource. The language of metadata: XML – eXtensible Markup Language

XML XML is a markup language XML describes features There is no standard XML Use XML to create a resource type Separately develop software to interact with the data described by the XML codes. Source: tutorial at w3school.com

XML rules Easy rules, but very strict. The first line is the XML declaration, which gives the version and the character set used. The rest is user-defined tags. Every tag has an opening and a closing.
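How strict the rules are can be seen by handing a document to a parser. The sketch below uses Python's standard ElementTree; the two sample documents and the helper name are invented for illustration, and the declaration line assumes the ISO 8859-1 character set.

```python
import xml.etree.ElementTree as ET

# Declaration first, then user-defined tags, each opened and closed.
GOOD = (b'<?xml version="1.0" encoding="ISO-8859-1"?>'
        b'<note><to>Ann</to><body>Hello</body></note>')

# The <to> tag is never closed, so the document is not well formed.
BAD = b'<note><to>Ann</note>'


def is_well_formed(document: bytes) -> bool:
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True
```

A parser rejects the whole document on the first violation; unlike an HTML browser, it does not try to guess what was meant.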

Element naming XML elements must follow these naming rules: –Names can contain letters, numbers, and other characters –Names must not start with a number or punctuation character –Names must not start with the letters xml (or XML or Xml..) –Names cannot contain spaces
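The naming rules listed above can be approximated with a short check. This is a simplified sketch, not the full XML grammar: it assumes ASCII names only and allows just letters, digits, underscore, hyphen, and period. The function name is hypothetical.

```python
import re

# First character: a letter or underscore. Later characters may also be
# digits, hyphens, or periods. No spaces anywhere.
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def is_valid_element_name(name: str) -> bool:
    """Apply the slide's naming rules to a candidate element name."""
    if name.lower().startswith("xml"):
        return False                      # 'xml' prefix is reserved
    return _NAME.fullmatch(name) is not None
```

For example, `ingredient` and `prep_time` pass, while `2cups`, `xmlData`, and `prep time` are rejected.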

Elements and attributes Use elements to describe data Use attributes to present information that is not part of the data – For example, the file type or some other information that would be useful in processing the data, but is not part of the data.

Repeating elements In a DTD content model, naming an element by itself means it appears exactly once. Name+ means it appears one or more times. Name* means it appears 0 or more times. Name? means it appears 0 or 1 time.
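The occurrence indicators behave like regular-expression quantifiers, which gives a compact way to sketch the check a validating parser performs. The content model and element names below are hypothetical, and a real DTD validator handles much more (choices, nesting, mixed content).

```python
import re


def model_to_regex(model):
    """Turn [(name, indicator), ...] into a regex over space-terminated names.

    The indicator is '' (exactly once), '+', '*', or '?'.
    """
    parts = [f"(?:{re.escape(name)} ){indicator}" for name, indicator in model]
    return re.compile("".join(parts))


def children_match(model, children):
    """Check a sequence of child element names against the content model."""
    text = "".join(child + " " for child in children)
    return model_to_regex(model).fullmatch(text) is not None


# Hypothetical model: one title, one or more ingredients,
# any number of notes, and an optional serves element.
RECIPE_MODEL = [("title", ""), ("ingredient", "+"), ("note", "*"), ("serves", "?")]
```

With this model, a title followed by two ingredients is accepted, while a title alone is rejected because at least one ingredient is required.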

Parts of an XML document
- Elements: the components of an XML document. Some contain other parts, some are empty. Ex. in HTML: “br” or “table”; in XML: “ingredient”.
- Attributes: information about elements, not data. Ex. in HTML: “src=”; in XML: “scale=”.
- Entities: special characters or strings with pre-assigned meaning. Ex. in HTML: &nbsp; for a non-breaking space.
- PCDATA: parsed character data, text that will be parsed and interpreted by the reader. Tags and entities will be expanded and used in presentation.
- CDATA: character data, text that will not be parsed and interpreted. It will be displayed exactly as provided.
The HTML examples are familiar; the XML examples are made up and depend on the specific XML scheme used.
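The difference between parsed text and a CDATA section shows up as soon as a parser reads both. In this small sketch the element names `parsed` and `raw` are invented for the illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<recipe>
  <parsed>salt &amp; pepper</parsed>
  <raw><![CDATA[<b>salt & pepper</b>]]></raw>
</recipe>"""

root = ET.fromstring(DOC)

# Parsed character data: the entity &amp; is expanded to '&'.
parsed_text = root.find("parsed").text

# CDATA section: markup characters are passed through untouched.
raw_text = root.find("raw").text
```

Here `parsed_text` is `salt & pepper`, while `raw_text` keeps the literal `<b>` and `</b>` characters instead of treating them as a tag.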

Using XML - an example Define the fields of a recipe collection: ISO 8859 is a character set. See

Processing the XML data How do we know what to do with the information in an XML file? – Document Type Definition (DTD) Put in the same file as the data -- immediate reference Put a reference to an external description Provides the definition of the legitimate content for each element

Document Type Definition <!DOCTYPE recipe [ ]> Repeat 0 or more times

Meringue cookies
3 egg whites
1 cup sugar
1 teaspoon vanilla
2 cups mini chocolate chips
Beat the egg whites until stiff. Stir in sugar, then vanilla. Gently fold in chocolate chips. Place in warm oven at 200 degrees for an hour. Alternatively, place in an oven at 350 degrees. Turn oven off and leave overnight.
Not the way that I want to see a recipe in a magazine!
What could we do with a large collection of such entries?
How would we get the information entered into a collection?
External reference to DTD
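One way such a recipe entry might be marked up and then read back by a program is sketched below. The element names and the small internal DTD are assumptions made for this illustration; they are not the markup from the original slide. Note that Python's standard ElementTree parser reads the DTD but does not validate against it.

```python
import xml.etree.ElementTree as ET

RECIPE = """<!DOCTYPE recipe [
  <!ELEMENT recipe (title, ingredient+, directions)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT ingredient (#PCDATA)>
  <!ELEMENT directions (#PCDATA)>
]>
<recipe>
  <title>Meringue cookies</title>
  <ingredient>3 egg whites</ingredient>
  <ingredient>1 cup sugar</ingredient>
  <ingredient>1 teaspoon vanilla</ingredient>
  <ingredient>2 cups mini chocolate chips</ingredient>
  <directions>Beat the egg whites until stiff. Stir in sugar, then vanilla. Gently fold in chocolate chips.</directions>
</recipe>"""

root = ET.fromstring(RECIPE)

# Separate software interacts with the data described by the tags.
title = root.findtext("title")
ingredients = [item.text for item in root.findall("ingredient")]
```

With a large collection of such entries, the same few lines could list every recipe that uses a given ingredient, which is the kind of processing plain text does not support.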

Spot Check Design an XML schema for an application of your choice. Keep it simple. Examples -- address book, TV program listing, DVD collection, … Work in pairs and discuss your choice and your solution

Another example A paper with content encoded with XML: First few lines: Standards E-learning and their possible support for a rich pedagogic approach in a 'Integrated Learning' context Rodolophe Borer "ePBLpaper11.dtd” shown on next slide

%foreign-dtd; Source: No longer there

Resource sharing On your projects, you had to go looking for the materials that you need You look at the site, see what is there, consider how it could be used in your project. On a large scale, that does not work so well. It would be nice to query a site and ask what is there that might be of interest to us.

Distributed Resources Multiple Services Service provider -- search, browse, compare, etc. Data provider One service provider gathers information about data and uses it to provide services

Open Archives Initiative (OAI) Web-based – Uses HTTP to communicate between sites Centralized server – Services provided from a site that has already gathered the information it needs for those services from a distributed collection of sites.

OAI PMH Interoperability through Metadata Exchange The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a low-barrier mechanism for repository interoperability. Data Providers are repositories that expose structured metadata via OAI-PMH. Service Providers then make OAI-PMH service requests to harvest that metadata. OAI-PMH is a set of six verbs or services that are invoked within HTTP.

OAI-PMH verbs
- Identify
- ListMetadataFormats
- ListSets
- ListIdentifiers
- ListRecords
- GetRecord
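Because each verb is invoked as an ordinary HTTP request, building a request amounts to building a URL. The sketch below assumes a made-up base URL, `https://repository.example.org/oai`; the helper name is also hypothetical.

```python
from urllib.parse import urlencode

OAI_VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}


def build_request(base_url, verb, arguments=None):
    """Return the request URL for one OAI-PMH verb and its arguments."""
    if verb not in OAI_VERBS:
        raise ValueError(f"not an OAI-PMH verb: {verb!r}")
    query = {"verb": verb}
    query.update(arguments or {})       # e.g. metadataPrefix, set, from, until
    return f"{base_url}?{urlencode(query)}"


# Hypothetical repository address, used only for illustration.
EXAMPLE_URL = build_request(
    "https://repository.example.org/oai",
    "ListRecords",
    {"metadataPrefix": "oai_dc", "set": "biology"},
)
```

Verb names are case sensitive, so a request for `Listrecords` would be refused by this helper, as it would be by a repository.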

Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
[Diagram: a Harvester (service provider) sends an HTTP request carrying an OAI verb to a Repository (metadata provider), which returns an HTTP response in XML.]
OAI-PMH defines an interface between the Harvester and any number of Repositories.
Implemented as CGI, ASP, PHP, or other.
Any system may serve as a harvester, repository, or both.

OAI - PMH components Service Providers and Data Providers Requests and Responses

Records
Metadata of a resource. Three parts:
- Header (required)
  - Identifier (required: 1 only)
  - Datestamp (required: 1 only)
  - setSpec elements (optional: 0, 1, or more)
  - Status attribute for deleted item
- Metadata (required)
  - XML-encoded metadata with root tag, namespace
  - Repositories must support Dublin Core; other formats optional
- “About” statement (optional)
  - Rights statements
  - Provenance statements
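The header and metadata parts of a record can be picked apart with a namespace-aware parser. The sample record below is invented for illustration (identifier, date, and title are placeholders); the namespace URIs are the ones used by OAI-PMH 2.0 and unqualified Dublin Core.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Made-up record in the shape described on the slide.
SAMPLE_RECORD = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:repository.example.org:item-1</identifier>
    <datestamp>2024-01-15</datestamp>
    <setSpec>biology</setSpec>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>An example title</dc:title>
    </oai_dc:dc>
  </metadata>
</record>"""


def parse_record(xml_text):
    """Extract the header fields and the Dublin Core title from one record."""
    record = ET.fromstring(xml_text)
    header = record.find("oai:header", NS)
    dc = record.find("oai:metadata/oai_dc:dc", NS)
    return {
        "identifier": header.findtext("oai:identifier", namespaces=NS),
        "datestamp": header.findtext("oai:datestamp", namespaces=NS),
        "sets": [s.text for s in header.findall("oai:setSpec", NS)],
        "deleted": header.get("status") == "deleted",
        # A deleted record has no metadata part, hence the None check.
        "title": dc.findtext("dc:title", namespaces=NS) if dc is not None else None,
    }


PARSED = parse_record(SAMPLE_RECORD)
```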

Dublin Core elements
see:
C = controlled vocabulary recommended.
- Title
- Creator: entity primarily responsible for making the content of the resource
- Subject (C)
- Description
- Publisher: entity making the resource available
- Contributor: contributor to the content of the resource
- Date: YYYY-MM-DD, ex.
- Type (C): ex. collection, dataset, event, image
- Format (C): what is needed to display or operate the resource
- Identifier: unambiguous ID
- Source
- Language: standards RFC 3066, ISO 639
- Relation: reference to a related resource
- Coverage (C): space, time, jurisdiction
- Rights: rights management information

Identifiers Globally unique identifier Valid URI – Examples oai: : oai:etd.vt.edu:etd – Must resolve to one item No duplicates No reuse of previously used identifiers

Datestamps
Date of last modification of a record
- Used only for harvesting (meta metadata?)
Mandatory for each item in the repository
Two levels of granularity possible
- YYYY-MM-DD
- YYYY-MM-DDThh:mm:ssZ (T separates date and time; the trailing Z means the time is given in UTC, i.e. GMT, which is required)
Allows incremental harvesting: get only what is new since the last visit
- Accessed by the arguments from and until
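Incremental harvesting depends on formatting the time of the last visit at one of the two granularities. The sketch below uses only the standard library; the function names are hypothetical, and a harvester should send the granularity that the repository reports in its Identify response.

```python
from datetime import datetime, timedelta, timezone


def day_stamp(moment):
    """Format an aware datetime at day granularity, in UTC."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%d")


def second_stamp(moment):
    """Format an aware datetime at seconds granularity, in UTC."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def incremental_arguments(last_harvest, metadata_prefix="oai_dc"):
    """Arguments asking only for records changed since the last harvest."""
    return {"metadataPrefix": metadata_prefix, "from": second_stamp(last_harvest)}


# A local time two hours ahead of UTC is converted before formatting.
LOCAL = datetime(2024, 1, 15, 1, 0, 0, tzinfo=timezone(timedelta(hours=2)))
EXAMPLE = second_stamp(LOCAL)
```

Converting to UTC first matters: 01:00 at UTC+2 on 15 January is still 14 January in UTC, so a harvester that skipped the conversion could miss records.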

The OAI-PMH verbs Each requests a specific response from a data repository

Identify
Function: description of the archive
Example:
Parameters: none
Errors/exceptions:
- badArgument (there should not be any)
Response format (element: example, with ordinality ‡ in parentheses):
- repositoryName: My Archive (1)
- baseURL (1)
- protocolVersion (1)
- earliestDatestamp (1)
- deletedRecord: no, transient, persistent (1)
- granularity: YYYY-MM-DD, YYYY-MM-DDThh:mm:ssZ (1)
- adminEmail (+)
- compression: deflate, compress (*)
- description: oai-identifier, eprints, friends, … (*)
‡ Ordinality: 1 = mandatory, 1 only; + = mandatory, 1 or more; * = optional, 0 or more

Actual response from Continued T02:01:52Z OLAC Aggregator no YYYY-MM-DD identity -->

Continued oai OLACA.language-archives.org : oai:ethnologue.com:aaa Open Language Archives Community Philadelphia, U.S.A.

This repository contains all records from OLAC-registered archives. It is intended to be used by services which do not want to harvest individual OLAC archives. Metadata may be used only subject to the access permissions given by the individual archives.

ListMetadataFormats Function: retrieve available metadata formats from archive Example: archive.org/oai-script?verb=ListMetadataFormats& identifier=oai:HUBerlin.de: Parameters: identifier (optional) Errors/exceptions: – badArgument – idDoesNotExist – noMetadataFormats

− T01:58:06Z bin/olaca3.pl − olac archives.org/OLAC/1.0/ − olac_display archives.org/OLAC/1.0/ − oai_dc Response to olaca3.pl?verb=ListMetadataFormats

ListSets Function: retrieve set structure of a repository Example: archive.org/oai-script?verb=ListSets Parameters: resumptionToken (exclusive) Errors/exceptions: – badArgument – badResumptionToken – noSetHierarchy Sets are optional and are used to divide a repository into separate units that will be of interest to different harvesters.

ListIdentifiers Function: abbreviated form of ListRecords, retrieve only headers Example: archive.org/oai-script?verb=ListIdentifiers&metadataPrefix=oai_dc&from= Parameters: – from (optional) – until (optional) – metadataPrefix (required) – set (optional) – resumptionToken (exclusive) Errors/exceptions: – badArgument – badResumptionToken – cannotDisseminateFormat – noRecordsMatch – noSetHierarchy

ListRecords Function: harvest records from a repository Example: archive.org/oai-script?verb=ListRecords& metadataPrefix=oai_dc&set=biology Parameters: – from (optional) – until (optional) – metadataPrefix (required) – set (optional) – resumptionToken (exclusive) Errors/exceptions: – badArgument – badResumptionToken – cannotDisseminateFormat – noRecordsMatch – noSetHierarchy
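A large result is returned in pieces, and the exclusive resumptionToken argument asks for the next piece. The loop below is a sketch: the `fetch` callable stands in for whatever performs the HTTP request, and the two canned responses and their identifiers are invented so the example runs without a network.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def harvest_identifiers(fetch, metadata_prefix="oai_dc"):
    """Collect record identifiers, following resumption tokens to the end.

    fetch(arguments) must return the XML response text for those arguments.
    """
    arguments = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    identifiers = []
    while True:
        root = ET.fromstring(fetch(arguments))
        for header in root.iter(OAI + "header"):
            identifiers.append(header.findtext(OAI + "identifier"))
        token = root.find(f".//{OAI}resumptionToken")
        if token is None or not (token.text or "").strip():
            return identifiers           # no token, or an empty one: done
        # Exclusive argument: send the token with the verb and nothing else.
        arguments = {"verb": "ListRecords", "resumptionToken": token.text.strip()}


def _page(identifier, token_xml):
    """Build one canned ListRecords response for the demonstration."""
    return (
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
        f"<record><header><identifier>{identifier}</identifier>"
        "<datestamp>2024-01-01</datestamp></header></record>"
        f"{token_xml}</ListRecords></OAI-PMH>"
    )


# Two made-up pages: the first points to the second, the second ends the list.
_PAGES = {
    None: _page("oai:repository.example.org:1", "<resumptionToken>page-2</resumptionToken>"),
    "page-2": _page("oai:repository.example.org:2", "<resumptionToken/>"),
}


def fake_fetch(arguments):
    return _PAGES[arguments.get("resumptionToken")]


HARVESTED = harvest_identifiers(fake_fetch)
```

Passing the fetch step in as a function keeps the paging logic separate from the transport, so the same loop works with a real HTTP client.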

GetRecord Function: retrieve an individual metadata record from a repository Example: archive.org/oai-script?verb=GetRecord&identifier=oai:HUBerlin.de: &metadataPrefix=oai_dc Parameters: – identifier (required) – metadataPrefix (required) Errors/exceptions: – badArgument – cannotDisseminateFormat – idDoesNotExist
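Errors and exceptions such as idDoesNotExist come back inside a normal XML response, as error elements carrying a code attribute, so a harvester has to look for them. The response below is a made-up example with a placeholder repository address.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

# Invented response to a GetRecord request for an unknown identifier.
SAMPLE_ERROR = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
    "<responseDate>2024-01-15T02:01:52Z</responseDate>"
    '<request verb="GetRecord">https://repository.example.org/oai</request>'
    '<error code="idDoesNotExist">No matching identifier</error>'
    "</OAI-PMH>"
)


def error_codes(response_xml):
    """Return the list of error codes in a response (empty if none)."""
    root = ET.fromstring(response_xml)
    return [error.get("code") for error in root.iter(OAI + "error")]


CODES = error_codes(SAMPLE_ERROR)
```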

Interoperability The goal: communication, without human intervention, between information sources – Books that “talk to each other” Live links for references Knowledge of how to find relevant resources when needed Ability to query other information locations

Protocols Precise rules for interactions between independent processes – Format of the messages Both structure and content – Specified behavior in response to specific messages Many ways to accomplish the same result, but both sides must have the same understanding of the rules of engagement.

Spot Check Make up a protocol Suppose we wanted a kind of command and control protocol so that a master site could cause a satellite site to clear the screen that is displayed to the web. We want the response to be prompt We want the satellite site to confirm receipt of the command and to notify the master when the site screen has been cleared. It should be possible to accomplish this with messages between the two sites and an action at the satellite site.

The Semantic Web Some of these slides come from Lee Giles – Who, in turn, credits Jim Hendler, Carl Lagoze, Jayavel Shanmugasundaram, Sara Cohen, Jonathan Mamou, Yaron Kanza, Mark Sapossnek, Yehoshua Sagiv, Frank van Harmelen

Beyond XML Building with XML, new languages have emerged to – Describe content, and things in general – Relationships between things – Attributes (characteristics) of things The semantic web requires that things be described in sufficient detail that autonomous processes can discover useful things and use them properly

Motivation for the Semantic Web
- Search engines: concepts, not keywords; semantic narrowing/widening of queries
- Shopbots: semantic interchange, not screen scraping
- E-commerce: negotiation, catalogue mapping, personalization
- Web Services: need semantic characterizations to find them
- Navigation: by semantic proximity, not hardwired links
- ...

Example
Try these queries with Google:
- Distance between Paris and Madrid
  Google returns:
- (The) Largest city of France
  Google returns: France, Largest City: Paris
- (The) Largest city of Spain
  Google returns: Spain, Largest City: Madrid
Now, try these with Google:
- Distance between largest city of France and largest city of Spain
- Distance between “largest city of France” and “largest city of Spain”
- And worst, Distance between “the largest city of France” and “the largest city of Spain”
No result returned by Google! (Actually, it now shows a link to several versions of these slides!)
Search result snippet: Distance between Madrid spain and Paris france COORDINATES +. TOTAL DISTANCE. Madrid, SP, Paris, FR, Miles: Kilometers: Bearing: NE. Madrid, SPAIN...

Semantic Web Stack

RDF and OWL Resource Description Framework (RDF) Web Ontology Language (OWL)

So why not just use XML?
No agreement on:
- structure: is country an object? a class? an attribute? a relation? something else? What does nesting mean?
- vocabulary: is country the same as nation?
Netherlands Amsterdam 020
Netherlands Amsterdam 020
● Are the above XML documents the same?
● Do they convey the same information?
● Is that information machine-accessible?
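The point can be made concrete with two documents that carry the same three values but disagree on structure and vocabulary. Both documents and all tag names below are invented for this sketch; they are not the markup from the original slide.

```python
import xml.etree.ElementTree as ET

# Same facts, different choices: attributes and nesting versus flat elements,
# "country" versus "nation".
DOC_A = ('<country name="Netherlands"><capital name="Amsterdam">'
         '<areacode>020</areacode></capital></country>')
DOC_B = ('<nation><name>Netherlands</name><capital>Amsterdam</capital>'
         '<capital_areacode>020</capital_areacode></nation>')


def shape(element):
    """Describe a tree by its tags, attribute names, and nesting only."""
    return (element.tag, sorted(element.attrib), [shape(child) for child in element])


def facts_from_a(root):
    capital = root.find("capital")
    return (root.get("name"), capital.get("name"), capital.findtext("areacode"))


def facts_from_b(root):
    return (root.findtext("name"), root.findtext("capital"),
            root.findtext("capital_areacode"))


root_a, root_b = ET.fromstring(DOC_A), ET.fromstring(DOC_B)
same_structure = shape(root_a) == shape(root_b)               # False
same_facts = facts_from_a(root_a) == facts_from_b(root_b)     # True
```

A program recovers the same facts only because a person wrote one extraction function per document; nothing in the XML itself says the two documents mean the same thing.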

“2nd aim of Semantic Web”: Data integration
- Unstructured and semi-structured sources (document collections, message traffic, web pages, ...)
- Structured data without an explicit data schema (non-local databases, data tables, charts and reports, ...)
- Non-text collections (image, video, sound, ...)
- Streams of data
- Sensors, programs, services
Must specify the structure of data resources ...

2nd aim of Semantic Web: Data integration
... so a processor can tell how the "attributes" and "values" are related
- What is required vs. optional?
- How many values for a particular attribute?
- What attributes are keys for other attributes?
- Which attributes are necessarily related to other attributes, and in what way?
- How do the attributes (and values) in one data source map to attributes and values describing another source?

Stack of languages
- XML: surface syntax, no semantics
- XML Schema: describes structure of XML documents
- RDF: data model for “relations” between “things”
- RDF Schema (RDFS): RDF Vocabulary Definition Language
- OWL: a more expressive Vocabulary Definition Language

Semantic web languages today
Today there are three semantic web languages:
- RDF: Resource Description Framework
- DAML+OIL: DARPA Agent Markup Language (deprecated)
- OWL: Web Ontology Language
  - OWL Lite
  - OWL DL
  - OWL Full

RDF is the first Semantic Web language
[Diagram: three views of the RDF data model: XML encoding (good for machine processing), graph (good for human viewing), triples (good for reasoning).]
Triples:
stmt(docInst, rdf_type, Document)
stmt(personInst, rdf_type, Person)
stmt(inroomInst, rdf_type, InRoom)
stmt(personInst, holding, docInst)
stmt(inroomInst, person, personInst)
RDF is a simple language for building graph-based representations

The RDF Data Model
An RDF document is an unordered collection of statements, each with a subject, predicate and object (aka triples).
A triple can be thought of as a labelled arc in a graph.
Statements describe properties of web resources.
A resource is any object that can be pointed to by a URI:
- a document, a picture, a paragraph on the Web, …
- E.g.,
- a book in the library, a real person (?)
- isbn://
- …
Properties themselves are also resources (URIs).
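An unordered collection of triples maps naturally onto a set of tuples, and a query is a pattern with some positions left open. This sketch reuses the stmt(...) statements from the RDF slide; the `match` helper is hypothetical and stands in for what a real triple store provides.

```python
# Each statement is a (subject, predicate, object) tuple; a set is unordered.
TRIPLES = {
    ("docInst", "rdf_type", "Document"),
    ("personInst", "rdf_type", "Person"),
    ("inroomInst", "rdf_type", "InRoom"),
    ("personInst", "holding", "docInst"),
    ("inroomInst", "person", "personInst"),
}


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None matches anything."""
    return sorted(
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# Everything said about personInst (arcs leaving that node in the graph).
ABOUT_PERSON = match(TRIPLES, subject="personInst")

# Every statement that points at personInst (arcs arriving at that node).
POINTING_AT_PERSON = match(TRIPLES, obj="personInst")
```

Because the object of one triple can be the subject of another, following matches from node to node walks the graph.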

RDF without a Schema
- Object -> Attribute -> Value triples
- Objects are web resources
- Value is again an Object: triples can be linked
- Data model = graph
[Diagram: pers05 -Author-of-> ISBN...; ISBN... -Publ-by-> MIT]

Bluffer’s guide to RDF (2)
- Every identifier is a URL = world-wide unique naming!
- Has XML syntax
- Any statement can be an object: graphs can be nested
[Diagram: NYT -claims-> (pers05 -Author-of-> ISBN...)]

What does RDF Schema add?
- Defines vocabulary for RDF
- Organizes this vocabulary in a typed hierarchy
  - Class, subClassOf, type
  - Property, subPropertyOf
  - domain, range
[Diagram: Author and Reader are subClassOf Person; Lynda and Frank each have a type; the property communicatesTo has a domain and a range.]
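What the typed hierarchy buys is inference: a resource typed with a class also belongs to every superclass. The sketch below uses the class names from the slide, but which class Lynda and Frank belong to is an assumption made for the illustration, and each class is limited to a single superclass for simplicity.

```python
# subClassOf links from the slide's hierarchy.
SUBCLASS_OF = {"Author": "Person", "Reader": "Person"}

# Assumed type assignments, for illustration only.
TYPE_OF = {"Lynda": "Author", "Frank": "Reader"}


def inferred_types(resource):
    """Return the stated type followed by every superclass up the hierarchy."""
    types = []
    current = TYPE_OF.get(resource)
    while current is not None:
        types.append(current)
        current = SUBCLASS_OF.get(current)   # climb one subClassOf link
    return types
```

Under these assumptions, `inferred_types("Lynda")` yields Author and then Person, even though no statement says directly that Lynda is a Person.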

Which Semantic Web? Version 1: "Semantic Web as Web of Data" (TBL) recipe: expose databases on the web, use XML, RDF, integrate metadata from: – expressing DB schema semantics in machine interpretable ways enable integration and unexpected re-use

Which Semantic Web? Version 2: “Enrichment of the current Web” recipe: Annotate, classify, index metadata from: – automatically producing markup: named-entity recognition, concept extraction, tagging, etc. enable personalization, search, browse,..

Which Semantic Web? Version 1: “Semantic Web as Web of Data” Version 2: “Enrichment of the current Web” Different use-cases Different techniques Different users

The Evolving Web
[Diagram (Berners-Lee, Hendler; Nature, 2001), running from DOCUMENTS to DATA/PROGRAMS. Labels: HyperText Markup Language; HyperText Transfer Protocol; Foundation of the Current Web; eXtensible Markup Language; Self-Describing Documents; Resource Description Framework; Proof, Logic and Ontology Languages; Shared terms/terminology; Machine-Machine communication; Web of Knowledge.]