1
The Semantic Web
2
Schedule for this evening
– Review of the survey: summary, with discussion if wanted
– Some other ways to move content from place to place: FTP and OAI-PMH
– Then, the Semantic Web: an introduction to things to come
3
Survey Summary on Word document Responses and any comments
4
Other ways to move materials on the Internet
FTP – File Transfer Protocol
– One of the oldest of the Internet protocols
– Originally a command-line interface
– Now, many GUI versions
The host must run an FTP server that listens for control connections on port 21 (the default); port 20 is the default for the data connection in active mode.
The client requests a session, the user logs in, and then issues a sequence of commands, including get and put.
Brief demonstration
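The session described above (connect, log in, issue get and put) can be sketched with Python's standard ftplib module. The host name, credentials, and file names are placeholders supplied by the caller, not values from the lecture; this is a minimal sketch, not a hardened client.

```python
from ftplib import FTP, FTP_PORT  # FTP_PORT is the default control port, 21


def ftp_get_and_put(host, user, password, remote_name, local_name, upload_name):
    """Log in, download one file (get), upload one file (put), then quit.

    Every argument is a placeholder supplied by the caller.
    """
    with FTP() as ftp:
        ftp.connect(host, FTP_PORT)      # client requests a session
        ftp.login(user, password)        # user logs in
        with open(local_name, "wb") as out:
            ftp.retrbinary(f"RETR {remote_name}", out.write)  # "get"
        with open(upload_name, "rb") as src:
            ftp.storbinary(f"STOR {upload_name}", src)        # "put"
        # leaving the with-block sends QUIT and closes the connection
```

GUI clients issue the same RETR and STOR commands behind the scenes.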
5
Open Archives Initiative
Generally oriented toward sharing information about resources in collections accessible on the Internet
There is a protocol for sharing
– Based on XML, so we will look at that first
6
Semantic Web Semantics refers to meaning. The semantic web aims to have enough information about a resource available that a program can use resources as if the program could understand what the resources are. – Of course, the program does not really “understand” in the human sense. – However, if it has enough information, it can follow rules and behave in ways that are consistent with understanding what it is working with.
7
Markup HTML is a markup language – not the first, by any means Tags in HTML give clues to the reader (browser or other program) about what to do in displaying or presenting the marked text. – emphasize, make stand out (like a title or section head), break – Some allowance for meta tags HTML has been stretched beyond its original design
8
XML Simplified version of SGML – Language for defining languages (markup languages) – HTML is now XHTML and is an XML language – XML allows you to make up your own descriptive language
9
Metadata Critical part of the description of content and resources What does metadata look like? Metadata is data about data – Information about a resource, encoded in the resource or associated with the resource. The language of metadata: XML – eXtensible Markup Language
10
XML
– XML is a markup language
– XML describes features
– There is no standard set of XML tags; each application defines its own
– Use XML to create a resource type
– Separately develop software to interact with the data described by the XML codes
Source: tutorial at w3schools.com
11
XML rules
Easy rules, but very strict
– The first line is the XML declaration, which gives the version and the character set used
– The rest is user-defined tags
– Every tag has an opening and a closing
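To see how strict these rules are in practice, here is a small sketch using Python's standard xml.etree.ElementTree parser. The sample document and its tag names are made up for illustration; the parser only checks well-formedness, not whether the tags mean anything.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: declaration first, then user-defined tags.
GOOD = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
  <to>Class</to>
  <body>Every tag has an opening and a closing</body>
</note>"""

# The same document with the closing </body> tag removed.
BAD = GOOD.replace(b"</body>", b"")


def is_well_formed(data):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False
```

A single missing closing tag is enough for the parser to reject the whole document.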
12
Element naming XML elements must follow these naming rules: –Names can contain letters, numbers, and other characters –Names must not start with a number or punctuation character –Names must not start with the letters xml (or XML or Xml..) –Names cannot contain spaces
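The naming rules above can be approximated with a short check. This is an ASCII-only simplification written for illustration (the function name is hypothetical); the XML specification also permits many non-ASCII letters and a colon for namespaces, which this sketch rejects.

```python
import re

# ASCII-only simplification: a letter or underscore first, then letters,
# digits, underscores, periods, or hyphens. Spaces never match.
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")


def is_valid_element_name(name):
    """Apply the four naming rules listed above to a candidate element name."""
    if name.lower().startswith("xml"):        # must not start with xml/XML/Xml
        return False
    return _NAME.fullmatch(name) is not None  # first character, allowed characters
```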
13
Elements and attributes Use elements to describe data Use attributes to present information that is not part of the data – For example, the file type or some other information that would be useful in processing the data, but is not part of the data.
14
Repeating elements
In a DTD content model:
– Name alone means the element appears exactly once
– Name+ means it appears one or more times
– Name* means it appears 0 or more times
– Name? means it appears 0 or one time
15
Parts of an XML document
– Elements: the components of an XML document. Some contain other parts, some are empty.
  Ex. in HTML: “br” or “table”; in XML: “ingredient”
– Attributes: information about elements, not data
  Ex. in HTML: “src=”; in XML: “scale=”
– Entities: special characters or strings with pre-assigned meaning
  Ex. in HTML: &nbsp; for non-breaking space
– PCDATA: parsed character data, text that will be parsed and interpreted by the reader. Tags and entities will be expanded and used in presentation.
– CDATA: character data, text that will not be parsed and interpreted. It will be displayed exactly as provided.
The HTML examples are familiar; the XML examples are made up and depend on the specific XML scheme used
16
Using XML - an example Define the fields of a recipe collection: ISO 8859 is a character set. See http://www.bbsinc.com/iso8859.html
17
Processing the XML data
How do we know what to do with the information in an XML file?
– Document Type Definition (DTD)
  – Put in the same file as the data: immediate reference
  – Or put a reference to an external description
  – Provides the definition of the legitimate content for each element
18
Document Type Definition <!DOCTYPE recipe [ ]> Repeat 0 or more times
19
Meringue cookies
– 3 egg whites
– 1 cup sugar
– 1 teaspoon vanilla
– 2 cups mini chocolate chips
Beat the egg whites until stiff. Stir in sugar, then vanilla. Gently fold in chocolate chips. Place in warm oven at 200 degrees for an hour. Alternatively, place in an oven at 350 degrees. Turn oven off and leave overnight.
Not the way that I want to see a recipe in a magazine!
What could we do with a large collection of such entries?
How would we get the information entered into a collection?
External reference to DTD
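As a rough illustration of what a DTD's occurrence indicators (+, *, ?) buy us, the sketch below marks up the recipe with assumed tag names and checks child counts against an assumed content model. Python's standard ElementTree does not validate against DTDs, so the check is hand-written; it looks only at how many times each child appears, not at their order.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for the recipe above; the tag names are assumptions.
RECIPE_XML = """<recipe>
  <title>Meringue cookies</title>
  <ingredient>3 egg whites</ingredient>
  <ingredient>1 cup sugar</ingredient>
  <ingredient>1 teaspoon vanilla</ingredient>
  <ingredient>2 cups mini chocolate chips</ingredient>
  <directions>Beat the egg whites until stiff. Stir in sugar, then vanilla.</directions>
</recipe>"""

# Hypothetical content model using DTD-style occurrence indicators.
RULES = {"title": "", "ingredient": "+", "directions": "?", "note": "*"}

# (minimum, maximum) child counts for each indicator; None means unbounded.
BOUNDS = {"": (1, 1), "+": (1, None), "*": (0, None), "?": (0, 1)}


def check_occurrences(element, rules):
    """Return a list of rule violations for the direct children of element."""
    problems = []
    for name, indicator in rules.items():
        low, high = BOUNDS[indicator]
        count = len(element.findall(name))
        if count < low or (high is not None and count > high):
            problems.append(f"{name}{indicator}: found {count}")
    return problems
```

With rules like these, a program could reject a recipe that has no ingredients before it ever reaches the collection.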
20
Spot Check Design an XML schema for an application of your choice. Keep it simple. Examples -- address book, TV program listing, DVD collection, … Work in pairs and discuss your choice and your solution
21
Another example
A paper with content encoded with XML: http://tecfaseed.unige.ch/staf18/modules/ePBL/uploads/proj3/paper81.xml
First few lines:
– Standards E-learning and their possible support for a rich pedagogic approach in a 'Integrated Learning' context
– Rodolophe Borer
– http://tecfa.unige.ch/perso/staf/borer/
The DTD, "ePBLpaper11.dtd", is shown on the next slide
22
%foreign-dtd; Source: http://tecfa.unige.ch/staf/staf-j/vuilleum/staf18/p6/ No longer there
23
Resource sharing
– On your projects, you had to go looking for the materials that you needed. You looked at a site, saw what was there, and considered how it could be used in your project.
– On a large scale, that does not work so well.
– It would be nice to query a site and ask what is there that might be of interest to us.
24
Distributed Resources Multiple Services Service provider -- search, browse, compare, etc. Data provider One service provider gathers information about data and uses it to provide services
25
Open Archives Initiative (OAI) Web-based – Uses HTTP to communicate between sites Centralized server – Services provided from a site that has already gathered the information it needs for those services from a distributed collection of sites.
26
OAI PMH Interoperability through Metadata Exchange The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a low-barrier mechanism for repository interoperability. Data Providers are repositories that expose structured metadata via OAI-PMH. Service Providers then make OAI-PMH service requests to harvest that metadata. OAI-PMH is a set of six verbs or services that are invoked within HTTP. http://www.openarchives.org/pmh/
27
OAI-PMH verbs
– Identify
– ListMetadataFormats
– ListSets
– ListIdentifiers
– ListRecords
– GetRecord
28
Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
[Diagram: a Harvester (Service Provider) sends an HTTP request carrying an OAI verb to a Repository (Metadata Provider); the Repository returns an HTTP response in XML]
– OAI-PMH defines an interface between the Harvester and any number of Repositories
– Implemented as CGI, ASP, PHP, or other
– Any system may serve as a harvester, repository, or both
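Because an OAI-PMH request is just an HTTP URL with a verb and a few arguments, building one takes only a few lines. The sketch below uses Python's standard urllib.parse; the function name is hypothetical, and the base URL is the example address used on the verb slides with an explicit http:// scheme added.

```python
from urllib.parse import urlencode


def build_oai_request(base_url, verb, **arguments):
    """Return the URL for one OAI-PMH request: base URL, verb, then arguments."""
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


# Example request for a single record in Dublin Core format.
url = build_oai_request(
    "http://archive.org/oai-script",
    "GetRecord",
    identifier="oai:HUBerlin.de:3000218",
    metadataPrefix="oai_dc",
)
```

Note that urlencode percent-encodes the colons in the identifier, which a repository decodes on receipt.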
29
OAI - PMH components Service Providers and Data Providers Requests and Responses http://www.oaforum.org/tutorial/english/page3.htm#section3
30
Records
Metadata of a resource. Three parts:
– Header (required)
  – Identifier (required: 1 only)
  – Datestamp (required: 1 only)
  – setSpec elements (optional: 0, 1, or more)
  – Status attribute for deleted item
– Metadata (required)
  – XML encoded metadata with root tag, namespace
  – Repositories must support Dublin Core, other formats optional
– “About” statement (optional)
  – Rights statements
  – Provenance statements
31
Dublin Core elements
see: http://dublincore.org/documents/dces/
C = controlled vocabulary recommended.
– Title
– Creator: entity primarily responsible for making content of the resource
– Subject (C)
– Description
– Publisher: entity making the resource available
– Contributor: contributor to content of the resource
– Date: YYYY-MM-DD
– Type (C): ex. collection, dataset, event, image
– Format (C): what is needed to display or operate the resource
– Identifier: unambiguous ID
– Source
– Language: standards RFC 3066, ISO 639
– Relation: reference to related resource
– Coverage (C): space, time, jurisdiction
– Rights: rights management information
32
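Identifiers
– Globally unique identifier
– Valid URI
  – Example using the oai scheme (the scheme name, a repository identifier, and a local record identifier, separated by colons): oai:etd.vt.edu:etd-1234567890
– Must resolve to one item
– No duplicates
– No reuse of previously used identifiers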
33
Datestamps
– Date of last modification of a record
  – Used only for harvesting (meta metadata?)
– Mandatory for each item in the repository
– Two levels of granularity possible
  – YYYY-MM-DD
  – YYYY-MM-DDThh:mm:ssZ, where T separates the date from the time and the trailing Z marks the time zone, which must be GMT (UTC)
– Allows harvesting incrementally: get only what is new since the last visit
  – Accessed by the arguments from and until
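A harvester has to turn "the time of my last visit" into one of the two datestamp forms before it can use the from argument. The sketch below does this with Python's standard datetime module; the function names and the "day"/"second" labels are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

FORMATS = {
    "day": "%Y-%m-%d",               # YYYY-MM-DD
    "second": "%Y-%m-%dT%H:%M:%SZ",  # YYYY-MM-DDThh:mm:ssZ
}


def oai_datestamp(moment, granularity="day"):
    """Format a timezone-aware datetime as a UTC (GMT) datestamp."""
    return moment.astimezone(timezone.utc).strftime(FORMATS[granularity])


def incremental_arguments(last_harvest, granularity="day"):
    """Arguments asking only for records changed since the last visit."""
    return {"from": oai_datestamp(last_harvest, granularity)}


# Usage: 21:01:52 at UTC-5 is 02:01:52Z on the following day.
eastern = timezone(timedelta(hours=-5))
stamp = oai_datestamp(datetime(2011, 11, 12, 21, 1, 52, tzinfo=eastern), "second")
```

Converting to UTC before formatting matters: a local time near midnight can fall on a different calendar day in GMT.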
34
The OAI-PMH verbs Each requests a specific response from a data repository
35
Identify
Function: Description of the archive
Example: http://www.language-archives.org/cgi-bin/olaca3.pl?verb=Identify
Parameters: none
Errors/exceptions:
– badArgument (there should not be any)
Response format:
Element             Example                              Ordinality ‡
repositoryName      My Archive                           1
baseURL             http://archive.org/oai               1
protocolVersion     2.0                                  1
earliestDatestamp   1999-01-01                           1
deletedRecord       no, transient, persistent            1
granularity         YYYY-MM-DD, YYYY-MM-DDThh:mm:ssZ     1
adminEmail          oai-admin@archive.org                +
compression         deflate, compress                    *
description         oai-identifier, eprints, friends, …  *
‡ Ordinality: 1 = mandatory, 1 only; + = mandatory, 1 or more; * = optional, 0 or more
36
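Actual response from http://www.language-archives.org/cgi-bin/olaca3.pl?verb=Identify
– responseDate: 2011-11-13T02:01:52Z
– request: http://www.language-archives.org/cgi-bin/olaca3.pl
– repositoryName: OLAC Aggregator
– baseURL: http://www.language-archives.org/cgi-bin/olaca3.pl
– protocolVersion: 2.0
– adminEmail: haejoong@ldc.upenn.edu
– earliestDatestamp: 1900-01-01
– deletedRecord: no
– granularity: YYYY-MM-DD
– compression: identity
Continued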
37
Continued
– scheme: oai
– repositoryIdentifier: OLACA.language-archives.org
– delimiter: :
– sampleIdentifier: oai:ethnologue.com:aaa
– http://www.language-archives.org/archive_records/
– Open Language Archives Community
– http://www.language-archives.org/
– Philadelphia, U.S.A.
38
This repository contains all records from OLAC-registered archives. It is intended to be used by services which do not want to harvest individual OLAC archives. Metadata may be used only subject to the access permissions given by the individual archives.
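A harvester would read a response like this one programmatically. The sketch below rebuilds a trimmed Identify response from the values shown on these slides (the layout and the omission of the description blocks are assumptions) and pulls the fields out with Python's standard ElementTree. Element names follow the OAI-PMH 2.0 specification, and the namespace URI is the one that specification defines.

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"

# Reconstructed from the values shown above; layout is assumed and the
# <description> blocks are left out.
IDENTIFY_XML = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2011-11-13T02:01:52Z</responseDate>
  <request verb="Identify">http://www.language-archives.org/cgi-bin/olaca3.pl</request>
  <Identify>
    <repositoryName>OLAC Aggregator</repositoryName>
    <baseURL>http://www.language-archives.org/cgi-bin/olaca3.pl</baseURL>
    <protocolVersion>2.0</protocolVersion>
    <adminEmail>haejoong@ldc.upenn.edu</adminEmail>
    <earliestDatestamp>1900-01-01</earliestDatestamp>
    <deletedRecord>no</deletedRecord>
    <granularity>YYYY-MM-DD</granularity>
    <compression>identity</compression>
  </Identify>
</OAI-PMH>"""


def parse_identify(xml_text):
    """Map each child of <Identify> to a list of its text values."""
    root = ET.fromstring(xml_text)
    identify = root.find(f"{{{OAI_NS}}}Identify")
    result = {}
    for child in identify:
        name = child.tag.split("}", 1)[1]  # strip the "{namespace}" prefix
        result.setdefault(name, []).append(child.text)
    return result
```

Values are kept in lists because some elements, such as adminEmail, may appear more than once.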
39
ListMetadataFormats
Function: retrieve available metadata formats from archive
Example: archive.org/oai-script?verb=ListMetadataFormats&identifier=oai:HUBerlin.de:3000218
Parameters: identifier (optional)
Errors/exceptions:
– badArgument
– idDoesNotExist
– noMetadataFormats
40
Response to http://www.language-archives.org/cgi-bin/olaca3.pl?verb=ListMetadataFormats
– responseDate: 2006-10-17T01:58:06Z
– request: http://www.language-archives.org/cgi-bin/olaca3.pl
– metadataFormat
  – metadataPrefix: olac
  – schema: http://www.language-archives.org/OLAC/1.0/olac.xsd
  – metadataNamespace: http://www.language-archives.org/OLAC/1.0/
– metadataFormat
  – metadataPrefix: olac_display
  – schema: http://www.language-archives.org/OLAC/1.0/olac.xsd
  – metadataNamespace: http://www.language-archives.org/OLAC/1.0/
– metadataFormat
  – metadataPrefix: oai_dc
  – schema: http://www.openarchives.org/OAI/2.0/oai_dc.xsd
  – metadataNamespace: http://www.openarchives.org/OAI/2.0/oai_dc/
41
ListSets
Function: retrieve set structure of a repository
Example: archive.org/oai-script?verb=ListSets
Parameters: resumptionToken (exclusive)
Errors/exceptions:
– badArgument
– badResumptionToken
– noSetHierarchy
Sets are optional and are used to divide a repository into separate units that will be of interest to different harvesters.
42
ListIdentifiers
Function: abbreviated form of ListRecords, retrieve only headers
Example: archive.org/oai-script?verb=ListIdentifiers&metadataPrefix=oai_dc&from=2002-12-01
Parameters:
– from (optional)
– until (optional)
– metadataPrefix (required)
– set (optional)
– resumptionToken (exclusive)
Errors/exceptions:
– badArgument
– badResumptionToken
– cannotDisseminateFormat
– noRecordsMatch
– noSetHierarchy
43
ListRecords
Function: harvest records from a repository
Example: archive.org/oai-script?verb=ListRecords&metadataPrefix=oai_dc&set=biology
Parameters:
– from (optional)
– until (optional)
– metadataPrefix (required)
– set (optional)
– resumptionToken (exclusive)
Errors/exceptions:
– badArgument
– badResumptionToken
– cannotDisseminateFormat
– noRecordsMatch
– noSetHierarchy
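The "exclusive" resumptionToken is what lets a repository return a large result in pages: when a response carries a token, the harvester repeats the request with the verb and that token only. Here is a minimal sketch of that loop in Python. The fetch argument is a hypothetical caller-supplied function that takes the request arguments and returns the response XML as text, so the loop can be tried without a network connection.

```python
import xml.etree.ElementTree as ET

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def harvest_identifiers(fetch, metadata_prefix="oai_dc"):
    """Collect record identifiers from every page of a ListRecords harvest.

    fetch(params) is a caller-supplied function (hypothetical here) that
    sends the request and returns the response XML as a string.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    identifiers = []
    while True:
        root = ET.fromstring(fetch(params))
        for header in root.iterfind(".//oai:record/oai:header", NS):
            identifiers.append(header.findtext("oai:identifier", namespaces=NS))
        token = root.findtext(".//oai:resumptionToken", namespaces=NS)
        if not token:  # element missing or empty: the list is complete
            return identifiers
        # The token is exclusive: send it with the verb and nothing else.
        params = {"verb": "ListRecords", "resumptionToken": token}
```

A real harvester would also check each response for error elements such as badResumptionToken before continuing.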
44
GetRecord
Function: retrieve an individual metadata record from a repository
Example: archive.org/oai-script?verb=GetRecord&identifier=oai:HUBerlin.de:3000218&metadataPrefix=oai_dc
Parameters:
– identifier (required)
– metadataPrefix (required)
Errors/exceptions:
– badArgument
– cannotDisseminateFormat
– idDoesNotExist
47
Interoperability The goal: communication, without human intervention, between information sources – Books that “talk to each other” Live links for references Knowledge of how to find relevant resources when needed Ability to query other information locations
48
Protocols Precise rules for interactions between independent processes – Format of the messages Both structure and content – Specified behavior in response to specific messages Many ways to accomplish the same result, but both sides must have the same understanding of the rules of engagement.
49
Spot Check Make up a protocol Suppose we wanted a kind of command and control protocol so that a master site could cause a satellite site to clear the screen that is displayed to the web. We want the response to be prompt We want the satellite site to confirm receipt of the command and to notify the master when the site screen has been cleared. It should be possible to accomplish this with messages between the two sites and an action at the satellite site.
50
The Semantic Web Some of these slides come from Lee Giles – Who, in turn, credits Jim Hendler, Carl Lagoze, Jayavel Shanmugasundaram, Sara Cohen, Jonathan Mamou, Yaron Kanza, Mark Sapossnek, Yehoshua Sagiv, Frank van Harmelen
51
Beyond XML Building with XML, new languages have emerged to – Describe content, and things in general – Relationships between things – Attributes (characteristics) of things The semantic web requires that things be described in sufficient detail that autonomous processes can discover useful things and use them properly
52
Motivation for the Semantic Web
– Search engines: concepts, not keywords; semantic narrowing/widening of queries
– Shopbots: semantic interchange, not screenscraping
– E-commerce: negotiation, catalogue mapping, personalization
– Web Services: need semantic characterizations to find them
– Navigation: by semantic proximity, not hardwired links
53
Example
Try these queries with Google:
– Distance between Paris and Madrid
  Google returns: Distance between Madrid spain and Paris france, www.mapcrow.info/Distance_between_Madrid_SP_and_Paris_FR.html: COORDINATES +. TOTAL DISTANCE. Madrid, SP, -3.6833 40.4000. Paris, FR, 2.3333 48.8667. Miles: 654.57. Kilometers: 1053.40. Bearing: NE. Madrid, SPAIN...
– (The) Largest city of France
  Google returns: France – Largest City: Paris
– (The) Largest city of Spain
  Google returns: Spain – Largest City: Madrid
Now, try these with Google:
– Distance between largest city of France and largest city of Spain
– Distance between “largest city of France” and “largest city of Spain”
– And worst, Distance between “the largest city of France” and “the largest city of Spain”
No result returned by Google! (Actually, it now shows a link to several versions of these slides!)
54
http://www.w3.org/DesignIssues/diagrams/sw-stack-2005.png Semantic Web Stack
55
RDF and OWL Resource Description Framework (RDF) Web Ontology Language (OWL)
56
So why not just use XML?
No agreement on:
– structure
  – is country a: object? class? attribute? relation? something else?
  – what does nesting mean?
– vocabulary
  – is country the same as nation?
[XML examples: markup lost; surviving values: 020, Netherlands, Amsterdam]
● Are the above XML documents the same?
● Do they convey the same information?
● Is that information machine-accessible?
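To make the point concrete, here is a sketch with two made-up XML encodings of the same facts (the values Netherlands, Amsterdam, and 020 come from the slide; every tag and attribute name is an assumption). A program comparing the documents structurally sees two different things, even though the values they carry are identical.

```python
import xml.etree.ElementTree as ET

# Two hypothetical encodings of the same facts; all names are assumptions.
DOC_A = ('<country name="Netherlands"><capital name="Amsterdam">'
         '<areacode>020</areacode></capital></country>')
DOC_B = ('<nation><name>Netherlands</name><capital>Amsterdam</capital>'
         '<capital_areacode>020</capital_areacode></nation>')


def shape(element):
    """Reduce an element to (tag, sorted attributes, text, child shapes)."""
    text = (element.text or "").strip()
    return (element.tag, tuple(sorted(element.attrib.items())), text,
            tuple(shape(child) for child in element))


def values(element):
    """Collect every attribute value and text value, ignoring structure."""
    found = set(element.attrib.values())
    if element.text and element.text.strip():
        found.add(element.text.strip())
    for child in element:
        found |= values(child)
    return found
```

Nothing in XML itself tells the program that country and nation mean the same thing, or whether a value belongs in an attribute or a nested element.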
57
2nd aim of Semantic Web: Data integration
– Unstructured and semi-structured sources (document collections, message traffic, web pages, ...)
– Structured data without an explicit data schema (non-local databases, data tables, charts and reports, ...)
– Non-text collections (image, video, sound, ...)
– Streams of data (sensors, programs, services)
Must specify the structure of data resources ...
58
2nd aim of Semantic Web: Data integration
... so a processor can tell how the "attributes" and "values" are related
– What is required vs. optional?
– How many values for a particular attribute?
– What attributes are keys for other attributes?
– Which attributes are necessarily related to other attributes, and in what way?
– How do the attributes (and values) in one data source map to attributes and values describing another source?
59
Stack of languages XML: – Surface syntax, no semantics XML Schema: – Describes structure of XML documents RDF: – Datamodel for “relations” between “things” RDF Schema (RDFS): – RDF Vocabulary Definition Language OWL: – A more expressive Vocabulary Definition Language
60
Semantic web languages today
Today there are three semantic web languages
– RDF – Resource Description Framework http://www.w3.org/RDF/
– DAML+OIL – DARPA Agent Markup Language http://www.daml.org/ (deprecated)
– OWL – Web Ontology Language http://www.w3.org/2001/sw/
  – OWL Lite
  – OWL DL
  – OWL Full
61
RDF is the first Semantic Web language
RDF is a simple language for building graph-based representations
[Diagram: the RDF data model in three forms: XML encoding (good for machine processing), graph (good for human viewing), triples (good for reasoning)]
Triples:
stmt(docInst, rdf_type, Document)
stmt(personInst, rdf_type, Person)
stmt(inroomInst, rdf_type, InRoom)
stmt(personInst, holding, docInst)
stmt(inroomInst, person, personInst)
62
The RDF Data Model
– An RDF document is an unordered collection of statements, each with a subject, predicate and object (aka triples)
– A triple can be thought of as a labelled arc in a graph
– Statements describe properties of web resources
– A resource is any object that can be pointed to by a URI:
  – a document, a picture, a paragraph on the Web, …
    E.g., http://umbc.edu/~ypeng/F07671.html
  – a book in the library, a real person (?)
    E.g., isbn://5031-4444-3333
  – …
– Properties themselves are also resources (URIs)
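The data model is simple enough to sketch without any RDF library: a set of (subject, predicate, object) tuples plus a pattern-matching query. The statement names below are borrowed from the triples on the preceding slide and stand in for full URIs; the match function is a hypothetical helper, not part of any RDF toolkit.

```python
# Statements are (subject, predicate, object) tuples; a set models the
# "unordered collection". Names are short placeholders, not real URIs.
TRIPLES = {
    ("docInst", "rdf:type", "Document"),
    ("personInst", "rdf:type", "Person"),
    ("personInst", "holding", "docInst"),
}


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every argument that is not None."""
    return {
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    }
```

Because the object of one triple (docInst) is the subject of another, following matches from triple to triple is the same as walking the arcs of the graph.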
64
RDF without a Schema
– Object -> Attribute -> Value triples
– Objects are web resources
– Value is again an Object: triples can be linked
– Data model = graph
[Diagram: pers05 is Author-of ISBN...; ISBN... is Publ-by MIT]
65
Bluffer’s guide to RDF (2)
– Every identifier is a URL = world-wide unique naming!
– Has XML syntax
– Any statement can be an object: graphs can be nested
[Diagram: NYT claims (pers05 Author-of ISBN...)]
66
What does RDF Schema add?
– Defines vocabulary for RDF
– Organizes this vocabulary in a typed hierarchy
– Class, subClassOf, type
– Property, subPropertyOf
– domain, range
[Diagram: Author and Reader are subClassOf Person; Lynda and Frank are typed instances; communicatesTo has a domain and a range]
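One thing a typed hierarchy makes possible is simple inference: if something is an Author and Author is a subClassOf Person, a program can conclude that it is also a Person. The sketch below shows that single rule over plain tuples. The class and instance names echo the slide, but which instance belongs to which class is an assumption, and real RDFS reasoners apply more rules than this one.

```python
# Hypothetical schema and data loosely following the names on the slide.
SCHEMA = {
    ("Author", "subClassOf", "Person"),
    ("Reader", "subClassOf", "Person"),
}
DATA = {
    ("Lynda", "type", "Author"),
    ("Frank", "type", "Reader"),
    ("Lynda", "communicatesTo", "Frank"),
}


def infer_types(schema, data):
    """Add (x, type, C2) whenever (x, type, C1) and (C1, subClassOf, C2) hold."""
    inferred = set(data)
    changed = True
    while changed:  # repeat until a pass adds no new triple
        changed = False
        for (x, p, c1) in list(inferred):
            if p != "type":
                continue
            for (sub, q, sup) in schema:
                new = (x, "type", sup)
                if q == "subClassOf" and sub == c1 and new not in inferred:
                    inferred.add(new)
                    changed = True
    return inferred
```

Repeating the pass until nothing changes lets the rule follow chains of subClassOf statements of any length.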
67
Which Semantic Web? Version 1: "Semantic Web as Web of Data" (TBL) recipe: expose databases on the web, use XML, RDF, integrate metadata from: – expressing DB schema semantics in machine interpretable ways enable integration and unexpected re-use
68
Which Semantic Web? Version 2: “Enrichment of the current Web” recipe: Annotate, classify, index metadata from: – automatically producing markup: named-entity recognition, concept extraction, tagging, etc. enable personalization, search, browse,..
69
Which Semantic Web? Version 1: “Semantic Web as Web of Data” Version 2: “Enrichment of the current Web” Different use-cases Different techniques Different users
70
The Evolving Web
[Diagram: timeline 1990, 2000, 2010, moving from DOCUMENTS to DATA/PROGRAMS and toward a Web of Knowledge. Labels: HyperText Markup Language; HyperText Transfer Protocol; eXtensible Markup Language; Resource Description Framework; Foundation of the Current Web; Self-Describing Documents; Shared terms/terminology; Machine-Machine communication; Proof, Logic and Ontology Languages]
Berners-Lee, Hendler; Nature, 2001