Cornell CS 502 Identifiers and Types CS 502 – 20020205 Carl Lagoze – Cornell University.



Cornell CS 502 Identity Change Persistence Paradox: reality contains things that both persist and change over time –Heraclitus and Plato: can you step into the same river twice? –Ship of Theseus: over the years, the Athenians replaced each plank in the original ship of Theseus as it decayed, thereby keeping it in good repair. Eventually, not a single plank of the original ship was left. So did the Athenians still have one and the same ship that used to belong to Theseus?

Cornell CS 502 Identity Change Persistence

Cornell CS 502 Identifiers Provide a key or handle linking abstract concepts to physical or perceptible entities Provide us with a necessary figment of persistence They are perhaps the one essential and common form of metadata Why bother? –Finding things –Referring to things –Asserting ownership over things

Cornell CS 502 I have lots of identifiers Carl Jay Lagoze, Dad, Hey you (SSN) (Visa Card) FZBMLH (US Airways locator on Jan 31 flight to San Diego)

Cornell CS 502 Identifier Issues Location independence Global uniqueness Persistence across time Human vs. machine generation Machine resolution Administration (centralized vs. decentralized) Intrinsic semantics Type specificity

Cornell CS 502 Two common pre-digital identifiers ISBN (International Standard Book Number) –Uniquely identifies every monograph (book) –One ISBN for each format HP & SS hardback HP & SS softcover X –Number is semantically meaningful (components) –International administration (>150 countries) ISSN (International Standard Serial Number) –Uniquely identifies every serial (not issue or volume) –Semantically meaningless –International administration
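One consequence of the ISBN being built from components is that its final character is a check digit computed from the others. The sketch below shows the ISBN-10 check (the form in use when these slides were written). The function name and the sample number are illustrative assumptions, not taken from the slides.

```python
def isbn10_is_valid(isbn: str) -> bool:
    """Return True if the ISBN-10 check digit is consistent with the other digits."""
    # Hyphens and spaces only separate the components; ignore them.
    chars = [c for c in isbn.upper() if c not in "- "]
    if len(chars) != 10:
        return False
    total = 0
    for position, char in enumerate(chars):
        if char == "X" and position == 9:
            value = 10  # "X" stands for 10 and is only legal as the check digit
        elif char.isdigit():
            value = int(char)
        else:
            return False
        # Weights run from 10 (first digit) down to 1 (check digit).
        total += (10 - position) * value
    return total % 11 == 0


# Illustrative number, not one of the ISBNs from the slide.
sample_ok = isbn10_is_valid("0-306-40615-2")
sample_bad = isbn10_is_valid("0-306-40615-3")
```

The ISSN, by contrast, carries no meaningful components, so only its check digit can be verified in this way.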

Cornell CS 502 URI: Uniform Resource Identifier Generic syntax for identifiers of resources Defined by RFC 2396 Syntax: <scheme>://<authority><path>?<query> –Scheme Defines semantics of remainder of URI e.g., ftp, gopher, http, mailto, news, telnet –Authority Authority governing namespace for remainder of URI Typically Internet-based server –Path Identification of data within scope of authority –Query String of information to be interpreted by authority
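The four components named on this slide can be seen by splitting a URI with Python's standard library. This is a minimal sketch: the helper name and the example URI (including its query string) are assumptions made for illustration.

```python
from urllib.parse import urlsplit


def uri_components(uri: str) -> dict:
    """Split a URI into the four components named on the slide."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,      # defines the semantics of the remainder
        "authority": parts.netloc,   # governs the namespace, typically a server
        "path": parts.path,          # identifies data within the authority's scope
        "query": parts.query,        # interpreted by the authority
    }


# Hypothetical URI used only for illustration.
example = uri_components("http://an.org/index.html?lang=en")
```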

Cornell CS 502 Why is RFC 2396 so big? Character encodings Partial and relative URIs
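Both topics can be illustrated briefly with the standard library: a relative URI is resolved against a base URI, and characters that are not allowed in a URI are percent-encoded. The base URI and file names below are hypothetical.

```python
from urllib.parse import quote, unquote, urljoin

# Hypothetical base document; relative references are resolved against it.
base = "http://an.org/docs/index.html"

# A relative path replaces the last segment of the base path.
resolved_sibling = urljoin(base, "figures/fig1.gif")

# ".." climbs one level in the path hierarchy before resolving.
resolved_parent = urljoin(base, "../about.html")

# A space is not allowed in a URI, so it is escaped as %20.
encoded = quote("annual report.html")
decoded = unquote(encoded)
```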

Cornell CS 502 URL: Uniform Resource Locator String representation of the location of a resource that is available via the Internet Uses URI syntax Scheme defines the access (protocol) method and is used by the client to determine the protocol to “speak” –http://an.org/index.html - open socket to an.org on port 80 and issue a GET for index.html –ftp://an.org/index.html - open socket to an.org on port 21, open ftp session, issue ftp get for index.html…
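The way a client uses the scheme to decide which protocol to speak can be sketched without touching the network. The function below only plans the access; its name, the port table, and the use of the FTP RETR command to stand for "ftp get" are illustrative assumptions.

```python
from urllib.parse import urlsplit

# Default ports for the two schemes discussed on the slide.
DEFAULT_PORTS = {"http": 80, "ftp": 21}


def plan_access(url: str) -> tuple:
    """Return (host, port, first retrieval command) implied by the URL's scheme."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    if scheme not in DEFAULT_PORTS:
        raise ValueError(f"no access method known for scheme {scheme!r}")
    # An explicit port in the URL overrides the scheme's default.
    port = parts.port or DEFAULT_PORTS[scheme]
    if scheme == "http":
        command = f"GET {parts.path or '/'}"
    else:
        # For FTP the retrieval happens after a session (login) is opened.
        command = f"RETR {parts.path.lstrip('/')}"
    return parts.hostname, port, command


http_plan = plan_access("http://an.org/index.html")
ftp_plan = plan_access("ftp://an.org/index.html")
```

The same host and path lead to different conversations purely because the scheme differs, which is why a URL ties an identifier to an access method.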

Cornell CS 502 URL Issues Persistence Location dependence Valid only at the item level –What about works, expressions, manifestations Multiple resolution –“get the one that is cheapest, most reliable, most recent, most appropriate for my hardware, etc.” Non-digital resources? Disconnection from the entity

Cornell CS 502 URC – Uniform Resource Characteristic (Catalog) Failed but interesting effort –Multiple resolution –Describe resource by its characteristics Provide adequate bundled information about a resource (metadata) to create identification block for any given resource (including locations) –Exactly what are the common set of characteristics for describing different types of resources? –Where are these characteristics stored? Robust URLs – Berkeley –Characteristic of document or metadata is computed automatically via fingerprint of its content.

Cornell CS 502 URN – Uniform Resource Name “globally unique, persistent names” Independence from location and location methods Syntax: <URN> ::= "urn:" <NID> ":" <NSS> NID: namespace identifier NSS: namespace-specific string Examples: urn:ISSN: urn:isbn: urn:doi: /140
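Splitting a URN into its namespace identifier (NID) and namespace-specific string (NSS) can be sketched as follows. This is a simplified check that does not validate the characters allowed in the NSS; the pattern, the function name, and the sample URNs are assumptions for illustration.

```python
import re

# "urn:" and the NID are case-insensitive; the NID is 1 to 32 letters,
# digits, or hyphens and must not start with a hyphen.
URN_PATTERN = re.compile(
    r"^urn:(?P<nid>[A-Za-z0-9][A-Za-z0-9-]{0,31}):(?P<nss>.+)$",
    re.IGNORECASE,
)


def parse_urn(urn: str) -> tuple:
    """Return (NID, NSS) for a URN, with the NID normalized to lower case."""
    match = URN_PATTERN.match(urn)
    if match is None:
        raise ValueError(f"not a URN: {urn!r}")
    return match.group("nid").lower(), match.group("nss")


# Hypothetical identifiers, not the ones elided from the slide.
isbn_urn = parse_urn("urn:isbn:0306406152")
issn_urn = parse_urn("URN:ISSN:1234-5679")
```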

Cornell CS 502 Handles: Names for Internet Resources Naming system for location-independent, persistent names The resource named by a Handle can be: A library item A collection of library items A catalog record A computer An address A public key for encryption etc., etc., etc.....

Cornell CS 502 Syntax of Handles <naming authority>/<local name> or hdl:<naming authority>/<local name> Examples / ;9 (date-time stamp) cornell.cs/cstr (mnemonic name) loc/a43v-8940cgr (random string)
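A Handle splits at its first slash into a naming authority and a local name, with an optional "hdl:" prefix. The sketch below assumes exactly that structure; the function name is hypothetical and this is not the Handle Client Library.

```python
def parse_handle(handle: str) -> tuple:
    """Return (naming authority, local name) for a Handle string."""
    text = handle.strip()
    # The "hdl:" prefix is optional and not part of the Handle itself.
    if text.lower().startswith("hdl:"):
        text = text[4:]
    # Only the first slash separates the authority from the local name;
    # later slashes belong to the local name.
    authority, separator, local_name = text.partition("/")
    if not separator or not authority or not local_name:
        raise ValueError(f"not a Handle: {handle!r}")
    return authority, local_name


# Examples taken from the slide.
mnemonic = parse_handle("cornell.cs/cstr")
random_string = parse_handle("hdl:loc/a43v-8940cgr")
```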

Cornell CS 502 Example of a Handle and its Data Used to Identify Two Locations Handle: loc.ndlp.amrlp/ Data type: URL Data type: RAP, Handle data: loc/repository-1r4589

Cornell CS 502 Use of Handles in a Digital Library Repository Handle System Search System User interface

Cornell CS 502 Scalability and Caching Client Caching Server Handle Servers Hash Cache Hash table

Cornell CS 502 Replication for Performance and Reliability Example: the Global Handle System Washington, DC Los Angeles, CA

Cornell CS 502 Global and Local Handle Servers Global Local Handle Servers

Cornell CS 502 Ways to Resolve Handles I. Resolution by Program Any program can resolve Handles by sending standard format messages to the Handle System. A set of procedures, with Java and C versions, is available to link into applications programs. They are known as the Handle Client Library.

Cornell CS 502 Ways to Resolve Handles II. Web Browsers Browsers modified to recognize Handles. This requires installation of a Handle Extension. 1. Whenever the browser expects a URL, it will recognize "hdl:". 2. The Handle is passed to the Handle System, where it is resolved and a data item of type "URL" is returned. Handle Extensions for Netscape and Internet Explorer are available for most versions of Windows.

Cornell CS 502 Ways to Resolve Handles III. Proxies Any Web browser can resolve Handles, even with no extension, via a proxy. For example, the following URL can be used to resolve the Handle loc.ndlp.amrlp/3a16616:

Cornell CS 502 Proxy Resolution WWW browser HTTP server URL to Proxy URL Resource Handle System hdl.handle.net Proxy server
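Proxy resolution amounts to rewriting a Handle as an ordinary URL on the proxy server, which any browser can then follow. A minimal sketch, assuming the proxy at hdl.handle.net accepts the Handle as the URL path; the function name is hypothetical.

```python
from urllib.parse import quote

# Proxy host named on the slide; the path layout is an assumption.
PROXY_PREFIX = "http://hdl.handle.net/"


def handle_proxy_url(handle: str) -> str:
    """Build a URL that asks the proxy server to resolve the given Handle."""
    text = handle.strip()
    if text.lower().startswith("hdl:"):
        text = text[4:]
    # Escape characters that are not legal in a URL path, keeping "/" and ";".
    return PROXY_PREFIX + quote(text, safe="/;")


proxy_url = handle_proxy_url("loc.ndlp.amrlp/3a16616")
```

The proxy resolves the Handle through the Handle System and answers the browser with a redirect to the URL stored for it.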

Cornell CS 502 OCLC's Persistent URL (PURL) A PURL is a URL -> Is fully compatible with today's Internet browsers -> Users need no special software Has some of the desirable features of URNs Lacks some desirable features of URNs -> Resolves only to a URL -> Does not support multiple resolution Developed by OCLC Software openly available

Cornell CS 502 PURL Syntax A PURL is a URL. PURL resolvers use standard HTTP redirects to return the actual URL. Components: protocol, resolver address, name

Cornell CS 502 PURL Namespaces A PURL provides a local (not global) namespace: a name registered at one resolver address is different from the same name registered at another resolver address

Cornell CS 502 OCLC PURL Resolution WWW browser PURL server HTTP server PURL database PURL URL Resource
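The resolution step in this diagram can be sketched as a lookup in the PURL database followed by a standard HTTP redirect. The table contents, names, and URLs below are hypothetical, and this is not OCLC's resolver code.

```python
# Hypothetical PURL database: PURL name -> current URL of the resource.
PURL_DATABASE = {
    "/net/example": "http://www.example.org/current/home.html",
}


def resolve_purl(name: str) -> tuple:
    """Return the (status code, headers) a PURL server would send for a name."""
    target = PURL_DATABASE.get(name)
    if target is None:
        return 404, {}
    # A redirect tells the browser to fetch the resource from its actual URL.
    return 302, {"Location": target}


found = resolve_purl("/net/example")
missing = resolve_purl("/net/unknown")
```

When the resource moves, only the database entry changes; the PURL that users have cited stays the same.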

Cornell CS 502 Why haven’t URNs caught on? Complexity of systems One size does not fit all - special purpose URN schemes have been successful, e.g., PubMed ID, Astrophysics BibCode No guarantee of persistence – longevity is an organizational not technical issue Requires well-regulated administrative systems Absence of “killer” applications – although reference linking is emerging

Cornell CS 502 Types: Not all data and content are the same Format or Genre –How you sense it –What you can do with it –e.g., audio, video, map, book Type –What you need to process it –What its bit layout is –Compression or encoding

Cornell CS 502 Multipurpose Internet Mail Extensions (MIME) RFC 822 – defines the textual format of messages MIME RFCs – extend the textual format to allow –Character sets other than US-ASCII –Extensible set of non-ASCII types for message bodies –Definition of multi-part mail (attachments)
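A multi-part message of the kind MIME defines can be built with Python's standard email package. This is a small sketch: the subject, body text, file name, and attachment bytes are made up for illustration.

```python
from email.message import EmailMessage

message = EmailMessage()
message["Subject"] = "Scanned map"

# The plain-text body becomes the first part of the message.
message.set_content("The map is attached.")

# Adding an attachment converts the message to multipart/mixed and
# appends a second part with its own MIME type.
message.add_attachment(
    b"GIF89a",  # placeholder bytes, not a complete image
    maintype="image",
    subtype="gif",
    filename="map.gif",
)

container_type = message.get_content_type()
part_types = [part.get_content_type() for part in message.iter_parts()]
```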

Cornell CS 502 MIME Types Two part type hierarchy –Top level type text audio video image application multipart –Examples text/plain image/gif application/postscript Extensions are handled by IANA
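The two-part hierarchy can be made concrete by splitting a MIME type at its slash. The sketch below accepts only the top-level types listed on this slide, which is an assumption of the example; the function name is hypothetical.

```python
# Top-level types as listed on the slide.
TOP_LEVEL_TYPES = {"text", "audio", "video", "image", "application", "multipart"}


def split_mime_type(mime_type: str) -> tuple:
    """Return (top-level type, subtype) for a MIME type such as "image/gif"."""
    # MIME type names are case-insensitive, so normalize to lower case.
    top_level, separator, subtype = mime_type.strip().lower().partition("/")
    if not separator or not subtype or top_level not in TOP_LEVEL_TYPES:
        raise ValueError(f"unrecognized MIME type: {mime_type!r}")
    return top_level, subtype


examples = [
    split_mime_type(name)
    for name in ("text/plain", "image/gif", "application/postscript")
]
```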

Cornell CS 502 MIME in HTTP (Content Negotiation) Accept in request-header –Accept: text/plain; q=0.5, text/html, text/x-dvi; q=0.8, text/xml –Types without a q value default to q=1, so text/html and text/xml are preferred, then text/x-dvi, then text/plain Content-Type in response-header –Content-Type: text/html
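Ranking the media types in an Accept header by their q values can be sketched as below. This is a simplified parser that handles only the q parameter and assumes a well-formed header; the function name is hypothetical.

```python
def rank_accept(header: str) -> list:
    """Return (media type, q) pairs from an Accept header, most preferred first."""
    ranked = []
    for item in header.split(","):
        fields = [field.strip() for field in item.split(";")]
        media_type = fields[0]
        if not media_type:
            continue
        quality = 1.0  # a media type with no q parameter defaults to q=1
        for parameter in fields[1:]:
            name, _, value = parameter.partition("=")
            if name.strip().lower() == "q":
                quality = float(value)
        ranked.append((media_type, quality))
    # sorted() is stable, so types with equal q keep their header order.
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


preferences = rank_accept(
    "text/plain; q=0.5, text/html, text/x-dvi; q=0.8, text/xml"
)
```

A server would walk this list and return the first type it can produce, reporting its choice in the Content-Type response header.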