UKOLN is supported by: To name: persistently: ay, there’s the rub Andy Powell, UKOLN, University of Bath DCC Persistent Identifiers.

Slides:



Advertisements
Similar presentations
Agents and the DC Abstract Model Andy Powell UKOLN, University of Bath DC Agents WG Meeting DC-2005, Madrid.
Advertisements

DC Architecture WG meeting Monday Sept 12 Slot 1: Slot 2: Location: Seminar Room 4.1.E01.
A centre of expertise in digital information management Approaches To The Validation Of Dublin Core Metadata Embedded In (X)HTML Documents Background The.
UKOLN is supported by: (Persistent) Identifiers for Concepts / Terms / Relationships Andy Powell, UKOLN, University of Bath NKOS Special.
The PREMIS Data Dictionary Michael Day Digital Curation Centre UKOLN, University of Bath JORUM, JISC and DCC.
Metadata vocabularies and ontologies Dr. Manjula Patel Technical Research and Development
UKOLN is supported by: An overview of the OpenURL UKOLN/JIBS OpenURL Meeting London, September 2003 Andy Powell, UKOLN, University of Bath
Andy Powell, Eduserv Foundation Feb 2007 The Dublin Core Abstract Model – a packaging standard?
A centre of expertise in digital information management UKOLN is supported by: XML and the DCMI Abstract Model DC Architecture WG Meeting,
What are GUIDs and Why Do We Need Them ??? Steve Baskauf Vanderbilt Dept. of Biological Sciences
CS570 Artificial Intelligence Semantic Web & Ontology 2
An abstract model for DCMI metadata descriptions Andy Powell UKOLN, University of Bath, UK UKOLN is supported.
DNER Architecture Andy Powell UKOLN, University of Bath Web of Science Enhancements Committee, Centre Point 5 March.
An Introduction to MODS: The Metadata Object Description Schema Tech Talk By Daniel Gelaw Alemneh October 17, 2007 October 17, 2007.
Pete Johnston & Andy Powell, Eduserv Foundation 28 June 2006 Update.
Mark Evans, Tessella Digital Preservation Boot Camp – PASIG meeting, Washington DC, 22 nd May 2013 PREMIS Practical Strategies For Preservation Metadata.
UKOLN is supported by: OAI-ORE a perspective on compound information objects ( Defining Image Access.
A New Computing Paradigm. Overview of Web Services Over 66 percent of respondents to a 2001 InfoWorld magazine poll agreed that "Web services are likely.
URI IS 373—Web Standards Todd Will. CIS Web Standards-URI 2 of 17 What’s in a name? What is a URI/URL/URN? Why are they important? What strategies.
From SHIQ and RDF to OWL: The Making of a Web Ontology Language
A centre of expertise in digital information management UKOLN is supported by: XML Schema for DC Libraries AP DC Libraries WG Meeting,
UKOLUG - July Metadata for the Web RDF and the Dublin Core Andy Powell UKOLN, University of Bath UKOLN.
Chinese-European Workshop on Digital Preservation, Beijing July 14 – Network of Expertise in Digital Preservation 1 Persistent Identifiers Reinhard.
RDF: Concepts and Abstract Syntax W3C Recommendation 10 February Michael Felderer Digital Enterprise.
RDF (Resource Description Framework) Why?. XML XML is a metalanguage that allows users to define markup XML separates content and structure from formatting.
Piero Attanasio mEDRA: the European DOI agency The DOI as a tool for interoperability between private and public sector Athens, 14 January.
An Introduction to the Resource Description Framework Eric Miller Online Computer Library Center, Inc. Office of Research Dublin, Ohio 元智資工所 系統實驗室 楊錫謦.
1 XML as a preservation strategy Experiences with the DiVA document format Eva Müller, Uwe Klosa Electronic Publishing Centre Uppsala University Library,
A centre of expertise in digital information management The MEG Metadata Schemas Registry Pete Johnston, Research Officer (Interoperability),
Integrating Live Plant Images with Other Types of Biodiversity Records Steve Baskauf Vanderbilt Dept. of Biological Sciences
DOI Workshop, Luxembourg - 20 May Identifiers in Context Andy Powell UKOLN University of Bath UKOLN.
European Endeavor Users Group Meeting Helsinki, Sept Esa-Pekka Keskitalo, System Analyst Helsinki University Library OpenURL 1.0.
A centre of expertise in digital information management RDN, e-Prints UK and NOF- Digitise: a (very) small sample of UK OAI activity Andy.
The Resource Discovery Network and OAI Andy Powell UKOLN, University of Bath UKOLN is funded by Resource: The Council.
Towards a semantic web Philip Hider. This talk  The Semantic Web vision  Scenarios  Standards  Semantic Web & RDA.
Semantic Web - an introduction By Daniel Wu (danielwujr)
DNER Architecture Andy Powell 6 March 2001 UKOLN, University of Bath UKOLN is funded by Resource: The Council for.
Accessing a national digital library: an architecture for the UK DNER Andy Powell ELAG 2001, Prague 7 June 2001 UKOLN, University of Bath
RDF, XML and interoperability Managing networks : understanding new technologies, Birmingham, 13 September 2001 Pete Johnston UKOLN, University of Bath.
EEL 5937 Ontologies EEL 5937 Multi Agent Systems Lecture 5, Jan 23 th, 2003 Lotzi Bölöni.
1 Dublin Core & DCMI – an introduction Some slides are from DCMI Training Resources at:
TDWG Life Sciences Identifiers Applicability Statement Ben Richardson Review Manager, LSID Applicability Statement Western Australian Herbarium Department.
ECE450 - Software Engineering II1 ECE450 – Software Engineering II Today: Introduction to Software Architecture.
SCHEMAS Workshop Bath - May 2000 Andy Powell, UKOLN Example tool/registry integration UKOLN is funded by Resource: The Council.
Introduction to the Semantic Web and Linked Data Module 1 - Unit 2 The Semantic Web and Linked Data Concepts 1-1 Library of Congress BIBFRAME Pilot Training.
A centre of expertise in digital information management UKOLN is supported by: Metadata for the People’s Network Discovery Service PNDS.
Metadata : an overview XML and Educational Metadata, SBU, London, 10 July 2001 Pete Johnston UKOLN, University of Bath Bath, BA2 7AY UKOLN is supported.
Globally Unique Identifiers in Biodiversity Informatics Kevin Richards Landcare Research NZ TDWG 2008.
4 way comparison of Data Citation Principles: Amsterdam Manifesto, CoData, Data Cite, Digital Curation Center FORCE11 Data Citation Synthesis Group Should.
Pete Johnston, Eduserv Foundation 16 April 2007 An Introduction to the DCMI Abstract Model JISC.
Winter 2011SEG Chapter 11 Chapter 1 (Part 1) Review from previous courses Subject 1: The Software Development Process.
EEL 5937 Ontologies EEL 5937 Multi Agent Systems Lotzi Bölöni.
Internet Literacy Evaluating Web Sites. Objective The Student will be able to evaluate internet web sites for accuracy and reliability The Student will.
UKOLN is supported by: Content packaging and MPEG-21 DID Andy Powell, UKOLN, University of Bath JISC Joint Programmes Meeting, July.
4 way comparison of Data Citation Principles: Amsterdam Manifesto, CoData, Data Cite, Digital Curation Center FORCE11 Data Citation Synthesis Group.
1 RDF, XML & interoperability Metadata : a reprise Communities, communication & XML An introduction to RDF RDF, XML and interoperability.
Differences and distinctions: metadata types and their uses Stephen Winch Information Architecture Officer, SLIC.
Semantic Data Extraction for B2B Integration Syntactic-to-Semantic Middleware Bruno Silva 1, Jorge Cardoso 2 1 2
DC Architecture WG meeting Wednesday Seminar Room: 5205 (2nd Floor)
A centre of expertise in digital information management UKOLN is supported by: What are the Barriers to Web Resource Preservation?
A centre of expertise in digital information management UKOLN is supported by: IEMSR, the Information Environment & Metadata Application.
Linked Data Publishing on the Semantic Web Dr Nicholas Gibbins
A centre of expertise in digital information management 10 minute practical guide to the JISC Information Environment (for publishers!)
Metadata Schema Registries: background and context MEG Registry Workshop, Bath, 21 January 2003 Rachel Heery UKOLN, University of Bath Bath, BA2 7AY UKOLN.
Geospatial metadata Prof. Wenwen Li School of Geographical Sciences and Urban Planning 5644 Coor Hall
Building the Semantic Web
REST- Representational State Transfer Enn Õunapuu
Internet Literacy Evaluating Web Sites.
COMP 150-IDS: Internet Scale Distributed Systems (Spring 2016)
Presentation transcript:

UKOLN is supported by: To name: persistently: ay, there’s the rub Andy Powell, UKOLN, University of Bath DCC Persistent Identifiers Workshop, University of Glasgow – June a centre of expertise in digital information management

DCC Persistent Identifiers2 Contents beginning middle end chance for discussion note: middle section will focus on technical functional requirements

DCC Persistent Identifiers3 Overall theme the only useful form of identifier is the URI, the only useful form of URI is one that conforms to a registered scheme, despite appearances to the contrary… …the most useful registered scheme is the ‘http’ URI scheme the only useful form of identifier is the URI, the only useful form of URI is one that conforms to a registered scheme, despite appearances to the contrary… …the most useful registered scheme is the ‘http’ URI scheme note use of “is” in the first line… “can be mapped to a URI” is not good enough! digital

DCC Persistent Identifiers4 Introduction PI meetings often try to focus on functional requirements… uniqueness, persistence, resolvability, usability, transportability, simplicity of assignment, applicability to digital and non- digital resources, cost, blah, blah, blah… difficult, because requirements are abstract… not clear how to meet them in practice - difficult to move forward forget abstract functional requirements… let’s get technical

DCC Persistent Identifiers5 Space/time continuum time Internet space my application “Internet space” represents some combination of geographic / network distance and domain / administration / application distance… “time” represents time…

DCC Persistent Identifiers6 Space/time continuum time Internet space applications that are closely related in terms of space or time likely to share understanding about identifiers – often by hardwiring knowledge into code my application other application other application

DCC Persistent Identifiers7 Space/time continuum time Internet space applications that are “distant” are less likely to share understanding about identifiers knowledge locked within domain or lost over time or, worse, both my application other application other application

DCC Persistent Identifiers8 Pushing the boundaries how do we push the boundaries of identifier understanding further out across the space/time continuum? –standards, standards, standards –go with the crowd –stop telling people existing stuff is broken… or at least, we stop pretending that we can do better! –use what already works and is widely deployed –focus on existing technical standards –stop inventing, start doing

DCC Persistent Identifiers9 W3C Web Architecture Global Identifiers - Global naming leads to global network effects. (Principle) Identify with URIs - To benefit from and increase the value of the World Wide Web, agents should provide URIs as identifiers for resources. (Good practice) URIs Identify a Single Resource - Assign distinct URIs to distinct resources. (Constraint) Avoiding URI aliases - A URI owner SHOULD NOT associate arbitrarily different URIs with the same resource. (Good practice) Consistent URI usage - An agent that receives a URI SHOULD refer to the associated resource using the same URI, character-by- character. (Good practice) Reuse URI schemes - A specification SHOULD reuse an existing URI scheme (rather than create a new one) when it provides the desired properties of identifiers and their relation to resources. (Good practice) URI opacity - Agents making use of URIs SHOULD NOT attempt to infer properties of the referenced resource. (Good practice) Global Identifiers - Global naming leads to global network effects. (Principle) Identify with URIs - To benefit from and increase the value of the World Wide Web, agents should provide URIs as identifiers for resources. (Good practice) URIs Identify a Single Resource - Assign distinct URIs to distinct resources. (Constraint) Avoiding URI aliases - A URI owner SHOULD NOT associate arbitrarily different URIs with the same resource. (Good practice) Consistent URI usage - An agent that receives a URI SHOULD refer to the associated resource using the same URI, character-by- character. (Good practice) Reuse URI schemes - A specification SHOULD reuse an existing URI scheme (rather than create a new one) when it provides the desired properties of identifiers and their relation to resources. (Good practice) URI opacity - Agents making use of URIs SHOULD NOT attempt to infer properties of the referenced resource. (Good practice)

DCC Persistent Identifiers10 URIs and XML in order for identifiers to work across the space/time continuum we need –global and unambiguous identifiers –global and unambiguous ways of exchanging identifiers between software applications the Uniform Resource Identifier is the only option for the former XML is the “best” option for the latter –and in particular the XML Schema AnyURI datatype “global” means “very widely deployed technology” – e.g. in my mum’s house! “global” means “very widely deployed technology” – e.g. in my mum’s house!

DCC Persistent Identifiers11 Use URIs the only useful form of identifier is the URI, the only useful form of URI is one that conforms to a registered scheme, despite appearances to the contrary… …the most useful registered scheme is the ‘http’ URI scheme the only useful form of identifier is the URI, the only useful form of URI is one that conforms to a registered scheme, despite appearances to the contrary… …the most useful registered scheme is the ‘http’ URI scheme digital

DCC Persistent Identifiers12 URI scheme registration registration of URI schemes is important registration helps to ensure uniqueness without registration the same scheme can be used in ignorance by someone, somewhere else in the space/time continuum registration doesn’t guarantee that every URI with a scheme will be unique – but it helps! without registration there are no guarantees of uniqueness or persistence

DCC Persistent Identifiers13 Use registered URI schemes the only useful form of identifier is the URI, the only useful form of URI is one that conforms to a registered scheme, despite appearances to the contrary… …the most useful registered scheme is the ‘http’ URI scheme the only useful form of identifier is the URI, the only useful form of URI is one that conforms to a registered scheme, despite appearances to the contrary… …the most useful registered scheme is the ‘http’ URI scheme digital

DCC Persistent Identifiers14 Semantic Web the Semantic Web relies on URIs to identify resources resources == stuff (digital/physical/conceptual things) the semantic Web is built on a global, shared body of metadata (RDF) terms in the metadata language are identified using URIs those URIs must be “resolvable”… in order that “reasoning” can be performed

DCC Persistent Identifiers15 Note: dereferencing URIs the Web Architecture talks about “dereferencing” URIs rather than “resolving” them –in many cases “dereferencing” a URI results in obtaining a “representation” of the resource –several representations may be available the Web Architecture says: only ‘http’ URIs offer simple, widely deployed dereferencing mechanism Available representation - A URI owner SHOULD provide representations of the resource it identifies (Good practice)

DCC Persistent Identifiers16 Quick quiz… what kind of identifier is this? – is an ISSN it identifies UKOLN’s Ariadne magazine

DCC Persistent Identifiers17 Quick quiz… what kind of identifier is this? – –info:lccn/n is an ‘info’ URI it identifies a Library of Congress metadata record (an authority file) but I don’t know which

DCC Persistent Identifiers18 Quick quiz… what kind of identifier is this? – –info:lccn/n – /182 is a DOI it is also a Handle it identifies the “DOI Handbook”

DCC Persistent Identifiers19 Quick quiz… what kind of identifier is this? – –info:lccn/n – /182 –79 Ceti is a Flamsteed Designation it identifies a 7 th magnitude star in the constellation of Cetus

DCC Persistent Identifiers20 Quick quiz… what kind of identifier is this? – –info:lccn/n – /182 –79 Ceti – is an ‘http’ URI a.k.a. a URL it is also a PURL it identifies a DCMI metadata term – i.e. a conceptual resource

DCC Persistent Identifiers21 Quick quiz… what kind of identifier is this? – –info:lccn/n – /182 –79 Ceti – only one of these can be understood and dereferenced by every single bit of currently deployed Internet software… Question: why would we want to use anything else? Question: why would we want to use anything else?

DCC Persistent Identifiers22 But… ‘http’ URIs are identifiers, just like any other ‘http’ URIs can identify any resource – digital, physical or conceptual ‘http’ URIs don’t have to break, they just need to be assigned/managed carefully But, ‘http’ URIs are just locators aren’t they? But, ‘http’ URIs can only be used for Web resources, accessed over HTTP, can’t they? But, ‘http’ URIs break every 30 days or something, don’t they?

DCC Persistent Identifiers23 Use ‘http’ URIs the only useful form of identifier is the URI, the only useful form of URI is one that conforms to a registered scheme, despite appearances to the contrary… …the most useful registered scheme is the ‘http’ URI scheme the only useful form of identifier is the URI, the only useful form of URI is one that conforms to a registered scheme, despite appearances to the contrary… …the most useful registered scheme is the ‘http’ URI scheme

DCC Persistent Identifiers24 Case study 1 - LOM XML example from IEEE LOM… typical of many XML / identifier encodings … DOI /182 … … DOI /182 … …

DCC Persistent Identifiers25 … URI … … URI … … Case study 1 - LOM the “catalogue” indicates what kinds of identifier is being used just a string nothing in the XML schema indicates that this is a URI – some applications will be blind

DCC Persistent Identifiers26 Case study 1 - LOM a improved version might be… … … … … … where the XML schema indicates that this is of datatype AnyURI therefore all XML-aware applications will know this is a URI

DCC Persistent Identifiers27 Case study 1 - LOM a improved version might be… … doi: /182 … … doi: /182 … … the URI syntax provides the “catalogue” from the original example all XML-aware applications will know this is a URI, some will know it is a DOI

DCC Persistent Identifiers28 Case study 2 - DOI the DOI “ /182” can be encoded as a URI in several ways: – –doi: /182 –urn:doi: /182 however… –DOI-aware applications have to have knowledge of these encodings hard-coded into them (since the DOI itself is just a string) –nothing in the URI specification indicates that these URIs are equivalent –note that the 2 nd and 3 rd forms are not registered Question: which of these forms is most persistent and why?

DCC Persistent Identifiers29 Case study 3 – ‘info’ URI consider the following ‘info’ URI: –info:lccn/n ‘info’ URIs are explicitly defined to be non-dereferencable therefore, there is no documented way of finding out what this URI identifies there is no documented way of getting a representation of the resource it identifies and there is no documented way of finding out any more about it Question: how is this useful?

DCC Persistent Identifiers30 But, what happens when… …the Internet disappears? who cares! we’ll deal with it we’ll be with the crowd there’ll be a global transition everyone will need to deal with it every software component on the whole Internet will need fixing the people left behind will be the people who invented their own solutions

DCC Persistent Identifiers31 Conclusion the only useful form of identifier is the URI, the only useful form of URI is one that conforms to a registered scheme, the most useful registered scheme is the ‘http’ URI scheme URIs are only really useful if encoding syntaxes explicitly indicate “this is a URI” the only useful form of identifier is the URI, the only useful form of URI is one that conforms to a registered scheme, the most useful registered scheme is the ‘http’ URI scheme URIs are only really useful if encoding syntaxes explicitly indicate “this is a URI” digital

DCC Persistent Identifiers32 Questions and discussion?

DCC Persistent Identifiers33 Discussion What do we need to do to make ‘http’ URIs more persistent? (Are ‘http’ URIs the answer after all?) Do we have functional requirements that aren’t met by ‘http’ URIs? (When and why should we create new URI schemes?)