Globally Unique Identifiers: What, why, when, which and what now? Dave Thau University of Kansas



WHAT? Why? When? Which? What now? (is a GUID?)

GUIDs in the World

More GUIDs
Patent numbers: (laser guided cat exercise)
GenBank accession numbers: AP
Digital Object Identifier: /3212
Life Science ID: urn:lsid:pdb.org:1AFT:1

Common Features of GUIDs
A short name for a complex entity
Useful for locating information about the entity
Each name identifies only one entity
There is some sense of permanence

Differences Between GUIDs
Can an item have more than one GUID?
–Patents, no
–GenBank accession numbers, yes
–Web URLs, YES
Is issuance of the GUIDs at all decentralized?
–Patents, no
–ISBNs, yes (publishers get a block)
–LSIDs, YES (there’s no central control)

What? WHY? When? Which? What now? (do we want them for taxonomic concepts?)

IDs and the TES … … Plantae Genus

Goals for a GUID
Useful internally for systems dealing with data objects (e.g. databases).
Useful for communicating between separate, unaffiliated systems which deal with data objects.
Integration with other communities
Typical GUID goals
–Short
–Permanent
–Unique
–Resolvable

Why not use…
Existing database IDs (e.g. IPNI, ITIS)?
–Most are currently name based
–Can’t guarantee permanence
Taxon_author_year_publication
–Very long!
–How do you represent publication?
–How do you deal with non-ASCII characters?
–Just a name is not resolvable
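The objections to a name-based key can be made concrete with a small sketch. The helper `name_based_key` and the sample record below are illustrative assumptions, not part of the talk; the short ID `t3_17555` is the local identifier used in the talk's own examples.

```python
def name_based_key(taxon, author, year, publication):
    """Join the four fields with underscores, replacing spaces inside each field."""
    fields = (taxon, author, str(year), publication)
    return "_".join(field.replace(" ", "_") for field in fields)


# Illustrative record only; how to write the publication is itself an open choice.
key = name_based_key("Quercus agrifolia", "Née", 1801, "Anales de Ciencias Naturales")
short_id = "t3_17555"

print(key)
print(len(key), "characters versus", len(short_id))
print("pure ASCII:", key.isascii())  # False here because of the accented author name
```

The key is several times longer than the short ID, depends on how the publication is spelled, and contains a non-ASCII character. Nothing in the string says where to look the entity up.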

What? Why? WHEN? Which? What now? (should a GUID be assigned?)

What gets a GUID?
Taxonomic Concepts
Publications
Specimens
Data Providers?
Authors?
Journals?

When is a GUID assigned?
When a “new” concept is added
–How do you define concept for the system?
–When is a concept new enough to get a new GUID?
–What minor changes are allowable?

Examples of a New Concept
A revision adds a new species to a genus
–The species is a new concept
–So is the genus
A revision adds a synonym to a taxon
A flora misspells a scientific name

Do These Get New Concepts?
Page numbers in the reference are wrong
The journal title is misspelled
The author’s name is misspelled
Solution: Give the data provider the choice and trust!

What? Why? When? WHICH? What now? (kind of GUID?)

Candidates
Home Grown Web Service
–t3_17555
GRID resource locator
–ecogrid://ku.edu/tcs/t3_17555
LSID
–urn:lsid:ku.edu:tcs:t3_17555
Handle System
–1883/t3_17555
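The four candidates differ only in how they wrap the same local ID. A minimal sketch that reproduces the strings on this slide; the function name and its parameter names are hypothetical, and the defaults are simply the values shown above.

```python
def candidate_identifiers(local_id, authority="ku.edu", namespace="tcs", handle_prefix="1883"):
    """Render one local ID in each of the four candidate syntaxes from the slide."""
    return {
        "home_grown": local_id,
        "ecogrid": f"ecogrid://{authority}/{namespace}/{local_id}",
        "lsid": f"urn:lsid:{authority}:{namespace}:{local_id}",
        "handle": f"{handle_prefix}/{local_id}",
    }


for scheme, identifier in candidate_identifiers("t3_17555").items():
    print(f"{scheme:>10}  {identifier}")
```

Note that the ecogrid and LSID forms embed the issuer's domain name, while the Handle form carries only a numeric prefix.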

LSID
urn:lsid:pdb.org:1AFT:1
Backed by IBM, HP, UCSD, Chiron, U. Manchester, to name a few
Uses web services protocols
Support for caching, authentication, and metadata about the service
Completely decentralized
Specification under 2 years old
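An LSID is a colon-separated URN. The sketch below assumes the layout urn:lsid:authority:namespace:object with an optional trailing revision, as described in the LSID specification; the `Lsid` type and `parse_lsid` function are hypothetical names, and the parser only splits the string without contacting any resolver.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str           # usually the issuer's domain name
    namespace: str           # a collection chosen by the authority
    object_id: str           # the local ID within that namespace
    revision: Optional[str]  # optional version of the object


def parse_lsid(text):
    """Split an LSID string into its parts; raise ValueError if it is malformed."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 colon-separated parts: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID URN: {text!r}")
    if any(part == "" for part in parts[2:]):
        raise ValueError(f"empty component in {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return Lsid(parts[2], parts[3], parts[4], revision)


print(parse_lsid("urn:lsid:ku.edu:tcs:t3_17555"))
```

Because the authority is part of the identifier, anyone who controls a domain name can issue LSIDs, which is what makes the scheme decentralized.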

The Handle System
1883/t3_17555
Underlies DOIs
–Many journals use them (Nature, ACM)
–More than 11 million DOIs are in use
Mature – specification 10 years old
Proprietary central system assigns and resolves prefixes
Doesn’t use internet standards
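A handle has two parts: a prefix assigned centrally and a suffix chosen by the prefix holder, separated by the first slash. A minimal sketch of that split follows. The function names are hypothetical, and the `hdl.handle.net` web proxy is used here as an assumption about how a reader might resolve a handle over HTTP; the code only builds the URL and makes no claim that this particular handle is registered.

```python
from urllib.parse import quote


def split_handle(handle):
    """Split a handle at the first slash into (prefix, suffix)."""
    prefix, separator, suffix = handle.partition("/")
    if not separator or not prefix or not suffix:
        raise ValueError(f"expected prefix/suffix, got {handle!r}")
    return prefix, suffix


def proxy_url(handle, proxy="https://hdl.handle.net/"):
    """Build a proxy URL for a handle; the proxy address is an assumption."""
    split_handle(handle)  # validate the shape before building the URL
    return proxy + quote(handle, safe="/")


print(split_handle("1883/t3_17555"))
print(proxy_url("1883/t3_17555"))
```

Unlike an LSID, the handle string does not name its issuer, so the prefix must be looked up through the central system to find who resolves it.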

For Our First Prototype
Functionally, they’re very similar
We’re going with the Handle System
–More mature
–Better built-in authentication methods
–Easier to separate handles from issuers
–More likely to be accepted by publishers

What? Why? When? Which? WHAT NOW?

Challenges and Questions
Specification and Implementation!
When is one concept different from another?
Can there be more than 1 GUID per concept?
What will encourage people to assign and utilize GUIDs?

Topics For Discussion
Are GUIDs really necessary?
Are there alternative GUID systems?
Is the “only 1 GUID per concept” rule necessary?
What will encourage people to use and assign GUIDs?