Life Science Identifiers Chris Wroe (based on material from myGrid team and IBM Life Sciences)


The Issue
Each database on the web has:
– Different policies for assigning and maintaining identifiers, dealing with versioning, etc.
– A different mechanism for retrieving an item given an ID.
LSIDs are designed to harmonise the retrieval of data.

What is an LSID?
An OMG standard:
urn:lsid:AuthorityID:NamespaceID:ObjectID:RevisionID
urn:lsid:ncbi.nlm.nih.gov:GenBank:T48601:2
– LSID Designator: A mandatory preface (urn:lsid:) that notes that the item being identified is a life science-specific resource
– Authority Identifier: An Internet domain owned by the organization that assigns an LSID to a resource
– Namespace Identifier: The name of the resource (e.g., a database) chosen by the assigning organization
– Object Identifier: The unique name of an item (e.g., a gene name or a publication tracking number) as defined within the context of a given database
– Revision Identifier: An optional parameter to keep track of different versions of the same item
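The field layout above can be illustrated with a small parser. This is a minimal sketch rather than a reference implementation: the names `LSID` and `parse_lsid` are made up for this example, and the checks are limited to what the slide states (five mandatory colon-separated fields plus an optional revision).

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    """The components named on the slide (hypothetical container type)."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split an LSID string into authority, namespace, object and revision."""
    parts = text.split(":")
    # Expect: urn, lsid, authority, namespace, object, and an optional revision.
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 colon-separated fields: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"empty component in: {text!r}")
    # The revision is optional; treat a missing or empty field as "no revision".
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return LSID(authority, namespace, object_id, revision)


example = parse_lsid("urn:lsid:ncbi.nlm.nih.gov:GenBank:T48601:2")
print(example)
```

A strict parser like this rejects any string that lacks one of the mandatory fields, which is one possible design choice; a more lenient client could treat the LSID as an opaque string and read only the authority field.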

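How is data retrieved?
[Diagram: Application, LSID client, PDB authority at pdb.org, PDB data resolver, PDB metadata resolver, PDB database]
1. Application to LSID client: Get me info for urn:lsid:pdb.org:1AFT
2. LSID client to the PDB authority at pdb.org: Where can I get data and metadata for urn:lsid:pdb.org:1AFT?
3. LSID client to the PDB data resolver and PDB metadata resolver (in front of the PDB database): Get me the data and metadata for urn:lsid:pdb.org:1AFT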
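The exchange on this slide can be sketched in a few lines. The example below is an illustration under stated assumptions, not the real LSID client API: in-memory dictionaries stand in for the network exchanges, and `AUTHORITIES`, `PDB_DATABASE`, `authority_of` and `resolve` are hypothetical names. The LSID is used in the shortened form shown on the slide and is treated as an opaque string apart from its authority field.

```python
from typing import Callable, Dict, Tuple

# A resolver is modelled as a plain function from an LSID string to a payload.
Resolver = Callable[[str], str]

# Hypothetical stand-in for the PDB database behind the two resolvers.
PDB_DATABASE: Dict[str, Tuple[str, str]] = {
    "urn:lsid:pdb.org:1AFT": ("<placeholder data>", "<placeholder metadata>"),
}


def pdb_data_resolver(lsid: str) -> str:
    """Return the data stored for the LSID."""
    return PDB_DATABASE[lsid][0]


def pdb_metadata_resolver(lsid: str) -> str:
    """Return the metadata stored for the LSID."""
    return PDB_DATABASE[lsid][1]


# Hypothetical registry: each authority points to its data and metadata resolvers.
AUTHORITIES: Dict[str, Tuple[Resolver, Resolver]] = {
    "pdb.org": (pdb_data_resolver, pdb_metadata_resolver),
}


def authority_of(lsid: str) -> str:
    """Read the authority field (third colon-separated field) of an LSID."""
    parts = lsid.split(":")
    if len(parts) < 4 or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {lsid!r}")
    return parts[2]


def resolve(lsid: str) -> Tuple[str, str]:
    """Client side: ask the authority for resolvers, then ask the resolvers."""
    # Ask the authority where data and metadata for this LSID can be obtained.
    data_resolver, metadata_resolver = AUTHORITIES[authority_of(lsid)]
    # Ask those resolvers for the data and the metadata themselves.
    return data_resolver(lsid), metadata_resolver(lsid)


# The application's request to the LSID client.
data, metadata = resolve("urn:lsid:pdb.org:1AFT")
print(data, metadata)
```

Keeping the authority lookup separate from the resolver calls mirrors the indirection on the slide: the application only needs the LSID, and the authority can repoint to different resolvers without the identifier changing.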

Authority Commitments
– Data returned for a given LSID must always be the same
– Must always maintain an authority at e.g. pdb.org that can point to data and metadata resolvers.
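One way a data provider might enforce the first commitment is to refuse to rebind an LSID to different bytes and to require a new revision instead. The sketch below assumes a hypothetical in-memory `ImmutableLsidStore` and made-up `example.org` identifiers; it is not taken from the IBM implementation.

```python
import hashlib
from typing import Dict


class ImmutableLsidStore:
    """Hypothetical store that never changes the bytes behind an LSID."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def publish(self, lsid: str, data: bytes) -> str:
        """Bind data to an LSID and return a SHA-256 digest of the data."""
        existing = self._data.get(lsid)
        if existing is not None and existing != data:
            # Changed content must be published under a new revision.
            raise ValueError(f"{lsid} is already bound to different data")
        self._data[lsid] = data
        return hashlib.sha256(data).hexdigest()

    def get(self, lsid: str) -> bytes:
        """Return the bytes bound to the LSID."""
        return self._data[lsid]


store = ImmutableLsidStore()
first = store.publish("urn:lsid:example.org:demo:item1:1", b"version one")
again = store.publish("urn:lsid:example.org:demo:item1:1", b"version one")
second = store.publish("urn:lsid:example.org:demo:item1:2", b"version two")
print(first == again, first == second)
```

Republishing identical bytes is allowed, so the operation is idempotent; only an attempt to change the content behind an existing LSID is rejected.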

LSID Components
– IBM built client and server implementations in Perl, Java, C++
– Fairly simple to wrap an existing database as a source of data or metadata
– Client also simple to use
– LSID Launchpad adds LSID resolution to Internet Explorer

LSID Launchpad

Use within myGrid
– Needed an identifier for things such as workflows, experiments, new data results, etc.
– Everything is identified with LSIDs!
– LSID saves us having to invent our own conventions and code.
– Can pass references to data around and be reassured the other party will know how to resolve that reference

Haystack demonstrator
– GenBank record
– Provenance of results
– Managing collection of sequences for review

Caveats
– To be successful across the community, LSID will require widespread adoption by data providers such as GenBank, UniProt, etc.
– Status of such adoption is unclear