LSIDs in a Nutshell
Jun Zhao, University of Manchester
1st December, 2005
Any idea? gi: THE LSID
Outline
What is an LSID
Why do we need LSIDs
How does it work
What is available from your LSID comrades
How is it working in myGrid
Questions
LSID: Life Science Identifier
Clark T., Martin S., Liefeld T. Globally Distributed Object Identification for Biological Knowledgebases. Briefings in Bioinformatics 5.1:59-70, March 1,
A URN (Uniform Resource Name)
A standard from the OMG LSR group
A detailed specification:
URN
URI
–Uniform Resource Identifiers
–Can be further classified as URL & URN
URL
–Uniform Resource Locators
–identifying a place where a resource may reside
–a representation of a primary access mechanism
URN
–required to remain globally unique and persistent even when the resource ceases to exist or becomes unavailable
Tim Berners-Lee. Uniform Resource Identifiers (URI): Generic Syntax
Five part schema
A five-part format: urn:lsid:Authority:Namespace:Object_ID[:Revision-ID]
For example:
urn:lsid:ncbi.nlm.nih.gov:pubmed: refers to a PubMed article
urn:lsid:ncbi.nlm.nih.gov:genbank:T48601:2 refers to the second version of an entry in GenBank
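The format above can be checked mechanically. The following is a minimal Python sketch, not part of the LSID specification: the names LSID and parse_lsid are made up for illustration, and the validation is deliberately loose (it only counts colon-separated parts and checks the urn:lsid: prefix).

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    """Hypothetical container for the named parts of an LSID."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split an LSID string into its named parts, or raise ValueError."""
    parts = text.strip().split(":")
    # urn, lsid, authority, namespace, object id, plus an optional revision id
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 colon-separated parts, got {len(parts)}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError("an LSID must start with 'urn:lsid:'")
    authority, namespace, object_id = parts[2], parts[3], parts[4]
    if not (authority and namespace and object_id):
        raise ValueError("authority, namespace and object id must be non-empty")
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return LSID(authority, namespace, object_id, revision)


example = parse_lsid("urn:lsid:ncbi.nlm.nih.gov:genbank:T48601:2")
print(example.namespace, example.object_id, example.revision)
```

Keeping the revision as an optional string (rather than an integer) avoids assuming that every authority numbers its revisions.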
Motivation
Making your local publications globally available
Persistent
Open source
–Anyone can become an LSID registration agency
–No central third-party registration agency is required, and there are no fees to pay
Linking with other database sources: NCBI protein/nucleotide DBs, PubMed, UniProt/SwissProt, GO terms ……
How does it work
[Figure: LSID resolution. Components: Client, LSID Authority, Data Store, Metadata Store. Arrow labels: urn:lsid:, WSDL script, operation calls (http, ftp and soap), returned results.]
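One plausible reading of this slide is: the client hands an LSID to its authority, gets back a service description (the WSDL script), then calls the described operations to fetch data and metadata. The Python sketch below models that reading with in-memory objects only. Everything in it is an assumption for illustration: ToyAuthority, ServiceDescription, resolve, the example.org authority and the protocol lists are invented here, and no real WSDL or network call is involved.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ServiceDescription:
    # Stands in for the WSDL script: which protocols serve data and metadata.
    data_protocols: List[str]
    metadata_protocols: List[str]


@dataclass
class ToyAuthority:
    # Hypothetical in-memory authority sitting in front of two stores.
    data_store: Dict[str, bytes] = field(default_factory=dict)
    metadata_store: Dict[str, str] = field(default_factory=dict)

    def describe(self, lsid: str) -> ServiceDescription:
        if lsid not in self.data_store:
            raise KeyError(f"unknown LSID: {lsid}")
        # Protocol choices are made up for this sketch.
        return ServiceDescription(["http", "ftp"], ["soap"])

    def get_data(self, lsid: str) -> bytes:
        return self.data_store[lsid]

    def get_metadata(self, lsid: str) -> str:
        return self.metadata_store[lsid]


def resolve(lsid: str, authorities: Dict[str, ToyAuthority]) -> Tuple[ServiceDescription, bytes, str]:
    """Find the authority named in the LSID, then ask it for data and metadata."""
    authority_name = lsid.split(":")[2]       # third part of urn:lsid:Authority:...
    authority = authorities[authority_name]   # step 1: locate the authority
    services = authority.describe(lsid)       # step 2: obtain the service description
    return services, authority.get_data(lsid), authority.get_metadata(lsid)  # step 3: operation calls


demo_lsid = "urn:lsid:example.org:demo:1"
authorities = {"example.org": ToyAuthority({demo_lsid: b"ACGT"}, {demo_lsid: "a toy sequence"})}
services, data, metadata = resolve(demo_lsid, authorities)
print(services.data_protocols, data, metadata)
```

Data and metadata are kept in separate stores here because the figure shows them as separate components.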
LSID resources
Who is using them
–BioMOBY
–Aventis
–BioImage
–Haystack, the first Semantic Web browser, based on Eclipse (haystack.lcs.mit.edu)
myGrid
An e-Science project for bioinformaticians and biologists
A set of middleware services
Based on 3 molecular scenarios
A successful workflow workbench: Taverna
Hosting 1,800 bio-services
We finished but we will continue
LSIDs in myGrid
Motivation
–Uniquely and persistently identifying myGrid internal resources
–Separating data and metadata
–Applying a compatible standard
–Integrating with resources in the open world
LSIDs and RDF (Resource Description Framework)
[Figure: RDF graph. Labels in the figure: urn:lsid:taverna.sf.net:datathing:45fg6, urn:lsid:ncbi.nlm.nih.gov.lsid.biopathways.org:genbank_gi:, #contains_similar_sequence_to, report, sequence, NA_sequence.]
LSIDs in action
[Figure: architecture. Components: Taverna Workbench, workflow design, user context, Freefluo Enactor, services, LSID Assigning Service, store plug-in, metadata plug-in, metadata store, mIR, LSID Metadata Resolver, LSID Data Resolver, LSID Authority, client application.]
1. Data sent/received from services
2. New LSIDs assigned to data
3. Data / metadata stored
4. Data and metadata retrieved
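Steps 2 to 4 can be sketched in a few lines of Python. This is an illustration only and not the myGrid implementation: LsidAssigner and record_result are hypothetical names, the stores are plain dictionaries, and generating object ids with uuid4 is an assumption (the slides do not say how ids are produced). The authority and namespace strings are borrowed from the taverna.sf.net example LSID shown earlier in the deck.

```python
import uuid
from typing import Dict


class LsidAssigner:
    """Hypothetical stand-in for an LSID assigning service."""

    def __init__(self, authority: str, namespace: str) -> None:
        self.authority = authority
        self.namespace = namespace

    def new_lsid(self) -> str:
        # Assumption: a random UUID is unique enough to serve as the object id.
        return f"urn:lsid:{self.authority}:{self.namespace}:{uuid.uuid4().hex}"


def record_result(assigner: LsidAssigner,
                  data_store: Dict[str, bytes],
                  metadata_store: Dict[str, dict],
                  payload: bytes,
                  produced_by: str) -> str:
    """Assign a fresh LSID to a service result, then store data and metadata under it."""
    lsid = assigner.new_lsid()                           # step 2: new LSID assigned
    data_store[lsid] = payload                           # step 3: data stored
    metadata_store[lsid] = {"produced_by": produced_by}  # step 3: metadata stored separately
    return lsid


data_store: Dict[str, bytes] = {}
metadata_store: Dict[str, dict] = {}
assigner = LsidAssigner("taverna.sf.net", "datathing")

lsid = record_result(assigner, data_store, metadata_store, b"ACGT", "some_service")
# step 4: both halves are retrieved later through the same identifier
print(lsid, data_store[lsid], metadata_store[lsid])
```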
View LSIDs
LSID ≠ URL
An LSID is a URN
–Identifying a resource by its name, instead of its location
–Persistency (theoretically??)
–Legacy support
Multiple protocols: http, ftp, file systems, soap…
Your responsibility
Unique authority id
Unique object and revision ids within your namespace
Never reassign an LSID
Persistently identifying your data
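The "never reassign" rule is easy to enforce on the authority side. The Python sketch below shows one possible guard, under stated assumptions: LsidRegistry is a hypothetical class, it keeps its bindings in memory, and it compares SHA-256 digests to decide whether two payloads are the same data.

```python
import hashlib
from typing import Dict


class LsidRegistry:
    """Hypothetical guard that refuses to bind one LSID to two different payloads."""

    def __init__(self) -> None:
        self._digests: Dict[str, str] = {}

    def register(self, lsid: str, content: bytes) -> None:
        digest = hashlib.sha256(content).hexdigest()
        known = self._digests.get(lsid)
        if known is not None and known != digest:
            # Changed data needs a new object or revision id, not a reused LSID.
            raise ValueError(f"{lsid} is already assigned to different data")
        self._digests[lsid] = digest


registry = LsidRegistry()
registry.register("urn:lsid:example.org:demo:1", b"ACGT")
registry.register("urn:lsid:example.org:demo:1", b"ACGT")    # same data again: accepted
registry.register("urn:lsid:example.org:demo:1:2", b"ACGA")  # changed data: new revision id
```

Registering b"ACGA" under the first LSID instead would raise ValueError, which is the point of the guard.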
What is not working
Security
Access control
LSID synonyms
Questions? Thank you!