The DAS Protocol
Andy Jenkinson, EBI
Summary of Topics: technical overview; principles of communication; pros and cons; DAS capabilities.
DAS Architecture: A client asks for data from many servers. HTTP requests use identically structured URLs with the same parameters. Each server behaves in the same way, following a pre-defined set of behaviours (e.g. provide a sequence, provide annotations of a sequence). Each server provides different data in the same format: DAS-XML.
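To make the "many servers, one format" idea concrete, here is a minimal Python sketch of a client sending the same command to several sources. It assumes each source is identified by a base URL ending in /das/<source>, and that `fetch` (a hypothetical callable supplied by the caller) performs the HTTP request and parses the DAS-XML; both are illustrative assumptions, not part of the protocol.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Hypothetical: fetch(url) performs the HTTP GET and returns parsed records.
Fetcher = Callable[[str], List[dict]]


def query_all(
    sources: List[str], command: str, fetch: Fetcher
) -> Tuple[Dict[str, List[dict]], Dict[str, str]]:
    """Send the same command to every source; collect results and failures."""
    results: Dict[str, List[dict]] = {}
    errors: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        # Identically structured URLs: only the source base URL differs.
        futures = {
            src: pool.submit(fetch, f"{src.rstrip('/')}/{command}") for src in sources
        }
        for src, fut in futures.items():
            try:
                results[src] = fut.result()
            except Exception as exc:  # one failing server should not sink the rest
                errors[src] = str(exc)
    return results, errors


# Usage with a fake fetcher (no network); the URLs are made up.
def fake_fetch(url: str) -> List[dict]:
    if "down.example.org" in url:
        raise OSError("connection refused")
    return [{"url": url}]


results, errors = query_all(
    ["http://a.example.org/das/srcA", "http://down.example.org/das/srcB"],
    "features?segment=X:100,200",
    fake_fetch,
)
```

Because every server answers in the same format, the client can treat the per-source results uniformly and simply skip the servers that fail.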
DAS Concepts: Reference object: usually a sequence, e.g. chromosome X or NT_. Annotation: information attached to a location within a segment, e.g. a substitution at residue 326 of BRCA1.
DAS Concepts: Reference server: a server that provides core reference object data, e.g. GRCh37 sequence data. Annotation server: a server that provides annotations of reference objects. Segment: part of a reference object, e.g. bases 100 to 200 of chromosome X; segments tie together annotation and reference servers.
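The relationship between these concepts can be sketched as a pair of Python data classes, assuming 1-based inclusive coordinates; the class and function names are hypothetical. The point is that annotation and reference servers only need to agree on the reference object ID and its coordinates: "which annotations fall in this segment?" then becomes a simple overlap test.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    """Part of a reference object, e.g. bases 100 to 200 of chromosome X.

    Assumption: 1-based, inclusive coordinates.
    """
    ref_id: str
    start: int
    end: int


@dataclass(frozen=True)
class Annotation:
    """Information attached to a location on a reference object."""
    ref_id: str
    start: int
    end: int
    label: str


def overlaps(ann: Annotation, seg: Segment) -> bool:
    """True if the annotation lies at least partly within the segment."""
    return ann.ref_id == seg.ref_id and ann.start <= seg.end and ann.end >= seg.start


def annotations_in(seg: Segment, annotations: List[Annotation]) -> List[Annotation]:
    return [a for a in annotations if overlaps(a, seg)]


subst = Annotation("BRCA1", 326, 326, "substitution")
other = Annotation("X", 150, 160, "hypothetical feature")
hits = annotations_in(Segment("BRCA1", 300, 400), [subst, other])
```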
Architectural Overview
DAS Concepts: Commands: a request for a certain type of data, e.g. sequence, features, sources. DAS Registry: a catalogue of DAS sources; it can be queried programmatically and validates that sources adhere to the protocol.
The DAS Protocol defines 3 constraints: transport layer (HTTP), query format (constrained REST URLs) and response format (constrained XML). Keyword: constrained.
The DAS Protocol, transport layer: data transport uses standard HTTP, including compression, plus some additional headers, e.g. to indicate the DAS version.
The DAS Protocol, query format: well-defined query URLs. A client issues a command by requesting a URL made of four parts: site prefix, DAS source, command and arguments.
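The four URL parts can be assembled mechanically. This Python sketch assumes the layout <site prefix>/das/<source>/<command>?<arguments> implied by the /das/ ... /sequence? examples later in this deck; the function name, site prefix and source names are made up.

```python
from typing import Iterable, Tuple
from urllib.parse import quote


def das_url(
    site_prefix: str, source: str, command: str, args: Iterable[Tuple[str, str]] = ()
) -> str:
    """Build <site prefix>/das/<source>/<command>?<arguments>.

    `args` is a sequence of (name, value) pairs so that a parameter such as
    `segment` can be repeated; pairs are joined with ';' as in these slides.
    """
    base = f"{site_prefix.rstrip('/')}/das/{source}/{command}"
    # Percent-encode values, but keep ':' and ',' readable (ID:start,end).
    query = ";".join(f"{name}={quote(str(value), safe=':,')}" for name, value in args)
    return f"{base}?{query}" if query else base
```

For example, das_url("http://das.example.org", "hypothetical_source", "features", [("segment", "X:100,200"), ("type", "SNP")]) gives http://das.example.org/das/hypothetical_source/features?segment=X:100,200;type=SNP.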
The DAS Protocol, response format: the server responds with a simple XML document.
Why DAS? Fast, targeted queries suitable for visual display. Based on existing simple technology (XML/HTTP/CGI). Dumb server, clever client: a relatively low knowledge barrier for bioinformaticians with data to expose. Scalable: integrators (client software) get more data at zero cost.
Why not DAS? One-dimensional queries: you can query only by sequence position, not by developmental stage, tissue type, etc. (yet). Constrained, generic format: clients aren't tailored to each data source, and the possible data types are to some extent limited. Not semantically rich: ontology support is optional.
Commands: the basics. Sequence: give me the DNA sequence for a given segment of a reference object, e.g. bases 100k – 200k of chromosome 15. Features: give me all annotations offered by the data source that are attached to a given segment of the sequence.
The sequence command: /das/ /sequence? Parameters: segment=ID:start,end (one or more), where ID is the ID of the reference object. Example: /das/ /sequence?segment=X:100,200;segment=Y:500,600
The sequence command. Response:
<SEQUENCE id="X" start="100" stop="200" version="1.0">cctgagccagcagtggcaacccaatggggtccctttcca...</SEQUENCE>
<SEQUENCE id="Y" start="500" stop="600" version="1.0">ctggacagcccggaaaatgagctcctcatctctaaccca...</SEQUENCE>
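A client only needs a few lines to turn such a response into usable sequences. This Python sketch uses the standard library's ElementTree; the DASSEQUENCE wrapper element, the inclusive-coordinate length check and the short sample (its coordinates are made up) are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def parse_sequences(xml_text: str) -> List[Tuple[str, int, int, str]]:
    """Return (id, start, stop, residues) for every SEQUENCE element."""
    root = ET.fromstring(xml_text)
    out = []
    for el in root.iter("SEQUENCE"):
        # Servers may wrap long sequences over several lines: drop whitespace.
        residues = "".join((el.text or "").split())
        start, stop = int(el.get("start")), int(el.get("stop"))
        if len(residues) != stop - start + 1:  # assumes inclusive coordinates
            raise ValueError(f"length mismatch for segment {el.get('id')}")
        out.append((el.get("id"), start, stop, residues))
    return out


# Made-up response; the DASSEQUENCE wrapper element is an assumption.
SAMPLE = """<DASSEQUENCE>
  <SEQUENCE id="X" start="100" stop="111" version="1.0">
    cctgag
    ccagca
  </SEQUENCE>
</DASSEQUENCE>"""

records = parse_sequences(SAMPLE)
```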
The features command: /das/ /features? Parameters: segment=ID:start,end (one or more); type=foo (zero or more); category=bar (zero or more). Example: /das/ /features?segment=X:100,200;segment=Y:500,600;type=SNP
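The selection a features server performs for the example query can be sketched in Python. The Feature record and the category names are hypothetical, and reading repeated type= (or category=) values as "any of these" is an assumption about the semantics rather than a quote from the specification.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Tuple


@dataclass(frozen=True)
class Feature:
    ref_id: str
    start: int
    end: int
    type: str
    category: str


def parse_segment(arg: str) -> Tuple[str, int, int]:
    """Parse 'ID:start,end', e.g. 'X:100,200' -> ('X', 100, 200)."""
    ref_id, _, rng = arg.rpartition(":")
    start, end = rng.split(",")
    return ref_id, int(start), int(end)


def select_features(
    features: Iterable[Feature],
    segments: Sequence[str],
    types: Sequence[str] = (),
    categories: Sequence[str] = (),
) -> List[Feature]:
    """Features overlapping any requested segment, optionally filtered.

    Assumption: several type= (or category=) values mean "any of these";
    no values means no filtering on that field.
    """
    wanted = [parse_segment(s) for s in segments]
    hits = []
    for f in features:
        if types and f.type not in types:
            continue
        if categories and f.category not in categories:
            continue
        if any(f.ref_id == r and f.start <= e and f.end >= s for r, s, e in wanted):
            hits.append(f)
    return hits


FEATURES = [
    Feature("X", 150, 150, "SNP", "variation"),
    Feature("X", 120, 180, "exon", "transcription"),
    Feature("Y", 550, 550, "SNP", "variation"),
    Feature("Y", 900, 900, "SNP", "variation"),
]
snps = select_features(FEATURES, ["X:100,200", "Y:500,600"], types=["SNP"])
```

Here the SNP at Y:900 is excluded because it lies outside Y:500,600, and the exon is excluded by the type filter.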
The features command. Response: an XML document listing the matching annotations, e.g. a feature of type SNP whose method is sequencing.
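A client might flatten a features response into plain records before drawing it. This Python sketch assumes DAS 1.5-style element names (DASGFF, GFF, SEGMENT, FEATURE, TYPE, METHOD, START, END); the sample document is made up. Missing START/END are returned as None, which leaves room for non-positional annotations.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional


def _attr(parent: ET.Element, tag: str, name: str) -> Optional[str]:
    """Attribute `name` of the first child `tag`, or None if there is no such child."""
    child = parent.find(tag)
    return child.get(name) if child is not None else None


def parse_features(xml_text: str) -> List[Dict[str, object]]:
    """Flatten a DASGFF-style features response into one dict per FEATURE."""
    root = ET.fromstring(xml_text)
    features = []
    for segment in root.iter("SEGMENT"):
        for feat in segment.findall("FEATURE"):
            start, end = feat.findtext("START"), feat.findtext("END")
            features.append(
                {
                    "segment": segment.get("id"),
                    "id": feat.get("id"),
                    "type": _attr(feat, "TYPE", "id"),
                    "method": _attr(feat, "METHOD", "id"),
                    "start": int(start) if start else None,
                    "end": int(end) if end else None,
                }
            )
    return features


SAMPLE = """<DASGFF>
  <GFF href="http://das.example.org/das/hypothetical_source/features?segment=X:100,200">
    <SEGMENT id="X" start="100" stop="200">
      <FEATURE id="snp1">
        <TYPE id="SNP">SNP</TYPE>
        <METHOD id="sequencing">sequencing</METHOD>
        <START>150</START>
        <END>150</END>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>"""

parsed = parse_features(SAMPLE)
```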
Other Commands: Stylesheet: hints on how to render different types of feature, e.g. exons as blue boxes, SNPs as red triangles (/das/ /stylesheet). Types: lists the types of feature available (/das/ /types).
The stylesheet command. Response: an XML document giving a glyph and colours (e.g. red, black...) for each feature type.
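On the client side, a stylesheet boils down to a lookup from feature type to glyph. This Python sketch assumes DAS 1.5-style element names (DASSTYLE, STYLESHEET, CATEGORY, TYPE, GLYPH, then a shape element such as BOX or TRIANGLE) and a fallback to a type or category with id "default"; the sample stylesheet and category names are made up.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple

Style = Tuple[str, Dict[str, str]]  # (glyph name, {setting: value})


def parse_stylesheet(xml_text: str) -> Dict[Tuple[str, str], Style]:
    """Map (category id, type id) to the first glyph defined for that type."""
    styles: Dict[Tuple[str, str], Style] = {}
    for cat in ET.fromstring(xml_text).iter("CATEGORY"):
        for typ in cat.findall("TYPE"):
            glyph = typ.find("GLYPH")
            if glyph is None or len(glyph) == 0:
                continue
            shape = glyph[0]  # e.g. BOX or TRIANGLE
            settings = {child.tag: (child.text or "").strip() for child in shape}
            styles[(cat.get("id"), typ.get("id"))] = (shape.tag, settings)
    return styles


def style_for(styles: Dict[Tuple[str, str], Style], category: str, ftype: str) -> Optional[Style]:
    """Exact match first, then the category default, then the global default."""
    for key in ((category, ftype), (category, "default"), ("default", "default")):
        if key in styles:
            return styles[key]
    return None


SAMPLE = """<DASSTYLE><STYLESHEET version="1.0">
  <CATEGORY id="default">
    <TYPE id="default"><GLYPH><BOX><FGCOLOR>black</FGCOLOR></BOX></GLYPH></TYPE>
  </CATEGORY>
  <CATEGORY id="transcription">
    <TYPE id="exon"><GLYPH><BOX><BGCOLOR>blue</BGCOLOR></BOX></GLYPH></TYPE>
  </CATEGORY>
  <CATEGORY id="variation">
    <TYPE id="SNP"><GLYPH><TRIANGLE><BGCOLOR>red</BGCOLOR></TRIANGLE></GLYPH></TYPE>
  </CATEGORY>
</STYLESHEET></DASSTYLE>"""

styles = parse_stylesheet(SAMPLE)
```

A type the stylesheet does not mention (say, an intron) falls back to the default black box rather than being left unrendered.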
So far…
Extensions: expanded for non-positional annotations, e.g. annotating a gene or protein (rather than a sequence), such as references or text-mining annotations, which don't have a start/end. New data types: protein 3D structures, pairwise/multiple alignments, interactions; commands: structure, alignment, interaction.
Metadata: we can make a client that knows how to query a server and parse the response, BUT something is missing. Which data sources are available on a server? Which commands does a source support? What kind of reference objects does it know about?
The sources command: /das/sources lists a server's data sources. For each source: a text description, a list of capabilities (commands), a list of coordinate systems (types of reference object), etc.
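Answering "which commands does a source support?" is then a matter of reading the sources document. This Python sketch assumes DAS 1.6-style markup (SOURCES, SOURCE, VERSION, COORDINATES, and CAPABILITY elements whose type looks like das1:features and which carry a query_uri); the sample source and its URLs are made up.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def parse_sources(xml_text: str) -> List[Dict[str, object]]:
    """One record per SOURCE/VERSION: description, capabilities, coordinates."""
    records = []
    for src in ET.fromstring(xml_text).iter("SOURCE"):
        for ver in src.findall("VERSION"):
            records.append(
                {
                    "title": src.get("title"),
                    "description": src.get("description"),
                    # 'das1:features' -> 'features', mapped to the URL to query
                    "capabilities": {
                        c.get("type", "").split(":")[-1]: c.get("query_uri")
                        for c in ver.findall("CAPABILITY")
                    },
                    # the element text names the coordinate system
                    "coordinates": [c.text for c in ver.findall("COORDINATES")],
                }
            )
    return records


def supports(record: Dict[str, object], command: str) -> bool:
    return command in record["capabilities"]


SAMPLE = """<SOURCES>
  <SOURCE uri="hypothetical_snps" title="hypothetical_snps" description="Made-up SNP source">
    <VERSION uri="hypothetical_snps">
      <COORDINATES authority="GRCh" version="37"
                   source="Chromosome">GRCh_37,Chromosome,Homo sapiens</COORDINATES>
      <CAPABILITY type="das1:features"
                  query_uri="http://das.example.org/das/hypothetical_snps/features"/>
      <CAPABILITY type="das1:stylesheet"
                  query_uri="http://das.example.org/das/hypothetical_snps/stylesheet"/>
    </VERSION>
  </SOURCE>
</SOURCES>"""

sources = parse_sources(SAMPLE)
```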
DAS Registry: we can get a list of sources from a server and find out how to query them, BUT there are lots of DAS servers and we can only use the ones we know about. We need a Yellow Pages (directory) listing each source's location, description, capabilities and reliability.
DAS Registry: the third component of DAS, a catalogue of DAS sources. Human interface: validate, register, search, view statistics. Programmatic interface.
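SOA Registry: the server publishes its sources to the registry; the client finds them in the registry and then binds to (queries) the server directly.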
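The publish / find / bind cycle can be illustrated with a toy, in-memory registry in Python. Everything here (ToyRegistry, SourceEntry, bind, the coordinate-system labels and URLs) is hypothetical and is not the real DAS Registry's programmatic interface; it only shows the division of labour between the three parties.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class SourceEntry:
    uri: str                      # location: base URL of the DAS source
    description: str
    capabilities: FrozenSet[str]  # e.g. {"features", "stylesheet"}
    coordinates: FrozenSet[str]   # coordinate-system labels (hypothetical format)


class ToyRegistry:
    """In-memory stand-in for a registry: servers publish, clients find."""

    def __init__(self) -> None:
        self._entries: Dict[str, SourceEntry] = {}

    def publish(self, entry: SourceEntry) -> None:
        self._entries[entry.uri] = entry  # re-publishing replaces the old record

    def find(self, capability: Optional[str] = None,
             coordinates: Optional[str] = None) -> List[SourceEntry]:
        return sorted(
            (e for e in self._entries.values()
             if (capability is None or capability in e.capabilities)
             and (coordinates is None or coordinates in e.coordinates)),
            key=lambda e: e.uri,
        )


def bind(entry: SourceEntry, command: str, query: str) -> str:
    """'Bind' = talk to the server directly, using the location found in the registry."""
    return f"{entry.uri.rstrip('/')}/{command}?{query}"


registry = ToyRegistry()
registry.publish(SourceEntry("http://a.example.org/das/snps", "Made-up SNPs",
                             frozenset({"features"}), frozenset({"GRCh_37,Chromosome"})))
registry.publish(SourceEntry("http://b.example.org/das/ref", "Made-up reference",
                             frozenset({"sequence"}), frozenset({"GRCh_37,Chromosome"})))
found = registry.find(capability="features", coordinates="GRCh_37,Chromosome")
```

Note that the registry never relays data: once the client has found a source, it queries that server itself.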
Example client behaviour
Links: DAS Homepage; DAS Specification; DAS in Ensembl; Mailing list.